There are two basic causes of this error, plus a third special case of the second which occurs quite frequently:
1) The undo segments are too small
2) The application fetches across commits (a design flaw)
3) Delayed block cleanout
[LOBs are handled differently but can still generate an ORA-01555; this is discussed toward the end of this post]
The first two are related to the read consistency mechanism Oracle employs; this involves undo segments to reconstruct data as it appeared at the moment the query started. Since UNDO segments are used to provide both read-consistency and to rollback transactions the dreaded ORA-01555 can appear. It is the first situation which is 'addressed' by our 'poor soul', and ONLY the first situation. According to his blog this is the sole reason for an ORA-01555. Because of that his suggestions, which are valid for that situation alone, are:
* Adjust the undo_retention parameter (for 9i and higher releases of Oracle)
* Increase the size of the UNDO tablespace
* Increase the size of the undo segments
You'll know if this is the likely cause by perusing the alert log for the currently generated ORA-01555 error; you should find a message similar to this:
ORA-01555 caused by SQL statement below (SQL ID: dxswvyyhkkg43, Query Duration=2347 sec, SCN: 0x0915.5ce46fff):
The offending query text will follow. Notice in the error message the elapsed query time is presented:
... Query Duration=2347 sec, ...
If that query time exceeds your setting for undo_retention you have a possible candidate for the corrective steps listed above. If, on the other hand, that query time is well within the configured undo retention period you'll need to look elsewhere for the cause. And this is where he stops, leaving one to believe those are the ONLY solutions necessary. Sadly he's wrong.Since one can receive this error for fetching across commits let's look at that for a moment. As the ORA-01555 error indicates to you that the UNDO information is no longer available one wonders why a developer would decide to commit inside a loop, as that's the surest way to generate an ORA-01555. The reasoning for this tactic is to, hopefully, conserve on the use of UNDO/rollback segments, and, unfortunately, it does just that. I consider the loop the entire transaction; commiting in that loop causes Oracle to complete the initial transaction then begin another, thus freeing the UNDO generated up until that point. 'Freeing' the UNDO means that Oracle is now free to overwrite it with the next transaction (which is the next 'section' of data you want to modify from the original query). Since you can no longer roll back to the original starting point, and you may need one or more data blocks from that point in time, you generate the "snapshot too old" error. Another aspect of committing in a loop is that when the ORA-01555 occurs you leave the database in an unknown state; part of the updates have completed, others have not. I'd hate to be the one to sort through the committed updates and have to manually roll them back to start over again. Oh, and starting over again, using the same code, invites the same problem that stopped the process the last time, so it becomes a never-ending cycle until the code is corrected.
The third is the probably the worst of the group, as it can occur with only a single user on the system performing a SELECT. Yes, a SELECT. How can a select statement generate UNDO and thus an ORA-01555? It's called Delayed Block Cleanout and it's a dastardly event because it cannot be eliminated entirely. Delayed Block Cleanout is caused by a session accessing a block that's been recently modified and hasn't been cleaned out yet; the current session needs to check if the modifying session is still active. If the modifying transaction is not active the current session 'cleans out' the block so any subsequent sessions accessing that block won't need to go through the same 'song and dance'. The cleanout process reads the UNDO header, and determines if the blocks are marked as committed or not. If the changes are committed the session 'cleans out' the block removing any transaction-related information for the modified blocks, which generates redo. It also frees the UNDO segments used to modify those blocks. So how can this generate an ORA-01555? The following conditions must be met:
-- Data is modified and committed and the blocks are not cleaned out automatically (because they exceed the 10% limit to the SGA block buffer cache).
-- The modified blocks are not accessed by another session and won't be accessed until the current session does so.
-- The current session begins a long-running query and starts at SCN t_a, the SCN to roll back to in order to ensure a read-consistent image.
-- During this query other sessions modify and commit changes to the database, but don't touch the blocks this long-running query needs.
-- The transaction tables roll around due to the high number of COMMITS and 'step on' the transaction entry for SCN t_a.
-- The lowest SCN (call it t_b) is now higher than t_a (the read-consistent SCN of the long-running query) preventing a read-consistent image due to the large number of COMMITS.
Bingo, the ORA-01555 rears its ugly head because the query is now attempting to access a block that has not been modified since the query began, but the high number of commits has overwritten the rollback data. Since Oracle is now unsure of whether this block is usable or not (even though it was the query, not some other transaction, which generated the UNDO), it throws the ORA-01555 error.
To prevent such things from happening after large data loads, batch updates or deletes a good idea would be to run DBMS_STATS to access the blocks and clean them out. It's also a good idea to run that anyway, since you've changed the data and such changes may have altered the 'picture' Oracle should see for those tables and indexes.
LOB updates are a different matter entirely, and can also throw the ORA-01555 error for the same reason but using a different mechanism. Before images of LOB data are stored in the segment itself provided that MAXEXTENTS has not been reached or the tablespace containing the LOB segments has not filled to capacity. Additionally Oracle keeps PCTVERSION of the storage allocated for LOBs for before images; if PCTVERSION is set to too low of a value older images will be overwritten and, you guessed it, an ORA-01555 error appears. The PCTVERSION setting reserves that percentage of the total number of chunks of LOB data which have been allocated during the update process. Let's say you have 400 chunks already allocated, PCTVERSION is 5 and you need to update 33 chunks. 33 additional chunks are allocated for the before images of that data. This transaction succeeds and commits, freeing those 33 chunks for the next update. But, wait, PCTVERSION is set to 5 so 5% of 400 (20) of those chunks can't be touched to preserve the before images. This leaves 13 chunks of the previous allocation available for the next update transaction. If the next update affects 20 chunks then 7 additional chunks need to be allocated to complete that request. For an active table with LOB data this could continue on and on, where Oracle reuses some chunks from a prior update but also allocates additional chunks, until MAXEXTENTS is reached, the tablespace is filled or the series of update transactions comes to an end. A long-running select against that data will probably need those now-overwritten before images and, because they're no longer available both an ORA-01555 ("snapshot too old, rollback segment too small") and an ORA-22924 ("snapshot too old", relating to the PCTVERSION setting) are raised. This occurrence of an ORA-01555 isn't corrected using the methods listed previously, it's corrected by increasing the PCTVERSION for the LOB segment in question:
SQL> alter table lobex modify lob (isalob) (pctversion 10);
Table altered.
SQL>
Verifying the setting has been changed:
SQL> select table_name, column_name, pctversion
2 from user_lobs
3 where table_name = 'LOBEX'
4 /
TABLE_NAME COLUMN_NAME PCTVERSION
------------------------------ -------------------- ----------
LOBEX ISALOB 10
SQL>
So what if you query USER_LOBS and find that PCTVERSION is NULL? The LOB has been created with the RETENTION property set which retains the old versions of the LOB data for a period of time, and you'll see the RETENTION column of the USER_LOBS view populated with the current UNDO_RETENTION value:
SQL> select table_name, column_name, pctversion, retention
2 from user_lobs
3 where pctversion is null
4 /
TABLE_NAME COLUMN_NAME PCTVERSION RETENTION
------------------------------ -------------------- ---------- ----------
LOBEX2 ISALOB 900
SQL>
Should this be the case the solution to curing the ORA-01555 is back to the usual remedy of increasing the UNDO_RETENTION, which in turn increases the RETENTION for the old LOB versions. You can, of course, also modify the given LOB segment to set the pctversion, as shown above. The choice is yours, however managing heavily updated LOB data may be easier with RETENTION set rather than configuring the PCTVERSION.A rather nasty bug (2643723) affects LOB segments in Oracle versions 9.2.0.1, 9.2.0.2 and 9.2.0.3 and LOB segments in version 9.2.0.4 (3213101) that use Auto Segment Space Management (ASSM). Concurrent writes to the same LOB segment can create data corruption in that LOB causing an ORA-01555 to be displayed. Metalink Note 253131.1 describes the problem and provides diagnostic code to determine if the error is due to the segment corruption bug.
The ORA-01555 error may have a simple list of causes, but the solutions may be far from simply adjusting the UNDO segment size or resetting the undo_retention parameter. I wish more people offering 'advice' would realize this. There would be a lot less myth and a lot more usable information.
1 comments:
This post is awaesome and very enlighting. Thank you very much for your work.
Post a Comment