ORA-01555 Snapshot Too Old and Delayed Block Cleanout

The ORA-01555 error (snapshot too old) normally occurs when you try to query a table whilst it’s being changed. If the necessary undo to provide a read-consistent view of data is no longer available, then an ORA-01555 error will occur. This is typically due to bad design practices such as fetching across commits. However it is possible to encounter the same error when the table is not being changed. This is due to a database feature called Delayed Block Cleanout. The purpose of this post is to describe this scenario in more detail, so that it can be recognised should it occur.

This post provides an overview of the delayed block cleanout mechanism and the read-consistency mechanism, and how together these can result in an ORA-01555 error. It is not intended to be a comprehensive description of the two mechanisms, which is far more complex than I could possibly hope to manage in a simple blog post on a Friday afternoon. However, this post should hopefully provide enough information to establish that it might be happening, and provide enough context to explain this to an irate customer or project manager.

The scenario

The usual explanation for ORA-01555 is that data is being changed whilst it’s being read, and there is insufficient undo to satisfy read-consistent requests. However, ORA-01555 can occur whilst querying a table whilst no other processes are touching the table. This seems counter-intuitive – why would undo be required when the table is static and unchanging? Enter Delayed Block Cleanout…

Delayed block cleanout

Data is stored in blocks of a fixed size, usually 8KB in most databases. Each block consists of a header and some data. The header contains a list of transactions which involved changing records in the block (inserts, updates, or deletes). This is called the Interested Transaction List (ITL) and can be seen by dumping out the contents of a database block.

Ordinarily, when writing small amounts of data, the database is able to keep the block headers up-to-date, so that subsequent processes know exactly when a particular change was committed.

However, when loading a large volume of data (typically more than 10% of the SGA size), the database will continue to write blocks to disk whilst the operation is ongoing. At the time the data is written to disk, the ITL in the blocks show an active transaction (because the data was not committed at the time it was was written). Upon committing, rather than going back over all of the written blocks and updating the ITLs (which would make the commit VERY slow), the database leaves it to the next process which comes along to “clean-out” the block header. This is delayed block cleanout.

The cleanout process is triggered as soon as a block is read. If a table was loaded with delayed block cleanout, any process reading all of the table at a later time will cause the blocks to be cleaned out. This includes optimiser stats gathering (providing the entire table is read), or a query involving a full table scan. However, if the table is not touched, then in theory, the data could remain un-cleaned for some time.

The cleanout process involves checking active transactions in the ITL. If an active transaction is found in the ITL, it has to be double-checked to see if it really is active. This is done by checking for active and recent transactions in the undo tablespace. If an active transaction is found, the ITL is left alone. If a recently committed transaction is encountered, then the ITL is updated to reflect this, recording the SCN when the transactions was committed. If the transaction cannot be found in the database, then it can be assumed that it was committed a long time ago. Since there is no longer a record of the SCN for the transaction, the best the database can do is estimate an “upper bound”, based on the oldest SCN in the database. So if the oldest transaction in the database was at 1.15pm, we can be sure that any older transactions must have been committed before 1.15pm. The upper-bound SCN corresponding to 1.15pm would be recorded in the ITL.

Read consistency

The database guarantees read consistency for every query. This means that a query which began at 2.30pm will only use records which were current at 2.30pm. If the query takes two hours to finish, and another process changes a record at 4.00pm, the change will not be visible to the long-running query. It is only interested in the way things were at 2.30pm. So each time the query reads a block of data from disk, it checks the ITL to determine whether the data was valid at 2.30pm. By checking each transaction in the ITL, it can determine whether the data in the block is involved in an active transaction (uncommitted), recently committed, or commited a long time ago.

Possible outcomes

For a long running query starting at 2.30pm, each time a block of data is read, one of the following outcomes is possible for each transaction found in the ITL.

  1. If the transaction was committed before 2.30pm, then it can safely assume all the data in the block was current as of 2.30pm, and it can be used by the query.
  2. If the transaction was committed after 2.30pm then the data cannot be used as-is, and the database must retrieve undo data to find out what the data really looked like at 2.30pm.
    1. The records were created after 2.30pm, so are not used in the query
    2. The records were modified since 2.30pm, so are rolled-back to the way they were at 2.30pm
    3. The records were modified since 2.30pm, but the undo records are no longer available. In this case, the query cannot continue and the “classic” ORA-01555 error is raised. This arises from a combination of too many changes being made on the table being queried whilst the query is running, and not enough undo space to keep all the changes to permit rollback for consistent read.
  3. If the transaction is still showing as Active (i.e. uncommitted), then it must determine the actual state of that transaction in the undo tablespace. One of four things may happen depending on what it finds.
    1. It finds the transaction is not yet committed, so the data cannot be used as-is. As per (2) above, it must use undo to rollback to 2.30pm.
    2. It finds the transaction committed after 2.30pm, so the data cannot be used as-is. As per (2) above, it must use undo to rollback to 2.30pm.
    3. It finds the transaction committed before 2.30pm, so the data can be used as is.
    4. It can’t find the transaction at all, meaning it must have been overwritten by more recent transactions. In order to decide if it can be used as-is, it finds the oldest transaction still on record (the “upper bound” described previously), and deduces that it must have committed before this. It now uses this point in time to determine whether the records can be used for the 2.30pm query.
      1. If the oldest transaction committed before 2.30pm, then it can be sure that the data in this block was definitely committed before 2.30pm, so it can be used.
      2. If the oldest transaction committed after 2.30pm, then it has no way of knowing whether the data in this block was committed before or after 2.30pm, so an ORA-01555 error is raised. This arises because the query started before the oldest known transaction, so it can’t determine whether a particular row is valid or not.

So that’s the cause of the ORA-01555. After encountering a block of data which had not been cleaned out, it knows the data was committed, but doesn’t know whether it was committed before or after the query started, so it can’t decide whether to use the data or not, and throws the error.

Criteria and solutions

Based on the above analysis, the criteria for this to occur, and the solutions to prevent it from occurring, should make more sense.

For an ORA-01555 to occur from delayed block cleanout, the following criteria must be met.

  • A large volume of data was loaded and committed, and has not been touched since (no queries, no stats gathering).
  • A long running query begins on the data, which begins reading the uncleaned blocks.
  • Meanwhile, many other smaller transactions occur on other tables in the database.
  • Due to the volume of transactions, the oldest transaction in the undo tablespace occurred after the start of the long-running query.
  • The query reaches a block requiring cleanout, and is unable to determine if it can be used or not, so fails with ORA-01555.

The following steps can be taken to reduce the risk of an ORA-01555 error occurring.

  • Rewrite the long-running query to run more quickly
  • Touch all the data before starting the query (stats gathering or force a full table scan)
  • Reduce the volume of other transactions on the database
  • Increase the amount of undo space

Conclusions

Hopefully this provides a high-level explanation of why a seemingly innocuous query on an otherwise untouched table can result in an ORA-01555. There is a lot more complexity to the delayed block cleanout and read consistency mechanisms, and I would refer to the following two great books for a comprehensive explanation.

Oracle Core: Essential Internals for DBAs and Developers by Jonathan Lewis (Apress 2011)

Expert Oracle Database Architecture: Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions, Second Edition by Thomas Kyte (Apress 2010)

Advertisements

One thought on “ORA-01555 Snapshot Too Old and Delayed Block Cleanout

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s