
enq: TM – contention due to parallel DML and foreign keys

This is a write-up of an issue I recently posted to the OTN discussion forum (https://community.oracle.com/message/12798407). I thought the associated test case was useful in demonstrating the issue, so it is captured here for future reference. There were some useful replies to the OTN post, confirming my suspicions.

The test was performed using Oracle Enterprise Edition 11.2.0.3.0 on Linux.

Problem statement

A parallel delete blocks an insert into a dependent table, showing up as waits on ‘enq: TM – contention’. The scenario is as follows.

  • Session S1 is performing a parallel delete from table T1.
  • Session S2 is performing a single-row insert into table T2.
  • Table T2 has a foreign key referencing table T1. The foreign key column is indexed.
  • Neither table is partitioned.
  • The row being inserted into T2 does not depend on any of the rows being deleted from T1.
  • Session S2 is blocked by session S1 until the parallel delete is committed, waiting on the ‘enq: TM – contention’ event (a quick way to confirm such a blocking relationship is sketched below).
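As an aside, the blocking relationship can be confirmed from v$session. This is a minimal sketch; &blocked_sid is a placeholder for the SID of session S2.

-- Show what the blocked session is waiting on and which session blocks it
SELECT sid
     , event
     , blocking_session
     , seconds_in_wait
FROM   v$session
WHERE  sid = &blocked_sid;
-- Expect event = 'enq: TM - contention' and blocking_session = S1's SID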

Analysis

Due to the presence of a foreign key, the database must ensure there is a valid parent record in T1 before permitting S2’s insert into T2.

  • In the case of a serial delete from S1, the process locks only the affected rows in T1, taking a row-exclusive (SX, mode 3) lock on the table itself. Consequently, provided S2 is inserting records which are not affected by the delete, it is not blocked.
  • In the case of a parallel delete from S1, the process locks the entire table with an exclusive (X, mode 6) lock. Consequently, S2 cannot insert any records into T2 and is blocked until S1 commits the transaction. The lock modes can be inspected with the query sketched below.
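The numeric lock modes can be seen directly in v$lock whilst the delete is uncommitted. This is a sketch only, using the same &1/&2 session-ID substitutions as the test case below.

-- TM (table) locks held or requested by the two sessions
SELECT sid
     , type
     , id1 AS object_id
     , lmode    -- 3 = row-exclusive (SX), 6 = exclusive (X)
     , request
     , block
FROM   v$lock
WHERE  type = 'TM'
AND    sid IN (&1, &2.);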

This is demonstrated in the following test case.

-- Set up tables and data

CREATE TABLE t1
( id NUMBER
, data VARCHAR2(4000)
, CONSTRAINT pk_t1 PRIMARY KEY (id)
);

CREATE TABLE t2
( id NUMBER
, t1_id NUMBER
, CONSTRAINT pk_t2 PRIMARY KEY (id)
, CONSTRAINT fk_t2_t1 FOREIGN KEY (t1_id) REFERENCES t1
);

CREATE INDEX ix_t2_fk ON t2 (t1_id);

-- Insert 1,000 rows into T1
INSERT INTO t1
SELECT level
     , RPAD('X',4000,'X')
FROM   dual
CONNECT BY level <= 1000;

-- Insert 10 rows into T2
INSERT INTO t2
SELECT level
     , level
FROM   dual
CONNECT BY level <= 10;


EXEC dbms_stats.gather_table_stats(user,'T1')
EXEC dbms_stats.gather_table_stats(user,'T2')

COMMIT;

Firstly, insert into a related table whilst performing a serial delete.

-- Session 1 - serial delete
ALTER SESSION DISABLE PARALLEL DML;

DELETE FROM t1
WHERE id BETWEEN 901 AND 1000;
-- No commit

-- Session 2
INSERT INTO t2 VALUES (11,11);
-- No wait event

-- Check locks
SELECT session_id
     , lock_type
     , mode_held
     , mode_requested
     , lock_id1
     , lock_id2
     , blocking_others
     , object_type ||' ' || object_name AS locked_object
FROM   dba_locks
LEFT JOIN user_objects
ON     lock_id1 = object_id
WHERE  lock_type IN ('DML','Transaction')
AND    session_id IN (&1,&2.)
ORDER BY 1,2,5;

/*
1 DML Row-X (SX) None 75618 0 Not Blocking TABLE T1
1 DML Row-X (SX) None 75620 0 Not Blocking TABLE T2
1 Transaction Exclusive None 262158 669 Not Blocking
2 DML Row-X (SX) None 75618 0 Not Blocking TABLE T1
2 DML Row-X (SX) None 75620 0 Not Blocking TABLE T2
2 Transaction Exclusive None 327680 830 Not Blocking
*/

-- Session 2
COMMIT;

Session 2 is able to insert a record into the dependent table T2 without encountering contention from session 1. The serial delete in session 1 takes only a row-exclusive (SX, mode 3) lock on T1, and SX locks are mutually compatible, so session 2 can also take the SX lock on T1 that it needs to validate the foreign key constraint.

Now see what happens when parallel DML is used to delete records in Session 1.

-- Session 1 - parallel delete
COMMIT;
ALTER SESSION ENABLE PARALLEL DML;

DELETE /*+ PARALLEL */ FROM t1
WHERE id BETWEEN 801 AND 900;
-- No commit

-- Session 2
INSERT INTO t2 VALUES (12,12);
-- Lock-wait
-- Check locks again
/*
1 DML Exclusive None 75618 0 Blocking TABLE T1 <-- this is the TM-X (mode 6) blocking lock
1 DML Row-X (SX) None 75620 0 Not Blocking TABLE T2
1 Transaction Exclusive None 458759 650 Not Blocking
2 DML None Row-X (SX) 75618 0 Not Blocking TABLE T1 <-- this is the blocked session
*/

-- Check wait events
SELECT event
FROM   v$session
WHERE  sid = &2.;
-- event = enq: TM - contention

Session 2 cannot insert into table T2 because it is blocked by session 1 deleting from table T1. The parallel delete has taken an exclusive (X, mode 6) lock on T1, so no other session can take any lock on the table, not even the row-exclusive (SX) lock required to validate the foreign key. The insert from session 2 is blocked until session 1 commits.

-- Session 1
COMMIT;

-- Session 2
-- Insert completes successfully

This test case has shown that parallel DML can create contention, not only on the table being written to, but also on other tables related to it by foreign keys. This should be borne in mind when deciding whether to use parallel DML to speed up data maintenance processes.

ORA-01555 Snapshot Too Old and Delayed Block Cleanout

The ORA-01555 error (snapshot too old) normally occurs when you query a table whilst it is being changed. If the undo needed to provide a read-consistent view of the data is no longer available, an ORA-01555 error is raised. This is typically due to bad design practices such as fetching across commits. However, it is possible to encounter the same error when the table is not being changed at all, due to a database feature called delayed block cleanout. The purpose of this post is to describe this scenario in more detail, so that it can be recognised should it occur.

This post provides an overview of the delayed block cleanout and read-consistency mechanisms, and of how together they can result in an ORA-01555 error. It is not intended to be a comprehensive description of the two mechanisms, which are far more complex than I could possibly hope to cover in a simple blog post on a Friday afternoon. However, it should provide enough information to establish whether this might be happening, and enough context to explain it to an irate customer or project manager.

The scenario

The usual explanation for ORA-01555 is that data is being changed whilst it is being read, and there is insufficient undo to satisfy read-consistent requests. However, ORA-01555 can also occur when querying a table that no other process is touching. This seems counter-intuitive – why would undo be required when the table is static and unchanging? Enter delayed block cleanout…

Delayed block cleanout

Data is stored in blocks of a fixed size, typically 8KB. Each block consists of a header and some data. The header contains a list of the transactions that changed records in the block (inserts, updates, or deletes). This is called the Interested Transaction List (ITL) and can be seen by dumping out the contents of a database block, as sketched below.
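One way to look at an ITL is to dump the block containing a row of interest to a trace file. A sketch, reusing table T1 from the first post; the file and block numbers returned by the first query must be substituted into the second statement.

-- Locate the datafile and block containing a given row...
SELECT dbms_rowid.rowid_relative_fno(rowid) AS file_no
     , dbms_rowid.rowid_block_number(rowid) AS block_no
FROM   t1
WHERE  ROWNUM = 1;

-- ...then dump that block to a trace file and look for the "Itl" section
ALTER SYSTEM DUMP DATAFILE &file_no BLOCK &block_no;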

Ordinarily, when writing small amounts of data, the database is able to keep the block headers up-to-date, so that subsequent processes know exactly when a particular change was committed.

However, when loading a large volume of data (as a rule of thumb, more than about 10% of the buffer cache, which is roughly the limit of what can be cleaned out at commit time), the database writes dirty blocks to disk whilst the operation is ongoing. At the time the data is written to disk, the ITLs in those blocks show an active transaction (because the data had not been committed when it was written). Upon committing, rather than going back over all of the written blocks and updating the ITLs (which would make the commit VERY slow), the database leaves it to the next process that reads each block to “clean out” the block header. This is delayed block cleanout.

The cleanout is triggered as soon as a block is next read. If a table was loaded with delayed block cleanout pending, any process later reading the whole table will cause the blocks to be cleaned out; this includes optimiser stats gathering (provided the entire table is read) and any query involving a full table scan. However, if the table is not touched then, in theory, the data could remain uncleaned for some time. A tell-tale sign of cleanout happening is sketched below.
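Because cleaning out a block is itself a change that must be logged, a plain SELECT that performs delayed block cleanout generates redo. A minimal sketch of how this might be observed, assuming a freshly-loaded table (the name big_table is illustrative):

-- Snapshot this session's redo generation before the first read
SELECT n.name, s.value
FROM   v$mystat s
JOIN   v$statname n ON n.statistic# = s.statistic#
WHERE  n.name = 'redo size';

-- The first full read of the freshly-loaded table performs the cleanout
SELECT /*+ FULL(t) */ COUNT(*) FROM big_table t;

-- Re-run the v$mystat query: an increase in 'redo size' caused by a
-- read-only query is the delayed block cleanout at work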

The cleanout process involves checking the transactions listed in the ITL. If a transaction shows as active, it has to be double-checked to see whether it really is active, by looking for active and recent transactions in the undo tablespace. If the transaction is genuinely active, the ITL is left alone. If a recently committed transaction is found, the ITL is updated to reflect this, recording the SCN at which the transaction was committed. If the transaction cannot be found in the database at all, it can be assumed to have been committed a long time ago. Since there is no longer a record of its commit SCN, the best the database can do is derive an “upper bound”, based on the oldest SCN still on record. So if the oldest transaction on record was at 1.15pm, we can be sure that any older transactions must have been committed before 1.15pm, and the upper-bound SCN corresponding to 1.15pm is recorded in the ITL.
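The correspondence between SCNs and wall-clock times used in this example can be explored with the built-in conversion functions. A sketch; note that the SCN-to-time mapping is itself only retained for a limited period, and &some_scn is a placeholder.

-- The SCN corresponding to the current time, and the approximate
-- time corresponding to a given SCN
SELECT timestamp_to_scn(SYSDATE) AS current_scn FROM dual;
SELECT scn_to_timestamp(&some_scn) AS approx_time FROM dual;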

Read consistency

The database guarantees read consistency for every query. This means that a query which began at 2.30pm will only use data which was current at 2.30pm. If the query takes two hours to finish, and another process changes a record at 4.00pm, that change will not be visible to the long-running query; it is only interested in the way things were at 2.30pm. So each time the query reads a block of data from disk, it checks the ITL to determine whether the data was valid at 2.30pm. By checking each transaction in the ITL, it can determine whether the data in the block is involved in an active transaction (uncommitted), was recently committed, or was committed a long time ago.
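Flashback query exposes the same undo-based mechanism explicitly, and makes for a useful mental model: every ordinary query is effectively an “AS OF its own start time” query. A sketch, reusing table T1 and assuming sufficient undo is retained:

-- Request the view of the data as of two hours ago explicitly
SELECT COUNT(*)
FROM   t1 AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '2' HOUR);
-- Fails (typically with ORA-01555) if the undo needed to rebuild
-- that view is no longer available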

Possible outcomes

For a long running query starting at 2.30pm, each time a block of data is read, one of the following outcomes is possible for each transaction found in the ITL.

  1. If the transaction was committed before 2.30pm, then it can safely assume all the data in the block was current as of 2.30pm, and it can be used by the query.
  2. If the transaction was committed after 2.30pm then the data cannot be used as-is, and the database must retrieve undo data to find out what the data really looked like at 2.30pm.
    1. The records were created after 2.30pm, so are not used in the query
    2. The records were modified since 2.30pm, so are rolled-back to the way they were at 2.30pm
    3. The records were modified since 2.30pm, but the undo records are no longer available. In this case, the query cannot continue and the “classic” ORA-01555 error is raised. This arises from a combination of too many changes being made to the queried table whilst the query is running, and insufficient undo space to retain all of those changes for consistent reads.
  3. If the transaction is still showing as Active (i.e. uncommitted), then it must determine the actual state of that transaction in the undo tablespace. One of four things may happen depending on what it finds.
    1. It finds the transaction is not yet committed, so the data cannot be used as-is. As per (2) above, it must use undo to rollback to 2.30pm.
    2. It finds the transaction committed after 2.30pm, so the data cannot be used as-is. As per (2) above, it must use undo to rollback to 2.30pm.
    3. It finds the transaction committed before 2.30pm, so the data can be used as is.
    4. It can’t find the transaction at all, meaning it must have been overwritten by more recent transactions. In order to decide if it can be used as-is, it finds the oldest transaction still on record (the “upper bound” described previously), and deduces that it must have committed before this. It now uses this point in time to determine whether the records can be used for the 2.30pm query.
      1. If the oldest transaction committed before 2.30pm, then it can be sure that the data in this block was definitely committed before 2.30pm, so it can be used.
      2. If the oldest transaction committed after 2.30pm, then it has no way of knowing whether the data in this block was committed before or after 2.30pm, so an ORA-01555 error is raised. This arises because the query started before the oldest known transaction, so it can’t determine whether a particular row is valid or not.

So that’s the cause of the ORA-01555. After encountering a block of data which had not been cleaned out, the query knows the data was committed, but not whether it was committed before or after the query started, so it can’t decide whether to use the data, and throws the error.

Criteria and solutions

Based on the above analysis, the criteria for this to occur, and the solutions to prevent it from occurring, should make more sense.

For an ORA-01555 to occur from delayed block cleanout, the following criteria must be met.

  • A large volume of data was loaded and committed, and has not been touched since (no queries, no stats gathering).
  • A long-running query starts against the data and begins reading the uncleaned blocks.
  • Meanwhile, many other smaller transactions occur on other tables in the database.
  • Due to the volume of those transactions, the oldest transaction in the undo tablespace post-dates the start of the long-running query.
  • The query reaches a block requiring cleanout, cannot determine whether its data can be used, and fails with ORA-01555 (a rough recipe matching these criteria is sketched after this list).
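A two-session sketch matching these criteria; big_t and other_t are hypothetical tables, and big_t must be considerably larger than the buffer cache so that dirty blocks are flushed to disk before the commit.

-- Session 1: change a large volume of data, so dirty blocks are written
-- to disk before the commit, leaving their ITLs to be cleaned out later
UPDATE big_t SET data = data;
COMMIT;

-- Session 1: some time later, start a long-running query on the table
SELECT COUNT(*) FROM big_t;

-- Session 2 (meanwhile): churn the undo tablespace with many small,
-- committed transactions against unrelated tables
BEGIN
  FOR i IN 1 .. 1000000 LOOP
    UPDATE other_t SET val = val + 1 WHERE id = MOD(i, 100);
    COMMIT;
  END LOOP;
END;
/

-- Once the oldest undo record post-dates the start of the long-running
-- query, that query may fail with ORA-01555 on an uncleaned block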

The following steps can be taken to reduce the risk of an ORA-01555 error occurring.

  • Rewrite the long-running query to run more quickly
  • Touch all the data before starting the query (stats gathering or a forced full table scan; see the sketch after this list)
  • Reduce the volume of other transactions on the database
  • Increase the amount of undo space or undo retention (also sketched below)
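Hedged examples of the second and fourth mitigations; the table name and retention value are illustrative.

-- Touch all the data up front, so the cleanout happens before the
-- long-running query starts
EXEC dbms_stats.gather_table_stats(user, 'BIG_TABLE')
-- or
SELECT /*+ FULL(t) */ COUNT(*) FROM big_table t;

-- Increase undo retention (in seconds); the undo tablespace must be
-- sized to support the chosen value
ALTER SYSTEM SET undo_retention = 3600;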

Conclusions

Hopefully this provides a high-level explanation of why a seemingly innocuous query on an otherwise untouched table can result in an ORA-01555. There is a lot more complexity to the delayed block cleanout and read consistency mechanisms, and I would refer the reader to the following two great books for a comprehensive explanation.

Oracle Core: Essential Internals for DBAs and Developers by Jonathan Lewis (Apress 2011)

Expert Oracle Database Architecture: Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions, Second Edition by Thomas Kyte (Apress 2010)