Joining to a pipelined table function and “left correlation”

Oracle 11.2.0.4

A pipelined table function may be called from regular SQL using the TABLE collection expression, e.g.

SELECT *
FROM   TABLE(my_pipelined_function('ABC','DEF'));

where ‘ABC’ and ‘DEF’ are the inputs to the function.

What if you want to call the function repeatedly for several sets of inputs, e.g. testing the function for a variety of values? If those inputs are stored in a table somewhere, it ought to be as easy as doing a cross join to the table.

SELECT *
FROM   TABLE(my_pipelined_function(a,b))
CROSS JOIN my_input_table;

However, this returns the error ORA-00904: invalid identifier. It’s as though the SQL engine is unable to resolve the relationship between the inputs to the function and the joining table.

Today I stumbled across the answer, buried deep in the Oracle 11g SQL Language Reference.

The collection_expression can reference columns of tables defined to its left in the FROM clause. This is called left correlation.

So it seems that by reversing the order of the join so that the joining table comes before the function (i.e. to the left of it in SQL parlance), the SQL engine is able to figure out the join.

SELECT *
FROM   my_input_table
CROSS JOIN TABLE(my_pipelined_function(a,b));

Who ever knew that join order mattered?

Worked example

-- Simple pipelined function to return a couple of records per input
SQL> CREATE OR REPLACE FUNCTION my_pipelined_function
  2  ( a VARCHAR2
  3  , b VARCHAR2
  4  )
  5  RETURN SYS.DBMS_DEBUG_VC2COLL PIPELINED
  6  IS
  7  BEGIN
  8    PIPE ROW(UPPER(a)||LOWER(b));
  9    PIPE ROW(LOWER(a)||UPPER(b));
 10    RETURN;
 11  END;
 12  /

-- Single call to the function
SQL> SELECT *
  2  FROM   TABLE(my_pipelined_function('ABC','DEF'));

COLUMN_VALUE
---------------
ABCdef
abcDEF

-- Join to an input rowset
SQL> WITH src AS (
  2    SELECT 'ABC' AS a, 'DEF' AS b FROM dual
  3    UNION ALL
  4    SELECT 'GHI' AS a, 'JKL' AS b FROM dual
  5    UNION ALL
  6    SELECT 'MNO' AS a, 'PQR' AS b FROM dual
  7  )
  8  SELECT *
  9  FROM   TABLE(my_pipelined_function(a,b))
 10  CROSS JOIN src;
CROSS JOIN src
             *
ERROR at line 10:
ORA-00904: "A": invalid identifier

-- Reverse the order of join
SQL> WITH src AS (
  2    SELECT 'ABC' AS a, 'DEF' AS b FROM dual
  3    UNION ALL
  4    SELECT 'GHI' AS a, 'JKL' AS b FROM dual
  5    UNION ALL
  6    SELECT 'MNO' AS a, 'PQR' AS b FROM dual
  7  )
  8  SELECT *
  9  FROM   src
 10  CROSS JOIN TABLE(my_pipelined_function(a,b));

A   B   COLUMN_VALUE
--- --- ---------------
ABC DEF ABCdef
ABC DEF abcDEF
GHI JKL GHIjkl
GHI JKL ghiJKL
MNO PQR MNOpqr
MNO PQR mnoPQR

References

http://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_10002.htm#sthref6754


Standard Edition is dead – long live Standard Edition 2

Update 8th July – the plot thickens

After posting this blog two days ago, it seems Oracle Corp. may be taking a little time out to “clarify” their position. The original MOS note referenced below is no longer available (despite the link still sitting in my favourites in MOS), and furthermore, the helpful and rather detailed blog post by Mike Dietrich (News on Oracle Database STANDARD EDITION 12.1.0.2) has been redacted to a “watch this space” note.

So I guess we’ll have to watch this space…

My original blog posting from 6th July remains below.

Disclaimer – these are my personal musings based on very limited information, which I’m sure will rapidly become out of date as more details emerge over the coming months.

After a year of uncertainty, Oracle has finally announced the future of Standard Edition (SE). Along with its smaller sibling Standard Edition 1 (SE1), it is being retired and replaced by Standard Edition 2 (SE2). Perhaps the most significant point of this change is that SE2 can only be run on one- or two-socket servers, which raises interesting questions for existing SE deployments on four-socket servers.

When 12.1.0.2 was released in July 2014, it was only available for Enterprise Edition (EE). No date was given for SE and SE1, and with patching for 12.1.0.1 scheduled to end in July 2015, there was an uncertain future for SE and SE1. With Extended Support fees waived on 11g for a year, it seemed like sticking with 11g was a better option for SE and SE1 than moving to 12c with no clear future. However, with 11g ending Premier Support in early 2015, things were beginning to look very bleak for Oracle’s most cost-effective database platforms.

After months of silence, a MOS note was quietly published in early July 2015 (2027072.1) with a rather surprising announcement. The key points I took from this brief note are :

  • 12.1.0.1 is the final release for SE and SE1
  • Beginning with 12.1.0.2, SE and SE1 will be superseded by SE2
  • SE2 can be run on a maximum of 2 sockets, and can optionally support a two-node RAC cluster
  • Customers running SE or SE1 will have to migrate their licenses to SE2 if they wish to upgrade to 12.1.0.2 (i.e. if they wish to remain supported beyond January 2016)
  • SE2 is scheduled for release in Q3 2015
  • Patching for 12.1.0.1 SE and SE1 will be extended until January 2016, beyond the originally published date of July 2015

So what does this mean for SE and SE1 users?

Firstly, there’s a small lifeline, in that 12.1.0.1 will be patched for another six months. However, if like me you’ve stuck with 11g and haven’t yet upgraded to 12c, is it really worth upgrading for a further six months, when 11g remains in (fees-waived) Extended Support for the same period?

The most interesting point is what happens to systems running SE on a 4-socket system, or a 2×2-socket RAC system. When the lights finally go out on SE and SE1 in January 2016, you will have to migrate to different hardware to keep within the 2-socket limit. Alternatively you may be faced with a large bill to upgrade to EE.

There’s been no mention of pricing yet. It’s been suggested that the cost of SE2 could fall somewhere between SE1 and SE. However, with the MOS note saying you have to migrate your licenses in order to upgrade, I wonder how many customers will end up paying Oracle for the privilege of migrating. Cue some lengthy discussions with your favourite account manager.

What will SE2 look like? Will it keep to the same limited set of features as SE/SE1, or will a few EE features or options be allowed to creep in? Parallel execution would be a nice treat. Will SE2 offer any improved performance monitoring capabilities, or will we still be forced to hunt around for the install script for Statspack?

Finally, will Oracle use this as an opportunity to retire the per-socket metric for licensing of SE/SE1, instead moving to per-core licensing, as is done with EE? With the advance of multi-core chips, it seems inevitable at some point. Again, this could adversely impact customers who have invested in high-end multi-core chips.

I’m no lawyer, but having recently been debating perpetual vs term licenses, it makes me wonder what “Perpetual” really means, when support for said license is no longer available. I’m sure it’s in the small print somewhere.

ALL or ANY in SQL

The other day I was studying a SQL statement that was performing poorly, when I noticed a peculiar syntax that I had not come across before :

SELECT COUNT(*)
FROM   customer
WHERE  annual_spend > ALL (
  SELECT spend_threshold
  FROM   promotion
  WHERE  promo_name = 'Summer 2015'
);

The only time I’d ever come across the keyword ALL in SQL was in UNION ALL. However, a quick check of the manual led me to the Group Comparison Conditions :

http://docs.oracle.com/database/121/SQLRF/conditions002.htm#sthref1005

There are three possible comparisons – ALL, ANY and SOME (although ANY and SOME are identical). They allow you to compare an expression with any or all members of a list or subquery, and are defined as follows :

  • ANY, SOME – Compares a value to each value in a list or returned by a query. Must be preceded by =, !=, >, <, <=, or >=. Can be followed by any expression or subquery that returns one or more values.
  • ALL – Compares a value to every value in a list or returned by a query. Must be preceded by =, !=, >, <, <=, or >=. Can be followed by any expression or subquery that returns one or more values.

So how might these be useful? I’ll look at three possible use cases, and for each, consider the alternatives and look at potential performance implications. For the examples given, I’m using the OE and HR sample schema supplied with Oracle 12.1.0.2.

Equality

Using the “equality test” (=) we are testing whether a value matches one or more values in the subquery or expression list.

First, using the ALL comparison in the OE schema :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id = ALL (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
(null)

This first query returns no results. Given that the subquery returns several different product IDs, it is logically impossible for a single value to match all of them.
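
In fact, = ALL is really only useful when the subquery yields a single value (or several identical values), in which case it degenerates to a plain equality. A minimal sketch against the same OE schema (output omitted; product 1820 is just an illustrative ID):

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id = ALL (
  SELECT product_id
  FROM   product_information
  WHERE  product_id = 1820  -- single-row subquery, so = ALL behaves like =
);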

Next, using the ANY comparison :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id = ANY (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
         2235

This query returns the quantity sold of all products in category 11. However, the same result can be obtained using the more familiar IN-clause.

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id IN (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
         2235

Let’s look at the execution plans of the ANY and the IN queries. For the sample data in OE, both queries return the same execution plan :

Plan hash value: 2275080078
 
-------------------------------------------------------------------------------------------
| Id  | Operation           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                     |     1 |    15 |     8   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE     |                     |     1 |    15 |            |          |
|*  2 |   HASH JOIN         |                     |    58 |   870 |     8   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| PRODUCT_INFORMATION |    16 |   112 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| ORDER_ITEMS         |   665 |  5320 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("PRODUCT_ID"="PRODUCT_ID")
   3 - filter("CATEGORY_ID"=11)

So, it seems that with the equality condition, ALL and ANY offer no benefits over the traditional IN-clause.

Inequality

Using the “inequality test”, which can be expressed as !=, ^= or <>, we are testing whether a value does not match one or more values in the subquery or expression list.

First, using the ALL comparison :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id != ALL (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
        27852

This first query returns the quantity of all products not in category 11. The same could be achieved with the more familiar NOT IN clause :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id NOT IN (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
        27852

Next, using the ANY comparison :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id != ANY (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
        30087

This query returns the quantity of products which differ from at least one of the products in category 11. However, this also happens to be the total quantity of all products. Since category 11 contains more than one product, any given product ID is always unequal to at least one of them, so every row satisfies the condition :

SELECT SUM(quantity)
FROM   order_items;

SUM(QUANTITY)
-------------
        30087

So arguably, combining != with ANY is not particularly useful.

Returning to != ALL, how does the execution plan compare with NOT IN? For the sample data in OE, both queries return the same execution plan :

Plan hash value: 712943640
 
---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    15 |     8   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |                     |     1 |    15 |            |          |
|*  2 |   HASH JOIN RIGHT ANTI|                     |   665 |  9975 |     8   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |    16 |   112 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | ORDER_ITEMS         |   665 |  5320 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("PRODUCT_ID"="PRODUCT_ID")
   3 - filter("CATEGORY_ID"=11)

So, it seems that with the inequality condition, ALL and ANY offer no benefits over the traditional NOT IN-clause.

Greater-than, less-than

Finally, consider the greater-than, less-than, greater-than-or-equals, and less-than-or-equals operators. Using greater-than as an example, we are testing whether a value exceeds one or more values in the subquery or expression list.

First, using the ALL comparison in the HR schema :

SELECT COUNT(*)
FROM   employees
WHERE  salary > ALL (
  SELECT salary
  FROM   employees
  WHERE  department_id = 50
);

  COUNT(*)
----------
        31

This first query returns employees with a salary greater than all the employees in department 50. However, the same can be achieved using the MAX aggregate function to obtain the highest salary in department 50 :

SELECT COUNT(*)
FROM   employees
WHERE  salary > (
  SELECT MAX(salary)
  FROM   employees
  WHERE  department_id = 50
);

  COUNT(*)
----------
        31

Let’s see how the execution plans compare for these two approaches.

First, using the ALL comparison :

Plan hash value: 1561682600
 
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |     1 |    11 |     8  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE      |           |     1 |    11 |            |          |
|   2 |   MERGE JOIN ANTI NA |           |     1 |    11 |     8  (25)| 00:00:01 |
|   3 |    SORT JOIN         |           |   107 |   428 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMPLOYEES |   107 |   428 |     3   (0)| 00:00:01 |
|*  5 |    SORT UNIQUE       |           |    45 |   315 |     4  (25)| 00:00:01 |
|*  6 |     TABLE ACCESS FULL| EMPLOYEES |    45 |   315 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access("SALARY"<="SALARY")
       filter("SALARY"<="SALARY")
   6 - filter("DEPARTMENT_ID"=50)

Next, using the MAX function :

Plan hash value: 1008057839
 
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |     1 |     4 |     6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE      |           |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | EMPLOYEES |     5 |    20 |     3   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |           |     1 |     7 |            |          |
|*  4 |     TABLE ACCESS FULL| EMPLOYEES |    45 |   315 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("SALARY"> (SELECT MAX("SALARY") FROM "EMPLOYEES" 
              "EMPLOYEES" WHERE "DEPARTMENT_ID"=50))
   4 - filter("DEPARTMENT_ID"=50)

The query using MAX appears to be more efficient. It computes the highest value from department 50, and then compares this with the employees table. In contrast, the query using ALL has to perform a comparison with every employee in department 50. This additional overhead could be significant for large datasets.

With the ANY comparison, we are looking for employees whose salary is greater than ANY employees in department 50.

SELECT COUNT(*)
FROM   employees
WHERE  salary > ANY (
  SELECT salary
  FROM   employees
  WHERE  department_id = 50
);

  COUNT(*)
----------
       106

This can similarly be optimised by using the MIN aggregate function.

SELECT COUNT(*)
FROM   employees
WHERE  salary > (
  SELECT MIN(salary)
  FROM   employees
  WHERE  department_id = 50
);

  COUNT(*)
----------
       106

A similar improvement can be seen in the execution plans.

Using the ANY comparison :

Plan hash value: 67141836
 
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |     1 |    11 |     8  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE      |           |     1 |    11 |            |          |
|   2 |   MERGE JOIN SEMI    |           |   105 |  1155 |     8  (25)| 00:00:01 |
|   3 |    SORT JOIN         |           |   107 |   428 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMPLOYEES |   107 |   428 |     3   (0)| 00:00:01 |
|*  5 |    SORT UNIQUE       |           |    45 |   315 |     4  (25)| 00:00:01 |
|*  6 |     TABLE ACCESS FULL| EMPLOYEES |    45 |   315 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access(INTERNAL_FUNCTION("SALARY")>INTERNAL_FUNCTION("SALARY"))
       filter(INTERNAL_FUNCTION("SALARY")>INTERNAL_FUNCTION("SALARY"))
   6 - filter("DEPARTMENT_ID"=50)

Using the MIN aggregate function :

Plan hash value: 1008057839
 
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |     1 |     4 |     6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE      |           |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | EMPLOYEES |     5 |    20 |     3   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |           |     1 |     7 |            |          |
|*  4 |     TABLE ACCESS FULL| EMPLOYEES |    45 |   315 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("SALARY"> (SELECT MIN("SALARY") FROM "EMPLOYEES" 
              "EMPLOYEES" WHERE "DEPARTMENT_ID"=50))
   4 - filter("DEPARTMENT_ID"=50)

Again, the use of an aggregate function appears to be more efficient by reducing the number of comparisons which have to be done.

Expression lists

The previous examples have focussed on comparisons with subqueries. The comparison conditions can also be used with expression lists, e.g.

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id = ANY (1820,1910,2255);

SUM(QUANTITY)
-------------
          103

However, this can also be replaced by the more familiar IN clause :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id IN (1820,1910,2255);

SUM(QUANTITY)
-------------
          103

The same applies to “!= ALL”, which can be replaced with NOT IN.
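
For completeness, here is the != ALL form with an expression list. Going by the figures above, the result should be the total quantity less the in-list quantity (30087 - 103 = 29984), though I haven't captured the output here:

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id != ALL (1820,1910,2255);

-- equivalent to:
-- WHERE product_id NOT IN (1820,1910,2255)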

Again, there seems little benefit in using ALL or ANY compared with IN and NOT IN.

Conclusion

The group comparison conditions ALL or ANY can be used when comparing rows with the results of a subquery or expression list. However, the same can be achieved using more familiar constructs such as IN, NOT IN, MIN and MAX, often more efficiently.

The following table summarises the alternatives to ALL or ANY :

Condition     ALL                ANY
------------  -----------------  -------------
Equality      Illogical          Use IN clause
Inequality    Use NOT IN clause  Illogical
Greater/Less  Use MIN/MAX        Use MIN/MAX

If you know of any examples where ALL or ANY are beneficial, please do share in the comments below!

EZCONNECT without a password

After encountering this little annoyance for the nth time, I thought I should write it up once and for all, so it is committed to my electronic memory.

The EZCONNECT naming method is a handy way of connecting to an Oracle database by its service name, which avoids the need for aliases in the TNSNAMES file, or lengthy SQL*Net connection strings :

SQL> connect user/password@hostname:port/service_name
Connected.

However, one would expect that if you don’t want to specify the password on the command line, you could simply omit it and await the prompt. But…

SQL> connect user@hostname:port/service_name
ERROR:
ORA-12504: TNS:listener was not given the SERVICE_NAME in CONNECT_DATA

Without the password present, the command interpreter parses the string differently, and (according to Oracle Metalink note 267427.1) treats the service_name as the password and the hostname as the service_name. Basically it’s not handling the syntax very well.

The solution is to quote the part of the connection string after the @ symbol. The actual quoting required may vary by environment, but something along the following lines works in Windows.

SQL> connect user@'hostname:port/service_name'
Enter password:
Connected.
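
For completeness, the same trick should also work when launching SQL*Plus directly from an operating system shell, although the shell adds its own layer of quoting on top. A sketch for a bash-like shell (untested here, and quoting rules vary between shells):

$ sqlplus user@'"hostname:port/service_name"'
Enter password: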

enq: TM – contention due to parallel DML and foreign keys

This is a write-up of an issue I recently posted to the OTN discussion forum (https://community.oracle.com/message/12798407). I thought the associated test case was useful in demonstrating the issue, so it is captured here for future reference. There were some useful replies to the OTN post, confirming my suspicions.

The test was performed using Oracle Enterprise Edition 11.2.0.3.0 on Linux.

Problem statement

A parallel delete blocks an insert into a dependent table, and shows up as waits on ‘enq: TM – contention’. The scenario is as follows.

  • Session S1 is performing a parallel delete from table T1.
  • Session S2 is performing a single-row insert into table T2.
  • Table T2 has a foreign key referencing table T1. The foreign key column is indexed.
  • Neither table is partitioned.
  • The row being inserted into T2 does not depend on any of the rows being deleted from T1.
  • Session S2 is blocked by session S1 until the parallel delete is committed, with an ‘enq: TM – contention’ wait event.

Analysis

Due to the presence of a foreign key, it is necessary for S2 to ensure there is a valid parent record in T1 before permitting the insert into T2.

  • In the case of a serial delete from S1, the process locks the affected rows in T1. Consequently, providing S2 is inserting records which are not affected by the delete, it is not blocked.
  • In the case of a parallel delete from S1, the process locks the entire table with an exclusive (mode 6) lock. Consequently, S2 cannot insert any records into T2 and is blocked until S1 commits the transaction.

This is demonstrated in the following test case.

-- Set up tables and data

CREATE TABLE t1
( id NUMBER
, data VARCHAR2(4000)
, CONSTRAINT pk_t1 PRIMARY KEY (id)
);

CREATE TABLE t2
( id NUMBER
, t1_id NUMBER
, CONSTRAINT pk_t2 PRIMARY KEY (id)
, CONSTRAINT fk_t2_t1 FOREIGN KEY (t1_id) REFERENCES t1
);

CREATE INDEX ix_t2_fk ON t2 (t1_id);

-- Insert 1,000 rows into T1
INSERT INTO t1
SELECT level
     , RPAD('X',4000,'X')
FROM   dual
CONNECT BY level <= 1000;

-- Insert 10 rows into T2
INSERT INTO t2
SELECT level
     , level
FROM   dual
CONNECT BY level <= 10;


EXEC dbms_stats.gather_table_stats(user,'T1')
EXEC dbms_stats.gather_table_stats(user,'T2')

COMMIT;

Firstly, insert into a related table whilst performing a serial delete.

-- Session 1 - serial delete
ALTER SESSION DISABLE PARALLEL DML;

DELETE FROM t1
WHERE id BETWEEN 901 AND 1000;
-- No commit

-- Session 2
INSERT INTO t2 VALUES (11,11);
-- No wait event

-- Check locks
SELECT session_id
     , lock_type
     , mode_held
     , mode_requested
     , lock_id1
     , lock_id2
     , blocking_others
     , object_type ||' ' || object_name AS locked_object
FROM   dba_locks
LEFT JOIN user_objects
ON     lock_id1 = object_id
WHERE  lock_type IN ('DML','Transaction')
AND    session_id IN (&1,&2.)
ORDER BY 1,2,5;

/*
1 DML Row-X (SX) None 75618 0 Not Blocking TABLE T1
1 DML Row-X (SX) None 75620 0 Not Blocking TABLE T2
1 Transaction Exclusive None 262158 669 Not Blocking
2 DML Row-X (SX) None 75618 0 Not Blocking TABLE T1
2 DML Row-X (SX) None 75620 0 Not Blocking TABLE T2
2 Transaction Exclusive None 327680 830 Not Blocking
*/

-- Session 2
COMMIT;

Session 2 is able to insert a record into the dependent table T2 without encountering contention from session 1. The serial delete in session 1 takes a row-exclusive lock (Row-X, also written SX) on T1, and since Row-X locks are compatible with each other, session 2 can also take the Row-X lock it needs on T1 to validate the foreign key constraint.

Now see what happens when parallel DML is used to delete records in Session 1.

-- Session 1 - parallel delete
COMMIT;
ALTER SESSION ENABLE PARALLEL DML;

DELETE /*+ PARALLEL */ FROM t1
WHERE id BETWEEN 801 AND 900;
-- No commit

-- Session 2
INSERT INTO t2 VALUES (12,12);
-- Lock-wait
-- Check locks again
/*
1 DML Exclusive None 75618 0 Blocking TABLE T1 <-- this is the TM-X (mode 6) blocking lock
1 DML Row-X (SX) None 75620 0 Not Blocking TABLE T2
1 Transaction Exclusive None 458759 650 Not Blocking
2 DML None Row-X (SX) 75618 0 Not Blocking TABLE T1 <-- this is the blocked session
*/

-- Check wait events
SELECT event
FROM   v$session
WHERE  sid = &2.;
-- event = enq: TM - contention

Session 2 cannot insert into table T2 because it is blocked by session 1 deleting from table T1. The parallel delete has taken an exclusive (X, mode 6) lock on the table, so no other session can obtain any lock on it, not even the Row-X (SX) lock required to validate the foreign key. The insert from session 2 is blocked until session 1 commits.

-- Session 1
COMMIT;

-- Session 2
-- Insert completes successfully

This test case has shown that parallel DML can create contention, not only on the table being written to, but also on other tables related to it by foreign keys. This should be borne in mind when deciding whether to use parallel DML to speed up data maintenance processes.
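
As a footnote, if you suspect this is happening on a live system, one quick check is to look for sessions waiting on the TM enqueue along with their blockers. A rough sketch (for TM waits, the P2 parameter holds the object ID of the locked table):

SELECT sid
     , blocking_session
     , event
     , p2 AS object_id  -- object ID of the table being waited on
FROM   v$session
WHERE  event = 'enq: TM - contention';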

ORA-01555 Snapshot Too Old and Delayed Block Cleanout

The ORA-01555 error (snapshot too old) normally occurs when you try to query a table whilst it’s being changed. If the necessary undo to provide a read-consistent view of data is no longer available, then an ORA-01555 error will occur. This is typically due to bad design practices such as fetching across commits. However it is possible to encounter the same error when the table is not being changed. This is due to a database feature called Delayed Block Cleanout. The purpose of this post is to describe this scenario in more detail, so that it can be recognised should it occur.

This post provides an overview of the delayed block cleanout mechanism and the read-consistency mechanism, and how together these can result in an ORA-01555 error. It is not intended to be a comprehensive description of the two mechanisms, which are far more complex than I could hope to cover in a simple blog post on a Friday afternoon. However, this post should hopefully provide enough information to establish that it might be happening, and enough context to explain it to an irate customer or project manager.

The scenario

The usual explanation for ORA-01555 is that data is being changed whilst it’s being read, and there is insufficient undo to satisfy read-consistent requests. However, ORA-01555 can also occur when querying a table whilst no other processes are touching it. This seems counter-intuitive – why would undo be required when the table is static and unchanging? Enter Delayed Block Cleanout…

Delayed block cleanout

Data is stored in blocks of a fixed size, typically 8KB. Each block consists of a header and some data. The header contains a list of the transactions which changed records in the block (inserts, updates, or deletes). This is called the Interested Transaction List (ITL) and can be seen by dumping out the contents of a database block.
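
For the curious, the file and block containing a given row can be found with DBMS_ROWID, and that block can then be dumped to a trace file, where the ITL entries are visible near the top of the dump. A rough sketch (my_table is a stand-in for whichever table you are looking at):

-- Locate the file and block for a row of interest
SELECT DBMS_ROWID.ROWID_RELATIVE_FNO(rowid) AS file_no
     , DBMS_ROWID.ROWID_BLOCK_NUMBER(rowid) AS block_no
FROM   my_table
WHERE  ROWNUM = 1;

-- Dump the block to the session trace file; the ITL entries
-- (Itl / Xid / Uba / Flag / Lck / Scn columns) appear near the top
ALTER SYSTEM DUMP DATAFILE &file_no BLOCK &block_no;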

Ordinarily, when writing small amounts of data, the database is able to keep the block headers up-to-date, so that subsequent processes know exactly when a particular change was committed.

However, when loading a large volume of data (typically more than 10% of the SGA size), the database will continue to write blocks to disk whilst the operation is ongoing. At the time the data is written to disk, the ITLs in those blocks show an active transaction (because the data had not been committed at the time it was written). Upon committing, rather than going back over all of the written blocks and updating the ITLs (which would make the commit VERY slow), the database leaves it to the next process which comes along to “clean out” the block header. This is delayed block cleanout.

The cleanout process is triggered as soon as a block is read. If a table was loaded with delayed block cleanout, any process reading all of the table at a later time will cause the blocks to be cleaned out. This includes optimiser stats gathering (providing the entire table is read), or a query involving a full table scan. However, if the table is not touched, then in theory, the data could remain un-cleaned for some time.

The cleanout process involves checking active transactions in the ITL. If an active transaction is found in the ITL, it has to be double-checked to see if it really is active. This is done by checking for active and recent transactions in the undo tablespace. If an active transaction is found, the ITL is left alone. If a recently committed transaction is encountered, then the ITL is updated to reflect this, recording the SCN when the transaction was committed. If the transaction cannot be found in the database, then it can be assumed that it was committed a long time ago. Since there is no longer a record of the SCN for the transaction, the best the database can do is estimate an “upper bound”, based on the oldest SCN in the database. So if the oldest transaction in the database was at 1.15pm, we can be sure that any older transactions must have been committed before 1.15pm. The upper-bound SCN corresponding to 1.15pm would be recorded in the ITL.

Read consistency

The database guarantees read consistency for every query. This means that a query which began at 2.30pm will only use records which were current at 2.30pm. If the query takes two hours to finish, and another process changes a record at 4.00pm, the change will not be visible to the long-running query. It is only interested in the way things were at 2.30pm. So each time the query reads a block of data from disk, it checks the ITL to determine whether the data was valid at 2.30pm. By checking each transaction in the ITL, it can determine whether the data in the block is involved in an active transaction (uncommitted), recently committed, or committed a long time ago.

Possible outcomes

For a long running query starting at 2.30pm, each time a block of data is read, one of the following outcomes is possible for each transaction found in the ITL.

  1. If the transaction was committed before 2.30pm, then it can safely assume all the data in the block was current as of 2.30pm, and it can be used by the query.
  2. If the transaction was committed after 2.30pm then the data cannot be used as-is, and the database must retrieve undo data to find out what the data really looked like at 2.30pm.
    1. The records were created after 2.30pm, so are not used in the query
    2. The records were modified since 2.30pm, so are rolled-back to the way they were at 2.30pm
    3. The records were modified since 2.30pm, but the undo records are no longer available. In this case, the query cannot continue and the “classic” ORA-01555 error is raised. This arises from a combination of too many changes being made on the table being queried whilst the query is running, and not enough undo space to keep all the changes to permit rollback for consistent read.
  3. If the transaction is still showing as Active (i.e. uncommitted), then it must determine the actual state of that transaction in the undo tablespace. One of four things may happen depending on what it finds.
    1. It finds the transaction is not yet committed, so the data cannot be used as-is. As per (2) above, it must use undo to rollback to 2.30pm.
    2. It finds the transaction committed after 2.30pm, so the data cannot be used as-is. As per (2) above, it must use undo to rollback to 2.30pm.
    3. It finds the transaction committed before 2.30pm, so the data can be used as is.
    4. It can’t find the transaction at all, meaning it must have been overwritten by more recent transactions. In order to decide if it can be used as-is, it finds the oldest transaction still on record (the “upper bound” described previously), and deduces that it must have committed before this. It now uses this point in time to determine whether the records can be used for the 2.30pm query.
      1. If the oldest transaction committed before 2.30pm, then it can be sure that the data in this block was definitely committed before 2.30pm, so it can be used.
      2. If the oldest transaction committed after 2.30pm, then it has no way of knowing whether the data in this block was committed before or after 2.30pm, so an ORA-01555 error is raised. This arises because the query started before the oldest known transaction, so it can’t determine whether a particular row is valid or not.

So that’s the cause of the ORA-01555. After encountering a block of data which had not been cleaned out, it knows the data was committed, but doesn’t know whether it was committed before or after the query started, so it can’t decide whether to use the data or not, and throws the error.

Criteria and solutions

Based on the above analysis, the criteria for this to occur, and the solutions to prevent it from occurring, should make more sense.

For an ORA-01555 to occur from delayed block cleanout, the following criteria must be met.

  • A large volume of data was loaded and committed, and has not been touched since (no queries, no stats gathering).
  • A long running query begins on the data, which begins reading the uncleaned blocks.
  • Meanwhile, many other smaller transactions occur on other tables in the database.
  • Due to the volume of transactions, the oldest transaction in the undo tablespace occurred after the start of the long-running query.
  • The query reaches a block requiring cleanout, and is unable to determine if it can be used or not, so fails with ORA-01555.

The following steps can be taken to reduce the risk of an ORA-01555 error occurring.

  • Rewrite the long-running query to run more quickly
  • Touch all the data before starting the query (stats gathering or force a full table scan; see the sketch below)
  • Reduce the volume of other transactions on the database
  • Increase the amount of undo space
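
The second of these is cheap to script. A minimal sketch, assuming a freshly loaded table called big_table (hypothetical name):

-- Option 1: gather stats, which cleans out the blocks
-- (providing the whole table is read, not a small sample)
EXEC dbms_stats.gather_table_stats(user,'BIG_TABLE')

-- Option 2: force a full table scan to touch every block
SELECT /*+ FULL(t) */ COUNT(*)
FROM   big_table t;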

Conclusions

Hopefully this provides a high-level explanation of why a seemingly innocuous query on an otherwise untouched table can result in an ORA-01555. There is a lot more complexity to the delayed block cleanout and read consistency mechanisms, and I would refer to the following two great books for a comprehensive explanation.

Oracle Core: Essential Internals for DBAs and Developers by Jonathan Lewis (Apress 2011)

Expert Oracle Database Architecture: Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions, Second Edition by Thomas Kyte (Apress 2010)

The cost of using an index

When trying to understand why the optimiser might choose not to use an index to read rows from a table, I find the following diagram helpful.

[Figure: index cost – cost of a full table scan vs. index access, plotted against query selectivity]

The axis along the bottom represents the selectivity of the query (0% being no rows selected, 100% being all rows selected), and the axis up the side represents the cost of the chosen operation (more costly = more work to do).

When reading from a table without using an index, the cost of retrieving one row from the table is the same as the cost of retrieving all rows from the table (the red line on the chart). The database has to read the entire table in both cases.

When using an index to read from a table, the cost is proportional to the number of rows retrieved (the green line on the chart).

The point at which they cross over is the “break-even” point; if the selectivity of the query is above this point, expect the optimiser to ignore the index and scan the entire table.
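
This break-even point is easy to observe for yourself. Below is a sketch (all object names hypothetical) that builds a table with a physically clustered column and compares plans at two selectivities. With data shaped like this I’d expect an index range scan for the selective predicate and a full table scan for the unselective one, though the optimiser’s choice will ultimately depend on the statistics it gathers:

-- 100,000 rows; region_id takes 100 values, physically clustered
CREATE TABLE sales_demo AS
SELECT level AS id
     , TRUNC((level-1)/1000) AS region_id
     , RPAD('X',200,'X') AS padding
FROM   dual
CONNECT BY level <= 100000;

CREATE INDEX ix_sales_region ON sales_demo (region_id);

EXEC dbms_stats.gather_table_stats(user,'SALES_DEMO')

-- ~1% of rows, physically clustered: an index range scan is likely
EXPLAIN PLAN FOR
SELECT * FROM sales_demo WHERE region_id = 1;

SELECT * FROM TABLE(dbms_xplan.display);

-- ~99% of rows: expect the optimiser to ignore the index and scan the table
EXPLAIN PLAN FOR
SELECT * FROM sales_demo WHERE region_id > 0;

SELECT * FROM TABLE(dbms_xplan.display);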

There are a number of factors which determine the placement of these two lines, such as the size of the table, the size of the index, the way in which data is organised within the index, and so on. These are beyond the scope of this (brief) post.