I recently encountered one long-running session and several blocked sessions, all performing the same delete operation.
DELETE FROM SYS.WRI$_OPTSTAT_SYNOPSIS$ WHERE BO# = :B2 AND INTCOL# = :B1;
One session had been running for several hours, and the other sessions were all blocked by a row lock from the first session. Unfortunately, each of these sessions was trying to gather stats on new partitions of tables as part of various ETL processes, so being blocked for several hours was not ideal.
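The blocking chain itself is easy to confirm from the dynamic performance views. Something along these lines (a sketch, not the exact query I ran) lists the blockers and their victims:

```sql
-- Sessions that are blocked, plus the sessions doing the blocking
SELECT sid, serial#, sql_id, event, blocking_session, seconds_in_wait
FROM   v$session
WHERE  blocking_session IS NOT NULL
   OR  sid IN (SELECT blocking_session
               FROM   v$session
               WHERE  blocking_session IS NOT NULL);
```

In this case every blocked session was waiting on `enq: TX - row lock contention`, all pointing back at the one long-running session.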
This SYS table holds synopses of column statistics from partitioned tables, which Oracle uses to maintain incremental statistics on partitioned tables. For more details on the incremental statistics feature, see the Oracle documentation.
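For context, incremental statistics are switched on per table through a DBMS_STATS preference; the owner and table name below are just placeholders:

```sql
-- Enable incremental statistics for one partitioned table
EXEC DBMS_STATS.SET_TABLE_PREFS('APP_OWNER', 'SALES_FACT', 'INCREMENTAL', 'TRUE');

-- Verify the current setting
SELECT DBMS_STATS.GET_PREFS('INCREMENTAL', 'APP_OWNER', 'SALES_FACT') AS incremental
FROM   dual;
```

Once this is set, gathering global stats on the table maintains synopses in WRI$_OPTSTAT_SYNOPSIS$ instead of rescanning every partition.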
The table is range-hash partitioned and consists of the following columns:
- BO# NUMBER (partition key)
- GROUP# NUMBER (subpartition key)
- INTCOL# NUMBER
- HASHVALUE NUMBER
In my case, the table contains around 11 billion rows across 1,700 partitions, and up to 32 subpartitions per partition. The total size of the table on disk is around 500GB, and the table has no indexes. The database itself contains around 1,800 partitioned tables, and over 1 million partitions and subpartitions.
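Figures like these can be pulled straight from the data dictionary; a sketch (the segment size will include any free space below the high-water mark):

```sql
-- Size of the synopsis table on disk
SELECT ROUND(SUM(bytes)/1024/1024/1024) AS size_gb
FROM   dba_segments
WHERE  owner = 'SYS'
AND    segment_name = 'WRI$_OPTSTAT_SYNOPSIS$';

-- How much partitioning the database carries overall
SELECT (SELECT COUNT(*) FROM dba_part_tables)       AS partitioned_tables,
       (SELECT COUNT(*) FROM dba_tab_partitions)    AS partitions,
       (SELECT COUNT(*) FROM dba_tab_subpartitions) AS subpartitions
FROM   dual;
```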
The real-time SQL monitoring report from each session showed over 2,000 executions of the DELETE, suggesting it was being run from a PL/SQL FORALL statement. With each execution having to scan a whole partition, we were seeing over 1.5 billion buffer gets over a 4-hour period, which works out to around 100k buffer gets per second.
The report also showed that all of the sessions had the same values for the bind variables. This explains why all but one were blocked by a row lock held by the one “active” session.
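The report itself comes from DBMS_SQLTUNE.REPORT_SQL_MONITOR, and the captured binds can also be read straight from V$SQL_MONITOR in recent releases. The SQL_ID below is a placeholder:

```sql
-- Text version of the real-time SQL monitoring report
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(sql_id => '0abc123def456', type => 'TEXT')
FROM   dual;

-- Captured bind values per monitored execution (XML document)
SELECT sid, status, binds_xml
FROM   v$sql_monitor
WHERE  sql_id = '0abc123def456';
```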
On a hunch, I took the bind values and matched BO# to the OBJECT_ID in DBA_OBJECTS and INTCOL# to the INTERNAL_COLUMN_ID in DBA_TAB_COLS. Funnily enough, the object ID matched an unrelated application table which had had a column dropped earlier in the day, and the column ID was one more than the number of columns remaining in the table. So it seems that every single stats-gathering operation was trying to clean out the same “orphaned” synopses, and would not finish gathering stats until this clean-up was done.
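Resolving the binds against the dictionary is a two-step lookup; `:b2` and `:b1` stand in for the captured values of B2 and B1:

```sql
-- Which object does BO# refer to?
SELECT owner, object_name, object_type
FROM   dba_objects
WHERE  object_id = :b2;

-- Which column does INTCOL# refer to on that object?
SELECT c.column_name
FROM   dba_objects  o
       JOIN dba_tab_cols c
         ON c.owner = o.owner
        AND c.table_name = o.object_name
WHERE  o.object_id = :b2
AND    c.internal_column_id = :b1;
```

Here the first query found the table, but the second returned no rows, since the dropped column no longer had a dictionary entry.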
A little peek inside the DBMS_STATS_INTERNAL package confirmed my suspicions – when gathering table stats, it fetches a list of all synopses where the table and column IDs are not in the data dictionary, and then runs a FORALL … DELETE.
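I won't reproduce the package source here, but the shape of the logic is roughly this. The orphan-detection query is my own illustration built from the BO# = OBJECT_ID and INTCOL# = INTERNAL_COLUMN_ID mapping above, not the actual code:

```sql
DECLARE
  TYPE num_tab IS TABLE OF NUMBER;
  l_bo     num_tab;
  l_intcol num_tab;
BEGIN
  -- find (table, column) pairs that have synopses but no dictionary entry
  SELECT DISTINCT s.bo#, s.intcol#
  BULK COLLECT INTO l_bo, l_intcol
  FROM   sys.wri$_optstat_synopsis$ s
  WHERE  NOT EXISTS (
           SELECT NULL
           FROM   dba_objects o
                  JOIN dba_tab_cols c
                    ON c.owner = o.owner
                   AND c.table_name = o.object_name
           WHERE  o.object_id = s.bo#
           AND    c.internal_column_id = s.intcol#);

  -- one DELETE execution per orphaned pair -- hence the execution
  -- count of over 2,000 in the SQL monitoring report
  FORALL i IN 1 .. l_bo.COUNT
    DELETE FROM sys.wri$_optstat_synopsis$
    WHERE  bo# = l_bo(i)
    AND    intcol# = l_intcol(i);
END;
/
```

Since WRI$_OPTSTAT_SYNOPSIS$ has no indexes, every one of those DELETE executions is a full scan of the relevant partition.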
Fortunately, Oracle Support was able to quickly supply a patch (19450139) which fixed the issue by bypassing the DELETE operation in DBMS_STATS_INTERNAL. I wonder now whether any other process is going to clean up the orphaned synopses, or whether they will simply remain in the table forever.
In conclusion, then, I learnt a few things from this:
- The WRI$_OPTSTAT_SYNOPSIS$ table contains table/partition/column-level synopsis data. There is one partition per table, with the table’s partitions distributed randomly across 32 subpartitions, and each column identified by an unindexed numeric column. The volume of data in this table is therefore probably a function of the number of tables, partitions and columns.
- Without the patch in place, dropping columns on one partitioned table can impact the ability to gather stats on unrelated partitioned tables. Every time GATHER_TABLE_STATS was run on a partitioned table, it tried to delete the same rows, creating significant row-lock contention for other sessions.
- The execution counter in the top row of the real-time SQL monitoring report shows the number of times the operation has been called by the FORALL statement.
- Having to read through billions of rows is going to take time, even with fast I/O and a 50GB buffer cache.
- DBMS_STATS_INTERNAL is surprisingly readable, and it is interesting to see how an Oracle patch can consist of just changing a few lines in a package.