ALL or ANY in SQL

The other day I was studying a SQL statement that was performing poorly, when I noticed a peculiar syntax that I have not come across before :

SELECT COUNT(*)
FROM   customer
WHERE  annual_spend > ALL (
  SELECT spend_threshold
  FROM   promotion
  WHERE  promo_name = 'Summer 2015'
);

The only time I’d ever come across the keyword ALL in SQL was in UNION ALL. However, a quick check of the manual led me to the Group Comparison Conditions :

http://docs.oracle.com/database/121/SQLRF/conditions002.htm#sthref1005

There are three possible comparisons – ALL, ANY and SOME (although ANY and SOME are identical). They allow you to compare an expression with any or all members of a list or subquery, and are defined as follows :

  • ANY, SOME – Compares a value to each value in a list or returned by a query. Must be preceded by =, !=, >, <, =. Can be followed by any expression or subquery that returns one or more values.
  • ALL – Compares a value to every value in a list or returned by a query. Must be preceded by =, !=, >, <, =. Can be followed by any expression or subquery that returns one or more values.

So how might these be useful? I’ll look at three possible use cases, and for each, consider the alternatives and look at potential performance implications. For the examples given, I’m using the OE and HR sample schema supplied with Oracle 12.1.0.2.

Equality

Using the “equality test” (=) we are testing whether a value matches one or more values in the subquery or expression list.

First, using the ALL comparison in the OE schema :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id = ALL (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
(null)

This first query returns no results. Given that the subquery returns several different product IDs, then it is logically impossible for a single value to match all the results.

Next, using the ANY comparison :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id = ANY (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
         2235

This query returns the quantity sold of all products in category 11. However, the same result can be obtained using the more familiar IN-clause.

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id IN (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
         2235

Let’s look at the execution plans of the ANY and the IN queries. For the sample data in OE, both queries return the same execution plan :

Plan hash value: 2275080078
 
-------------------------------------------------------------------------------------------
| Id  | Operation           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                     |     1 |    15 |     8   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE     |                     |     1 |    15 |            |          |
|*  2 |   HASH JOIN         |                     |    58 |   870 |     8   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| PRODUCT_INFORMATION |    16 |   112 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| ORDER_ITEMS         |   665 |  5320 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("PRODUCT_ID"="PRODUCT_ID")
   3 - filter("CATEGORY_ID"=11)

So, it seems that with the equality condition, ALL and ANY offer no benefits over the traditional IN-clause.

Inequality

Using the “inequality test”, which can be expressed as !=, ^= or , we are testing whether a value does not match one or more values in the subquery or expression list.

First, using the ALL comparison :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id != ALL (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
        27852

This first query returns the quantity of all products not in category 11. The same could be achieved with the more familiar NOT IN clause :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id NOT IN (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
        27852

Next, using the ANY comparison :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id != ANY (
  SELECT product_id
  FROM   product_information
  WHERE  category_id = 11
);

SUM(QUANTITY)
-------------
        30087

This query returns the quantity of products which don’t match one of the products in category 11. However, this also happens to be the total quantity of all products, since products which don’t match one product in category 11 match another product :

SELECT SUM(quantity)
FROM   order_items;

SUM(QUANTITY)
-------------
        30087

So arguably, combining != with ANY is not particularly useful.

Returning to != ALL, how does the execution plan compare with NOT IN? For the sample data in OE, both queries return the same execution plan :

Plan hash value: 712943640
 
---------------------------------------------------------------------------------------------
| Id  | Operation             | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                     |     1 |    15 |     8   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE       |                     |     1 |    15 |            |          |
|*  2 |   HASH JOIN RIGHT ANTI|                     |   665 |  9975 |     8   (0)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL  | PRODUCT_INFORMATION |    16 |   112 |     5   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | ORDER_ITEMS         |   665 |  5320 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("PRODUCT_ID"="PRODUCT_ID")
   3 - filter("CATEGORY_ID"=11)

So, it seems that with the inequality condition, ALL and ANY offer no benefits over the traditional NOT IN-clause.

Greater-than, less-than

Finally, consider the greater-than, less-than, greater-than-or-equals, and less-than-or-equals operators. Using greater-than as an example, we are testing whether a value exceeds one or more values in the subquery or expression list.

First, using the ALL comparison in the HR schema :

SELECT COUNT(*)
FROM   employees
WHERE  salary > ALL (
  SELECT salary
  FROM   employees
  WHERE  department_id = 50
);

  COUNT(*)
----------
        31

This first query returns employees with a salary greater than all the employees in department 50. However, the same can be achieved using the MAX aggregate function to obtain the highest salary in department 50 :

SELECT COUNT(*)
FROM   employees
WHERE  salary > (
  SELECT MAX(salary)
  FROM   employees
  WHERE  department_id = 50
);

  COUNT(*)
----------
        31

Let’s see how the execution plans compare for these two approaches.

First, using the ALL comparison :

Plan hash value: 1561682600
 
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |     1 |    11 |     8  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE      |           |     1 |    11 |            |          |
|   2 |   MERGE JOIN ANTI NA |           |     1 |    11 |     8  (25)| 00:00:01 |
|   3 |    SORT JOIN         |           |   107 |   428 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMPLOYEES |   107 |   428 |     3   (0)| 00:00:01 |
|*  5 |    SORT UNIQUE       |           |    45 |   315 |     4  (25)| 00:00:01 |
|*  6 |     TABLE ACCESS FULL| EMPLOYEES |    45 |   315 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access("SALARY"<="SALARY")
       filter("SALARY"<="SALARY")
   6 - filter("DEPARTMENT_ID"=50)

Next, using the MAX function :

Plan hash value: 1008057839
 
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |     1 |     4 |     6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE      |           |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | EMPLOYEES |     5 |    20 |     3   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |           |     1 |     7 |            |          |
|*  4 |     TABLE ACCESS FULL| EMPLOYEES |    45 |   315 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("SALARY"> (SELECT MAX("SALARY") FROM "EMPLOYEES" 
              "EMPLOYEES" WHERE "DEPARTMENT_ID"=50))
   4 - filter("DEPARTMENT_ID"=50)

The query using MAX appears to be more efficient. It computes the highest value from department 50, and then compares this with the employees table. In contrast, the query using ALL has to perform a comparison with every employee in department 50. This additional overhead could be significant for large datasets.

With the ANY comparison, we are looking for employees whose salary is greater than ANY employees in department 50.

SELECT COUNT(*)
FROM   employees
WHERE  salary > ANY (
  SELECT salary
  FROM   employees
  WHERE  department_id = 50
);

  COUNT(*)
----------
       106

This can similarly be optimised by using the MIN aggregate function.

SELECT COUNT(*)
FROM   employees
WHERE  salary > (
  SELECT MIN(salary)
  FROM   employees
  WHERE  department_id = 50
);

  COUNT(*)
----------
       106

A similar improvement can be seen in the execution plans.

Using the ANY comparison :

Plan hash value: 67141836
 
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |     1 |    11 |     8  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE      |           |     1 |    11 |            |          |
|   2 |   MERGE JOIN SEMI    |           |   105 |  1155 |     8  (25)| 00:00:01 |
|   3 |    SORT JOIN         |           |   107 |   428 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMPLOYEES |   107 |   428 |     3   (0)| 00:00:01 |
|*  5 |    SORT UNIQUE       |           |    45 |   315 |     4  (25)| 00:00:01 |
|*  6 |     TABLE ACCESS FULL| EMPLOYEES |    45 |   315 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - access(INTERNAL_FUNCTION("SALARY")>INTERNAL_FUNCTION("SALARY"))
       filter(INTERNAL_FUNCTION("SALARY")>INTERNAL_FUNCTION("SALARY"))
   6 - filter("DEPARTMENT_ID"=50)

Using the MIN aggregate function :

Plan hash value: 1008057839
 
----------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |     1 |     4 |     6   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE      |           |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | EMPLOYEES |     5 |    20 |     3   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |           |     1 |     7 |            |          |
|*  4 |     TABLE ACCESS FULL| EMPLOYEES |    45 |   315 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("SALARY"> (SELECT MIN("SALARY") FROM "EMPLOYEES" 
              "EMPLOYEES" WHERE "DEPARTMENT_ID"=50))
   4 - filter("DEPARTMENT_ID"=50)

Again, the use of an aggregate function appears to be more efficient by reducing the number of comparisons which have to be done.

Expression lists

The previous examples have focussed on comparisons with subqueries. The comparison conditions can also be used with expression lists, e.g.

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id = ANY (1820,1910,2255);

SUM(QUANTITY)
-------------
          103

However, this can also be replaced by the more familiar IN clause :

SELECT SUM(quantity)
FROM   order_items
WHERE  product_id IN (1820,1910,2255);

SUM(QUANTITY)
-------------
          103

The same applies to “!= ALL”, which can be replaced with NOT IN.

Again, there seems little benefit in using ALL or ANY compared with IN and NOT IN.

Conclusion

The group comparison conditions ALL or ANY can be used when comparing rows with the results of a subquery or expression list. However, the same can be achieved using more familiar constructs such as IN, NOT IN, MIN and MA, often more efficiently.

The following table summarises the alternatives to ALL or ANY :

Condition ALL ANY
Equality Illogical Use IN clause
Inequality Use NOT IN clause Illogical
Greater/Less Use MIN/MAX Use MIN/MAX

If you know of any examples where ALL or ANY are beneficial, please do share in the comments below!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s