Tuesday, January 6, 2015

Scalar Subquery un-nesting Transformation

A scalar subquery is a subquery which can only return a single value (single row, single column) back to the parent block. It can be used both in the WHERE clause of the parent query or right in the SELECT list of the parent query instead of some column name. (In this post I am discussing only the case of a scalar subquery in SELECT clause). Whatever the scalar subquery returns, will be put in the “fake” column in the query result-set. If it returns more than one row, you will get the ORA-01427: single-row subquery returns more than one row error, if it returns no rows for the given lookup, the result will be NULL.
 

rajesh@PDB1> create table t1
  2  as
  3  select *
  4  from all_users; 

Table created. 

rajesh@PDB1>
rajesh@PDB1> create table t2
  2  as
  3  select b.*
  4  from all_objects b ,
  5     all_users a
  6  where b.owner = a.username; 

Table created. 

rajesh@PDB1> set serveroutput off

 
So starting with Oracle 12c, the CBO transformation engine can unnest some types of scalar subqueries and convert those to Outer joins internally


rajesh@PDB1> select /*+ gather_plan_statistics no_unnest(@ssq) */ t1.user_id,
  2     ( select /*+ qb_name(ssq) */ max(created)
  3             from t2 where t1.username = t2.owner) max_created
  4  from t1
  5  where username like 'S%'; 

   USER_ID MAX_CREATED
---------- -----------------------
       105 28-NOV-2014 02:28:46 PM
        90
        87
        78 11-SEP-2014 09:58:01 AM
2147483619
2147483618
2147483617
         8 11-SEP-2014 09:13:44 AM
         0 06-JAN-2015 09:43:48 AM 

9 rows selected. 

rajesh@PDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last') );
 

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------
SQL_ID  3892tkr4xayk5, child number 0
-------------------------------------
select /*+ gather_plan_statistics no_unnest(@ssq) */ t1.user_id,  (
select /*+ qb_name(ssq) */ max(created)   from t2 where t1.username =
t2.owner) max_created from t1 where username like 'S%' 

Plan hash value: 367820
 

---------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      9 |       4 |      5 |
|   1 |  SORT AGGREGATE    |      |      9 |      1 |      9 |    7857 |    868 |
|*  2 |   TABLE ACCESS FULL| T2   |      9 |   2090 |  42155 |    7857 |    868 |
|*  3 |  TABLE ACCESS FULL | T1   |      1 |      3 |      9 |       4 |      5 |
--------------------------------------------------------------------------------- 

Predicate Information (identified by operation id):
--------------------------------------------------- 

   2 - filter("T2"."OWNER"=:B1)
   3 - filter("USERNAME" LIKE 'S%')


The above plan indicates that there is some scalar subquery is going on in the SQL statement, from the “Starts” column we can see that outer table T1 is scanned only once but the table T2 is scanned 9 times (once for each row returned by outer table T1), so the scalar subquery behaves like a FILTER loop. Where it executes the subquery once for each row passed to it by the parent query. 

So now let us remove this no_unnest(@ssq)hint and hope for 12c improvement to kick in


rajesh@PDB1> select /*+ gather_plan_statistics */ t1.user_id,
  2     ( select max(created)
  3             from t2 where t1.username = t2.owner) max_created
  4  from t1
  5  where username like 'S%';
 
   USER_ID MAX_CREATED
---------- -----------------------
         0 06-JAN-2015 09:43:48 AM
2147483618
         8 11-SEP-2014 09:13:44 AM
2147483619
        90
        87
       105 28-NOV-2014 02:28:46 PM
2147483617
        78 11-SEP-2014 09:58:01 AM 

9 rows selected. 

rajesh@PDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last') ); 

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------
SQL_ID  7p5b151cb0tg8, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ t1.user_id,  ( select
max(created)   from t2 where t1.username = t2.owner)
max_created from t1 where username like 'S%' 

Plan hash value: 577572187 

----------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      9 |     876 |       |       |          |
|   1 |  HASH GROUP BY      |      |      1 |      5 |      9 |     876 |  1034K|  1034K| 1232K (0)|
|*  2 |   HASH JOIN OUTER   |      |      1 |    471 |  42160 |     876 |  1214K|  1214K|  987K (0)|
|*  3 |    TABLE ACCESS FULL| T1   |      1 |      3 |      9 |       3 |       |       |          |
|*  4 |    TABLE ACCESS FULL| T2   |      1 |   4367 |  42155 |     873 |       |       |          |
---------------------------------------------------------------------------------------------------- 

Predicate Information (identified by operation id):
--------------------------------------------------- 

   2 - access("T1"."USERNAME"="T2"."OWNER")
   3 - filter("USERNAME" LIKE 'S%')
   4 - filter("T2"."OWNER" LIKE 'S%')
 
Both tables were scanned only once and the total number of buffer gets dropped from 7857 to 876. In the real life case, flattening and de-compartmentalizing a big query will open up more join orders and data access optimizations for the CBO, resulting in a better plans. 
Whenever the scalar subquery transformation is done, the following entry shows up in the optimizer 10053 trace.
 
Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "T1"."USER_ID" "USER_ID",
  MAX("T2"."CREATED") "MAX_CREATED"
FROM "RAJESH"."T2" "T2",
  "RAJESH"."T1" "T1"
WHERE "T1"."USERNAME" LIKE 'S%'
AND "T1"."USERNAME"="T2"."OWNER"(+)
AND "T2"."OWNER"(+) LIKE 'S%'
GROUP BY "T2"."OWNER",
  "T1".ROWID,
  "T1"."USER_ID" 
 
So over all the auto trace results for this transformation are below.  
 
rajesh@PDB1> set autotrace traceonly explain statistics
rajesh@PDB1> select /*+ no_unnest(@ssq) */ t1.user_id,
  2     ( select /*+ qb_name(ssq) */ max(created)
  3             from t2 where t1.username = t2.owner) max_created
  4  from t1
  5  where username like 'S%'; 
 
9 rows selected. 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 367820
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     3 |    42 |   331   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |    13 |            |          |
|*  2 |   TABLE ACCESS FULL| T2   |  2090 | 27170 |   243   (0)| 00:00:01 |
|*  3 |  TABLE ACCESS FULL | T1   |     3 |    42 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------- 
 
Predicate Information (identified by operation id):
--------------------------------------------------- 
   2 - filter("T2"."OWNER"=:B1)
   3 - filter("USERNAME" LIKE 'S%')
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       7861  consistent gets
          0  physical reads
          0  redo size
        770  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          9  rows processed
 
rajesh@PDB1> select t1.user_id,
  2     ( select  max(created)
  3             from t2 where t1.username = t2.owner) max_created
  4  from t1
  5  where username like 'S%'; 
 
9 rows selected. 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 577572187 
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     5 |   195 |   248   (1)| 00:00:01 |
|   1 |  HASH GROUP BY      |      |     5 |   195 |   248   (1)| 00:00:01 |
|*  2 |   HASH JOIN OUTER   |      |   471 | 18369 |   247   (1)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |     3 |    78 |     3   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T2   |  4367 | 56771 |   244   (1)| 00:00:01 |
---------------------------------------------------------------------------- 
 
Predicate Information (identified by operation id):
--------------------------------------------------- 
   2 - access("T1"."USERNAME"="T2"."OWNER"(+))
   3 - filter("USERNAME" LIKE 'S%')
   4 - filter("T2"."OWNER"(+) LIKE 'S%') 
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        876  consistent gets
          0  physical reads
          0  redo size
        771  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          9  rows processed 
rajesh@PDB1> set autotrace off
rajesh@PDB1>
 
 
Note that when I use COUNT(*) instead of MAX(created) in the above query, I don’t see Outer join + group by combo. With the following message in trace file. 
 
Subquery Unnest (SU)
********************
SU: bypassed: Scalar subquery has null-mutating select item.
SJC: Considering set-join conversion in query block SEL$1 (#0)
************************* 
 
This transformation in 12c database is controlled by parameter “_optimizer_unnest_scalar_sq

3 comments:

  1. Note to self.
    In case of count (*) we don’t see Scalar sub-queries un-nesting happen.

    rajesh@ORA12C> select t1.user_id,
    2 ( select count(*)
    3 from t2
    4 where t1.username = t2.owner) max_created
    5 from t1
    6 where username like 'S%'
    7 /

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 367820

    ---------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    ---------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 3 | 39 | 356 (0)| 00:00:01 |
    | 1 | SORT AGGREGATE | | 1 | 5 | | |
    |* 2 | TABLE ACCESS FULL| T2 | 1707 | 8535 | 246 (0)| 00:00:01 |
    |* 3 | TABLE ACCESS FULL | T1 | 3 | 39 | 3 (0)| 00:00:01 |
    ---------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - filter("T2"."OWNER"=:B1)
    3 - filter("USERNAME" LIKE 'S%')



    Optimizer cannot transform the given query

    select t1.user_id,
    ( select count(*)
    from t2
    where t1.username = t2.owner) max_created
    from t1
    where username like 'S%';

    Into like this

    SELECT "T1"."USER_ID" "USER_ID",
    count(*) "MAX_CREATED"
    FROM "RAJESH"."T2" "T2",
    "RAJESH"."T1" "T1"
    WHERE "T1"."USERNAME" LIKE 'S%'
    AND "T1"."USERNAME"="T2"."OWNER"(+)
    AND "T2"."OWNER"(+) LIKE 'S%'
    GROUP BY "T2"."OWNER",
    "T1".ROWID,
    "T1"."USER_ID"

    a) Do an outer join between T1 and T2
    b) Perform count(*) on top of the result set. Which is very different from just counting the rows in T2.

    We cannot take an optimization performed for an aggregate function MAX() and apply the same to COUNT(*) (since both max() and count(*) belong to aggregates, they are different in functionality )

    So in short the rule to perform scalar subquery unnesting for COUNT (*) queries should be like this.

    a) Check if T2 has any not null column
    1) if yes then kick-in scalar subquery unnesting (perform outer join between T1 and T2)
    2) replace count(*) in the query with count(that_not_nullable_column)

    b) if T2 dont have any not null column then
    1) dont perform scalar subquery unnesting, (since doing that would produce incorrect results). So have it as such.

    But as of 12.1.0.2 the optimizer don't have the above steps coded for COUNT(*) aggregates in scalar subquery un-nesting.

    ReplyDelete
  2. Is it good to set the value of _optimizer_unnest_scalar_sq to FALSE or leave the default value TRUE

    ReplyDelete
    Replies
    1. @Anonymous:

      don't set any hidden parameters ( parameter start with _) without Oracle support guidance.


      Delete