Wednesday, September 17, 2014

Join Factorization transformation

The Join Factorization transformation was introduced in Oracle 11g Release 2 and applies to UNION ALL queries.

In many scenarios the branches in a UNION All query share a common processing, i.e, refer to the same tables. In the current Oracle execution strategy, each branch of a UNION ALL query is evaluated independently, which leads to repetitive processing, including data access and join. The join factorization transformation offers an opportunity to share the common computations across the UNION ALL branches

rajesh@ORA11G> create table t1 as select * from all_objects;

Table created.

rajesh@ORA11G> create table t2 as select * from t1;

Table created.

rajesh@ORA11G> create table t3 as select * from t1;

Table created.

rajesh@ORA11G> create table t4 as select * from t1;

Table created.

rajesh@ORA11G> begin
  2     dbms_stats.gather_table_stats(user,'T1');
  3     dbms_stats.gather_table_stats(user,'T2');
  4     dbms_stats.gather_table_stats(user,'T3');
  5     dbms_stats.gather_table_stats(user,'T4');
  6  end;
  7  /

PL/SQL procedure successfully completed.

rajesh@ORA11G>

So we have four tables loaded with some decent volume of data.

So when a query like this goes for execution, optimizer applies join transformation. And this is visible in explain plan

rajesh@ORA11G> set autotrace traceonly explain statistics
rajesh@ORA11G> select t1.owner, t2.object_name, t3.object_type
  2  from t1, t2, t3
  3  where t1.object_id = t2.object_id
  4  and t2.object_id = t3.object_id
  5  and t1.object_id > 0
  6  union all
  7  select t1.owner, t2.object_name, t4.object_type
  8  from t1, t2, t4
  9  where t1.object_id = t2.object_id
 10  and t2.object_id = t4.object_id
 11  and t1.object_id > 0 ;

169378 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3798759592

---------------------------------------------------------------------------------
| Id  | Operation             | Name               | Rows  | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |   169K|  8601K|  2756   (1)|
|*  1 |  HASH JOIN            |                    |   169K|  8601K|  2756   (1)|
|*  2 |   TABLE ACCESS FULL   | T1                 | 84689 |   909K|   338   (1)|
|   3 |   VIEW                | VW_JF_SET$37540893 |   169K|  6781K|  1900   (1)|
|   4 |    UNION-ALL          |                    |       |       |            |
|*  5 |     HASH JOIN         |                    | 84689 |  3638K|   950   (1)|
|*  6 |      TABLE ACCESS FULL| T3                 | 84689 |  1157K|   338   (1)|
|*  7 |      TABLE ACCESS FULL| T2                 | 84689 |  2481K|   338   (1)|
|*  8 |     HASH JOIN         |                    | 84689 |  3638K|   950   (1)|
|*  9 |      TABLE ACCESS FULL| T4                 | 84689 |  1157K|   338   (1)|
|* 10 |      TABLE ACCESS FULL| T2                 | 84689 |  2481K|   338   (1)|
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."OBJECT_ID"="ITEM_1")
   2 - filter("T1"."OBJECT_ID">0)
   5 - access("T2"."OBJECT_ID"="T3"."OBJECT_ID")
   6 - filter("T3"."OBJECT_ID">0)
   7 - filter("T2"."OBJECT_ID">0)
   8 - access("T2"."OBJECT_ID"="T4"."OBJECT_ID")
   9 - filter("T4"."OBJECT_ID">0)
  10 - filter("T2"."OBJECT_ID">0)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       7178  consistent gets
       2419  physical reads
          0  redo size
    5559340  bytes sent via SQL*Net to client
      12918  bytes received via SQL*Net from client
       1131  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     169378  rows processed

rajesh@ORA11G>
rajesh@ORA11G> set autotrace off 

So what happened is optimizer scanned table T1 only once, not twice as provided in union all queries. So given this query

select t1.owner, t2.object_name, t3.object_type
from t1, t2, t3
where t1.object_id = t2.object_id
and t2.object_id = t3.object_id
and t1.object_id > 0
union all
select t1.owner, t2.object_name, t4.object_type
from t1, t2, t4
where t1.object_id = t2.object_id
and t2.object_id = t4.object_id
and t1.object_id > 0 ;

Appliying join factorization, it’s transformed into this. So that table T1 is scanned once not twice.

select t1.owner, t.object_name, t.object_type
from t1, (
select t2.object_name, t3.object_type , t2.object_id
from t2, t3
where t2.object_id = t3.object_id
union all
select t2.object_name, t4.object_type , t2.object_id
from t2, t4
where t2.object_id = t4.object_id ) t
where t1.object_id = t.object_id
and t1.object_id > 0

Now comparing this explain plan with explain plan produced without join factorization

·         Table “T1” got scanned twice.
·         Overall logical IO got increases by 16%
·         Overall query cost increased from 2756 to 3187.

rajesh@ORA11G> alter session set "_optimizer_join_factorization"=false;

Session altered.

rajesh@ORA11G> set autotrace traceonly explain statistics
rajesh@ORA11G> select t1.owner, t2.object_name, t3.object_type
  2  from t1, t2, t3
  3  where t1.object_id = t2.object_id
  4  and t2.object_id = t3.object_id
  5  and t1.object_id > 0
  6  union all
  7  select t1.owner, t2.object_name, t4.object_type
  8  from t1, t2, t4
  9  where t1.object_id = t2.object_id
 10  and t2.object_id = t4.object_id
 11  and t1.object_id > 0 ;

169378 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 4252849160

-------------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |   169K|  9097K|       |  3187   (1)| 00:00:39 |
|   1 |  UNION-ALL           |      |       |       |       |            |          |
|*  2 |   HASH JOIN          |      | 84689 |  4548K|  2152K|  1594   (1)| 00:00:20 |
|*  3 |    TABLE ACCESS FULL | T3   | 84689 |  1157K|       |   338   (1)| 00:00:05 |
|*  4 |    HASH JOIN         |      | 84689 |  3390K|  1904K|   938   (1)| 00:00:12 |
|*  5 |     TABLE ACCESS FULL| T1   | 84689 |   909K|       |   338   (1)| 00:00:05 |
|*  6 |     TABLE ACCESS FULL| T2   | 84689 |  2481K|       |   338   (1)| 00:00:05 |
|*  7 |   HASH JOIN          |      | 84689 |  4548K|  2152K|  1594   (1)| 00:00:20 |
|*  8 |    TABLE ACCESS FULL | T4   | 84689 |  1157K|       |   338   (1)| 00:00:05 |
|*  9 |    HASH JOIN         |      | 84689 |  3390K|  1904K|   938   (1)| 00:00:12 |
|* 10 |     TABLE ACCESS FULL| T1   | 84689 |   909K|       |   338   (1)| 00:00:05 |
|* 11 |     TABLE ACCESS FULL| T2   | 84689 |  2481K|       |   338   (1)| 00:00:05 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T2"."OBJECT_ID"="T3"."OBJECT_ID")
   3 - filter("T3"."OBJECT_ID">0)
   4 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
   5 - filter("T1"."OBJECT_ID">0)
   6 - filter("T2"."OBJECT_ID">0)
   7 - access("T2"."OBJECT_ID"="T4"."OBJECT_ID")
   8 - filter("T4"."OBJECT_ID">0)
   9 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")
  10 - filter("T1"."OBJECT_ID">0)
  11 - filter("T2"."OBJECT_ID">0)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       8391  consistent gets
       2418  physical reads
          0  redo size
    5559340  bytes sent via SQL*Net to client
      12918  bytes received via SQL*Net from client
       1131  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     169378  rows processed

rajesh@ORA11G>
rajesh@ORA11G> set autotrace off

Join factorization is performed under cost-based transformation framework; this means that we cost the plans with and without join factorization and choose the cheapest plan.


In this case the cost of Join factorization query is 2756, but without join factorization cost is 3187, where join factorization seems to be the cheapest so the optimizer adopts to join factorization transformation

No comments:

Post a Comment