Wednesday, January 26, 2011

Full Partition Wise Joins

A full partition-wise join divides a large join into smaller joins between a pair of partitions from the two joined tables. To use this feature, you must equipartition both tables on their join keys. For example, consider a large join between a sales table and a customer table on the column cust_id, the query " find the sales details for all customers who are effective ( having cust_date in Q3 2005 ) in Q3 2005 " is a typical example of a SQL Statement performing such a join is

SELECT c.* , s.*
from customers c, sales s
where c.cust_id = s.cust_id
and c.cust_date between  to_date('01-jul-2005','dd-mon-yyyy')
and to_date('30-sep-2005','dd-mon-yyyy');


If we equipartition both customers and sales table based on this join keys like below.

rajesh@10GR2> create table customers
  2  (
  3     CUST_ID                                                 ,
  4     CUST_FIRST_NAME                                 ,
  5     CUST_LAST_NAME                                  ,
  6     CUST_GENDER                                             ,
  7     CUST_DATE                                               ,
  8     CUST_POSTAL_CODE                                ,
  9     CUST_CITY                                               ,
 10     CUST_TOTAL
 11  )
 12  partition by range(CUST_DATE)
 13  subpartition by hash(cust_id)
 14  subpartition template
 15  (
 16     subpartition sp1,
 17     subpartition sp2,
 18     subpartition sp3,
 19     subpartition sp4
 20  )
 21  (
 22     partition p_q1_2005  values less than ( to_date('01-apr-2005','dd-mon-yyyy') ),
 23     partition p_q2_2005  values less than ( to_date('01-jul-2005','dd-mon-yyyy') ),
 24     partition p_q3_2005  values less than ( to_date('01-oct-2005','dd-mon-yyyy') ),
 25     partition p_q4_2005  values less than ( to_date('01-jan-2006','dd-mon-yyyy') ),
 26     partition p_q1_2006  values less than ( to_date('01-apr-2006','dd-mon-yyyy') ),
 27     partition p_q2_2006  values less than ( to_date('01-jul-2006','dd-mon-yyyy') ),
 28     partition p_q3_2006  values less than ( to_date('01-oct-2006','dd-mon-yyyy') ),
 29     partition p_q4_2006  values less than ( to_date('01-jan-2007','dd-mon-yyyy') ),
 30     partition p_q1_2007  values less than ( to_date('01-apr-2007','dd-mon-yyyy') ),
 31     partition p_q2_2007  values less than ( to_date('01-jul-2007','dd-mon-yyyy') ),
 32     partition p_q3_2007  values less than ( to_date('01-oct-2007','dd-mon-yyyy') ),
 33     partition p_q4_2007  values less than ( to_date('01-jan-2008','dd-mon-yyyy') ),
 34     partition p_q1_2008  values less than ( to_date('01-apr-2008','dd-mon-yyyy') ),
 35     partition p_q2_2008  values less than ( to_date('01-jul-2008','dd-mon-yyyy') ),
 36     partition p_q3_2008  values less than ( to_date('01-oct-2008','dd-mon-yyyy') ),
 37     partition p_q4_2008  values less than ( to_date('01-jan-2009','dd-mon-yyyy') ),
 38     partition p_q1_2009  values less than ( to_date('01-apr-2009','dd-mon-yyyy') ),
 39     partition p_q2_2009  values less than ( to_date('01-jul-2009','dd-mon-yyyy') ),
 40     partition p_q3_2009  values less than ( to_date('01-oct-2009','dd-mon-yyyy') ),
 41     partition p_q4_2009  values less than ( to_date('01-jan-2010','dd-mon-yyyy') ),
 42     partition p_q1_2010  values less than ( to_date('01-apr-2010','dd-mon-yyyy') ),
 43     partition p_q2_2010  values less than ( to_date('01-jul-2010','dd-mon-yyyy') ),
 44     partition p_q3_2010  values less than ( to_date('01-oct-2010','dd-mon-yyyy') ),
 45     partition p_q4_2010  values less than ( to_date('01-jan-2011','dd-mon-yyyy') ),
 46     partition p_q1_2011  values less than ( to_date('01-apr-2011','dd-mon-yyyy') ),
 47     partition p_q2_2011  values less than ( to_date('01-jul-2011','dd-mon-yyyy') ),
 48     partition p_q3_2011  values less than ( to_date('01-oct-2011','dd-mon-yyyy') ),
 49     partition p_q4_2011  values less than ( to_date('01-jan-2012','dd-mon-yyyy') )
 50  )
 51  nologging
 52  as
 53  select     CUST_ID                 ,
 54             CUST_FIRST_NAME ,
 55             CUST_LAST_NAME  ,
 56             CUST_GENDER             ,
 57             CUST_DATE               ,
 58             CUST_POSTAL_CODE ,
 59             CUST_CITY               ,
 60             CUST_TOTAL
 61  from customers_data;

Table created.

Elapsed: 00:00:16.24
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> exec dbms_stats.gather_table_stats(user,'CUSTOMERS');

PL/SQL procedure successfully completed.

Elapsed: 00:00:08.64
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> create table sales
  2  (
  3     SALES_ID                ,
  4     CUST_ID                 ,
  5     QUANTITY_SOLD   ,
  6     AMOUNT_SOLD             ,
  7     CUST_DATE
  8  )
  9  partition by range(CUST_DATE)
 10  subpartition by hash(cust_id)
 11  subpartition template
 12  (
 13     subpartition sp1,
 14     subpartition sp2,
 15     subpartition sp3,
 16     subpartition sp4
 17  )
 18  (
 19     partition p_q1_2005  values less than ( to_date('01-apr-2005','dd-mon-yyyy') ),
 20     partition p_q2_2005  values less than ( to_date('01-jul-2005','dd-mon-yyyy') ),
 21     partition p_q3_2005  values less than ( to_date('01-oct-2005','dd-mon-yyyy') ),
 22     partition p_q4_2005  values less than ( to_date('01-jan-2006','dd-mon-yyyy') ),
 23     partition p_q1_2006  values less than ( to_date('01-apr-2006','dd-mon-yyyy') ),
 24     partition p_q2_2006  values less than ( to_date('01-jul-2006','dd-mon-yyyy') ),
 25     partition p_q3_2006  values less than ( to_date('01-oct-2006','dd-mon-yyyy') ),
 26     partition p_q4_2006  values less than ( to_date('01-jan-2007','dd-mon-yyyy') ),
 27     partition p_q1_2007  values less than ( to_date('01-apr-2007','dd-mon-yyyy') ),
 28     partition p_q2_2007  values less than ( to_date('01-jul-2007','dd-mon-yyyy') ),
 29     partition p_q3_2007  values less than ( to_date('01-oct-2007','dd-mon-yyyy') ),
 30     partition p_q4_2007  values less than ( to_date('01-jan-2008','dd-mon-yyyy') ),
 31     partition p_q1_2008  values less than ( to_date('01-apr-2008','dd-mon-yyyy') ),
 32     partition p_q2_2008  values less than ( to_date('01-jul-2008','dd-mon-yyyy') ),
 33     partition p_q3_2008  values less than ( to_date('01-oct-2008','dd-mon-yyyy') ),
 34     partition p_q4_2008  values less than ( to_date('01-jan-2009','dd-mon-yyyy') ),
 35     partition p_q1_2009  values less than ( to_date('01-apr-2009','dd-mon-yyyy') ),
 36     partition p_q2_2009  values less than ( to_date('01-jul-2009','dd-mon-yyyy') ),
 37     partition p_q3_2009  values less than ( to_date('01-oct-2009','dd-mon-yyyy') ),
 38     partition p_q4_2009  values less than ( to_date('01-jan-2010','dd-mon-yyyy') ),
 39     partition p_q1_2010  values less than ( to_date('01-apr-2010','dd-mon-yyyy') ),
 40     partition p_q2_2010  values less than ( to_date('01-jul-2010','dd-mon-yyyy') ),
 41     partition p_q3_2010  values less than ( to_date('01-oct-2010','dd-mon-yyyy') ),
 42     partition p_q4_2010  values less than ( to_date('01-jan-2011','dd-mon-yyyy') ),
 43     partition p_q1_2011  values less than ( to_date('01-apr-2011','dd-mon-yyyy') ),
 44     partition p_q2_2011  values less than ( to_date('01-jul-2011','dd-mon-yyyy') ),
 45     partition p_q3_2011  values less than ( to_date('01-oct-2011','dd-mon-yyyy') ),
 46     partition p_q4_2011  values less than ( to_date('01-jan-2012','dd-mon-yyyy') )
 47  )
 48  nologging
 49  as
 50  select     SALES_ID                ,
 51             CUST_ID                 ,
 52             QUANTITY_SOLD   ,
 53             AMOUNT_SOLD             ,
 54             CUST_DATE
 55  from sales_data;

Table created.

Elapsed: 00:07:03.90
rajesh@10GR2> alter table customers add constraint cust_pk primary key(cust_id);

Table altered.

Elapsed: 00:00:11.01
rajesh@10GR2> alter table sales add constraint sales_fk foreign key(cust_id) references customers;

Table altered.

Elapsed: 00:00:19.17
rajesh@10GR2> select count(*) from sales;

  COUNT(*)
----------
  11863049

Elapsed: 00:01:12.48
rajesh@10GR2> select count(*) from customers;

  COUNT(*)
----------
   1026072

Elapsed: 00:00:00.71
rajesh@10GR2>

rajesh@10GR2> SELECT c.* , s.*
  2  from customers c, sales s
  3  where c.cust_id = s.cust_id
  4  and c.cust_date between  to_date('01-jul-2005','dd-mon-yyyy')
  5  and to_date('30-sep-2005','dd-mon-yyyy');

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 3704956898

-----------------------------------------------------------------------------------
| Id  | Operation                | Name      | Rows  | Cost (%CPU)| Pstart| Pstop |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |           |    11M| 39198   (2)|       |       |
|   1 |  PARTITION HASH ALL      |           |    11M| 39198   (2)|     1 |     4 |
|*  2 |   HASH JOIN              |           |    11M| 39198   (2)|       |       |
|   3 |    PARTITION RANGE SINGLE|           |   929K|  1875   (2)|     3 |     3 |
|*  4 |     TABLE ACCESS FULL    | CUSTOMERS |   929K|  1875   (2)|     9 |    12 |
|   5 |    PARTITION RANGE ALL   |           |    11M| 12125   (2)|     1 |    28 |
|   6 |     TABLE ACCESS FULL    | SALES     |    11M| 12125   (2)|     1 |   112 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C"."CUST_ID"="S"."CUST_ID")
   4 - filter("C"."CUST_DATE"<=TO_DATE('2005-09-30 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

rajesh@10GR2>

  • All HASH Partition in SALES table is scanned (this is shown in plan as Pstart = 1 and Pstop = 112 at Id=6)
  • Only 4 Hash Partition in CUSTOMERS table corresponding to 2005, Q3 is scanned (this is shown in plan as Pstart = 9 and Pstop = 12 at Id=4)
  • the Access path PARTITION HASH ALL from plan says, only the 4 Hash Partition from CUSTOMERS table is joined (Full Partition-Wise Joins) with 4 Hash Partition from sales table (this is shown in plan as Pstart = 1 and Pstop = 4 at Id=1)
And, this is what Tkprof shows,


SELECT c.* , s.*
from customers c, sales s
where c.cust_id = s.cust_id
and c.cust_date between  to_date('01-jul-2005','dd-mon-yyyy')
and to_date('30-sep-2005','dd-mon-yyyy')

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    68435     52.31      83.57      47086     130237          0    10265035
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    68437     52.31      83.59      47086     130237          0    10265035

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 100 

Rows     Row Source Operation
-------  ---------------------------------------------------
10265035  PARTITION HASH ALL PARTITION: 1 4 (cr=130237 pr=47086 pw=744 time=64025850 us)
10265035   HASH JOIN  (cr=130237 pr=47086 pw=744 time=54649725 us)
 884070    PARTITION RANGE SINGLE PARTITION: 3 3 (cr=8054 pr=0 pw=0 time=1768764 us)
 884070     TABLE ACCESS FULL CUSTOMERS PARTITION: 9 12 (cr=8054 pr=0 pw=0 time=884626 us)
11863049    PARTITION RANGE ALL PARTITION: 1 28 (cr=122183 pr=46342 pw=0 time=71307400 us)
11863049     TABLE ACCESS FULL SALES PARTITION: 1 112 (cr=122183 pr=46342 pw=0 time=5957447


( Awesome, it processed 10M records in 83 secs )

Now, just changing to query to  "find the sales details for all customers who made Transactions in Q3 2005 ", causes the plan to change like this.


rajesh@10GR2> SELECT c.* , s.*
  2  from customers c, sales s
  3  where c.cust_id = s.cust_id
  4  and   c.cust_date = s.cust_date
  5  and c.cust_date between  to_date('01-jul-2005','dd-mon-yyyy')
  6  and to_date('30-sep-2005','dd-mon-yyyy')
  7  /
Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 3606662981

-----------------------------------------------------------------------------------
| Id  | Operation                | Name      | Rows  | Cost (%CPU)| Pstart| Pstop |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |           |   801K| 34547   (2)|       |       |
|   1 |  PARTITION HASH ALL      |           |   801K| 34547   (2)|     1 |     4 ||*  2 |   HASH JOIN              |           |   801K| 34547   (2)|       |       |
|   3 |    PARTITION RANGE SINGLE|           |   929K|  1875   (2)|     3 |     3 |
|*  4 |     TABLE ACCESS FULL    | CUSTOMERS |   929K|  1875   (2)|     9 |    12 |
|   5 |    PARTITION RANGE SINGLE|           |    10M| 10510   (3)|     3 |     3 |
|*  6 |     TABLE ACCESS FULL    | SALES     |    10M| 10510   (3)|     9 |    12 |
-----------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("C"."CUST_ID"="S"."CUST_ID" AND "C"."CUST_DATE"="S"."CUST_DATE")
   4 - filter("C"."CUST_DATE"<=TO_DATE('2005-09-30 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))
   6 - filter("S"."CUST_DATE"<=TO_DATE('2005-09-30 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

rajesh@10GR2>

And the Tkprof show me,

SELECT c.* , s.*
from customers c, sales s
where c.cust_id = s.cust_id
and   c.cust_date = s.cust_date
and c.cust_date between  to_date('01-jul-2005','dd-mon-yyyy')
and to_date('30-sep-2005','dd-mon-yyyy')

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     7205     14.59      63.64      46889      61643          0     1080530
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     7207     14.59      63.65      46889      61643          0     1080530

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 100 

Rows     Row Source Operation
-------  ---------------------------------------------------
1080530  PARTITION HASH ALL PARTITION: 1 4 (cr=61643 pr=46889 pw=775 time=13521195 us)
1080530   HASH JOIN  (cr=61643 pr=46889 pw=775 time=13366665 us)
 884070    PARTITION RANGE SINGLE PARTITION: 3 3 (cr=7992 pr=0 pw=0 time=1768864 us)
 884070     TABLE ACCESS FULL CUSTOMERS PARTITION: 9 12 (cr=7992 pr=0 pw=0 time=884703 us)
10265035    PARTITION RANGE SINGLE PARTITION: 3 3 (cr=53651 pr=46114 pw=0 time=61625325 us)
10265035     TABLE ACCESS FULL SALES PARTITION: 9 12 (cr=53651 pr=46114 pw=0 time=41095102

Full partition-wise join will takes place only if you, equipartition both tables on their join keys. So, what will happen if you don't equipartition both tables.

scott@10GR2> create table customers
  2  (
  3     CUST_ID                                                 ,
  4     CUST_FIRST_NAME                                 ,
  5     CUST_LAST_NAME                                  ,
  6     CUST_GENDER                                             ,
  7     CUST_DATE                                               ,
  8     CUST_POSTAL_CODE                                ,
  9     CUST_CITY                                               ,
 10     CUST_TOTAL
 11  )
 12  partition by hash(cust_id) partitions 8
 13  nologging
 14  as
 15  select     CUST_ID                 ,
 16             CUST_FIRST_NAME ,
 17             CUST_LAST_NAME  ,
 18             CUST_GENDER             ,
 19             CUST_DATE               ,
 20             CUST_POSTAL_CODE ,
 21             CUST_CITY               ,
 22             CUST_TOTAL
 23  from customers_data
 24  where 1=0;

Table created.

Elapsed: 00:00:00.25
scott@10GR2>
scott@10GR2>
scott@10GR2> create table sales
  2  (
  3     SALES_ID                ,
  4     CUST_ID                 ,
  5     QUANTITY_SOLD   ,
  6     AMOUNT_SOLD             ,
  7     CUST_DATE
  8  )
  9  partition by hash(SALES_ID) partitions 8
 10  nologging
 11  as
 12  select     SALES_ID                ,
 13             CUST_ID                 ,
 14             QUANTITY_SOLD   ,
 15             AMOUNT_SOLD             ,
 16             CUST_DATE
 17  from sales_data
 18  where 1=0;

Table created.

Elapsed: 00:00:00.21
scott@10GR2>

Now, Customer table is HASH partitioned on customer_id and Sales table is HASH partitioned on sales_id columns and now the Plan changes like below.

scott@10GR2>
scott@10GR2> begin
  2     dbms_stats.set_table_stats(
  3             ownname=>user,
  4             tabname=>'CUSTOMERS',
  5             numrows => 1026072,
  6             numblks => 1000000 );
  7
  8     dbms_stats.set_table_stats(
  9             ownname=>user,
 10             tabname=>'SALES',
 11             numrows => 11863049,
 12             numblks => 1000000 );
 13  end;
 14  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.10
scott@10GR2>
scott@10GR2> set autotrace traceonly explain;
scott@10GR2>
scott@10GR2> SELECT c.* , s.*
  2  from customers c, sales s
  3  where c.cust_id = s.cust_id
  4  and c.cust_date between  to_date('01-jul-2005','dd-mon-yyyy')
  5  and to_date('30-sep-2005','dd-mon-yyyy');

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 742433711

------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Cost (%CPU)| Pstart| Pstop |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           | 29658 |   438K  (1)|       |       |
|*  1 |  HASH JOIN          |           | 29658 |   438K  (1)|       |       |
|   2 |   PARTITION HASH ALL|           |  2565 |   219K  (1)|     1 |     8 |
|*  3 |    TABLE ACCESS FULL| CUSTOMERS |  2565 |   219K  (1)|     1 |     8 |
|   4 |   PARTITION HASH ALL|           |    11M|   219K  (1)|     1 |     8 |
|   5 |    TABLE ACCESS FULL| SALES     |    11M|   219K  (1)|     1 |     8 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("C"."CUST_ID"="S"."CUST_ID")
   3 - filter("C"."CUST_DATE">=TO_DATE('2005-07-01 00:00:00', 'yyyy-mm-dd hh24:mi:ss')
              AND "C"."CUST_DATE"<=TO_DATE('2005-09-30 00:00:00', 'yyyy-mm-dd hh24:mi:ss'))

scott@10GR2>

You can see the Full partitioned wise join doesn't happen now ( PARTITION HASH ALL access path missing above HASH JOIN in plans ).

These kinds of large join is typical in data warehousing environments, when data-model is designed properly Full Partition Wise Joins will really be help full.

No comments:

Post a Comment