Friday, July 23, 2010

Multi column Statistics - 11g New Features

Starting with 11gR1 Oracle can even gather Statistics on group of columns within a table using extended statistics available in dbms_stats package.When multiple columns from a single table are used together in a query (multi column predicate) and If information about these combined columns are available to the optimizer then it can yield a better execution plan for optimal performance.

scott@11GR1> create table t
  2  nologging parallel 4
  3  as
  4  select level as x,
  5             case when mod(level,2) = 0 then 'Y' else 'N' end as flag_1
  6             case when mod(level,2) = 0 then 'N' else 'Y' end as flag_2
  7  from dual
  8  connect by level <= 1000000
  9  /

Table created.

scott@11GR1> create index t_ind on t(flag_1,flag_2) nologging;

Index created.

scott@11GR1> begin
  2     dbms_stats.gather_table_stats(ownname=>USER,tabname=>'T',cascade=>true);
  3  end;
  4  /

PL/SQL procedure successfully completed.

Any record in the table T don't have the same value for the column FLAG_1 and FLAG_2.

scott@11GR1> set autotrace traceonly explain statistics;
scott@11GR1> select *
  2  from t
  3  where flag_1 ='Y'
  4  and flag_2  ='Y';

no rows selected

Elapsed: 00:00:01.68

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   500K|  3906K|   159   (5)| 00:00:02 |        |      |         |
|   1 |  PX COORDINATOR      |          |       |       |            |          |        |      |         |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |   500K|  3906K|   159   (5)| 00:00:02 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |   500K|  3906K|   159   (5)| 00:00:02 |  Q1,00 | PCWC |         |
|*  4 |     TABLE ACCESS FULL| T        |   500K|  3906K|   159   (5)| 00:00:02 |  Q1,00 | PCWP |         |
--------------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
         12  recursive calls
          0  db block gets
       2107  consistent gets
       1948  physical reads
          0  redo size
        389  bytes sent via SQL*Net to client
        405  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          0  rows processed

scott@11GR1> set autotrace off;

It took 4000 IO's to answer our query without proper information about these combined columns available to Optimizer. But information about these combined columns in a table can be provided to optimizer by using Extended Stats available in Oracle 11gR1.

scott@11GR1> declare
  2     l_name varchar2(200);
  3  begin
  4     l_name := dbms_stats.create_extended_stats(ownname=>null,tabname=>'T',extension=>'(FLAG_1,FLAG_2)');
  5  end;
  6  /

PL/SQL procedure successfully completed.

Now, we have defined our column group, lets collect the statistics for this defined column groups.

scott@11GR1> begin
  2     dbms_stats.gather_table_stats(ownname=>USER,tabname=>'T',method_opt=>'for all columns size skewonly for columns (FLAG_1,FLAG_2) ');
  3  end;
  4  /

PL/SQL procedure successfully completed.


Now, we have collected the multicolumn statistics and it is available to Optimizer.

scott@11GR1> set autotrace traceonly explain statistics;
scott@11GR1> select *
  2  from t
  3  where flag_1 ='Y'
  4  and flag_2  ='Y';

no rows selected

Elapsed: 00:00:00.03

Execution Plan
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |     9 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |     9 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IND |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        389  bytes sent via SQL*Net to client
        405  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed

scott@11GR1> set autotrace off;

Now the  plan changes  drastically  and it took only 3 logical IO's  and it scanned the Index segment which is far better than the Table segment to answer our queries.

1 comment: