Tuesday, January 12, 2016

Amazing Optimization to get distinct and TopN values from index - Part I

Suppose we have a Table like this
rajesh@ORA11G> create table t as
  2  select *
  3  from all_objects;
 
Table created.
 
rajesh@ORA11G>
rajesh@ORA11G> create index t_idx on t(owner,object_id);
 
Index created.
 
rajesh@ORA11G> begin
  2     dbms_stats.gather_table_stats(user,'T',
  3             method_opt=>'for all indexed columns size 254',
  4             cascade=>true);
  5  end;
  6  /
 
PL/SQL procedure successfully completed.
 
rajesh@ORA11G>
rajesh@ORA11G> select index_name,blevel,leaf_blocks,
  2      distinct_keys,num_rows,
  3      avg_leaf_blocks_per_key lf_per_key,
  4      avg_data_blocks_per_key blks_per_key
  5  from user_indexes
  6  where index_name ='T_IDX';
 
INDEX_NAME     BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS LF_PER_KEY BLKS_PER_KEY
---------- ---------- ----------- ------------- ---------- ---------- ------------
T_IDX               1         256         84718      84718          1            1
 
1 row selected.
 
rajesh@ORA11G> set linesize 71
rajesh@ORA11G> desc T
 Name                                Null?    Type
 ----------------------------------- -------- -------------------------
 OWNER                               NOT NULL VARCHAR2(30)
 OBJECT_NAME                         NOT NULL VARCHAR2(30)
 SUBOBJECT_NAME                               VARCHAR2(30)
 OBJECT_ID                           NOT NULL NUMBER
 DATA_OBJECT_ID                               NUMBER
 OBJECT_TYPE                                  VARCHAR2(19)
 CREATED                             NOT NULL DATE
 LAST_DDL_TIME                       NOT NULL DATE
 TIMESTAMP                                    VARCHAR2(19)
 STATUS                                       VARCHAR2(7)
 TEMPORARY                                    VARCHAR2(1)
 GENERATED                                    VARCHAR2(1)
 SECONDARY                                    VARCHAR2(1)
 NAMESPACE                           NOT NULL NUMBER
 EDITION_NAME                                 VARCHAR2(30)
 
rajesh@ORA11G>
 
The column owner and object_id are indexed and both are NOT NULL. Let’s say the question would be to get distinct list of all “owner” available.
 
The standard query using “Distinct” would be like this
 
rajesh@ORA11G> select distinct owner from t;
 
31 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1741570181
 
------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)|
------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |    31 |   160  (20)|
|   1 |  HASH UNIQUE          |       |    31 |   160  (20)|
|   2 |   INDEX FAST FULL SCAN| T_IDX | 84718 |   134   (4)|
------------------------------------------------------------
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        263  consistent gets
          0  physical reads
          0  redo size
        939  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         31  rows processed
 
rajesh@ORA11G>
 
From the explain plan we infer that
 
  • The optimizer has decided to choose Index as skinny version of table.
  • Read all the root, branch and leaf blocks from index tree structure using multi block IO
  • Discarded all the root and branch blocks, and discard the rowid and other indexed key column values except the column “OWNER” from leaf blocks
  • Apply “Hash unique” operation on those, to discard duplicates and retain only unique values and provide that as query output.
     
Instead of scanning through all the leaf blocks, We also could go along the tree entering only the required blocks, but not all leaf blocks! However, Oracle can’t manage this on its own so we have to make a certain twist: aside from Index Full Scan(min/max) Oracle also has Index Range Scan(min/max) which works well with ranges and boundaries. We can use recursive query to make it read only what we need!
 
rajesh@ORA11G> with datas(x) as
  2  (
  3     select min(owner) from t
  4     union all
  5     select (select min(owner) from t t2
  6              where t2.owner > datas.x )
  7     from datas
  8     where x is not null )
  9  select * from datas
 10  where x is not null
 11  /
 
31 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 946542369
 
-------------------------------------------------------------------------------
| Id  | Operation                                 | Name  | Rows  |Cost (%CPU)|
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |       |     2 |    4   (0)|
|*  1 |  VIEW                                     |       |     2 |    4   (0)|
|   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|       |       |           |
|   3 |    SORT AGGREGATE                         |       |     1 |           |
|   4 |     INDEX FULL SCAN (MIN/MAX)             | T_IDX |     1 |    2   (0)|
|   5 |    SORT AGGREGATE                         |       |     1 |           |
|   6 |     FIRST ROW                             |       |     1 |    2   (0)|
|*  7 |      INDEX RANGE SCAN (MIN/MAX)           | T_IDX |     1 |    2   (0)|
|*  8 |    RECURSIVE WITH PUMP                    |       |       |           |
-------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("X" IS NOT NULL)
   7 - access("T2"."OWNER">:B1)
   8 - filter("X" IS NOT NULL)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         35  consistent gets
          0  physical reads
          0  redo size
        935  bytes sent via SQL*Net to client
        500  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         33  sorts (memory)
          0  sorts (disk)
         31  rows processed
 
rajesh@ORA11G>
 
This makes a significant different difference of about 35 logical IO to get 31 values from index instead of 263 logical IO.
 
Here is the explanation of the algorithm
 
  • In the first part of union all (3-4 strings of plan) we specify where to start the recursion, and more specifically we choose a minimal (first) the value from the index.
  • After that we choose the first value that is bigger than the one chosen in the previous step, using IRS(min/max) (7-6-5 stings of the plan).
  • Repeat the recursion while we find anything

No comments:

Post a Comment