Thursday, June 30, 2022

Automatic Indexing - Part XXIV ( equality and inequality predicates mixture..)


In the previous post we discussed how automatic indexing (AI) currently does not consider non-equality predicates, AI will index the column based only on equality predicates. So how does Oracle handle the scenario when a SQL got a mixture of both equality and non-equality predicates?
 
Will begin by creating two very similar tables, but the second table having a more selective “sales_amt” column.
 
ai_demo@PDB19> create table demo026a
  2  nologging as
  3  select
  4         sysdate - mod(rownum,50000)+1 as sales_dt ,
  5         mod(rownum,500)+1 as sales_amt,
  6         rownum as sales_id,
  7         substr(a.object_name,1,30) as customer_name
  8  from stage a, stage b, stage c
  9  where rownum <= 10000000 ;
 
Table created.
 
ai_demo@PDB19> create table demo027a
  2  nologging as
  3  select sysdate - mod(rownum,50000)+1 as sales_dt ,
  4         mod(rownum,250000)+1 as sales_amt,
  5         rownum as sales_id,
  6         substr(a.object_name,1,30) as customer_name
  7  from stage a, stage b, stage c
  8  where rownum <= 10000000 ;
 
Table created.
 
Table DEMO026A got 500 distinct values for the column SALES_AMT whereas the other table DEMO027A got 250000 distinct values, we will run a few identical SQL’s, which both use an equality predicate on SALES_AMT column and non-equality predicate on SALES_DT column, the SALES_AMT column provides some filtering but in combination with the SALES_DT column, results in ultimate filtering with no rows returned:
 
ai_demo@PDB19> select * from demo026a where sales_amt = 55
  2  and sales_dt > to_date('30-jun-2022','dd-mon-yyyy') ;
 
no rows selected
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 86708632
 
--------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |     1 |    33 | 14305   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO026A |     1 |    33 | 14305   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("SALES_DT">TO_DATE(' 2022-06-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "SALES_AMT"=55)
       filter("SALES_DT">TO_DATE(' 2022-06-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "SALES_AMT"=55)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      52216  consistent gets
          0  physical reads
          0  redo size
        443  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
 
ai_demo@PDB19> select * from demo027a where sales_amt = 55
  2  and sales_dt > to_date('30-jun-2022','dd-mon-yyyy') ;
 
no rows selected
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3781020016
 
--------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |     1 |    34 | 14754   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO027A |     1 |    34 | 14754   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("SALES_AMT"=55 AND "SALES_DT">TO_DATE(' 2022-06-30 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
       filter("SALES_AMT"=55 AND "SALES_DT">TO_DATE(' 2022-06-30 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      53868  consistent gets
          0  physical reads
          0  redo size
        443  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
 
So how does AI handle this scenario ? if we look at the subsequent AI report:
 
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
 Index candidates                              : 4
 Indexes created (visible / invisible)         : 2 (1 / 1)
 Space used (visible / invisible)              : 268.44 MB (134.22 MB / 134.22 MB)
 Indexes dropped                               : 0
 SQL statements verified                       : 2
 SQL statements improved (improvement factor)  : 1 (2155.2x)
 SQL plan baselines created                    : 0
 Overall improvement factor                    : 2x
-------------------------------------------------------------------------------
 
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
 Unused indexes    : 0
 Space used        : 0 B
 Unusable indexes  : 0
-------------------------------------------------------------------------------
 
INDEX DETAILS
-------------------------------------------------------------------------------
1. The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
---------------------------------------------------------------------------------
| Owner   | Table    | Index                  | Key       | Type   | Properties |
---------------------------------------------------------------------------------
| AI_DEMO | DEMO026A | * SYS_AI_fadrntvnb5uww | SALES_AMT | B-TREE | NONE       |
| AI_DEMO | DEMO027A | SYS_AI_g1acyfvut3dj2   | SALES_AMT | B-TREE | NONE       |
---------------------------------------------------------------------------------
-------------------------------------------------------------------------------
VERIFICATION DETAILS
-------------------------------------------------------------------------------
1. The performance of the following statements improved:
-------------------------------------------------------------------------------
 Parsing Schema Name  : AI_DEMO
 SQL ID               : 54x1vd6qsxv75
 SQL Text             : select * from demo027a where sales_amt = 55 and
                      sales_dt > to_date('30-jun-2022','dd-mon-yyyy')
 Improvement Factor   : 2155.2x
 
Execution Statistics:
-----------------------------
                    Original Plan                 Auto Index Plan
                    ----------------------------  ----------------------------
 Elapsed Time (s):  7613599                       1393
 CPU Time (s):      2406857                       824
 Buffer Gets:       161643                        43
 Optimizer Cost:    14754                         44
 Disk Reads:        53858                         2
 Direct Writes:     0                             0
 Rows Processed:    0                             0
 Executions:        3                             1
 
 
PLANS SECTION
---------------------------------------------------------------------------------------------
 
- Original
-----------------------------
 Plan Hash Value  : 3781020016
 
---------------------------------------------------------------------------------
| Id | Operation                   | Name     | Rows | Bytes | Cost  | Time     |
---------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |          |      |       | 14754 |          |
|  1 |   TABLE ACCESS STORAGE FULL | DEMO027A |    1 |    34 | 14754 | 00:00:01 |
---------------------------------------------------------------------------------
 
- With Auto Indexes
-----------------------------
 Plan Hash Value  : 1442602725
 
-------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name                 | Rows | Bytes | Cost | Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |    1 |    34 |   44 | 00:00:01 |
| * 1 |   TABLE ACCESS BY INDEX ROWID BATCHED | DEMO027A             |    1 |    34 |   44 | 00:00:01 |
| * 2 |    INDEX RANGE SCAN                   | SYS_AI_g1acyfvut3dj2 |   40 |       |    3 | 00:00:01 |
-------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
------------------------------------------
* 1 - filter("SALES_DT">TO_DATE(' 2022-06-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
* 2 - access("SALES_AMT"=55)
 
 
Notes
-----
- Dynamic sampling used for this statement ( level = 11 )
 
 
If we look at the definition of all indexes currently on these tables:
 
ai_demo@PDB19> select table_name,index_name,indexing,auto,
  2         status,visibility,clustering_factor,leaf_blocks
  3  from all_indexes
  4  where table_name in ( 'DEMO027A','DEMO026A' )
  5  and owner ='AI_DEMO';
 
TABLE_NAME INDEX_NAME                INDEXIN AUT STATUS   VISIBILIT CLUSTERING_FACTOR LEAF_BLOCKS
---------- ------------------------- ------- --- -------- --------- ----------------- -----------
DEMO026A   SYS_AI_fadrntvnb5uww      FULL    YES UNUSABLE INVISIBLE          10000000       15362
DEMO027A   SYS_AI_g1acyfvut3dj2      FULL    YES VALID    VISIBLE            10000000       15938
 
ai_demo@PDB19> select table_name,index_name,column_name,column_position
  2  from user_ind_columns
  3  where table_name in ( 'DEMO027A','DEMO026A' )
  4  order by 1,2;
 
TABLE_NAME INDEX_NAME                COLUMN_NAM COLUMN_POSITION
---------- ------------------------- ---------- ---------------
DEMO026A   SYS_AI_fadrntvnb5uww      SALES_AMT                1
DEMO027A   SYS_AI_g1acyfvut3dj2      SALES_AMT                1
 
 
In both these cases, AI only created index on SALES_AMT column, as it was the only column with an equality predicate. However the AI on the table DEMO026A remained in an INVISIBLE/UNUSABLE state, that’s because the index on only the SALES_AMT column  was not enough to improve the performance of the SQL, due to the filtering not being sufficient enough and because of the relative poor clustering factor.
 
The index on the table DEMO027A was eventually created as VISIBLE/VALID state, as its better filtering was sufficient to actually improve the performance of the SQL.
 
So if we re-run the first query,
 
ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19> select * from demo026a where sales_amt = 55
  2  and sales_dt > to_date('30-jun-2022','dd-mon-yyyy') ;
 
no rows selected
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 86708632
 
--------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |     1 |    33 | 14305   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO026A |     1 |    33 | 14305   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("SALES_DT">TO_DATE(' 2022-06-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "SALES_AMT"=55)
       filter("SALES_DT">TO_DATE(' 2022-06-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "SALES_AMT"=55)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      52216  consistent gets
          0  physical reads
          0  redo size
        443  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
 
 
It continues to use a Full table scan, if we re-run the second query
 
ai_demo@PDB19> select * from demo027a where sales_amt = 55
  2  and sales_dt > to_date('30-jun-2022','dd-mon-yyyy') ;
 
no rows selected
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1442602725
 
------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |     1 |    34 |    43   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO027A             |     1 |    34 |    43   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_g1acyfvut3dj2 |    40 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("SALES_DT">TO_DATE(' 2022-06-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   2 - access("SALES_AMT"=55)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         43  consistent gets
          0  physical reads
          0  redo size
        443  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
 
It now uses the newly created AI, with an improved consistent gets of 43, but if we were to create a manual index both (sales_amt and sales_dt) the column
 
ai_demo@PDB19> create index idx_demo027a_02
  2  on demo027a( sales_amt,sales_dt )
  3  compress advanced low;
 
Index created.
 
ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19> select * from demo027a where sales_amt = 55
  2  and sales_dt > to_date('30-jun-2022','dd-mon-yyyy') ;
 
no rows selected
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1668119774
 
-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |     1 |    34 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO027A        |     1 |    34 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | IDX_DEMO027A_02 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("SALES_AMT"=55 AND "SALES_DT">TO_DATE(' 2022-06-30 00:00:00', 'syyyy-mm-dd
              hh24:mi:ss') AND "SALES_DT" IS NOT NULL)
 
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          2  physical reads
          0  redo size
        443  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
 
Performance improves significantly further, by reducing consistent gets down by just 3. So if you have sql statements with a mixture of both equality and non-equality predicates, then we may encounter these two scenarios
 
A potentially efficient index that is not created at all as the filtering on just the equality based predicates are not sufficient to create viable index or
 
A potentially suboptimal AI that doesn’t contain useful filtering columns, because they are used in non-equality predicates.
 
 
 

Tuesday, June 28, 2022

Automatic Indexing - Part XXIII (AI limitations..)


Have been waiting for while to post a series of various limitations associated with automatic indexing (AI), one of the most serious limitation is associate with range-based predicates, going to run a series of range-based predicates that heavily filter and would greatly benefit from an index.
 
ai_demo@PDB19> create table demo025
  2  nologging as
  3  select a.*,
  4         sysdate - mod(rownum,2800) as release_dt ,
  5         ceil( dbms_random.value(1,500000) ) as sales_amt,
  6         rownum as sales_id
  7  from stage a, stage b, stage c
  8  where rownum <= 10000000 ;
 
Table created.
 
Then I run the following range scan queries several times that each return exactly only few rows.
 
ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19> select * from demo025 where sales_id between 42 and 50;
 
9 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 625612404
 
-------------------------------------------------------------------------------------
| Id  | Operation                 | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |         |    10 |  1140 | 48595   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO025 |    10 |  1140 | 48595   (1)| 00:00:02 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("SALES_ID"<=50 AND "SALES_ID">=42)
       filter("SALES_ID"<=50 AND "SALES_ID">=42)
 
ai_demo@PDB19> select * from demo025 where sales_id < 0;
 
no rows selected
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 625612404
 
-------------------------------------------------------------------------------------
| Id  | Operation                 | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |         |     1 |   114 | 48595   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO025 |     1 |   114 | 48595   (1)| 00:00:02 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("SALES_ID"<0)
       filter("SALES_ID"<0)
 
ai_demo@PDB19> select * from demo025 where sales_id > 10000000 ;
 
no rows selected
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 625612404
 
-------------------------------------------------------------------------------------
| Id  | Operation                 | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |         |     1 |   114 | 48595   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO025 |     1 |   114 | 48595   (1)| 00:00:02 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("SALES_ID">10000000)
       filter("SALES_ID">10000000)
 
ai_demo@PDB19> set autotrace off
 
 
if we look at the subsequent AI report, 
 
 
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
 Index candidates                              : 2
 Indexes created (visible / invisible)         : 1 (1 / 0)
 Space used (visible / invisible)              : 137.36 MB (137.36 MB / 0 B)
 Indexes dropped                               : 0
 SQL statements verified                       : 1
 SQL statements improved (improvement factor)  : 1 (10504.7x)
 SQL plan baselines created                    : 0
 Overall improvement factor                    : 10504.7x
-------------------------------------------------------------------------------
 
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
 Unused indexes    : 0
 Space used        : 0 B
 Unusable indexes  : 0
-------------------------------------------------------------------------------
 
INDEX DETAILS
-------------------------------------------------------------------------------
1. The following indexes were created:
*: invisible
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
| Owner   | Table   | Index                | Key        | Type   | Properties |
-------------------------------------------------------------------------------
| AI_DEMO | DEMO023 | SYS_AI_11muvn2pw0ym6 | RELEASE_DT | B-TREE | LOCAL      |
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
 
We notice NO index were created on DEMO025 table.
 
Currently AI do not support non-equality predicates, AI are only created based on the equality based predicates. Obviously, Automatic Indexing is a fabulous feature and this might all change in the future. But with Non-Equality predicates so prevalent in SQL, it’s vital to note this current limitation when using and relying on Automatic Indexing…