Wednesday, June 22, 2022

Automatic Indexing - Part XVII

As I discuss in the previous series the problems and inconsistencies can appear between what the Automatic Indexing (AI) process thinks will happen with the newly created AI and what actually happens in other database sessions, this is because AI process session uses a much higher degree of dynamic sampling ( dynamic sampling = 11 ) than other database session uses by default (level =2)
 
As we in the previous series, an SQL statement may be deemed not to use an Index in the AI deliberations, where it is actually used in the normal session (perhaps incorrectly so) where the data is heavily skewed and current statistics are insufficient for the CBO to accurately detect such skewness is one such scenario where we might encounter this issue.
 
One option to get around this is to hint any such queries with a dynamic sampling value that matches that of the AI process.
 
If we now re-run the problematic query from the previous series (where a new AI was inappropriately used by the CBO) with such a dynamic sampling hint.
 
ai_demo@PDB19> show parameter sampling
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_dynamic_sampling           integer     2
ai_demo@PDB19>
ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19>
ai_demo@PDB19> select x1,x2,object_name
  2  from DEMO018a
  3  where x1 = 17
  4  and x2 = 17;
 
100000 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2414111795
 
------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |  1000 | 30000 |   560   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO018A             |  1000 | 30000 |   560   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_gawwp6r5fuzfr |  1000 |       |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("X1"=17 AND "X2"=17)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      57097  consistent gets
      55520  physical reads
          0  redo size
    4068922  bytes sent via SQL*Net to client
       7872  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed
 
ai_demo@PDB19> select /*+ dynamic_sampling(11) */ x1,x2,object_name
  2  from DEMO018a
  3  where x1 = 17
  4  and x2 = 17;
 
100000 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3250860338
 
--------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |   103K|  3034K| 15208   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO018A |   105K|  3085K| 15208   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("X1"=17 AND "X2"=17)
       filter("X1"=17 AND "X2"=17)
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      56197  consistent gets
          0  physical reads
          0  redo size
    2835214  bytes sent via SQL*Net to client
       7872  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed
 
ai_demo@PDB19> set autotrace off
 
  
We can see that with Dynamic sampling hint (level=11) in place, the CBO this time correctly calculated the cardinality and hence correctly decided against the use of the AI.
 
Although these parameters can’t be changed in Oracle Autonomous database cloud services, on Exadata platform if using AI we might want to consider setting the OPTIMIZER_DYNAMIC_SAMPLING to 11 and OPTIMIZER_ADAPTIVE_STATISTICS to true in order to be consistent with AI process. These settings can obviously add significant overhead during parsing and so need to be set with caution.
 
In these scenarios where there is an inherent relationship between columns which the CBO is not detecting, the creation of extended statistics can be beneficial
 
We currently have the following columns and statistics in the base table
 
ai_demo@PDB19> select column_name, num_distinct, density,num_nulls, num_buckets,histogram
  2  from all_tab_cols
  3  where table_name ='DEMO018A'
  4  and owner ='AI_DEMO';
 
COLUMN_NAME     NUM_DISTINCT    DENSITY  NUM_NULLS NUM_BUCKETS HISTOGRAM
--------------- ------------ ---------- ---------- ----------- ---------------
X1                       100  .00000005          0         100 FREQUENCY
X2                       100  .00000005          0         100 FREQUENCY
ID                   9914368 1.0086E-07          0           1 NONE
OBJECT_NAME            94824 .000010546          0           1 NONE
 
If we now collect extended statistics on both X1 and X2 columns
 
ai_demo@PDB19> begin
  2      dbms_stats.gather_table_stats(user,'DEMO018A',
  3          method_opt=>'for columns(x1,x2) size 254' );
  4  end;
  5  /
 
PL/SQL procedure successfully completed.
 
ai_demo@PDB19> select column_name, num_distinct, density,num_nulls, num_buckets,histogram
  2  from all_tab_cols
  3  where table_name ='DEMO018A'
  4  and owner ='AI_DEMO';
 
COLUMN_NAME     NUM_DISTINCT    DENSITY  NUM_NULLS NUM_BUCKETS HISTOGRAM
--------------- ------------ ---------- ---------- ----------- ---------------
X1                       100  .00000005          0         100 FREQUENCY
X2                       100  .00000005          0         100 FREQUENCY
ID                   9914368 1.0086E-07          0           1 NONE
OBJECT_NAME            94824 .000010546          0           1 NONE
SYS_STUXTXIJPH7          100  .00000005          0         100 FREQUENCY
P8F5TH2A09PZ12X
 
The CBO now have some idea on the cardinality if both the columns are used within the predicates, if we re-run the problematic query without the hint
 
 
ai_demo@PDB19> select x1,x2,object_name
  2  from DEMO018a
  3  where x1 = 17
  4  and x2 = 17;
 
100000 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3250860338
 
--------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          |   100K|  2929K| 15208   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO018A |   100K|  2929K| 15208   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("X1"=17 AND "X2"=17)
       filter("X1"=17 AND "X2"=17)
 
 
Statistics
----------------------------------------------------------
          5  recursive calls
          0  db block gets
      56200  consistent gets
          0  physical reads
          0  redo size
    2835214  bytes sent via SQL*Net to client
       7872  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed
 
 
The CBO is correctly estimating the cardinality of 100K rows and so is not using the automatic index. However we can still get ourselves in problem, if we now re-run the query that returns no rows
 
ai_demo@PDB19> select x1,x2,object_name
  2  from DEMO018a
  3  where x1 = 1
  4  and x2 = 2;
 
no rows selected
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3250860338
 
--------------------------------------------------------------------------------------
| Id  | Operation                 | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |          | 50000 |  1464K| 15208   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO018A | 50000 |  1464K| 15208   (1)| 00:00:01 |
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("X1"=1 AND "X2"=2)
       filter("X1"=1 AND "X2"=2)
 
 
Statistics
----------------------------------------------------------
          2  recursive calls
          0  db block gets
      55537  consistent gets
          0  physical reads
          0  redo size
        369  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed
 
 
We can see that the CBO is now getting this execution plan wrong and is now estimating incorrectly that 50K rows are to be returned (and not about 1000 rows as estimated previously), this increased estimate is now deemed too expensive for the AI to retrieve and is now incorrectly using the Full table scan.
 
This is because with frequency based histogram in place, Oracle assumes that 50% of the lowest recorded frequency within the histogram is returned, if the value doesn’t exists but resides within the know min-max range of values.
 
ai_demo@PDB19> select endpoint_number , endpoint_number/2 as half_value
  2  from user_tab_histograms
  3  where table_name ='DEMO018A'
  4  and column_name like 'SYS%'
  5  order by endpoint_number
  6  fetch first 1 row only ;
 
ENDPOINT_NUMBER HALF_VALUE
--------------- ----------
         100000      50000
 
So we need to be very careful about how to potentially collect any additional statistics and its potential impact on other SQL statements.
 
One possible solution would be to use DS hint with level = 11.
 
ai_demo@PDB19> select /*+ dynamic_sampling(11) */ x1,x2,object_name
  2  from DEMO018a
  3  where x1 = 1
  4  and x2 = 2;
 
no rows selected
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2414111795
 
------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |     2 |    60 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO018A             |     2 |    60 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_gawwp6r5fuzfr |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("X1"=1 AND "X2"=2)
 
Note
-----
   - dynamic statistics used: dynamic sampling (level=AUTO)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        369  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0    rows processed
 
We can see that with Dynamic sampling hint (level=11) in place, the CBO this time correctly calculated the cardinality and hence correctly decided the use of the AI.
 

No comments:

Post a Comment