Monday, May 9, 2022

Automatic Indexing - Part XIII

One of the common performance issue we encounter in relation to inefficient SQL is because of deficiency in the default manner by which index clustering factor is calculated. When it comes to automatic indexing (AI) and the Oracle autonomous database cloud services, the flawed default manner by which the index clustering factor calculated is still applies. So, we need to exercise some caution when AI are created and the impact their default statistics can have on the performance of subsequent SQL statements.
 
To illustrate with a simple example, we will first create a table with Key columns – which will be effectively unique, and the table will be populated via a procedure that just inserts 1M rows, the procedure uses an ORDER sequence such that generated values are monotonically increasing manner. 
 
 
ai_demo@PDB19> create table demo014( x number, y number, z varchar2(30) );
 
Table created.
 
ai_demo@PDB19> create sequence demo014_seq order;
 
Sequence created.
 
ai_demo@PDB19>
ai_demo@PDB19> create or replace procedure load_data
  2  as
  3  begin
  4     for i in 1..1000000
  5     loop
  6             insert into demo014(x,y,z) values(i, mod(i,100),
  7                     'Hello_world'||mod(i,100) );
  8             commit;
  9     end loop;
 10  end;
 11  /
 
Procedure created.
 
 
The procedure will be executed by three different sessions concurrently to simulate a multi-user environment inserting into a table.
 
ai_demo@PDB19> variable x number
ai_demo@PDB19> exec dbms_job.submit(:x,' begin ai_demo.load_data; end; ');
 
PL/SQL procedure successfully completed.
 
ai_demo@PDB19> print x
 
         X
----------
         1
 
ai_demo@PDB19> exec dbms_job.submit(:x,' begin ai_demo.load_data; end; ');
 
PL/SQL procedure successfully completed.
 
ai_demo@PDB19> print x
 
         X
----------
         2
 
ai_demo@PDB19> exec dbms_job.submit(:x,' begin ai_demo.load_data; end; ');
 
PL/SQL procedure successfully completed.
 
ai_demo@PDB19> print x
 
         X
----------
         3
 
ai_demo@PDB19> commit;
 
Commit complete.
 
ai_demo@PDB19>
ai_demo@PDB19> select sid,job from dba_jobs_running;
 
       SID        JOB
---------- ----------
      2428          1
      1654          2
      1752          3
 
ai_demo@PDB19> select sid,job from dba_jobs_running;
 
no rows selected
 
 
we will now collect the statistics on this table
 
ai_demo@PDB19> exec dbms_stats.gather_table_stats(user,'demo014');
 
PL/SQL procedure successfully completed.
 
ai_demo@PDB19> select num_rows,blocks
  2  from user_tables
  3  where table_name ='DEMO014';
 
  NUM_ROWS     BLOCKS
---------- ----------
   3000000      12137
 
 
If we run a SQL few times with filter on the column X (which is extremely unique to return just few rows from the table)
 
ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19> select * from demo014 where x = 77;
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 259284532
 
--------------------------------------------------------------------------
| Id  | Operation                 | Name    | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |         |     3 |    66 |  3324   (2)|
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO014 |     3 |    66 |  3324   (2)|
--------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("X"=77)
       filter("X"=77)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11391  consistent gets
          0  physical reads
          0  redo size
        521  bytes sent via SQL*Net to client
        546  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed
 
ai_demo@PDB19> set autotrace off
 
 
the execution plan shows a Full table scan is involved, the only choice the CBO has without an index on the column X, clearly an index on the column X would make the plan subsequently more efficient with just 3 rows returned from 3M rows table. Hopefully AI will come to rescue, so let’s check out the subsequent AI report. 
 
ai_demo@PDB19> set linesize 1000 trimspool on trim on long 2000000 longchunksize 2000000
ai_demo@PDB19> host timeout /T 900
 
Waiting for   0 seconds, press a key to continue ...
 
ai_demo@PDB19> select dbms_auto_index.report_activity( activity_start=> systimestamp - 2/24 ) report from dual;
 
REPORT
-------------------------------------------------------------------------------------------------------------------------------
------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
 Activity start               : 25-MAR-2022 22:07:44
 Activity end                 : 26-MAR-2022 00:07:44
 Executions completed         : 8
 Executions interrupted       : 0
 Executions with fatal error  : 0
-------------------------------------------------------------------------------
 
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
 Index candidates                              : 4
 Indexes created (visible / invisible)         : 1 (1 / 0)
 Space used (visible / invisible)              : 51.38 MB (51.38 MB / 0 B)
 Indexes dropped                               : 0
 SQL statements verified                       : 1
 SQL statements improved (improvement factor)  : 1 (11406.8x)
 SQL plan baselines created                    : 0
 Overall improvement factor                    : 11406.8x
-------------------------------------------------------------------------------
 
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
 Unused indexes    : 0
 Space used        : 0 B
 Unusable indexes  : 0
-------------------------------------------------------------------------------
 
INDEX DETAILS
-------------------------------------------------------------------------------
1. The following indexes were created:
-------------------------------------------------------------------------------
------------------------------------------------------------------------
| Owner   | Table   | Index                | Key | Type   | Properties |
------------------------------------------------------------------------
| AI_DEMO | DEMO014 | SYS_AI_dd1d6ufhks2nd | X   | B-TREE | NONE       |
------------------------------------------------------------------------
-------------------------------------------------------------------------------
 
 
So yes indeed, an AI (SYS_AI_dd1d6ufhks2nd) was created on the column X and if we look at the default clustering factor of this index
 
ai_demo@PDB19> select t.num_rows,t.blocks,i.index_name,i.clustering_factor
  2  from user_tables t,
  3      user_indexes i
  4  where t.table_name ='DEMO014'
  5  and t.table_name = i.table_name;
 
  NUM_ROWS     BLOCKS INDEX_NAME                CLUSTERING_FACTOR
---------- ---------- ------------------------- -----------------
   3000000      12137 SYS_AI_dd1d6ufhks2nd                2999016
 
ai_demo@PDB19> select table_name,index_name,tablespace_name,compression,
  2          status,visibility,indexing,auto
  3  from  user_indexes
  4  where table_name = 'DEMO014'
  5  order by 1,2;
 
TABLE_NAME INDEX_NAME           TABLESPACE_NAME COMPRESSION   STATUS VISIBILIT INDEXIN AUT
---------- -------------------- --------------- ------------- ------ --------- ------- ---
DEMO014    SYS_AI_dd1d6ufhks2nd TS_INDEX_DEMO   ADVANCED LOW  VALID  VISIBLE   FULL    YES
 
ai_demo@PDB19> select table_name,index_name,column_name,column_position
  2  from user_ind_columns
  3  where table_name = 'DEMO014'
  4  order by 1,2,3;
 
TABLE_NAME INDEX_NAME           COLUMN_NAM COLUMN_POSITION
---------- -------------------- ---------- ---------------
DEMO014    SYS_AI_dd1d6ufhks2nd X                        1
 
We notice that the clustering factor is relatively high and must closer to the number of rows in the table.
 
But if the column "X" in the table has been loaded via a monotonically increasing sequence, doesn’t that mean the column "X" values have been inserted in approximately in column "X" order? If so, doesn’t that mean the column "X" should have a “good” Clustering Factor” as the order of the rows in the table matches the order of the indexed values in the column "X" index?
Clearly not, the reason being that the table resides in the default ASSM tablespace storage. Which is designed to avoid contention by concurrent inserts from different sessions. Therefore each of the three session inserting into the table are each assigned to different table blocks, resulting in the rows not being precisely inserted in the column X order. 
 
However by default, the clustering factor is calculated by reading each index entry and determining if it references a ROWID that access a table block different from a pervious index entry, if it differs, it increments the clustering factor, if it does not differ and access the same table block as the pervious index entry, then clustering factor is not incremented.  
 
So in theory we could have 100 rows  that reside in just two different table blocks, will all odd values into one block and all even values into another block, then very well we will end up with a clustering factor would have a value of 100 for these 100 rows. 
 
If we run the same SQL as previously which only select few values from the column X
 
ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19> select * from demo014 where x = 77;
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 406969913
 
-------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |     3 |    66 |     6   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO014              |     3 |    66 |     6   (0)|
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_dd1d6ufhks2nd |     3 |       |     3   (0)|
-------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("X"=77)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        549  bytes sent via SQL*Net to client
        546  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed
 
ai_demo@PDB19> set autotrace off
 
the CBO now uses the new AI to return just few rows, the index is more efficient regardless of the clustering factor value.
 
However if we run a query that selects a range of values from the column X, in this example from 77 to 1200 which represents only a relatively low of 0.1% from the table.
 
ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19> select  * from demo014 where x between 77 and 1200;
 
3372 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 259284532
 
-------------------------------------------------------------------------------------
| Id  | Operation                 | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |         |  3375 | 74250 |  3324   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO014 |  3375 | 74250 |  3324   (2)| 00:00:01 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("X"<=1200 AND "X">=77)
       filter("X"<=1200 AND "X">=77)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11413  consistent gets
          0  physical reads
          0  redo size
      76870  bytes sent via SQL*Net to client
        788  bytes received via SQL*Net from client
         24  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3372  rows processed
 
 
The CBO decides to use a Full Table Scan as it deems the index with the massive Clustering Factor to be too expensive, with it having to visit differing blocks for the majority of the estimated 3300 rows
 
If we force the use of the index via the hint
 
ai_demo@PDB19> select /*+ index(demo014) */ * from demo014 where x between 77 and 1200;
 
3372 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 406969913
 
-------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |  3375 | 74250 |  3384   (1)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO014              |  3375 | 74250 |  3384   (1)|
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_dd1d6ufhks2nd |  3375 |       |     9   (0)|
-------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("X">=77 AND "X"<=1200)
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3325  consistent gets
          0  physical reads
          0  redo size
      89762  bytes sent via SQL*Net to client
        788  bytes received via SQL*Net from client
         24  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3372  rows processed
 
 
Note at an estimated cost of 3384, this is greater than the 3324 cost of the FTS which explains why the CBO decides the FTS is best. However, if we look at the number of Consistent Gets, it’s only 3325, meaning the CBO is actually getting these costs way wrong.
 
Why? Because of the grossly inflated clustering factor.  Oracle 12.1 introduced a new TABLE_CACHED_BLOCKS preference. Rather than the default value of 1, we can set this to any value up to 255. When calculating the Clustering Factor during statistics collection, it will NOT increment the Clustering Factor if the index visits a table block again that was one of the last “x” distinct table blocks visited.
 
So let’s change the TABLE_CACHED_BLOCKS preference for this table to 77 and recollect the segment statistics
 
ai_demo@PDB19> select t.num_rows,t.blocks,i.index_name,i.clustering_factor
  2  from user_tables t,
  3      user_indexes i
  4  where t.table_name ='DEMO014'
  5  and t.table_name = i.table_name;
 
  NUM_ROWS     BLOCKS INDEX_NAME                CLUSTERING_FACTOR
---------- ---------- ------------------------- -----------------
   3000000      12137 SYS_AI_dd1d6ufhks2nd                2999016
 
ai_demo@PDB19> exec dbms_stats.set_table_prefs(user,'DEMO014','TABLE_CACHED_BLOCKS',77);
 
PL/SQL procedure successfully completed.
 
ai_demo@PDB19> exec dbms_stats.gather_table_stats(user,'DEMO014');
 
PL/SQL procedure successfully completed.
 
ai_demo@PDB19> select t.num_rows,t.blocks,i.index_name,i.clustering_factor
  2  from user_tables t,
  3      user_indexes i
  4  where t.table_name ='DEMO014'
  5  and t.table_name = i.table_name;
 
  NUM_ROWS     BLOCKS INDEX_NAME                CLUSTERING_FACTOR
---------- ---------- ------------------------- -----------------
   3000000      12137 SYS_AI_dd1d6ufhks2nd                  14087
 
We can see that the clustering factor of the index got reduced and it is close to the number of blocks in the table segment.
 
If we now rerun the pervious sql again without hint in it
 
ai_demo@PDB19> set autotrace traceonly exp statistics
ai_demo@PDB19> select  * from demo014 t1 where x between 77 and 1200;
 
3372 rows selected.
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 406969913
 
-------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |  3375 | 74250 |    25   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO014              |  3375 | 74250 |    25   (0)|
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_dd1d6ufhks2nd |  3375 |       |     9   (0)|
-------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("X">=77 AND "X"<=1200)
 
 
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       3325  consistent gets
          0  physical reads
          0  redo size
      89762  bytes sent via SQL*Net to client
        788  bytes received via SQL*Net from client
         24  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3372  rows processed
 
ai_demo@PDB19> set autotrace off
 
 
we can see that the CBO now automatically uses the new AI at the new cost of just 25, it is subsequently less than the pervious index cost of 3384 and much less than the full table scan cost of 3324 and hence the index is automatically chosen by the CBO.
 
When Automatic Indexes are created, it’s usually a good idea to check on the Clustering Factor and because default ASSM tablespaces have a tendency to significantly escalate the values of index Clustering Factors, to look at recalculating them with an non-default setting of the TABLE_CACHED_BLOCKS statistics collection preference.
 
Of course, not only is this a good idea for Automatic Indexes, but for manually created indexes as well.
 
 

No comments:

Post a Comment