One of the common performance issue we
encounter in relation to inefficient SQL is because of deficiency in the
default manner by which index clustering factor is calculated. When it comes to
automatic indexing (AI) and the Oracle autonomous database cloud services, the
flawed default manner by which the index clustering factor calculated is still
applies. So, we need to exercise some caution when AI are created and the
impact their default statistics can have on the performance of subsequent SQL
statements.
To illustrate with a simple example, we
will first create a table with Key columns – which will be effectively unique,
and the table will be populated via a procedure that just inserts 1M rows, the
procedure uses an ORDER sequence such that generated values are monotonically
increasing manner.
ai_demo@PDB19> create
table demo014( x number, y number, z varchar2(30) );
Table created.
ai_demo@PDB19> create
sequence demo014_seq order;
Sequence created.
ai_demo@PDB19>
ai_demo@PDB19> create or replace procedure load_data
2 as
3 begin
4 for i in 1..1000000
5 loop
6 insert into demo014(x,y,z) values(i, mod(i,100),
7 'Hello_world'||mod(i,100) );
8 commit;
9 end loop;
10 end;
11 /
Procedure created.
The procedure will be executed by three
different sessions concurrently to simulate a multi-user environment inserting
into a table.
ai_demo@PDB19> variable x
number
ai_demo@PDB19> exec dbms_job.submit(:x,' begin ai_demo.load_data; end; ');
PL/SQL procedure successfully
completed.
ai_demo@PDB19> print x
X
----------
1
ai_demo@PDB19> exec
dbms_job.submit(:x,' begin ai_demo.load_data; end; ');
PL/SQL procedure successfully
completed.
ai_demo@PDB19> print x
X
----------
2
ai_demo@PDB19> exec
dbms_job.submit(:x,' begin ai_demo.load_data; end; ');
PL/SQL procedure successfully
completed.
ai_demo@PDB19> print x
X
----------
3
ai_demo@PDB19> commit;
Commit complete.
ai_demo@PDB19>
ai_demo@PDB19> select sid,job from dba_jobs_running;
SID JOB
---------- ----------
2428 1
1654 2
1752 3
ai_demo@PDB19> select
sid,job from dba_jobs_running;
no rows selected
we will now collect the statistics on
this table
ai_demo@PDB19> exec
dbms_stats.gather_table_stats(user,'demo014');
PL/SQL procedure successfully
completed.
ai_demo@PDB19> select num_rows,blocks
2 from user_tables
3 where table_name ='DEMO014';
NUM_ROWS
BLOCKS
---------- ----------
3000000 12137
If we run a SQL few times with filter on
the column X (which is extremely unique to return just few rows from the table)
ai_demo@PDB19> set
autotrace traceonly exp statistics
ai_demo@PDB19> select * from demo014 where x = 77;
Execution Plan
----------------------------------------------------------
Plan hash value: 259284532
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 66 | 3324 (2)|
|* 1 | TABLE ACCESS STORAGE FULL| DEMO014 | 3 | 66 | 3324 (2)|
--------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
1 - storage("X"=77)
filter("X"=77)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11391 consistent gets
0
physical reads
0 redo size
521 bytes sent via SQL*Net to client
546 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
ai_demo@PDB19> set
autotrace off
the execution plan shows a Full table
scan is involved, the only choice the CBO has without an index on the column X,
clearly an index on the column X would make the plan subsequently more
efficient with just 3 rows returned from 3M rows table. Hopefully AI will come
to rescue, so let’s check out the subsequent AI report.
ai_demo@PDB19> set
linesize 1000 trimspool on trim on long 2000000 longchunksize 2000000
ai_demo@PDB19> host timeout /T 900
Waiting for 0 seconds, press a key to continue ...
ai_demo@PDB19> select
dbms_auto_index.report_activity( activity_start=> systimestamp - 2/24 )
report from dual;
REPORT
-------------------------------------------------------------------------------------------------------------------------------
------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start : 25-MAR-2022 22:07:44
Activity end : 26-MAR-2022 00:07:44
Executions completed : 8
Executions interrupted : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates : 4
Indexes created (visible / invisible) : 1 (1 / 0)
Space used (visible / invisible) : 51.38 MB (51.38 MB / 0 B)
Indexes dropped : 0
SQL statements verified : 1
SQL statements improved (improvement
factor) : 1 (11406.8x)
SQL plan baselines created : 0
Overall improvement factor : 11406.8x
-------------------------------------------------------------------------------
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes : 0
Space used : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------
INDEX DETAILS
-------------------------------------------------------------------------------
1. The following indexes were created:
-------------------------------------------------------------------------------
------------------------------------------------------------------------
| Owner | Table | Index | Key | Type | Properties |
------------------------------------------------------------------------
| AI_DEMO | DEMO014 | SYS_AI_dd1d6ufhks2nd | X | B-TREE | NONE |
------------------------------------------------------------------------
-------------------------------------------------------------------------------
So yes indeed, an AI (SYS_AI_dd1d6ufhks2nd)
was created on the column X and if we look at the default clustering factor of
this index
ai_demo@PDB19> select
t.num_rows,t.blocks,i.index_name,i.clustering_factor
2 from user_tables t,
3 user_indexes i
4 where t.table_name ='DEMO014'
5 and t.table_name = i.table_name;
NUM_ROWS
BLOCKS INDEX_NAME CLUSTERING_FACTOR
---------- ---------- ------------------------- -----------------
3000000 12137 SYS_AI_dd1d6ufhks2nd 2999016
ai_demo@PDB19> select
table_name,index_name,tablespace_name,compression,
2 status,visibility,indexing,auto
3 from user_indexes
4 where table_name = 'DEMO014'
5 order by 1,2;
TABLE_NAME INDEX_NAME TABLESPACE_NAME COMPRESSION STATUS VISIBILIT INDEXIN AUT
---------- -------------------- --------------- ------------- ------ --------- ------- ---
DEMO014 SYS_AI_dd1d6ufhks2nd TS_INDEX_DEMO ADVANCED LOW VALID VISIBLE FULL YES
ai_demo@PDB19> select
table_name,index_name,column_name,column_position
2 from user_ind_columns
3 where table_name = 'DEMO014'
4 order by 1,2,3;
TABLE_NAME INDEX_NAME COLUMN_NAM COLUMN_POSITION
---------- -------------------- ---------- ---------------
DEMO014 SYS_AI_dd1d6ufhks2nd X 1
We notice that the clustering factor is
relatively high and must closer to the number of rows in the table.
But if the
column "X" in the table has been loaded via a monotonically
increasing sequence, doesn’t that mean the column "X" values have
been inserted in approximately in column "X" order? If so, doesn’t
that mean the column "X" should have a “good” Clustering Factor” as
the order of the rows in the table matches the order of the indexed values in
the column "X" index?
Clearly not, the reason being that the table resides in the default ASSM tablespace storage. Which is designed to avoid contention by concurrent inserts from different sessions. Therefore each of the three session inserting into the table are each assigned to different table blocks, resulting in the rows not being precisely inserted in the column X order.
However by default, the clustering factor
is calculated by reading each index entry and determining if it references a
ROWID that access a table block different from a pervious index entry, if it
differs, it increments the clustering factor, if it does not differ and access
the same table block as the pervious index entry, then clustering factor is not
incremented.
So in theory we could have 100 rows that reside in just two different table
blocks, will all odd values into one block and all even values into another
block, then very well we will end up with a clustering factor would have a
value of 100 for these 100 rows.
If we run the same SQL as previously
which only select few values from the column X
ai_demo@PDB19> set
autotrace traceonly exp statistics
ai_demo@PDB19> select * from demo014 where x = 77;
Execution Plan
----------------------------------------------------------
Plan hash value: 406969913
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 66 | 6 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO014 | 3 | 66 | 6 (0)|
|* 2 | INDEX RANGE SCAN | SYS_AI_dd1d6ufhks2nd | 3 | | 3 (0)|
-------------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
2 - access("X"=77)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
549 bytes sent via SQL*Net to client
546 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
ai_demo@PDB19> set
autotrace off
the CBO now uses the new AI to return
just few rows, the index is more efficient regardless of the clustering factor
value.
However if we run a query that selects a
range of values from the column X, in this example from 77 to 1200 which
represents only a relatively low of 0.1% from the table.
ai_demo@PDB19> set
autotrace traceonly exp statistics
ai_demo@PDB19> select * from demo014 where x between 77 and 1200;
3372 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 259284532
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3375 | 74250 | 3324 (2)| 00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| DEMO014 | 3375 | 74250 | 3324 (2)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
1 - storage("X"<=1200 AND
"X">=77)
filter("X"<=1200 AND "X">=77)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
11413 consistent gets
0 physical reads
0 redo size
76870 bytes sent via SQL*Net to client
788 bytes received via SQL*Net from client
24 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3372 rows processed
The CBO decides to use a Full Table Scan
as it deems the index with the massive Clustering Factor to be too expensive,
with it having to visit differing blocks for the majority of the estimated 3300
rows
If we force the use of the index via the
hint
ai_demo@PDB19> select /*+ index(demo014) */ * from demo014 where x between
77 and 1200;
3372 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 406969913
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3375 | 74250 | 3384 (1)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO014 | 3375 | 74250 | 3384 (1)|
|* 2 | INDEX RANGE SCAN | SYS_AI_dd1d6ufhks2nd | 3375 | | 9 (0)|
-------------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
2 - access("X">=77 AND
"X"<=1200)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3325 consistent gets
0
physical reads
0 redo size
89762 bytes sent via SQL*Net to client
788 bytes received via SQL*Net from client
24 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3372 rows processed
Note at an estimated cost of 3384, this
is greater than the 3324 cost of the FTS which explains why the CBO decides the
FTS is best. However, if we look at the number of Consistent Gets, it’s only
3325, meaning the CBO is actually getting these costs way wrong.
Why? Because of the grossly inflated
clustering factor. Oracle 12.1
introduced a new TABLE_CACHED_BLOCKS preference. Rather than the default value
of 1, we can set this to any value up to 255. When calculating the Clustering
Factor during statistics collection, it will NOT increment the Clustering
Factor if the index visits a table block again that was one of the last “x”
distinct table blocks visited.
So let’s change the TABLE_CACHED_BLOCKS
preference for this table to 77 and recollect the segment statistics
ai_demo@PDB19> select
t.num_rows,t.blocks,i.index_name,i.clustering_factor
2 from user_tables t,
3 user_indexes i
4 where t.table_name ='DEMO014'
5 and t.table_name = i.table_name;
NUM_ROWS
BLOCKS INDEX_NAME CLUSTERING_FACTOR
---------- ---------- ------------------------- -----------------
3000000 12137 SYS_AI_dd1d6ufhks2nd 2999016
ai_demo@PDB19> exec
dbms_stats.set_table_prefs(user,'DEMO014','TABLE_CACHED_BLOCKS',77);
PL/SQL procedure successfully
completed.
ai_demo@PDB19> exec
dbms_stats.gather_table_stats(user,'DEMO014');
PL/SQL procedure successfully
completed.
ai_demo@PDB19> select
t.num_rows,t.blocks,i.index_name,i.clustering_factor
2 from user_tables t,
3 user_indexes i
4 where t.table_name ='DEMO014'
5 and t.table_name = i.table_name;
NUM_ROWS
BLOCKS INDEX_NAME CLUSTERING_FACTOR
---------- ---------- ------------------------- -----------------
3000000 12137 SYS_AI_dd1d6ufhks2nd 14087
We can see that the clustering factor of
the index got reduced and it is close to the number of blocks in the table
segment.
If we now rerun the pervious sql again
without hint in it
ai_demo@PDB19> set
autotrace traceonly exp statistics
ai_demo@PDB19> select * from demo014 t1 where x between 77 and 1200;
3372 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 406969913
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3375 | 74250 | 25 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO014 | 3375 | 74250 | 25 (0)|
|* 2 | INDEX RANGE SCAN | SYS_AI_dd1d6ufhks2nd | 3375 | | 9 (0)|
-------------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
2 - access("X">=77 AND
"X"<=1200)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3325 consistent gets
0
physical reads
0 redo size
89762 bytes sent via SQL*Net to client
788 bytes received via SQL*Net from client
24 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3372 rows processed
ai_demo@PDB19> set
autotrace off
we can see that the CBO now automatically
uses the new AI at the new cost of just 25, it is subsequently less than the
pervious index cost of 3384 and much less than the full table scan cost of 3324
and hence the index is automatically chosen by the CBO.
When Automatic Indexes are created, it’s
usually a good idea to check on the Clustering Factor and because default ASSM
tablespaces have a tendency to significantly escalate the values of index Clustering
Factors, to look at recalculating them with an non-default setting of the
TABLE_CACHED_BLOCKS statistics collection preference.
Of course, not only is this a good idea
for Automatic Indexes, but for manually created indexes as well.
ai_demo@PDB19> create or replace procedure load_data
2 as
3 begin
4 for i in 1..1000000
5 loop
6 insert into demo014(x,y,z) values(i, mod(i,100),
7 'Hello_world'||mod(i,100) );
8 commit;
9 end loop;
10 end;
11 /
ai_demo@PDB19> exec dbms_job.submit(:x,' begin ai_demo.load_data; end; ');
----------
1
----------
2
----------
3
ai_demo@PDB19> select sid,job from dba_jobs_running;
---------- ----------
2428 1
1654 2
1752 3
2 from user_tables
3 where table_name ='DEMO014';
---------- ----------
3000000 12137
ai_demo@PDB19> select * from demo014 where x = 77;
----------------------------------------------------------
Plan hash value: 259284532
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 66 | 3324 (2)|
|* 1 | TABLE ACCESS STORAGE FULL| DEMO014 | 3 | 66 | 3324 (2)|
---------------------------------------------------
filter("X"=77)
----------------------------------------------------------
0 recursive calls
0 db block gets
11391 consistent gets
0 redo size
521 bytes sent via SQL*Net to client
546 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
ai_demo@PDB19> host timeout /T 900
-------------------------------------------------------------------------------------------------------------------------------
------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start : 25-MAR-2022 22:07:44
Activity end : 26-MAR-2022 00:07:44
Executions completed : 8
Executions interrupted : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Index candidates : 4
Indexes created (visible / invisible) : 1 (1 / 0)
Indexes dropped : 0
SQL statements verified : 1
SQL plan baselines created : 0
Overall improvement factor : 11406.8x
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Unused indexes : 0
Space used : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
1. The following indexes were created:
-------------------------------------------------------------------------------
------------------------------------------------------------------------
| Owner | Table | Index | Key | Type | Properties |
------------------------------------------------------------------------
| AI_DEMO | DEMO014 | SYS_AI_dd1d6ufhks2nd | X | B-TREE | NONE |
------------------------------------------------------------------------
-------------------------------------------------------------------------------
2 from user_tables t,
3 user_indexes i
4 where t.table_name ='DEMO014'
5 and t.table_name = i.table_name;
---------- ---------- ------------------------- -----------------
3000000 12137 SYS_AI_dd1d6ufhks2nd 2999016
2 status,visibility,indexing,auto
3 from user_indexes
4 where table_name = 'DEMO014'
5 order by 1,2;
---------- -------------------- --------------- ------------- ------ --------- ------- ---
DEMO014 SYS_AI_dd1d6ufhks2nd TS_INDEX_DEMO ADVANCED LOW VALID VISIBLE FULL YES
2 from user_ind_columns
3 where table_name = 'DEMO014'
4 order by 1,2,3;
---------- -------------------- ---------- ---------------
DEMO014 SYS_AI_dd1d6ufhks2nd X 1
Clearly not, the reason being that the table resides in the default ASSM tablespace storage. Which is designed to avoid contention by concurrent inserts from different sessions. Therefore each of the three session inserting into the table are each assigned to different table blocks, resulting in the rows not being precisely inserted in the column X order.
ai_demo@PDB19> select * from demo014 where x = 77;
----------------------------------------------------------
Plan hash value: 406969913
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 66 | 6 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO014 | 3 | 66 | 6 (0)|
|* 2 | INDEX RANGE SCAN | SYS_AI_dd1d6ufhks2nd | 3 | | 3 (0)|
-------------------------------------------------------------------------------------------------
---------------------------------------------------
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
549 bytes sent via SQL*Net to client
546 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3 rows processed
ai_demo@PDB19> select * from demo014 where x between 77 and 1200;
----------------------------------------------------------
Plan hash value: 259284532
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3375 | 74250 | 3324 (2)| 00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| DEMO014 | 3375 | 74250 | 3324 (2)| 00:00:01 |
-------------------------------------------------------------------------------------
---------------------------------------------------
filter("X"<=1200 AND "X">=77)
----------------------------------------------------------
0 recursive calls
0 db block gets
11413 consistent gets
0 physical reads
0 redo size
76870 bytes sent via SQL*Net to client
788 bytes received via SQL*Net from client
24 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3372 rows processed
----------------------------------------------------------
Plan hash value: 406969913
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3375 | 74250 | 3384 (1)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO014 | 3375 | 74250 | 3384 (1)|
|* 2 | INDEX RANGE SCAN | SYS_AI_dd1d6ufhks2nd | 3375 | | 9 (0)|
-------------------------------------------------------------------------------------------------
---------------------------------------------------
----------------------------------------------------------
0 recursive calls
0 db block gets
3325 consistent gets
0 redo size
89762 bytes sent via SQL*Net to client
788 bytes received via SQL*Net from client
24 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3372 rows processed
2 from user_tables t,
3 user_indexes i
4 where t.table_name ='DEMO014'
5 and t.table_name = i.table_name;
---------- ---------- ------------------------- -----------------
3000000 12137 SYS_AI_dd1d6ufhks2nd 2999016
2 from user_tables t,
3 user_indexes i
4 where t.table_name ='DEMO014'
5 and t.table_name = i.table_name;
---------- ---------- ------------------------- -----------------
3000000 12137 SYS_AI_dd1d6ufhks2nd 14087
ai_demo@PDB19> select * from demo014 t1 where x between 77 and 1200;
----------------------------------------------------------
Plan hash value: 406969913
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3375 | 74250 | 25 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO014 | 3375 | 74250 | 25 (0)|
|* 2 | INDEX RANGE SCAN | SYS_AI_dd1d6ufhks2nd | 3375 | | 9 (0)|
-------------------------------------------------------------------------------------------------
---------------------------------------------------
----------------------------------------------------------
1 recursive calls
0 db block gets
3325 consistent gets
0 redo size
89762 bytes sent via SQL*Net to client
788 bytes received via SQL*Net from client
24 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3372 rows processed