Thursday, August 18, 2022

Automatic indexing - indexing JSON - Part II

In the previous blogpost we found two issues on the execution plan generated after the automatic indexing (AI) were created.
 
ai_demo@ATP21C> set autotrace traceonly exp statistics
ai_demo@ATP21C> select * from demo032 where json_value(my_json_data,'$.OBJECT_NEW_ID') = '55';
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2321837570
 
------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   100K|    31M|  9154   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO032              |   100K|    31M|  9154   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_fwkyzzbd70x9s | 40000 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.OBJECT_NEW_ID'
              RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='55')
 
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     478790  consistent gets
     478784  physical reads
          0  redo size
        960  bytes sent via SQL*Net to client
        433  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
 
The first issue is with the estimated cardinality in the execution plan. The plan got a cost of 9154 but most importantly an estimated cardinality of 100K rows, but actuals were only one row returned. The estimates are way off and so therefore are the associated cost. This could potentially result in an inefficient plan and with index not being used by the CBO.
 
The estimate of 100K rows comes from 1% of number of rows (10M rows) in the table. The issue here is CBO has no idea about what the expected cardinality might be, as the output from JSON expression is effectively a black box. Oracle generates a virtual hidden column for this purpose, to capture the column statistics that gives the CBO an accurate idea on the selectivity of expression-based predicates.
 
But it we look at the column statistics after the generation of the automatic indexes
 
ai_demo@ATP21C> select column_name,virtual_column,hidden_column,num_nulls,density,num_distinct
  2  from user_tab_cols
  3  where table_name = 'DEMO032';
 
COLUMN_NAME     VIRTU HIDDE  NUM_NULLS    DENSITY NUM_DISTINCT
--------------- ----- ----- ---------- ---------- ------------
ID              NO    NO             0 1.0304E-07      9705425
CREATED_DT      NO    NO             0 .333333333            3
MY_JSON_DATA    NO    NO             0          0            0
SYS_NC00005$    YES   YES
SYS_NC00004$    YES   YES
 
 
We notice that two virtual columns generated for the JSON based expression have no statistics. This is because we have yet to collect new statistics since the creation of automatic indexes. As a result, the CBO still has no idea on how many rows are estimated to come from the JSON based expression and so has to resort to the 1% estimate.
 
The good news with Oracle autonomous database environments, is that Oracle has the High Frequency Statistics collection capability, which will automatically collect these missing statistics after a small (configurable) period of time. 
 
So, if we wait approximately 15 mins and checkout the column statistics again.
 
ai_demo@ATP21C> exec dbms_output.put_line( dbms_stats.get_prefs('AUTO_TASK_STATUS') );
ON
 
PL/SQL procedure successfully completed.
ai_demo@ATP21C> select column_name,virtual_column,hidden_column,num_nulls,density,num_distinct
  2  from all_tab_cols
  3  where table_name = 'DEMO032';
 
COLUMN_NAME     VIR HID  NUM_NULLS    DENSITY NUM_DISTINCT
--------------- --- --- ---------- ---------- ------------
ID              NO  NO           0 1.0304E-07      9705425
CREATED_DT      NO  NO           0 .333333333            3
MY_JSON_DATA    NO  NO           0          0            0
SYS_NC00005$    YES YES          0          0      9705425
SYS_NC00004$    YES YES          0          0      9973333
 
 
If we re-run the query again,
 
ai_demo@ATP21C> set autotrace traceonly exp statistics
ai_demo@ATP21C> select * from demo032 where json_value(my_json_data,'$.OBJECT_NEW_ID') = '55';
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2321837570
 
------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |     1 |   339 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO032              |     1 |   339 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_fwkyzzbd70x9s |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.OBJECT_NEW_ID'
              RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='55')
 
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     478790  consistent gets
     478784  physical reads
          0  redo size
        960  bytes sent via SQL*Net to client
        433  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
 
We can see that the CBO has now correctly estimated that just 1 rows to be returned and the associated CBO cost has reduced to just 4 as a result.
 
So if you create a function based index, make sure the generated virtual column (whether created automatically or manually generated before the associated index) has the necessary statistics.
 

No comments:

Post a Comment