In the previous
blogpost we found two issues on the execution plan generated after the
automatic indexing (AI) were created.
ai_demo@ATP21C> set
autotrace traceonly exp statistics
ai_demo@ATP21C> select * from demo032 where json_value(my_json_data,'$.OBJECT_NEW_ID') = '55';
Execution Plan
----------------------------------------------------------
Plan hash value: 2321837570
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 31M| 9154 (1)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO032 | 100K| 31M| 9154 (1)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SYS_AI_fwkyzzbd70x9s | 40000 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
2 -
access(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.OBJECT_NEW_ID'
RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='55')
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
478790 consistent gets
478784 physical reads
0 redo size
960 bytes sent via SQL*Net to client
433 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
The first issue is with the estimated
cardinality in the execution plan. The plan got a cost of 9154 but most
importantly an estimated cardinality of 100K rows, but actuals were only one
row returned. The estimates are way off and so therefore are the associated
cost. This could potentially result in an inefficient plan and with index not
being used by the CBO.
The estimate of 100K rows comes from 1%
of number of rows (10M rows) in the table. The issue here is CBO has no idea
about what the expected cardinality might be, as the output from JSON expression
is effectively a black box. Oracle generates a virtual hidden column for this
purpose, to capture the column statistics that gives the CBO an accurate idea
on the selectivity of expression-based predicates.
But it we look at the column statistics after
the generation of the automatic indexes
ai_demo@ATP21C> select
column_name,virtual_column,hidden_column,num_nulls,density,num_distinct
2 from user_tab_cols
3 where table_name = 'DEMO032';
COLUMN_NAME VIRTU HIDDE NUM_NULLS
DENSITY NUM_DISTINCT
--------------- ----- ----- ---------- ---------- ------------
ID NO NO 0 1.0304E-07 9705425
CREATED_DT NO NO 0 .333333333 3
MY_JSON_DATA NO NO 0 0 0
SYS_NC00005$ YES YES
SYS_NC00004$ YES YES
We notice that two virtual columns
generated for the JSON based expression have no statistics. This is because we
have yet to collect new statistics since the creation of automatic indexes. As
a result, the CBO still has no idea on how many rows are estimated to come from
the JSON based expression and so has to resort to the 1% estimate.
The good news with Oracle autonomous
database environments, is that Oracle has the High Frequency Statistics collection
capability, which will
automatically collect these missing statistics after a small (configurable)
period of time.
So, if we wait approximately 15 mins and
checkout the column statistics again.
ai_demo@ATP21C> exec
dbms_output.put_line( dbms_stats.get_prefs('AUTO_TASK_STATUS') );
ON
PL/SQL procedure successfully
completed.
ai_demo@ATP21C> select column_name,virtual_column,hidden_column,num_nulls,density,num_distinct
2 from all_tab_cols
3 where table_name = 'DEMO032';
COLUMN_NAME VIR HID
NUM_NULLS DENSITY NUM_DISTINCT
--------------- --- --- ---------- ---------- ------------
ID NO NO 0 1.0304E-07 9705425
CREATED_DT NO NO 0 .333333333 3
MY_JSON_DATA NO NO 0 0 0
SYS_NC00005$ YES YES 0 0 9705425
SYS_NC00004$ YES YES 0 0 9973333
If we re-run the query again,
ai_demo@ATP21C> set
autotrace traceonly exp statistics
ai_demo@ATP21C> select * from demo032 where json_value(my_json_data,'$.OBJECT_NEW_ID') = '55';
Execution Plan
----------------------------------------------------------
Plan hash value: 2321837570
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 339 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO032 | 1 | 339 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SYS_AI_fwkyzzbd70x9s | 1 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
2 -
access(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.OBJECT_NEW_ID'
RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='55')
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
478790 consistent gets
478784 physical reads
0 redo size
960 bytes sent via SQL*Net to client
433 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
We can see that the CBO has now correctly
estimated that just 1 rows to be returned and the associated CBO cost has
reduced to just 4 as a result.
So if you create a function based index,
make sure the generated virtual column (whether created automatically or
manually generated before the associated index) has the necessary statistics.
ai_demo@ATP21C> select * from demo032 where json_value(my_json_data,'$.OBJECT_NEW_ID') = '55';
----------------------------------------------------------
Plan hash value: 2321837570
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 31M| 9154 (1)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO032 | 100K| 31M| 9154 (1)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SYS_AI_fwkyzzbd70x9s | 40000 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
---------------------------------------------------
RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='55')
-----
- automatic DOP: Computed Degree of Parallelism is 1
----------------------------------------------------------
0 recursive calls
0 db block gets
478790 consistent gets
478784 physical reads
0 redo size
960 bytes sent via SQL*Net to client
433 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
2 from user_tab_cols
3 where table_name = 'DEMO032';
--------------- ----- ----- ---------- ---------- ------------
ID NO NO 0 1.0304E-07 9705425
CREATED_DT NO NO 0 .333333333 3
MY_JSON_DATA NO NO 0 0 0
SYS_NC00005$ YES YES
SYS_NC00004$ YES YES
ON
ai_demo@ATP21C> select column_name,virtual_column,hidden_column,num_nulls,density,num_distinct
2 from all_tab_cols
3 where table_name = 'DEMO032';
--------------- --- --- ---------- ---------- ------------
ID NO NO 0 1.0304E-07 9705425
CREATED_DT NO NO 0 .333333333 3
MY_JSON_DATA NO NO 0 0 0
SYS_NC00005$ YES YES 0 0 9705425
SYS_NC00004$ YES YES 0 0 9973333
ai_demo@ATP21C> select * from demo032 where json_value(my_json_data,'$.OBJECT_NEW_ID') = '55';
----------------------------------------------------------
Plan hash value: 2321837570
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 339 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| DEMO032 | 1 | 339 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | SYS_AI_fwkyzzbd70x9s | 1 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
---------------------------------------------------
RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='55')
-----
- automatic DOP: Computed Degree of Parallelism is 1
----------------------------------------------------------
0 recursive calls
0 db block gets
478790 consistent gets
478784 physical reads
0 redo size
960 bytes sent via SQL*Net to client
433 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
No comments:
Post a Comment