Wednesday, August 10, 2022

Automatic indexing - indexing JSON - Part I

One of many potential capabilities of Oracle database 21c was the ability of auto indexing (AI) of JSON expressions. Will start by creating a simple table that uses the new JSON datatype (introduced in Oracle 21c – the following example was run from Oracle Autonomous database cloud service having Oracle 21c(21.3) ) with some JSON documents loaded into it.
 
ai_demo@ATP21C> create table demo032
  2  nologging as
  3  select rownum as id,
  4    a.created as created_dt,
  5    json_object(
  6    'OWNER' value a.OWNER ,
  7    'OBJECT_NAME' value a.OBJECT_NAME ,
  8    'SUBOBJECT_NAME' value a.SUBOBJECT_NAME ,
  9    'OBJECT_ID' value a.OBJECT_ID ,
 10    'OBJECT_NEW_ID' value rownum ,
 11    'DATA_OBJECT_ID' value a.DATA_OBJECT_ID ,
 12    'OBJECT_TYPE' value a.OBJECT_TYPE ,
 13    'CREATED' value a.CREATED ,
 14    'LAST_DDL_TIME' value a.LAST_DDL_TIME ,
 15    'TIMESTAMP' value a.TIMESTAMP ,
 16    'STATUS' value a.STATUS returning JSON ) my_json_data
 17  from stage a, stage b
 18  where rownum <= 10000000 ;
 
Table created.
 
Then I run a few SQL statements with different JSON expression-based predicates, like this
 
ai_demo@ATP21C> set autotrace traceonly exp statistics
ai_demo@ATP21C> select * from demo032 where json_value(my_json_data,'$.OBJECT_NEW_ID') = '55';
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3377958815
 
-------------------------------------------------------------------------------------
| Id  | Operation                 | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |         |   100K|    31M|   130K  (1)| 00:00:06 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO032 |   100K|    31M|   130K  (1)| 00:00:06 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT OSON ,
              '$.OBJECT_NEW_ID' RETURNING VARCHAR2(4000) NULL ON ERROR)='55')
       filter(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT OSON ,
              '$.OBJECT_NEW_ID' RETURNING VARCHAR2(4000) NULL ON ERROR)='55')
 
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     478790  consistent gets
     478784  physical reads
          0  redo size
        960  bytes sent via SQL*Net to client
        433  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
ai_demo@ATP21C> select * from demo032 where json_value(my_json_data,'$.OBJECT_NEW_ID' returning number) = 55;
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3377958815
 
-------------------------------------------------------------------------------------
| Id  | Operation                 | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |         |   100K|    31M|   130K  (1)| 00:00:06 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO032 |   100K|    31M|   130K  (1)| 00:00:06 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT OSON ,
              '$.OBJECT_NEW_ID' RETURNING NUMBER NULL ON ERROR)=55)
       filter(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT OSON ,
              '$.OBJECT_NEW_ID' RETURNING NUMBER NULL ON ERROR)=55)
 
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     478790  consistent gets
     478784  physical reads
          0  redo size
        960  bytes sent via SQL*Net to client
        433  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
ai_demo@ATP21C> select * from demo032 t1 where t1.my_json_data.OBJECT_NEW_ID.number() = 55;
 
 
Execution Plan
----------------------------------------------------------
Plan hash value: 3377958815
 
-------------------------------------------------------------------------------------
| Id  | Operation                 | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |         |   100K|    31M|   130K  (1)| 00:00:06 |
|*  1 |  TABLE ACCESS STORAGE FULL| DEMO032 |   100K|    31M|   130K  (1)| 00:00:06 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage(JSON_VALUE("T1"."MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT
              OSON , '$.OBJECT_NEW_ID.number()' RETURNING NUMBER NULL ON ERROR)=55)
       filter(JSON_VALUE("T1"."MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT
              OSON , '$.OBJECT_NEW_ID.number()' RETURNING NUMBER NULL ON ERROR)=55)
 
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
 
 
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     478790  consistent gets
     478784  physical reads
          0  redo size
        960  bytes sent via SQL*Net to client
        433  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
 
ai_demo@ATP21C> set autotrace off
 
they all return just one row, but most currently use full table scan with no indexes present.
 
So, what does AI make of things?
 
The first thing to note is that running the AI report generates the following error.
 
ai_demo@ATP21C> set echo on linesize 1000 trimspool on trim on long 2000000 longchunksize 2000000
ai_demo@ATP21C> col report for a180
ai_demo@ATP21C> select dbms_auto_index.report_activity(systimestamp -1/24, systimestamp,'TEXT','ALL','ALL') report from dual;
ERROR:
ORA-30954: char 0 is invalid in json_value(MY_JSON_DATA, '$.OBJECT_NEW_ID' returni
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 107
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8686
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 8686
ORA-06512: at "SYS.DBMS_AUTO_INDEX_INTERNAL", line 9236
ORA-06512: at "SYS.DBMS_AUTO_INDEX", line 89
ORA-06512: at line 1
 
 
 
no rows selected
 
If we look at the indexes now present in the table
 
 
ai_demo@ATP21C> select table_name,index_name,index_type,indexing,auto,
  2          status,visibility
  3  from all_indexes
  4  where table_name = 'DEMO032'
  5  and owner ='AI_DEMO';
 
TABLE_NAME INDEX_NAME                INDEX_TYPE                  INDEXIN AUT STATUS   VISIBILIT
---------- ------------------------- --------------------------- ------- --- -------- ---------
DEMO032    SYS_AI_fwkyzzbd70x9s      FUNCTION-BASED NORMAL       FULL    YES VALID    VISIBLE 
DEMO032    SYS_AI_17jynda4v9sxu      FUNCTION-BASED NORMAL       FULL    YES VALID    VISIBLE 
 
ai_demo@ATP21C> select table_name,index_name,column_name,column_position
  2  from user_ind_columns
  3  where table_name = 'DEMO032'
  4  order by 1,2;
 
TABLE_NAME INDEX_NAME                COLUMN_NAME     COLUMN_POSITION
---------- ------------------------- --------------- ---------------
DEMO032    SYS_AI_17jynda4v9sxu      SYS_NC00005$                  1
DEMO032    SYS_AI_fwkyzzbd70x9s      SYS_NC00004$                  1
 
ai_demo@ATP21C> col column_expression for a60
ai_demo@ATP21C> select index_name,column_expression
  2  from user_ind_expressions
  3  where table_name = 'DEMO032';
 
INDEX_NAME                COLUMN_EXPRESSION
------------------------- ------------------------------------------------------------
SYS_AI_17jynda4v9sxu      JSON_VALUE("MY_JSON_DATA" FORMAT OSON , '$.OBJECT_NEW_ID' RE
                          TURNING NUMBER ERROR ON ERROR NULL ON EMPTY)
 
SYS_AI_fwkyzzbd70x9s      JSON_VALUE("MY_JSON_DATA" FORMAT OSON , '$.OBJECT_NEW_ID' RE
                          TURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)
 
 
We can see that AI has indeed created two new automatic indexes, one on the varchar2 expression and other on numeric expression. If we re-run the sql’s we notice few important points, note the following was run after the automatic indexes were created.
 
ai_demo@ATP21C> set autotrace traceonly exp
ai_demo@ATP21C> select * from demo032 where json_value(my_json_data,'$.OBJECT_NEW_ID') = '55';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2321837570
 
------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   100K|    31M|  9154   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO032              |   100K|    31M|  9154   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_fwkyzzbd70x9s | 40000 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.OBJECT_NEW_ID'
              RETURNING VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='55')
 
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
 
ai_demo@ATP21C> select * from demo032 where json_value(my_json_data,'$.OBJECT_NEW_ID' returning number) = 55;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2305263965
 
------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   100K|    31M|  1920   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO032              |   100K|    31M|  1920   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_17jynda4v9sxu | 40000 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.OBJECT_NEW_ID'
              RETURNING NUMBER ERROR ON ERROR NULL ON EMPTY)=55)
 
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
 
ai_demo@ATP21C> select * from demo032 t1 where t1.my_json_data.OBJECT_NEW_ID.number() = 55;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 2305263965
 
------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   100K|    31M|  1920   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| DEMO032              |   100K|    31M|  1920   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SYS_AI_17jynda4v9sxu | 40000 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access(JSON_VALUE("MY_JSON_DATA" /*+ LOB_BY_VALUE */  FORMAT OSON , '$.OBJECT_NEW_ID'
              RETURNING NUMBER ERROR ON ERROR NULL ON EMPTY)=55)
 
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1
 
ai_demo@ATP21C> set autotrace off
 
 
the first point to note is that the CBO now uses the newly created AI, as only one row is returned. In the next few blogposts we will see about the anomalies from the above plan. 

No comments:

Post a Comment