Index path
subsetting is one of the most practical and often underutilized features of the
JSON search index. Instead of indexing every possible field inside a JSON
document, path subsetting allows us to explicitly decide which paths should be
indexed and which should be ignored. Any field that is excluded from indexing
simply does not contribute to the index structure.
The immediate
benefit is obvious: by filtering out irrelevant paths, we reduce the amount of
data that needs to be indexed. This leads to a smaller index footprint, faster
index creation time, and potentially better maintenance characteristics. In
other words, we index only what truly matters to our workload.
Consider the
following example:
demo@ADB26AI> create
search index my_demo_idx on my_demo(c2)
2 for json parameters(' search_on
3 VALUE(number) include ($.products.id)
4 TEXT include ($.products.str) ');
Index created.
This definition is
very intentional. It tells the database that:
Nothing else inside
the JSON document is indexed. All other paths are ignored.
Internally, if $.products.id evaluates to a value like 154980, that value is stored as a numeric token for range-based navigation. Similarly, if $.products.str evaluates to a string such as ABC, it is indexed as a text token for full-text search operations. The index stores only what we asked it to store — no more, no less.
The result of this
focused indexing strategy is immediately visible when we check the size:
demo@ADB26AI> select
round(sum(bytes)/1024/1024,2) size_mb
2 from user_segments
3 where segment_name like '%MY_DEMO_IDX%';
SIZE_MB
----------
852.06
In this case, the
index size is around 852 MB, which is significantly smaller than the broader
indexing flavors we examined earlier. This reduction comes purely from
eliminating unnecessary paths.
With this index in
place, a numeric range-based predicate on $.products.id works efficiently.
demo@ADB26AI> select *
2 from my_demo
3 where json_exists( c2, '$.products[*].id?(@.number() > 15)' );
Execution Plan
----------------------------------------------------------
Plan hash value: 2005405343
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 18200 | 488 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 25 | 18200 | 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
1 - filter(JSON_EXISTS2("C2" /*+
LOB_BY_VALUE */ FORMAT OSON ,
'$.products[*].id?(@.number() > 15)' /* json_path_str $.products[*].id?(@.number() >
15) */ FALSE ON ERROR TYPE(LAX) )=1)
3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
,'(sdata(FNUM_6AA7829B12A1ECA7231B4272D111B8B4_id > 15 ))')>0)
For example, an
open-ended condition such as id > 15 pushes both the JSON path and the numeric
boundary into the search index. The execution plan confirms that the DOMAIN
INDEX is used, and only matching rowids are returned. The table is accessed
only for those specific rows.
Similarly, full-text
search on $.products.str works seamlessly. When using JSON_TEXTCONTAINS, the
optimizer probes the index with both the path expression and the search term.
The index performs the filtering and returns only relevant rowids, ensuring
optimal performance.
demo@ADB26AI> select *
2 from my_demo
3 where json_textcontains( c2, '$.products.str ','ljfycrixvcehvmpmldubibaur');
Execution Plan
----------------------------------------------------------
Plan hash value: 2005405343
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2500 | 1777K| 488 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 2500 | 1777K| 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2"
/*+ LOB_BY_VALUE */
,'(ljfycrixvcehvmpmldubibaur)
INPATH (/products/str)')>0)
However, this
configuration also clearly defines its boundaries.
Because $.products.str
is indexed using the TEXT option, it supports full-text search and text-based
filtering operations, but not structured range-based string predicates. If we
attempt to run a query like:
demo@ADB26AI> select *
2 from my_demo nested c2.products[*]
3 columns ( x1 varchar2(40) path '$.str')
4 where x1 between 'A' and 'C' ;
Execution Plan
----------------------------------------------------------
Plan hash value: 1866419551
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 625M| 424G| 25M (19)| 00:16:31 |
| 1 | RESULT CACHE | 7fwrahdm15gh9fv7shatsu691k | 625M| 424G| 25M (19)| 00:16:31 |
| 2 | NESTED LOOPS | | 625M| 424G| 25M (19)| 00:16:31 |
|* 3 | TABLE ACCESS FULL | MY_DEMO | 50000 | 34M| 738 (40)| 00:00:01 |
|* 4 | JSONTABLE EVALUATION | | 12500 | 25000 | 507 (18)| 00:00:01 |
------------------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
3 -
filter(JSON_EXISTS2("MY_DEMO"."C2" /*+ LOB_BY_VALUE */ FORMAT OSON ,
'$?(@.products[*].str<="C" && @.products[*].str>="A")' /* json_path_str
$?((@.products[*].str.string() <= "C") && (@.products[*].str.string() >= "A")) */ FALSE ON
ERROR TYPE(LAX) )=1)
4 - filter("P"."X1">='A' AND "P"."X1"<='C')
on the
$.products.str path, the optimizer may skip the search index entirely. The
execution plan in such cases often shows a full table scan, because range-based
navigation for VARCHAR2 was not enabled under this configuration. The index was
never designed to support that type of predicate.
This behavior is not
a limitation of JSON search indexing itself, but rather a direct consequence of
the deliberate path subsetting and datatype configuration we chose during index
creation.
Another
powerful aspect of path subsetting is that multiple indexed paths can be
combined in a single query. For example, we can combine a full-text search on $.products.str
with a numeric range predicate on $.products.id. In such cases, the optimizer
can merge both conditions into a single index probe. The execution plan shows
that both predicates are pushed into the search index navigation step, and only
matching rowids are returned.
demo@ADB26AI> select *
2 from my_demo
3 where json_textcontains( c2, '$.products.str ','ljfycrixvcehvmpmldubibaur')
4 and json_exists( c2, '$.products[*].id?(@.number() > 15)' );
Execution Plan
----------------------------------------------------------
Plan hash value: 2005405343
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 18200 | 488 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 25 | 18200 | 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
1 - filter(JSON_EXISTS2("C2" /*+
LOB_BY_VALUE */ FORMAT OSON ,
'$.products[*].id?(@.number() > 15)' /* json_path_str $.products[*].id?(@.number() >
15) */ FALSE ON ERROR TYPE(LAX) )=1)
3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
,'(ljfycrixvcehvmpmldubibaur)
INPATH (/products/str) and
sdata(FNUM_6AA7829B12A1ECA7231B4272D111B8B4_id > 15 )')>0)
This demonstrates
the real strength of path subsetting: precision. We design the index to match
the workload, not the other way around.
In summary, index
path subsetting gives us fine-grained control over JSON indexing. By explicitly
including only relevant paths and binding them to specific search behaviours
(VALUE or TEXT), we can significantly reduce index size while maintaining
excellent performance for targeted queries. It is an ideal strategy when query
patterns are well understood and we want to optimize storage and efficiency
without sacrificing search capability.
2 for json parameters(' search_on
3 VALUE(number) include ($.products.id)
4 TEXT include ($.products.str) ');
- The path $.products.id should be indexed for numeric value-based searches, including range predicates.
- The path $.products.str should be indexed for full-text search and string equality-style operations.
Internally, if $.products.id evaluates to a value like 154980, that value is stored as a numeric token for range-based navigation. Similarly, if $.products.str evaluates to a string such as ABC, it is indexed as a text token for full-text search operations. The index stores only what we asked it to store — no more, no less.
2 from user_segments
3 where segment_name like '%MY_DEMO_IDX%';
----------
852.06
2 from my_demo
3 where json_exists( c2, '$.products[*].id?(@.number() > 15)' );
----------------------------------------------------------
Plan hash value: 2005405343
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 18200 | 488 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 25 | 18200 | 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
---------------------------------------------------
'$.products[*].id?(@.number() > 15)' /* json_path_str $.products[*].id?(@.number() >
15) */ FALSE ON ERROR TYPE(LAX) )=1)
3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
2 from my_demo
3 where json_textcontains( c2, '$.products.str ','ljfycrixvcehvmpmldubibaur');
----------------------------------------------------------
Plan hash value: 2005405343
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2500 | 1777K| 488 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 2500 | 1777K| 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
---------------------------------------------------
2 from my_demo nested c2.products[*]
3 columns ( x1 varchar2(40) path '$.str')
4 where x1 between 'A' and 'C' ;
----------------------------------------------------------
Plan hash value: 1866419551
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 625M| 424G| 25M (19)| 00:16:31 |
| 1 | RESULT CACHE | 7fwrahdm15gh9fv7shatsu691k | 625M| 424G| 25M (19)| 00:16:31 |
| 2 | NESTED LOOPS | | 625M| 424G| 25M (19)| 00:16:31 |
|* 3 | TABLE ACCESS FULL | MY_DEMO | 50000 | 34M| 738 (40)| 00:00:01 |
|* 4 | JSONTABLE EVALUATION | | 12500 | 25000 | 507 (18)| 00:00:01 |
------------------------------------------------------------------------------------------------------
---------------------------------------------------
'$?(@.products[*].str<="C" && @.products[*].str>="A")' /* json_path_str
$?((@.products[*].str.string() <= "C") && (@.products[*].str.string() >= "A")) */ FALSE ON
ERROR TYPE(LAX) )=1)
4 - filter("P"."X1">='A' AND "P"."X1"<='C')
2 from my_demo
3 where json_textcontains( c2, '$.products.str ','ljfycrixvcehvmpmldubibaur')
4 and json_exists( c2, '$.products[*].id?(@.number() > 15)' );
----------------------------------------------------------
Plan hash value: 2005405343
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 18200 | 488 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 25 | 18200 | 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
---------------------------------------------------
'$.products[*].id?(@.number() > 15)' /* json_path_str $.products[*].id?(@.number() >
15) */ FALSE ON ERROR TYPE(LAX) )=1)
3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
sdata(FNUM_6AA7829B12A1ECA7231B4272D111B8B4_id > 15 )')>0)
No comments:
Post a Comment