If you work with JSON data in
Oracle, you’ve probably come across the JSON Search Index. It’s one of those
features that looks simple on the surface but behaves very differently
depending on how it is configured. Understanding these differences can make a huge
impact on query performance, especially when working with large JSON datasets.
In
this post, I want to walk through how the JSON search index behaves when
created with SEARCH_ON = TEXT, using practical examples and
execution plan observations. This is not just about syntax or documentation;
the goal is to understand what Oracle actually does internally when queries
run.
The table and data setup used here
are the same as in the previous
blog of this series.
To begin with, let’s create a JSON
search index configured for text search:
demo@ADB26AI> create
search index my_demo_idx
2 on my_demo(c2)
3 for json parameters(' search_on TEXT ');
Index created.
Setting SEARCH_ON as NONE Does not enable any
indexing features, which indicates that the tables used for full-text and range
searches are not populated. Only the index data guide is maintained. The index
will not be used by any JSON query operators, including JSON_TEXTCONTAINS.
When we set SEARCH_ON
= TEXT, Oracle indexes only the textual
content inside JSON documents. This configuration is primarily designed to
accelerate full-text search queries and text-based filtering operations. It
works very well for searching words, phrases, or text values inside JSON
structures, but it is important to understand its boundaries before using it in
production systems.
A
common misconception is that the search index behaves like a general-purpose
index for all JSON queries. Its behaviour is very specific to the type of data
being indexed and the type of query being executed.
Let’s
start with a straightforward full-text search example using JSON_TEXTCONTAINS.
demo@ADB26AI> select *
2 from my_demo
3 where json_textcontains( c2, '$.products.str','ljfycrixvcehvmpmldubibaur') ;
Execution Plan
----------------------------------------------------------
Plan hash value: 2005405343
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2500 | 1777K| 488 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 2500 | 1777K| 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
3 -
access("CTXSYS"."CONTAINS"("MY_DEMO"."C2"
/*+ LOB_BY_VALUE */
,'(ljfycrixvcehvmpmldubibaur)
INPATH (/products/str)')>0)
When we examine the execution plan
for this query, we can see that Oracle uses the search index through a domain
index access. In the predicate section of the plan, Oracle shows a CONTAINS operation
where both the search value and the JSON path expression are used to navigate
the index structure.
Internally,
Oracle looks into the search index and directly navigates to the specified JSON
path. It then searches for the given text value within that path. If a match is
found, the index returns the corresponding rowids, which Oracle then uses to
fetch the rows from the table. If no match is found, those rows are never
accessed. This makes the operation extremely efficient and avoids unnecessary
table scans.
The
same efficiency can be observed when we rewrite the query using JSON_EXISTS.
demo@ADB26AI> select *
2 from my_demo
3 where json_exists( c2 ,'$.products[*]?(@.str == "ljfycrixvcehvmpmldubibaur")');
Execution Plan
----------------------------------------------------------
Plan hash value: 2005405343
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 18200 | 488 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 25 | 18200 | 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
1 - filter(JSON_EXISTS2("C2" /*+
LOB_BY_VALUE */ FORMAT OSON ,
'$.products[*]?(@.str == "ljfycrixvcehvmpmldubibaur")' /* json_path_str
$.products[*]?(@.str.string() == "ljfycrixvcehvmpmldubibaur") */ FALSE ON ERROR
TYPE(LAX) )=1)
3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
,'({ljfycrixvcehvmpmldubibaur}
INPATH (/products/str))')>0)
Even though the query is
written differently, the execution plan still shows that the search index is
being used effectively. The predicate information reveals that Oracle again
uses both the JSON path and the search value to navigate the index. This
confirms that the TEXT search index is not limited to JSON_TEXTCONTAINS; it can also support other
JSON operators when the filtering is based on textual values.
The behaviour remains
consistent when we use JSON_TABLE to project JSON values into
relational columns and apply filters on them.
demo@ADB26AI> select *
2 from my_demo, json_table( c2 ,'$.products[*]'
3 columns ( x1 varchar2(40) path '$.str') ) jt
4 where jt.x1 = 'ljfycrixvcehvmpmldubibaur' ;
Execution Plan
----------------------------------------------------------
Plan hash value: 3720428357
-----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 125M| 84G| 1261K (18)| 00:00:50 |
| 1 | NESTED LOOPS | | 125M| 84G| 1261K (18)| 00:00:50 |
| 2 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 2500 | 1777K| 488 (0)| 00:00:01 |
| 3 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 4 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
|* 5 |
JSONTABLE EVALUATION
| | 50000 | 97K|
505 (18)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
4 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2"
/*+ LOB_BY_VALUE */
,'({ljfycrixvcehvmpmldubibaur}
INPATH (/products/str))')>0)
5 - filter("P"."X1"='ljfycrixvcehvmpmldubibaur')
Oracle first uses the search
index to identify matching rows and then evaluates the JSON_TABLE operation on those rows.
This ensures that only relevant rows are expanded and processed further.
2 on my_demo(c2)
3 for json parameters(' search_on TEXT ');
2 from my_demo
3 where json_textcontains( c2, '$.products.str','ljfycrixvcehvmpmldubibaur') ;
----------------------------------------------------------
Plan hash value: 2005405343
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2500 | 1777K| 488 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 2500 | 1777K| 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
---------------------------------------------------
2 from my_demo
3 where json_exists( c2 ,'$.products[*]?(@.str == "ljfycrixvcehvmpmldubibaur")');
----------------------------------------------------------
Plan hash value: 2005405343
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 18200 | 488 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 25 | 18200 | 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
---------------------------------------------------
'$.products[*]?(@.str == "ljfycrixvcehvmpmldubibaur")' /* json_path_str
$.products[*]?(@.str.string() == "ljfycrixvcehvmpmldubibaur") */ FALSE ON ERROR
TYPE(LAX) )=1)
3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
2 from my_demo, json_table( c2 ,'$.products[*]'
3 columns ( x1 varchar2(40) path '$.str') ) jt
4 where jt.x1 = 'ljfycrixvcehvmpmldubibaur' ;
----------------------------------------------------------
Plan hash value: 3720428357
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 125M| 84G| 1261K (18)| 00:00:50 |
| 1 | NESTED LOOPS | | 125M| 84G| 1261K (18)| 00:00:50 |
| 2 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 2500 | 1777K| 488 (0)| 00:00:01 |
| 3 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 4 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
---------------------------------------------------
5 - filter("P"."X1"='ljfycrixvcehvmpmldubibaur')
2 from my_demo nested c2.products[*]
3 columns ( x1 varchar2(40) path '$.str')
4 where x1 = 'ljfycrixvcehvmpmldubibaur' ;
----------------------------------------------------------
Plan hash value: 3720428357
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 125M| 84G| 1261K (18)| 00:00:50 |
| 1 | NESTED LOOPS | | 125M| 84G| 1261K (18)| 00:00:50 |
| 2 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 2500 | 1777K| 488 (0)| 00:00:01 |
| 3 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 4 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
---------------------------------------------------
5 - filter("P"."X1"='ljfycrixvcehvmpmldubibaur')
2 from my_demo nested c2.products[*]
3 columns ( x1 varchar2(40) path '$.str')
4 where x1 between 'a' and 'b';
----------------------------------------------------------
Plan hash value: 3720428357
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 31M| 21G| 1268K (19)| 00:00:50 |
| 1 | NESTED LOOPS | | 31M| 21G| 1268K (19)| 00:00:50 |
| 2 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 2500 | 1777K| 488 (0)| 00:00:01 |
| 3 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 4 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
---------------------------------------------------
5 - filter("P"."X1">='a' AND "P"."X1"<='b')
At first glance, the execution plan still shows the search index
being used. But the predicate information tells a different story. Instead of
using the range values to navigate the index, Oracle only uses the path
expression to locate rows containing that JSON path. The actual range filtering
is applied later, after the rows have been fetched from the table.
In practical
terms, this means Oracle retrieves all rows containing that path, then filters
them based on the range condition. If the table contains a large number of rows
with that path, a significant number of rowids may be returned from the index
even though only a small subset satisfies the range condition. This can lead to
unnecessary table access and reduced performance. The index is technically
being used, but not in the most efficient way for range-based filtering.
Another important limitation of
demo@ADB26AI>
select *
2 from my_demo
3 where json_textcontains( c2, '$.products.id',9) ;
where json_textcontains( c2, '$.products.id',9)
*
ERROR at line 3:
ORA-00932: expression is of data type NUMBER, which is incompatible with expected data type CHAR
Help: https://docs.oracle.com/error-help/db/ora-00932/
If we attempt to filter numeric values using JSON_EXISTS, Oracle still shows the search index in the
execution plan. However, a closer look at the predicate information reveals
that only the path expression is used for index navigation, not the numeric value
itself. Oracle uses the index to identify rows containing the path and then
applies the numeric comparison after fetching the rows from the table.
demo@ADB26AI> select *
2 from my_demo
3 where json_exists( c2 ,'$.products[*]?(@.id.number() == 9)');
Execution Plan
----------------------------------------------------------
Plan hash value: 2005405343
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 18200 | 488 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 25 | 18200 | 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
1 - filter(JSON_EXISTS2("C2" /*+
LOB_BY_VALUE */ FORMAT OSON ,
'$.products[*]?(@.id.number() == 9)' /* json_path_str $.products[*]?(@.id.number()
== 9) */ FALSE ON ERROR TYPE(LAX) )=1)
3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
,'(HASPATH(/products/id))')>0)
This means that if a large number of rows contain that numeric
path but only a few match the specific value being searched, Oracle will still
retrieve many rowids from the index and then discard most of them during table
filtering. As data volume grows, this can become increasingly inefficient.
The key
takeaway here is that
Understanding
these nuances can help avoid confusion when analyzing execution plans. Seeing
the search index in a plan does not always mean the query is fully optimized;
it is important to examine the predicate information to understand how the
index is actually being used.
In the next post of this series, we will explore other search
index flavors, We will also look at when to choose each option and how they
behave under different query patterns.
SEARCH_ON = TEXT is
that it does not index numeric values. Since the index is designed for textual
content, attempting to use JSON_TEXTCONTAINS on numeric
fields will result in a datatype mismatch error.2 from my_demo
3 where json_textcontains( c2, '$.products.id',9) ;
where json_textcontains( c2, '$.products.id',9)
*
ERROR at line 3:
ORA-00932: expression is of data type NUMBER, which is incompatible with expected data type CHAR
Help: https://docs.oracle.com/error-help/db/ora-00932/
2 from my_demo
3 where json_exists( c2 ,'$.products[*]?(@.id.number() == 9)');
----------------------------------------------------------
Plan hash value: 2005405343
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25 | 18200 | 488 (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | MY_DEMO | 25 | 18200 | 488 (0)| 00:00:01 |
| 2 | SORT CLUSTER BY ROWID BATCHED| | | | 4 (0)| 00:00:01 |
|* 3 | DOMAIN INDEX | MY_DEMO_IDX | | | 4 (0)| 00:00:01 |
---------------------------------------------------
'$.products[*]?(@.id.number() == 9)' /* json_path_str $.products[*]?(@.id.number()
== 9) */ FALSE ON ERROR TYPE(LAX) )=1)
3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
SEARCH_ON = TEXT should be viewed
primarily as a full-text search accelerator. It is extremely effective for
searching textual values inside JSON documents and for supporting ad-hoc
text-based queries. However, it is not designed to efficiently handle numeric
filtering or range-based conditions. In such cases, the index may still appear
in the execution plan, but it will not provide the same level of performance
benefit.