THE RAJESH BLOG: JSON Search Index - Part I (SEARCH

If you work with JSON data in Oracle, you’ve probably come across the JSON Search Index. It’s one of those features that looks simple on the surface but behaves very differently depending on how it is configured. Understanding these differences can make a huge impact on query performance, especially when working with large JSON datasets.

In this post, I want to walk through how the JSON search index behaves when created with SEARCH_ON = TEXT, using practical examples and execution plan observations. This is not just about syntax or documentation; the goal is to understand what Oracle actually does internally when queries run.

The table and data setup used here are the same as in the previous blog of this series.

To begin with, let’s create a JSON search index configured for text search:

demo@ADB26AI> create search index my_demo_idx
2 on my_demo(c2)
3 for json parameters(' search_on TEXT ');

Index created.

Setting SEARCH_ON as NONE Does not enable any indexing features, which indicates that the tables used for full-text and range searches are not populated. Only the index data guide is maintained. The index will not be used by any JSON query operators, including JSON_TEXTCONTAINS.

When we set SEARCH_ON = TEXT, Oracle indexes only the textual content inside JSON documents. This configuration is primarily designed to accelerate full-text search queries and text-based filtering operations. It works very well for searching words, phrases, or text values inside JSON structures, but it is important to understand its boundaries before using it in production systems.

A common misconception is that the search index behaves like a general-purpose index for all JSON queries. Its behaviour is very specific to the type of data being indexed and the type of query being executed.

Let’s start with a straightforward full-text search example using JSON_TEXTCONTAINS.

demo@ADB26AI> select *
2 from my_demo
3 where json_textcontains( c2, '$.products.str','ljfycrixvcehvmpmldubibaur') ;

Execution Plan
----------------------------------------------------------
Plan hash value: 2005405343

----------------------------------------------------------------------------------------------
| Id | Operation                      | Name        | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             | 2500 | 1777K|   488   (0)| 00:00:01 |
|   1 | TABLE ACCESS BY INDEX ROWID   | MY_DEMO     | 2500 | 1777K|   488   (0)| 00:00:01 |
|   2 |   SORT CLUSTER BY ROWID BATCHED|             |       |       |     4   (0)| 00:00:01 |
|* 3 |    DOMAIN INDEX                | MY_DEMO_IDX |       |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
              ,'(ljfycrixvcehvmpmldubibaur) INPATH (/products/str)')>0)

When we examine the execution plan for this query, we can see that Oracle uses the search index through a domain index access. In the predicate section of the plan, Oracle shows a CONTAINS operation where both the search value and the JSON path expression are used to navigate the index structure.

Internally, Oracle looks into the search index and directly navigates to the specified JSON path. It then searches for the given text value within that path. If a match is found, the index returns the corresponding rowids, which Oracle then uses to fetch the rows from the table. If no match is found, those rows are never accessed. This makes the operation extremely efficient and avoids unnecessary table scans.

The same efficiency can be observed when we rewrite the query using JSON_EXISTS.

demo@ADB26AI> select *
2 from my_demo
3 where json_exists( c2 ,'$.products[*]?(@.str == "ljfycrixvcehvmpmldubibaur")');

Execution Plan
----------------------------------------------------------
Plan hash value: 2005405343

----------------------------------------------------------------------------------------------
| Id | Operation                      | Name        | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             |    25 | 18200 |   488   (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID   | MY_DEMO     |    25 | 18200 |   488   (0)| 00:00:01 |
|   2 |   SORT CLUSTER BY ROWID BATCHED|             |       |       |     4   (0)| 00:00:01 |
|* 3 |    DOMAIN INDEX                | MY_DEMO_IDX |       |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(JSON_EXISTS2("C2" /*+ LOB_BY_VALUE */ FORMAT OSON ,
              '$.products[*]?(@.str == "ljfycrixvcehvmpmldubibaur")' /* json_path_str
              $.products[*]?(@.str.string() == "ljfycrixvcehvmpmldubibaur") */ FALSE ON ERROR
              TYPE(LAX) )=1)
   3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
              ,'({ljfycrixvcehvmpmldubibaur} INPATH (/products/str))')>0)

Even though the query is written differently, the execution plan still shows that the search index is being used effectively. The predicate information reveals that Oracle again uses both the JSON path and the search value to navigate the index. This confirms that the TEXT search index is not limited to JSON_TEXTCONTAINS; it can also support other JSON operators when the filtering is based on textual values.

The behaviour remains consistent when we use JSON_TABLE to project JSON values into relational columns and apply filters on them.

demo@ADB26AI> select *
2 from my_demo, json_table( c2 ,'$.products[*]'
3      columns ( x1 varchar2(40) path '$.str') ) jt
4 where jt.x1 = 'ljfycrixvcehvmpmldubibaur' ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3720428357

-----------------------------------------------------------------------------------------------
| Id | Operation                       | Name        | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |             |   125M|    84G| 1261K (18)| 00:00:50 |
|   1 | NESTED LOOPS                   |             |   125M|    84G| 1261K (18)| 00:00:50 |
|   2 |   TABLE ACCESS BY INDEX ROWID   | MY_DEMO     | 2500 | 1777K|   488   (0)| 00:00:01 |
|   3 |    SORT CLUSTER BY ROWID BATCHED|             |       |       |     4   (0)| 00:00:01 |
|* 4 |     DOMAIN INDEX                | MY_DEMO_IDX |       |       |     4   (0)| 00:00:01 |
|* 5 |   JSONTABLE EVALUATION          |             | 50000 |    97K|   505 (18)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
              ,'({ljfycrixvcehvmpmldubibaur} INPATH (/products/str))')>0)
   5 - filter("P"."X1"='ljfycrixvcehvmpmldubibaur')

Oracle first uses the search index to identify matching rows and then evaluates the JSON_TABLE operation on those rows. This ensures that only relevant rows are expanded and processed further.

Even when using the SQL nested clause introduced in 19c, the execution plan and behavior remain the same.

demo@ADB26AI> select *
2 from my_demo nested c2.products[*]
3      columns ( x1 varchar2(40) path '$.str')
4 where x1 = 'ljfycrixvcehvmpmldubibaur' ;

Execution Plan
----------------------------------------------------------
Plan hash value: 3720428357

-----------------------------------------------------------------------------------------------
| Id | Operation                       | Name        | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |             |   125M|    84G| 1261K (18)| 00:00:50 |
|   1 | NESTED LOOPS                   |             |   125M|    84G| 1261K (18)| 00:00:50 |
|   2 |   TABLE ACCESS BY INDEX ROWID   | MY_DEMO     | 2500 | 1777K|   488   (0)| 00:00:01 |
|   3 |    SORT CLUSTER BY ROWID BATCHED|             |       |       |     4   (0)| 00:00:01 |
|* 4 |     DOMAIN INDEX                | MY_DEMO_IDX |      |       |     4   (0)| 00:00:01 |
|* 5 |   JSONTABLE EVALUATION          |             | 50000 |    97K|   505 (18)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
              ,'({ljfycrixvcehvmpmldubibaur} INPATH (/products/str))')>0)
   5 - filter("P"."X1"='ljfycrixvcehvmpmldubibaur')

So far, everything looks ideal. For text equality searches, the TEXT search index performs exactly as expected and provides significant performance benefits.

However, things start to change when we move beyond simple text equality conditions. Consider a query that performs a range search on a textual field.

demo@ADB26AI> select *
2 from my_demo nested c2.products[*]
3      columns ( x1 varchar2(40) path '$.str')
4 where x1 between 'a' and 'b';

Execution Plan
----------------------------------------------------------
Plan hash value: 3720428357

-----------------------------------------------------------------------------------------------
| Id | Operation                       | Name        | Rows | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |             |    31M|    21G| 1268K (19)| 00:00:50 |
|   1 | NESTED LOOPS                   |             |    31M|    21G| 1268K (19)| 00:00:50 |
|   2 |   TABLE ACCESS BY INDEX ROWID   | MY_DEMO     | 2500 | 1777K|   488   (0)| 00:00:01 |
|   3 |    SORT CLUSTER BY ROWID BATCHED|             |       |       |     4   (0)| 00:00:01 |
|* 4 |     DOMAIN INDEX                | MY_DEMO_IDX |       |       |     4   (0)| 00:00:01 |
|* 5 |   JSONTABLE EVALUATION          |             | 12500 | 25000 |   507 (18)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
              ,'((HASPATH(/products/str)) and (HASPATH(/products/str)))')>0)
   5 - filter("P"."X1">='a' AND "P"."X1"<='b')

At first glance, the execution plan still shows the search index being used. But the predicate information tells a different story. Instead of using the range values to navigate the index, Oracle only uses the path expression to locate rows containing that JSON path. The actual range filtering is applied later, after the rows have been fetched from the table.

In practical terms, this means Oracle retrieves all rows containing that path, then filters them based on the range condition. If the table contains a large number of rows with that path, a significant number of rowids may be returned from the index even though only a small subset satisfies the range condition. This can lead to unnecessary table access and reduced performance. The index is technically being used, but not in the most efficient way for range-based filtering.

Another important limitation of SEARCH_ON = TEXT is that it does not index numeric values. Since the index is designed for textual content, attempting to use JSON_TEXTCONTAINS on numeric fields will result in a datatype mismatch error.

demo@ADB26AI> select *
2 from my_demo
3 where json_textcontains( c2, '$.products.id',9) ;
where json_textcontains( c2, '$.products.id',9)
                                             *
ERROR at line 3:
ORA-00932: expression is of data type NUMBER, which is incompatible with expected data type CHAR
Help: https://docs.oracle.com/error-help/db/ora-00932/

If we attempt to filter numeric values using JSON_EXISTS, Oracle still shows the search index in the execution plan. However, a closer look at the predicate information reveals that only the path expression is used for index navigation, not the numeric value itself. Oracle uses the index to identify rows containing the path and then applies the numeric comparison after fetching the rows from the table.

demo@ADB26AI> select *
2 from my_demo
3 where json_exists( c2 ,'$.products[*]?(@.id.number() == 9)');

Execution Plan
----------------------------------------------------------
Plan hash value: 2005405343

----------------------------------------------------------------------------------------------
| Id | Operation                      | Name        | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |             |    25 | 18200 |   488   (0)| 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID   | MY_DEMO     |    25 | 18200 |   488   (0)| 00:00:01 |
|   2 |   SORT CLUSTER BY ROWID BATCHED|             |       |       |     4   (0)| 00:00:01 |
|* 3 |    DOMAIN INDEX                | MY_DEMO_IDX |       |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(JSON_EXISTS2("C2" /*+ LOB_BY_VALUE */ FORMAT OSON ,
              '$.products[*]?(@.id.number() == 9)' /* json_path_str $.products[*]?(@.id.number()
              == 9) */ FALSE ON ERROR TYPE(LAX) )=1)
   3 - access("CTXSYS"."CONTAINS"("MY_DEMO"."C2" /*+ LOB_BY_VALUE */
              ,'(HASPATH(/products/id))')>0)

This means that if a large number of rows contain that numeric path but only a few match the specific value being searched, Oracle will still retrieve many rowids from the index and then discard most of them during table filtering. As data volume grows, this can become increasingly inefficient.

The key takeaway here is that SEARCH_ON = TEXT should be viewed primarily as a full-text search accelerator. It is extremely effective for searching textual values inside JSON documents and for supporting ad-hoc text-based queries. However, it is not designed to efficiently handle numeric filtering or range-based conditions. In such cases, the index may still appear in the execution plan, but it will not provide the same level of performance benefit.

Understanding these nuances can help avoid confusion when analyzing execution plans. Seeing the search index in a plan does not always mean the query is fully optimized; it is important to examine the predicate information to understand how the index is actually being used.

In the next post of this series, we will explore other search index flavors, We will also look at when to choose each option and how they behave under different query patterns.

Saturday, February 14, 2026

JSON Search Index - Part I (SEARCH_ON = TEXT)

No comments:

Post a Comment