Tuesday, August 12, 2014

Linguistic Indexes

Characters are by default stored based on default encoding scheme. When it comes to “case-insensitive” searches, where data may be stored in many different cases and we want to return all data that matches a character value, regardless of its case.

The issue of course is that by default all searches are case-sensitive.

The standard fix is for the application to convert the data to a consistent case when performing the search. For example a search WHERE UPPER(Name) = ‘BELL’ will return all values of “BELL” regardless of their case but this will negate the use of any standard index on the Name column.

Therefore, a Function-Based index is required, say based on UPPER(Name), to ensure an efficient index access is possible for case insensitive searches.

However, this often requires an additional index to be created and for the application to be explicitly written to make use of the function-based index defined function.

Another possible solution is the use of a Linguistic Index. This is an index that is created based on a specific case insensitive linguistic language

Basic steps are

rajesh@ORA10G> create table t1 as
  2  select a.*, object_name as x
  3  from all_objects a;

Table created.

rajesh@ORA10G>
rajesh@ORA10G> insert into t1
  2  select a.*,
  3     decode(rownum,1,'Hello',2,'HELLO',3,'hello') x
  4  from all_objects a
  5  where rownum <=3;

3 rows created.

rajesh@ORA10G> commit;

Commit complete.

rajesh@ORA10G>
rajesh@ORA10G> create index t1_ind on t1( nlssort(x,'nls_sort=binary_ci'));

Index created.

rajesh@ORA10G> begin
  2     dbms_stats.gather_table_stats
  3     ( user,'T1',
  4      cascade=>true,
  5    method_opt=>'for all indexed columns size 254');
  6  end;
  7  /

PL/SQL procedure successfully completed.

rajesh@ORA10G>
rajesh@ORA10G>
rajesh@ORA10G> variable x varchar2(10);
rajesh@ORA10G> exec :x :='hello';

PL/SQL procedure successfully completed.

rajesh@ORA10G>
rajesh@ORA10G> set autotrace on explain statistics
rajesh@ORA10G> select x from t1 r1 where x = :x;

X
------------------------------
hello

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   584 |  9928 |   228   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T1   |   584 |  9928 |   228   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"=:X)


Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       1011  consistent gets
          0  physical reads
          0  redo size
        331  bytes sent via SQL*Net to client
        346  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

rajesh@ORA10G> alter session set nls_sort=binary_ci;

Session altered.

rajesh@ORA10G> alter session set nls_comp=linguistic;

Session altered.

rajesh@ORA10G> select x from t1 r1 where x = :x;

X
------------------------------
Hello
HELLO
hello

3 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2059591622

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |    12 |   204 |     7   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |    12 |   204 |     7   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_IND |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(NLSSORT("X",'nls_sort=''BINARY_CI''')=NLSSORT(:X,'nls_sort=''BIN
              ARY_CI'''))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        373  bytes sent via SQL*Net to client
        346  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed

rajesh@ORA10G> set autotrace off
rajesh@ORA10G>

Now let’s create a new table with its own set of B*Tree index.


rajesh@ORA10G> connect /@ora10g
Connected.
rajesh@ORA10G> set timing off
rajesh@ORA10G> create table t2 as select * from all_objects;

Table created.

rajesh@ORA10G> create index t2_ind on t2(object_name);

Index created.

rajesh@ORA10G> begin
  2     dbms_stats.gather_table_stats
  3     ( user,'T2',
  4      cascade=>true,
  5    method_opt=>'for all indexed columns size 254');
  6  end;
  7  /

PL/SQL procedure successfully completed.

rajesh@ORA10G> variable x2 varchar2(10);
rajesh@ORA10G> exec :x2 :='a%';

PL/SQL procedure successfully completed.

rajesh@ORA10G>
rajesh@ORA10G> set autotrace traceonly explain statistics
rajesh@ORA10G> select object_name from t2 where object_name like :x2;

2 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2876512201

---------------------------------------------------------------------------
| Id  | Operation        | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT |        |  2920 | 70080 |     4   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| T2_IND |  2920 | 70080 |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME" LIKE :X2)
       filter("OBJECT_NAME" LIKE :X2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        398  bytes sent via SQL*Net to client
        346  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

rajesh@ORA10G>

So we did a Range scan and returned 2 rows. This looks fine. See what happens when nls_comp set to linguistic, Index Range scan changed to Index Fast Full Scan and returned nearly 1500+ rows.

rajesh@ORA10G> alter session set nls_sort=binary_ci;

Session altered.

rajesh@ORA10G> alter session set nls_comp=linguistic;

Session altered.

rajesh@ORA10G>
rajesh@ORA10G> select object_name from t2 x1 where object_name like :x2;

1559 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1290188030

-------------------------------------------------------------------------------
| Id  | Operation            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |  2920 | 70080 |    64   (2)| 00:00:01 |
|*  1 |  INDEX FAST FULL SCAN| T2_IND |  2920 | 70080 |    64   (2)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_NAME" LIKE :X2)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        297  consistent gets
          0  physical reads
          0  redo size
      30570  bytes sent via SQL*Net to client
        456  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1559  rows processed

The next issue is Linguistic issues are ignored for some types of predicates like min, max operations.

rajesh@ORA10G> connect /@ora10g
Connected.
rajesh@ORA10G> set timing off
rajesh@ORA10G> set autotrace traceonly explain statistics
rajesh@ORA10G> select max(x) from t1;

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    17 |   227   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE    |      |     1 |    17 |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 58395 |   969K|   227   (1)| 00:00:03 |
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       1011  consistent gets
          0  physical reads
          0  redo size
        356  bytes sent via SQL*Net to client
        346  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

rajesh@ORA10G>

With regular B*Tree index we see this.

rajesh@ORA10G> select max(object_name) from t2;

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2365471795

-------------------------------------------------------------------------------------
| Id  | Operation                  | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |        |     1 |    24 |     2   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |        |     1 |    24 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T2_IND | 58394 |  1368K|     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        366  bytes sent via SQL*Net to client
        346  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

rajesh@ORA10G>

Even when nls_sort and nls_comp parameters set we don’t see changes in plan.

rajesh@ORA10G> alter session set nls_sort=binary_ci;

Session altered.

rajesh@ORA10G> alter session set nls_comp=linguistic;

Session altered.

rajesh@ORA10G> select max(x) from t1 x1;

1 row selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    17 |   227   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE    |      |     1 |    17 |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 58395 |   969K|   227   (1)| 00:00:03 |
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       1011  consistent gets
          0  physical reads
          0  redo size
        334  bytes sent via SQL*Net to client
        346  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed


rajesh@ORA10G>

No comments:

Post a Comment