Monday, March 25, 2019

Indexes and not equal condition

 
The CBO has to deal with different possible scenario when trying to determine the most optimal execution plan for a sql statement. There are all sorts of shortcuts and assumptions that are coded into the CBO to make its life little easier. However these shortcuts can sometimes be problematic if they are not recognized and handled appropriately.
 
One of these shortcut worth noting is how the CBO handles the queries with NOT IN or NOT Equal conditions.
 
Typically when we have a condition where we just say NOT equal, we are basically suggesting we are interested in the vast majority of possible values with the exception of the value specified in the NOT equal condition, we want most values but not if this is a particular value.
 
For example a condition like this will say
 
WHERE FLAG <> ‘N’
 
Means we want all other possible values of FLAG, just not those with a specific value of N.
 
Oracle will not use an index if generally a relatively “high” percentage of rows are to be selected. It would generally be more efficient and less costly to simply perform a full Table scan if most rows are going to be returned.
 
Therefore the CBO simply ignores the index when costing the NOT Equal conditions.  The refusal of CBO to consider an index with NOT Equal condition can be easily illustrated.
 
demo@PDB1> create table t
  2  as
  3  select a.* ,
  4     decode(rownum,1,'Y','N') flag
  5  from big_table a;
 
Table created.
 
demo@PDB1>
demo@PDB1> select flag,count(*)
  2  from t
  3  group by flag;
 
F   COUNT(*)
- ----------
Y          1
N     999999
 
demo@PDB1> create index t_idx on t(flag);
 
Index created.
 
demo@PDB1> select column_name,histogram
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='FLAG';
 
COLUMN_NAME     HISTOGRAM
--------------- ---------------
FLAG            NONE
 
So only one row has the value that is FLAG <> ‘N’, which means an index to retrieve this one and only row would be an efficient and appropriate execution path.
 
Let’s now execute this simple statement.
 
demo@PDB1> set autotrace traceonly explain
demo@PDB1> select *
  2  from t
  3  where flag <> 'N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500K|    60M|  5255   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |   500K|    60M|  5255   (1)| 00:00:01 |
--------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("FLAG"<>'N')
 
Without histograms, CBO assumed uniform distribution of rows and estimated that this predicate is about to return half the volume of rows from the table and picked up the Full table scan (FTS).
 
With histograms in place, the optimizer cardinality estimates got improved but still ignored the index.
 
demo@PDB1> exec dbms_stats.gather_table_stats(user,'T',options=>'GATHER AUTO',no_invalidate=>false);
 
PL/SQL procedure successfully completed.
 
demo@PDB1> select column_name,histogram
  2  from user_tab_col_statistics
  3  where table_name ='T'
  4  and column_name ='FLAG';
 
COLUMN_NAME     HISTOGRAM
--------------- ---------------
FLAG            FREQUENCY
 
demo@PDB1> set autotrace traceonly explain
demo@PDB1> select *
  2  from t
  3  where flag <> 'N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
 
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   182 | 23114 |  5255   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |   182 | 23114 |  5255   (1)| 00:00:01 |
--------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter("FLAG"<>'N')
 
demo@PDB1> set autotrace off
 
However forcing the sql with hints, the CBO performed INDEX FULL Scan – means reading all the leaf nodes that contain all unwanted values and still an inefficient option – but not an INDEX RANGE SCAN.
 
demo@PDB1> set autotrace traceonly explain
demo@PDB1> select /*+ index(t,t_idx) */ *
  2  from t
  3  where flag <> 'N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 146203489
 
---------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |   182 | 23114 |  1829   (1)
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |   182 | 23114 |  1829   (1)
|*  2 |   INDEX FULL SCAN                   | T_IDX |   182 |       |  1825   (1)
---------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("FLAG"<>'N')
 
The right option would be rewrite the query in a “positive” sense and the index is now considered and used.
 
demo@PDB1> select *
  2  from t
  3  where flag = 'Y';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 767293772
 
----------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |   182 | 23114 |     7   (0)|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |   182 | 23114 |     7   (0)|
|*  2 |   INDEX RANGE SCAN                  | T_IDX |   182 |       |     3   (0)|
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("FLAG"='Y')
 
 
If there are many distinct values in the column that are not N, but which in total still constitute a relatively small percentage of total rows, then it could be rewritten as follows which can make use of the index in an effective manner, by concatenating two separate range scans.
 
demo@PDB1> select *
  2  from t
  3  where flag < 'N' or flag > 'N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 179818988
 
-------------------------------------------------------------------------------------
| Id  | Operation                             | Name            | Rows  |Cost (%CPU)|
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                 |   273 |   12   (0)|
|   1 |  VIEW                                 | VW_ORE_1B35BA0F |   273 |   12   (0)|
|   2 |   UNION-ALL                           |                 |       |           |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T               |    91 |    5   (0)|
|*  4 |     INDEX RANGE SCAN                  | T_IDX           |    91 |    3   (0)|
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T               |   182 |    7   (0)|
|*  6 |     INDEX RANGE SCAN                  | T_IDX           |   182 |    3   (0)|
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("FLAG"<'N')
   6 - access("FLAG">'N')
       filter(LNNVL("FLAG"<'N'))
 
Perhaps a NOT BETWEEN could do the same.
 
demo@PDB1> select *
  2  from t
  3  where flag NOT BETWEEN 'N' and 'N';
 
Execution Plan
----------------------------------------------------------
Plan hash value: 179818988
 
--------------------------------------------------------------------------------------
| Id  | Operation                             | Name            | Rows  | Cost (%CPU)|
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                 |   273 |    12   (0)|
|   1 |  VIEW                                 | VW_ORE_1B35BA0F |   273 |    12   (0)|
|   2 |   UNION-ALL                           |                 |       |            |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| T               |    91 |     5   (0)|
|*  4 |     INDEX RANGE SCAN                  | T_IDX           |    91 |     3   (0)|
|   5 |    TABLE ACCESS BY INDEX ROWID BATCHED| T               |   182 |     7   (0)|
|*  6 |     INDEX RANGE SCAN                  | T_IDX           |   182 |     3   (0)|
--------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   4 - access("FLAG"<'N')
   6 - access("FLAG">'N')
       filter(LNNVL("FLAG"<'N'))
 
Be cautious when using NOT EQUAL conditions and the impact they have with your indexes.
 

1 comment:

  1. Rajesh, suppose we need to apply the Business validation on Selected Records.
    kindly , describe how we need to select with not equal condition.

    case : Condition column - having 5 distinct value and total records 3000000

    ReplyDelete