The CBO has to deal with different possible scenario when trying to
determine the most optimal execution plan for a sql statement. There are all
sorts of shortcuts and assumptions that are coded into the CBO to make its life
little easier. However these shortcuts can sometimes be problematic if they are
not recognized and handled appropriately.
One of these shortcut worth noting is how the CBO handles the queries
with NOT IN or NOT Equal conditions.
Typically when we have a condition where we just say NOT equal, we are
basically suggesting we are interested in the vast majority of possible values
with the exception of the value specified in the NOT equal condition, we want
most values but not if this is a particular value.
For example a condition like this will say
WHERE FLAG <> ‘N’
Means we want all other possible values of FLAG, just not those with a
specific value of N.
Oracle will not use an index if generally a relatively “high”
percentage of rows are to be selected. It would generally be more efficient and
less costly to simply perform a full Table scan if most rows are going to be
returned.
Therefore the CBO simply ignores the index when costing the NOT Equal
conditions. The refusal of CBO to
consider an index with NOT Equal condition can be easily illustrated.
demo@PDB1> create table t
2 as
3 select a.* ,
4 decode(rownum,1,'Y','N') flag
5 from big_table a;
Table created.
demo@PDB1>
demo@PDB1> select flag,count(*)
2 from t
3 group by flag;
F COUNT(*)
- ----------
Y 1
N 999999
demo@PDB1> create index t_idx on t(flag);
Index created.
demo@PDB1> select column_name,histogram
2 from user_tab_col_statistics
3 where table_name ='T'
4 and column_name ='FLAG';
COLUMN_NAME HISTOGRAM
--------------- ---------------
FLAG NONE
So only one row has the value that is FLAG <> ‘N’, which means an
index to retrieve this one and only row would be an efficient and appropriate
execution path.
Let’s now execute this simple statement.
demo@PDB1> set autotrace traceonly explain
demo@PDB1> select *
2 from t
3 where flag <> 'N';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
500K| 60M| 5255
(1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T | 500K|
60M| 5255 (1)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 -
filter("FLAG"<>'N')
Without histograms, CBO assumed uniform distribution of rows and estimated
that this predicate is about to return half the volume of rows from the table and
picked up the Full table scan (FTS).
With histograms in place, the optimizer cardinality estimates got
improved but still ignored the index.
demo@PDB1> exec
dbms_stats.gather_table_stats(user,'T',options=>'GATHER
AUTO',no_invalidate=>false);
PL/SQL procedure successfully completed.
demo@PDB1> select column_name,histogram
2 from user_tab_col_statistics
3 where table_name ='T'
4 and column_name ='FLAG';
COLUMN_NAME HISTOGRAM
--------------- ---------------
FLAG FREQUENCY
demo@PDB1> set autotrace traceonly explain
demo@PDB1> select *
2 from t
3 where flag <> 'N';
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
182 | 23114 | 5255
(1)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T |
182 | 23114 | 5255 (1)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 -
filter("FLAG"<>'N')
demo@PDB1> set autotrace off
However forcing the sql with hints, the CBO performed INDEX FULL Scan –
means reading all the leaf nodes that contain all unwanted values and still an
inefficient option – but not an INDEX RANGE SCAN.
demo@PDB1> set autotrace traceonly explain
demo@PDB1> select /*+ index(t,t_idx) */ *
2 from t
3 where flag <> 'N';
Execution Plan
----------------------------------------------------------
Plan hash value: 146203489
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)
---------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
182 | 23114 | 1829 (1)
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T |
182 | 23114 | 1829 (1)
|* 2 | INDEX FULL SCAN | T_IDX | 182 |
| 1825 (1)
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -
filter("FLAG"<>'N')
The right option would be rewrite the query in a “positive” sense and
the index is now considered and used.
demo@PDB1> select *
2 from t
3 where flag = 'Y';
Execution Plan
----------------------------------------------------------
Plan hash value: 767293772
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)|
----------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
182 | 23114 | 7 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T |
182 | 23114 | 7 (0)|
|* 2 | INDEX RANGE
SCAN | T_IDX
| 182 | |
3 (0)|
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -
access("FLAG"='Y')
If there are many distinct values in the column that are not N, but
which in total still constitute a relatively small percentage of total rows,
then it could be rewritten as follows which can make use of the index in an
effective manner, by concatenating two separate range scans.
demo@PDB1> select *
2 from t
3 where flag < 'N' or flag > 'N';
Execution Plan
----------------------------------------------------------
Plan hash value: 179818988
-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows |Cost (%CPU)|
-------------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | | 273 |
12 (0)|
| 1 | VIEW |
VW_ORE_1B35BA0F | 273 | 12
(0)|
| 2 | UNION-ALL | | | |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 91 |
5 (0)|
|* 4 | INDEX RANGE
SCAN | T_IDX |
91 | 3 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 182 |
7 (0)|
|* 6 | INDEX RANGE
SCAN | T_IDX |
182 | 3 (0)|
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 -
access("FLAG"<'N')
6 -
access("FLAG">'N')
filter(LNNVL("FLAG"<'N'))
Perhaps a NOT BETWEEN could do the same.
demo@PDB1> select *
2 from t
3 where flag NOT BETWEEN 'N' and 'N';
Execution Plan
----------------------------------------------------------
Plan hash value: 179818988
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|
--------------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | | 273 |
12 (0)|
| 1 | VIEW |
VW_ORE_1B35BA0F | 273 | 12
(0)|
| 2 | UNION-ALL | | |
|
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 91 |
5 (0)|
|* 4 | INDEX RANGE
SCAN | T_IDX |
91 | 3 (0)|
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 182 |
7 (0)|
|* 6 | INDEX RANGE
SCAN | T_IDX |
182 | 3 (0)|
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 -
access("FLAG"<'N')
6 -
access("FLAG">'N')
filter(LNNVL("FLAG"<'N'))
Be cautious when using NOT EQUAL conditions and the impact they have
with your indexes.
Rajesh, suppose we need to apply the Business validation on Selected Records.
ReplyDeletekindly , describe how we need to select with not equal condition.
case : Condition column - having 5 distinct value and total records 3000000