THE RAJESH BLOG: Index and NOT IN clause

The Cost Based Optimizer (CBO) is a rather complex piece of code that has to deal with countless different possible scenarios when trying to determine what the most optimal execution plan might be. It’s also a vitally important piece of code because not only do the decisions need to be reasonably accurate so that it doesn’t generate inefficient execution plans but it needs to make these decisions in a reasonably efficient manner else it wastes resources and most importantly wastes time while it performs its calculations.

So there’s a trade-off between ensuring the CBO makes reasonable decisions while ensuring it makes its decisions in a timely and resource efficient manner. Database performance could be directly impacted if these trade-offs are not managed effectively.

Therefore, there are all sorts of short cuts and assumptions that are coded into the CBO to make its life a little easier.

Typically when we have a condition where we just say NOT EQUAL, we’re basically suggesting we’re interested in the vast majority of possible values with the exception of the value specified in the NOT EQUAL condition. We want most values but not if it’s this particular value.

For example, a condition where we state something such as:

WHERE id <> 99

means we want all the other possible values of 99, just not those with the specific value of 99. In other words, we’re typically interested in the vast majority of possible values when we specify a NOT EQUAL condition.

However, we all know that typically, Oracle will not use an index if generally a relatively “high” percentage of rows are to be selected. It would generally be more efficient and less costly to simply perform a Full Table Scan if most rows are going to be returned anyways.

Therefore the CBO simply ignores indexes when costing a NOT EQUAL condition. Why bother going to all the overhead of calculating the cost of using an index to retrieve the vast majority of rows when a Full Table Scan is going to be the cheaper alternative in the vast majority of such cases. So the CBO doesn’t even bother trying and ignores all indexes that could potentially be used to retrieve the rows based on the NOT EQUAL condition.

But what if the data isn’t evenly distributed and the NOT EQUAL condition actually retrieves only a relatively small proportion of the rows. What if most rows actually have the value specified in the NOT EQUAL condition and the remaining rows constitute a relatively small proportion of the remaining rows?

When the CBO ignores indexes, it ignores indexes in all cases. Even if 99.99% of rows match the value in the NOT EQUAL condition and there’s only a handful of remaining rows to actually be retrieved, the code path in the CBO is still followed and indexes are ignored regardless.

rajesh@ORA12C> create table t as

2 select a.*,

3 decode(rownum,1,1,99) x

4 from all_objects a;

Table created.

rajesh@ORA12C>

rajesh@ORA12C> create index t_idx on t(x);

Index created.

rajesh@ORA12C>

rajesh@ORA12C> begin

2 dbms_stats.gather_table_stats

3 (ownname=> user,

4 tabname=> 'T',

5 method_opt=>'for all indexed columns size 5') ;

6 end;

7 /

PL/SQL procedure successfully completed.

rajesh@ORA12C> set autotrace traceonly explain

rajesh@ORA12C> select min(object_id) from t where x <> 99;

Execution Plan

----------------------------------------------------------

Plan hash value: 2966233522

---------------------------------------------------------------------------

---------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 8 | 432 (1)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 8 | | |

|* 2 | TABLE ACCESS FULL| T | 2 | 16 | 432 (1)| 00:00:01 |

---------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - filter("X"<>99)

If we look at relevant parts of the 10053 trace, we note that CBO don’t even cost or consider using Index.

Access path analysis for T

***************************************

SINGLE TABLE ACCESS PATH

Single Table Cardinality Estimation for T[T]

SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE

Column (#19):

NewDensity:0.000006, OldDensity:0.000006 BktCnt:89955.000000, PopBktCnt:89954.000000,

PopValCnt:1, NDV:2

Column (#19): X(NUMBER)

AvgLen: 3 NDV: 2 Nulls: 0 Density: 0.000006 Min: 0.000000 Max: 1.000000

Histogram: Freq kts: 2 UncompBkts: 89955 EndPtVals: 2 ActualVal: yes

Table: T Alias: T

Card: Original: 89955.000000 Rounded: 2 Computed: 1.500000 Non Adjusted: 1.500000

Scan IO Cost (Disk) = 430.000000

Scan CPU Cost (Disk) = 57157410.960000

Total Scan IO Cost = 430.000000 (scan (Disk))

+ 0.000000 (io filter eval) (= 0.000000 (per row) * 89955.000000

(#rows))

= 430.000000

Total Scan CPU Cost = 57157410.960000 (scan (Disk))

+ 4497750.000000 (cpu filter eval) (= 50.000000 (per row) *

89955.000000 (#rows))

= 61655160.960000

Access Path: TableScan

Cost: 431.605050 Resp: 431.605050 Degree: 0

Cost_io: 430.000000 Cost_cpu: 61655161

Resp_io: 430.000000 Resp_cpu: 61655161

Best:: AccessPath: TableScan

Cost: 431.605050 Degree: 1 Resp: 431.605050 Card: 1.500000 Bytes: 0.000000

***************************************

Wednesday, January 6, 2016

Index and NOT IN clause

No comments:

Post a Comment