Wednesday, January 6, 2016

Index and NOT IN clause

The Cost Based Optimizer (CBO) is a rather complex piece of code that has to deal with countless different possible scenarios when trying to determine what the most optimal execution plan might be. It’s also a vitally important piece of code because not only do the decisions need to be reasonably accurate so that it doesn’t generate inefficient execution plans but it needs to make these decisions in a reasonably efficient manner else it wastes resources and most importantly wastes time while it performs its calculations.

So there’s a trade-off between ensuring the CBO makes reasonable decisions while ensuring it makes its decisions in a timely and resource efficient manner. Database performance could be directly impacted if these trade-offs are not managed effectively.

Therefore, there are all sorts of short cuts and assumptions that are coded into the CBO to make its life a little easier.

Typically when we have a condition where we just say NOT EQUAL, we’re basically suggesting we’re interested in the vast majority of possible values with the exception of the value specified in the NOT EQUAL condition. We want most values but not if it’s this particular value.

For example, a condition where we state something such as:

WHERE id <> 99

means we want all the other possible values of 99, just not those with the specific value of 99. In other words, we’re typically interested in the vast majority of possible values when we specify a NOT EQUAL condition.

However, we all know that typically, Oracle will not use an index if generally a relatively “high” percentage of rows are to be selected. It would generally be more efficient and less costly to simply perform a Full Table Scan if most rows are going to be returned anyways.

Therefore the CBO simply ignores indexes when costing a NOT EQUAL condition. Why bother going to all the overhead of calculating the cost of using an index to retrieve the vast majority of rows when a Full Table Scan is going to be the cheaper alternative in the vast majority of such cases. So the CBO doesn’t even bother trying and ignores all indexes that could potentially be used to retrieve the rows based on the NOT EQUAL condition.

But what if the data isn’t evenly distributed and the NOT EQUAL condition actually retrieves only a relatively small proportion of the rows. What if most rows actually have the value specified in the NOT EQUAL condition and the remaining rows constitute a relatively small proportion of the remaining rows?

When the CBO ignores indexes, it ignores indexes in all cases. Even if 99.99% of rows match the value in the NOT EQUAL condition and there’s only a handful of remaining rows to actually be retrieved, the code path in the CBO is still followed and indexes are ignored regardless.

rajesh@ORA12C> create table t as
  2  select a.*,
  3     decode(rownum,1,1,99) x
  4  from all_objects a;

Table created.

rajesh@ORA12C>
rajesh@ORA12C> create index t_idx on t(x);

Index created.

rajesh@ORA12C>
rajesh@ORA12C> begin
  2     dbms_stats.gather_table_stats
  3     (ownname=> user,
  4      tabname=> 'T',
  5      method_opt=>'for all indexed columns size 5') ;
  6  end;
  7  /

PL/SQL procedure successfully completed.

rajesh@ORA12C> set autotrace traceonly explain
rajesh@ORA12C> select min(object_id) from t where x <> 99;

Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     8 |   432   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     8 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |     2 |    16 |   432   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("X"<>99)

If we look at relevant parts of the 10053 trace, we note that CBO don’t even cost or consider using Index.

Access path analysis for T
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T[T]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#19):
    NewDensity:0.000006, OldDensity:0.000006 BktCnt:89955.000000, PopBktCnt:89954.000000,
PopValCnt:1, NDV:2
  Column (#19): X(NUMBER)
    AvgLen: 3 NDV: 2 Nulls: 0 Density: 0.000006 Min: 0.000000 Max: 1.000000
    Histogram: Freq  kts: 2  UncompBkts: 89955  EndPtVals: 2  ActualVal: yes
  Table: T  Alias: T
    Card: Original: 89955.000000  Rounded: 2  Computed: 1.500000  Non Adjusted: 1.500000
  Scan IO  Cost (Disk) =   430.000000
  Scan CPU Cost (Disk) =   57157410.960000
  Total Scan IO  Cost  =   430.000000 (scan (Disk))
                         + 0.000000 (io filter eval) (= 0.000000 (per row) * 89955.000000
(#rows))
                       =   430.000000
  Total Scan CPU  Cost =   57157410.960000 (scan (Disk))
                         + 4497750.000000 (cpu filter eval) (= 50.000000 (per row) *
89955.000000 (#rows))
                       =   61655160.960000
  Access Path: TableScan
    Cost:  431.605050  Resp: 431.605050  Degree: 0
      Cost_io: 430.000000  Cost_cpu: 61655161
      Resp_io: 430.000000  Resp_cpu: 61655161
  Best:: AccessPath: TableScan
         Cost: 431.605050  Degree: 1  Resp: 431.605050  Card: 1.500000  Bytes: 0.000000

***************************************

No comments:

Post a Comment