Friday, February 12, 2016

Function based index and missing statistics

You have a column with 100 distinct values and out of those if you select just one values (assuming even distribution of data) you will select approximately 1% of data in that table.

However if the table potentially allow mixed set of values like ‘Hello’, ‘hello’ or ‘HEllo’ etc, therefore you decide to write a query that first converts all the fields to upper case such that you can now compare all possible values of specific name to HELLO.

Now, what is the selectivity of the column after you converted everything to upper case? There is no easy answer to this, it all depends on how many distinct values haves in our data.

However, what if the function did something more complex and converted the name to some numerical values based on the sum of all characters via some wired formula. In theory you could have no distinct values in that everything could get converted into a NULL or you could possibly have many distinct values.

Again, oracle can’t really tell how many rows get selected based on the determination of such function. When a predicate is used with a function call, oracle has as great difficulty in determining the correct selectivity. Therefore when you create a function based index, oracle behind the scenes create a hidden virtual column on the parent table in order to capture the data characteristics of the function so that CBO can make an accurate determination of selectivity associated with that function. 

If oracle knows the low_value, high_value, num of nulls and NDV on a column it can then accurately determine the selectivity and cardinality when the function is used on the column and hence calculate an accurate cost and determine whether the use of FBI is appropriate.

When function based index is created, oracle will now (even with 12c) automatically collects the statistics associated with the index (like blevel, number of leaf blocks, CF etc) but not the statistics associated with the hidden virtual columns as these statistics are associated with the table and not with the index.

rajesh@ORA12C> create table t(x int, flag varchar2(1));

Table created.

rajesh@ORA12C> insert /*+ append */ into t(x,flag)
  2  select rownum,case when rownum <=10
  3                     then 'N' else 'Y' end
  4  from big_table;

1000000 rows created.

rajesh@ORA12C> commit;

Commit complete.

rajesh@ORA12C> create index t_idx on t(decode(flag,'Y',1));

Index created.

rajesh@ORA12C>

Behind the scenes oracle has created a hidden virtual columns.

rajesh@ORA12C> select column_name,num_distinct,low_value,
  2          high_value,density,num_nulls,last_analyzed
  3  from user_tab_cols
  4  where table_name ='T'
  5  order by column_name;

COLUMN_NAME     NUM_DISTINCT LOW_V HIGH_    DENSITY  NUM_NULLS LAST_ANALYZ
--------------- ------------ ----- ----- ---------- ---------- -----------
FLAG                       2 4E    59            .5          0 11-FEB-2016
SYS_NC00003$   
X                    1000000 C102  C402     .000001          0 11-FEB-2016
                
3 rows selected.

Therefore even after you’ve created the index, Oracle still has no idea on the selectivity associated the function because the necessary virtual column statistics are still missing. Oracle simply makes “guess” as with more guesses it will quite likely to be wrong. Which means selectivity like to be wrong in-turn costing will be wrong in-turn execution plan will be wrong.

rajesh@ORA12C> set autotrace traceonly explain statistics
rajesh@ORA12C> select * from t where decode(flag,'Y',1)=1;

999990 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 767293772

---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       | 10000 | 70000 |   460   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     | 10000 | 70000 |   460   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |  4000 |       |  1952   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(DECODE("FLAG",'Y',1)=1)


Statistics
----------------------------------------------------------
          3  recursive calls
          0  db block gets
      16947  consistent gets
       3609  physical reads
          0  redo size
   11253571  bytes sent via SQL*Net to client
      73878  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     999990  rows processed

rajesh@ORA12C> select * from t where flag='Y';

999990 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500K|  3417K|   467   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |   500K|  3417K|   467   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("FLAG"='Y')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       8330  consistent gets
          0  physical reads
          0  redo size
   11253571  bytes sent via SQL*Net to client
      73878  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     999990  rows processed

rajesh@ORA12C> set autotrace off

What should be done after creating a function based indexed? You should collect statistics on the HIDDEN columns, so that CBO can accurately determine the cardinality associated with using the function and so make correct decision regarding the most appropriate execution plan.

rajesh@ORA12C> begin
  2     dbms_stats.gather_table_stats(user,'T',
  3             method_opt=>'for all hidden columns size 1',
  4             no_invalidate=>false);
  5  end;
  6  /

PL/SQL procedure successfully completed.

rajesh@ORA12C> select column_name,num_distinct,low_value,
  2          high_value,density,num_nulls,last_analyzed
  3  from user_tab_cols
  4  where table_name ='T'
  5  order by column_name;

COLUMN_NAME     NUM_DISTINCT LOW_V HIGH_    DENSITY  NUM_NULLS LAST_ANALYZED
--------------- ------------ ----- ----- ---------- ---------- --------------------
FLAG                       2 4E    59            .5          0 11-FEB-2016 13:59:44
SYS_NC00003$               1 C102  C102           1         10 11-FEB-2016 14:00:18
X                    1000000 C102  C402     .000001          0 11-FEB-2016 13:59:44
               
3 rows selected.

rajesh@ORA12C> set autotrace traceonly explain statistics
rajesh@ORA12C> select * from t where decode(flag,'Y',1)=1;

999990 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   999K|  9765K|   474   (3)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |   999K|  9765K|   474   (3)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(DECODE("FLAG",'Y',1)=1)


Statistics
----------------------------------------------------------
         28  recursive calls
          0  db block gets
       8377  consistent gets
          0  physical reads
          0  redo size
   11253571  bytes sent via SQL*Net to client
      73878  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
     999990  rows processed

rajesh@ORA12C>
rajesh@ORA12C> set autotrace off
rajesh@ORA12C>

No comments:

Post a Comment