You
have a column with 100 distinct values and out of those if you select just one
values (assuming even distribution of data) you will select approximately 1% of
data in that table.
However
if the table potentially allow mixed set of values like ‘Hello’, ‘hello’ or
‘HEllo’ etc, therefore you decide to write a query that first converts all the
fields to upper case such that you can now compare all possible values of
specific name to HELLO.
Now,
what is the selectivity of the column after you converted everything to upper
case? There is no easy answer to this, it all depends on how many distinct
values haves in our data.
However,
what if the function did something more complex and converted the name to some
numerical values based on the sum of all characters via some wired formula. In
theory you could have no distinct values in that everything could get converted
into a NULL or you could possibly have many distinct values.
Again,
oracle can’t really tell how many rows get selected based on the determination
of such function. When a predicate is used with a function call, oracle has as
great difficulty in determining the correct selectivity. Therefore when you
create a function based index, oracle behind the scenes create a hidden virtual
column on the parent table in order to capture the data characteristics of the
function so that CBO can make an accurate determination of selectivity
associated with that function.
If
oracle knows the low_value, high_value, num of nulls and NDV on a column it can
then accurately determine the selectivity and cardinality when the function is
used on the column and hence calculate an accurate cost and determine whether
the use of FBI is appropriate.
When
function based index is created, oracle will now (even with 12c) automatically
collects the statistics associated with the index (like blevel, number of leaf
blocks, CF etc) but not the statistics associated with the hidden virtual
columns as these statistics are associated with the table and not with the
index.
rajesh@ORA12C> create table t(x int, flag varchar2(1));
Table created.
rajesh@ORA12C> insert /*+ append */ into t(x,flag)
2 select rownum,case when rownum <=10
3 then 'N' else 'Y' end
4 from
big_table;
1000000 rows created.
rajesh@ORA12C> commit;
Commit complete.
rajesh@ORA12C> create index t_idx on t(decode(flag,'Y',1));
Index created.
rajesh@ORA12C>
Behind
the scenes oracle has created a hidden virtual columns.
rajesh@ORA12C> select column_name,num_distinct,low_value,
2
high_value,density,num_nulls,last_analyzed
3 from user_tab_cols
4 where table_name ='T'
5 order by column_name;
COLUMN_NAME
NUM_DISTINCT LOW_V HIGH_
DENSITY NUM_NULLS LAST_ANALYZ
--------------- ------------ ----- ----- ---------- ----------
-----------
FLAG
2 4E 59 .5 0 11-FEB-2016
SYS_NC00003$
X
1000000 C102 C402 .000001 0 11-FEB-2016
3 rows selected.
Therefore
even after you’ve created the index, Oracle still has no idea on the
selectivity associated the function because the necessary virtual column
statistics are still missing. Oracle simply makes “guess” as with more guesses
it will quite likely to be wrong. Which means selectivity like to be wrong in-turn
costing will be wrong in-turn execution plan will be wrong.
rajesh@ORA12C> set autotrace traceonly explain statistics
rajesh@ORA12C> select * from t where decode(flag,'Y',1)=1;
999990 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 767293772
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | | 10000 | 70000 | 460
(0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 10000 | 70000 | 460
(0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IDX | 4000 |
| 1952 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -
access(DECODE("FLAG",'Y',1)=1)
Statistics
----------------------------------------------------------
3
recursive calls
0 db block gets
16947 consistent gets
3609 physical reads
0 redo size
11253571 bytes sent via SQL*Net to client
73878 bytes received via SQL*Net from client
6668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
999990 rows processed
rajesh@ORA12C> select * from t where flag='Y';
999990 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT |
| 500K| 3417K|
467 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL|
T | 500K|
3417K| 467 (2)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 -
filter("FLAG"='Y')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8330 consistent gets
0 physical reads
0 redo size
11253571 bytes sent via SQL*Net to client
73878 bytes received via SQL*Net from client
6668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
999990 rows processed
rajesh@ORA12C> set autotrace off
What
should be done after creating a function based indexed? You should collect
statistics on the HIDDEN columns, so that CBO can accurately determine the
cardinality associated with using the function and so make correct decision
regarding the most appropriate execution plan.
rajesh@ORA12C> begin
2 dbms_stats.gather_table_stats(user,'T',
3 method_opt=>'for all hidden
columns size 1',
4 no_invalidate=>false);
5 end;
6 /
PL/SQL procedure successfully completed.
rajesh@ORA12C> select column_name,num_distinct,low_value,
2
high_value,density,num_nulls,last_analyzed
3 from user_tab_cols
4 where table_name ='T'
5 order by column_name;
COLUMN_NAME
NUM_DISTINCT LOW_V HIGH_
DENSITY NUM_NULLS LAST_ANALYZED
--------------- ------------ ----- ----- ---------- ----------
--------------------
FLAG
2 4E 59 .5 0 11-FEB-2016 13:59:44
SYS_NC00003$ 1 C102 C102
1 10 11-FEB-2016 14:00:18
X
1000000 C102 C402 .000001 0 11-FEB-2016 13:59:44
3 rows selected.
rajesh@ORA12C> set autotrace traceonly explain statistics
rajesh@ORA12C> select * from t where decode(flag,'Y',1)=1;
999990 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
999K| 9765K| 474
(3)| 00:00:01 |
|* 1 | TABLE ACCESS FULL|
T |
999K| 9765K| 474
(3)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 -
filter(DECODE("FLAG",'Y',1)=1)
Statistics
----------------------------------------------------------
28 recursive calls
0 db block gets
8377 consistent gets
0 physical reads
0 redo size
11253571 bytes sent via SQL*Net to client
73878
bytes received via SQL*Net from client
6668 SQL*Net roundtrips to/from client
6 sorts (memory)
0 sorts (disk)
999990 rows processed
rajesh@ORA12C>
rajesh@ORA12C> set autotrace off
rajesh@ORA12C>
No comments:
Post a Comment