Tuesday, April 27, 2010

Histograms Critical to CBO

Histograms are to get accurate estimates of the distribution of column data. A histogram partitions the values in the column into bands, so that all column values in a band fall within the same range. Histograms provide improved selectivity estimates in the presence of data skew, resulting in optimal execution plans with nonuniform data distributions. ( An Excellent Information from product documentation ). Here is a test to demonstrate that.

test@9iR2> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.8.0 - Production
PL/SQL Release 9.2.0.8.0 - Production
CORE    9.2.0.8.0       Production
TNS for 32-bit Windows: Version 9.2.0.8.0 - Production
NLSRTL Version 9.2.0.8.0 - Production

test@9iR2> create table t
  2  as
  3  select case when rownum = 1 then 1 else 99 end as id,
  4        a.*
  5  from all_objects a;

Table created.

test@9iR2> create index t_ind on t(id);

Index created.



We created a Table 'T' with data skewed considerably on column ID.


test@9iR2> begin
  2             dbms_stats.gather_table_stats(ownname=>USER,
  3                                                     tabname=>'T',
  4                                                     cascade=>true,
  5                                                     method_opt=>null);
  6  end;
  7  /

PL/SQL procedure successfully completed.

Statistics is gathered on table T and index T_IND but not the histograms. So looking at the plan.

test@9iR2> set autotrace traceonly explain;
test@9iR2> select *
  2  from t
  3  where id = 1;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=46 Card=16598 Bytes=1576810)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=46 Card=16598 Bytes=1576810)

test@9iR2> select *
  2  from t
  3  where id = 99;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=46 Card=16598 Bytes=1576810)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=46 Card=16598 Bytes=1576810)

test@9iR2> set autotrace off;


test@9iR2> begin
  2             dbms_stats.gather_table_stats(ownname=>USER,
  3                                                     tabname=>'T',
  4                                                     cascade=>true,
  5                                                     method_opt=>' for all indexed columns size auto');
  6  end;
  7  /

PL/SQL procedure successfully completed.

Now, after gathering Histograms on columns with Skewed data, the plan changes.

test@9iR2> set autotrace traceonly explain;
test@9iR2> select *
  2  from t
  3  where id = 1;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=95)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T' (Cost=2 Card=1 Bytes=95)
   2    1     INDEX (RANGE SCAN) OF 'T_IND' (NON-UNIQUE) (Cost=1 Card=1)

test@9iR2> select *
  2  from t
  3  where id = 99;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=46 Card=33195 Bytes=3153525)
   1    0   TABLE ACCESS (FULL) OF 'T' (Cost=46 Card=33195 Bytes=3153525)

test@9iR2> set autotrace off;

No comments:

Post a Comment