Thursday, February 3, 2011

Histograms Improved in Oracle 10g..

Something that keeps changing in Oracle database, Now its time to see how Histograms improved in Oracle database 10g.

rajesh@10GR2> create table t
  2  nologging
  3  as
  4  select case when rownum = 1 then 5 else mod(rownum,4) end as id,
  5      a.*
  6  from all_objects a;

Table created.

Elapsed: 00:00:07.90
rajesh@10GR2>
rajesh@10GR2> create index t_ind on t(id) nologging;

Index created.

Elapsed: 00:00:01.81
rajesh@10GR2>
rajesh@10GR2> begin
  2     dbms_stats.gather_table_stats(user,'T');
  3  end;
  4  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:01.29
rajesh@10GR2>

Now, we have massively skewed data on the column 'ID'. one row with value '5' and others with values from 0 to 3. looking at the Histograms shows me this.

rajesh@10GR2> select endpoint_number, endpoint_value
  2  from user_tab_histograms
  3  where table_name ='T'
  4  and column_name ='ID'
  5  /

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              0              0
              1              3

Elapsed: 00:00:00.01
rajesh@10GR2>

We don't have complete histograms on 'ID' column, it shows only two buckets. Running queries against on these skewed columns, shows below plans.

rajesh@10GR2> set autotrace traceonly explain;
rajesh@10GR2> select *
  2  from t
  3  where id = 5;

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  4753 |   450K|   185   (1)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T    |  4753 |   450K|   185   (1)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"=5)

rajesh@10GR2>
rajesh@10GR2> select *
  2  from t
  3  where id = 1;

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 14260 |  1350K|   186   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T    | 14260 |  1350K|   186   (2)| 00:00:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"=1)

rajesh@10GR2> set autotrace off;

The first query ( where id=5) has come up with false plan, it should have used Index segment rather than Table segment. ( since we have only one record in table 'T' with id = 5).

rajesh@10GR2> exec print_table ('select * from sys.col_usage$ where obj# = (select object_id from user_objects where object_name =''T'') ');

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.18
rajesh@10GR2>


Now, regathering stats again,  Histogram changes.

rajesh@10GR2> begin
  2     dbms_stats.gather_table_stats(user,'T');
  3  end;
  4  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.45
rajesh@10GR2>
rajesh@10GR2> select endpoint_number, endpoint_value
  2  from user_tab_histograms
  3  where table_name ='T'
  4  and column_name ='ID'
  5  /

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
           1595              0
           3227              1
           4817              2
           6427              3

Elapsed: 00:00:00.06
rajesh@10GR2>

Now, we have more information (buckets) available in Histograms!

rajesh@10GR2>
rajesh@10GR2> exec print_table ('select * from sys.col_usage$ where obj# = (select object_id from user_objects where object_name =''T'') ');

     OBJ#---------------- 158387
     INTCOL#------------- 1
     EQUALITY_PREDS------ 1
     EQUIJOIN_PREDS------ 0
     NONEQUIJOIN_PREDS--- 0
     RANGE_PREDS--------- 0
     LIKE_PREDS---------- 0
     NULL_PREDS---------- 0
     TIMESTAMP----------- 03-FEB-11
--------------------------------------------

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04
rajesh@10GR2>

Now re-running the queries I see different plans.

rajesh@10GR2> set autotrace traceonly explain;
rajesh@10GR2> select *
  2  from t
  3  where id = 5;

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1376202287
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    97 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |     1 |    97 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IND |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"=5)

rajesh@10GR2>
rajesh@10GR2> select *
  2  from t
  3  where id = 1;

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 14484 |  1372K|   186   (2)| 00:00:03 |
|*  1 |  TABLE ACCESS FULL| T    | 14484 |  1372K|   186   (2)| 00:00:03 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ID"=1)

rajesh@10GR2> set autotrace off;
rajesh@10GR2>
rajesh@10GR2>

This shows that based on predicate usage, optimizer is getting smarter and comming up with better plans. The "magic" behind DBMS_STATS in 10g database.

( Just another update, I tried this test case in 9.2.0.8.0 database and NO changes in 9i, it looks like "magic" with 10g database)

This clearly shows that Histograms are not gathered properly, untill we query the data !