Friday, October 15, 2010

AUTO sampling statistics Improvement in Oracle 11g

It is not always easy for users to pick the appropriate sampling percentage. If the specified sampling percentage is too high, it can take longer to gather statistics. On the contray, if the data is skewed and the specified sampling percentage is too low, the resulting statistics can be inaccurate. For this reason, Oracle introduced the AUTO value for the estimate_percent parameter ( dbms_stats.auto_sample_size ).

The advantage of using AUTO sample size over a fixed number is two-folds
  1. First, when AUTO sample size is specified, the system automatically determines the appropriate sampling percentage       
  2. Second, AUTO sample size is more flexible than a fixed sampling percentage. A fixed sampling percentage size that was good at some point in time may not be appropriate after the data distribution in the table has changed. On the other hand when AUTO value is used Oracle will adjust the sample size when the data distribution changes

In Oracle 11g, we improved the behavior when the AUTO value is used.
  1. First, AUTO sampling now generates deterministic statistics. 
  2. Second, and more importantly, AUTO sampling generates statistics that are almost as accurate as 100% sampling but takes much less time than 100% sampling.

rajesh@11GR2> create table big_table
  2  nologging
  3  as
  4  select *
  5  from all_objects a;

Table created.

Elapsed: 00:00:06.83
rajesh@11GR2> begin
  2     for i in 1..6
  3     loop
  4             insert /*+ append */ into big_table
  5             select * from big_table;
  6             commit;
  7     end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

Elapsed: 00:01:36.54

rajesh@11GR2> select /*+ parallel(ir) */ count(*) from big_table ir;

  COUNT(*)
----------
   6602624

Elapsed: 00:00:18.82

rajesh@11GR2> select dbms_stats.get_param('DEGREE') from dual;

DBMS_STATS.GET_PARAM('DEGREE')
-----------------------------------------------------------------------------------------------------------------------------------
NULL

So, i don't  have any parallelism set at system level.


rajesh@11GR2> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'BIG_TABLE',estimate_percent=>1);

PL/SQL procedure successfully completed.

Elapsed: 00:00:24.03
rajesh@11GR2> select column_name, num_distinct, density, num_nulls, num_buckets, histogram
  2  from user_tab_col_statistics
  3  where table_name ='BIG_TABLE'
  4  order by 1
  5  /

COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ---------- ----------- ---------------
CREATED                                4674  .00021395          0           1 NONE
DATA_OBJECT_ID                         8912 .000112208    5569800           1 NONE
GENERATED                                 2         .5          0           1 NONE
LAST_DDL_TIME                          4779 .000209249          0           1 NONE
OBJECT_ID                             56444 .000017717          0           1 NONE
OBJECT_NAME                           29502 .000033896          0           1 NONE
OBJECT_TYPE                              37 .027027027          0           1 NONE
OWNER                                    29 .034482759          0           1 NONE
SECONDARY                                 1          1          0           1 NONE
STATUS                                    2         .5          0           1 NONE
SUBOBJECT_NAME                         1052  .00095057    6192300           1 NONE
TEMPORARY                                 2         .5          0           1 NONE
TIMESTAMP                              4830 .000207039          0           1 NONE

13 rows selected.

Now analyzing table using dbms_stats.Auto_sample_size parameter

rajesh@11GR2> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'BIG_TABLE',estimate_percent=>dbms_stats.auto_sample_size);

PL/SQL procedure successfully completed.

Elapsed: 00:01:22.16
rajesh@11GR2>
rajesh@11GR2>
rajesh@11GR2> select column_name, num_distinct, density, num_nulls, num_buckets, histogram
  2  from user_tab_col_statistics
  3  where table_name ='BIG_TABLE'
  4  order by 1
  5  /

COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ---------- ----------- ---------------
CREATED                                5536 .000180636          0           1 NONE
DATA_OBJECT_ID                         8903 .000112322    5567257           1 NONE
GENERATED                                 2         .5          0           1 NONE
LAST_DDL_TIME                          5690 .000175747          0           1 NONE
OBJECT_ID                             56588 .000017672          0           1 NONE
OBJECT_NAME                           31850 .000031397          0           1 NONE
OBJECT_TYPE                              40       .025          0           1 NONE
OWNER                                    29 .034482759          0           1 NONE
SECONDARY                                 1          1          0           1 NONE
STATUS                                    2         .5          0           1 NONE
SUBOBJECT_NAME                         1136 .000880282    6196071           1 NONE
TEMPORARY                                 2         .5          0           1 NONE
TIMESTAMP                              5774  .00017319          0           1 NONE

13 rows selected.

Elapsed: 00:00:00.07

Now analyzing table ( big_table ) using estimate_percent = 100

rajesh@11GR2> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'BIG_TABLE',estimate_percent=>100);

PL/SQL procedure successfully completed.

Elapsed: 00:01:36.38
rajesh@11GR2>
rajesh@11GR2>
rajesh@11GR2>
rajesh@11GR2> select column_name, num_distinct, density, num_nulls, num_buckets, histogram
  2  from user_tab_col_statistics
  3  where table_name ='BIG_TABLE'
  4  order by 1
  5  /

COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ---------- ----------- ---------------
CREATED                                5536 .000180636          0           1 NONE
DATA_OBJECT_ID                         8903 .000112322    5567257           1 NONE
GENERATED                                 2         .5          0           1 NONE
LAST_DDL_TIME                          5690 .000175747          0           1 NONE
OBJECT_ID                             56291 .000017765          0           1 NONE
OBJECT_NAME                           31815 .000031432          0           1 NONE
OBJECT_TYPE                              40       .025          0           1 NONE
OWNER                                    29 .034482759          0           1 NONE
SECONDARY                                 1          1          0           1 NONE
STATUS                                    2         .5          0           1 NONE
SUBOBJECT_NAME                         1136 .000880282    6196071           1 NONE
TEMPORARY                                 2         .5          0           1 NONE
TIMESTAMP                              5774  .00017319          0           1 NONE

13 rows selected.

Elapsed: 00:00:00.42
rajesh@11GR2>

Now the results are very promising, AUTO sampling generates statistics that are almost as accurate as 100% sampling but takes much less time than 100% sampling.

Repeated the above test in Oracle 10.2.0.1.0 instance and AUTO sampling generates statistics that are far way from the 100% sampling.

rajesh@10GR2> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'BIG_TABLE',estimate_percent=>dbms_stats.auto_sample_size);

PL/SQL procedure successfully completed.

Elapsed: 00:00:28.87
rajesh@10GR2> select column_name, num_distinct, density, num_nulls, num_buckets, histogram
  2  from user_tab_col_statistics
  3  where table_name ='BIG_TABLE'
  4  order by 1
  5  /

COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ---------- ----------- ---------------
CREATED                                1865 .000536193          0           1 NONE
DATA_OBJECT_ID                         9065 .000110314    5573646           1 NONE
GENERATED                                 2         .5          0           1 NONE
LAST_DDL_TIME                          1839 .000543774          0           1 NONE
OBJECT_ID                             56711 .000017633          0           1 NONE
OBJECT_NAME                           29007 .000034474          0           1 NONE
OBJECT_TYPE                              32     .03125          0           1 NONE
OWNER                                    25        .04          0           1 NONE
SECONDARY                                 1          1          0           1 NONE
STATUS                                    2         .5          0           1 NONE
SUBOBJECT_NAME                         1032 .000968992    6216552           1 NONE
TEMPORARY                                 2         .5          0           1 NONE
TIMESTAMP                              1862 .000537057          0           1 NONE

13 rows selected.

Elapsed: 00:00:00.04
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'BIG_TABLE',estimate_percent=>100);

PL/SQL procedure successfully completed.

Elapsed: 00:01:00.74
rajesh@10GR2> select column_name, num_distinct, density, num_nulls, num_buckets, histogram
  2  from user_tab_col_statistics
  3  where table_name ='BIG_TABLE'
  4  order by 1
  5  /

COLUMN_NAME                    NUM_DISTINCT    DENSITY  NUM_NULLS NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ---------- ----------- ---------------
CREATED                                5536 .000180636          0           1 NONE
DATA_OBJECT_ID                         8903 .000112322    5567257           1 NONE
GENERATED                                 2         .5          0           1 NONE
LAST_DDL_TIME                          5690 .000175747          0           1 NONE
OBJECT_ID                             56291 .000017765          0           1 NONE
OBJECT_NAME                           31815 .000031432          0           1 NONE
OBJECT_TYPE                              40       .025          0           1 NONE
OWNER                                    29 .034482759          0           1 NONE
SECONDARY                                 1          1          0           1 NONE
STATUS                                    2         .5          0           1 NONE
SUBOBJECT_NAME                         1136 .000880282    6196071           1 NONE
TEMPORARY                                 2         .5          0           1 NONE
TIMESTAMP                              5774  .00017319          0           1 NONE

13 rows selected.

Elapsed: 00:00:00.06
rajesh@10GR2>

1 comment:

  1. Excellent way of telling, and good article to
    obtain data on the topic of my presentation subject matter, which i am going to present in school.


    Also visit my homepage ... heated cat houses for outside

    ReplyDelete