The advantage of using AUTO sample size over a fixed number is two-folds
- First, when AUTO sample size is specified, the system automatically determines the appropriate sampling percentage
- Second, AUTO sample size is more flexible than a fixed sampling percentage. A fixed sampling percentage size that was good at some point in time may not be appropriate after the data distribution in the table has changed. On the other hand when AUTO value is used Oracle will adjust the sample size when the data distribution changes
In Oracle 11g, we improved the behavior when the AUTO value is used.
- First, AUTO sampling now generates deterministic statistics.
- Second, and more importantly, AUTO sampling generates statistics that are almost as accurate as 100% sampling but takes much less time than 100% sampling.
rajesh@11GR2> create table big_table
2 nologging
3 as
4 select *
5 from all_objects a;
Table created.
Elapsed: 00:00:06.83
rajesh@11GR2> begin
2 for i in 1..6
3 loop
4 insert /*+ append */ into big_table
5 select * from big_table;
6 commit;
7 end loop;
8 end;
9 /
PL/SQL procedure successfully completed.
Elapsed: 00:01:36.54
rajesh@11GR2> select /*+ parallel(ir) */ count(*) from big_table ir;
COUNT(*)
----------
6602624
Elapsed: 00:00:18.82
rajesh@11GR2> select dbms_stats.get_param('DEGREE') from dual;
DBMS_STATS.GET_PARAM('DEGREE')
-----------------------------------------------------------------------------------------------------------------------------------
NULL
So, i don't have any parallelism set at system level.
rajesh@11GR2> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'BIG_TABLE',estimate_percent=>1);
PL/SQL procedure successfully completed.
Elapsed: 00:00:24.03
rajesh@11GR2> select column_name, num_distinct, density, num_nulls, num_buckets, histogram
2 from user_tab_col_statistics
3 where table_name ='BIG_TABLE'
4 order by 1
5 /
COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ---------- ----------- ---------------
CREATED 4674 .00021395 0 1 NONE
DATA_OBJECT_ID 8912 .000112208 5569800 1 NONE
GENERATED 2 .5 0 1 NONE
LAST_DDL_TIME 4779 .000209249 0 1 NONE
OBJECT_ID 56444 .000017717 0 1 NONE
OBJECT_NAME 29502 .000033896 0 1 NONE
OBJECT_TYPE 37 .027027027 0 1 NONE
OWNER 29 .034482759 0 1 NONE
SECONDARY 1 1 0 1 NONE
STATUS 2 .5 0 1 NONE
SUBOBJECT_NAME 1052 .00095057 6192300 1 NONE
TEMPORARY 2 .5 0 1 NONE
TIMESTAMP 4830 .000207039 0 1 NONE
13 rows selected.
Now analyzing table using dbms_stats.Auto_sample_size parameter
rajesh@11GR2> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'BIG_TABLE',estimate_percent=>dbms_stats.auto_sample_size);
PL/SQL procedure successfully completed.
Elapsed: 00:01:22.16
rajesh@11GR2>
rajesh@11GR2>
rajesh@11GR2> select column_name, num_distinct, density, num_nulls, num_buckets, histogram
2 from user_tab_col_statistics
3 where table_name ='BIG_TABLE'
4 order by 1
5 /
COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ---------- ----------- ---------------
CREATED 5536 .000180636 0 1 NONE
DATA_OBJECT_ID 8903 .000112322 5567257 1 NONE
GENERATED 2 .5 0 1 NONE
LAST_DDL_TIME 5690 .000175747 0 1 NONE
OBJECT_ID 56588 .000017672 0 1 NONE
OBJECT_NAME 31850 .000031397 0 1 NONE
OBJECT_TYPE 40 .025 0 1 NONE
OWNER 29 .034482759 0 1 NONE
SECONDARY 1 1 0 1 NONE
STATUS 2 .5 0 1 NONE
SUBOBJECT_NAME 1136 .000880282 6196071 1 NONE
TEMPORARY 2 .5 0 1 NONE
TIMESTAMP 5774 .00017319 0 1 NONE
13 rows selected.
Elapsed: 00:00:00.07
Now analyzing table ( big_table ) using estimate_percent = 100
rajesh@11GR2> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'BIG_TABLE',estimate_percent=>100);
PL/SQL procedure successfully completed.
Elapsed: 00:01:36.38
rajesh@11GR2>
rajesh@11GR2>
rajesh@11GR2>
rajesh@11GR2> select column_name, num_distinct, density, num_nulls, num_buckets, histogram
2 from user_tab_col_statistics
3 where table_name ='BIG_TABLE'
4 order by 1
5 /
COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ---------- ----------- ---------------
CREATED 5536 .000180636 0 1 NONE
DATA_OBJECT_ID 8903 .000112322 5567257 1 NONE
GENERATED 2 .5 0 1 NONE
LAST_DDL_TIME 5690 .000175747 0 1 NONE
OBJECT_ID 56291 .000017765 0 1 NONE
OBJECT_NAME 31815 .000031432 0 1 NONE
OBJECT_TYPE 40 .025 0 1 NONE
OWNER 29 .034482759 0 1 NONE
SECONDARY 1 1 0 1 NONE
STATUS 2 .5 0 1 NONE
SUBOBJECT_NAME 1136 .000880282 6196071 1 NONE
TEMPORARY 2 .5 0 1 NONE
TIMESTAMP 5774 .00017319 0 1 NONE
13 rows selected.
Elapsed: 00:00:00.42
rajesh@11GR2>
Now the results are very promising, AUTO sampling generates statistics that are almost as accurate as 100% sampling but takes much less time than 100% sampling.
Repeated the above test in Oracle 10.2.0.1.0 instance and AUTO sampling generates statistics that are far way from the 100% sampling.
rajesh@10GR2> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'BIG_TABLE',estimate_percent=>dbms_stats.auto_sample_size);
PL/SQL procedure successfully completed.
Elapsed: 00:00:28.87
rajesh@10GR2> select column_name, num_distinct, density, num_nulls, num_buckets, histogram
2 from user_tab_col_statistics
3 where table_name ='BIG_TABLE'
4 order by 1
5 /
COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ---------- ----------- ---------------
CREATED 1865 .000536193 0 1 NONE
DATA_OBJECT_ID 9065 .000110314 5573646 1 NONE
GENERATED 2 .5 0 1 NONE
LAST_DDL_TIME 1839 .000543774 0 1 NONE
OBJECT_ID 56711 .000017633 0 1 NONE
OBJECT_NAME 29007 .000034474 0 1 NONE
OBJECT_TYPE 32 .03125 0 1 NONE
OWNER 25 .04 0 1 NONE
SECONDARY 1 1 0 1 NONE
STATUS 2 .5 0 1 NONE
SUBOBJECT_NAME 1032 .000968992 6216552 1 NONE
TEMPORARY 2 .5 0 1 NONE
TIMESTAMP 1862 .000537057 0 1 NONE
13 rows selected.
Elapsed: 00:00:00.04
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'BIG_TABLE',estimate_percent=>100);
PL/SQL procedure successfully completed.
Elapsed: 00:01:00.74
rajesh@10GR2> select column_name, num_distinct, density, num_nulls, num_buckets, histogram
2 from user_tab_col_statistics
3 where table_name ='BIG_TABLE'
4 order by 1
5 /
COLUMN_NAME NUM_DISTINCT DENSITY NUM_NULLS NUM_BUCKETS HISTOGRAM
------------------------------ ------------ ---------- ---------- ----------- ---------------
CREATED 5536 .000180636 0 1 NONE
DATA_OBJECT_ID 8903 .000112322 5567257 1 NONE
GENERATED 2 .5 0 1 NONE
LAST_DDL_TIME 5690 .000175747 0 1 NONE
OBJECT_ID 56291 .000017765 0 1 NONE
OBJECT_NAME 31815 .000031432 0 1 NONE
OBJECT_TYPE 40 .025 0 1 NONE
OWNER 29 .034482759 0 1 NONE
SECONDARY 1 1 0 1 NONE
STATUS 2 .5 0 1 NONE
SUBOBJECT_NAME 1136 .000880282 6196071 1 NONE
TEMPORARY 2 .5 0 1 NONE
TIMESTAMP 5774 .00017319 0 1 NONE
13 rows selected.
Elapsed: 00:00:00.06
rajesh@10GR2>
Excellent way of telling, and good article to
ReplyDeleteobtain data on the topic of my presentation subject matter, which i am going to present in school.
Also visit my homepage ... heated cat houses for outside