Top
Frequency histogram was newly introduced in 12c and it is a variation on a
Frequency histogram that ignores non-popular values that are statistically
insignificant. A top frequency histogram
can produce better histogram for highly popular values.
If a small number of
values occupies most of the rows, then creating a Frequency histogram on that
small set of values is useful even when the NDV (Number of Distinct Values) is
greater than the requested number of histogram buckets. To create a better quality
histogram for popular values, the optimizer ignores the non-popular values and
creates a frequency histogram.
The database creates a
Top frequency histogram, when the following criteria are met.
- NDV is greater than n, where n is the requested number of buckets (default 254)
- The percentage of rows occupied by Top-frequent values is greater than or equal to the threshold p where p is (1-(1/n) * 100).
- The estimate_percent parameter in dbms_stats gathering procedure should be auto_sample_size (set to default)
rajesh@ORA12C> create table t as select * from all_objects;
Table created.
rajesh@ORA12C> select num_rows from user_tables where
table_name ='T';
NUM_ROWS
----------
90215
1 row selected.
rajesh@ORA12C> select column_name,num_distinct,density,histogram
2
fromuser_tab_col_statistics
3 wheretable_name ='T'
4 andcolumn_name
='OWNER';
COLUMN_NAM NUM_DISTINCT
DENSITY HISTOGRAM
---------- ------------ ---------- ---------------
OWNER
33 .03030303 NONE
1 row selected.
rajesh@ORA12C> select owner,count(*)
2 from t
3 group by owner
4 order by 2 desc;
OWNER
COUNT(*)
-------------------- ----------
SYS
41882
PUBLIC
37076
ORDSYS
3157
APEX_040200
3025
MDSYS
1647
SYSTEM
589
XDB
401
CTXSYS
401
WMSYS
371
SH
300
DVSYS
292
ORDDATA 274
LBACSYS
237
OE
112
GSMADMIN_INTERNAL
104
DBSNMP
55
IX
48
RAJESH
38
HR
34
OLAPSYS 25
DEMO
25
OJVMSYS
22
DVF
19
FLOWS_FILES
11
ORDPLUGINS
10
PM
10
AUDSYS
9
OUTLN 8
SI_INFORMTN_SCHEMA
8
BI
8
SCOTT
6
ORACLE_OCM
6
APPQOSSYS
5
33 rows selected.
rajesh@ORA12C> begin
2 dbms_stats.gather_table_stats
3 (ownname=>user,
4 tabname=>'T',
5 method_opt=>'for columns owner
size 10’);
6 end;
7 /
PL/SQL procedure successfully completed.
rajesh@ORA12C>
So during stats gathering we requested for histogram on the column OWNER with bucket size of 10.
- OWNER column has NDV = 33 and the requested bucket size is 10. So NDV > N (where N is the number of buckets requested)
- The Top 10 values of the column OWNER corresponds to 98% rows in the table, which is greater than the arrived threshold value is ( (1 – (1/10))*100) = 90%, so obviously the rows occupied by Top-N dominates here.
- The default sample size is being used for this histogram generation
Since it matches all the above three conditions, we
end up with Top-Frequency histogram.
rajesh@ORA12C> select
column_name,num_distinct,density,histogram
2
fromuser_tab_col_statistics
3 wheretable_name ='T'
4 andcolumn_name
='OWNER';
COLUMN_NAM NUM_DISTINCT
DENSITY HISTOGRAM
---------- ------------ ---------- ---------------
OWNER 33
5.5423E-06 TOP-FREQUENCY
1 row selected.
rajesh@ORA12C>
rajesh@ORA12C> column endpoint_actual_value format a15
rajesh@ORA12C> select endpoint_number,endpoint_actual_value
2 fromuser_tab_histograms
3 wheretable_name ='T'
4 andcolumn_name
='OWNER';
ENDPOINT_NUMBER ENDPOINT_ACTUAL
--------------- ---------------
3025
APEX_040200
3426 CTXSYS
5073 MDSYS
8230 ORDSYS
45306 PUBLIC
45606 SH
87488 SYS
88077 SYSTEM
88448 WMSYS
88849 XDB
10 rows selected.
Cardinality = (Num_of_rows *Num_of_endpoints_spanned) / Total_num_of_endpoints.
rajesh@ORA12C> select count(*) from t where owner ='MDSYS';
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
1 | 6 | 480
(13)| 00:00:01 |
| 1 | SORT AGGREGATE |
| 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T |
1647 | 9882 | 480
(13)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -
filter("OWNER"='MDSYS')
So taking the value of OWNER = MDSYS into
consideration and applying the above cardinality formula produces this result.
Cardinality = ((5073 - 3426) * 90215)/88847 = 1672
(that pretty much matches up with the estimated cardinality)
But in case of non-popular values, the formula from
the documentation is
Cardinality = Num_of_rows * Density,
applying this formula to our data sets produces the result as 0.5 – rounded off
to the nearestinteger the estimated cardinality should be one.
rajesh@ORA12C> select num_distinct,density,
density*90215 the_guess
2
fromuser_tab_col_statistics
3
wheretable_name ='T'
4
andcolumn_name ='OWNER' ;
NUM_DISTINCT
DENSITY THE_GUESS
------------ ---------- ----------
33
5.5423E-06 .5
1 row selected.
rajesh@ORA12C>
But the explain plan for the non-popular values
reported the estimated cardinality as 59, which seems far away from our
estimated calculations.
rajesh@ORA12C> select count(*) from t where owner ='SCOTT';
Execution Plan
----------------------------------------------------------
Plan hash value: 2966233522
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
1 | 6 | 480
(13)| 00:00:01 |
| 1 | SORT AGGREGATE |
| 1 | 6 | | |
|* 2 | TABLE ACCESS FULL| T | 59 |
354 | 480 (13)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -
filter("OWNER"='SCOTT')
So looking into 10053 trace, showed that rather than
using density from user_tab_col_statistics, optimizer used a different values
called “NewDensity” to produce the cardinality.
=====================================
Access path analysis for T
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality
Estimation for T[T]
SPD: Return code in
qosdDSDirSetup: NOCTX, estType = TABLE
Column (#1):
NewDensity:0.000658, OldDensity:0.000006
BktCnt:88849.000000, PopBktCnt:88849.000000, PopValCnt:10, NDV:33
Column (#1):
OWNER(VARCHAR2)
AvgLen: 6 NDV: 33 Nulls: 0 Density: 0.000658
Histogram:
Top-Freq #Bkts: 88849 UncompBkts: 88849 EndPtVals: 10
ActualVal: yes
Table: T Alias: T
Card: Original:
90215.000000 Rounded: 59 Computed: 59.391304 Non Adjusted: 59.391304
So applying the formula as Cardinality = NewDensity
* Num_of_rows = 0.000658 * 90215 = 59.3 (rounded to the nearest integer
produced the value as 59) – that matches up with the estimated cardinality
shown up in the explain plan.
But how does the optimizer arrived this NewDensity
value shown up in 10053 Trace?
NewDensity = sum_of_unpopular_value_occurances/ (unpopular NDV * Num_of_rows )
Applying this to our data set resulted in this.
NewDensity = (292 + 274
+ 237 + 112 + 104 + 55 + 48 + 38 + 34 + 25 + 25 + 22 + 19 + 11 + 10 + 10 + 9 +
8 + 8 + 8 + 6 + 6 + 5 ) / (
23*90215) = 0.000658
No comments:
Post a Comment