Today’s post is about the effect of
real-time statistics with Histograms in place. Let’s build a test case to
demonstrate the effect.
demo@PDB19> show parameter
optimizer_real_time_statistics
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
optimizer_real_time_statistics boolean FALSE
demo@PDB19> alter session set optimizer_real_time_statistics=true;
Session altered.
demo@PDB19> create table
t(
2 n1 number primary key,
3 n2 number ,
4 n3 varchar2(80) );
Table created.
demo@PDB19>
demo@PDB19> insert into t(n1,n2,n3)
2 with rws as ( select rownum n from dual connect by level <= 1000 )
3 select n, ceil(sqrt(n)),
4 to_char(to_date(n,'j'),'jsp')
5 from rws;
1000 rows created.
demo@PDB19> commit;
Commit complete.
demo@PDB19> begin
2 dbms_Stats.gather_table_stats(user,'T',
3 no_invalidate=>false,
4 method_opt=>'for all columns size auto for columns N2 size 2048');
5 end;
6 /
PL/SQL procedure successfully
completed.
So we got a table created with small set
of data in it and build a histogram on
the column N2. The number of distinct values in the column is 32, which is way
less than the number of buckets (2048) we requested, so we easily ended up with
a nice Frequency histogram.
demo@PDB19>
select column_name,num_nulls,num_distinct,low_value,get_stats_val(data_type,low_value)
low_val,
2 high_value,get_stats_val(data_type,high_value) high_val, histogram
3 from user_tab_columns
4 where table_name ='T';
COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL
HIGH_VALUE HIGH_VAL HISTOGRAM
---------- ---------- ------------ ---------- ---------- ------------ --------------- ---------
N1 0 1000 C102 1 C20B 1000 NONE
N2 0 32 C102 1 C121 32 FREQUENCY
N3 0 1000 6569676874 eight 74776F206875 two hundred two NONE
demo@PDB19> select
endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='N2'
5 order by endpoint_value;
ENDPOINT_NUMBER
ENDPOINT_VALUE
--------------- --------------
1 1
4 2
9 3
16 4
25 5
36 6
49 7
64 8
81 9
100 10
121 11
144 12
169 13
196 14
225 15
256 16
289 17
324 18
361 19
400 20
441 21
484 22
529 23
576 24
625 25
676 26
729 27
784 28
841 29
900 30
961 31
1000 32
32 rows selected.
Let’s do few more conventional load for
the real-time statistics to kick in.
demo@PDB19> insert into
t(n1,n2,n3)
2 with rws as ( select rownum+10000 n from dual connect by level <= 10000 )
3 select n, ceil(sqrt(n)), to_char(to_date(n,'j'),'jsp')
4 from rws;
10000 rows created.
demo@PDB19> commit;
Commit complete.
With monitoring
information not yet flushed we got ended up with Real-Time column level
statistics
demo@PDB19>
select num_rows,blocks,monitoring
2 from user_tables
3 where table_name ='T';
NUM_ROWS
BLOCKS MON
---------- ---------- ---
1000 5 YES
demo@PDB19>
select num_rows,blocks,stale_stats,notes
2 from user_tab_statistics
3 where table_name ='T';
NUM_ROWS
BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
1000 5 YES
demo@PDB19>
select column_name,num_nulls,num_distinct,low_value,get_stats_val(data_type,low_value)
low_val,
2 high_value,get_stats_val(data_type,high_value) high_val, histogram
3 from user_tab_columns
4 where table_name ='T';
COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL
HIGH_VALUE HIGH_VAL HISTOGRAM
---------- ---------- ------------ ---------- ---------- ----------- --------------- ---------
N1 0 1000 C102 1 C20B 1000 NONE
N2 0 32 C102 1 C121 32 FREQUENCY
N3 0 1000 6569676874 eight 74776F20687 two hundred two NONE
demo@PDB19>
select t1.column_name,t1.num_nulls,t1.num_distinct,t1.low_value,get_stats_val(t2.data_type,t1.low_value)
low_val,
2 t1.high_value,get_stats_val(t2.data_type,t1.high_value) high_val, t1.histogram , t1.notes
3 from user_tab_col_statistics t1,
4 user_tab_columns t2
5 where t1.table_name ='T'
6 and t1.table_name = t2.table_name
7 and t1.column_name = t2.column_name;
COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL
HIGH_VALUE HIGH_VAL HISTOGRAM NOTES
---------- ---------- ------------ ---------- ---------- ------------ --------------- ---------- -------------------------
N1 0 1000 C102 1 C20B 1000 NONE
N2 0 32 C102 1 C121 32 FREQUENCY
N3 0 1000 6569676874 eight 74776F206875 two hundred two NONE
N1 0 C102 1 C30264 19900 STATS_ON_CONVENTIONAL_DML
N2 0 C102 1 C2022B 142 STATS_ON_CONVENTIONAL_DML
N3 0 6569676874 eight 74776F206875 two hundred two STATS_ON_CONVENTIONAL_DML
6 rows selected.
demo@PDB19> select
endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='N2'
5 order by endpoint_value;
ENDPOINT_NUMBER
ENDPOINT_VALUE
--------------- --------------
1 1
4 2
9 3
16 4
25 5
36 6
49 7
64 8
81 9
100 10
121 11
144 12
169 13
196 14
225 15
256 16
289 17
324 18
361 19
400 20
441 21
484 22
529 23
576 24
625 25
676 26
729 27
784 28
841 29
900 30
961 31
1000 32
32 rows selected.
Now parsing the
sql against the table produces the plan like this
demo@PDB19>
set serveroutput off
demo@PDB19> select /*+ gather_plan_statistics */ count(*) from t where n2 < 42;
COUNT(*)
----------
1000
demo@PDB19>
select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 469b54saa49vu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t where n2 < 42
Plan hash value:
2966233522
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 83 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 83 |
|* 2 | TABLE ACCESS STORAGE FULL| T | 1 | 1000 | 1000 |00:00:00.01 | 83 |
---------------------------------------------------------------------------------------------
Predicate
Information (identified by operation id):
---------------------------------------------------
2 - storage("N2"<42)
filter("N2"<42)
Note
-----
- dynamic statistics used: statistics for conventional DML
All the values
in the histogram are < 42, so the optimizer think it will return all rows
from the table. Hence the estimated cardinality was reported as 1000 , that
maps to the number of rows statistics available in the data dictionary.
Lets flush the
monitoring info manually and verify the statistics
demo@PDB19>
exec dbms_stats.flush_database_monitoring_info;
PL/SQL procedure
successfully completed.
demo@PDB19>
demo@PDB19> select num_rows,blocks,monitoring
2 from user_tables
3 where table_name ='T';
NUM_ROWS
BLOCKS MON
---------- ---------- ---
1000 5 YES
demo@PDB19>
demo@PDB19> select num_rows,blocks,stale_stats,notes
2 from user_tab_statistics
3 where table_name ='T';
NUM_ROWS
BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
1000 5 YES
11000 80 STATS_ON_CONVENTIONAL_DML
demo@PDB19>
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,get_stats_val(data_type,low_value) low_val,
2 high_value,get_stats_val(data_type,high_value) high_val, histogram
3 from user_tab_columns
4 where table_name ='T';
COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL HIGH_VALUE HIGH_VAL HISTOGRAM
---------- ---------- ------------ ---------- ---------- ----------- ----------- ----------
N1 0 1000 C102 1 C20B 1000 NONE
N2 0 32 C102 1 C121 32 FREQUENCY
N3 0 1000 6569676874 eight 74776F20687 two hundred NONE
demo@PDB19>
demo@PDB19> select t1.column_name,t1.num_nulls,t1.num_distinct,t1.low_value,get_stats_val(t2.data_type,t1.low_value) low_val,
2 t1.high_value,get_stats_val(t2.data_type,t1.high_value) high_val, t1.histogram , t1.notes
3 from user_tab_col_statistics t1,
4 user_tab_columns t2
5 where t1.table_name ='T'
6 and t1.table_name = t2.table_name
7 and t1.column_name = t2.column_name;
COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL
HIGH_VALUE HIGH_VAL HISTOGRAM NOTES
---------- ---------- ------------ ---------- ---------- ----------- ----------- --------------- -------------------------
N1 0 1000 C102 1 C20B 1000 NONE
N2 0 32 C102 1 C121 32 FREQUENCY
N3 0 1000 6569676874 eight 74776F20687 two hundred NONE
N1 0 C102 1 C3026222 19733 STATS_ON_CONVENTIONAL_DML
N2 0 C102 1 C2022A 141 STATS_ON_CONVENTIONAL_DML
N3 0 6569676874 eight 74776F20687 two hundred STATS_ON_CONVENTIONAL_DML
6 rows selected.
demo@PDB19>
demo@PDB19> select endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='N2'
5 order by endpoint_value;
ENDPOINT_NUMBER
ENDPOINT_VALUE
--------------- --------------
1 1
4 2
9 3
16 4
25 5
36 6
49 7
64 8
81 9
100 10
121 11
144 12
169 13
196 14
225 15
256 16
289 17
324 18
361 19
400 20
441 21
484 22
529 23
576 24
625 25
676 26
729 27
784 28
841 29
900 30
961 31
1000 32
32 rows selected.
demo@PDB19>
Real time
statistics update the table and column level statistics, but not the
histograms, now parsing the sql against the fresh set of statistics will
produce the plan like this
demo@PDB19>
select /*+ gather_plan_statistics */ count(*) from t t1 where n2 < 42;
COUNT(*)
----------
1000
demo@PDB19>
select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID 8bpt64wrnspbu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t t1 where n2 < 42
Plan hash value:
2966233522
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 83 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 83 |
|* 2 | TABLE ACCESS STORAGE FULL| T | 1 | 11000 | 1000 |00:00:00.01 | 83 |
---------------------------------------------------------------------------------------------
Predicate
Information (identified by operation id):
---------------------------------------------------
2 - storage("N2"<42)
filter("N2"<42)
Note
-----
- dynamic statistics used: statistics for conventional DML
24 rows selected.
The optimizer
knows that there are 11000 rows in the table, but the histogram still says they
are all in the range 1 to 32, even though the column high value statistics represent
141. So it estimates the cardinality as the number of rows in the table.
The key thing
to note here is Real time statistics do update the table and column level
statistics but not the histograms & this will be a viable problem if your
queries are heavily dependent on histograms for optimization.
------------------------------------ ----------- ------------------------------
optimizer_real_time_statistics boolean FALSE
demo@PDB19> alter session set optimizer_real_time_statistics=true;
2 n1 number primary key,
3 n2 number ,
4 n3 varchar2(80) );
demo@PDB19> insert into t(n1,n2,n3)
2 with rws as ( select rownum n from dual connect by level <= 1000 )
3 select n, ceil(sqrt(n)),
4 to_char(to_date(n,'j'),'jsp')
5 from rws;
2 dbms_Stats.gather_table_stats(user,'T',
3 no_invalidate=>false,
4 method_opt=>'for all columns size auto for columns N2 size 2048');
5 end;
6 /
2 high_value,get_stats_val(data_type,high_value) high_val, histogram
3 from user_tab_columns
4 where table_name ='T';
---------- ---------- ------------ ---------- ---------- ------------ --------------- ---------
N1 0 1000 C102 1 C20B 1000 NONE
N2 0 32 C102 1 C121 32 FREQUENCY
N3 0 1000 6569676874 eight 74776F206875 two hundred two NONE
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='N2'
5 order by endpoint_value;
--------------- --------------
1 1
4 2
9 3
16 4
25 5
36 6
49 7
64 8
81 9
100 10
121 11
144 12
169 13
196 14
225 15
256 16
289 17
324 18
361 19
400 20
441 21
484 22
529 23
576 24
625 25
676 26
729 27
784 28
841 29
900 30
961 31
1000 32
2 with rws as ( select rownum+10000 n from dual connect by level <= 10000 )
3 select n, ceil(sqrt(n)), to_char(to_date(n,'j'),'jsp')
4 from rws;
2 from user_tables
3 where table_name ='T';
---------- ---------- ---
1000 5 YES
2 from user_tab_statistics
3 where table_name ='T';
---------- ---------- ------- -------------------------
1000 5 YES
2 high_value,get_stats_val(data_type,high_value) high_val, histogram
3 from user_tab_columns
4 where table_name ='T';
---------- ---------- ------------ ---------- ---------- ----------- --------------- ---------
N1 0 1000 C102 1 C20B 1000 NONE
N2 0 32 C102 1 C121 32 FREQUENCY
N3 0 1000 6569676874 eight 74776F20687 two hundred two NONE
2 t1.high_value,get_stats_val(t2.data_type,t1.high_value) high_val, t1.histogram , t1.notes
3 from user_tab_col_statistics t1,
4 user_tab_columns t2
5 where t1.table_name ='T'
6 and t1.table_name = t2.table_name
7 and t1.column_name = t2.column_name;
---------- ---------- ------------ ---------- ---------- ------------ --------------- ---------- -------------------------
N1 0 1000 C102 1 C20B 1000 NONE
N2 0 32 C102 1 C121 32 FREQUENCY
N3 0 1000 6569676874 eight 74776F206875 two hundred two NONE
N1 0 C102 1 C30264 19900 STATS_ON_CONVENTIONAL_DML
N2 0 C102 1 C2022B 142 STATS_ON_CONVENTIONAL_DML
N3 0 6569676874 eight 74776F206875 two hundred two STATS_ON_CONVENTIONAL_DML
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='N2'
5 order by endpoint_value;
--------------- --------------
1 1
4 2
9 3
16 4
25 5
36 6
49 7
64 8
81 9
100 10
121 11
144 12
169 13
196 14
225 15
256 16
289 17
324 18
361 19
400 20
441 21
484 22
529 23
576 24
625 25
676 26
729 27
784 28
841 29
900 30
961 31
1000 32
demo@PDB19> select /*+ gather_plan_statistics */ count(*) from t where n2 < 42;
----------
1000
-------------------------------------------------------------------------
SQL_ID 469b54saa49vu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t where n2 < 42
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 83 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 83 |
|* 2 | TABLE ACCESS STORAGE FULL| T | 1 | 1000 | 1000 |00:00:00.01 | 83 |
---------------------------------------------------
filter("N2"<42)
-----
- dynamic statistics used: statistics for conventional DML
demo@PDB19> select num_rows,blocks,monitoring
2 from user_tables
3 where table_name ='T';
---------- ---------- ---
1000 5 YES
demo@PDB19> select num_rows,blocks,stale_stats,notes
2 from user_tab_statistics
3 where table_name ='T';
---------- ---------- ------- -------------------------
1000 5 YES
11000 80 STATS_ON_CONVENTIONAL_DML
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,get_stats_val(data_type,low_value) low_val,
2 high_value,get_stats_val(data_type,high_value) high_val, histogram
3 from user_tab_columns
4 where table_name ='T';
COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL HIGH_VALUE HIGH_VAL HISTOGRAM
---------- ---------- ------------ ---------- ---------- ----------- ----------- ----------
N1 0 1000 C102 1 C20B 1000 NONE
N2 0 32 C102 1 C121 32 FREQUENCY
N3 0 1000 6569676874 eight 74776F20687 two hundred NONE
demo@PDB19> select t1.column_name,t1.num_nulls,t1.num_distinct,t1.low_value,get_stats_val(t2.data_type,t1.low_value) low_val,
2 t1.high_value,get_stats_val(t2.data_type,t1.high_value) high_val, t1.histogram , t1.notes
3 from user_tab_col_statistics t1,
4 user_tab_columns t2
5 where t1.table_name ='T'
6 and t1.table_name = t2.table_name
7 and t1.column_name = t2.column_name;
---------- ---------- ------------ ---------- ---------- ----------- ----------- --------------- -------------------------
N1 0 1000 C102 1 C20B 1000 NONE
N2 0 32 C102 1 C121 32 FREQUENCY
N3 0 1000 6569676874 eight 74776F20687 two hundred NONE
N1 0 C102 1 C3026222 19733 STATS_ON_CONVENTIONAL_DML
N2 0 C102 1 C2022A 141 STATS_ON_CONVENTIONAL_DML
N3 0 6569676874 eight 74776F20687 two hundred STATS_ON_CONVENTIONAL_DML
demo@PDB19> select endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='N2'
5 order by endpoint_value;
--------------- --------------
1 1
4 2
9 3
16 4
25 5
36 6
49 7
64 8
81 9
100 10
121 11
144 12
169 13
196 14
225 15
256 16
289 17
324 18
361 19
400 20
441 21
484 22
529 23
576 24
625 25
676 26
729 27
784 28
841 29
900 30
961 31
1000 32
----------
1000
--------------------------------------------------------------------------------------
SQL_ID 8bpt64wrnspbu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t t1 where n2 < 42
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 83 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 83 |
|* 2 | TABLE ACCESS STORAGE FULL| T | 1 | 11000 | 1000 |00:00:00.01 | 83 |
---------------------------------------------------
filter("N2"<42)
-----
- dynamic statistics used: statistics for conventional DML