Keeping object statistics up to date is critical to Oracle database performance and stability. Real-time statistics, introduced in Oracle 19c, is one feature that augments the existing statistics. However, they are not quite as "real-time" as the name suggests: to keep the implementation lightweight, this feature uses the table monitoring mechanism, which limits the amount of information that can be collected. Most importantly, this feature is available only on engineered systems, so it is clearly targeted at the Autonomous databases.
Oracle 12c introduced online statistics gathering for direct path inserts and CREATE TABLE AS SELECT. Oracle 19c added real-time statistics, which extends online support to conventional DML statements. Let's set up a test case to show how things worked prior to Oracle 19c, and then show how real-time statistics kicks in to provide improvements in Oracle 19c and above.
demo@XEPDB1> create table t (
  2    c1 number constraint t_pk primary key,
  3    c2 number );

Table created.
demo@XEPDB1> insert /*+ append */ into t(c1,c2)
  2  select rownum, rownum
  3  from dual
  4  connect by level <= 10000;

10000 rows created.

demo@XEPDB1> commit;

Commit complete.
demo@XEPDB1> select num_rows, blocks, stale_stats
  2  from user_tab_statistics
  3  where table_name = 'T';

  NUM_ROWS     BLOCKS STALE_S
---------- ---------- -------
     10000         22 NO
demo@XEPDB1> select column_name, low_value, high_value, num_distinct, notes
  2  from user_tab_col_statistics
  3  where table_name = 'T';

COLUMN_NAM LOW_VALUE  HIGH_VALUE NUM_DISTINCT NOTES
---------- ---------- ---------- ------------ -------------------------
C1         C102       C302              10000 STATS_ON_LOAD
C2         C102       C302              10000 STATS_ON_LOAD
So we created a table and used direct path loads, which left the statistics up to date with the latest data. Now let's parse a few SQL statements and validate the estimates.
demo@XEPDB1> select /*+ gather_plan_statistics */ max(c2), count(*)
  2  from t
  3  where c1 >= 9000;

   MAX(C2)   COUNT(*)
---------- ----------
     10000       1001
demo@XEPDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  6fktndansb5hj, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max(c2), count(*) from t where c1
>= 9000

Plan hash value: 2966233522

----------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      21 |     18 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      21 |     18 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   1001 |   1001 |00:00:00.01 |      21 |     18 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("C1">=9000)
With the right statistics in place, the estimates were close to the actuals. Let's do a few more data loads and verify the estimates again.
demo@XEPDB1> insert into t(c1,c2)
  2  select rownum+10000, rownum+10000
  3  from dual
  4  connect by level <= 1000;

1000 rows created.

demo@XEPDB1> commit;

Commit complete.
demo@XEPDB1> select /*+ gather_plan_statistics */ max(c2), count(*)
  2  from t t2
  3  where c1 >= 9000;

   MAX(C2)   COUNT(*)
---------- ----------
     11000       2001
demo@XEPDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  6vsgr79023ggn, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max(c2), count(*) from t t2 where
c1 >= 9000

Plan hash value: 2966233522

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      30 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      30 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   1001 |   2001 |00:00:00.01 |      30 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("C1">=9000)

20 rows selected.
With that load, 1000 more rows got added to the table, but those changes were not yet reflected in the statistics, and because of that the estimates were misleading. In a 19c database, however, the optimizer keeps track of all the DML changes in the data dictionary, and these statistics are fed back into the optimizer when optimizing future SQL. Here is the same example performed in 19c.
demo@PDB19> insert into t(c1,c2)
  2  select rownum+10000, rownum+10000
  3  from dual
  4  connect by level <= 1000;

1000 rows created.

demo@PDB19> commit;

Commit complete.
demo@PDB19> select num_rows, blocks, notes, stale_stats
  2  from user_tab_statistics
  3  where table_name = 'T';

  NUM_ROWS     BLOCKS NOTES                     STALE_S
---------- ---------- ------------------------- -------
     10000         22                           NO
Here is where the 19c enhancement comes into play. When we query the data dictionary, we now have additional statistics, and these new statistics are derived from the normal DML activity that has occurred in the database rather than from a full GATHER statistics operation. Real-time statistics can keep track of simple column-level statistics, such as whether the low/high value of a column has changed.
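A note if you repeat this demo yourself: real-time statistics are flushed to the data dictionary periodically in the background, so the new rows may not show up in the dictionary views immediately after the DML. One way to make them visible right away in a test (using the standard table-monitoring flush procedure) is:

```sql
-- Real-time statistics piggyback on the table monitoring mechanism,
-- which is flushed to the dictionary periodically. Force the flush
-- by hand so the STATS_ON_CONVENTIONAL_DML rows appear immediately.
exec dbms_stats.flush_database_monitoring_info
```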
demo@PDB19> select column_name, low_value, high_value, num_distinct, notes
  2  from user_tab_col_statistics
  3  where table_name = 'T';

COLUMN_NAM LOW_VALUE  HIGH_VALUE NUM_DISTINCT NOTES
---------- ---------- ---------- ------------ -------------------------
C1         C102       C302              10000 STATS_ON_LOAD
C2         C102       C302              10000 STATS_ON_LOAD
C1         C102       C3020737                STATS_ON_CONVENTIONAL_DML
C2         C102       C3020737                STATS_ON_CONVENTIONAL_DML
demo@PDB19> select /*+ gather_plan_statistics */ max(c2), count(*)
  2  from t t1
  3  where c1 >= 9000;

   MAX(C2)   COUNT(*)
---------- ----------
     11000       2001
demo@PDB19> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
SQL_ID  4q9b758gr91m8, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max(c2), count(*) from t t1 where
c1 >= 9000

Plan hash value: 2966233522

---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |      1 |00:00:00.01 |      30 |
|   1 |  SORT AGGREGATE            |      |      1 |      1 |      1 |00:00:00.01 |      30 |
|*  2 |   TABLE ACCESS STORAGE FULL| T    |      1 |   1709 |   2001 |00:00:00.01 |      30 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("C1">=9000)
       filter("C1">=9000)

Note
-----
   - dynamic statistics used: statistics for conventional DML
Now the estimates are much closer to the actuals. We can still expect some variation from the true values, because real-time statistics is not a full gather-statistics operation on the table. Some elements of the statistics (like NDV, histograms, etc.) are not feasible to keep up to date every time DML is performed, but having improved row counts alone yields benefits to execution plans.
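If this behaviour is ever unwanted for a particular statement, Oracle documents a hint to opt out per statement (the same hint that disables online statistics gathering for direct path loads); a sketch against our demo table:

```sql
-- Skip real-time / online statistics gathering for this DML only;
-- regular table monitoring and scheduled stats jobs are unaffected.
insert /*+ no_gather_optimizer_statistics */ into t(c1,c2)
select rownum+11000, rownum+11000
from dual
connect by level <= 1000;
```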
More to come, stay tuned.