Wednesday, October 27, 2021

Real Time statistics - Part III

Having discussed real-time statistics at length in the previous blog post, it is time to look at mixed workloads in the context of real-time statistics.
 
demo@PDB19> show parameter optimizer_real_time_statistics
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_real_time_statistics       boolean     FALSE
demo@PDB19> alter session set optimizer_real_time_statistics=true;
 
Session altered.
demo@PDB19> create table t(
  2     n1 number primary key,
  3     n2 number ,
  4     n3 varchar2(80) );
 
Table created.
demo@PDB19> insert /*+ append */ into t(n1,n2,n3)
  2  with rws as ( select rownum n from dual connect by level <= 1000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
1000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
We created an initial table and performed a direct-path load on it, so that basic statistics are available.
 
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          8 NO
 
Next, some conventional-path DML to get real-time statistics to kick in.
 
demo@PDB19> insert into t(n1,n2,n3)
  2  with rws as ( select rownum+1000 n from dual connect by level <= 10000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
10000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
demo@PDB19> select num_rows,blocks,monitoring
  2  from user_tables
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS MON
---------- ---------- ---
      1000          8 YES
 
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          8 YES
 
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value
  2  from user_tab_columns
  3  where table_name ='T';
 
COLUMN_NAM  NUM_NULLS NUM_DISTINCT LOW_VALUE  HIGH_VALUE
---------- ---------- ------------ ---------- ----------
N1                  0         1000 C102       C20B
N2                  0           32 C102       C121
N3                  0         1000 6569676874 74776F2068
                                              756E647265
                                              642074776F
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value,notes
  2  from user_tab_col_statistics
  3  where table_name ='T';
 
COLUMN_NAM  NUM_NULLS NUM_DISTINCT LOW_VALUE  HIGH_VALUE NOTES
---------- ---------- ------------ ---------- ---------- -------------------------
N1                  0         1000 C102       C20B       STATS_ON_LOAD
N2                  0           32 C102       C121       STATS_ON_LOAD
N3                  0         1000 6569676874 74776F2068 STATS_ON_LOAD
                                              756E647265
                                              642074776F
 
N1                  0              C102       C3020A02   STATS_ON_CONVENTIONAL_DML
N2                  0              C102       C20206     STATS_ON_CONVENTIONAL_DML
N3                  0              6569676874 74776F2074 STATS_ON_CONVENTIONAL_DML
                                              686F757361
                                              6E64207477
                                              6F2068756E
                                              6472656420
                                              6569676874
 
So without flushing the database monitoring info we get only the additional column-level statistics; no table-level real-time statistics are available yet.
 
demo@PDB19> set serveroutput off
demo@PDB19> select /*+ gather_plan_statistics */ count(*) from t where n2 <= 42;
 
  COUNT(*)
----------
      1764
 
demo@PDB19> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID  99qz8xwvgmh1s, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t where n2 <= 42
 
Plan hash value: 2966233522
 
------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |      1 |00:00:00.01 |      75 |      5 |
|   1 |  SORT AGGREGATE            |      |      1 |      1 |      1 |00:00:00.01 |      75 |      5 |
|*  2 |   TABLE ACCESS STORAGE FULL| T    |      1 |    425 |   1764 |00:00:00.01 |      75 |      5 |
------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - storage("N2"<=42)
       filter("N2"<=42)
 
Note
-----
   - dynamic statistics used: statistics for conventional DML
 
 
24 rows selected.
 
We have column statistics from conventional DML, but the table-level row count is still wrong. The SQL above was optimized using the real-time statistics available at the column level.
 
The column N2 has its low and high values recorded as C102 and C20206, which translate to the numeric values 1 and 105.
 
demo@XEPDB1> variable n number
demo@XEPDB1> exec dbms_stats.convert_raw_value('C102',:n);
 
PL/SQL procedure successfully completed.
 
demo@XEPDB1> print n
 
         N
----------
         1
 
demo@XEPDB1> exec dbms_stats.convert_raw_value('C20206',:n);
 
PL/SQL procedure successfully completed.
 
demo@XEPDB1> print n
 
         N
----------
       105
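dbms_stats.convert_raw_value is the supported way to decode these raw values, but for positive integers Oracle's internal NUMBER encoding is simple enough to decode by hand: the first byte is a base-100 exponent biased by 0xC1, and each remaining byte holds one base-100 digit stored as digit + 1. The helper below is just an illustrative sketch of that encoding (not an official API, and positive integers only):

```python
def decode_positive_number(raw_hex):
    """Decode Oracle's internal NUMBER format for positive integers.

    First byte: base-100 exponent biased by 0xC1.
    Remaining bytes: base-100 digits, each stored as digit + 1.
    """
    b = bytes.fromhex(raw_hex)
    exponent = b[0] - 0xC1          # power of 100 of the leading digit
    value = 0
    for i, digit_byte in enumerate(b[1:]):
        value += (digit_byte - 1) * 100 ** (exponent - i)
    return value

# The low/high values reported for column N2 above:
print(decode_positive_number('C102'))    # -> 1
print(decode_positive_number('C20206'))  # -> 105
```

The same routine decodes the other raw values shown earlier, for example C20B (the STATS_ON_LOAD high value of n1) gives 1000 and C121 (the high value of n2) gives 32.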
 
The value of 425 reported as the cardinality in the plan above is due to the mix of table-level and column-level statistics:

Cardinality = Selectivity * Num_Rows
Selectivity = (required range / available range) + 1/NDV  (closed one-sided range, hence the extra 1/NDV)

Selectivity = (42 - 1) / (105 - 1) + 1/32 = 0.425480769
Cardinality = 0.425480769 * 1000 = 425.48
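The same arithmetic can be checked in a few lines; the inputs are the values read from the dictionary views above:

```python
# Optimizer estimate for "n2 <= 42" built from mixed statistics:
# real-time column stats give low=1, high=105 (C102/C20206), while the
# NDV (32) and the row count (1000) still come from the stale load stats.
low, high, ndv, num_rows = 1, 105, 32, 1000

selectivity = (42 - low) / (high - low) + 1 / ndv
cardinality = selectivity * num_rows
print(round(selectivity, 9), round(cardinality))   # 0.425480769 425
```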
 
We could wait for the monitoring information to flush naturally, or we can flush it manually.
 
demo@PDB19> exec dbms_stats.flush_database_monitoring_info;
 
PL/SQL procedure successfully completed.
 
demo@PDB19> select num_rows,blocks,monitoring
  2  from user_tables
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS MON
---------- ---------- ---
      1000          8 YES
 
demo@PDB19>
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          8 YES
     11000         76         STATS_ON_CONVENTIONAL_DML
 
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value
  2  from user_tab_columns
  3  where table_name ='T';
 
COLUMN_NAM  NUM_NULLS NUM_DISTINCT LOW_VALUE  HIGH_VALUE
---------- ---------- ------------ ---------- ----------
N1                  0         1000 C102       C20B
N2                  0           32 C102       C121
N3                  0         1000 6569676874 74776F2068
                                              756E647265
                                              642074776F
 
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value,notes
  2  from user_tab_col_statistics
  3  where table_name ='T';
 
COLUMN_NAM  NUM_NULLS NUM_DISTINCT LOW_VALUE  HIGH_VALUE NOTES
---------- ---------- ------------ ---------- ---------- -------------------------
N1                  0         1000 C102       C20B       STATS_ON_LOAD
N2                  0           32 C102       C121       STATS_ON_LOAD
N3                  0         1000 6569676874 74776F2068 STATS_ON_LOAD
                                              756E647265
                                              642074776F
 
N1                  0              C102       C3020A02   STATS_ON_CONVENTIONAL_DML
N2                  0              C102       C20206     STATS_ON_CONVENTIONAL_DML
N3                  0              6569676874 74776F2074 STATS_ON_CONVENTIONAL_DML
                                              686F757361
                                              6E64207477
                                              6F2068756E
                                              6472656420
                                              6569676874
 
Now a new SQL statement, given a table alias (t1) so that it parses as a fresh cursor, is optimized against this fresh set of statistics and produces new estimates.
 
demo@PDB19> select /*+ gather_plan_statistics */ count(*) from t t1 where n2 <= 42;
 
  COUNT(*)
----------
      1764
 
demo@PDB19> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID  f7d7dtpdauwcb, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t t1 where n2 <= 42
 
Plan hash value: 2966233522
 
---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |      1 |00:00:00.01 |      75 |
|   1 |  SORT AGGREGATE            |      |      1 |      1 |      1 |00:00:00.01 |      75 |
|*  2 |   TABLE ACCESS STORAGE FULL| T    |      1 |   4680 |   1764 |00:00:00.01 |      75 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - storage("N2"<=42)
       filter("N2"<=42)
 
Note
-----
   - dynamic statistics used: statistics for conventional DML
 
Creation of real-time statistics does not invalidate existing cursors; the application must parse new SQL to pick them up. Still, there are several opportunities for the execution plan to change as the state of the statistics changes:

Stale statistics - underestimates rows.
Real-time column statistics - still underestimates rows.
Real-time table statistics - overestimates rows.
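The overestimate of 4680 in the last plan follows from the same formula once the flushed table-level row count of 11000 replaces the stale 1000; a quick check using the figures from the dictionary views above:

```python
# Column-level selectivity is unchanged (low=1, high=105, NDV=32),
# but the row count now comes from STATS_ON_CONVENTIONAL_DML (11000).
selectivity = (42 - 1) / (105 - 1) + 1 / 32
print(round(selectivity * 11000))   # 4680
```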
      
 


Friday, October 22, 2021

Real Time statistics - Part II

Having discussed the importance of real-time statistics and how they help the optimizer produce better estimates in the previous blog post, it is time to take a deeper look at the feature.
 
In this blog post we will see how real-time statistics behave when no base statistics are in place.
 
demo@PDB19> create table t(
  2     n1 number primary key,
  3     n2 number ,
  4     n3 varchar2(80) );
 
Table created.
 
demo@PDB19> insert into t(n1,n2,n3)
  2  with rws as ( select rownum n from dual connect by level <= 1000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
1000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
So we have a table created and loaded with sample data via conventional-path DML.
 
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
 
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value,notes
  2  from user_tab_col_statistics
  3  where table_name ='T';
 
no rows selected
 
demo@PDB19>
 
We don't have basic statistics in the data dictionary, since the load above was not a direct-path load and we made no calls to the dbms_stats package.
 
Now let's do a few more loads into this table.
 
demo@PDB19> insert into t(n1,n2,n3)
  2  with rws as ( select rownum+1000 n from dual connect by level <= 10000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
10000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
And the explain plan for a similar conventional-path insert shows this:
 
demo@PDB19> explain plan for
  2  insert into t(n1,n2,n3)
  3  with rws as ( select rownum+1000 n from dual connect by level <= 1000 )
  4  select n, ceil(sqrt(n)),
  5     to_char(to_date(n,'j'),'jsp')
  6  from rws;
 
Explained.
 
demo@PDB19> @xplan
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------
Plan hash value: 761049541
 
-----------------------------------------------------------------------------------------
| Id  | Operation                        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |      |     1 |    13 |     2   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL         | T    |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |      |     1 |    13 |     2   (0)| 00:00:01 |
|   3 |    VIEW                          |      |     1 |    13 |     2   (0)| 00:00:01 |
|   4 |     COUNT                        |      |       |       |            |          |
|*  5 |      CONNECT BY WITHOUT FILTERING|      |       |       |            |          |
|   6 |       FAST DUAL                  |      |     1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   5 - filter(LEVEL<=1000)
 
18 rows selected.
 
The above plan shows real-time statistics being gathered during conventional DML (the OPTIMIZER STATISTICS GATHERING step), but we still don't see any real-time statistics in the dictionary.
 
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
 
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value,notes
  2  from user_tab_col_statistics
  3  where table_name ='T';
 
no rows selected
 
demo@PDB19>
 
Not even after flushing the monitoring info explicitly:
 
demo@PDB19> exec dbms_stats.flush_database_monitoring_info;
 
PL/SQL procedure successfully completed.
 
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
 
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value,notes
  2  from user_tab_col_statistics
  3  where table_name ='T';
 
no rows selected
 
demo@PDB19>
 
So just because statistics were gathered doesn't mean they were stored: if we don't have any base statistics on the underlying table, real-time statistics never kick in.

Wednesday, October 6, 2021

Real Time statistics - Part I

Keeping object statistics up to date is critical to Oracle database performance and stability. Real-time statistics, introduced in Oracle 19c, is one such feature that augments the existing statistics. However, they are not quite as real-time as the name suggests: to keep the implementation lightweight, the feature uses the table-monitoring mechanism, which also limits the amount of information that can be collected.
 
And most importantly, this feature is available only on engineered systems, so it is clearly targeted at the autonomous databases.
 
Oracle 12c introduced online statistics gathering for direct-path inserts and CREATE TABLE AS SELECT, while 19c introduced real-time statistics, which extends online statistics support to conventional DML statements.
 
Let's set up a test case to show how things worked prior to Oracle 19c, and then show how real-time statistics kick in to provide improvements in 19c and above.
 
demo@XEPDB1> create table t (
  2     c1 number constraint t_pk primary key ,
  3     c2 number );
 
Table created.
 
demo@XEPDB1> insert /*+ append */ into t(c1,c2)
  2  select rownum, rownum
  3  from dual
  4  connect by level <=10000;
 
10000 rows created.
 
demo@XEPDB1> commit;
 
Commit complete.
 
demo@XEPDB1> select num_rows,blocks,stale_stats
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S
---------- ---------- -------
     10000         22 NO
 
demo@XEPDB1> select column_name,low_value,high_value,num_distinct,notes
  2  from user_tab_col_statistics
  3  where table_name ='T';
 
COLUMN_NAM LOW_VALUE  HIGH_VALUE NUM_DISTINCT NOTES
---------- ---------- ---------- ------------ -------------------------
C1         C102       C302              10000 STATS_ON_LOAD
C2         C102       C302              10000 STATS_ON_LOAD
 
So we created a table and used a direct-path load to get statistics that are up to date with the latest data. Now let's parse a few SQL statements and validate the estimates.
 
demo@XEPDB1> select /*+ gather_plan_statistics */ max(c2), count(*)
  2  from t
  3  where c1 >= 9000;
 
   MAX(C2)   COUNT(*)
---------- ----------
     10000       1001
 
demo@XEPDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  6fktndansb5hj, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max(c2), count(*) from t where c1
>= 9000
 
Plan hash value: 2966233522
 
----------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      21 |     18 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      21 |     18 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   1001 |   1001 |00:00:00.01 |      21 |     18 |
----------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("C1">=9000)
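The estimate of 1001 matches the standard formula for a closed one-sided range, selectivity = (high - limit)/(high - low) + 1/NDV, applied to the fresh statistics; a quick check:

```python
# Estimate for "c1 >= 9000" with fresh stats:
# num_rows=10000, low=1, high=10000, NDV(c1)=10000.
low, high, ndv, num_rows = 1, 10000, 10000, 10000

selectivity = (high - 9000) / (high - low) + 1 / ndv
print(round(selectivity * num_rows))   # 1001
```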
 
With the right statistics in place, the estimates are close to the actuals. Let's do a few more data loads and verify the estimates again.
 
demo@XEPDB1> insert into t(c1,c2)
  2  select rownum+10000, rownum+10000
  3  from dual
  4  connect by level <=1000;
 
1000 rows created.
 
demo@XEPDB1> commit;
 
Commit complete.
 
demo@XEPDB1> select /*+ gather_plan_statistics */ max(c2), count(*)
  2  from t t2
  3  where c1 >= 9000;
 
   MAX(C2)   COUNT(*)
---------- ----------
     11000       2001
 
demo@XEPDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  6vsgr79023ggn, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max(c2), count(*) from t t2 where
c1 >= 9000
 
Plan hash value: 2966233522
 
-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |      30 |
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.01 |      30 |
|*  2 |   TABLE ACCESS FULL| T    |      1 |   1001 |   2001 |00:00:00.01 |      30 |
-------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter("C1">=9000)
 
 
20 rows selected.
 
demo@XEPDB1>
 
A further 1000 rows were added to the table, but those changes are not yet reflected in the statistics, and because of that the estimates are misleading.
 
But in a 19c database, the optimizer keeps track of DML changes in the data dictionary, and these statistics are fed back into the optimizer when future SQL statements are optimized.
Here is the same example performed in 19c.
 
demo@PDB19> insert into t(c1,c2)
  2  select rownum+10000, rownum+10000
  3  from dual
  4  connect by level <=1000;
 
1000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
demo@PDB19> select num_rows,blocks,notes,stale_stats
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS NOTES                     STALE_S
---------- ---------- ------------------------- -------
     10000         22                           NO
 
Here is where the 19c enhancement comes into play: when we query the data dictionary, we now have additional statistics, derived from the normal DML activity that has occurred in the database rather than from a full gather-statistics operation. Real-time statistics keep track of simple column-level statistics, such as whether the low/high values of a column have changed.
 
demo@PDB19> select column_name,low_value,high_value,num_distinct,notes
  2  from user_tab_col_statistics
  3  where table_name ='T';
 
COLUMN_NAM LOW_VALUE  HIGH_VALUE NUM_DISTINCT NOTES
---------- ---------- ---------- ------------ -------------------------
C1         C102       C302              10000 STATS_ON_LOAD
C2         C102       C302              10000 STATS_ON_LOAD
C1         C102       C3020737                STATS_ON_CONVENTIONAL_DML
C2         C102       C3020737                STATS_ON_CONVENTIONAL_DML
 
demo@PDB19> select /*+ gather_plan_statistics */ max(c2), count(*)
  2  from t t1
  3  where c1 >= 9000;
 
   MAX(C2)   COUNT(*)
---------- ----------
     11000       2001
 
demo@PDB19> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
SQL_ID  4q9b758gr91m8, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ max(c2), count(*) from t t1 where
c1 >= 9000
 
Plan hash value: 2966233522
 
---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |      1 |00:00:00.01 |      30 |
|   1 |  SORT AGGREGATE            |      |      1 |      1 |      1 |00:00:00.01 |      30 |
|*  2 |   TABLE ACCESS STORAGE FULL| T    |      1 |   1709 |   2001 |00:00:00.01 |      30 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - storage("C1">=9000)
       filter("C1">=9000)
 
Note
-----
   - dynamic statistics used: statistics for conventional DML
 
Now the estimate is much closer to the actuals. We can still expect some variation from the true values, because real-time statistics is not a full gather-statistics operation on the table; some elements of the statistics (such as NDV and histograms) are not feasible to keep up to date every time DML is performed. But even the improved row counts yield benefits for the execution plan.
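As a rough cross-check (the optimizer's exact inputs are not exposed here, so the row count below is an assumption), decoding the real-time high value C3020737 for c1 and plugging it into the usual closed-range formula lands on the 1709 estimate:

```python
# C3020737 in Oracle's internal NUMBER format: the first byte 0xC3 is a
# base-100 exponent biased by 0xC1, and the remaining bytes are base-100
# digits stored as digit + 1. Note the decoded high value is slightly
# below the true maximum of 11000; real-time stats are approximate.
high = (0x02 - 1) * 100**2 + (0x07 - 1) * 100 + (0x37 - 1)   # 10654

low, ndv = 1, 10000      # from the original STATS_ON_LOAD statistics
num_rows = 11000         # assumption: the rows actually in the table now

selectivity = (high - 9000) / (high - low) + 1 / ndv
print(high, round(selectivity * num_rows))   # 10654 1709
```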
 
More to come, stay tuned.