THE RAJESH BLOG: 2021

Tuesday, November 16, 2021

Real Time statistics - Part IV

Today’s post is about the effect of real-time statistics with Histograms in place. Let’s build a test case to demonstrate the effect.

demo@PDB19> show parameter optimizer_real_time_statistics

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_real_time_statistics       boolean     FALSE
demo@PDB19> alter session set optimizer_real_time_statistics=true;

Session altered.

demo@PDB19> create table t(
2     n1 number primary key,
3     n2 number ,
4     n3 varchar2(80) );

Table created.

demo@PDB19>
demo@PDB19> insert into t(n1,n2,n3)
2 with rws as ( select rownum n from dual connect by level <= 1000 )
3 select n, ceil(sqrt(n)),
4     to_char(to_date(n,'j'),'jsp')
5 from rws;

1000 rows created.

demo@PDB19> commit;

Commit complete.

demo@PDB19> begin
2     dbms_Stats.gather_table_stats(user,'T',
3             no_invalidate=>false,
4             method_opt=>'for all columns size auto for columns N2 size 2048');
5 end;
6 /

PL/SQL procedure successfully completed.

So we got a table created with small set of data in it and build a histogram on the column N2. The number of distinct values in the column is 32, which is way less than the number of buckets (2048) we requested, so we easily ended up with a nice Frequency histogram.

demo@PDB19> select column_name,num_nulls,num_distinct,low_value,get_stats_val(data_type,low_value) low_val,
2      high_value,get_stats_val(data_type,high_value) high_val, histogram
3 from user_tab_columns
4 where table_name ='T';

COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL    HIGH_VALUE   HIGH_VAL        HISTOGRAM
---------- ---------- ------------ ---------- ---------- ------------ --------------- ---------
N1                  0         1000 C102       1          C20B         1000            NONE
N2                  0           32 C102       1          C121         32              FREQUENCY
N3                  0         1000 6569676874 eight      74776F206875 two hundred two NONE

demo@PDB19> select endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='N2'
5 order by endpoint_value;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              1              1
              4              2
              9              3
             16              4
             25              5
             36              6
             49              7
             64              8
             81              9
            100             10
            121             11
            144             12
            169             13
            196             14
            225             15
            256             16
            289             17
            324             18
            361             19
            400             20
            441             21
            484             22
            529             23
            576             24
            625             25
            676             26
            729             27
            784             28
            841             29
            900             30
            961             31
           1000             32

32 rows selected.

Let’s do few more conventional load for the real-time statistics to kick in.

demo@PDB19> insert into t(n1,n2,n3)
2 with rws as ( select rownum+10000 n from dual connect by level <= 10000 )
3 select n, ceil(sqrt(n)), to_char(to_date(n,'j'),'jsp')
4 from rws;

10000 rows created.

demo@PDB19> commit;

Commit complete.

With monitoring information not yet flushed we got ended up with Real-Time column level statistics

demo@PDB19> select num_rows,blocks,monitoring
2 from user_tables
3 where table_name ='T';

NUM_ROWS     BLOCKS MON
---------- ---------- ---
      1000          5 YES

demo@PDB19> select num_rows,blocks,stale_stats,notes
2 from user_tab_statistics
3 where table_name ='T';

NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          5 YES

demo@PDB19> select column_name,num_nulls,num_distinct,low_value,get_stats_val(data_type,low_value) low_val,
2      high_value,get_stats_val(data_type,high_value) high_val, histogram
3 from user_tab_columns
4 where table_name ='T';

COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL    HIGH_VALUE HIGH_VAL        HISTOGRAM
---------- ---------- ------------ ---------- ---------- ----------- --------------- ---------
N1                  0         1000 C102       1          C20B        1000            NONE
N2                  0           32 C102       1          C121        32              FREQUENCY
N3                  0         1000 6569676874 eight      74776F20687 two hundred two NONE

demo@PDB19> select t1.column_name,t1.num_nulls,t1.num_distinct,t1.low_value,get_stats_val(t2.data_type,t1.low_value) low_val,
2      t1.high_value,get_stats_val(t2.data_type,t1.high_value) high_val, t1.histogram , t1.notes
3 from user_tab_col_statistics t1,
4       user_tab_columns t2
5 where t1.table_name ='T'
6 and t1.table_name = t2.table_name
7 and t1.column_name = t2.column_name;

COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL    HIGH_VALUE   HIGH_VAL        HISTOGRAM NOTES
---------- ---------- ------------ ---------- ---------- ------------ --------------- ---------- -------------------------
N1                  0         1000 C102       1          C20B         1000            NONE
N2                  0           32 C102       1          C121         32              FREQUENCY
N3                  0         1000 6569676874 eight      74776F206875 two hundred two NONE
N1                  0              C102       1          C30264       19900                      STATS_ON_CONVENTIONAL_DML
N2                  0              C102       1          C2022B       142                        STATS_ON_CONVENTIONAL_DML
N3                  0              6569676874 eight      74776F206875 two hundred two            STATS_ON_CONVENTIONAL_DML

6 rows selected.

demo@PDB19> select endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='N2'
5 order by endpoint_value;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              1              1
              4              2
              9              3
             16              4
             25              5
             36              6
             49              7
             64              8
             81              9
            100             10
            121             11
            144             12
            169             13
            196             14
            225             15
            256             16
            289             17
            324             18
            361             19
            400             20
            441             21
            484             22
            529             23
            576             24
            625             25
            676             26
            729             27
            784             28
            841             29
            900             30
            961             31
           1000             32

32 rows selected.

Now parsing the sql against the table produces the plan like this

demo@PDB19> set serveroutput off
demo@PDB19> select /*+ gather_plan_statistics */ count(*) from t where n2 < 42;

COUNT(*)
----------
      1000

demo@PDB19> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------
SQL_ID 469b54saa49vu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t where n2 < 42

Plan hash value: 2966233522

---------------------------------------------------------------------------------------------
| Id | Operation                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |      1 |00:00:00.01 |      83 |
|   1 | SORT AGGREGATE            |      |      1 |      1 |      1 |00:00:00.01 |      83 |
|* 2 |   TABLE ACCESS STORAGE FULL| T    |      1 |   1000 |   1000 |00:00:00.01 |      83 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("N2"<42)
       filter("N2"<42)

Note
-----
   - dynamic statistics used: statistics for conventional DML

All the values in the histogram are < 42, so the optimizer think it will return all rows from the table. Hence the estimated cardinality was reported as 1000 , that maps to the number of rows statistics available in the data dictionary.

Lets flush the monitoring info manually and verify the statistics

demo@PDB19> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

demo@PDB19>
demo@PDB19> select num_rows,blocks,monitoring
2 from user_tables
3 where table_name ='T';

NUM_ROWS     BLOCKS MON
---------- ---------- ---
      1000          5 YES

demo@PDB19>
demo@PDB19> select num_rows,blocks,stale_stats,notes
2 from user_tab_statistics
3 where table_name ='T';

NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          5 YES
     11000         80         STATS_ON_CONVENTIONAL_DML

demo@PDB19>
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,get_stats_val(data_type,low_value) low_val,
2      high_value,get_stats_val(data_type,high_value) high_val, histogram
3 from user_tab_columns
4 where table_name ='T';

COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL    HIGH_VALUE HIGH_VAL     HISTOGRAM
---------- ---------- ------------ ---------- ---------- ----------- ----------- ----------
N1                  0         1000 C102       1          C20B        1000         NONE
N2                  0           32 C102       1          C121        32           FREQUENCY
N3                  0         1000 6569676874 eight      74776F20687 two hundred NONE

demo@PDB19>
demo@PDB19> select t1.column_name,t1.num_nulls,t1.num_distinct,t1.low_value,get_stats_val(t2.data_type,t1.low_value) low_val,
2      t1.high_value,get_stats_val(t2.data_type,t1.high_value) high_val, t1.histogram , t1.notes
3 from user_tab_col_statistics t1,
4       user_tab_columns t2
5 where t1.table_name ='T'
6 and t1.table_name = t2.table_name
7 and t1.column_name = t2.column_name;

COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE LOW_VAL    HIGH_VALUE HIGH_VAL    HISTOGRAM       NOTES
---------- ---------- ------------ ---------- ---------- ----------- ----------- --------------- -------------------------
N1                  0         1000 C102       1          C20B        1000        NONE
N2                  0           32 C102       1          C121        32          FREQUENCY
N3                  0         1000 6569676874 eight      74776F20687 two hundred NONE
N1                  0              C102       1          C3026222    19733                       STATS_ON_CONVENTIONAL_DML
N2                  0              C102       1          C2022A      141                         STATS_ON_CONVENTIONAL_DML
N3                  0              6569676874 eight      74776F20687 two hundred                 STATS_ON_CONVENTIONAL_DML

6 rows selected.

demo@PDB19>
demo@PDB19> select endpoint_number, endpoint_value
2 from user_tab_histograms
3 where table_name ='T'
4 and column_name ='N2'
5 order by endpoint_value;

ENDPOINT_NUMBER ENDPOINT_VALUE
--------------- --------------
              1              1
              4              2
              9              3
             16              4
             25              5
             36              6
             49              7
             64              8
             81              9
            100             10
            121             11
            144             12
            169             13
            196             14
            225             15
            256             16
            289             17
            324             18
            361             19
            400             20
            441             21
            484             22
            529             23
            576             24
            625             25
            676             26
            729             27
            784             28
            841             29
            900             30
            961             31
           1000             32

32 rows selected.

demo@PDB19>

Real time statistics update the table and column level statistics, but not the histograms, now parsing the sql against the fresh set of statistics will produce the plan like this

demo@PDB19> select /*+ gather_plan_statistics */ count(*) from t t1 where n2 < 42;

COUNT(*)
----------
      1000

demo@PDB19> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID 8bpt64wrnspbu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t t1 where n2 < 42

Plan hash value: 2966233522

---------------------------------------------------------------------------------------------
| Id | Operation                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |      1 |00:00:00.01 |      83 |
|   1 | SORT AGGREGATE            |      |      1 |      1 |      1 |00:00:00.01 |      83 |
|* 2 |   TABLE ACCESS STORAGE FULL| T    |      1 | 11000 |   1000 |00:00:00.01 |      83 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("N2"<42)
       filter("N2"<42)

Note
-----
   - dynamic statistics used: statistics for conventional DML

24 rows selected.

The optimizer knows that there are 11000 rows in the table, but the histogram still says they are all in the range 1 to 32, even though the column high value statistics represent 141. So it estimates the cardinality as the number of rows in the table.

The key thing to note here is Real time statistics do update the table and column level statistics but not the histograms & this will be a viable problem if your queries are heavily dependent on histograms for optimization.

Wednesday, October 27, 2021

Real Time statistics - Part III

Having discussed enough about real time statistics in the previous blog post, it is the time to look about mixed workload in the context of real time statistics.

demo@PDB19> show parameter optimizer_real_time_statistics

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_real_time_statistics       boolean     FALSE
demo@PDB19> alter session set optimizer_real_time_statistics=true;

Session altered.
demo@PDB19> create table t(
2     n1 number primary key,
3     n2 number ,
4     n3 varchar2(80) );

Table created.
demo@PDB19> insert /*+ append */ into t(n1,n2,n3)
2 with rws as ( select rownum n from dual connect by level <= 1000 )
3 select n, ceil(sqrt(n)),
4     to_char(to_date(n,'j'),'jsp')
5 from rws;

1000 rows created.

demo@PDB19> commit;

Commit complete.

We got an initial table created and did some direct path load on it, to get the basic statistics available.

demo@PDB19> select num_rows,blocks,stale_stats,notes
2 from user_tab_statistics
3 where table_name ='T';

NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          8 NO

Did some conventional path dml, to get the real-time statistics to kick in.

demo@PDB19> insert into t(n1,n2,n3)
2 with rws as ( select rownum+1000 n from dual connect by level <= 10000 )
3 select n, ceil(sqrt(n)),
4     to_char(to_date(n,'j'),'jsp')
5 from rws;

10000 rows created.

demo@PDB19> commit;

Commit complete.

demo@PDB19> select num_rows,blocks,monitoring
  2 from user_tables
3 where table_name ='T';

NUM_ROWS     BLOCKS MON
---------- ---------- ---
      1000          8 YES

demo@PDB19> select num_rows,blocks,stale_stats,notes
2 from user_tab_statistics
3 where table_name ='T';

NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          8 YES

demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value
2 from user_tab_columns
3 where table_name ='T';

COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE HIGH_VALUE
---------- ---------- ------------ ---------- ----------
N1                  0         1000 C102       C20B
N2                  0           32 C102       C121
N3                  0         1000 6569676874 74776F2068
                                              756E647265
                                              642074776F
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value,notes
2 from user_tab_col_statistics
3 where table_name ='T';

COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE HIGH_VALUE NOTES
---------- ---------- ------------ ---------- ---------- -------------------------
N1                  0         1000 C102       C20B       STATS_ON_LOAD
N2                  0           32 C102       C121       STATS_ON_LOAD
N3                  0         1000 6569676874 74776F2068 STATS_ON_LOAD
                                              756E647265
                                              642074776F

N1                  0              C102       C3020A02   STATS_ON_CONVENTIONAL_DML
N2                  0              C102       C20206     STATS_ON_CONVENTIONAL_DML
N3                  0              6569676874 74776F2074 STATS_ON_CONVENTIONAL_DML
                                              686F757361
                                              6E64207477
                                              6F2068756E
                                              6472656420
                                              6569676874

So without flushing database monitoring info we got only the additional column level stats, but no information about the table level stats available yet.

demo@PDB19> set serveroutput off
demo@PDB19> select /*+ gather_plan_statistics */ count(*) from t where n2 <= 42;

COUNT(*)
----------
      1764

demo@PDB19> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID 99qz8xwvgmh1s, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t where n2 <= 42

Plan hash value: 2966233522

------------------------------------------------------------------------------------------------------
| Id | Operation                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |      1 |00:00:00.01 |      75 |      5 |
|   1 | SORT AGGREGATE            |      |      1 |      1 |      1 |00:00:00.01 |      75 |      5 |
|* 2 |   TABLE ACCESS STORAGE FULL| T    |      1 |    425 |   1764 |00:00:00.01 |      75 |      5 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("N2"<=42)
       filter("N2"<=42)

Note
-----
   - dynamic statistics used: statistics for conventional DML

24 rows selected.

Column stats for conventional dml, but number of rows on table wrong. The above sql got optimized using real time statistics available at the column level stats.

The column N2 got low value and high value recorded as C102 and C20206, which translates to the numerical value of 1 and 105.

demo@XEPDB1> variable n number
demo@XEPDB1> exec dbms_stats.convert_raw_value('C102',:n);

PL/SQL procedure successfully completed.

demo@XEPDB1> print n

         N
----------
         1

demo@XEPDB1> exec dbms_stats.convert_raw_value('C20206',:n);

PL/SQL procedure successfully completed.

demo@XEPDB1> print n

         N
----------
       105

The value of 425 reported as cardinality in the above plan is due to mix of table and column level statistics.

Cardinality = Selectivity * Num of rows
Selectivity = required range / available range + 1 /NDV (since it got unbounded, closed)

Selectivity = (42 – 1) / (105 -1) + 1/32 = 0 .425480769
Cardinality = 0 .425480769 * 1000 = 425.48

We could wait for the monitoring information to flush naturally, or we can flush it manually.

demo@PDB19> exec dbms_stats.flush_database_monitoring_info;

PL/SQL procedure successfully completed.

demo@PDB19> select num_rows,blocks,monitoring
2 from user_tables
3 where table_name ='T';

NUM_ROWS     BLOCKS MON
---------- ---------- ---
      1000          8 YES

demo@PDB19>
demo@PDB19> select num_rows,blocks,stale_stats,notes
2 from user_tab_statistics
3 where table_name ='T';

NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          8 YES
     11000         76         STATS_ON_CONVENTIONAL_DML

demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value
2 from user_tab_columns
3 where table_name ='T';

COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE HIGH_VALUE
---------- ---------- ------------ ---------- ----------
N1                  0         1000 C102       C20B
N2                  0           32 C102       C121
N3                  0         1000 6569676874 74776F2068
                                              756E647265
                                              642074776F

demo@PDB19> select column_name,num_nulls,num_distinct,low_value,high_value,notes
2 from user_tab_col_statistics
3 where table_name ='T';

COLUMN_NAM NUM_NULLS NUM_DISTINCT LOW_VALUE HIGH_VALUE NOTES
---------- ---------- ------------ ---------- ---------- -------------------------
N1                  0         1000 C102       C20B       STATS_ON_LOAD
N2                  0           32 C102       C121       STATS_ON_LOAD
N3                  0         1000 6569676874 74776F2068 STATS_ON_LOAD
                                              756E647265
                                              642074776F

N1                  0              C102       C3020A02   STATS_ON_CONVENTIONAL_DML
N2                  0              C102       C20206     STATS_ON_CONVENTIONAL_DML
N3                  0              6569676874 74776F2074 STATS_ON_CONVENTIONAL_DML
                                              686F757361
                                              6E64207477
                                              6F2068756E
                                              6472656420
                                              6569676874

Now the sql against this fresh set of statistics produced new estimates.

demo@PDB19> select /*+ gather_plan_statistics */ count(*) from t t1 where n2 <= 42;

COUNT(*)
----------
      1764

demo@PDB19> select * from table(dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID f7d7dtpdauwcb, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t t1 where n2 <= 42

Plan hash value: 2966233522

---------------------------------------------------------------------------------------------
| Id | Operation                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |      1 |00:00:00.01 |      75 |
|   1 | SORT AGGREGATE            |      |      1 |      1 |      1 |00:00:00.01 |      75 |
|* 2 |   TABLE ACCESS STORAGE FULL| T    |      1 |   4680 |   1764 |00:00:00.01 |      75 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("N2"<=42)
       filter("N2"<=42)

Note
-----
   - dynamic statistics used: statistics for conventional DML

Creation of real-time statistics does not invalidate cursors. Application requires to reparse the new sql’s, still there are several opportunities for execution plan to change as the state of statistics change.

Stale statistics – underestimate rows.
Real time column statistics – still underestimate rows
Real time table statistics – over estimate rows.

Books I follow...

Advanced SQL Functions in Oracle 10g - by Richard Walsh Earp
Cost-Based Oracle Fundamentals - By Jonathan Lewis
Effective Oracle by Design - by Thomas kyte
Expert One on One Oracle - by Thomas kyte
Expert Oracle - by Thomas kyte
Expert Oracle Database Architecture Oracle Database 9i, 10g - Thomas Kyte
Expert Oracle Database Architecture Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions - Thomas Kyte
Expert Oracle Practices Oracle Database Administration from the Oak Table
High-Performance Oracle Database Applications - by Donald K. Burleson
Mastering Oracle PL/SQL: Practical Solutions - by Connor McDonald
Mastering Oracle Scheduler in Oracle 11g Databases - by Ronald Rood
Mastering Oracle SQL and SQL*Plus - By LEX DE HAAN
Optimizing Oracle Performance - By Cary Millsap
Oracle Database 10g Performance Tuning Tips & Techniques - By Richard J. Niemiec
Oracle Database 10g XML & SQL: Design, Build & Manage XML Applications in Java, C, C++ & PL/SQL - by Mark V. Scardina, Ben Chang and Jinyu Wang
Oracle Database 10g: The Complete Reference - by Kevin Loney
Oracle Database 11g DBA Handbook - by Bob Bryla , Kevin Loney
Oracle Database 11g New Features - Maximize the new capabilities of the Latest Database Release - by Robert G. Freeman
Oracle Performance Tuning for 10gR2 - By Gavin Powell
Oracle PL/SQL for DBAs By Steven Feuerstein, Arup Nanda
Oracle SQL*Plus: The Definitive Guide - By Jonathan Gennick
Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning by Richmond Shee, Kirtikumar Deshpande and K Gopalakrishnan
Oracle8i Internal Services for Waits, Latches, Locks, and Memory - by Steve Adams
Professional Oracle 8i Application Programming - By Thomas Kyte
Secrets of the Oracle Database - By Norbert Debes
Troubleshooting Oracle Performance - By Christian Antognini
Tuning Third-party Vendor Oracle systems Tuning when you can't touch the code - By Mike Ault
Using Oracle SQL Stored Outlines & Optimizer Plan Stability - By Mike Ault