Thursday, January 27, 2022

Real Time statistics - Part IX

In this blog post we will examine how the real-time statistics feature behaves when table statistics are locked.
 
demo@PDB19> show parameter optimizer_real_time_statistics
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_real_time_statistics       boolean     FALSE
demo@PDB19> alter session set optimizer_real_time_statistics=true;
 
Session altered.
demo@PDB19> create table t(
  2     n1 number primary key,
  3     n2 number ,
  4     n3 varchar2(80) );
 
Table created.
 
demo@PDB19> insert /*+ append */ into t(n1,n2,n3)
  2  with rws as ( select rownum n from dual connect by level <= 1000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
1000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
demo@PDB19> exec dbms_stats.lock_table_stats(user,'T');
 
PL/SQL procedure successfully completed.
 
demo@PDB19> insert into t(n1,n2,n3)
  2  with rws as ( select rownum+1000 n from dual connect by level <= 1000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
1000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
demo@PDB19> select num_rows,blocks,monitoring
  2  from user_tables
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS MON
---------- ---------- ---
      1000          8 YES
 
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          8 YES
 
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,get_stats_val(data_type,low_value) low_val,
  2      high_value,get_stats_val(data_type,high_value) high_val, histogram
  3  from user_tab_columns
  4  where table_name ='T';
 
COLUMN_NAM  NUM_NULLS NUM_DISTINCT LOW_VALUE  LOW_VAL    HIGH_VALUE HIGH_VAL             HISTOGRAM
---------- ---------- ------------ ---------- ---------- ---------- -------------------- ---------------
N1                  0         1000 C102       1          C20B       1000                 NONE
N2                  0           32 C102       1          C121       32                   NONE
N3                  0         1000 6569676874 eight      74776F2068 two hundred two      NONE
                                                         756E647265
                                                         642074776F
 
 
demo@PDB19> select t1.column_name,t1.num_nulls,t1.num_distinct,t1.low_value,get_stats_val(t2.data_type,t1.low_value) low_val,
  2      t1.high_value,get_stats_val(t2.data_type,t1.high_value) high_val, t1.histogram , t1.notes
  3  from user_tab_col_statistics t1,
  4       user_tab_columns t2
  5  where t1.table_name ='T'
  6  and t1.table_name = t2.table_name
  7  and t1.column_name = t2.column_name;
 
COLUMN_NAM  NUM_NULLS NUM_DISTINCT LOW_VALUE  LOW_VAL    HIGH_VALUE HIGH_VAL             HISTOGRAM       NOTES
---------- ---------- ------------ ---------- ---------- ---------- -------------------- --------------- -------------------------
N1                  0         1000 C102       1          C20B       1000                 NONE            STATS_ON_LOAD
N2                  0           32 C102       1          C121       32                   NONE            STATS_ON_LOAD
N3                  0         1000 6569676874 eight      74776F2068 two hundred two      NONE            STATS_ON_LOAD
                                                         756E647265
                                                         642074776F
 
 
demo@PDB19>
 
Locking the statistics prevents schema- and database-wide operations from gathering statistics on the table, and it also prevents the collection of real-time statistics during conventional-path DML.
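If real-time statistics are wanted again on such a table, a minimal sketch of the cleanup would be to unlock the statistics and regather them (table T is from the demo above):

```sql
-- Assumption: statistics were locked earlier with dbms_stats.lock_table_stats.
-- Unlock them so that scheduled jobs, online statistics gathering,
-- and real-time statistics can maintain the table again.
exec dbms_stats.unlock_table_stats(user, 'T');

-- Regather the now-stale statistics manually.
exec dbms_stats.gather_table_stats(user, 'T');
```

After the unlock, subsequent conventional-path DML should once more produce rows tagged STATS_ON_CONVENTIONAL_DML in USER_TAB_STATISTICS.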

Sunday, January 23, 2022

Real Time statistics - Part VIII

In this blog post we will examine how the real-time statistics feature behaves with global temporary tables (GTTs).
 
demo@PDB19> create global temporary table t(
  2     n1 number primary key,
  3     n2 number ,
  4     n3 varchar2(80) )
  5  on commit preserve rows    ;
 
Table created.
 
demo@PDB19> insert /*+ append */ into t(n1,n2,n3)
  2  with rws as ( select rownum n from dual connect by level <= 1000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
1000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
demo@PDB19> insert into t(n1,n2,n3)
  2  with rws as ( select rownum+1000 n from dual connect by level <= 1000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
1000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
demo@PDB19> exec dbms_stats.flush_database_monitoring_info;
 
PL/SQL procedure successfully completed.
 
demo@PDB19> select num_rows,blocks,monitoring
  2  from user_tables
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS MON
---------- ---------- ---
                      NO
 
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
 
      1000          5
 
demo@PDB19> select column_name,num_nulls,num_distinct,low_value,get_stats_val(data_type,low_value) low_val,
  2      high_value,get_stats_val(data_type,high_value) high_val, histogram
  3  from user_tab_columns
  4  where table_name ='T';
 
COLUMN_NAM  NUM_NULLS NUM_DISTINCT LOW_VALUE  LOW_VAL    HIGH_VALUE HIGH_VAL             HISTOGRAM
---------- ---------- ------------ ---------- ---------- ---------- -------------------- ---------------
N1                                                                                       NONE
N2                                                                                       NONE
N3                                                                                       NONE
 
demo@PDB19>
demo@PDB19> select t1.column_name,t1.num_nulls,t1.num_distinct,t1.low_value,get_stats_val(t2.data_type,t1.low_value) low_val,
  2      t1.high_value,get_stats_val(t2.data_type,t1.high_value) high_val, t1.histogram , t1.notes
  3  from user_tab_col_statistics t1,
  4       user_tab_columns t2
  5  where t1.table_name ='T'
  6  and t1.table_name = t2.table_name
  7  and t1.column_name = t2.column_name;
 
COLUMN_NAM  NUM_NULLS NUM_DISTINCT LOW_VALUE  LOW_VAL    HIGH_VALUE HIGH_VAL             HISTOGRAM       NOTES
---------- ---------- ------------ ---------- ---------- ---------- -------------------- --------------- -------------------------
N1                  0         1000 C102       1          C20B       1000                 NONE
N2                  0           32 C102       1          C121       32                   NONE
N3                  0         1000 6569676874 eight      74776F2068 two hundred two      NONE
                                                         756E647265
                                                         642074776F
 
 
demo@PDB19>
 
The concept of real-time statistics does not fit GTTs. Statistics on GTTs have always been problematic: statistics gathered in one session could be used when parsing SQL in other sessions. To solve this problem, Oracle 12c introduced session-level (session-private) statistics on GTTs.
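As a side note, a minimal sketch of using those session-private GTT statistics might look like this; the GLOBAL_TEMP_TABLE_STATS preference is the switch between shared and session-visible statistics (SESSION is already the default from 12c onwards, shown here explicitly):

```sql
-- Make the session-private statistics choice explicit for table T.
exec dbms_stats.set_table_prefs(user, 'T', 'GLOBAL_TEMP_TABLE_STATS', 'SESSION');

-- Gathering stats now records session-private statistics,
-- visible only to the session that gathered them.
exec dbms_stats.gather_table_stats(user, 'T');
```

With the SESSION preference in place, each session parses its SQL against statistics reflecting its own private data in the GTT.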

Monday, January 17, 2022

Real Time statistics - Part VII

In this blog post we will examine how the real-time statistics feature behaves with table truncate-and-reload operations.
 
demo@PDB19> show parameter optimizer_real_time_statistics
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_real_time_statistics       boolean     FALSE
demo@PDB19> alter session set optimizer_real_time_statistics=true;
 
Session altered.
 
demo@PDB19> create table t(
  2     n1 number primary key,
  3     n2 number ,
  4     n3 varchar2(80) );
 
Table created.
 
demo@PDB19> insert /*+ append */ into t(n1,n2,n3)
  2  with rws as ( select rownum n from dual connect by level <= 1000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
1000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
demo@PDB19> truncate table t;
 
Table truncated.
 
The insert statement above was a direct-path insert, which caused basic statistics to be gathered on the table. The truncate statement that followed removed the data from the table, but not its statistics.
 
demo@PDB19> select num_rows,blocks,monitoring
  2  from user_tables
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS MON
---------- ---------- ---
      1000          8 YES
 
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          8 YES
 
Now, when we reload the data using conventional-path DML, real-time statistics kick in and we end up with this:
 
demo@PDB19> insert into t(n1,n2,n3)
  2  with rws as ( select rownum+1000 n from dual connect by level <= 10000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
10000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
demo@PDB19> exec dbms_stats.flush_database_monitoring_info;
 
PL/SQL procedure successfully completed.
 
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          8 YES
     10000         73         STATS_ON_CONVENTIONAL_DML
 
demo@PDB19> select t1.column_name,t1.num_nulls,t1.num_distinct,t1.low_value,get_stats_val(t2.data_type,t1.low_value) low_val,
  2      t1.high_value,get_stats_val(t2.data_type,t1.high_value) high_val, t1.histogram , t1.notes
  3  from user_tab_col_statistics t1,
  4       user_tab_columns t2
  5  where t1.table_name ='T'
  6  and t1.table_name = t2.table_name
  7  and t1.column_name = t2.column_name;
 
COLUMN_NAM  NUM_NULLS NUM_DISTINCT LOW_VALUE  LOW_VAL    HIGH_VALUE HIGH_VAL             HISTOGRAM       NOTES
---------- ---------- ------------ ---------- ---------- ---------- -------------------- --------------- -------------------------
N1                  0         1000 C102       1          C20B       1000                 NONE            STATS_ON_LOAD
N2                  0           32 C102       1          C121       32                   NONE            STATS_ON_LOAD
N3                  0         1000 6569676874 eight      74776F2068 two hundred two      NONE            STATS_ON_LOAD
                                                         756E647265
                                                         642074776F
 
N1                  0              C102       1          C3020A59   10988                                STATS_ON_CONVENTIONAL_DML
N2                  0              C102       1          C20206     105                                  STATS_ON_CONVENTIONAL_DML
N3                  0              6569676874 eight      74776F2074 two thousand two hun                 STATS_ON_CONVENTIONAL_DML
                                                         686F757361 dred twenty-six
                                                         6E64207477
                                                         6F2068756E
                                                         6472656420
                                                         7477656E74
                                                         792D736978
 
 
The base statistics say we initially loaded 1000 rows, and the real-time statistics say we then loaded 10,000 rows. Now, when we parse a SQL statement whose predicate matches the range of data that was truncated (values 1 to 1000), it returns no rows.
 
demo@PDB19> select /*+ gather_plan_statistics */ count(*) from t where n1 < 501;
 
  COUNT(*)
----------
         0
 
demo@PDB19> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID  agtrz8n1288u0, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t where n1 < 501
 
Plan hash value: 1241160216
 
-------------------------------------------------------------------------------------------
| Id  | Operation         | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE   |             |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN| SYS_C007944 |      1 |    455 |      0 |00:00:00.01 |       2 |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("N1"<501)
 
Note
-----
   - dynamic statistics used: statistics for conventional DML
 
 
23 rows selected.
 
demo@PDB19>
 
But the column-level statistics say that column N1 holds data in the range of 1 to roughly 11,000 (the sampled high value is 10,988). So when we parsed the SQL, the optimizer took these column-level statistics into account and estimated the cardinality as
 
cardinality = number of rows * ( limit – low_value) / ( high_value – low_value )
cardinality = 10000 * ( 501 – 1 ) / ( 10988 – 1 ) = 455.08
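The arithmetic can be double-checked straight from SQL*Plus; this is just the formula above evaluated with the low/high values from the real-time column statistics:

```sql
demo@PDB19> select round(10000 * (501 - 1) / (10988 - 1), 2) est_cardinality from dual;

EST_CARDINALITY
---------------
         455.08
```

This matches the E-Rows value of 455 in the execution plan above.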
 
So if the application requires frequent truncate-and-reload cycles, ensure that either the reload is done via direct-path loads, or, for conventional-path loads, that statistics are regathered manually at the end of the load to prevent suboptimal plans after the reload.
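For the conventional-path case, a minimal sketch of such a manual regather at the end of the load could be:

```sql
-- After the conventional-path reload commits, regather statistics so the
-- optimizer sees correct num_rows and low/high column values,
-- replacing the misleading STATS_ON_CONVENTIONAL_DML figures.
exec dbms_stats.gather_table_stats(user, 'T');
```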

Friday, January 14, 2022

Real Time statistics - Part VI

In the previous blog post we discussed how lone deletes are not tracked by real-time statistics; however, there is an exception. Deletes are tracked by real-time statistics when they are preceded by an update statement.
 
demo@PDB19> show parameter optimizer_real_time_statistics
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_real_time_statistics       boolean     FALSE
demo@PDB19> alter session set optimizer_real_time_statistics=true;
 
Session altered.
 
demo@PDB19> create table t(
  2     n1 number primary key,
  3     n2 number ,
  4     n3 varchar2(80) );
 
Table created.
 
demo@PDB19> insert /*+ append */ into t(n1,n2,n3)
  2  with rws as ( select rownum n from dual connect by level <= 1000 )
  3  select n, ceil(sqrt(n)),
  4     to_char(to_date(n,'j'),'jsp')
  5  from rws;
 
1000 rows created.
 
demo@PDB19> commit;
 
Commit complete.
 
demo@PDB19> update t
  2  set n2 = ceil(sqrt(n1+1000)) ,
  3      n3 = 'z'||to_char(to_date(n1+1000,'j'),'jsp')
  4  where n1 <= 500;
 
500 rows updated.
 
demo@PDB19> commit;
 
Commit complete.
 
demo@PDB19> delete from t where n1 > 800;
 
200 rows deleted.
 
demo@PDB19> commit;
 
Commit complete.
 
The reduction in the number of rows by this delete command was tracked by real-time statistics.
 
demo@PDB19> exec dbms_stats.flush_database_monitoring_info;
 
PL/SQL procedure successfully completed.
 
demo@PDB19> select num_rows,blocks,monitoring
  2  from user_tables
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS MON
---------- ---------- ---
      1000          8 YES
 
demo@PDB19> select num_rows,blocks,stale_stats,notes
  2  from user_tab_statistics
  3  where table_name ='T';
 
  NUM_ROWS     BLOCKS STALE_S NOTES
---------- ---------- ------- -------------------------
      1000          8 YES
       800         16         STATS_ON_CONVENTIONAL_DML
 
However, looking at the column-level statistics, no real-time statistics were recorded for column N1, since it was not updated.
 
demo@PDB19> select t1.column_name,t1.num_nulls,t1.num_distinct,t1.low_value,get_stats_val(t2.data_type,t1.low_value) low_val,
  2      t1.high_value,get_stats_val(t2.data_type,t1.high_value) high_val, t1.histogram , t1.notes
  3  from user_tab_col_statistics t1,
  4       user_tab_columns t2
  5  where t1.table_name ='T'
  6  and t1.table_name = t2.table_name
  7  and t1.column_name = t2.column_name;
 
COLUMN_NAM  NUM_NULLS NUM_DISTINCT LOW_VALUE  LOW_VAL    HIGH_VALUE HIGH_VAL             HISTOGRAM  NOTES
---------- ---------- ------------ ---------- ---------- ---------- -------------------- ---------- -------------------------
N1                  0         1000 C102       1          C20B       1000                 NONE       STATS_ON_LOAD
N2                  0           32 C102       1          C121       32                   NONE       STATS_ON_LOAD
N3                  0         1000 6569676874 eight      74776F2068 two hundred two      NONE       STATS_ON_LOAD
                                                         756E647265
                                                         642074776F
 
N2                  0              C102       1          C128       39                              STATS_ON_CONVENTIONAL_DML
N3                  0              6569676874 eight      7A6F6E6520 zone thousand twenty            STATS_ON_CONVENTIONAL_DML
                                                         74686F7573 -nine
                                                         616E642074
                                                         77656E7479
                                                         2D6E696E65
 
Now, parsing a SQL statement with a predicate on column N1 (the column referenced in our delete statement, which has no real-time column-level statistics yet) produces a plan like this.
 
demo@PDB19> select /*+ gather_plan_statistics */ count(*) from t where n1 > 800;
 
  COUNT(*)
----------
         0
 
demo@PDB19> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
SQL_ID  cfbuu2gdzj40k, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t where n1 > 800
 
Plan hash value: 496403668
 
-------------------------------------------------------------------------------------------
| Id  | Operation         | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE   |             |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN| SYS_C007942 |      1 |    160 |      0 |00:00:00.01 |       2 |
-------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("N1">800)
 
Note
-----
   - dynamic statistics used: statistics for conventional DML
 
The plan estimate of 160 rows was way off from the actual zero rows. That is because column N1 still has low/high values of 1 and 1000, and the estimated cardinality is calculated as
 
Estimated cardinality = number of rows * ( high_value – limit ) / ( high_value – low_value )
Estimated cardinality = 800 * ( 1000 – 800 ) / ( 1000 – 1 ) = 160.16
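Again, the arithmetic can be verified from SQL*Plus using the stale low/high values from the column statistics:

```sql
demo@PDB19> select round(800 * (1000 - 800) / (1000 - 1), 2) est_cardinality from dual;

EST_CARDINALITY
---------------
         160.16
```

This matches the E-Rows value of 160 in the execution plan above.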
 
So real-time table-level statistics were tracked for a delete statement preceded by an update statement, but no real-time column-level statistics were created for the column referenced only in the delete predicate. This lack of column-level statistics can potentially lead to suboptimal plans.