Sunday, May 1, 2022

High Frequency Automatic Stats Gathering

To highlight a new capability which got introduced in Oracle 19c(available only on Exadata environments), I’m going to setup a slightly different demo with three tables.
 
ai_demo@PDB19> create table t1( id number, x1 number, x2 varchar2(180) );
 
Table created.
 
ai_demo@PDB19> insert into t1(id,x1,x2)
  2  select rownum, object_id, object_name
  3  from all_objects;
 
68488 rows created.
 
ai_demo@PDB19> commit;
 
Commit complete.
 
Table T1 has no statistics collect on it.
 
ai_demo@PDB19> create table t2( id number, x1 number, x2 varchar2(180) );
 
Table created.
 
ai_demo@PDB19> insert into t2(id,x1,x2)
  2  select rownum, object_id, object_name
  3  from all_objects a, all_users b
  4  where rownum <= 100000;
 
100000 rows created.
 
ai_demo@PDB19> commit;
 
Commit complete.
 
ai_demo@PDB19> exec dbms_stats.gather_table_stats(user,'T2');
 
PL/SQL procedure successfully completed.
 
ai_demo@PDB19> insert into t2(id,x1,x2)
  2  select rownum+100000, object_id, object_name
  3  from all_objects a, all_users b
  4  where rownum <= 100000;
 
100000 rows created.
 
ai_demo@PDB19> commit;
 
Table T2 got new rows added after statistics have been collected and so have “stale” outdated stats.
 
ai_demo@PDB19> create table t3( id number, x1 number, x2 varchar2(180) );
 
Table created.
 
ai_demo@PDB19> insert into t3(id,x1,x2)
  2  select rownum, 10, object_name
  3  from all_objects a, all_users b, all_users c
  4  where rownum <= 1000000;
 
1000000 rows created.
 
ai_demo@PDB19> update t3 set x1 = 9 where mod(id,3) = 0;
 
333333 rows updated.
 
ai_demo@PDB19> update t3 set x1 = 1 where mod(id,2) = 0 and id between 1 and 20000;
 
10000 rows updated.
 
ai_demo@PDB19> update t3 set x1 = 2 where mod(id,2) = 0 and id between 30000 and 40000 ;
 
5001 rows updated.
 
ai_demo@PDB19> update t3 set x1 = 3 where mod(id,100) = 0 and id between 300001 and 400000 ;
 
1000 rows updated.
 
ai_demo@PDB19> update t3 set x1 = 4 where mod(id,100) = 0 and id between 400001 and 500000;
 
1000 rows updated.
 
ai_demo@PDB19> update t3 set x1 = 5 where mod(id,100) = 0 and id between 600001 and 700000;
 
1000 rows updated.
 
ai_demo@PDB19> update t3 set x1 = 6 where mod(id,1000) = 0 and id between 700001 and 800000;
 
100 rows updated.
 
ai_demo@PDB19> update t3 set x1 = 7 where mod(id,1000) = 0 and id between 800001 and 900000;
 
100 rows updated.
 
ai_demo@PDB19> update t3 set x1 = 8 where mod(id,1000) = 0 and id between 900001 and 1000000;
 
100 rows updated.
 
ai_demo@PDB19> commit;
 
Commit complete.
 
ai_demo@PDB19> exec dbms_stats.gather_table_stats(user,'T3');
 
PL/SQL procedure successfully completed.
 
ai_demo@PDB19>
ai_demo@PDB19> select x1, count(*)
  2  from t3
  3  group by x1
  4  order by x1;
 
        X1   COUNT(*)
---------- ----------
         1      10000
         2       5001
         3       1000
         4       1000
         5       1000
         6        100
         7        100
         8        100
         9     327234
        10     654465
 
10 rows selected.
 
 
Table T3 got data skew on X1 column but with no histogram collected. Now we are going to run a query on T3 where CBO gets the cardinality estimate hopelessly wrong because of missing histogram on X1 column.
 
ai_demo@PDB19> set autotrace traceonly exp
ai_demo@PDB19> select * from t3 where x1 = 8;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650
 
----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |   100K|  2148K|  1036   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| T3   |   100K|  2148K|  1036   (2)| 00:00:01 |
----------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("X1"=8)
       filter("X1"=8)
 
 
If we look at the current statistics on these tables
 
ai_demo@PDB19> select table_name, num_rows, stale_stats, notes, last_analyzed
  2  from user_tab_statistics
  3  where table_name in ('T1','T2','T3') ;
 
TABLE_NAME   NUM_ROWS STALE_S NOTES   LAST_ANALYZED
---------- ---------- ------- ------- -----------------------
T1
T2             100000 YES             21-APR-2022 10:53:41 am
T3            1000000 NO              21-APR-2022 10:55:03 am
 
ai_demo@PDB19> select column_name, num_buckets, histogram
  2  from user_tab_col_statistics
  3  where table_name ='T3';
 
COLUMN_NAM NUM_BUCKETS HISTOGRAM
---------- ----------- ---------------
ID                   1 NONE
X1                   1 NONE
X2                   1 NONE
 
 
We can see that Table T1 got missing statistics
 
Table T2 is marked as having stale statistics, table T3 is considered fine in that it does have statistics which are not stale, but we don’t have currently have any histogram even though a simple single table query was previously run based on X1 predicates which had hopelessly inaccurate cardinality estimates.
 
If we wait for approximately 15 minutes (default) for the high frequency automatic statistics gathering process to run and look at the column statistics again:
 
ai_demo@PDB19> select table_name, num_rows, stale_stats, notes, last_analyzed
  2  from user_tab_statistics
  3  where table_name in ('T1','T2','T3') ;
 
TABLE_NAME   NUM_ROWS STALE_S NOTES  LAST_ANALYZED
---------- ---------- ------- ------ -----------------------
T1              68488 NO             21-APR-2022 11:07:49 am
T2             200000 NO             21-APR-2022 11:07:53 am
T3            1000000 NO             21-APR-2022 11:23:37 am
 
ai_demo@PDB19>
ai_demo@PDB19> select column_name, num_buckets, histogram
  2  from user_tab_col_statistics
  3  where table_name ='T3';
 
COLUMN_NAME NUM_BUCKETS HISTOGRAM
----------- ----------- -----------
ID                    1 NONE
X1                   10 FREQUENCY
X2                    1 NONE
 
ai_demo@PDB19>
 
we notice now that:
 
Table T1 now has statistics captured, as the high frequency automatic stats collection process look for tables with missing statistics
 
Table T2 now has fully up to date statistics, as the high frequency automatic stats collection process looks for tables with stale statistics.
 
Table T3 now has histograms on the X1 column, as the high frequency automatic stats collection process looks out for missing histograms if queries have been subsequently run with poor cardinality estimates.
 
Having more accurate, approximate, and up to date statistics all support the CBO in making much better decision in relation to use of any newly created automatic indexes.
 
We can configure High-Frequency automatic statistics collection in the following manner. This High-Frequency automatic statistics collection was not enabled by default.
 
ai_demo@PDB19> select dbms_stats.get_prefs('AUTO_TASK_STATUS') task_status,
  2      dbms_stats.get_prefs('AUTO_TASK_MAX_RUN_TIME') max_run_time,
  3      dbms_stats.get_prefs('AUTO_TASK_INTERVAL') task_interval
  4  from dual;
 
TASK_STATU MAX_RUN_TI TASK_INTER
---------- ---------- ----------
OFF        3600       900
 
We can configure High-Frequency automatic statistics in this following manner
 
ai_demo@PDB19> exec dbms_stats.set_global_prefs('AUTO_TASK_STATUS','ON');
 
PL/SQL procedure successfully completed.
 
This turns the feature ON/OFF, it is OFF by default on Exadata environments but ON by default in Autonomous database environments.
 
AUTO_TASK_MAX_RUN_TIME configures how long to allow the process to run(default is 3600 seconds/60 minutes)
 
AUTO_TASK_INTERVAL configures the interval between the process running( default is every 900 seconds/15 minutes)
 
 

No comments:

Post a Comment