To highlight a new capability which got
introduced in Oracle 19c(available only on Exadata environments), I’m going to
setup a slightly different demo with three tables.
ai_demo@PDB19> create
table t1( id number, x1 number, x2 varchar2(180) );
Table created.
ai_demo@PDB19> insert into
t1(id,x1,x2)
2 select rownum, object_id, object_name
3 from all_objects;
68488 rows created.
ai_demo@PDB19> commit;
Commit complete.
Table T1 has no statistics collect on it.
ai_demo@PDB19> create
table t2( id number, x1 number, x2 varchar2(180) );
Table created.
ai_demo@PDB19> insert into
t2(id,x1,x2)
2 select rownum, object_id, object_name
3 from all_objects a, all_users b
4 where rownum <= 100000;
100000 rows created.
ai_demo@PDB19> commit;
Commit complete.
ai_demo@PDB19> exec
dbms_stats.gather_table_stats(user,'T2');
PL/SQL procedure successfully
completed.
ai_demo@PDB19> insert into
t2(id,x1,x2)
2 select rownum+100000, object_id, object_name
3 from all_objects a, all_users b
4 where rownum <= 100000;
100000 rows created.
ai_demo@PDB19> commit;
Table T2 got new rows added after
statistics have been collected and so have “stale” outdated stats.
ai_demo@PDB19> create
table t3( id number, x1 number, x2 varchar2(180) );
Table created.
ai_demo@PDB19> insert into
t3(id,x1,x2)
2 select rownum, 10, object_name
3 from all_objects a, all_users b, all_users c
4 where rownum <= 1000000;
1000000 rows created.
ai_demo@PDB19> update t3
set x1 = 9 where mod(id,3) = 0;
333333 rows updated.
ai_demo@PDB19> update t3
set x1 = 1 where mod(id,2) = 0 and id between 1 and 20000;
10000 rows updated.
ai_demo@PDB19> update t3
set x1 = 2 where mod(id,2) = 0 and id between 30000 and 40000 ;
5001 rows updated.
ai_demo@PDB19> update t3
set x1 = 3 where mod(id,100) = 0 and id between 300001 and 400000 ;
1000 rows updated.
ai_demo@PDB19> update t3
set x1 = 4 where mod(id,100) = 0 and id between 400001 and 500000;
1000 rows updated.
ai_demo@PDB19> update t3
set x1 = 5 where mod(id,100) = 0 and id between 600001 and 700000;
1000 rows updated.
ai_demo@PDB19> update t3
set x1 = 6 where mod(id,1000) = 0 and id between 700001 and 800000;
100 rows updated.
ai_demo@PDB19> update t3
set x1 = 7 where mod(id,1000) = 0 and id between 800001 and 900000;
100 rows updated.
ai_demo@PDB19> update t3
set x1 = 8 where mod(id,1000) = 0 and id between 900001 and 1000000;
100 rows updated.
ai_demo@PDB19> commit;
Commit complete.
ai_demo@PDB19> exec
dbms_stats.gather_table_stats(user,'T3');
PL/SQL procedure successfully
completed.
ai_demo@PDB19>
ai_demo@PDB19> select x1, count(*)
2 from t3
3 group by x1
4 order by x1;
X1
COUNT(*)
---------- ----------
1 10000
2 5001
3 1000
4 1000
5 1000
6 100
7 100
8 100
9 327234
10 654465
10 rows selected.
Table T3 got data skew on X1 column but
with no histogram collected. Now we are going to run a query on T3 where CBO
gets the cardinality estimate hopelessly wrong because of missing histogram on
X1 column.
ai_demo@PDB19> set
autotrace traceonly exp
ai_demo@PDB19> select * from t3 where x1 = 8;
Execution Plan
----------------------------------------------------------
Plan hash value: 4161002650
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 2148K| 1036 (2)| 00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| T3 | 100K| 2148K| 1036 (2)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
1 - storage("X1"=8)
filter("X1"=8)
If we look at the current statistics on
these tables
ai_demo@PDB19> select
table_name, num_rows, stale_stats, notes, last_analyzed
2 from user_tab_statistics
3 where table_name in ('T1','T2','T3') ;
TABLE_NAME NUM_ROWS STALE_S NOTES LAST_ANALYZED
---------- ---------- ------- ------- -----------------------
T1
T2 100000 YES 21-APR-2022 10:53:41 am
T3 1000000 NO 21-APR-2022 10:55:03 am
ai_demo@PDB19> select
column_name, num_buckets, histogram
2 from user_tab_col_statistics
3 where table_name ='T3';
COLUMN_NAM NUM_BUCKETS
HISTOGRAM
---------- ----------- ---------------
ID 1 NONE
X1 1 NONE
X2 1 NONE
We can see that Table T1 got missing
statistics
Table T2 is marked as having stale
statistics, table T3 is considered fine in that it does have statistics which
are not stale, but we don’t have currently have any histogram even though a
simple single table query was previously run based on X1 predicates which had
hopelessly inaccurate cardinality estimates.
If we wait for approximately 15 minutes
(default) for the high frequency automatic statistics gathering process to run
and look at the column statistics again:
ai_demo@PDB19> select
table_name, num_rows, stale_stats, notes, last_analyzed
2 from user_tab_statistics
3 where table_name in ('T1','T2','T3') ;
TABLE_NAME NUM_ROWS STALE_S NOTES LAST_ANALYZED
---------- ---------- ------- ------ -----------------------
T1 68488 NO 21-APR-2022 11:07:49 am
T2 200000 NO 21-APR-2022 11:07:53 am
T3 1000000 NO 21-APR-2022 11:23:37 am
ai_demo@PDB19>
ai_demo@PDB19> select column_name, num_buckets, histogram
2 from user_tab_col_statistics
3 where table_name ='T3';
COLUMN_NAME NUM_BUCKETS
HISTOGRAM
----------- ----------- -----------
ID 1 NONE
X1 10 FREQUENCY
X2 1 NONE
ai_demo@PDB19>
we notice now that:
Table T1 now has statistics captured, as
the high frequency automatic stats collection process look for tables with
missing statistics
Table T2 now has fully up to date
statistics, as the high frequency automatic stats collection process looks for
tables with stale statistics.
Table T3 now has histograms on the X1
column, as the high frequency automatic stats collection process looks out for
missing histograms if queries have been subsequently run with poor cardinality
estimates.
Having more accurate, approximate, and up
to date statistics all support the CBO in making much better decision in
relation to use of any newly created automatic indexes.
We can configure High-Frequency automatic
statistics collection in the following manner. This High-Frequency automatic
statistics collection was not enabled by default.
ai_demo@PDB19> select
dbms_stats.get_prefs('AUTO_TASK_STATUS') task_status,
2 dbms_stats.get_prefs('AUTO_TASK_MAX_RUN_TIME') max_run_time,
3 dbms_stats.get_prefs('AUTO_TASK_INTERVAL') task_interval
4 from dual;
TASK_STATU MAX_RUN_TI
TASK_INTER
---------- ---------- ----------
OFF 3600 900
We can configure High-Frequency automatic
statistics in this following manner
ai_demo@PDB19> exec
dbms_stats.set_global_prefs('AUTO_TASK_STATUS','ON');
PL/SQL procedure successfully
completed.
This turns the feature ON/OFF, it is OFF
by default on Exadata environments but ON by default in Autonomous database
environments.
AUTO_TASK_MAX_RUN_TIME configures how
long to allow the process to run(default is 3600 seconds/60 minutes)
AUTO_TASK_INTERVAL configures the
interval between the process running( default is every 900 seconds/15 minutes)
2 select rownum, object_id, object_name
3 from all_objects;
2 select rownum, object_id, object_name
3 from all_objects a, all_users b
4 where rownum <= 100000;
2 select rownum+100000, object_id, object_name
3 from all_objects a, all_users b
4 where rownum <= 100000;
2 select rownum, 10, object_name
3 from all_objects a, all_users b, all_users c
4 where rownum <= 1000000;
ai_demo@PDB19> select x1, count(*)
2 from t3
3 group by x1
4 order by x1;
---------- ----------
1 10000
2 5001
3 1000
4 1000
5 1000
6 100
7 100
8 100
9 327234
10 654465
ai_demo@PDB19> select * from t3 where x1 = 8;
----------------------------------------------------------
Plan hash value: 4161002650
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 2148K| 1036 (2)| 00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| T3 | 100K| 2148K| 1036 (2)| 00:00:01 |
----------------------------------------------------------------------------------
---------------------------------------------------
filter("X1"=8)
2 from user_tab_statistics
3 where table_name in ('T1','T2','T3') ;
---------- ---------- ------- ------- -----------------------
T1
T2 100000 YES 21-APR-2022 10:53:41 am
T3 1000000 NO 21-APR-2022 10:55:03 am
2 from user_tab_col_statistics
3 where table_name ='T3';
---------- ----------- ---------------
ID 1 NONE
X1 1 NONE
X2 1 NONE
2 from user_tab_statistics
3 where table_name in ('T1','T2','T3') ;
---------- ---------- ------- ------ -----------------------
T1 68488 NO 21-APR-2022 11:07:49 am
T2 200000 NO 21-APR-2022 11:07:53 am
T3 1000000 NO 21-APR-2022 11:23:37 am
ai_demo@PDB19> select column_name, num_buckets, histogram
2 from user_tab_col_statistics
3 where table_name ='T3';
----------- ----------- -----------
ID 1 NONE
X1 10 FREQUENCY
X2 1 NONE
2 dbms_stats.get_prefs('AUTO_TASK_MAX_RUN_TIME') max_run_time,
3 dbms_stats.get_prefs('AUTO_TASK_INTERVAL') task_interval
4 from dual;
---------- ---------- ----------
OFF 3600 900
No comments:
Post a Comment