Tuesday, March 6, 2018

One percent data retrieval from table

One of the most common myth about index is, if any SQL statement is about to retrieve one percent of data from a table, then the index based access would be the best option.
However that is not the case **always**, it depends on clustering of the data.
If the data is very well clustered according to the index, then the cost of reading the table via the index would be the cheap.
Created a table with unique index on ID column and a very good appealing clustering factor.
demo@ORA12C> create table t (id number NOT NULL, some_other_datas varchar2(300) );
 
Table created.
 
demo@ORA12C> create unique index t_idx on t(id);
 
Index created.
 
demo@ORA12C> insert into t (id,some_other_datas)
  2  select rownum, rpad('*',200,'*')
  3  from all_objects, all_users
  4  where rownum <=2e6;
 
2000000 rows created.
 
demo@ORA12C> exec dbms_stats.gather_table_stats(user,'T');
 
PL/SQL procedure successfully completed.
 
demo@ORA12C> select i.index_name, i.clustering_factor, t.num_rows, t.blocks
  2  from user_indexes i ,
  3    user_tables t
  4  where i.table_name = t.table_name
  5  and i.index_name ='T_IDX';
 
INDEX_NAME CLUSTERING_FACTOR   NUM_ROWS     BLOCKS
---------- ----------------- ---------- ----------
T_IDX                  58824    2000000      59057
 
demo@ORA12C>
 
 
To retrieve one percent of data from this table, the optimizer picks the index as best access path.
 
 
demo@ORA12C> alter session set statistics_level=all;
 
Session altered.
 
demo@ORA12C> set serveroutput off
demo@ORA12C> set feedback only
demo@ORA12C> select *
  2  from t
  3  where id between 20000 and 40000;
 
20001 rows selected.
 
demo@ORA12C> set feedback 6
demo@ORA12C> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------
SQL_ID  b7jpkag179cf6, child number 0
-------------------------------------
select * from t where id between 20000 and 40000
 
Plan hash value: 767293772
 
-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |  20001 |00:00:00.01 |     889 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |      1 |  20002 |  20001 |00:00:00.01 |     889 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |      1 |  20002 |  20001 |00:00:00.01 |     174 |
-------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ID">=20000 AND "ID"<=40000)
 
 
19 rows selected.
 
demo@ORA12C>
 
The query completed in less than a second returning about 20K rows doing 890 Logical IOs.
 
Now let’s truncate and reload this table with a different set of data, having a poor clustering factor.
 
demo@ORA12C> truncate table t;
 
Table truncated.
 
demo@ORA12C> insert into t (id,some_other_datas)
  2  select dbms_random.value(1,2000000), rpad('*',200,'*')
  3  from all_objects, all_users
  4  where rownum <=2e6;
 
2000000 rows created.
 
demo@ORA12C> exec dbms_stats.gather_table_stats(user,'T');
 
PL/SQL procedure successfully completed.
 
demo@ORA12C> select i.index_name, i.clustering_factor, t.num_rows, t.blocks
  2  from user_indexes i ,
  3    user_tables t
  4  where i.table_name = t.table_name
  5  and i.index_name ='T_IDX';
 
INDEX_NAME CLUSTERING_FACTOR   NUM_ROWS     BLOCKS
---------- ----------------- ---------- ----------
T_IDX                2005915    2000000      65177
 
 
Now to retrieve one percent of data from this table, the optimizer picks the Full table scan as best access path.
 
demo@ORA12C> alter session set statistics_level=all;
 
Session altered.
 
demo@ORA12C> set serveroutput off
demo@ORA12C> set feedback only
demo@ORA12C> select *
  2  from t
  3  where id between 20000 and 40000;
 
20119 rows selected.
 
demo@ORA12C> set feedback 6
demo@ORA12C> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
SQL_ID  b7jpkag179cf6, child number 0
-------------------------------------
select * from t where id between 20000 and 40000
 
Plan hash value: 1601196873
 
---------------------------------------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |  20119 |00:00:06.42 |     129K|  64669 |
|*  1 |  TABLE ACCESS FULL| T    |      1 |  20002 |  20119 |00:00:06.42 |     129K|  64669 |
---------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - filter(("ID"<=40000 AND "ID">=20000))
 
 
18 rows selected.
 
This look around 6 sec to complete and returned 20K rows by doing 129K logical IOs. 
 
Though the optimizer doesn’t pick the index to answer this query, if we force the index via a Hint, things become worse like this.
 
demo@ORA12C> set feedback only
demo@ORA12C> select /*+ index(t,t_idx) */ *
  2  from t
  3  where id between 20000 and 40000;
 
20119 rows selected.
 
demo@ORA12C> set feedback 6
demo@ORA12C> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------
SQL_ID  478424k8ajjqh, child number 0
-------------------------------------
select /*+ index(t,t_idx) */ * from t where id between 20000 and 40000
 
Plan hash value: 767293772
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |  20119 |00:02:42.33 |   37768 |  18066 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |      1 |  20002 |  20119 |00:02:42.33 |   37768 |  18066 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |      1 |  20002 |  20119 |00:00:02.01 |     293 |     10 |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ID">=20000 AND "ID"<=40000)
 
 
19 rows selected.
 
demo@ORA12C>
 
Using index the sql took around 160 sec, where as a Full table scan did only 6 sec.
 
So the answer to this question – if a SQL is about to return one percent of data from a table, then the choice to use either the full table scan or the index based access purely depends on clustering of the data.

1 comment:

  1. Thank you for the posts related to performance Tunings:

    COuld you please help us to undertand How to create profiles.
    How to create baselines

    and How troubleshoot Buffer Busy waits - During INSERTS

    -A big Follower of your blog. Thank you so much for your efforts.

    Waiting for the new posts soon.

    Regards,
    Smita

    ReplyDelete