Friday, February 14, 2020

Storage Indexes - Part V

The clustering factor (CF) is the important index related statistics, with the efficiency of index performing range scans very much is depended on CF of the index. If the data in the table is very well clustered in relation to the index, then index range scan has to visit few table blocks to obtain necessary data. If the data is effectively randomized and not well clustered in relation to the index, then index index range scan has to visit more table blocks to obtain necessary data. The CBO will be less included to use such an index as a result depending on the overall selectivity of the query.
 
It is very similar story for Exadata storage indexes (SI) as well. The better the table data is clustered in relation to the SI’s, the more efficient and effective the SI’s are likely to be in relation to being able eliminate accessing storage region that can’t possibly contain data of interest. By having the data more clustered (or ordered) in relation to specific SI, the min/max ranges associated with SI are more likely to be able to determine area of table where data can’t exists.
 
c##rajesh@PDB1> create table t
  2  as
  3  select rownum as id, trunc( rownum/100000 )+1 as album_id,
  4      mod( rownum,100000 ) artist_id,
  5      ceil( dbms_random.value(0,5) )*2 as format_id,
  6      trunc(sysdate - ceil( dbms_random.value(0,10000) )) as relase_date,
  7      ceil( dbms_random.value(0,500000) ) as total_sales,
  8      rpad('some really big string',80) data
  9  from big_table
 10  where rownum <= 10000000;
 
Table created.
 
c##rajesh@PDB1> create index t_album_id_idx on t( album_id ) nologging;
 
Index created.
 
c##rajesh@PDB1> create index t_total_sales_idx on t( total_sales ) nologging;
 
Index created.
 
c##rajesh@PDB1> col index_name format a20
c##rajesh@PDB1> select i.index_name, i.clustering_factor,i.num_rows,t.blocks
  2  from user_indexes i,
  3      user_tables  t
  4  where i.table_name = t.table_name
  5  and t.table_name ='T';
 
INDEX_NAME           CLUSTERING_FACTOR   NUM_ROWS     BLOCKS
-------------------- ----------------- ---------- ----------
T_ALBUM_ID_IDX                  161259   10000000     161903
T_TOTAL_SALES_IDX              9999399   10000000     161903
 
c##rajesh@PDB1>
 
The following query is on the album_id column that has an excellent CF.
 
c##rajesh@PDB1> @script03a.sql
 
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  1cd1u6mdr6ap4, child number 1
-------------------------------------
select * from t where album_id = 42
 
Plan hash value: 4221737049
 
----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |      1 |        |    100K|00:00:00.05 |    3123 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T              |      1 |  99010 |    100K|00:00:00.05 |    3123 |
|*  2 |   INDEX RANGE SCAN                  | T_ALBUM_ID_IDX |      1 |  99010 |    100K|00:00:00.02 |     863 |
----------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("ALBUM_ID"=42)
 
The index range scan access path is selected by CBO to retrieve 100K rows, with just 3000 consistent gets. However if we run the same query on Exadata platform for storage index to kick in
 
c##rajesh@PDB1> alter index t_album_id_idx invisible;
 
Index altered.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
c##rajesh@PDB1> @script03a.sql
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
SQL_ID  1cd1u6mdr6ap4, child number 1
-------------------------------------
select * from t where album_id = 42
 
Plan hash value: 1601196873
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |      1 |        |    100K|00:00:00.23 |     161K|    161K|
|*  1 |  TABLE ACCESS STORAGE FULL| T    |      1 |  99010 |    100K|00:00:00.23 |     161K|    161K|
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("ALBUM_ID"=42)
       filter("ALBUM_ID"=42)
 
 
19 rows selected.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                      1306501120
cell physical IO interconnect bytes returned by smart scan            2245248
 
c##rajesh@PDB1>
 
We notice that full scan is being performed; storage indexes are kicking in here and by saving 1306 MB. Storage indexes based on album_id column are therefore very effective at eliminating the access of un-necessary data.
 
If we now run a query based on “total_sales” column in which the data is randomly distributed all over the place and so the associated index have a very poor CF:
 
c##rajesh@PDB1> @script03b.sql
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
SQL_ID  5qpmchf457gjc, child number 0
-------------------------------------
select * from t where total_sales between 42 and 142
 
Plan hash value: 3620091251
 
-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |      1 |        |   1975 |00:00:00.01 |    1996 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T                 |      1 |   2039 |   1975 |00:00:00.01 |    1996 |
|*  2 |   INDEX RANGE SCAN                  | T_TOTAL_SALES_IDX |      1 |   2039 |   1975 |00:00:00.01 |      21 |
-------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("TOTAL_SALES">=42 AND "TOTAL_SALES"<=142)
 
 
We noticed that only 1975 rows returned and 1996 consistent gets have been performed (practically 1 for each row returned) this is somewhat more than 3123 consistent gets from pervious example that returned 100K rows.
 
If we run the same query on Exadata environment then
 
c##rajesh@PDB1> alter index t_total_sales_idx invisible;
 
Index altered.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
c##rajesh@PDB1> @script03b.sql
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
SQL_ID  5qpmchf457gjc, child number 0
-------------------------------------
select * from t where total_sales between 42 and 142
 
Plan hash value: 1601196873
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |      1 |        |   1975 |00:00:00.09 |     161K|    161K|
|*  1 |  TABLE ACCESS STORAGE FULL| T    |      1 |   2039 |   1975 |00:00:00.09 |     161K|    161K|
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage(("TOTAL_SALES"<=142 AND "TOTAL_SALES">=42))
       filter(("TOTAL_SALES"<=142 AND "TOTAL_SALES">=42))
 
 
19 rows selected.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                       122060800
cell physical IO interconnect bytes returned by smart scan            1661880
 
c##rajesh@PDB1>
 
we noticed that amount of Physical IO saved by storage index has significantly reduced from previous execution (just 122MB down to 1306 MB), the poor clustering of data in relation to “total_sales” column has effectively neutralized the effectiveness of the associated SI on “total_sales” column.
 

Wednesday, February 12, 2020

Storage Indexes - Part IV

It is often stated that in Exadata, you don’t need a conventional database indexes any more as everything runs fast, that indexes are simply waste of time and storage. So simply drop all database indexes and things will run so fast, well that is not quite true, there are many many cases where database indexes are still critical for optimal performance and scalability in an Exadata environment.
 
The first example is a scenario when one want either the minimum or maximum of a column value. As database index entries are always ordered, finding minimum or maximum of a column value is efficient by traversing down to either the first/last leaf block with in the index structure. Therefore an execution of INDEX FULL SCAN MIN/MAX access path by which the CBO can decide to access an index in this manner.
 
c##rajesh@PDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last') );
 
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------
SQL_ID  2hg8tc5vjaj9k, child number 0
-------------------------------------
select min( object_id ) from t
 
Plan hash value: 2683064407
 
-------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |      1 |        |      1 |00:00:00.01 |       3 |      3 |
|   1 |  SORT AGGREGATE            |       |      1 |      1 |      1 |00:00:00.01 |       3 |      3 |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T_IDX |      1 |      1 |      1 |00:00:00.01 |       3 |      3 |
-------------------------------------------------------------------------------------------------------
 
 
14 rows selected.
 
With only a few consistent gets, we have managed to get the minimum value of a column extremely quickly and efficiently
 
However if we drop this index and re-run the same query in exadata many times,
 
c##rajesh@PDB1> alter index T_IDX invisible;
 
Index altered.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
c##rajesh@PDB1> set serveroutput off
c##rajesh@PDB1> alter session set statistics_level=all;
 
Session altered.
 
c##rajesh@PDB1> select min( object_id ) from t;
 
MIN(OBJECT_ID)
--------------
             2
 
c##rajesh@PDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last') );
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
SQL_ID  2hg8tc5vjaj9k, child number 0
-------------------------------------
select min( object_id ) from t
 
Plan hash value: 2966233522
 
------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |      |      1 |        |      1 |00:00:01.84 |    2108K|   2108K|
|   1 |  SORT AGGREGATE            |      |      1 |      1 |      1 |00:00:01.84 |    2108K|   2108K|
|   2 |   TABLE ACCESS STORAGE FULL| T    |      1 |    100M|    100M|00:00:00.92 |    2108K|   2108K|
------------------------------------------------------------------------------------------------------
 
 
14 rows selected.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan          516301536
 
c##rajesh@PDB1>
 
We noticed that query takes considerably longer and that the storage indexes have been unable to be of any help with not a single byte saved.
 
This is because all type of aggregations are performed at database layer and not in the storage cells.  Even though storage index could be seen as perhaps being a value there, they are totally ignored by Oracle when retrieving either the minimum/maximum of a column value.  

Storage Indexes - Part III

Nulls are the special case for storage indexes. There is a separate flag in the storage index structure that is used to indicate whether a storage region contains nulls or not. This separate flag make queries looking for nulls or absence of nulls even more efficient that the normal minimum and maximum comparison that are typically done. Here is an example comparing typical performance with special null optimization.
 
c##rajesh@PDB1> select count(*) from big_table;
 
  COUNT(*)
----------
 100000000
 
c##rajesh@PDB1> exec show_space('BIG_TABLE')
Unformatted Blocks .....................               0
FS1 Blocks (0-25) ......................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               9
Full Blocks ............................       2,024,478
Total Blocks............................       2,028,080
Total Bytes.............................  16,614,031,360
Total MBytes............................          15,844
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................              24
Last Used Ext BlockId...................      20,748,032
Last Used Block.........................             512
 
PL/SQL procedure successfully completed.
 
c##rajesh@PDB1>
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
c##rajesh@PDB1> set timing on
c##rajesh@PDB1> select count(*) from big_table where ID is null;
 
  COUNT(*)
----------
         0
 
Elapsed: 00:00:00.27
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                     16584597504
cell physical IO interconnect bytes returned by smart scan                  0
 
Elapsed: 00:00:00.00
c##rajesh@PDB1>
 
In this example you can see that retrieval of few nulls was extremely fast, this is because there is no possibility that any storage region that does not contain a null value will have to be read. So no false positives will slow down this query. Notice also the amount of IO saved by the null query is almost equal to the size of the table.