THE RAJESH BLOG: Storage Indexes

The clustering factor (CF) is the important index related statistics, with the efficiency of index performing range scans very much is depended on CF of the index. If the data in the table is very well clustered in relation to the index, then index range scan has to visit few table blocks to obtain necessary data. If the data is effectively randomized and not well clustered in relation to the index, then index index range scan has to visit more table blocks to obtain necessary data. The CBO will be less included to use such an index as a result depending on the overall selectivity of the query.

It is very similar story for Exadata storage indexes (SI) as well. The better the table data is clustered in relation to the SI’s, the more efficient and effective the SI’s are likely to be in relation to being able eliminate accessing storage region that can’t possibly contain data of interest. By having the data more clustered (or ordered) in relation to specific SI, the min/max ranges associated with SI are more likely to be able to determine area of table where data can’t exists.

c##rajesh@PDB1> create table t

2 as

3 select rownum as id, trunc( rownum/100000 )+1 as album_id,

4 mod( rownum,100000 ) artist_id,

5 ceil( dbms_random.value(0,5) )*2 as format_id,

6 trunc(sysdate - ceil( dbms_random.value(0,10000) )) as relase_date,

7 ceil( dbms_random.value(0,500000) ) as total_sales,

8 rpad('some really big string',80) data

9 from big_table

10 where rownum <= 10000000;

Table created.

c##rajesh@PDB1> create index t_album_id_idx on t( album_id ) nologging;

Index created.

c##rajesh@PDB1> create index t_total_sales_idx on t( total_sales ) nologging;

Index created.

c##rajesh@PDB1> col index_name format a20

c##rajesh@PDB1> select i.index_name, i.clustering_factor,i.num_rows,t.blocks

2 from user_indexes i,

3 user_tables t

4 where i.table_name = t.table_name

5 and t.table_name ='T';

INDEX_NAME CLUSTERING_FACTOR NUM_ROWS BLOCKS

-------------------- ----------------- ---------- ----------

T_ALBUM_ID_IDX 161259 10000000 161903

T_TOTAL_SALES_IDX 9999399 10000000 161903

c##rajesh@PDB1>

The following query is on the album_id column that has an excellent CF.

c##rajesh@PDB1> @script03a.sql

PLAN_TABLE_OUTPUT

----------------------------------------------------------------------------------------------------------------

SQL_ID 1cd1u6mdr6ap4, child number 1

-------------------------------------

select * from t where album_id = 42

Plan hash value: 4221737049

----------------------------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 100K|00:00:00.05 | 3123 |

| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 99010 | 100K|00:00:00.05 | 3123 |

|* 2 | INDEX RANGE SCAN | T_ALBUM_ID_IDX | 1 | 99010 | 100K|00:00:00.02 | 863 |

----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("ALBUM_ID"=42)

The index range scan access path is selected by CBO to retrieve 100K rows, with just 3000 consistent gets. However if we run the same query on Exadata platform for storage index to kick in

c##rajesh@PDB1> alter index t_album_id_idx invisible;

Index altered.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

c##rajesh@PDB1> @script03a.sql

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------------

SQL_ID 1cd1u6mdr6ap4, child number 1

-------------------------------------

select * from t where album_id = 42

Plan hash value: 1601196873

-----------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 100K|00:00:00.23 | 161K| 161K|

|* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 99010 | 100K|00:00:00.23 | 161K| 161K|

-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("ALBUM_ID"=42)

filter("ALBUM_ID"=42)

19 rows selected.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 1306501120

cell physical IO interconnect bytes returned by smart scan 2245248

c##rajesh@PDB1>

We notice that full scan is being performed; storage indexes are kicking in here and by saving 1306 MB. Storage indexes based on album_id column are therefore very effective at eliminating the access of un-necessary data.

If we now run a query based on “total_sales” column in which the data is randomly distributed all over the place and so the associated index have a very poor CF:

c##rajesh@PDB1> @script03b.sql

PLAN_TABLE_OUTPUT

---------------------------------------------------------------------------------------------------------------------------------

SQL_ID 5qpmchf457gjc, child number 0

-------------------------------------

select * from t where total_sales between 42 and 142

Plan hash value: 3620091251

-------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1975 |00:00:00.01 | 1996 |

| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 2039 | 1975 |00:00:00.01 | 1996 |

|* 2 | INDEX RANGE SCAN | T_TOTAL_SALES_IDX | 1 | 2039 | 1975 |00:00:00.01 | 21 |

-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("TOTAL_SALES">=42 AND "TOTAL_SALES"<=142)

We noticed that only 1975 rows returned and 1996 consistent gets have been performed (practically 1 for each row returned) this is somewhat more than 3123 consistent gets from pervious example that returned 100K rows.

If we run the same query on Exadata environment then

c##rajesh@PDB1> alter index t_total_sales_idx invisible;

Index altered.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

c##rajesh@PDB1> @script03b.sql

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------------

SQL_ID 5qpmchf457gjc, child number 0

-------------------------------------

select * from t where total_sales between 42 and 142

Plan hash value: 1601196873

-----------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1975 |00:00:00.09 | 161K| 161K|

|* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 2039 | 1975 |00:00:00.09 | 161K| 161K|

-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage(("TOTAL_SALES"<=142 AND "TOTAL_SALES">=42))

filter(("TOTAL_SALES"<=142 AND "TOTAL_SALES">=42))

19 rows selected.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 122060800

cell physical IO interconnect bytes returned by smart scan 1661880

c##rajesh@PDB1>

we noticed that amount of Physical IO saved by storage index has significantly reduced from previous execution (just 122MB down to 1306 MB), the poor clustering of data in relation to “total_sales” column has effectively neutralized the effectiveness of the associated SI on “total_sales” column.

Friday, February 14, 2020

Storage Indexes - Part V

No comments:

Post a Comment