THE RAJESH BLOG: Storage Indexes

In the previous post discussed, how clustering of data within the data is an important factor in the performance and efficiency of both types of indexes. In this post, will expand a little more on this point and highlighted scenario, where a normal database indexes can significantly improve the performance but totally ineffective as a storage index.

Storage index can only determine where the required data can’t exists within the storage region via the accessed min/max values of the index. If the requested value can’t possibly exists within the min/max boundary the specific storage region can be ignored and not be accessed during full table scan. However in some scenarios there can be the limitation in the usefulness of having min/max values to work if both min/max values are relatively common and distributed throughout the table. This is very common when dealing with columns that have data such as “code status” values.

c##rajesh@PDB1> create table t

2 as

3 select rownum as id,

4 (mod(rownum-1,5)+1)*2 as format_id,

5 rpad('Some_really_big_data',80,'*') datas

6 from big_table

7 where rownum <=10000000;

Table created.

c##rajesh@PDB1> select format_id,count(*)

2 from t

3 group by format_id

4 order by format_id;

FORMAT_ID COUNT(*)

---------- ----------

2 2000000

4 2000000

6 2000000

8 2000000

10 2000000

The format_id column was initially populated with just values 2,4,6,8,10 evenly distributed throughout the table. Then updated relatively small number of these values in following manner.

c##rajesh@PDB1> update t set format_id = 3 where mod(id,10000)=0;

1000 rows updated.

c##rajesh@PDB1> update t set format_id = 5 where id between 424242 and 425241;

1000 rows updated.

c##rajesh@PDB1> commit;

Commit complete.

So we have relatively few format_id = 3 (1000 rows from 10M rows, 0.01% of data) littered throughout the table and a few format_id = 5 located in specific location/portion of the table.

We will now create a normal index on this column and a histogram to let the CBO know there are relatively few occurrence of 3 and 5 within this table

c##rajesh@PDB1> create index t_idx on t(format_id) nologging;

Index created.

c##rajesh@PDB1> begin

2 dbms_stats.gather_table_stats(user,'T',

3 method_opt=>'for columns FORMAT_ID size 10');

4 end;

5 /

PL/SQL procedure successfully completed.

If we now run a query looking for all the format_id = 3, then

c##rajesh@PDB1> @script04a.sql

Session altered.

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------

SQL_ID 9cw13pjgzkvwc, child number 1

-------------------------------------

select * from t where format_id = 3

Plan hash value: 767293772

-------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 1012 |

| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1000 | 1000 |00:00:00.01 | 1012 |

|* 2 | INDEX RANGE SCAN | T_IDX | 1 | 1035 | 1000 |00:00:00.01 | 12 |

-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("FORMAT_ID"=3)

We noticed that CBO has used the index to efficiently retrieve just that 0.01% of rows because of high selectivity of the query.

It does even a better job when accessing format_id = 5 as these specific values are very well clustered and found in very tight portion of the table.

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------

SQL_ID 5gchbg91bk70w, child number 1

-------------------------------------

select * from t where format_id = 5

Plan hash value: 767293772

-------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 33 |

| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1000 | 1000 |00:00:00.01 | 33 |

|* 2 | INDEX RANGE SCAN | T_IDX | 1 | 1035 | 1000 |00:00:00.01 | 12 |

-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("FORMAT_ID"=5)

The databases indexes are extremely efficient in these cases, because it can point just to the location within the table, where these relatively few rows of interest are located.

Lets see how storage indexes cope up these scenarios. We will rerun the same again, but now making the index invisible/unusable.

c##rajesh@PDB1> alter index t_idx invisible;

Index altered.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

c##rajesh@PDB1> @script04b.sql

Session altered.

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------------

SQL_ID 9cw13pjgzkvwc, child number 1

-------------------------------------

select * from t where format_id = 3

Plan hash value: 1601196873

-----------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.11 | 131K| 131K|

|* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 1000 | 1000 |00:00:00.11 | 131K| 131K|

-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("FORMAT_ID"=3)

filter("FORMAT_ID"=3)

19 rows selected.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 659712

c##rajesh@PDB1>

no matter how often we try, storage indexes have managed to save us preciously nothing. The entire table has to be scanned and read during the full table scan operation.

Same results again, when accessing format_id =5, any possible storage indexes have been totally ineffective and the full table scan has to read the entire table.

c##rajesh@PDB1> conn c##rajesh/Password-1@PDB1

Connected.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

c##rajesh@PDB1> @script04c.sql

Session altered.

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------------

SQL_ID 5gchbg91bk70w, child number 1

-------------------------------------

select * from t where format_id = 5

Plan hash value: 1601196873

-----------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.16 | 131K| 131K|

|* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 1000 | 1000 |00:00:00.16 | 131K| 131K|

-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("FORMAT_ID"=5)

filter("FORMAT_ID"=5)

19 rows selected.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 405728

The results produced by storage index is significantly slower that the database index in place.

Since the data primarily consist of randomly, distributed values of 2,4,6,8 and 10 with the odd values of 3 and 5 littered around. This means if storage index created the min/max values will be 2 and 10 for all 1M storage regions throughout the table. Therefore both the values 3 and 5 could possibly exists within all/any of the storage regions as they exist inside the 2 - 10 min/max boundaries. These 3 and 5 effectively “hidden” within the storage min/max values making storage index totally ineffective in this scenario. As they can’t skip anything.

Wednesday, March 4, 2020

Storage Indexes - Part VI

No comments:

Post a Comment