Monday, March 30, 2020

Storage Indexes - Part VII

Exadata storage indexes are purely memory only structures located on exadata storage servers, care need to taken in how much memory they can potentially consume. While on exadata storage cells, there is a limit on the number of storage index created on each table, prior to Exadata storage software 12.2.1.1.0 there was a limit of 8 columns per table, with 12.2.1.1.0 the limit has been increased to 24 columns per table. Which means there a high chances for all of the critical columns used in the where clause predicates will be covered in storage indexes.
 
Here is a test case to demonstrate that.
 
First, will create a table of 24+ columns of nicely clustered data, which will make them the obvious candidates for storage indexes as queries occur on them.
 
c##rajesh@PDB1> create table t as
  2  select trunc(rownum/1000) as c1,
  3     trunc(rownum/1000) as c2,
  4     trunc(rownum/1000) as c3,
  5     trunc(rownum/1000) as c4,
  6     trunc(rownum/1000) as c5,
  7     trunc(rownum/1000) as c6,
  8     trunc(rownum/1000) as c7,
  9     trunc(rownum/1000) as c8,
 10     trunc(rownum/1000) as c9,
 11     trunc(rownum/1000) as c10,
 12     trunc(rownum/1000) as c11,
 13     trunc(rownum/1000) as c12,
 14     trunc(rownum/1000) as c13,
 15     trunc(rownum/1000) as c14,
 16     trunc(rownum/1000) as c15,
 17     trunc(rownum/1000) as c16,
 18     trunc(rownum/1000) as c17,
 19     trunc(rownum/1000) as c18,
 20     trunc(rownum/1000) as c19,
 21     trunc(rownum/1000) as c20,
 22     trunc(rownum/1000) as c21,
 23     trunc(rownum/1000) as c22,
 24     trunc(rownum/1000) as c23,
 25     trunc(rownum/1000) as c24,
 26     trunc(rownum/1000) as c25,
 27     rpad('*',100,'*')  as some_other_data
 28  from big_table
 29  where rownum <= 10000000;
 
Table created.
 
c##rajesh@PDB1> --Warming up the storage region to build storage indexes
c##rajesh@PDB1>
c##rajesh@PDB1> declare
  2     l_data t.some_other_data%type;
  3  begin
  4     for i in 1..24
  5     loop
  6             for k in 1..20
  7             loop
  8                     dbms_application_info.set_client_info( ' i='||i||' k='||k);
  9                     execute immediate ' select max(some_other_data) from t where c'||i||' < :b1 '
 10                             into l_data
 11                             using ((k*10)+1000) ;
 12             end loop;
 13     end loop;
 14  end;
 15  /
 
PL/SQL procedure successfully completed.
 
Procedure to accept the column name as input parameter, and prepare a dynamic sql and execute it.
 
c##rajesh@PDB1> create or replace procedure validate_si(p_colName in varchar2)
  2  as
  3     l_data t.some_other_data%type;
  4  begin
  5     execute immediate ' select max(some_other_data) from t where '|| p_colName|| ' < 2000 '
  6             into l_data;
  7     dbms_output.put_line( l_data );
  8  end;
  9  /
 
Procedure created.
 
Here is the script; we can use to see the storage index in play.
 
c##rajesh@PDB1> $ type script05.sql
conn c##rajesh/Password-1@PDB1
 
col name format a60
col value format 999999999999999
 
select s.name,m.value
from v$statname s ,
    v$mystat m
where s.statistic# = m.statistic#
and s.name in ('cell physical IO bytes saved by storage index',
'cell physical IO interconnect bytes returned by smart scan' );
 
exec validate_si('&1');
 
select s.name,m.value
from v$statname s ,
    v$mystat m
where s.statistic# = m.statistic#
and s.name in ('cell physical IO bytes saved by storage index',
'cell physical IO interconnect bytes returned by smart scan' );
 
 
c##rajesh@PDB1>
 
Invoking the script for the column C1 produced a result like this.
 
c##rajesh@PDB1> @script05.sql C1
Connected.
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
 
PL/SQL procedure successfully completed.
 
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                      1869717504
cell physical IO interconnect bytes returned by smart scan            2045376
 
This confirms that storage index was been used while scanning against column C1.
 
The same happens for column C2.
 
c##rajesh@PDB1> @script05.sql C2
Connected.
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
 
PL/SQL procedure successfully completed.
 
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                      1869717504
cell physical IO interconnect bytes returned by smart scan            2045376
 
The results were same when invoking for every other columns in the table. Here is the results for the 24th column in the table.
 
c##rajesh@PDB1> @script05.sql C24
Connected.
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
 
PL/SQL procedure successfully completed.
 
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                      1869717504
cell physical IO interconnect bytes returned by smart scan            2045376
 
as you can see we get the storage index benefit for all the 24 columns, thus proving that we have indeed lifted the limit from 8 storage indexes.
 

Wednesday, March 4, 2020

Storage Indexes - Part VI

In the previous post discussed, how clustering of data within the data is an important factor in the performance and efficiency of both types of indexes. In this post, will expand a little more on this point and highlighted scenario, where a normal database indexes can significantly improve the performance but totally ineffective as a storage index.  
 
Storage index can only determine where the required data can’t exists within the storage region via the accessed min/max values of the index. If the requested value can’t possibly exists within the min/max boundary the specific storage region can be ignored and not be accessed during full table scan. However in some scenarios there can be the limitation in the usefulness of having min/max values to work if both min/max values are relatively common and distributed throughout the table. This is very common when dealing with columns that have data such as “code status” values.
 
c##rajesh@PDB1> create table t
  2  as
  3  select rownum as id,
  4     (mod(rownum-1,5)+1)*2 as format_id,
  5     rpad('Some_really_big_data',80,'*') datas
  6  from big_table
  7  where rownum <=10000000;
 
Table created.
 
c##rajesh@PDB1> select format_id,count(*)
  2  from t
  3  group by format_id
  4  order by format_id;
 
 FORMAT_ID   COUNT(*)
---------- ----------
         2    2000000
         4    2000000
         6    2000000
         8    2000000
        10    2000000
 
The format_id column was initially populated with just values 2,4,6,8,10 evenly distributed throughout the table. Then updated relatively small number of these values in following manner.
 
c##rajesh@PDB1> update t set format_id = 3 where mod(id,10000)=0;
 
1000 rows updated.
 
c##rajesh@PDB1> update t set format_id = 5 where id between 424242 and 425241;
 
1000 rows updated.
 
c##rajesh@PDB1> commit;
 
Commit complete.
 
So we have relatively few format_id = 3 (1000 rows from 10M rows, 0.01% of data) littered throughout the table and a few format_id = 5 located in specific location/portion of the table.
We will now create a normal index on this column and a histogram to let the CBO know there are relatively few occurrence of 3 and 5 within this table
 
c##rajesh@PDB1> create index t_idx on t(format_id) nologging;
 
Index created.
 
c##rajesh@PDB1> begin
  2     dbms_stats.gather_table_stats(user,'T',
  3             method_opt=>'for columns FORMAT_ID size 10');
  4  end;
  5  /
 
PL/SQL procedure successfully completed.
 
If we now run a query looking for all the format_id = 3, then
 
c##rajesh@PDB1> @script04a.sql
 
Session altered.
 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
SQL_ID  9cw13pjgzkvwc, child number 1
-------------------------------------
select * from t where format_id = 3
 
Plan hash value: 767293772
 
-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |   1000 |00:00:00.01 |    1012 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |      1 |   1000 |   1000 |00:00:00.01 |    1012 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |      1 |   1035 |   1000 |00:00:00.01 |      12 |
-------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("FORMAT_ID"=3)
 
We noticed that CBO has used the index to efficiently retrieve just that 0.01% of rows because of high selectivity of the query.
 
It does even a better job when accessing format_id = 5 as these specific values are very well clustered and found in very tight portion of the table.
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
SQL_ID  5gchbg91bk70w, child number 1
-------------------------------------
select * from t where format_id = 5
 
Plan hash value: 767293772
 
-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |   1000 |00:00:00.01 |      33 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |      1 |   1000 |   1000 |00:00:00.01 |      33 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |      1 |   1035 |   1000 |00:00:00.01 |      12 |
-------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("FORMAT_ID"=5)
 
 
The databases indexes are extremely efficient in these cases, because it can point just to the location within the table, where these relatively few rows of interest are located.
 
Lets see how storage indexes cope up these scenarios.  We will rerun the same again, but now making the index invisible/unusable.
 
c##rajesh@PDB1> alter  index t_idx invisible;
 
Index altered.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
c##rajesh@PDB1> @script04b.sql
 
Session altered.
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
SQL_ID  9cw13pjgzkvwc, child number 1
-------------------------------------
select * from t where format_id = 3
 
Plan hash value: 1601196873
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |      1 |        |   1000 |00:00:00.11 |     131K|    131K|
|*  1 |  TABLE ACCESS STORAGE FULL| T    |      1 |   1000 |   1000 |00:00:00.11 |     131K|    131K|
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("FORMAT_ID"=3)
       filter("FORMAT_ID"=3)
 
 
19 rows selected.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan             659712
 
c##rajesh@PDB1>
 
no matter how often we try, storage indexes have managed to save us preciously nothing. The entire table has to be scanned and read during the full table scan operation.
 
Same results again, when accessing format_id =5, any possible storage indexes have been totally ineffective and the full table scan has to read the entire table.  
 
c##rajesh@PDB1> conn c##rajesh/Password-1@PDB1
Connected.
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
c##rajesh@PDB1> @script04c.sql
 
Session altered.
 
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
SQL_ID  5gchbg91bk70w, child number 1
-------------------------------------
select * from t where format_id = 5
 
Plan hash value: 1601196873
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |      1 |        |   1000 |00:00:00.16 |     131K|    131K|
|*  1 |  TABLE ACCESS STORAGE FULL| T    |      1 |   1000 |   1000 |00:00:00.16 |     131K|    131K|
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("FORMAT_ID"=5)
       filter("FORMAT_ID"=5)
 
 
19 rows selected.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan             405728
 
 
The results produced by storage index is significantly slower that the database index in place.
 
Since the data primarily consist of randomly, distributed values of 2,4,6,8 and 10 with the odd values of 3 and 5 littered around. This means if storage index created the min/max values will be 2 and 10 for all 1M storage regions throughout the table. Therefore both the values 3 and 5 could possibly exists within all/any of the storage regions as they exist inside the 2 - 10  min/max boundaries. These 3 and 5 effectively “hidden” within the storage min/max values making storage index totally ineffective in this scenario. As they can’t skip anything.