Showing posts with label Storage indexes. Show all posts
Showing posts with label Storage indexes. Show all posts

Wednesday, July 22, 2020

Storage Indexes - Part VIII

Exadata storage indexes depends on smart scan, which in-turn depend on direct path reads (either serial or parallel). However Oracle will generally use serial direct path reads for large objects, but when the objects are partitioned, Oracle may fail to recognize that the object is “large” while accessing the individual partitions on the table, because Oracle look at the size of each individual segments.  This might result in some partitions not being read via direct path read and hence no smart scan mechanism thus disabling any storage indexes for that partition.
 
The same goes with compression in place, when the data is being compressed the reduced size of the compressed segments will be even less likely to trigger the serial direct path reads and the problem becomes even more noticeable.
 
Here is the table that sized 1400MB in size
 
c##rajesh@PDB1> create table t
  2  nologging
  3  as
  4  select *
  5  from big_table;
 
Table created.
 
c##rajesh@PDB1> exec show_space('T');
Unformatted Blocks .....................               0
FS1 Blocks (0-25) ......................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks ............................         182,648
Total Blocks............................         188,416
Total Bytes.............................   1,543,503,872
Total MBytes............................           1,472
Unused Blocks...........................           5,100
Unused Bytes............................      41,779,200
Last Used Ext FileId....................              24
Last Used Ext BlockId...................      18,268,160
Last Used Block.........................           3,092
 
PL/SQL procedure successfully completed.
 
Here is the script that was used for execution.
 
c##rajesh@PDB1> $ type script.sql
set termout off
select * from t where owner ='JYU';
set termout on
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
We make repeated execution to warmup the storage cells to build the storage index and see if that adds benefit to the execution.
 
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                      3842637824
cell physical IO interconnect bytes returned by smart scan            5940000
 
the storage index is helping us here, that got saved nearly 3+GB of data being transferred from storage to database layer, and the amount of the data that returned from storage to database layer in this case was just as few as 5MB.
 
Now let’s see the effect of compression place
 
c##rajesh@PDB1> alter table t compress for archive high;
 
Table altered.
 
c##rajesh@PDB1> alter table t move online parallel 8;
 
Table altered.
 
c##rajesh@PDB1> exec show_space('T');
Unformatted Blocks .....................               0
FS1 Blocks (0-25) ......................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               4
Full Blocks ............................           9,621
Total Blocks............................           9,752
Total Bytes.............................      79,888,384
Total MBytes............................              76
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................              24
Last Used Ext BlockId...................      23,909,120
Last Used Block.........................             536
 
PL/SQL procedure successfully completed.
 
c##rajesh@PDB1> exec dbms_stats.gather_table_stats(user,'T',degree=>4,no_invalidate=>false);
 
PL/SQL procedure successfully completed.
 
The effect of compression has reduced the size from 1400+ MB to just 76 MB.  Let’s run the queries post the compression
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan            1894136
 
c##rajesh@PDB1>
 
No matter how often we run this queries now, no storage index is used. This confirms that storage index will not be in use for smaller segments. Ofcourse for smaller segments that sounds reasonable and for larger segments that favor direct path reads storage indexes plays a major role in eliminating the portion of region where the requested data can’t exist.

Monday, March 30, 2020

Storage Indexes - Part VII

Exadata storage indexes are purely memory only structures located on exadata storage servers, care need to taken in how much memory they can potentially consume. While on exadata storage cells, there is a limit on the number of storage index created on each table, prior to Exadata storage software 12.2.1.1.0 there was a limit of 8 columns per table, with 12.2.1.1.0 the limit has been increased to 24 columns per table. Which means there a high chances for all of the critical columns used in the where clause predicates will be covered in storage indexes.
 
Here is a test case to demonstrate that.
 
First, will create a table of 24+ columns of nicely clustered data, which will make them the obvious candidates for storage indexes as queries occur on them.
 
c##rajesh@PDB1> create table t as
  2  select trunc(rownum/1000) as c1,
  3     trunc(rownum/1000) as c2,
  4     trunc(rownum/1000) as c3,
  5     trunc(rownum/1000) as c4,
  6     trunc(rownum/1000) as c5,
  7     trunc(rownum/1000) as c6,
  8     trunc(rownum/1000) as c7,
  9     trunc(rownum/1000) as c8,
 10     trunc(rownum/1000) as c9,
 11     trunc(rownum/1000) as c10,
 12     trunc(rownum/1000) as c11,
 13     trunc(rownum/1000) as c12,
 14     trunc(rownum/1000) as c13,
 15     trunc(rownum/1000) as c14,
 16     trunc(rownum/1000) as c15,
 17     trunc(rownum/1000) as c16,
 18     trunc(rownum/1000) as c17,
 19     trunc(rownum/1000) as c18,
 20     trunc(rownum/1000) as c19,
 21     trunc(rownum/1000) as c20,
 22     trunc(rownum/1000) as c21,
 23     trunc(rownum/1000) as c22,
 24     trunc(rownum/1000) as c23,
 25     trunc(rownum/1000) as c24,
 26     trunc(rownum/1000) as c25,
 27     rpad('*',100,'*')  as some_other_data
 28  from big_table
 29  where rownum <= 10000000;
 
Table created.
 
c##rajesh@PDB1> --Warming up the storage region to build storage indexes
c##rajesh@PDB1>
c##rajesh@PDB1> declare
  2     l_data t.some_other_data%type;
  3  begin
  4     for i in 1..24
  5     loop
  6             for k in 1..20
  7             loop
  8                     dbms_application_info.set_client_info( ' i='||i||' k='||k);
  9                     execute immediate ' select max(some_other_data) from t where c'||i||' < :b1 '
 10                             into l_data
 11                             using ((k*10)+1000) ;
 12             end loop;
 13     end loop;
 14  end;
 15  /
 
PL/SQL procedure successfully completed.
 
Procedure to accept the column name as input parameter, and prepare a dynamic sql and execute it.
 
c##rajesh@PDB1> create or replace procedure validate_si(p_colName in varchar2)
  2  as
  3     l_data t.some_other_data%type;
  4  begin
  5     execute immediate ' select max(some_other_data) from t where '|| p_colName|| ' < 2000 '
  6             into l_data;
  7     dbms_output.put_line( l_data );
  8  end;
  9  /
 
Procedure created.
 
Here is the script; we can use to see the storage index in play.
 
c##rajesh@PDB1> $ type script05.sql
conn c##rajesh/Password-1@PDB1
 
col name format a60
col value format 999999999999999
 
select s.name,m.value
from v$statname s ,
    v$mystat m
where s.statistic# = m.statistic#
and s.name in ('cell physical IO bytes saved by storage index',
'cell physical IO interconnect bytes returned by smart scan' );
 
exec validate_si('&1');
 
select s.name,m.value
from v$statname s ,
    v$mystat m
where s.statistic# = m.statistic#
and s.name in ('cell physical IO bytes saved by storage index',
'cell physical IO interconnect bytes returned by smart scan' );
 
 
c##rajesh@PDB1>
 
Invoking the script for the column C1 produced a result like this.
 
c##rajesh@PDB1> @script05.sql C1
Connected.
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
 
PL/SQL procedure successfully completed.
 
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                      1869717504
cell physical IO interconnect bytes returned by smart scan            2045376
 
This confirms that storage index was been used while scanning against column C1.
 
The same happens for column C2.
 
c##rajesh@PDB1> @script05.sql C2
Connected.
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
 
PL/SQL procedure successfully completed.
 
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                      1869717504
cell physical IO interconnect bytes returned by smart scan            2045376
 
The results were same when invoking for every other columns in the table. Here is the results for the 24th column in the table.
 
c##rajesh@PDB1> @script05.sql C24
Connected.
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
 
PL/SQL procedure successfully completed.
 
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                      1869717504
cell physical IO interconnect bytes returned by smart scan            2045376
 
as you can see we get the storage index benefit for all the 24 columns, thus proving that we have indeed lifted the limit from 8 storage indexes.
 

Wednesday, March 4, 2020

Storage Indexes - Part VI

In the previous post discussed, how clustering of data within the data is an important factor in the performance and efficiency of both types of indexes. In this post, will expand a little more on this point and highlighted scenario, where a normal database indexes can significantly improve the performance but totally ineffective as a storage index.  
 
Storage index can only determine where the required data can’t exists within the storage region via the accessed min/max values of the index. If the requested value can’t possibly exists within the min/max boundary the specific storage region can be ignored and not be accessed during full table scan. However in some scenarios there can be the limitation in the usefulness of having min/max values to work if both min/max values are relatively common and distributed throughout the table. This is very common when dealing with columns that have data such as “code status” values.
 
c##rajesh@PDB1> create table t
  2  as
  3  select rownum as id,
  4     (mod(rownum-1,5)+1)*2 as format_id,
  5     rpad('Some_really_big_data',80,'*') datas
  6  from big_table
  7  where rownum <=10000000;
 
Table created.
 
c##rajesh@PDB1> select format_id,count(*)
  2  from t
  3  group by format_id
  4  order by format_id;
 
 FORMAT_ID   COUNT(*)
---------- ----------
         2    2000000
         4    2000000
         6    2000000
         8    2000000
        10    2000000
 
The format_id column was initially populated with just values 2,4,6,8,10 evenly distributed throughout the table. Then updated relatively small number of these values in following manner.
 
c##rajesh@PDB1> update t set format_id = 3 where mod(id,10000)=0;
 
1000 rows updated.
 
c##rajesh@PDB1> update t set format_id = 5 where id between 424242 and 425241;
 
1000 rows updated.
 
c##rajesh@PDB1> commit;
 
Commit complete.
 
So we have relatively few format_id = 3 (1000 rows from 10M rows, 0.01% of data) littered throughout the table and a few format_id = 5 located in specific location/portion of the table.
We will now create a normal index on this column and a histogram to let the CBO know there are relatively few occurrence of 3 and 5 within this table
 
c##rajesh@PDB1> create index t_idx on t(format_id) nologging;
 
Index created.
 
c##rajesh@PDB1> begin
  2     dbms_stats.gather_table_stats(user,'T',
  3             method_opt=>'for columns FORMAT_ID size 10');
  4  end;
  5  /
 
PL/SQL procedure successfully completed.
 
If we now run a query looking for all the format_id = 3, then
 
c##rajesh@PDB1> @script04a.sql
 
Session altered.
 
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
SQL_ID  9cw13pjgzkvwc, child number 1
-------------------------------------
select * from t where format_id = 3
 
Plan hash value: 767293772
 
-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |   1000 |00:00:00.01 |    1012 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |      1 |   1000 |   1000 |00:00:00.01 |    1012 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |      1 |   1035 |   1000 |00:00:00.01 |      12 |
-------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("FORMAT_ID"=3)
 
We noticed that CBO has used the index to efficiently retrieve just that 0.01% of rows because of high selectivity of the query.
 
It does even a better job when accessing format_id = 5 as these specific values are very well clustered and found in very tight portion of the table.
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
SQL_ID  5gchbg91bk70w, child number 1
-------------------------------------
select * from t where format_id = 5
 
Plan hash value: 767293772
 
-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |   1000 |00:00:00.01 |      33 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T     |      1 |   1000 |   1000 |00:00:00.01 |      33 |
|*  2 |   INDEX RANGE SCAN                  | T_IDX |      1 |   1035 |   1000 |00:00:00.01 |      12 |
-------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("FORMAT_ID"=5)
 
 
The databases indexes are extremely efficient in these cases, because it can point just to the location within the table, where these relatively few rows of interest are located.
 
Lets see how storage indexes cope up these scenarios.  We will rerun the same again, but now making the index invisible/unusable.
 
c##rajesh@PDB1> alter  index t_idx invisible;
 
Index altered.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
c##rajesh@PDB1> @script04b.sql
 
Session altered.
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
SQL_ID  9cw13pjgzkvwc, child number 1
-------------------------------------
select * from t where format_id = 3
 
Plan hash value: 1601196873
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |      1 |        |   1000 |00:00:00.11 |     131K|    131K|
|*  1 |  TABLE ACCESS STORAGE FULL| T    |      1 |   1000 |   1000 |00:00:00.11 |     131K|    131K|
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("FORMAT_ID"=3)
       filter("FORMAT_ID"=3)
 
 
19 rows selected.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan             659712
 
c##rajesh@PDB1>
 
no matter how often we try, storage indexes have managed to save us preciously nothing. The entire table has to be scanned and read during the full table scan operation.
 
Same results again, when accessing format_id =5, any possible storage indexes have been totally ineffective and the full table scan has to read the entire table.  
 
c##rajesh@PDB1> conn c##rajesh/Password-1@PDB1
Connected.
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
c##rajesh@PDB1> @script04c.sql
 
Session altered.
 
 
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
SQL_ID  5gchbg91bk70w, child number 1
-------------------------------------
select * from t where format_id = 5
 
Plan hash value: 1601196873
 
-----------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |      1 |        |   1000 |00:00:00.16 |     131K|    131K|
|*  1 |  TABLE ACCESS STORAGE FULL| T    |      1 |   1000 |   1000 |00:00:00.16 |     131K|    131K|
-----------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   1 - storage("FORMAT_ID"=5)
       filter("FORMAT_ID"=5)
 
 
19 rows selected.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan             405728
 
 
The results produced by storage index is significantly slower that the database index in place.
 
Since the data primarily consist of randomly, distributed values of 2,4,6,8 and 10 with the odd values of 3 and 5 littered around. This means if storage index created the min/max values will be 2 and 10 for all 1M storage regions throughout the table. Therefore both the values 3 and 5 could possibly exists within all/any of the storage regions as they exist inside the 2 - 10  min/max boundaries. These 3 and 5 effectively “hidden” within the storage min/max values making storage index totally ineffective in this scenario. As they can’t skip anything.