THE RAJESH BLOG: Storage indexes

Showing posts with label Storage indexes. Show all posts

Wednesday, July 22, 2020

Storage Indexes - Part VIII

Exadata storage indexes depends on smart scan, which in-turn depend on direct path reads (either serial or parallel). However Oracle will generally use serial direct path reads for large objects, but when the objects are partitioned, Oracle may fail to recognize that the object is “large” while accessing the individual partitions on the table, because Oracle look at the size of each individual segments. This might result in some partitions not being read via direct path read and hence no smart scan mechanism thus disabling any storage indexes for that partition.

The same goes with compression in place, when the data is being compressed the reduced size of the compressed segments will be even less likely to trigger the serial direct path reads and the problem becomes even more noticeable.

Here is the table that sized 1400MB in size

c##rajesh@PDB1> create table t

2 nologging

3 as

4 select *

5 from big_table;

Table created.

c##rajesh@PDB1> exec show_space('T');

Unformatted Blocks ..................... 0

FS1 Blocks (0-25) ...................... 0

FS2 Blocks (25-50) ..................... 0

FS3 Blocks (50-75) ..................... 0

FS4 Blocks (75-100)..................... 0

Full Blocks ............................ 182,648

Total Blocks............................ 188,416

Total Bytes............................. 1,543,503,872

Total MBytes............................ 1,472

Unused Blocks........................... 5,100

Unused Bytes............................ 41,779,200

Last Used Ext FileId.................... 24

Last Used Ext BlockId................... 18,268,160

Last Used Block......................... 3,092

PL/SQL procedure successfully completed.

Here is the script that was used for execution.

c##rajesh@PDB1> $ type script.sql

set termout off

select * from t where owner ='JYU';

set termout on

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

We make repeated execution to warmup the storage cells to build the storage index and see if that adds benefit to the execution.

c##rajesh@PDB1> @script.sql

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 3842637824

cell physical IO interconnect bytes returned by smart scan 5940000

the storage index is helping us here, that got saved nearly 3+GB of data being transferred from storage to database layer, and the amount of the data that returned from storage to database layer in this case was just as few as 5MB.

Now let’s see the effect of compression place

c##rajesh@PDB1> alter table t compress for archive high;

Table altered.

c##rajesh@PDB1> alter table t move online parallel 8;

Table altered.

c##rajesh@PDB1> exec show_space('T');

Unformatted Blocks ..................... 0

FS1 Blocks (0-25) ...................... 0

FS2 Blocks (25-50) ..................... 0

FS3 Blocks (50-75) ..................... 0

FS4 Blocks (75-100)..................... 4

Full Blocks ............................ 9,621

Total Blocks............................ 9,752

Total Bytes............................. 79,888,384

Total MBytes............................ 76

Unused Blocks........................... 0

Unused Bytes............................ 0

Last Used Ext FileId.................... 24

Last Used Ext BlockId................... 23,909,120

Last Used Block......................... 536

PL/SQL procedure successfully completed.

c##rajesh@PDB1> exec dbms_stats.gather_table_stats(user,'T',degree=>4,no_invalidate=>false);

PL/SQL procedure successfully completed.

The effect of compression has reduced the size from 1400+ MB to just 76 MB. Let’s run the queries post the compression

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

c##rajesh@PDB1> @script.sql

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 1894136

c##rajesh@PDB1>

No matter how often we run this queries now, no storage index is used. This confirms that storage index will not be in use for smaller segments. Ofcourse for smaller segments that sounds reasonable and for larger segments that favor direct path reads storage indexes plays a major role in eliminating the portion of region where the requested data can’t exist.

Monday, March 30, 2020

Storage Indexes - Part VII

Exadata storage indexes are purely memory only structures located on exadata storage servers, care need to taken in how much memory they can potentially consume. While on exadata storage cells, there is a limit on the number of storage index created on each table, prior to Exadata storage software 12.2.1.1.0 there was a limit of 8 columns per table, with 12.2.1.1.0 the limit has been increased to 24 columns per table. Which means there a high chances for all of the critical columns used in the where clause predicates will be covered in storage indexes.

Here is a test case to demonstrate that.

First, will create a table of 24+ columns of nicely clustered data, which will make them the obvious candidates for storage indexes as queries occur on them.

c##rajesh@PDB1> create table t as

2 select trunc(rownum/1000) as c1,

3 trunc(rownum/1000) as c2,

4 trunc(rownum/1000) as c3,

5 trunc(rownum/1000) as c4,

6 trunc(rownum/1000) as c5,

7 trunc(rownum/1000) as c6,

8 trunc(rownum/1000) as c7,

9 trunc(rownum/1000) as c8,

10 trunc(rownum/1000) as c9,

11 trunc(rownum/1000) as c10,

12 trunc(rownum/1000) as c11,

13 trunc(rownum/1000) as c12,

14 trunc(rownum/1000) as c13,

15 trunc(rownum/1000) as c14,

16 trunc(rownum/1000) as c15,

17 trunc(rownum/1000) as c16,

18 trunc(rownum/1000) as c17,

19 trunc(rownum/1000) as c18,

20 trunc(rownum/1000) as c19,

21 trunc(rownum/1000) as c20,

22 trunc(rownum/1000) as c21,

23 trunc(rownum/1000) as c22,

24 trunc(rownum/1000) as c23,

25 trunc(rownum/1000) as c24,

26 trunc(rownum/1000) as c25,

27 rpad('*',100,'*') as some_other_data

28 from big_table

29 where rownum <= 10000000;

Table created.

c##rajesh@PDB1> --Warming up the storage region to build storage indexes

c##rajesh@PDB1>

c##rajesh@PDB1> declare

2 l_data t.some_other_data%type;

3 begin

4 for i in 1..24

5 loop

6 for k in 1..20

7 loop

8 dbms_application_info.set_client_info( ' i='||i||' k='||k);

9 execute immediate ' select max(some_other_data) from t where c'||i||' < :b1 '

10 into l_data

11 using ((k*10)+1000) ;

12 end loop;

13 end loop;

14 end;

15 /

PL/SQL procedure successfully completed.

Procedure to accept the column name as input parameter, and prepare a dynamic sql and execute it.

c##rajesh@PDB1> create or replace procedure validate_si(p_colName in varchar2)

2 as

3 l_data t.some_other_data%type;

4 begin

5 execute immediate ' select max(some_other_data) from t where '|| p_colName|| ' < 2000 '

6 into l_data;

7 dbms_output.put_line( l_data );

8 end;

9 /

Procedure created.

Here is the script; we can use to see the storage index in play.

c##rajesh@PDB1> $ type script05.sql

conn c##rajesh/Password-1@PDB1

col name format a60

col value format 999999999999999

select s.name,m.value

from v$statname s ,

v$mystat m

where s.statistic# = m.statistic#

and s.name in ('cell physical IO bytes saved by storage index',

'cell physical IO interconnect bytes returned by smart scan' );

exec validate_si('&1');

select s.name,m.value

from v$statname s ,

v$mystat m

where s.statistic# = m.statistic#

and s.name in ('cell physical IO bytes saved by storage index',

'cell physical IO interconnect bytes returned by smart scan' );

c##rajesh@PDB1>

Invoking the script for the column C1 produced a result like this.

c##rajesh@PDB1> @script05.sql C1

Connected.

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

PL/SQL procedure successfully completed.

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 1869717504

cell physical IO interconnect bytes returned by smart scan 2045376

This confirms that storage index was been used while scanning against column C1.

The same happens for column C2.

c##rajesh@PDB1> @script05.sql C2

Connected.

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

PL/SQL procedure successfully completed.

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 1869717504

cell physical IO interconnect bytes returned by smart scan 2045376

The results were same when invoking for every other columns in the table. Here is the results for the 24^th column in the table.

c##rajesh@PDB1> @script05.sql C24

Connected.

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

PL/SQL procedure successfully completed.

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 1869717504

cell physical IO interconnect bytes returned by smart scan 2045376

as you can see we get the storage index benefit for all the 24 columns, thus proving that we have indeed lifted the limit from 8 storage indexes.

Wednesday, March 4, 2020

Storage Indexes - Part VI

In the previous post discussed, how clustering of data within the data is an important factor in the performance and efficiency of both types of indexes. In this post, will expand a little more on this point and highlighted scenario, where a normal database indexes can significantly improve the performance but totally ineffective as a storage index.

Storage index can only determine where the required data can’t exists within the storage region via the accessed min/max values of the index. If the requested value can’t possibly exists within the min/max boundary the specific storage region can be ignored and not be accessed during full table scan. However in some scenarios there can be the limitation in the usefulness of having min/max values to work if both min/max values are relatively common and distributed throughout the table. This is very common when dealing with columns that have data such as “code status” values.

c##rajesh@PDB1> create table t

2 as

3 select rownum as id,

4 (mod(rownum-1,5)+1)*2 as format_id,

5 rpad('Some_really_big_data',80,'*') datas

6 from big_table

7 where rownum <=10000000;

Table created.

c##rajesh@PDB1> select format_id,count(*)

2 from t

3 group by format_id

4 order by format_id;

FORMAT_ID COUNT(*)

---------- ----------

2 2000000

4 2000000

6 2000000

8 2000000

10 2000000

The format_id column was initially populated with just values 2,4,6,8,10 evenly distributed throughout the table. Then updated relatively small number of these values in following manner.

c##rajesh@PDB1> update t set format_id = 3 where mod(id,10000)=0;

1000 rows updated.

c##rajesh@PDB1> update t set format_id = 5 where id between 424242 and 425241;

1000 rows updated.

c##rajesh@PDB1> commit;

Commit complete.

So we have relatively few format_id = 3 (1000 rows from 10M rows, 0.01% of data) littered throughout the table and a few format_id = 5 located in specific location/portion of the table.

We will now create a normal index on this column and a histogram to let the CBO know there are relatively few occurrence of 3 and 5 within this table

c##rajesh@PDB1> create index t_idx on t(format_id) nologging;

Index created.

c##rajesh@PDB1> begin

2 dbms_stats.gather_table_stats(user,'T',

3 method_opt=>'for columns FORMAT_ID size 10');

4 end;

5 /

PL/SQL procedure successfully completed.

If we now run a query looking for all the format_id = 3, then

c##rajesh@PDB1> @script04a.sql

Session altered.

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------

SQL_ID 9cw13pjgzkvwc, child number 1

-------------------------------------

select * from t where format_id = 3

Plan hash value: 767293772

-------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 1012 |

| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1000 | 1000 |00:00:00.01 | 1012 |

|* 2 | INDEX RANGE SCAN | T_IDX | 1 | 1035 | 1000 |00:00:00.01 | 12 |

-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("FORMAT_ID"=3)

We noticed that CBO has used the index to efficiently retrieve just that 0.01% of rows because of high selectivity of the query.

It does even a better job when accessing format_id = 5 as these specific values are very well clustered and found in very tight portion of the table.

PLAN_TABLE_OUTPUT

------------------------------------------------------------------------------------------------------------

SQL_ID 5gchbg91bk70w, child number 1

-------------------------------------

select * from t where format_id = 5

Plan hash value: 767293772

-------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.01 | 33 |

| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T | 1 | 1000 | 1000 |00:00:00.01 | 33 |

|* 2 | INDEX RANGE SCAN | T_IDX | 1 | 1035 | 1000 |00:00:00.01 | 12 |

-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - access("FORMAT_ID"=5)

The databases indexes are extremely efficient in these cases, because it can point just to the location within the table, where these relatively few rows of interest are located.

Lets see how storage indexes cope up these scenarios. We will rerun the same again, but now making the index invisible/unusable.

c##rajesh@PDB1> alter index t_idx invisible;

Index altered.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

c##rajesh@PDB1> @script04b.sql

Session altered.

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------------

SQL_ID 9cw13pjgzkvwc, child number 1

-------------------------------------

select * from t where format_id = 3

Plan hash value: 1601196873

-----------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.11 | 131K| 131K|

|* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 1000 | 1000 |00:00:00.11 | 131K| 131K|

-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("FORMAT_ID"=3)

filter("FORMAT_ID"=3)

19 rows selected.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 659712

c##rajesh@PDB1>

no matter how often we try, storage indexes have managed to save us preciously nothing. The entire table has to be scanned and read during the full table scan operation.

Same results again, when accessing format_id =5, any possible storage indexes have been totally ineffective and the full table scan has to read the entire table.

c##rajesh@PDB1> conn c##rajesh/Password-1@PDB1

Connected.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 0

c##rajesh@PDB1> @script04c.sql

Session altered.

PLAN_TABLE_OUTPUT

-----------------------------------------------------------------------------------------------------

SQL_ID 5gchbg91bk70w, child number 1

-------------------------------------

select * from t where format_id = 5

Plan hash value: 1601196873

-----------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | | 1000 |00:00:00.16 | 131K| 131K|

|* 1 | TABLE ACCESS STORAGE FULL| T | 1 | 1000 | 1000 |00:00:00.16 | 131K| 131K|

-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - storage("FORMAT_ID"=5)

filter("FORMAT_ID"=5)

19 rows selected.

c##rajesh@PDB1> select s.name,m.value

2 from v$statname s ,

3 v$mystat m

4 where s.statistic# = m.statistic#

5 and s.name in ('cell physical IO bytes saved by storage index',

6 'cell physical IO interconnect bytes returned by smart scan' );

NAME VALUE

------------------------------------------------------------ ----------------

cell physical IO bytes saved by storage index 0

cell physical IO interconnect bytes returned by smart scan 405728

The results produced by storage index is significantly slower that the database index in place.

Since the data primarily consist of randomly, distributed values of 2,4,6,8 and 10 with the odd values of 3 and 5 littered around. This means if storage index created the min/max values will be 2 and 10 for all 1M storage regions throughout the table. Therefore both the values 3 and 5 could possibly exists within all/any of the storage regions as they exist inside the 2 - 10 min/max boundaries. These 3 and 5 effectively “hidden” within the storage min/max values making storage index totally ineffective in this scenario. As they can’t skip anything.

Books I follow...

Advanced SQL Functions in Oracle 10g - by Richard Walsh Earp
Cost-Based Oracle Fundamentals - By Jonathan Lewis
Effective Oracle by Design - by Thomas kyte
Expert One on One Oracle - by Thomas kyte
Expert Oracle - by Thomas kyte
Expert Oracle Database Architecture Oracle Database 9i, 10g - Thomas Kyte
Expert Oracle Database Architecture Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions - Thomas Kyte
Expert Oracle Practices Oracle Database Administration from the Oak Table
High-Performance Oracle Database Applications - by Donald K. Burleson
Mastering Oracle PL/SQL: Practical Solutions - by Connor McDonald
Mastering Oracle Scheduler in Oracle 11g Databases - by Ronald Rood
Mastering Oracle SQL and SQL*Plus - By LEX DE HAAN
Optimizing Oracle Performance - By Cary Millsap
Oracle Database 10g Performance Tuning Tips & Techniques - By Richard J. Niemiec
Oracle Database 10g XML & SQL: Design, Build & Manage XML Applications in Java, C, C++ & PL/SQL - by Mark V. Scardina, Ben Chang and Jinyu Wang
Oracle Database 10g: The Complete Reference - by Kevin Loney
Oracle Database 11g DBA Handbook - by Bob Bryla , Kevin Loney
Oracle Database 11g New Features - Maximize the new capabilities of the Latest Database Release - by Robert G. Freeman
Oracle Performance Tuning for 10gR2 - By Gavin Powell
Oracle PL/SQL for DBAs By Steven Feuerstein, Arup Nanda
Oracle SQL*Plus: The Definitive Guide - By Jonathan Gennick
Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning by Richmond Shee, Kirtikumar Deshpande and K Gopalakrishnan
Oracle8i Internal Services for Waits, Latches, Locks, and Memory - by Steve Adams
Professional Oracle 8i Application Programming - By Thomas Kyte
Secrets of the Oracle Database - By Norbert Debes
Troubleshooting Oracle Performance - By Christian Antognini
Tuning Third-party Vendor Oracle systems Tuning when you can't touch the code - By Mike Ault
Using Oracle SQL Stored Outlines & Optimizer Plan Stability - By Mike Ault