In the previous post discussed, how clustering of data within the data
is an important factor in the performance and efficiency of both types of
indexes. In this post, will expand a little more on this point and highlighted
scenario, where a normal database indexes can significantly improve the
performance but totally ineffective as a storage index.
Storage index can only determine where the required data can’t exists
within the storage region via the accessed min/max values of the index. If the
requested value can’t possibly exists within the min/max boundary the specific
storage region can be ignored and not be accessed during full table scan.
However in some scenarios there can be the limitation in the usefulness of
having min/max values to work if both min/max values are relatively common and
distributed throughout the table. This is very common when dealing with columns
that have data such as “code status” values.
c##rajesh@PDB1>
create table t
2 as
3
select rownum as id,
4
(mod(rownum-1,5)+1)*2 as format_id,
5
rpad('Some_really_big_data',80,'*') datas
6 from
big_table
7
where rownum <=10000000;
Table created.
c##rajesh@PDB1>
select format_id,count(*)
2 from
t
3
group by format_id
4
order by format_id;
FORMAT_ID
COUNT(*)
----------
----------
2
2000000
4
2000000
6
2000000
8
2000000
10
2000000
The format_id column was initially populated with just values
2,4,6,8,10 evenly distributed throughout the table. Then updated relatively
small number of these values in following manner.
c##rajesh@PDB1>
update t set format_id = 3 where mod(id,10000)=0;
1000 rows
updated.
c##rajesh@PDB1>
update t set format_id = 5 where id between 424242 and 425241;
1000 rows
updated.
c##rajesh@PDB1>
commit;
Commit
complete.
So we have relatively few format_id = 3 (1000 rows from 10M rows, 0.01%
of data) littered throughout the table and a few format_id = 5 located in
specific location/portion of the table.
We will now create a normal index on this column and a histogram to let
the CBO know there are relatively few occurrence of 3 and 5 within this table
c##rajesh@PDB1>
create index t_idx on t(format_id) nologging;
Index created.
c##rajesh@PDB1>
begin
2
dbms_stats.gather_table_stats(user,'T',
3
method_opt=>'for columns FORMAT_ID size 10');
4 end;
5 /
PL/SQL
procedure successfully completed.
If we now run a query looking for all the format_id = 3, then
c##rajesh@PDB1>
@script04a.sql
Session
altered.
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
SQL_ID 9cw13pjgzkvwc, child number 1
-------------------------------------
select * from t
where format_id = 3
Plan hash
value: 767293772
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time
| Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
1 | | 1000 |00:00:00.01 | 1012 |
| 1 |
TABLE ACCESS BY INDEX ROWID BATCHED| T |
1 | 1000 | 1000 |00:00:00.01 | 1012 |
|* 2 | INDEX RANGE SCAN | T_IDX | 1 |
1035 | 1000 |00:00:00.01 | 12 |
-------------------------------------------------------------------------------------------------------
Predicate
Information (identified by operation id):
---------------------------------------------------
2 - access("FORMAT_ID"=3)
We noticed that CBO has used the index to efficiently retrieve just
that 0.01% of rows because of high selectivity of the query.
It does even a better job when accessing format_id = 5 as these
specific values are very well clustered and found in very tight portion of the
table.
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------
SQL_ID 5gchbg91bk70w, child number 1
-------------------------------------
select * from t
where format_id = 5
Plan hash
value: 767293772
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time
| Buffers |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
1 | | 1000 |00:00:00.01 | 33 |
| 1 |
TABLE ACCESS BY INDEX ROWID BATCHED| T |
1 | 1000 | 1000 |00:00:00.01 | 33 |
|* 2 | INDEX RANGE SCAN | T_IDX | 1 |
1035 | 1000 |00:00:00.01 | 12 |
-------------------------------------------------------------------------------------------------------
Predicate
Information (identified by operation id):
---------------------------------------------------
2 - access("FORMAT_ID"=5)
The databases indexes are extremely efficient in these cases, because
it can point just to the location within the table, where these relatively few
rows of interest are located.
Lets see how storage indexes cope up these scenarios. We will rerun the same again, but now making
the index invisible/unusable.
c##rajesh@PDB1>
alter index t_idx invisible;
Index altered.
c##rajesh@PDB1>
select s.name,m.value
2 from
v$statname s ,
3
v$mystat m
4
where s.statistic# = m.statistic#
5 and
s.name in ('cell physical IO bytes saved by storage index',
6
'cell physical IO interconnect bytes returned by smart scan' );
NAME
VALUE
------------------------------------------------------------
----------------
cell physical
IO bytes saved by storage index 0
cell physical
IO interconnect bytes returned by smart scan 0
c##rajesh@PDB1>
@script04b.sql
Session
altered.
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
SQL_ID 9cw13pjgzkvwc, child number 1
-------------------------------------
select * from t
where format_id = 3
Plan hash
value: 1601196873
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |
A-Rows | A-Time | Buffers | Reads |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT |
| 1 | |
1000 |00:00:00.11 | 131K| 131K|
|* 1 | TABLE ACCESS STORAGE FULL| T | 1 |
1000 | 1000 |00:00:00.11 | 131K|
131K|
-----------------------------------------------------------------------------------------------------
Predicate
Information (identified by operation id):
---------------------------------------------------
1 - storage("FORMAT_ID"=3)
filter("FORMAT_ID"=3)
19 rows
selected.
c##rajesh@PDB1>
select s.name,m.value
2 from
v$statname s ,
3
v$mystat m
4
where s.statistic# = m.statistic#
5 and
s.name in ('cell physical IO bytes saved by storage index',
6
'cell physical IO interconnect bytes returned by smart scan' );
NAME
VALUE
------------------------------------------------------------
----------------
cell
physical IO bytes saved by storage index 0
cell physical
IO interconnect bytes returned by smart scan 659712
c##rajesh@PDB1>
no matter how often we try, storage indexes have managed to save us
preciously nothing. The entire table has to be scanned and read during the full
table scan operation.
Same results again, when accessing format_id =5, any possible storage
indexes have been totally ineffective and the full table scan has to read the
entire table.
c##rajesh@PDB1>
conn c##rajesh/Password-1@PDB1
Connected.
c##rajesh@PDB1>
select s.name,m.value
2 from
v$statname s ,
3
v$mystat m
4
where s.statistic# = m.statistic#
5 and
s.name in ('cell physical IO bytes saved by storage index',
6
'cell physical IO interconnect bytes returned by smart scan' );
NAME
VALUE
------------------------------------------------------------
----------------
cell physical
IO bytes saved by storage index 0
cell physical
IO interconnect bytes returned by smart scan 0
c##rajesh@PDB1>
@script04c.sql
Session
altered.
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------
SQL_ID 5gchbg91bk70w, child number 1
-------------------------------------
select * from t
where format_id = 5
Plan hash
value: 1601196873
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |
A-Rows | A-Time | Buffers | Reads |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT |
| 1 | |
1000 |00:00:00.16 | 131K| 131K|
|* 1 | TABLE ACCESS STORAGE FULL| T | 1 |
1000 | 1000 |00:00:00.16 | 131K|
131K|
-----------------------------------------------------------------------------------------------------
Predicate
Information (identified by operation id):
---------------------------------------------------
1 - storage("FORMAT_ID"=5)
filter("FORMAT_ID"=5)
19 rows
selected.
c##rajesh@PDB1>
select s.name,m.value
2 from
v$statname s ,
3
v$mystat m
4
where s.statistic# = m.statistic#
5 and
s.name in ('cell physical IO bytes saved by storage index',
6
'cell physical IO interconnect bytes returned by smart scan' );
NAME VALUE
------------------------------------------------------------
----------------
cell
physical IO bytes saved by storage index 0
cell physical
IO interconnect bytes returned by smart scan 405728
The results produced by storage index is significantly slower that the
database index in place.
Since the data primarily consist of randomly, distributed values of
2,4,6,8 and 10 with the odd values of 3 and 5 littered around. This means if
storage index created the min/max values will be 2 and 10 for all 1M storage
regions throughout the table. Therefore both the values 3 and 5 could possibly
exists within all/any of the storage regions as they exist inside the 2 - 10 min/max boundaries. These 3 and 5 effectively
“hidden” within the storage min/max values making storage index totally
ineffective in this scenario. As they can’t skip anything.
No comments:
Post a Comment