Friday, March 20, 2015

Advanced Index Compression

Compressing an index can be an excellent way to permanently reduce the size of an index in a very cost effective manner. Index entries with many duplicate values (or duplicate leading columns within the index) can be “compressed” by Oracle to reduce both storage overheads and potentially access overheads for large index scans. Oracle basically de-duplicates repeated indexed column values within each individual leaf block by storing each unique occurrence in a prefix section within the block.

But it’s important to compress the right indexes in the right manner. If indexes do not have enough repeated data, it’s quite possible to make certain indexes larger rather than smaller when using compression (as the overheads of having the prefix section in the index block outweighs the benefits of limited reduction of repeated values). So one needs to be very selective on which indexes to compress and take care to compress the correct number of columns within the index. Oracle will only protect you from yourself if you attempt to compress all columns in a unique index, as in this scenario there can be no duplicate values to compress.

So, wouldn’t it be nice if Oracle made it all a lot easier for us and automatically decided which indexes to compress, which columns within the index to compress and which indexes to simply not bother compressing at all. Additionally, rather than an all or nothing approach in which all index leaf blocks are compressed in the same manner, wouldn’t it be nice if Oracle decided for each and every individual leaf block within the index how to best compress it. For those index leaf block that have no duplicate entries, do nothing, for those with some repeated columns just compress them and for those leaf blocks with lots of repeated columns and values to compress all of them as efficiently as possible.

All these are being addressed by the recent release of Oracle database 12.1.0.2, one of the really cool new feature got introduced was Advanced Index Compression

rajesh@PDB1> create table t
  2  nologging
  3  as
  4  select owner,object_name,subobject_name,object_id,
  5     data_object_id,object_type,created,last_ddl_time,
  6     timestamp,status,temporary,generated,secondary,
  7     namespace,edition_name,sharing,editionable,
  8     oracle_maintained, case when id between
  9                     0 and 250000 then 0
 10                     else id end id
 11  from big_table;

Table created.

So, I have fabricated the data such that values in the ID column are effectively unique within 75% of the table but the other 25% consist of repeated values.

If we create a normal B-Tree index without compression:

rajesh@PDB1> create index t_idx on t(id) nologging ;

Index created.

rajesh@PDB1> column index_name format a10
rajesh@PDB1> select index_name,compression,leaf_blocks
  2  from user_indexes
  3  where index_name ='T_IDX';

INDEX_NAME COMPRESSION   LEAF_BLOCKS
---------- ------------- -----------
T_IDX      DISABLED             2124

1 row selected.

rajesh@PDB1> 

We noticed that the index consist of 2124 leaf blocks.
If we try to use normal compression on the index:

rajesh@PDB1> alter index t_idx rebuild compress nologging;

Index altered.

rajesh@PDB1> select index_name,compression,leaf_blocks
  2  from user_indexes
  3  where index_name ='T_IDX';

INDEX_NAME COMPRESSION   LEAF_BLOCKS
---------- ------------- -----------
T_IDX      ENABLED              2685

1 row selected.

rajesh@PDB1>

We notice that the compressed index rather than decrease in size has actually increased in size, up to 2685 leaf blocks. So the index has grown by some 25% due to the fact the index predominately contains unique values which don’t compress at all and the resultant prefix section in the leaf blocks becomes nothing more than additional overhead. The 25% section of the index containing all the repeated values has indeed compressed effectively but these savings are more than offset by the increase in size associated with the other 75% of the index where the index entries had no duplication.

However, if we use the new advanced index compression capability via the COMPRESS ADVANCED LOW clause:

rajesh@PDB1> alter index t_idx rebuild compress advanced low nologging;

Index altered.

rajesh@PDB1>
rajesh@PDB1> select index_name,compression,leaf_blocks
  2  from user_indexes
  3  where index_name ='T_IDX';

INDEX_NAME COMPRESSION   LEAF_BLOCKS
---------- ------------- -----------
T_IDX      ADVANCED LOW         2054

1 row selected.

rajesh@PDB1>

We notice the index has now indeed decreased in size from the original 2124 leaf blocks down to 2054. Oracle has effectively ignored all those leaf blocks where compression wasn’t viable and compressed just the 25% of the index where compression was effective. Obviously, the larger the key values (remembering the rowids associated with the index entries can’t be compressed) and the larger the percentage of repeated data, the larger the overall compression returns.

Advanced Index Compression enables Oracle to automatically just compress those index leaf blocks where compression is beneficial.

If we look at a couple of partial block dumps from this index, first a dump from a leaf block that did have duplicate index entries

rajesh@PDB1> column header_file new_val hf
rajesh@PDB1> column header_block new_val hb
rajesh@PDB1>
rajesh@PDB1> select file_id as header_file , block_id +5 as header_block
  2  from dba_extents
  3  where segment_name ='T_IDX'
  4  and owner = user
  5  and extent_id = 0 ;

HEADER_FILE HEADER_BLOCK
----------- ------------
         12        17197

1 row selected.

rajesh@PDB1>
rajesh@PDB1> alter system flush buffer_cache;

System altered.

rajesh@PDB1> @tkfilename.sql
D:\APP\179818\diag\rdbms\cdb1\cdb1\trace\cdb1_ora_5384.trc

rajesh@PDB1> alter system dump datafile &hf block &hb ;
old   1: alter system dump datafile &hf block &hb
new   1: alter system dump datafile         12 block      17197

System altered.

rajesh@PDB1> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

Leaf block dump
===============
header address 408388196=0x18578264
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0xa0: opcode=0: iot flags=-C- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 652
kdxcofbo 1348=0x544
kdxcofeo 2164=0x874
kdxcoavs 816
kdxlespl 0
kdxlende 0
kdxlenxt 50348846=0x300432e
kdxleprv 50348844=0x300432c
kdxledsz 0
kdxlebksz 8036
kdxlepnro 1
kdxlepnco 1 (Adaptive)
prefix row#0[8032] flag: -P-----, lock: 0, len=4
col 0; len 1; (1):  80
prc 652
row#0[8023] flag: -------, lock: 0, len=9
col 0; len 6; (6):  03 00 41 96 00 04
psno 0
row#1[8014] flag: -------, lock: 0, len=9
col 0; len 6; (6):  03 00 41 96 00 05
psno 0
row#2[8005] flag: -------, lock: 0, len=9
col 0; len 6; (6):  03 00 41 96 00 06
psno 0
row#3[7996] flag: -------, lock: 0, len=9
col 0; len 6; (6):  03 00 41 96 00 07
psno 0
row#4[7987] flag: -------, lock: 0, len=9
col 0; len 6; (6):  03 00 41 96 00 08
psno 0
. . . . . . . .
. . . . . . . .
row#650[2173] flag: -------, lock: 0, len=9
col 0; len 6; (6):  03 00 41 a2 00 10
psno 0
row#651[2164] flag: -------, lock: 0, len=9
col 0; len 6; (6):  03 00 41 a2 00 11
psno 0
----- end of leaf block Logical dump -----
----- end of leaf block dump -----
End dump data blocks tsn: 4 file#: 12 minblk 17197 maxblk 17197


The red section is a portion of the index header that determines the number of rows in the prefix table of the index (kdxlepnro 1). The prefix table basically lists all the distinct column values in the leaf blocks that are to be compressed. The value 1 denotes there is actually only just the 1 distinct column value in this specific leaf block (i.e. all index entries have the same indexed value). This section also denotes how many of the indexed columns are to be compressed (kdxlepnco 1). As this index only has the one column, it also has a value of 1. Note this value can potentially be anything between 0 (no columns compressed) up to the number of columns in the index.  The (Adaptive) reference tells us that Index Advanced Compression has been used and that the values here can change from leaf block to leaf block depending on the data characteristics of the index entries within each leaf block (a dump of a basic compressed index will not have the “Adaptive” reference).

The blue section is the compression prefix table and details all the unique combinations of index entries to be compressed within the leaf block. As all indexed values are the same in this index (value 0, internally represented as 80 hex), the prefix table only has the one row. prc 652 denotes that all 652 index entries in this leaf block have this specific indexed value

Next follows all the actual index entries, which now only consist of the rowid (the 6 byte col 0 column) as they all reference psno 0, which is the unique row id of the only row within the prefix table (row#0).

So rather than storing the indexed value 652 times, we can just store the index value (0) just the once within the prefix table and simply reference it from within the actual index entries. This is why index compression can save us storage, storing something once within a leaf block rather than multiple times.

If we now look at a partial block dump of another index leaf block within the index,  that consists of many differing (basically unique) index entries:

C:\Users\179818>sqlplus rajesh/oracle@pdb1

SQL*Plus: Release 12.1.0.2.0 Production on Sat Mar 21 11:25:55 2015

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Last Successful login time: Sat Mar 21 2015 11:22:46 +05:30

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

rajesh@PDB1>
rajesh@PDB1>
rajesh@PDB1> set timing off
rajesh@PDB1>
rajesh@PDB1> column header_file new_val hf
rajesh@PDB1> column header_block new_val hb
rajesh@PDB1> select file_id as header_file , block_id +3 as header_block
  2  from dba_extents
  3  where segment_name ='T_IDX'
  4  and owner = user
  5  and extent_id = 29 ;

HEADER_FILE HEADER_BLOCK
----------- ------------
         12        36867

1 row selected.

rajesh@PDB1>
rajesh@PDB1> @tkfilename.sql
D:\APP\179818\diag\rdbms\cdb1\cdb1\trace\cdb1_ora_11688.trc

rajesh@PDB1> alter system dump datafile &hf block &hb ;
old   1: alter system dump datafile &hf block &hb
new   1: alter system dump datafile         12 block      36867

System altered.

rajesh@PDB1> exit
Disconnected from Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Prod
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

C:\Users\179818>

Leaf block dump
===============
header address 462979684=0x1b988264
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0xa0: opcode=0: iot flags=-C- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 449
kdxcofbo 938=0x3aa
kdxcofeo 1754=0x6da
kdxcoavs 816
kdxlespl 0
kdxlende 0
kdxlenxt 50368516=0x3009004
kdxleprv 50368514=0x3009002
kdxledsz 0
kdxlebksz 8036
kdxlepnro 0
kdxlepnco 0 (Adaptive)
row#0[8022] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 57 36 32
col 1; len 6; (6):  03 00 7c 0a 00 40
row#1[8008] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 57 36 33
col 1; len 6; (6):  03 00 7c 0a 00 41
row#2[7994] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 57 36 34
col 1; len 6; (6):  03 00 7c 0a 00 42
row#3[7980] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 57 36 35
col 1; len 6; (6):  03 00 7c 0a 00 43
row#4[7966] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 57 36 36
col 1; len 6; (6):  03 00 7c 0a 00 44
. . . . . . . .
. . . . . . . .
row#447[1768] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 57 3a 61
col 1; len 6; (6):  03 00 7d 03 00 20
row#448[1754] flag: -------, lock: 0, len=14
col 0; len 4; (4):  c3 57 3a 62
col 1; len 6; (6):  03 00 7d 03 00 21
----- end of leaf block Logical dump -----
----- end of leaf block dump -----
End dump data blocks tsn: 4 file#: 12 minblk 36867 maxblk 36867


We notice that in the red section, both kdxlepnro 0 and kdxlepnco 0 (Adaptive) have a value of 0, meaning we have no rows and no columns within the prefix table. As such, we have no prefix table at all here and that this leaf block has simply not been compressed.

If we look at the actual index entries, they all have an additional column now in blue, that being the actual indexed value as all the index values in this leaf block are different from each other. Without some form of index entry duplication, there would be no benefit from compression and Index Advanced Compression has automatically determined this and not bothered to compress this leaf block. An attempt to compress this block would have actually increased the necessary overall storage for these index entries, due to the additional overheads associated with the prefix table (note it has an additional 2 byes of overhead per row within the prefix table).

No comments:

Post a Comment