Thursday, July 30, 2020

Hybrid Columnar Compression in 12.2

Oracle 12.2 introduced an interesting optimization for Hybrid Columnar Compression (HCC). Until 12.2, we had to use a direct path load into HCC-compressed segments for the data to actually be compressed. A conventional insert would still succeed in putting data into the segment, but the newly inserted data was not HCC compressed, and there was no error message or warning about that.
 
Here is a test case from an Oracle 11g database running on Exadata X7.
 
demo@ORA11G> select * from v$version;
 
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g EE Extreme Perf Release 11.2.0.4.0 - 64bit Production
PL/SQL Release 11.2.0.4.0 - Production
CORE    11.2.0.4.0      Production
TNS for Linux: Version 11.2.0.4.0 - Production
NLSRTL Version 11.2.0.4.0 - Production
 
demo@ORA11G> create table t
  2  nologging
  3  compress for query high
  4  as
  5  select *
  6  from all_objects
  7  where 1 = 0 ;
 
Table created.
 
First we will start with a direct path load.
 
demo@ORA11G> insert /*+ append */ into t select * from all_objects where rownum <=10000;
 
10000 rows created.
 
demo@ORA11G> commit;
 
Commit complete.
 
So how do we verify that the data loaded by the above DML was compressed? Thankfully Oracle is well instrumented, and even more thankfully that instrumentation is not limited to performance. The Oracle-supplied package dbms_compression provides a function, get_compression_type, which takes a ROWID and tells us the type of compression applied to that row.
 
demo@ORA11G> select dbms_compression.get_compression_type(user,'T',rowid) comp_type, count(*)
  2  from t
  3  group by dbms_compression.get_compression_type(user,'T',rowid);
 
COMP_TYPE   COUNT(*)
---------- ----------
         4      10000
 
Compression type = 4 means HCC Query High: all rows loaded via direct path were compressed according to the compression type set on the segment.
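The numeric codes returned by get_compression_type map to constants in dbms_compression. A hedged sketch that decodes them into readable labels (the constant values below are the ones documented for the package in 11.2/12.2; verify them against your release before relying on this):

```sql
-- Decode get_compression_type() results into readable labels.
-- The numeric-to-name mapping is taken from the documented
-- DBMS_COMPRESSION constants; check it against your release.
select decode(dbms_compression.get_compression_type(user, 'T', rowid),
              1,  'NOCOMPRESS',
              2,  'ADVANCED (OLTP)',
              4,  'HCC QUERY HIGH',
              8,  'HCC QUERY LOW',
              16, 'HCC ARCHIVE HIGH',
              32, 'HCC ARCHIVE LOW',
              64, 'BLOCK (COMP_BLOCK)',
              'UNKNOWN') comp_name,
       count(*)
from   t
group  by dbms_compression.get_compression_type(user, 'T', rowid);
```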
 
Now let’s repeat the test case without direct path (removing the append hint) and see what type of compression gets applied.
 
demo@ORA11G> truncate table t;
 
Table truncated.
 
demo@ORA11G> insert into t select * from all_objects where rownum <=10000;
 
10000 rows created.
 
demo@ORA11G> commit;
 
Commit complete.
 
demo@ORA11G> select dbms_compression.get_compression_type(user,'T',rowid) comp_type, count(*)
  2  from t
  3  group by dbms_compression.get_compression_type(user,'T',rowid);
 
COMP_TYPE   COUNT(*)
---------- ----------
         1       8994
        64       1006
 
Compression type = 1 means no compression (type 64, COMP_BLOCK, indicates rows that landed in blocks with block-level compression). So during a conventional path load, no HCC compression is applied to the newly inserted data; to have it effectively compressed, all loads should be done using direct path.
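Note that get_compression_type reports what actually happened to each row, not what the segment is declared as. The declared attribute stayed Query High throughout; one can confirm that from the dictionary, for example:

```sql
-- The declared compression attribute of the segment, which is
-- independent of what was actually applied to individual rows.
select table_name, compression, compress_for
from   user_tables
where  table_name = 'T';
```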
 
Repeating the same test in Oracle 12c (12.2), we see this.
 
c##rajesh@PDB1> insert into t select * from all_objects;
 
82161 rows created.
 
c##rajesh@PDB1> commit;
 
Commit complete.
 
c##rajesh@PDB1> select dbms_compression.get_compression_type(user,'T',rowid) comp_type, count(*)
  2  from t
  3  group by dbms_compression.get_compression_type(user,'T',rowid)
  4  /
 
COMP_TYPE   COUNT(*)
---------- ----------
         4      82161
 
Though we did a conventional path load, all rows were effectively compressed. So is there no more need for the append hint for insert-as-select in 12.2? This is great news, and should alleviate some of the issues people hit from unknowingly using HCC without direct path inserts. It is also good for concurrency, because direct path operations require a segment lock, whereas array inserts do not. So in Oracle 12.2, array inserts into HCC segments can compress data even if we don’t specify the APPEND hint.
 

Wednesday, July 22, 2020

Storage Indexes - Part VIII

Exadata storage indexes depend on smart scan, which in turn depends on direct path reads (either serial or parallel). Oracle will generally use serial direct path reads for large objects, but when an object is partitioned, Oracle may fail to recognize that it is “large” while accessing its individual partitions, because the decision is based on the size of each individual segment. This can result in some partitions not being read via direct path reads, hence no smart scan, thus disabling any storage indexes for those partitions.
 
The same goes when compression is in place: the reduced size of the compressed segments makes them even less likely to trigger serial direct path reads, and the problem becomes even more noticeable.
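As a sanity check on one’s own system, a hedged sketch to see per-segment sizes (the serial direct path read decision is driven by heuristics comparing segment size against the buffer cache, so small partitions or heavily compressed segments may fall below the cutoff):

```sql
-- Per-(sub)segment sizes: each partition is sized individually when
-- Oracle decides whether to use serial direct path reads, so a table
-- that is "large" overall can still have partitions that are not.
select segment_name, partition_name, segment_type,
       blocks, round(bytes/1024/1024) mb
from   user_segments
where  segment_name = 'T'
order  by blocks desc;
```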
 
Here is a table about 1,400 MB in size.
 
c##rajesh@PDB1> create table t
  2  nologging
  3  as
  4  select *
  5  from big_table;
 
Table created.
 
c##rajesh@PDB1> exec show_space('T');
Unformatted Blocks .....................               0
FS1 Blocks (0-25) ......................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks ............................         182,648
Total Blocks............................         188,416
Total Bytes.............................   1,543,503,872
Total MBytes............................           1,472
Unused Blocks...........................           5,100
Unused Bytes............................      41,779,200
Last Used Ext FileId....................              24
Last Used Ext BlockId...................      18,268,160
Last Used Block.........................           3,092
 
PL/SQL procedure successfully completed.
 
Here is the script used for the test.
 
c##rajesh@PDB1> $ type script.sql
set termout off
select * from t where owner ='JYU';
set termout on
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
We run the query repeatedly to warm up the storage cells so they build the storage index, and then check whether it benefits the execution.
 
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                      3842637824
cell physical IO interconnect bytes returned by smart scan            5940000
 
The storage index is helping us here: it saved nearly 3.5 GB of data from being transferred from the storage layer to the database layer, and the amount of data actually returned from storage to the database layer was as little as about 5.7 MB.
 
Now let’s see the effect with compression in place.
 
c##rajesh@PDB1> alter table t compress for archive high;
 
Table altered.
 
c##rajesh@PDB1> alter table t move online parallel 8;
 
Table altered.
 
c##rajesh@PDB1> exec show_space('T');
Unformatted Blocks .....................               0
FS1 Blocks (0-25) ......................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               4
Full Blocks ............................           9,621
Total Blocks............................           9,752
Total Bytes.............................      79,888,384
Total MBytes............................              76
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................              24
Last Used Ext BlockId...................      23,909,120
Last Used Block.........................             536
 
PL/SQL procedure successfully completed.
 
c##rajesh@PDB1> exec dbms_stats.gather_table_stats(user,'T',degree=>4,no_invalidate=>false);
 
PL/SQL procedure successfully completed.
 
Compression has reduced the size from 1,400+ MB to just 76 MB, roughly a 19x reduction. Let’s run the queries again post compression.
 
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan                  0
 
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> @script.sql
c##rajesh@PDB1> select s.name,m.value
  2  from v$statname s ,
  3      v$mystat m
  4  where s.statistic# = m.statistic#
  5  and s.name in ('cell physical IO bytes saved by storage index',
  6  'cell physical IO interconnect bytes returned by smart scan' );
 
NAME                                                                    VALUE
------------------------------------------------------------ ----------------
cell physical IO bytes saved by storage index                               0
cell physical IO interconnect bytes returned by smart scan            1894136
 
c##rajesh@PDB1>
 
No matter how often we run the query now, no storage index is used. This confirms that storage indexes will not be used for smaller segments. Of course, for smaller segments that sounds reasonable; for larger segments that favor direct path reads, storage indexes play a major role in eliminating the regions where the requested data cannot exist.
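If smart scan (and with it the storage index) is still wanted for such a small segment, one hedged workaround is the undocumented parameter _serial_direct_read. Being undocumented, its behavior may change between releases, so it should be tested carefully before use:

```sql
-- Undocumented parameter: forces serial direct path reads regardless
-- of segment size, which re-enables smart scan for small segments.
alter session set "_serial_direct_read" = always;
select * from t where owner = 'JYU';
-- Return to the default size-based heuristic afterwards.
alter session set "_serial_direct_read" = auto;
```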