Wednesday, June 26, 2019

Exadata Smart Scan - Part III

 
Exadata smart scan is not only limited to predicate filtering and column projection, but also possible with joins – that is joins can be offloaded to storage tier as well. Offloaded joins are accomplished by creating what is called bloom filters. Bloom filters have been around for some long time since Oracle version 10g release 2, so they are not specific to Exadata. One of the main reason oracle uses them is to reduce the traffic between parallel query slaves. They have the advantage of being very small relative to the data set that they represent. However this comes with the price – they can return false positives. That is, the rows that should not be included in the desired results set can occasionally pass a bloom filter. For that reason additional filter must be applied after the bloom filter to ensure that any false positives are eliminated.
 
The interesting thing about the bloom filters from an exadata perspective is that they may be passed to storage servers and evaluated there. This technique can result in a large decrease in the volume of data that must be transmitted back to the database servers.
 
Here is an example:
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cell_offload_processing              boolean     TRUE
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d53sar94vmh6k, child number 0
-------------------------------------
select /*+ parallel(b1,4) parallel(b2,4) */ b1.object_type,
count(distinct b2.owner) from big_table b1, big_table b2 where b1.id =
b2.id and b1.object_id = b2.object_id and b2.object_type like 'TABLE%'
group by b1.object_type
 
Plan hash value: 564293460
 
------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name      | Starts | E-Rows | Cost (%CPU)|    TQ  |IN-OUT| A-Rows |   A-Time   |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |           |      1 |        |   306K(100)|        |      |      3 |00:00:04.35 |
|   1 |  PX COORDINATOR                           |           |      1 |        |            |        |      |      3 |00:00:04.35 |
|   2 |   PX SEND QC (RANDOM)                     | :TQ10003  |      0 |     34 |   306K  (1)|  Q1,03 | P->S |      0 |00:00:00.01 |
|   3 |    HASH GROUP BY                          |           |      4 |     34 |   306K  (1)|  Q1,03 | PCWP |      3 |00:00:00.01 |
|   4 |     PX RECEIVE                            |           |      4 |     34 |   306K  (1)|  Q1,03 | PCWP |      8 |00:00:00.01 |
|   5 |      PX SEND HASH                         | :TQ10002  |      0 |     34 |   306K  (1)|  Q1,02 | P->P |      0 |00:00:00.01 |
|   6 |       HASH GROUP BY                       |           |      4 |     34 |   306K  (1)|  Q1,02 | PCWP |      8 |00:00:00.02 |
|   7 |        VIEW                               | VM_NWVW_1 |      4 |    746 |   306K  (1)|  Q1,02 | PCWP |     30 |00:00:00.02 |
|   8 |         HASH GROUP BY                     |           |      4 |    746 |   306K  (1)|  Q1,02 | PCWP |     30 |00:00:00.02 |
|   9 |          PX RECEIVE                       |           |      4 |    746 |   306K  (1)|  Q1,02 | PCWP |    120 |00:00:00.02 |
|  10 |           PX SEND HASH                    | :TQ10001  |      0 |    746 |   306K  (1)|  Q1,01 | P->P |      0 |00:00:00.01 |
|  11 |            HASH GROUP BY                  |           |      4 |    746 |   306K  (1)|  Q1,01 | PCWP |    120 |00:00:17.01 |
|* 12 |             HASH JOIN                     |           |      4 |   2941K|   306K  (1)|  Q1,01 | PCWP |   3875K|00:00:16.37 |
|  13 |              JOIN FILTER CREATE           | :BF0000   |      4 |   2941K|   153K  (1)|  Q1,01 | PCWP |     15M|00:00:02.69 |
|  14 |               PX RECEIVE                  |           |      4 |   2941K|   153K  (1)|  Q1,01 | PCWP |     15M|00:00:01.14 |
|  15 |                PX SEND BROADCAST          | :TQ10000  |      0 |   2941K|   153K  (1)|  Q1,00 | P->P |      0 |00:00:00.01 |
|  16 |                 PX BLOCK ITERATOR         |           |      4 |   2941K|   153K  (1)|  Q1,00 | PCWC |   3875K|00:00:00.84 |
|* 17 |                  TABLE ACCESS STORAGE FULL| BIG_TABLE |     53 |   2941K|   153K  (1)|  Q1,00 | PCWP |   3875K|00:00:00.81 |
|  18 |              JOIN FILTER USE              | :BF0000   |      4 |    100M|   153K  (1)|  Q1,01 | PCWP |    100M|00:00:02.77 |
|  19 |               PX BLOCK ITERATOR           |           |      4 |    100M|   153K  (1)|  Q1,01 | PCWC |    100M|00:00:02.70 |
|* 20 |                TABLE ACCESS STORAGE FULL  | BIG_TABLE |     53 |    100M|   153K  (1)|  Q1,01 | PCWP |    100M|00:00:02.62 |
------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
 
  12 - access("B1"."ID"="B2"."ID" AND "B1"."OBJECT_ID"="B2"."OBJECT_ID")
  17 - storage(:Z>=:Z AND :Z<=:Z AND "B2"."OBJECT_TYPE" LIKE 'TABLE%')
       filter("B2"."OBJECT_TYPE" LIKE 'TABLE%')
  20 - storage(:Z>=:Z AND :Z<=:Z AND SYS_OP_BLOOM_FILTER(:BF0000,"B1"."ID","B1"."OBJECT_ID"))
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"B1"."ID","B1"."OBJECT_ID"))
 
 
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cell_offload_processing              boolean     FALSE
 
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d53sar94vmh6k, child number 1
-------------------------------------
select /*+ parallel(b1,4) parallel(b2,4) */ b1.object_type,
count(distinct b2.owner) from big_table b1, big_table b2 where b1.id =
b2.id and b1.object_id = b2.object_id and b2.object_type like 'TABLE%'
group by b1.object_type
 
Plan hash value: 564293460
 
------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name      | Starts | E-Rows | Cost (%CPU)|    TQ  |IN-OUT| A-Rows |   A-Time   |
------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                          |           |      1 |        |   306K(100)|        |      |      3 |00:00:18.10 |
|   1 |  PX COORDINATOR                           |           |      1 |        |            |        |      |      3 |00:00:18.10 |
|   2 |   PX SEND QC (RANDOM)                     | :TQ10003  |      0 |     34 |   306K  (1)|  Q1,03 | P->S |      0 |00:00:00.01 |
|   3 |    HASH GROUP BY                          |           |      4 |     34 |   306K  (1)|  Q1,03 | PCWP |      3 |00:00:00.01 |
|   4 |     PX RECEIVE                            |           |      4 |     34 |   306K  (1)|  Q1,03 | PCWP |      8 |00:00:00.01 |
|   5 |      PX SEND HASH                         | :TQ10002  |      0 |     34 |   306K  (1)|  Q1,02 | P->P |      0 |00:00:00.01 |
|   6 |       HASH GROUP BY                       |           |      4 |     34 |   306K  (1)|  Q1,02 | PCWP |      8 |00:00:00.01 |
|   7 |        VIEW                               | VM_NWVW_1 |      4 |    746 |   306K  (1)|  Q1,02 | PCWP |     30 |00:00:00.01 |
|   8 |         HASH GROUP BY                     |           |      4 |    746 |   306K  (1)|  Q1,02 | PCWP |     30 |00:00:00.01 |
|   9 |          PX RECEIVE                       |           |      4 |    746 |   306K  (1)|  Q1,02 | PCWP |    120 |00:00:00.01 |
|  10 |           PX SEND HASH                    | :TQ10001  |      0 |    746 |   306K  (1)|  Q1,01 | P->P |      0 |00:00:00.01 |
|  11 |            HASH GROUP BY                  |           |      4 |    746 |   306K  (1)|  Q1,01 | PCWP |    120 |00:01:12.05 |
|* 12 |             HASH JOIN                     |           |      4 |   2941K|   306K  (1)|  Q1,01 | PCWP |   3875K|00:01:11.44 |
|  13 |              JOIN FILTER CREATE           | :BF0000   |      4 |   2941K|   153K  (1)|  Q1,01 | PCWP |     15M|00:00:28.17 |
|  14 |               PX RECEIVE                  |           |      4 |   2941K|   153K  (1)|  Q1,01 | PCWP |     15M|00:00:26.31 |
|  15 |                PX SEND BROADCAST          | :TQ10000  |      0 |   2941K|   153K  (1)|  Q1,00 | P->P |      0 |00:00:00.01 |
|  16 |                 PX BLOCK ITERATOR         |           |      4 |   2941K|   153K  (1)|  Q1,00 | PCWC |   3875K|00:00:29.16 |
|* 17 |                  TABLE ACCESS STORAGE FULL| BIG_TABLE |     53 |   2941K|   153K  (1)|  Q1,00 | PCWP |   3875K|00:00:29.04 |
|  18 |              JOIN FILTER USE              | :BF0000   |      4 |    100M|   153K  (1)|  Q1,01 | PCWP |   6614K|00:00:35.04 |
|  19 |               PX BLOCK ITERATOR           |           |      4 |    100M|   153K  (1)|  Q1,01 | PCWC |   6614K|00:00:34.79 |
|* 20 |                TABLE ACCESS STORAGE FULL  | BIG_TABLE |     53 |    100M|   153K  (1)|  Q1,01 | PCWP |   6614K|00:00:34.51 |
------------------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
  12 - access("B1"."ID"="B2"."ID" AND "B1"."OBJECT_ID"="B2"."OBJECT_ID")
  17 - storage(:Z>=:Z AND :Z<=:Z)
       filter("B2"."OBJECT_TYPE" LIKE 'TABLE%')
  20 - storage(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"B1"."ID","B1"."OBJECT_ID"))
 
 
Notice that our test query with offloading took around 4sec whereas without the offloading took around 18sec. if you look closely at the predicate information in the plans you can observe the bloom filter predicate - SYS_OP_BLOOM_FILTER(:BF0000,"B1"."ID","B1"."OBJECT_ID") – was run on the storage server for the first execution. The offloaded version run faster because the storage servers were able to pre-join the tables, which eliminated a large amount of data that would otherwise have been transferred back to database servers.
 
c##rajesh@PDB1> select child_number,
  2      round(s.io_cell_offload_eligible_bytes/1024/1024,2) as eligible_mb,
  3      round(s.io_interconnect_bytes/1024/1024,2) actual_mb
  4  from v$sql s
  5  where sql_id ='d53sar94vmh6k'
  6  order by 1 ;
 
CHILD_NUMBER ELIGIBLE_MB  ACTUAL_MB
------------ ----------- ----------
           0    31633.63    1215.78
           1           0   31633.63
 
 
With smart scan we just transferred around 1GB of data from storage server to database server. Whereas without smart scan we transferred around 32GB of data.
 

Monday, June 17, 2019

Exadata Smart Scan - Part II


Exadata smart scan is not only limited to predicate filtering, but also possible with column projection. The term Column projection refers to the Exadata ability to limit the volume of data transferred between storage tier and database tier by returning only the columns of interest (that is those in the select list or necessary for join operations on the database tier).

That is if your query returns only five column from a hundred-column table. Exadata can eliminate most of the data that would be returned to the database server by non-exadata storage. This feature is much bigger than you might expect and it can have a very significant impact on response times.

c##rajesh@QCPR1> set serveroutput off
c##rajesh@QCPR1> show parameter cell_offload_processing

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cell_offload_processing              boolean     TRUE
c##rajesh@QCPR1> alter session set cell_offload_processing=false;

Session altered.

c##rajesh@QCPR1> select count(distinct owner) from big_table ;

COUNT(DISTINCTOWNER)
--------------------
                  31

c##rajesh@QCPR1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3xw054b8k55xu, child number 0
-------------------------------------
select count(distinct owner) from big_table

Plan hash value: 1273675752

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |      1 |00:00:31.42 |    2024K|   2024K|       |       |          |
|   1 |  SORT AGGREGATE              |           |      1 |      1 |      1 |00:00:31.42 |    2024K|   2024K|       |       |          |
|   2 |   VIEW                       | VW_DAG_0  |      1 |     31 |     31 |00:00:31.42 |    2024K|   2024K|       |       |          |
|   3 |    HASH GROUP BY             |           |      1 |     31 |     31 |00:00:31.42 |    2024K|   2024K|  2783K|  2783K| 4850K (0)|
|   4 |     TABLE ACCESS STORAGE FULL| BIG_TABLE |      1 |    100M|    100M|00:00:23.46 |    2024K|   2024K|  1025K|  1025K|          |
----------------------------------------------------------------------------------------------------------------------------------------


16 rows selected.

Without smart scan, the query took around 30 seconds to complete.

c##rajesh@QCPR1> alter system flush buffer_cache;

System altered.

c##rajesh@QCPR1> alter session set cell_offload_processing=true;

Session altered.

c##rajesh@QCPR1> select count(distinct owner) from big_table ;

COUNT(DISTINCTOWNER)
--------------------
                  31

c##rajesh@QCPR1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3xw054b8k55xu, child number 1
-------------------------------------
select count(distinct owner) from big_table

Plan hash value: 1273675752

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |      1 |        |      1 |00:00:01.83 |    2024K|   2024K|       |       |          |
|   1 |  SORT AGGREGATE              |           |      1 |      1 |      1 |00:00:01.83 |    2024K|   2024K|       |       |          |
|   2 |   VIEW                       | VW_DAG_0  |      1 |     31 |     31 |00:00:01.83 |    2024K|   2024K|       |       |          |
|   3 |    HASH GROUP BY             |           |      1 |     31 |     31 |00:00:01.83 |    2024K|   2024K|  2783K|  2783K| 4840K (0)|
|   4 |     TABLE ACCESS STORAGE FULL| BIG_TABLE |      1 |    100M|    100M|00:00:00.96 |    2024K|   2024K|  1025K|  1025K| 6170K (0)|
----------------------------------------------------------------------------------------------------------------------------------------


16 rows selected.

With smart scan in place, the query completed in less than a couple of seconds.

But how could we confirm that smart scan was benefited in the second execution– with no details in the “predicate information” section to indicate Storage option. Luckily the V$SQL view contains columns that define the volume of data that might be saved by smart scan (IO_CELL_OFFLOAD_ELIGIBLE_BYTES) and the volume of data that actually returned by the storage servers (IO_INTERCONNECT_BYTES).

Looking at those columns from V$SQL for those two above execution, shows this.

c##rajesh@QCPR1> select child_number,executions,
  2      io_cell_offload_eligible_bytes eligible,
  3      io_interconnect_bytes actual
  4  from v$sql
  5  where sql_id ='3xw054b8k55xu';

CHILD_NUMBER EXECUTIONS      ELIGIBLE        ACTUAL
------------ ---------- ------------- -------------
           0          1             0   16585129984
           1          1   16585129984      98257712

c##rajesh@QCPR1>

For the first execution (with child_number=0), with no smart scan (since “eligible” = 0) we transferred the entire volume of the table (16GB) from storage to database server (since “actual” = size of the table). However for the second execution (with child_number=1), with smart scan in place, has improved the execution by transferring a very small volume (98MB) from storage to database server.

From this we conclude that smart scan on exadata is possible either by
·         looking at the predicate section in the plan for storage filtering
·         looking at the V$SQL for IO_CELL_OFFLOAD_ELIGIBLE_BYTES and IO_INTERCONNECT_BYTES columns.

Monday, June 10, 2019

Exadata Smart Scan - Part I

Smart scan is the secret sauce of Exadata. It’s what makes Exadata different from every other platform that Oracle runs on. It refers to the concept of moving the processing from the database server to storage layer. It is also a key paradigm shift possible only in Exadata platform.

One of the major problem of large database is the time it takes to transfer the large volume of data (for adhoc queries) between storage and database server. Smart scan is an approach that used to solve the problem of excessive time spent in moving relevant data between tiers.

The idea of this posting is to show that there is more behind the Exadata Database Machine than sophisticated and well-combined Hardware, but there is actually some kind of (Database) Intelligence built into the Storage Layer that enables it to do column & predicate filtering before it sends results to the Database Layer. This functionality is called Smart Scan:

Create a table having 100M rows that sized around 16GB.

c##rajesh@PDB1> exec show_space('BIG_TABLE');
Unformatted Blocks .....................               0
FS1 Blocks (0-25) ......................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................              42
Full Blocks ............................       2,024,510
Total Blocks............................       2,029,448
Total Bytes.............................  16,625,238,016
Total MBytes............................          15,855
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................              24
Last Used Ext BlockId...................       2,023,424
Last Used Block.........................           4,824

PL/SQL procedure successfully completed.

c##rajesh@PDB1> select to_char(count(*),'fm999,999,999') cnt from big_table;

CNT
------------
100,000,000

The playing field is setup. We will first look at the scan speed without smart scan.

c##rajesh@PDB1> alter session set statistics_level=all;

Session altered.

c##rajesh@PDB1> set serveroutput off
c##rajesh@PDB1> show parameter cell_offload_processing

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cell_offload_processing              boolean     TRUE
c##rajesh@PDB1> alter session set cell_offload_processing=false;

Session altered.

c##rajesh@PDB1> select count(*) from big_table where owner ='JYU';

  COUNT(*)
----------
      7384

c##rajesh@PDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bkjgwg8jwctqj, child number 0
-------------------------------------
select count(*) from big_table where owner ='JYU'

Plan hash value: 599409829

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |      1 |        |      1 |00:00:19.45 |    2024K|   2024K|
|   1 |  SORT AGGREGATE            |           |      1 |      1 |      1 |00:00:19.45 |    2024K|   2024K|
|*  2 |   TABLE ACCESS STORAGE FULL| BIG_TABLE |      1 |   7384 |   7384 |00:00:19.45 |    2024K|   2024K|
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"='JYU')


19 rows selected.

We needed around 20 sec to scan about 16GB, with the storage servers delivering the full amount of data into the database layer.

Now the very similar statement with smart scan functionality.

c##rajesh@PDB1> alter session set cell_offload_processing=true;

Session altered.

c##rajesh@PDB1> select count(*) from big_table where owner ='JYU';

  COUNT(*)
----------
      7384

c##rajesh@PDB1> select * from table( dbms_xplan.display_cursor(format=>'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bkjgwg8jwctqj, child number 1
-------------------------------------
select count(*) from big_table where owner ='JYU'

Plan hash value: 599409829

--------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |           |      1 |        |      1 |00:00:00.20 |    2024K|   2024K|       |       |          |
|   1 |  SORT AGGREGATE            |           |      1 |      1 |      1 |00:00:00.20 |    2024K|   2024K|       |       |          |
|*  2 |   TABLE ACCESS STORAGE FULL| BIG_TABLE |      1 |   7384 |   7384 |00:00:00.20 |    2024K|   2024K|  1025K|  1025K| 6170K (0)|
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("OWNER"='JYU')
       filter("OWNER"='JYU')


20 rows selected.

c##rajesh@PDB1>

Runtime was reduced to less than a second. The presence of “storage” in the predicate section confirms that smart scan – the ability to filter the columns and predicates at storage layer before sending the results to database layer - optimization kick in.