Saturday, December 19, 2015

Vector Transformation in Oracle 12c (a.k.a In-memory Aggregation)

Vector Transformation is a new feature introduced with Oracle 12c (12.1.0.2) to Optimize the CPU usage for the queries that aggregate the results of join between small tables (dimension table) and large table (fact table)

In a typical warehouse queries aggregate from a fact table, and joins fact with one or more dimension tables, that is

o   Filter tables and producing row sets from fact tables
o   Joining row sets with dimension tables
o   Aggregating the rows to produce result sets

But in case of vector transformation, it involves few steps “Vector Group by” and “Key Vector”. Key vector is a data structure that maps between dense join keys and dense grouping keys.

KEY VECTOR – does the following operation

A dense key is a numerical key that is stored as native integer, A dense join key represent all join keys whose columns come from Fact / dimension tables. A dense grouping keys represent all grouping keys whose columns come from fact / dimension tables.

Assume that “EMP” table has the following values on deptno column.

rajesh@ORA12C> select deptno from emp;

    DEPTNO
----------
        20
        30
        30
        20
        30
        30
        10
        20
        10
        30
        20
        30
        20
        10

14 rows selected.

rajesh@ORA12C>

Any complex analytic/aggregation queries that filter where deptno=20 to the EMP table. A Key vector for this filter may look like the following one dimensional array.

1
0
0
1
0
0
0
1
0
0
1
0
1
0

In the preceding results, 1 is the dense grouping key for deptno=20, the value 0 indicate rows in EMP table that do not match the filter.  Similarly for the query that uses filter like where deptno in (10,20) then array may look like as follows. Where 2 is the dense grouping key for deptno=20 and 1 is the dense grouping key for deptno=10.

2
0
0
2
0
0
1
2
1
0
2
0
2
1

VECTOR GROUP BY – Involves the two phases of In-memory aggregation. Process each fact and dimension table sequentially as follows.

Phase I - Process each dimension as (highlighted in explain plan as Brown color)
o   Find a unique dense grouping keys
o   Create a key vector
o   Create a temporary table

Phase II - Process each fact as (highlighted in explain plan as Blue color)
o   Process all joins and aggregation using the key vectors created in previous phase
o   Join back those results to each temporary table.

Here is a Test case to demonstrate this Vector transformation.

rajesh@ORA12C> select * from v$version;

BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production              0
PL/SQL Release 12.1.0.2.0 - Production                                                    0
CORE    12.1.0.2.0      Production                                                        0
TNS for 64-bit Windows: Version 12.1.0.2.0 - Production                                   0
NLSRTL Version 12.1.0.2.0 - Production                                                    0

5 rows selected.

rajesh@ORA12C> show parameter inmemory_size

NAME                                          TYPE        VALUE
--------------------------------------------- ----------- -------------------------
inmemory_size                                 big integer 152M
rajesh@ORA12C>
rajesh@ORA12C> select cdb from v$database ;

CDB
---
NO

1 row selected.

rajesh@ORA12C>
rajesh@ORA12C> create table jobs
  2  as
  3  select rownum as job_id,
  4     job_title,
  5     trunc(dbms_random.value(1,100)) as min_sal,
  6     trunc(dbms_random.value(200,300)) as max_sal
  7  from (     select distinct owner as job_title
  8  from all_objects ) ;

Table created.

rajesh@ORA12C>
rajesh@ORA12C> alter table jobs
  2  add constraint jobs_pk
  3  primary key(job_id);

Table altered.

rajesh@ORA12C>
rajesh@ORA12C> create table dept
  2  as
  3  select rownum as deptno,
  4     dept_name,
  5     trunc(dbms_random.value(1,100)) as manager_id,
  6     trunc(dbms_random.value(1,50)) as location_id
  7  from ( select distinct object_type as dept_name
  8     from all_objects) ;

Table created.

rajesh@ORA12C>
rajesh@ORA12C> alter table dept
  2  add constraint dept_pk
  3  primary key(deptno);

Table altered.

rajesh@ORA12C>
rajesh@ORA12C> create table emp
  2  partition by hash(emp_id)
  3  ( partition p1,
  4    partition p2,
  5    partition p3,
  6    partition p4 )
  7  as
  8  select rownum as emp_id,
  9     object_name as first_name,
 10     object_name as last_name,
 11     object_name||'@gmail.com' as email_id,
 12     object_id as phone_no,
 13     created as hire_date,
 14     trunc(dbms_random.value(1,32)) as job_id,
 15     trunc(dbms_random.value(1,5000)) as salary,
 16     trunc(dbms_random.value(1,3000)) as comm,
 17     rownum as manager_id ,
 18     trunc(dbms_random.value(1,39)) as deptno
 19  from all_objects;

Table created.

rajesh@ORA12C> declare
  2     l_empno number;
  3  begin
  4     select max(emp_id) into l_empno
  5     from emp;
  6
  7     execute immediate ' alter session force parallel dml parallel 4';
  8     for i in 1..7
  9     loop
 10             dbms_application_info.set_client_info('i ='||i);
 11             insert /*+ append */ into emp
 12             ( EMP_ID,FIRST_NAME,LAST_NAME,EMAIL_ID,
 13                     PHONE_NO,HIRE_DATE,JOB_ID,SALARY,COMM,MANAGER_ID,
 14                     DEPTNO )
 15             select l_empno+rownum,FIRST_NAME,LAST_NAME,EMAIL_ID,
 16                     PHONE_NO,HIRE_DATE,JOB_ID,SALARY,COMM,MANAGER_ID,
 17                     DEPTNO
 18             from emp ;
 19             l_empno := l_empno + sql%rowcount;
 20             commit;
 21     end loop;
 22     execute immediate ' alter session disable parallel dml';
 23  end;
 24  /

PL/SQL procedure successfully completed.

rajesh@ORA12C>
rajesh@ORA12C> begin
  2     dbms_stats.gather_table_stats(user,'emp',
  3             degree=>2,
  4             method_opt=>'for all columns size 254',
  5             no_invalidate=>false,
  6             cascade=>true );
  7  end;
  8  /

PL/SQL procedure successfully completed.

rajesh@ORA12C> select /*+ parallel(emp,4) */ count(*) from emp;

  COUNT(*)
----------
  11524480

1 row selected.

rajesh@ORA12C> select count(*) from dept;

  COUNT(*)
----------
        39

1 row selected.

rajesh@ORA12C> select count(*) from jobs;

  COUNT(*)
----------
        32

1 row selected.

rajesh@ORA12C>

For Key vector transformation to kick in we need to place them in In-memory storage

rajesh@ORA12C> alter table dept inmemory;

Table altered.

rajesh@ORA12C> alter table jobs inmemory;

Table altered.

rajesh@ORA12C> alter table emp inmemory
  2     no inmemory(emp_id,first_name,last_name,
  3             hire_date,email_id,
  4             phone_no,manager_id);

Table altered.

rajesh@ORA12C>
rajesh@ORA12C> column owner format a10
rajesh@ORA12C> column segment_name format a10
rajesh@ORA12C> column partition_name format a10
rajesh@ORA12C> select owner,segment_name,partition_name,inmemory_size
  2  from v$im_segments;

no rows selected

rajesh@ORA12C> alter table emp modify partition p1 inmemory;

Table altered.

rajesh@ORA12C> select count(*) from emp partition(p1);

  COUNT(*)
----------
   2880928

1 row selected.

rajesh@ORA12C>
rajesh@ORA12C> alter table emp modify partition p2 inmemory;

Table altered.

rajesh@ORA12C> select count(*) from emp partition(p2);

  COUNT(*)
----------
   2881300

1 row selected.

rajesh@ORA12C>
rajesh@ORA12C> alter table emp modify partition p3 inmemory;

Table altered.

rajesh@ORA12C> select count(*) from emp partition(p3);

  COUNT(*)
----------
   2883687

1 row selected.

rajesh@ORA12C>
rajesh@ORA12C> alter table emp modify partition p4 inmemory;

Table altered.

rajesh@ORA12C> select count(*) from emp partition(p4);

  COUNT(*)
----------
   2878565

1 row selected.

rajesh@ORA12C> select owner,segment_name,partition_name,inmemory_size
  2  from v$im_segments;

OWNER      SEGMENT_NA PARTITION_ INMEMORY_SIZE
---------- ---------- ---------- -------------
RAJESH     EMP        P4              23527424
RAJESH     EMP        P1              23527424
RAJESH     EMP        P3              23527424
RAJESH     EMP        P2              23527424

4 rows selected.

rajesh@ORA12C>

Here is the query against the typical fact and dimension tables.

rajesh@ORA12C> set autotrace traceonly explain
rajesh@ORA12C> select d.location_id, j.min_sal,j.max_sal,
  2        sum(e.salary),min(e.comm)
  3  from emp e, dept d, jobs j
  4  where e.deptno = d.deptno
  5  and e.job_id = j.job_id
  6  and d.dept_name in ('EDITION','SEQUENCE')
  7  and j.job_title in ('SYS','PUBLIC')
  8  group by d.location_id, j.min_sal,j.max_sal;

Execution Plan
----------------------------------------------------------
Plan hash value: 1749636783

------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name               | Rows  | Cost (%CPU)|Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                    |     3 |  2579   (9)|      |       |
|   1 |  TEMP TABLE TRANSFORMATION          |                    |       |            |      |       |
|   2 |   LOAD AS SELECT                    | SYS_TEMP_0FD9D666B_|       |            |      |       |
|   3 |    VECTOR GROUP BY                  |                    |     2 |     2  (50)|      |       |
|   4 |     KEY VECTOR CREATE BUFFERED      | :KV0000            |     2 |     2  (50)|      |       |
|*  5 |      TABLE ACCESS INMEMORY FULL     | DEPT               |     2 |     1   (0)|      |       |
|   6 |   LOAD AS SELECT                    | SYS_TEMP_0FD9D666C_|       |            |      |       |
|   7 |    VECTOR GROUP BY                  |                    |     2 |     2  (50)|      |       |
|   8 |     KEY VECTOR CREATE BUFFERED      | :KV0001            |     2 |     2  (50)|      |       |
|*  9 |      TABLE ACCESS INMEMORY FULL     | JOBS               |     2 |     1   (0)|      |       |
|  10 |   HASH GROUP BY                     |                    |     3 |  2576   (9)|      |       |
|* 11 |    HASH JOIN                        |                    |     3 |  2575   (8)|      |       |
|  12 |     MERGE JOIN CARTESIAN            |                    |     4 |     4   (0)|      |       |
|  13 |      TABLE ACCESS FULL              | SYS_TEMP_0FD9D666C_|     2 |     2   (0)|      |       |
|  14 |      BUFFER SORT                    |                    |     2 |     2   (0)|      |       |
|  15 |       TABLE ACCESS FULL             | SYS_TEMP_0FD9D666B_|     2 |     1   (0)|      |       |
|  16 |     VIEW                            | VW_VT_38F5D95B     |     3 |  2571   (9)|      |       |
|  17 |      VECTOR GROUP BY                |                    |     3 |  2571   (9)|      |       |
|  18 |       HASH GROUP BY                 |                    |     3 |  2571   (9)|      |       |
|  19 |        KEY VECTOR USE               | :KV0001            | 39131 |  2571   (9)|      |       |
|  20 |         KEY VECTOR USE              | :KV0000            |   606K|  2571   (9)|      |       |
|  21 |          PARTITION HASH ALL         |                    |    11M|  2567   (8)|    1 |     4 |
|* 22 |           TABLE ACCESS INMEMORY FULL| EMP                |    11M|  2567   (8)|    1 |     4 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   5 - inmemory("D"."DEPT_NAME"='EDITION' OR "D"."DEPT_NAME"='SEQUENCE')
       filter("D"."DEPT_NAME"='EDITION' OR "D"."DEPT_NAME"='SEQUENCE')
   9 - inmemory("J"."JOB_TITLE"='PUBLIC' OR "J"."JOB_TITLE"='SYS')
       filter("J"."JOB_TITLE"='PUBLIC' OR "J"."JOB_TITLE"='SYS')
  11 - access("ITEM_12"=INTERNAL_FUNCTION("C0") AND "ITEM_13"="C2" AND "ITEM_10"=INTERNAL_FUNCTION("C0") AND
              "ITEM_11"="C3")
  22 - inmemory(SYS_OP_KEY_VECTOR_FILTER("E"."DEPTNO",:KV0000) AND SYS_OP_KEY_VECTOR_FILTER("E"."JOB_ID",:KV0001))
       filter(SYS_OP_KEY_VECTOR_FILTER("E"."DEPTNO",:KV0000) AND SYS_OP_KEY_VECTOR_FILTER("E"."JOB_ID",:KV0001))

Note
-----
   - vector transformation used for this statement

rajesh@ORA12C>

Here is the TKPROF results for this transformation.

select d.location_id, j.min_sal,j.max_sal,
      sum(e.salary),min(e.comm)
from emp e, dept d, jobs j
where e.deptno = d.deptno
and e.job_id = j.job_id
and d.dept_name in ('EDITION','SEQUENCE')
and j.job_title in ('SYS','PUBLIC')
group by d.location_id, j.min_sal,j.max_sal

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          0          0          0           0
Execute      1      0.00       0.00          0          6         14           0
Fetch        2      0.04       0.06          2         22          2           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.04       0.08          2         28         16           4


By, disabling the Vector transformation using Hints, we see this. A slightly more CPU utilization.


select /*+ no_vector_transform */ d.location_id, j.min_sal,j.max_sal,
      sum(e.salary),min(e.comm)
from emp e, dept d, jobs j
where e.deptno = d.deptno
and e.job_id = j.job_id
and d.dept_name in ('EDITION','SEQUENCE')
and j.job_title in ('SYS','PUBLIC')
group by d.location_id, j.min_sal,j.max_sal

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.15       0.15          0         18          0           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.17       0.16          0         18          0           4


By, disabling the Vector transformation and In-memory using Hints, we see this. A slightly even more CPU utilization and lots of disk reads (Physical IO’s)

select /*+ no_vector_transform no_inmemory(e) no_inmemory(d) no_inmemory(j) */
              d.location_id, j.min_sal,j.max_sal,
      sum(e.salary),min(e.comm)
from emp e, dept d, jobs j
where e.deptno = d.deptno
and e.job_id = j.job_id
and d.dept_name in ('EDITION','SEQUENCE')
and j.job_title in ('SYS','PUBLIC')
group by d.location_id, j.min_sal,j.max_sal

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      2.23      15.83     203769     203818          0           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      2.24      15.84     203769     203818          0           4

1 comment:

  1. Thank you so much sir for such a good article. Hope to read many more articles like this from your blog.

    ReplyDelete