Vector Transformation is a new feature introduced with Oracle 12c
(12.1.0.2) to Optimize the CPU usage for the queries that aggregate the results
of join between small tables (dimension table) and large table (fact table)
In a typical warehouse queries aggregate from a fact table, and joins
fact with one or more dimension tables, that is
o
Filter tables and producing row sets from fact
tables
o
Joining row sets with dimension tables
o
Aggregating the rows to produce result sets
But in case of vector transformation, it involves few steps “Vector
Group by” and “Key Vector”. Key vector is a data structure that maps between
dense join keys and dense grouping keys.
KEY VECTOR – does the following operation
A dense key is a numerical key that is stored as native integer, A
dense join key represent all join keys whose columns come from Fact / dimension
tables. A dense grouping keys represent all grouping keys whose columns come
from fact / dimension tables.
Assume that “EMP” table has the following values on deptno column.
rajesh@ORA12C> select deptno from emp;
DEPTNO
----------
20
30
30
20
30
30
10
20
10
30
20
30
20
10
14 rows selected.
rajesh@ORA12C>
Any
complex analytic/aggregation queries that filter where deptno=20 to the EMP
table. A Key vector for this filter may look like the following one dimensional
array.
1
0
0
1
0
0
0
1
0
0
1
0
1
0
In
the preceding results, 1 is the dense grouping key for deptno=20, the value 0
indicate rows in EMP table that do not match the filter. Similarly for the query that uses filter like
where deptno in (10,20) then array may look like as follows. Where 2 is the
dense grouping key for deptno=20 and 1 is the dense grouping key for deptno=10.
2
0
0
2
0
0
1
2
1
0
2
0
2
1
VECTOR
GROUP BY – Involves the two phases of In-memory aggregation. Process each fact
and dimension table sequentially as follows.
Phase
I - Process each dimension as (highlighted in explain plan as Brown color)
o
Find a unique dense grouping keys
o
Create a key vector
o
Create a temporary table
Phase
II - Process each fact as (highlighted in explain plan as Blue color)
o
Process all joins and aggregation using the key
vectors created in previous phase
o
Join back those results to each temporary table.
Here
is a Test case to demonstrate this Vector transformation.
rajesh@ORA12C> select * from v$version;
BANNER
CON_ID
--------------------------------------------------------------------------------
----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 -
64bit Production 0
PL/SQL Release 12.1.0.2.0 - Production 0
CORE 12.1.0.2.0 Production
0
TNS for 64-bit Windows: Version 12.1.0.2.0 - Production 0
NLSRTL Version 12.1.0.2.0 - Production 0
5 rows selected.
rajesh@ORA12C> show parameter inmemory_size
NAME
TYPE VALUE
--------------------------------------------- -----------
-------------------------
inmemory_size big integer
152M
rajesh@ORA12C>
rajesh@ORA12C> select cdb from v$database ;
CDB
---
NO
1 row selected.
rajesh@ORA12C>
rajesh@ORA12C> create table jobs
2 as
3 select rownum as job_id,
4 job_title,
5 trunc(dbms_random.value(1,100)) as
min_sal,
6 trunc(dbms_random.value(200,300)) as
max_sal
7 from (
select distinct owner as job_title
8 from all_objects ) ;
Table created.
rajesh@ORA12C>
rajesh@ORA12C> alter table jobs
2 add constraint jobs_pk
3 primary key(job_id);
Table altered.
rajesh@ORA12C>
rajesh@ORA12C> create table dept
2 as
3 select rownum as deptno,
4 dept_name,
5 trunc(dbms_random.value(1,100)) as
manager_id,
6 trunc(dbms_random.value(1,50)) as
location_id
7 from ( select distinct object_type as
dept_name
8 from all_objects) ;
Table created.
rajesh@ORA12C>
rajesh@ORA12C> alter table dept
2 add constraint dept_pk
3 primary key(deptno);
Table altered.
rajesh@ORA12C>
rajesh@ORA12C> create table emp
2 partition by hash(emp_id)
3 ( partition p1,
4 partition p2,
5 partition p3,
6 partition p4 )
7 as
8 select rownum as emp_id,
9 object_name as first_name,
10 object_name as last_name,
11 object_name||'@gmail.com' as email_id,
12 object_id as phone_no,
13 created as hire_date,
14 trunc(dbms_random.value(1,32)) as job_id,
15 trunc(dbms_random.value(1,5000)) as
salary,
16 trunc(dbms_random.value(1,3000)) as comm,
17 rownum as manager_id ,
18 trunc(dbms_random.value(1,39)) as deptno
19 from all_objects;
Table created.
rajesh@ORA12C> declare
2 l_empno number;
3 begin
4 select max(emp_id) into l_empno
5 from emp;
6
7 execute immediate ' alter session force
parallel dml parallel 4';
8 for i in 1..7
9 loop
10
dbms_application_info.set_client_info('i ='||i);
11 insert /*+ append */ into emp
12 ( EMP_ID,FIRST_NAME,LAST_NAME,EMAIL_ID,
13
PHONE_NO,HIRE_DATE,JOB_ID,SALARY,COMM,MANAGER_ID,
14 DEPTNO )
15 select
l_empno+rownum,FIRST_NAME,LAST_NAME,EMAIL_ID,
16
PHONE_NO,HIRE_DATE,JOB_ID,SALARY,COMM,MANAGER_ID,
17 DEPTNO
18 from emp ;
19 l_empno := l_empno + sql%rowcount;
20 commit;
21 end loop;
22 execute immediate ' alter session disable
parallel dml';
23 end;
24 /
PL/SQL procedure successfully completed.
rajesh@ORA12C>
rajesh@ORA12C> begin
2 dbms_stats.gather_table_stats(user,'emp',
3 degree=>2,
4 method_opt=>'for all columns
size 254',
5 no_invalidate=>false,
6 cascade=>true );
7 end;
8 /
PL/SQL procedure successfully completed.
rajesh@ORA12C> select /*+ parallel(emp,4) */ count(*) from
emp;
COUNT(*)
----------
11524480
1 row selected.
rajesh@ORA12C> select count(*) from dept;
COUNT(*)
----------
39
1 row selected.
rajesh@ORA12C> select count(*) from jobs;
COUNT(*)
----------
32
1 row selected.
rajesh@ORA12C>
For
Key vector transformation to kick in we need to place them in In-memory storage
rajesh@ORA12C> alter table dept inmemory;
Table altered.
rajesh@ORA12C> alter table jobs inmemory;
Table altered.
rajesh@ORA12C> alter table emp inmemory
2 no inmemory(emp_id,first_name,last_name,
3 hire_date,email_id,
4 phone_no,manager_id);
Table altered.
rajesh@ORA12C>
rajesh@ORA12C> column owner format a10
rajesh@ORA12C> column segment_name format a10
rajesh@ORA12C> column partition_name format a10
rajesh@ORA12C> select
owner,segment_name,partition_name,inmemory_size
2 from v$im_segments;
no rows selected
rajesh@ORA12C> alter table emp modify partition p1 inmemory;
Table altered.
rajesh@ORA12C> select count(*) from emp partition(p1);
COUNT(*)
----------
2880928
1 row selected.
rajesh@ORA12C>
rajesh@ORA12C> alter table emp modify partition p2 inmemory;
Table altered.
rajesh@ORA12C> select count(*) from emp partition(p2);
COUNT(*)
----------
2881300
1 row selected.
rajesh@ORA12C>
rajesh@ORA12C> alter table emp modify partition p3 inmemory;
Table altered.
rajesh@ORA12C> select count(*) from emp partition(p3);
COUNT(*)
----------
2883687
1 row selected.
rajesh@ORA12C>
rajesh@ORA12C> alter table emp modify partition p4 inmemory;
Table altered.
rajesh@ORA12C> select count(*) from emp partition(p4);
COUNT(*)
----------
2878565
1 row selected.
rajesh@ORA12C> select
owner,segment_name,partition_name,inmemory_size
2 from v$im_segments;
OWNER SEGMENT_NA
PARTITION_ INMEMORY_SIZE
---------- ---------- ---------- -------------
RAJESH EMP P4 23527424
RAJESH EMP P1 23527424
RAJESH EMP P3 23527424
RAJESH EMP P2 23527424
4 rows selected.
rajesh@ORA12C>
Here is the query against the typical fact and dimension tables.
rajesh@ORA12C> set autotrace traceonly explain
rajesh@ORA12C> select d.location_id, j.min_sal,j.max_sal,
2 sum(e.salary),min(e.comm)
3 from emp e, dept d, jobs j
4 where e.deptno = d.deptno
5 and e.job_id = j.job_id
6 and d.dept_name in ('EDITION','SEQUENCE')
7 and j.job_title in ('SYS','PUBLIC')
8 group by d.location_id, j.min_sal,j.max_sal;
Execution Plan
----------------------------------------------------------
Plan hash value: 1749636783
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)|Pstart| Pstop |
------------------------------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | | 3 |
2579 (9)| |
|
| 1 |
TEMP TABLE TRANSFORMATION
| | | | |
|
| 2 |
LOAD AS SELECT
| SYS_TEMP_0FD9D666B_| | | |
|
| 3 |
VECTOR GROUP BY
| | 2 |
2 (50)| |
|
| 4 |
KEY VECTOR CREATE BUFFERED |
:KV0000 | 2 |
2 (50)| |
|
|* 5 |
TABLE ACCESS INMEMORY FULL |
DEPT | 2 |
1 (0)| |
|
| 6 |
LOAD AS SELECT
| SYS_TEMP_0FD9D666C_|
| | |
|
| 7 |
VECTOR GROUP BY
| | 2 |
2 (50)| |
|
| 8 |
KEY VECTOR CREATE BUFFERED |
:KV0001 | 2 |
2 (50)| |
|
|* 9 |
TABLE ACCESS INMEMORY FULL |
JOBS | 2 |
1 (0)| |
|
| 10 | HASH GROUP BY | | 3 |
2576 (9)| |
|
|* 11 | HASH JOIN | | 3 |
2575 (8)| |
|
| 12 | MERGE JOIN CARTESIAN | | 4 |
4 (0)| |
|
| 13 | TABLE ACCESS FULL | SYS_TEMP_0FD9D666C_| 2 |
2 (0)| |
|
| 14 | BUFFER SORT | | 2 |
2 (0)| |
|
| 15 | TABLE ACCESS FULL | SYS_TEMP_0FD9D666B_| 2 |
1 (0)| |
|
| 16 | VIEW |
VW_VT_38F5D95B | 3 |
2571 (9)| |
|
| 17 | VECTOR GROUP BY | | 3 |
2571 (9)| |
|
| 18 | HASH GROUP BY | | 3 |
2571 (9)| |
|
| 19 | KEY VECTOR USE | :KV0001 | 39131 | 2571
(9)| | |
| 20 | KEY VECTOR USE | :KV0000 |
606K| 2571 (9)|
| |
| 21 | PARTITION HASH ALL | | 11M|
2567 (8)| 1 |
4 |
|* 22 | TABLE
ACCESS INMEMORY FULL| EMP
| 11M| 2567
(8)| 1 | 4 |
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 -
inmemory("D"."DEPT_NAME"='EDITION' OR
"D"."DEPT_NAME"='SEQUENCE')
filter("D"."DEPT_NAME"='EDITION' OR
"D"."DEPT_NAME"='SEQUENCE')
9 -
inmemory("J"."JOB_TITLE"='PUBLIC' OR
"J"."JOB_TITLE"='SYS')
filter("J"."JOB_TITLE"='PUBLIC' OR
"J"."JOB_TITLE"='SYS')
11 -
access("ITEM_12"=INTERNAL_FUNCTION("C0") AND
"ITEM_13"="C2" AND
"ITEM_10"=INTERNAL_FUNCTION("C0") AND
"ITEM_11"="C3")
22 -
inmemory(SYS_OP_KEY_VECTOR_FILTER("E"."DEPTNO",:KV0000) AND
SYS_OP_KEY_VECTOR_FILTER("E"."JOB_ID",:KV0001))
filter(SYS_OP_KEY_VECTOR_FILTER("E"."DEPTNO",:KV0000)
AND SYS_OP_KEY_VECTOR_FILTER("E"."JOB_ID",:KV0001))
Note
-----
- vector
transformation used for this statement
rajesh@ORA12C>
Here
is the TKPROF results for this transformation.
select d.location_id, j.min_sal,j.max_sal,
sum(e.salary),min(e.comm)
from emp e, dept d, jobs j
where e.deptno = d.deptno
and e.job_id = j.job_id
and d.dept_name in ('EDITION','SEQUENCE')
and j.job_title in ('SYS','PUBLIC')
group by d.location_id, j.min_sal,j.max_sal
call count cpu
elapsed disk query
current rows
------- ------ --------
---------- ---------- ---------- ----------
----------
Parse 1 0.00
0.02 0 0 0 0
Execute 1 0.00
0.00 0 6 14 0
Fetch 2 0.04
0.06 2 22 2 4
------- ------ --------
---------- ---------- ---------- ----------
----------
total 4 0.04
0.08 2 28 16 4
By,
disabling the Vector transformation using Hints, we see this. A slightly more
CPU utilization.
select /*+ no_vector_transform */ d.location_id,
j.min_sal,j.max_sal,
sum(e.salary),min(e.comm)
from emp e, dept d, jobs j
where e.deptno = d.deptno
and e.job_id = j.job_id
and d.dept_name in ('EDITION','SEQUENCE')
and j.job_title in ('SYS','PUBLIC')
group by d.location_id, j.min_sal,j.max_sal
call count cpu
elapsed disk query
current rows
------- ------ --------
---------- ---------- ---------- ----------
----------
Parse 1 0.01
0.01 0 0
0 0
Execute 1 0.00
0.00 0 0 0 0
Fetch 2 0.15
0.15 0 18 0 4
------- ------ --------
---------- ---------- ---------- ----------
----------
total 4 0.17 0.16 0 18 0 4
By,
disabling the Vector transformation and In-memory using Hints, we see this. A
slightly even more CPU utilization and lots of disk reads (Physical IO’s)
select /*+ no_vector_transform
no_inmemory(e) no_inmemory(d) no_inmemory(j) */
d.location_id,
j.min_sal,j.max_sal,
sum(e.salary),min(e.comm)
from emp e, dept d, jobs j
where e.deptno = d.deptno
and e.job_id = j.job_id
and d.dept_name in ('EDITION','SEQUENCE')
and j.job_title in ('SYS','PUBLIC')
group by d.location_id, j.min_sal,j.max_sal
call count cpu
elapsed disk query
current rows
------- ------ --------
---------- ---------- ---------- ----------
----------
Parse 1 0.01
0.00 0 0 0 0
Execute 1 0.00
0.00 0 0 0 0
Fetch 2 2.23
15.83 203769 203818 0 4
------- ------ --------
---------- ---------- ---------- ----------
----------
total 4 2.24 15.84
203769
203818 0 4
Thank you so much sir for such a good article. Hope to read many more articles like this from your blog.
ReplyDelete