When a transactional application updates data we store it in a
structure that is close to what we insert: one table per entity and relational
integrity among them. And we query usually in the same way, joining few rows
from several tables.
But when a database is dedicated to query, and query in BI often
involve lots of rows, we prefer to store them close to the way we want to
retrieve. Put all the measures that are related to same information (same
granularity) into a FACT table. And around that table with lots of rows we put
into smaller table with all the information need for analysis – known as
DIMENSIONS
This is the STAR schema that we prefer to call a “dimensional” model.
A query on star schema involves:
·
Several predicates on dimension attributes –
columns on dimension tables (tables are either small or medium)
·
One or several measures to be retrieved – they
are columns in fact table, that has many rows and we usually need to read many
rows and aggregate them.
·
Additional information from dimension tables for
display purpose.
So basically, we built a schema with
·
One FACT table that has minimum of columns (only
the dimension key and measures)
·
Several dimension table that has the keys,
columns where we will have predicates on, and all other information – they can
be large, because they don’t have huge number of rows and can be de-normalized,
because they are static.
·
The dimension keys on the FACT tables are
declared as foreign key to dimension tables.
·
Each dimension key on the FACT table has bitmap
index on it – so that all the predicate results can be merged quickly, before
going to the large FACT table.
demo@PDB1> create table
dim1
2 as
3
select rownum as dim1_id ,
4 Initcap( to_char( to_date(level,'j'),'jsp') )
dim1_cod,
5
rpad('*',800,'*') dim1_text
6 from
dual
7
connect by level <= 10;
Table created.
demo@PDB1> create table
dim2
2 as
3
select rownum as dim2_id ,
4
Initcap( to_char( to_date(level,'j'),'jsp') ) dim2_cod,
5
rpad('*',800,'*') dim2_text
6 from
dual
7
connect by level <= 10;
Table created.
demo@PDB1> create table
dim3
2 as
3
select rownum as dim3_id ,
4
Initcap( to_char( to_date(level,'j'),'jsp') ) dim3_cod,
5
rpad('*',800,'*') dim3_text
6 from
dual
7
connect by level <= 10;
Table created.
Those are our three
dimension table each having 10 rows.
demo@PDB1> create table
fact as
2 select
rownum as fact_id,
3
dim1_id, dim2_id, dim3_id, mod(rownum,1000)/10 as fact_measure
4 from
dim1,dim2,dim3,( select level
5
from dual
6
connect by level <= 10000 );
Table created.
This is our FACT
table and we have 10K row for each combination of dimensions. Therefore, 10M
rows in total.
demo@PDB1> alter table
dim1 add constraint dim1_pk primary key(dim1_id);
Table altered.
demo@PDB1> alter table
dim2 add constraint dim2_pk primary key(dim2_id);
Table altered.
demo@PDB1> alter table
dim3 add constraint dim3_pk primary key(dim3_id);
Table altered.
demo@PDB1> alter table
fact add constraint fact_fk1 foreign key(dim1_id)
2
references dim1;
Table altered.
demo@PDB1> alter table
fact add constraint fact_fk2 foreign key(dim2_id)
2
references dim1;
Table altered.
demo@PDB1> alter table
fact add constraint fact_fk3 foreign key(dim3_id)
2
references dim1;
Table altered.
demo@PDB1> create bitmap
index fact_dim1 on fact(dim1_id);
Index created.
demo@PDB1> create bitmap
index fact_dim2 on fact(dim2_id);
Index created.
demo@PDB1> create bitmap
index fact_dim3 on fact(dim3_id);
Index created.
demo@PDB1>
now we have defined
the relationship between these tables along with Bitmap indexes on FK columns.
Now it is time to
check the execution plan for this SQL
demo@PDB1> $ type
script.sql
set termout off
select *
from fact
join dim1 on (fact.dim1_id =
dim1.dim1_id )
join dim2 on (fact.dim2_id =
dim2.dim2_id )
join dim3 on (fact.dim3_id =
dim3.dim3_id )
where dim1.dim1_cod = 'One'
and dim2.dim2_cod = 'One'
and dim3.dim3_cod = 'One';
set termout on
select * from table(
dbms_xplan.display_cursor(format=>'allstats last +adaptive') );
Here is the
execution plan when we leave the parameter “star_transformation_enabled” to its
default value.
demo@PDB1> show parameter
star_trans
NAME TYPE VALUE
------------------------------------
----------- --------
star_transformation_enabled string FALSE
demo@PDB1> @script.sql
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------
SQL_ID bwx1y7bh32t8v, child number 0
-------------------------------------
select *
from fact join dim1 on
(fact.dim1_id = dim1.dim1_id ) join
dim2 on (fact.dim2_id =
dim2.dim2_id ) join dim3 on
(fact.dim3_id =
dim3.dim3_id ) where dim1.dim1_cod = 'One' and dim2.dim2_cod = 'One'
and dim3.dim3_cod = 'One'
Plan hash value: 3368458141
---------------------------------------------------------------------------------
| Id |
Operation |
Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
| 10000 |
|- * 1 |
HASH JOIN
| | 1 |
10000 | 10000 |
| 2 |
NESTED LOOPS
| | 1 |
10000 | 10000 |
| 3 |
NESTED LOOPS | | 1 |
10000 | 10000 |
|- 4 |
STATISTICS COLLECTOR | | 1 |
| 1 |
| 5 |
MERGE JOIN CARTESIAN | | 1 |
1 | 1 |
| 6 |
MERGE JOIN CARTESIAN | | 1 |
1 | 1 |
| * 7
| TABLE ACCESS FULL | DIM1 |
1 | 1 | 1 |
| 8 |
BUFFER SORT | | 1 |
1 | 1 |
| * 9
| TABLE ACCESS FULL | DIM2 |
1 | 1 | 1 |
| 10 |
BUFFER SORT | | 1 |
1 | 1 |
| * 11 |
TABLE ACCESS FULL |
DIM3 | 1 |
1 | 1 |
| 12 |
BITMAP CONVERSION TO ROWIDS|
| 1 | |
10000 |
| 13 |
BITMAP AND | | 1 |
| 4 |
| * 14 |
BITMAP INDEX SINGLE VALUE| FACT_DIM1 | 1 |
| 52 |
| * 15 |
BITMAP INDEX SINGLE VALUE| FACT_DIM2 | 1 |
| 119 |
| * 16 |
BITMAP INDEX SINGLE VALUE| FACT_DIM3 | 1 |
| 394 |
| 17 |
TABLE ACCESS BY INDEX ROWID | FACT
| 10000 | 10000 |
10000 |
|- 18 |
TABLE ACCESS FULL |
FACT | 0 |
10000 | 0 |
---------------------------------------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
1 -
access("FACT"."DIM3_ID"="DIM3"."DIM3_ID"
AND
"FACT"."DIM2_ID"="DIM2"."DIM2_ID"
AND
"FACT"."DIM1_ID"="DIM1"."DIM1_ID")
7 -
filter("DIM1"."DIM1_COD"='One')
9 -
filter("DIM2"."DIM2_COD"='One')
11 -
filter("DIM3"."DIM3_COD"='One')
14 -
access("FACT"."DIM1_ID"="DIM1"."DIM1_ID")
15 -
access("FACT"."DIM2_ID"="DIM2"."DIM2_ID")
16 -
access("FACT"."DIM3_ID"="DIM3"."DIM3_ID")
Note
-----
- this is an adaptive plan (rows marked '-'
are inactive)
That’s a long plan,
but not so complex – Here is what it does.
·
First it reads all the dimensions
(DIM1,DIM2,DIM3 – each filtered on its own predicate) and does a Cartesian join
to get all the combination that are allowed by our predicates – this resultset
has the dimension keys to get to the FACT and also has the additional
information we need for the final result.
·
Then the STATISTICS COLLECTOR operation –
introduced in 12c as part of Adaptive plans – will decide on which subplans to
choose.
·
If the number of combinations is not too large
it will do a NESTED LOOP, for each combination, we get to the matching rows
from FACT table. This is done through bitmap index: for each dimension key the
corresponding bitmap index is accessed (BITMAP INDEX SINGLE VALUE) , giving a
bitmap of rows which are merged (BITMAP AND) and converted to ROWID, then with
those rowid we access to the FACT table.
·
If the number of combinations is large, then it
is better to do a full scan on FACT table and do the join with the dimension
combination through a HASH join.
More to come.