Attribute clustering is a feature that’s new to Oracle
Database 12.1.0.2 (Enterprise Edition). It was designed to work with other
features, such as compression, storage indexes, and especially with a new
feature called zone maps (Note zone maps don’t work on non-Exadata storage)
So what is attribute clustering? It is simply a table
property - just like compression - but it defines how rows should be ordered
and clustered together in close physical proximity, based on one or more column
values. For example, in a sales transaction table you could choose to cluster
together rows that share common customer ID values. Perhaps the system
frequently queries data related to particular customers. The physical database design will probably
incorporate an index on customer_id column, but you can gain further benefit if
all rows for a particular customer are physically next to each other.
With attribute clustering, the matching sales table rows are
near to one another, so it is likely that fewer database blocks will need to be
read from storage (or database cache) than if the rows are scattered throughout
the sales table. The reason for this is that database blocks will usually
contain multiple rows, so it is beneficial if each block we read happens to
contains multiple matching rows
Many DBAs have used a similar trick in the past by ordering
rows as they are loaded into the database (using an explicit SQL “ORDER BY”
clause). Attribute clustering has the advantage of being transparent and a
property of the table itself; clustering behavior is inherited from the table
definition and is implemented automatically. Just like compression, attribute
clustering is a directive that transparently kicks in for certain operations,
namely direct path insertion or data movement.
Here’s an example of using attribute clustering to speed up
a query. We'll compare before and after; so start by creating a table that is
not attribute clustered:
rajesh@PDB1> create table t
2 as
3 select a.* ,
4 mod(rownum,10000)+1 as cust_id
5 from all_objects a ,
6 all_users
7 where rownum <=1e6;
Table created.
rajesh@PDB1> create index t_idx on t(cust_id);
Index created.
rajesh@PDB1> set autotrace traceonly explain statistics
rajesh@PDB1> select sum(object_id)
2 from t
3 where cust_id = 55 ;
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1339972470
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
1 | 9 | 103
(0)| 00:00:01 |
| 1 | SORT AGGREGATE | |
1 | 9 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T |
100 | 900 | 103
(0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_IDX | 100 |
| 3
(0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 -
access("CUST_ID"=55)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
103 consistent gets
0 physical reads
0 redo size
551 bytes sent via SQL*Net to client
552 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
rajesh@PDB1> set autotrace off
Attribute clustering is a property of the table, so when it is added,
existing rows are not re-ordered. The following command is very fast because it
just makes a change to the data dictionary:
rajesh@PDB1> alter table t
2 add clustering by linear order(cust_id)
3 without materialized zonemap;
Table altered.
rajesh@PDB1>
Now we can physically cluster the table data by moving the
table.
rajesh@PDB1> alter table t move;
Table altered.
rajesh@PDB1> alter index t_idx rebuild nologging;
Index altered.
rajesh@PDB1>
Now running the same query, we see this:
rajesh@PDB1> set autotrace traceonly explain statistics
rajesh@PDB1> select sum(object_id)
2 from t
3 where cust_id = 55 ;
1 row selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1339972470
----------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows
| Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
1 | 9 | 5
(0)| 00:00:01 |
| 1 | SORT AGGREGATE | |
1 | 9 | | |
| 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T |
100 | 900 | 5
(0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | T_IDX | 100 |
| 3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 -
access("CUST_ID"=55)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
6 consistent gets
0 physical reads
0 redo size
551 bytes sent via SQL*Net to client
552 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
rajesh@PDB1> set autotrace off
rajesh@PDB1>