Wednesday, January 28, 2015

Attribute clustering


Attribute clustering is a feature that’s new to Oracle Database 12.1.0.2 (Enterprise Edition). It was designed to work with other features, such as compression, storage indexes, and especially with a new feature called zone maps (Note zone maps don’t work on non-Exadata storage)

So what is attribute clustering? It is simply a table property - just like compression - but it defines how rows should be ordered and clustered together in close physical proximity, based on one or more column values. For example, in a sales transaction table you could choose to cluster together rows that share common customer ID values. Perhaps the system frequently queries data related to particular customers.  The physical database design will probably incorporate an index on customer_id column, but you can gain further benefit if all rows for a particular customer are physically next to each other.



With attribute clustering, the matching sales table rows are near to one another, so it is likely that fewer database blocks will need to be read from storage (or database cache) than if the rows are scattered throughout the sales table. The reason for this is that database blocks will usually contain multiple rows, so it is beneficial if each block we read happens to contains multiple matching rows

Many DBAs have used a similar trick in the past by ordering rows as they are loaded into the database (using an explicit SQL “ORDER BY” clause). Attribute clustering has the advantage of being transparent and a property of the table itself; clustering behavior is inherited from the table definition and is implemented automatically. Just like compression, attribute clustering is a directive that transparently kicks in for certain operations, namely direct path insertion or data movement.

Here’s an example of using attribute clustering to speed up a query. We'll compare before and after; so start by creating a table that is not attribute clustered:

rajesh@PDB1> create table t
  2  as
  3  select a.* ,
  4     mod(rownum,10000)+1 as cust_id
  5  from all_objects a ,
  6     all_users
  7  where rownum <=1e6; 

Table created. 

rajesh@PDB1> create index t_idx on t(cust_id); 

Index created. 

rajesh@PDB1> set autotrace traceonly explain statistics
rajesh@PDB1> select sum(object_id)
  2  from t
  3  where cust_id = 55 ; 

1 row selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1339972470

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |     1 |     9 |   103   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                      |       |     1 |     9 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T     |   100 |   900 |   103   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T_IDX |   100 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------- 

Predicate Information (identified by operation id):
--------------------------------------------------- 

   3 - access("CUST_ID"=55)
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        103  consistent gets
          0  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed 

rajesh@PDB1> set autotrace off


Attribute clustering is a property of the table, so when it is added, existing rows are not re-ordered. The following command is very fast because it just makes a change to the data dictionary:


rajesh@PDB1> alter table t
  2  add clustering by linear order(cust_id)
  3  without materialized zonemap; 

Table altered. 

rajesh@PDB1>

Now we can physically cluster the table data by moving the table.

rajesh@PDB1> alter table t move; 

Table altered. 

rajesh@PDB1> alter index t_idx rebuild nologging; 

Index altered. 

rajesh@PDB1>


Now running the same query, we see this:

rajesh@PDB1> set autotrace traceonly explain statistics
rajesh@PDB1> select sum(object_id)
  2  from t
  3  where cust_id = 55 ; 

1 row selected.
 

Execution Plan
----------------------------------------------------------
Plan hash value: 1339972470 

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |     1 |     9 |     5   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE                      |       |     1 |     9 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T     |   100 |   900 |     5   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T_IDX |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
 

Predicate Information (identified by operation id):
--------------------------------------------------- 

   3 - access("CUST_ID"=55)
 

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        552  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed 

rajesh@PDB1> set autotrace off
rajesh@PDB1>

No comments:

Post a Comment