THE RAJESH BLOG: Clustering factor

Learnt something newly from Oracle product documentation, its about Clustering factor. Its very crystal clear here

If the value is near the number of blocks, then the table is very well ordered. In this case, the index entries in a single leaf block tend to point to rows in the same data blocks.
If the value is near the number of rows, then the table is very randomly ordered. In this case, it is unlikely that index entries in the same leaf block point to rows in the same data blocks.

rajesh@10GR2> create table t
2 nologging
3 as
4 select rownum as id,
5        object_name as data
6 from all_objects;

Table created.

Elapsed: 00:00:08.87
rajesh@10GR2>
rajesh@10GR2> alter table t add constraint t_pk primary key(id);

Table altered.

Elapsed: 00:00:00.48
rajesh@10GR2> create table t1
2 nologging
3 as
4 select rownum as id,
5        object_name as data
6 from all_objects
7 order by dbms_random.random;

Table created.

Elapsed: 00:00:07.43
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> alter table t1 add constraint t1_pk primary key(id);

Table altered.

Elapsed: 00:00:00.43

The table 'T' and 'T1' are identical except for the physical order of their rows on disk, One is sorted by primary key and the other is not. The optimizer is aware of this via the Clustering factor

rajesh@10GR2> begin
2    dbms_stats.gather_table_stats(ownname=>user,tabname=>'T',estimate_percent=>100,method_opt=>'for all indexed columns size auto',cascade=>true);
3    dbms_stats.gather_table_stats(ownname=>user,tabname=>'T1',estimate_percent=>100,method_opt=>'for all indexed columns size auto',cascade=>true);
4 end;
5 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.78
rajesh@10GR2>
rajesh@10GR2>
rajesh@10GR2> analyze index t_pk validate structure;

Index analyzed.

Elapsed: 00:00:00.09
rajesh@10GR2>
rajesh@10GR2> select i.name, i.blocks,u.num_rows,u.clustering_factor
2 from index_stats i,
3       user_indexes u
4 where u.index_name = i.name;

NAME                               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- ---------- -----------------
T_PK                                  128      56392               261

Elapsed: 00:00:00.23
rajesh@10GR2>
rajesh@10GR2> analyze index t1_pk validate structure;

Index analyzed.

Elapsed: 00:00:00.07
rajesh@10GR2>
rajesh@10GR2> select i.name, i.blocks,u.num_rows,u.clustering_factor
2 from index_stats i,
3       user_indexes u
4 where u.index_name = i.name;

NAME                               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ ---------- ---------- -----------------
T1_PK                                 128      56394             56195

Elapsed: 00:00:00.01

********************************************************************************

select *
from t
where id between 50 and 2500

call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.05          0          0          0           0
Fetch       18      0.03       0.81          0         49          0        2451
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total       20      0.03       0.87          0         49          0        2451

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 100

Rows     Row Source Operation
------- ---------------------------------------------------
   2451 TABLE ACCESS BY INDEX ROWID T (cr=49 pr=0 pw=0 time=7423 us)
   2451   INDEX RANGE SCAN T_PK (cr=23 pr=0 pw=0 time=7396 us)(object id 141636)

********************************************************************************

select *
from t1
where id between 50 and 2500

call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       18      0.01       0.00          0        280          0        2451
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total       20      0.01       0.00          0        280          0        2451

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 100

Rows     Row Source Operation
------- ---------------------------------------------------
   2451 TABLE ACCESS FULL T1 (cr=280 pr=0 pw=0 time=7398 us)

********************************************************************************

Here the Optimizer correctly chose a full table scan where the data was scattered (when the table was not sorted by primary key). It is the correct plan, you might have forced the use of an index in the misguided belief that indexes are good and Full table scans are bad, then i would have ruined performance.

********************************************************************************

select /*+ index(t1,t1_pk) */ *
from t1
where id between 50 and 2500

call     count       cpu    elapsed       disk      query    current        rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch       18      0.00       0.01          0       2461          0        2451
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total       20      0.00       0.01          0       2461          0        2451

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 100

Rows     Row Source Operation
------- ---------------------------------------------------
   2451 TABLE ACCESS BY INDEX ROWID T1 (cr=2461 pr=0 pw=0 time=19649 us)
   2451   INDEX RANGE SCAN T1_PK (cr=23 pr=0 pw=0 time=2496 us)(object id 141639)

********************************************************************************

Tuesday, September 14, 2010

Clustering factor

No comments:

Post a Comment