Friday, December 2, 2016

GUID or Sequence

Yesterday got a question on using GUID as primary keys in Oracle database than sequences.

The main difference lies in the fact about how the data is organized in the table and indexes.

GUID’s are

  • Randomly unique values
  • Will insert randomly across the entire breadth of the index.
  • will need the entire index in memory in order to say “we do no physical reads on the index during an insert”
  • These values are NOT monotonically increasing values
  • They have a terrific impact on the clustering factor of the index (close to the number of rows - so range scans are not likely possible).
  • Index will be larger in size when compare to sequence based index, will end up with 50-50 index splits ( half of the index will be empty at any time – that’s ok – don’t rebuild them to reclaim the space, that will only be a waste of time)

Sequences are

  • Sequentially unique values
  • As you insert – the values are increasing, you only hit the right hand size of the index.
  • Only the right hand side of the index in the cache, the left hand side – which you don’t query so much anymore since it is old data – it is rarely in the cache,  hence when we insert we into this index – you will do very little if any physical IO.
  • These values are monotonically increasing values ( as you insert into it, the values are increasing and hit the right hand side of index)
  • They will end up with an appealing clustering factor of the index (close to the number of blocks – so range scans are likely possible)
  • Index will be thin, will end up with 90-10 split ups.


 Here is my demo to show the impact of using GUID as primary keys (modeled using a classical OLTP case – where each transaction is supposed to insert only a couple/few rows ).

demo@ORA12C> create table t1(x raw(16) constraint t1_pk primary key, data varchar2(100));

Table created.

demo@ORA12C> create table t2(x number constraint t2_pk primary key, data varchar2(100));

Table created.

demo@ORA12C>
demo@ORA12C> create sequence s1 cache 1000;

Sequence created.

demo@ORA12C> variable n number
demo@ORA12C> exec :n := 100000;

PL/SQL procedure successfully completed.

demo@ORA12C> exec runstats_pkg.rs_start;

PL/SQL procedure successfully completed.

demo@ORA12C> begin
  2     for i in 1..:n
  3     loop
  4             insert into t1(x,data)
  5                     values(sys_guid(),'x');
  6             commit;
  7     end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

demo@ORA12C> exec runstats_pkg.rs_middle;

PL/SQL procedure successfully completed.

demo@ORA12C> begin
  2     for i in 1..:n
  3     loop
  4             insert into t2(x,data)
  5                     values(s1.nextval,'x');
  6             commit;
  7     end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

demo@ORA12C> exec runstats_pkg.rs_stop(1000);
Run1 ran in 2373 hsecs  and CPU time in 1320 hsecs
Run2 ran in 1808 hsecs  and CPU time in 889 hsecs
run 1 ran in 131.25% of the time

Name                                  Run1        Run2        Diff
STAT...commit cleanouts            201,485     200,479      -1,006
STAT...commit cleanouts succes     201,481     200,475      -1,006
STAT...immediate (CURRENT) blo       1,489         374      -1,115
STAT...deferred (CURRENT) bloc     148,796     150,004       1,208
STAT...physical reads for flas       4,890       3,673      -1,217
STAT...physical reads cache          4,895       3,678      -1,217
STAT...physical reads                4,895       3,678      -1,217
STAT...physical read total IO        4,895       3,678      -1,217
STAT...physical read IO reques       4,895       3,678      -1,217
STAT...free buffer requested         4,925       3,707      -1,218
STAT...non-idle wait count           4,921       3,698      -1,223
STAT...db block gets from cach     102,656     101,391      -1,265
LATCH.row cache objects              2,245       3,594       1,349
LATCH.undo global data             306,373     304,738      -1,635
LATCH.checkpoint queue latch         7,490       9,323       1,833
LATCH.object queue header oper      20,925      18,964      -1,961
STAT...redo entries                308,755     303,767      -4,988
STAT...db block gets from cach     520,397     512,708      -7,689
STAT...db block gets               520,397     512,708      -7,689
STAT...db block changes            613,707     605,896      -7,811
STAT...session logical reads       523,017     514,618      -8,399
LATCH.Consistent RBA                32,396      21,652     -10,744
STAT...messages sent                32,844      21,909     -10,935
LATCH.lgwr LWN SCN                  32,935      21,956     -10,979
LATCH.messages                      72,350      46,897     -25,453
LATCH.cache buffers chains       2,123,816   2,092,403     -31,413
LATCH.redo writing                  97,361      65,210     -32,151
LATCH.redo allocation               98,424      65,823     -32,601
STAT...session pga memory           65,536           0     -65,536
STAT...session pga memory max       65,536           0     -65,536
STAT...session uga memory max      157,672      65,488     -92,184
STAT...calls to get snapshot s     100,564     200,452      99,888
STAT...file io wait time         9,408,791   9,213,081    -195,710
LATCH.sequence cache                     0     300,206     300,206
STAT...KTFB alloc space (block  23,986,176  21,889,024  -2,097,152
STAT...undo change vector size  27,851,092  24,010,264  -3,840,828
STAT...redo size                97,029,204  87,294,744  -9,734,460
STAT...physical read bytes      40,099,840  30,130,176  -9,969,664
STAT...physical read total byt  40,099,840  30,130,176  -9,969,664
STAT...cell physical IO interc  40,099,840  30,130,176  -9,969,664
STAT...logical read bytes from######################## -68,804,608

Run1 latches total versus runs -- difference and pct
        Run1        Run2        Diff       Pct
   3,748,534   3,905,655     157,121     95.98%

PL/SQL procedure successfully completed.

demo@ORA12C>

With sequence based primary keys, we could see 24% improvements on elapsed timings and 44% improvements on CPU timings.

The most important thing to look over with the statistics is the physical IO that got reduced to 1/3 with sequence based primary keys (highlighted in red above) in place.

demo@ORA12C> column index_name format a10
demo@ORA12C> select index_name, clustering_factor,t.num_rows,t.blocks
  2  from user_indexes i,
  3       user_tables t
  4  where i.table_name in ('T1','T2')
  5  and i.table_name = t.table_name
  6  order by index_name ;

INDEX_NAME CLUSTERING_FACTOR   NUM_ROWS     BLOCKS
---------- ----------------- ---------- ----------
T1_PK                  99741     100000        370
T2_PK                    164     100000        244

2 rows selected.

demo@ORA12C>

The index T1_PK has the clustering factor close the number of rows – means the data is randomly scattered across the table, with organized to index,  so the changes of using the index to answer the queries is likely less.

demo@ORA12C> column min_x new_val min_x
demo@ORA12C> column max_x new_val max_x
demo@ORA12C> select min(x) min_x,max(x) max_x
  2  from ( select x from t1
  3  order by x
  4  fetch first 100 row only );

MIN_X                            MAX_X
-------------------------------- --------------------------------
00057F09AAAD444DAAB3C086EC8AC754 0038DFA03B12479CB8A8638E7F96FFC2

1 row selected.

demo@ORA12C> set autotrace traceonly explain
demo@ORA12C> select * from t1 where x between
  2        hextoraw('&min_x') and hextoraw('&max_x');
old   2:       hextoraw('&min_x') and hextoraw('&max_x')
new   2:       hextoraw('00057F09AAAD444DAAB3C086EC8AC754') and hextoraw('0038DFA03B12479CB8A8638E7F96FFC2')

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    80 |  1520 |    79   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    80 |  1520 |    79   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("X"<=HEXTORAW('0038DFA03B12479CB8A8638E7F96FFC2') AND
              "X">=HEXTORAW('00057F09AAAD444DAAB3C086EC8AC754'))

The index T2_PK has the clustering factor close the number of blocks – means the data in the table is organized to index, so the changes of using the index to answer the queries is more possible.

demo@ORA12C> select min(x) min_x,max(x) max_x
  2  from ( select x from t2
  3  order by x
  4  fetch first 100 row only );

     MIN_X      MAX_X
---------- ----------
    200001     200100

1 row selected.

demo@ORA12C> set autotrace traceonly explain
demo@ORA12C> select * from t2 where x between &min_x and &max_x;
old   1: select * from t2 where x between &min_x and &max_x
new   1: select * from t2 where x between     200001 and     200100

Execution Plan
----------------------------------------------------------
Plan hash value: 1879001825

----------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |     1 |     7 |     3|
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     1 |     7 |     3|
|*  2 |   INDEX RANGE SCAN                  | T2_PK |     1 |       |     2|
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("X">=200001 AND "X"<=200100))

Now comparing the size of the index, GUID based indexes took 2x space when compare to Sequence based indexes, (remember bigger the size means, more disk space, more IO and more latches and less scalable)

demo@ORA12C> analyze index t1_pk validate structure;

Index analyzed.

demo@ORA12C> select name, blocks,lf_blks,btree_space,used_space,pct_used from index_stats;

NAME           BLOCKS    LF_BLKS BTREE_SPACE USED_SPACE   PCT_USED
---------- ---------- ---------- ----------- ---------- ----------
T1_PK             640        504     4040032    2704473         67

1 row selected.

demo@ORA12C> exec show_space('T1_PK',user,'INDEX');
Unformatted Blocks .....................             110
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................              15
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................             497
Total Blocks............................             640
Total Bytes.............................       5,242,880
Total MBytes............................               5
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               8
Last Used Ext BlockId...................          22,656
Last Used Block.........................             128

PL/SQL procedure successfully completed.

Totally there were about 640 blocks almost 33% free spaces available in all blocks, since the GUID based inserts were spread across the breath (due to its random nature).

demo@ORA12C> analyze index t2_pk validate structure;

Index analyzed.

demo@ORA12C> select name, blocks,lf_blks,btree_space,used_space,pct_used from index_stats;

NAME           BLOCKS    LF_BLKS BTREE_SPACE USED_SPACE   PCT_USED
---------- ---------- ---------- ----------- ---------- ----------
T2_PK             256        187     1504032    1490919        100

1 row selected.

demo@ORA12C> exec show_space('T2_PK',user,'INDEX');
Unformatted Blocks .....................              32
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................              25
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................             187
Total Blocks............................             256
Total Bytes.............................       2,097,152
Total MBytes............................               2
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               8
Last Used Ext BlockId...................          23,040
Last Used Block.........................             128

PL/SQL procedure successfully completed.

Almost 50% reduction on the disk space, with sequence based indexes in place, and each block is well utilized (almost 100% utilized – no free space in them) before moving into the next leaf block.

All these factor into less logical IO when using sequence based primary keys in place.

demo@ORA12C> set autotrace traceonly explain statistics
demo@ORA12C> select x from t1;

100000 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100K|  1660K|    78   (3)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| T1   |   100K|  1660K|    78   (3)| 00:00:01 |
--------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1041  consistent gets
          0  physical reads
          0  redo size
    1930381  bytes sent via SQL*Net to client
       7878  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

So why do we get a full Table scan rather than index fast full scan (since we access only a single column that is the primary key of the table)? , when we compare the size of the table and index, table look smaller than the index (since it holds the key value + row id in place for each row in the leaf block).

demo@ORA12C> exec show_space('T1_PK',user,'INDEX');
Unformatted Blocks .....................             110
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................              15
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................               0
Full Blocks        .....................             497
Total Blocks............................             640
Total Bytes.............................       5,242,880
Total MBytes............................               5
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               8
Last Used Ext BlockId...................          22,656
Last Used Block.........................             128

PL/SQL procedure successfully completed.

demo@ORA12C> exec show_space('T1',user,'TABLE');
Unformatted Blocks .....................               0
FS1 Blocks (0-25)  .....................               0
FS2 Blocks (25-50) .....................               0
FS3 Blocks (50-75) .....................               0
FS4 Blocks (75-100).....................              39
Full Blocks        .....................             331
Total Blocks............................             384
Total Bytes.............................       3,145,728
Total MBytes............................               3
Unused Blocks...........................               0
Unused Bytes............................               0
Last Used Ext FileId....................               8
Last Used Ext BlockId...................          21,504
Last Used Block.........................             128

PL/SQL procedure successfully completed.


demo@ORA12C> select x from t2;

100000 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 871972594

------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |   100K|   488K|    41   (5)| 00:00:01 |
|   1 |  INDEX FAST FULL SCAN| T2_PK |   100K|   488K|    41   (5)| 00:00:01 |
------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        892  consistent gets
          0  physical reads
          0  redo size
     729371  bytes sent via SQL*Net to client
       7878  bytes received via SQL*Net from client
        668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

demo@ORA12C> set autotrace off

By all means, sequence based primary keys looks very appealing (less cpu, less IO, less diskspace, 100% PCT on leaf blocks, good clustering factor)  and highly scalable than GUID based primary keys.

1 comment:

  1. Very informative, thank you for sharing Rajesh!

    ReplyDelete