Yesterday got a question on using GUID as primary keys in Oracle database than sequences.
The main difference lies in the fact about how the data is organized in the table and indexes.
GUID’s are
Sequences are
Here is my demo to show the impact of using GUID as primary keys (modeled using a classical OLTP case – where each transaction is supposed to insert only a couple/few rows ).
The main difference lies in the fact about how the data is organized in the table and indexes.
GUID’s are
- Randomly unique values
- Will insert randomly across the entire breadth of the index.
- will need the entire index in memory in order to say “we do no physical reads on the index during an insert”
- These values are NOT monotonically increasing values
- They have a terrific impact on the clustering factor of the index (close to the number of rows - so range scans are not likely possible).
- Index will be larger in size when compare to sequence based index, will end up with 50-50 index splits ( half of the index will be empty at any time – that’s ok – don’t rebuild them to reclaim the space, that will only be a waste of time)
Sequences are
- Sequentially unique values
- As you insert – the values are increasing, you only hit the right hand size of the index.
- Only the right hand side of the index in the cache, the left hand side – which you don’t query so much anymore since it is old data – it is rarely in the cache, hence when we insert we into this index – you will do very little if any physical IO.
- These values are monotonically increasing values ( as you insert into it, the values are increasing and hit the right hand side of index)
- They will end up with an appealing clustering factor of the index (close to the number of blocks – so range scans are likely possible)
- Index will be thin, will end up with 90-10 split ups.
Here is my demo to show the impact of using GUID as primary keys (modeled using a classical OLTP case – where each transaction is supposed to insert only a couple/few rows ).
demo@ORA12C> create table t1(x raw(16) constraint t1_pk primary key, data varchar2(100));
Table created.
demo@ORA12C> create table t2(x number constraint t2_pk primary key, data varchar2(100));
Table created.
demo@ORA12C>
demo@ORA12C> create sequence s1 cache 1000;
Sequence created.
demo@ORA12C> variable n number
demo@ORA12C> exec :n := 100000;
PL/SQL procedure successfully completed.
demo@ORA12C> exec runstats_pkg.rs_start;
PL/SQL procedure successfully completed.
demo@ORA12C> begin
2 for i in 1..:n
3 loop
4 insert into t1(x,data)
5 values(sys_guid(),'x');
6 commit;
7 end loop;
8 end;
9 /
PL/SQL procedure successfully completed.
demo@ORA12C> exec runstats_pkg.rs_middle;
PL/SQL procedure successfully completed.
demo@ORA12C> begin
2 for i in 1..:n
3 loop
4 insert into t2(x,data)
5 values(s1.nextval,'x');
6 commit;
7 end loop;
8 end;
9 /
PL/SQL procedure successfully completed.
demo@ORA12C> exec runstats_pkg.rs_stop(1000);
Run1 ran in 2373 hsecs and CPU time in 1320 hsecs
Run2 ran in 1808 hsecs and CPU time in 889 hsecs
run 1 ran in 131.25% of the time
Name Run1 Run2 Diff
STAT...commit cleanouts 201,485 200,479 -1,006
STAT...commit cleanouts succes 201,481 200,475 -1,006
STAT...immediate (CURRENT) blo 1,489 374 -1,115
STAT...deferred (CURRENT) bloc 148,796 150,004 1,208
STAT...physical reads for flas 4,890 3,673 -1,217
STAT...physical reads cache 4,895 3,678 -1,217
STAT...physical reads 4,895 3,678 -1,217
STAT...physical read total IO 4,895 3,678 -1,217
STAT...physical read IO reques 4,895 3,678 -1,217
STAT...free buffer requested 4,925 3,707 -1,218
STAT...non-idle wait count 4,921 3,698 -1,223
STAT...db block gets from cach 102,656 101,391 -1,265
LATCH.row cache objects 2,245 3,594 1,349
LATCH.undo global data 306,373 304,738 -1,635
LATCH.checkpoint queue latch 7,490 9,323 1,833
LATCH.object queue header oper 20,925 18,964 -1,961
STAT...redo entries 308,755 303,767 -4,988
STAT...db block gets from cach 520,397 512,708 -7,689
STAT...db block gets 520,397 512,708 -7,689
STAT...db block changes 613,707 605,896 -7,811
STAT...session logical reads 523,017 514,618 -8,399
LATCH.Consistent RBA 32,396 21,652 -10,744
STAT...messages sent 32,844 21,909 -10,935
LATCH.lgwr LWN SCN 32,935 21,956 -10,979
LATCH.messages 72,350 46,897 -25,453
LATCH.cache buffers chains 2,123,816 2,092,403 -31,413
LATCH.redo writing 97,361 65,210 -32,151
LATCH.redo allocation 98,424 65,823 -32,601
STAT...session pga memory 65,536 0 -65,536
STAT...session pga memory max 65,536 0 -65,536
STAT...session uga memory max 157,672 65,488 -92,184
STAT...calls to get snapshot s 100,564 200,452 99,888
STAT...file io wait time 9,408,791 9,213,081 -195,710
LATCH.sequence cache 0 300,206 300,206
STAT...KTFB alloc space (block 23,986,176 21,889,024 -2,097,152
STAT...undo change vector size 27,851,092 24,010,264 -3,840,828
STAT...redo size 97,029,204 87,294,744 -9,734,460
STAT...physical read bytes 40,099,840 30,130,176 -9,969,664
STAT...physical read total byt 40,099,840 30,130,176 -9,969,664
STAT...cell physical IO interc 40,099,840 30,130,176 -9,969,664
STAT...logical read bytes from######################## -68,804,608
Run1 latches total versus runs -- difference and pct
Run1 Run2 Diff Pct
3,748,534 3,905,655 157,121 95.98%
PL/SQL procedure successfully completed.
demo@ORA12C>
With sequence based primary keys, we could see 24% improvements on elapsed timings and 44% improvements on CPU timings.
The most important thing to look over with the statistics is the physical IO that got reduced to 1/3 with sequence based primary keys (highlighted in red above) in place.
demo@ORA12C> column index_name format a10
demo@ORA12C> select index_name, clustering_factor,t.num_rows,t.blocks
2 from user_indexes i,
3 user_tables t
4 where i.table_name in ('T1','T2')
5 and i.table_name = t.table_name
6 order by index_name ;
INDEX_NAME CLUSTERING_FACTOR NUM_ROWS BLOCKS
---------- ----------------- ---------- ----------
T1_PK 99741 100000 370
T2_PK 164 100000 244
2 rows selected.
demo@ORA12C>
The index T1_PK has the clustering factor close the number of rows – means the data is randomly scattered across the table, with organized to index, so the changes of using the index to answer the queries is likely less.
demo@ORA12C> column min_x new_val min_x
demo@ORA12C> column max_x new_val max_x
demo@ORA12C> select min(x) min_x,max(x) max_x
2 from ( select x from t1
3 order by x
4 fetch first 100 row only );
MIN_X MAX_X
-------------------------------- --------------------------------
00057F09AAAD444DAAB3C086EC8AC754 0038DFA03B12479CB8A8638E7F96FFC2
1 row selected.
demo@ORA12C> set autotrace traceonly explain
demo@ORA12C> select * from t1 where x between
2 hextoraw('&min_x') and hextoraw('&max_x');
old 2: hextoraw('&min_x') and hextoraw('&max_x')
new 2: hextoraw('00057F09AAAD444DAAB3C086EC8AC754') and hextoraw('0038DFA03B12479CB8A8638E7F96FFC2')
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 80 | 1520 | 79 (4)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| T1 | 80 | 1520 | 79 (4)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("X"<=HEXTORAW('0038DFA03B12479CB8A8638E7F96FFC2') AND
"X">=HEXTORAW('00057F09AAAD444DAAB3C086EC8AC754'))
The index T2_PK has the clustering factor close the number of blocks – means the data in the table is organized to index, so the changes of using the index to answer the queries is more possible.
demo@ORA12C> select min(x) min_x,max(x) max_x
2 from ( select x from t2
3 order by x
4 fetch first 100 row only );
MIN_X MAX_X
---------- ----------
200001 200100
1 row selected.
demo@ORA12C> set autotrace traceonly explain
demo@ORA12C> select * from t2 where x between &min_x and &max_x;
old 1: select * from t2 where x between &min_x and &max_x
new 1: select * from t2 where x between 200001 and 200100
Execution Plan
----------------------------------------------------------
Plan hash value: 1879001825
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 3|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T2 | 1 | 7 | 3|
|* 2 | INDEX RANGE SCAN | T2_PK | 1 | | 2|
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("X">=200001 AND "X"<=200100))
Now comparing the size of the index, GUID based indexes took 2x space when compare to Sequence based indexes, (remember bigger the size means, more disk space, more IO and more latches and less scalable)
demo@ORA12C> analyze index t1_pk validate structure;
Index analyzed.
demo@ORA12C> select name, blocks,lf_blks,btree_space,used_space,pct_used from index_stats;
NAME BLOCKS LF_BLKS BTREE_SPACE USED_SPACE PCT_USED
---------- ---------- ---------- ----------- ---------- ----------
T1_PK 640 504 4040032 2704473 67
1 row selected.
demo@ORA12C> exec show_space('T1_PK',user,'INDEX');
Unformatted Blocks ..................... 110
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 15
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 497
Total Blocks............................ 640
Total Bytes............................. 5,242,880
Total MBytes............................ 5
Unused Blocks........................... 0
Unused Bytes............................ 0
Last Used Ext FileId.................... 8
Last Used Ext BlockId................... 22,656
Last Used Block......................... 128
PL/SQL procedure successfully completed.
Totally there were about 640 blocks almost 33% free spaces available in all blocks, since the GUID based inserts were spread across the breath (due to its random nature).
demo@ORA12C> analyze index t2_pk validate structure;
Index analyzed.
demo@ORA12C> select name, blocks,lf_blks,btree_space,used_space,pct_used from index_stats;
NAME BLOCKS LF_BLKS BTREE_SPACE USED_SPACE PCT_USED
---------- ---------- ---------- ----------- ---------- ----------
T2_PK 256 187 1504032 1490919 100
1 row selected.
demo@ORA12C> exec show_space('T2_PK',user,'INDEX');
Unformatted Blocks ..................... 32
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 25
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 187
Total Blocks............................ 256
Total Bytes............................. 2,097,152
Total MBytes............................ 2
Unused Blocks........................... 0
Unused Bytes............................ 0
Last Used Ext FileId.................... 8
Last Used Ext BlockId................... 23,040
Last Used Block......................... 128
PL/SQL procedure successfully completed.
Almost 50% reduction on the disk space, with sequence based indexes in place, and each block is well utilized (almost 100% utilized – no free space in them) before moving into the next leaf block.
All these factor into less logical IO when using sequence based primary keys in place.
demo@ORA12C> set autotrace traceonly explain statistics
demo@ORA12C> select x from t1;
100000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013
--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 1660K| 78 (3)| 00:00:01 |
| 1 | TABLE ACCESS FULL| T1 | 100K| 1660K| 78 (3)| 00:00:01 |
--------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
1041 consistent gets
0 physical reads
0 redo size
1930381 bytes sent via SQL*Net to client
7878 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100000 rows processed
So why do we get a full Table scan rather than index fast full scan (since we access only a single column that is the primary key of the table)? , when we compare the size of the table and index, table look smaller than the index (since it holds the key value + row id in place for each row in the leaf block).
demo@ORA12C> exec show_space('T1_PK',user,'INDEX');
Unformatted Blocks ..................... 110
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 15
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 0
Full Blocks ..................... 497
Total Blocks............................ 640
Total Bytes............................. 5,242,880
Total MBytes............................ 5
Unused Blocks........................... 0
Unused Bytes............................ 0
Last Used Ext FileId.................... 8
Last Used Ext BlockId................... 22,656
Last Used Block......................... 128
PL/SQL procedure successfully completed.
demo@ORA12C> exec show_space('T1',user,'TABLE');
Unformatted Blocks ..................... 0
FS1 Blocks (0-25) ..................... 0
FS2 Blocks (25-50) ..................... 0
FS3 Blocks (50-75) ..................... 0
FS4 Blocks (75-100)..................... 39
Full Blocks ..................... 331
Total Blocks............................ 384
Total Bytes............................. 3,145,728
Total MBytes............................ 3
Unused Blocks........................... 0
Unused Bytes............................ 0
Last Used Ext FileId.................... 8
Last Used Ext BlockId................... 21,504
Last Used Block......................... 128
PL/SQL procedure successfully completed.
demo@ORA12C> select x from t2;
100000 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 871972594
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 100K| 488K| 41 (5)| 00:00:01 |
| 1 | INDEX FAST FULL SCAN| T2_PK | 100K| 488K| 41 (5)| 00:00:01 |
------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
892 consistent gets
0 physical reads
0 redo size
729371 bytes sent via SQL*Net to client
7878 bytes received via SQL*Net from client
668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100000 rows processed
demo@ORA12C> set autotrace off
By all means, sequence based primary keys looks very appealing (less cpu, less IO, less diskspace, 100% PCT on leaf blocks, good clustering factor) and highly scalable than GUID based primary keys.
Very informative, thank you for sharing Rajesh!
ReplyDelete