Saturday, January 7, 2023

JSON Datatype - Part I

One of many nice things about Oracle database 21c is “JSON” datatype, which is an optimized native binary storage format using the OSON format. All json documents in autonomous database are automatically stored in OSON format. This delivers big performance benefits to json applications, including faster query performance, efficient updates, multi-value function index and efficient storage sizes. This native json format is completely transparent to applications and applications always use standard json documents, but all the database operations on json documents are optimized under the covers via binary format.
 
OSON is based on tree encoding and a simple json documents are serialized into OSON byte array having tree pointers for easy navigation offsets.
 
Let’s compare the text storage of json documents with json datatypes
 
demo@ATP21C> create table test_json(x number,y json);
 
Table created.
 
demo@ATP21C> create table test_text(x number,y clob check(y is json format json) );
 
Table created.
 
Now the table “big_table” got loaded with some 100K rows(copy of all_objects) and the average size of json document sized 620 bytes in size.
 
demo@ATP21C> select round(avg(length(json_object(*))),2) size_bytes  from big_table;
 
SIZE_BYTES
----------
    622.89
 
To make it big, let’s aggregate that into a bigger json document using SQL/JSON function’s and now it has grown into a bigger size of 59.5 MB
 
demo@ATP21C> select round(length(json_arrayagg(json_object(*) returning blob)),2) size_bytes from big_table;
 
SIZE_BYTES
----------
  62388939
 
When we inserted that bigger json document into our test table
 
demo@ATP21C> insert into test_json(x,y)
  2  select 1,json_arrayagg(json_object(*) returning blob)
  3  from big_table;
 
1 row created.
 
demo@ATP21C> insert into test_text(x,y)
  2  select x, json_serialize(y returning clob) from test_json;
 
1 row created.
 
Now the size of large json document from both tables are
 
demo@ATP21C> select round( length(y)/1024/1024,2) size_mb from test_text;
 
   SIZE_MB
----------
      59.5
 
demo@ATP21C> select round( length(y)/1024/1024,2) size_mb from test_json;
 
   SIZE_MB
----------
     27.63
 
Pretty much the json datatype size half the storage of text-based storage for json documents. The reason for this half the storage for new json datatype is that the json document it got many repeated json objects as array, so each “key” got repeated within the array and these repeated “keys” with the documents are replaced with ID’s and result in efficient reduction in space.
 
Now let’s see the effect of JSON datatype on many small documents instead of one large document.
Let’s reload the above two tables as every row as json document into each table.
 
demo@ATP21C> truncate table test_json;
 
Table truncated.
 
demo@ATP21C> truncate table test_text;
 
Table truncated.
 
demo@ATP21C> insert all
  2     into test_json(x,y) values(r,j)
  3     into test_text(x,y) values(r,j)
  4  select rownum r, json_object(*) j
  5  from big_table;
 
200000 rows created.
 
demo@ATP21C> select round( sum(length(y))/1024/1024,2) size_mb from test_text;
 
   SIZE_MB
----------
      59.4
 
demo@ATP21C> select round( sum(length(y))/1024/1024,2) size_mb from test_json;
 
   SIZE_MB
----------
     64.63
 
demo@ATP21C>
 
now the json datatype got little bit bigger, because each json document persisted in that datatype is self-contained and we got no repeated “keys” there to benefit the compression. 

No comments:

Post a Comment