Tuesday, December 27, 2016

Storing JSON in Oracle database

Starting with Oracle database 12c (12.1.0.2) Oracle not only introduced Oracle in-memory option but also native JSON support within the Oracle database. Just as with XML in the past, Oracle is now aware of JSON data structures.  This means that you can now persists JSON data in its native structure within the database.  But on top of that you can store and retrieve it in a simple JSON friendly way or access it fully transparently via SQL.

The JSON document store got introduced with XDB component, XDB component was originally introduced in Oracle 9i for handling XML data structures; in 12c XDB component got enhanced to support JSON along with XML data structures, and available within all database editions ( both enterprise edition (EE) and standard edition (SE2) ), this means it is completely free of charge.

Oracle has chosen to store JSON data in existing data types instead of adding new data types.  JSON can be stored in Varchar2, CLOB or BLOB data type, these data types have different properties, Varchar2 can have a maximum length of 32K (when MAX_STRING_SIZE parameter is set to EXTENDED in 12c) or 4000 bytes by default. CLOB encodes characters as UCS2 which means every character takes up two bytes; BLOB doesn’t perform such encoding but instead stores Unicode bytes un-modified.

Let’s start with few rows,

demo@ORA12C> create table t(id int,doc varchar2(4000) );

Table created.

demo@ORA12C> insert into t values (1, '{
  2        "firstName": "Tom",
  3        "lastName": "Kyte",
  4        "location": {
  5          "country": "US",
  6          "city": "Virginia",
  7          "postCode": "VA2M"
  8        }
  9      }' );

1 row created.

demo@ORA12C> insert into t values (2, '{
  2        "firstName": "Dr",
  3        "lastName": "DBA",
  4        "location": {
  5          "country": "US",
  6          "city": "Chicago",
  7          "postCode": "CA"
  8        }
  9      }' );

1 row created.

demo@ORA12C> insert into t values (3, '{
  2        firstName: "Steven",
  3        "lastName": "Flemming",
  4        "location": {
  5          "country": "NZ",
  6          "city": "Nigeria",
  7          "postCode": "NI"
  8        }
  9      }' );

1 row created.

demo@ORA12C> insert into t values (4, 'I am not a valid JSON' );

1 row created.

demo@ORA12C> commit;

Commit complete.

demo@ORA12C>


As one can see the last insert statement inserts data that is not a valid JSON document. So how to identify such a rows? And how to prevent such invalid JSON from being inserted in the first place?

This brings us the new JSON operator IS JSON to find all the valid JSON documents (and invalid JSON documents can be identified using IS NOT JSON


demo@ORA12C> column doc format a20 trunc
demo@ORA12C> select * from t where doc IS JSON;

        ID DOC
---------- --------------------
         1 {
         2 {
         3 {

3 rows selected.

demo@ORA12C> select * from t where doc IS NOT JSON;

        ID DOC
---------- --------------------
         4 I am not a valid JSO

1 row selected.

demo@ORA12C>



If you look carefully at the third row, you’ll see that field name “firstName” is not inside the quotes. Technically this makes the data invalid JSON because the key names need to be enquoted but not everyone follow this rule. The IS JSON operator still accepts this row as JSON, because JSON parser by default runs in tolerant mode.  But not every JSON tool accepts unquoted names (try out http://jsonlint.com/ )


So for JSON sending to others, you may want to validate this and make sure that it compiles to all rules, you can do so by running IS JSON in strict mode.


demo@ORA12C> select * from t
  2  where doc IS JSON STRICT ;

        ID DOC
---------- --------------------
         1 {
         2 {

2 rows selected.

demo@ORA12C>



The JSON specification says nothing about duplicate key names within the same object, for instance this is technically valid JSON although it has two keys with the same FIRSTNAME.

{
      "firstName": "Steven",
      "lastName": "Flemming",
      "location": {
        "country": "NZ",
        "city": "Nigeria",
        "postCode": "NI"
      } ,
      "firstName": "Tom"
}


How to avoid those duplicate key names? The clause WITH UNIQUE KEYS checks that keys are not repeated within the same object level (keys with the same name can occur at multiple places in the data but not in the same object).


demo@ORA12C>
demo@ORA12C> insert into t values( 5,
  2  ' {
  3        "firstName": "Steven",
  4        "lastName": "Flemming",
  5        "location": {
  6          "country": "NZ",
  7          "city": "Nigeria",
  8          "postCode": "NI"
  9        } ,
 10       "firstName": "Tom"
 11  } ') ;

1 row created.

demo@ORA12C>
demo@ORA12C> commit;

Commit complete.

demo@ORA12C> select * from t
  2  where doc IS JSON STRICT ;

        ID DOC
---------- --------------------
         5 {
         1 {
         2 {

3 rows selected.

demo@ORA12C> select * from t
  2  where doc IS JSON STRICT WITH UNIQUE KEYS ;

        ID DOC
---------- --------------------
         1 {
         2 {

2 rows selected.

demo@ORA12C>


So now we understand the IS JSON, how can we use that to prevent non-JSON data to be inserted into our table in the first place?   The answer would be to use a check constraint. 

demo@ORA12C> truncate table t;

Table truncated.

demo@ORA12C>
demo@ORA12C> alter table t
  2  add constraint check_json_valid
  3  check( doc IS JSON STRICT );

Table altered.

demo@ORA12C> column column_name format a10
demo@ORA12C> select column_name,data_type
  2  from user_json_columns
  3  where table_name ='T';

COLUMN_NAM DATA_TYPE
---------- -------------
DOC        VARCHAR2

1 row selected.

demo@ORA12C> insert into t values(1, '{
  2        "firstName": "Tom",
  3        "lastName": "Kyte",
  4        "location": {
  5          "country": "US",
  6          "city": "Virginia",
  7          "postCode": "VA2M"
  8        }
  9      }' );

1 row created.

demo@ORA12C> insert into t values(2, '{
  2        "firstName": "Dr",
  3        "lastName": "DBA",
  4        "location": {
  5          "country": "US",
  6          "city": "Chicago",
  7          "postCode": "CA"
  8        }
  9      }' );

1 row created.

demo@ORA12C> insert into t values(3, '{
  2        firstName: "Steven",
  3        "lastName": "Flemming",
  4        "location": {
  5          "country": "NZ",
  6          "city": "Nigeria",
  7          "postCode": "NI"
  8        }
  9      }' );
insert into t values(3, '{
*
ERROR at line 1:
ORA-02290: check constraint (DEMO.CHECK_JSON_VALID) violated


Check constraint violated due to the missing quotes around the key “first_name”. (Since the check constraint is IS JSON STRICT)


demo@ORA12C> insert into t values(4, 'I am not a valid JSON' );
insert into t values(4, 'I am not a valid JSON' )
*
ERROR at line 1:
ORA-02290: check constraint (DEMO.CHECK_JSON_VALID) violated


Check constraint violated due invalid JSON format.


demo@ORA12C> insert into t values( 5,
  2  ' {
  3        "firstName": "Steven",
  4        "lastName": "Flemming",
  5        "location": {
  6          "country": "NZ",
  7          "city": "Nigeria",
  8          "postCode": "NI"
  9        } ,
 10       "firstName": "Tom"
 11  } ') ;

1 row created.

demo@ORA12C>

No, Check constraint violation, because check constraint is defined with IS JSON STRICT option – so no options to check for duplicate keys with in the object. Hence it got bypassed and loaded into the table.

No comments:

Post a Comment