Wednesday, October 18, 2023

Load data from files in cloud storage

The DBMS_CLOUD package was introduced in Oracle Autonomous Database to provide a simple way to interact with an object store from the database. It supports data loading from various cloud service providers, including OCI Object Storage, Azure Blob Storage, Amazon S3, GitHub repositories and Google Cloud Storage.
 
In the following example I will use my Oracle Free Tier account to demonstrate, step by step, how to copy data from a delimited file located in object storage into an existing table in the database.
 
Log in to the Oracle Cloud console and, from the top-left hamburger menu, select the “Storage > Object Storage & Archive Storage > Buckets” option. Click the “Create Bucket” button on the “Object Storage” dashboard, enter the required details (name of the bucket, storage tier and key management), then click the “Create” button.

 


 
Once the bucket has been provisioned, drill into it by clicking the bucket name.
 


 
Click the “Upload” button to upload the required file into the object storage bucket. Once the file has been uploaded, click the “kebab” menu on the right-hand side of the uploaded item and select the “View Object Details” option. The resulting screen displays details about the uploaded object, including the URI path, which will be needed to access the object.
 


 
To grant access to the bucket and its contents, we must create an auth token. Using the top-left hamburger menu, select the “Identity & Security > Identity > Users” option and click the user that needs access to the bucket. Then click the “Auth Tokens” link in the “Resources” menu at the bottom left of the screen and click the “Generate Token” button.
 


 
Enter a description for the token and click the “Generate Token” button. Click the “Copy” link to copy the token to the clipboard; it will be used shortly to create the credential in the database.
 


 
Now we will see how to use this generated token to access the contents of object storage from SQL statements.
 
First, we create a schema user account and grant it the required set of privileges.

 
admin@ATP19C> create user demo_user identified by "********";
 
User created.
 
admin@ATP19C> grant connect,resource,unlimited tablespace to demo_user;
 
Grant succeeded.
 
admin@ATP19C> grant execute on dbms_cloud to demo_user;
 
Grant succeeded.
 
Then connect as the demo_user account and create a credential for object storage using the create_credential procedure.
 
admin@ATP19C> conn demo_user/"********"@atp19c
Connected.
demo_user@ATP19C> begin
  2     dbms_cloud.create_credential(
  3             credential_name =>'my_demo_cred',
  4             username => 'your_oci_login_email_account',
  5             password => 'the_above_copied_userAuth_Token_Here');
  6  end;
  7  /
 
PL/SQL procedure successfully completed.
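As a quick sanity check, the newly created credential can be confirmed by querying the USER_CREDENTIALS dictionary view (the exact set of columns available can vary by database version):

demo_user@ATP19C> select credential_name, username, enabled
  2  from user_credentials;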
 
To access the objects in object storage, all we need is their URL location.
 
demo_user@ATP19C> variable filename varchar2(800)
demo_user@ATP19C> variable namespace varchar2(80)
demo_user@ATP19C> variable uri varchar2(80)
demo_user@ATP19C>
demo_user@ATP19C> exec :filename := 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/*******/b/MY_DEMO_BUCKET/o/mycsvdemo.txt';
 
PL/SQL procedure successfully completed.
 
demo_user@ATP19C> exec :uri := 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/';
 
PL/SQL procedure successfully completed.
 
demo_user@ATP19C> exec :namespace := '********';
 
PL/SQL procedure successfully completed.
 
demo_user@ATP19C> select object_name, bytes
  2  from dbms_cloud.list_objects(
  3     'my_demo_cred',
  4     :uri||:namespace||'/b/MY_DEMO_BUCKET/o/');
 
OBJECT_NAME                         BYTES
------------------------------ ----------
File1.json                          55878
File2.json                          45992
File3.json                         484455
File4.json                         113873
File5.json                           2757
File6.json                          10612
mycsvdemo.txt                         933
 
7 rows selected.
 
The DBMS_CLOUD package provides several methods for file handling in object storage. The list_objects method lists all the files in a specific location. Likewise, to read the contents of a specific file we can use the get_object method, which returns the contents of the data in binary form.
 
demo_user@ATP19C> select dbms_cloud.get_object('my_demo_cred',:filename) from dual;
 
DBMS_CLOUD.GET_OBJECT('MY_DEMO_CRED',:FILENAME)
----------------------------------------------------------------------------------------------------------------------------------------------------------------
22454D504E4F222C22454E414D45222C224A4F42222C224D4752222C224849524544415445222C2253414C222C22434F4D4D222C22444550544E4F220D0A373336392C22534D495448222C22434C4552
4B222C373930322C31372D4445432D313938302031323A30303A303020414D2C3830302C2C32300D0A373439392C22414C4C454E222C2253414C45534D414E222C373639382C32302D4645422D313938
312031323A30303A303020414D2C313630302C3330302C33300D0A373532312C2257415244222C2253414C45534D414E222C373639382C32322D4645422D313938312031323A30303A303020414D2C31
3235302C3530302C33300D0A373536362C224A4F4E4553222C224D414E41474552222C373833392C30322D4150522D313938312031323A30303A303020414D2C323937352C2C32300D0A373635342C22
4D415254494E222C2253414C45534D414E222C373639382C32382D5345502D313938312031323A30303A303020414D2C313235302C313430302C33300D0A373639382C22424C414B45222C224D414E41
474552222C373833392C30312D4D41592D313938312031323A30303A303020414D2C323835302C2C33300D0A373738322C22434C41524B222C224D414E41474552222C373833392C30392D4A554E2D31
3938312031323A30303A303020414D2C323435302C2C31300D0A373738382C2253434F5454222C22414E414C595354222C373536362C31392D4150522D303038372031323A30303A303020414D2C3330
30302C2C32300D0A373833392C224B494E47222C22505245534944454E54222C2C31372D4E4F562D313938312031323A30303A303020414D2C353030302C2C31300D0A373834342C225455524E455222
2C2253414C45534D414E222C373639382C30382D5345502D313938312031323A30303A303020414D2C313530302C302C33300D0A373837362C224144414D53222C22434C45524B222C373738382C3233
2D4D41592D303038372031323A30303A303020414D2C313130302C2C32300D0A373930302C224A414D4553222C22434C45524B222C373639382C30332D4445432D313938312031323A30303A30302041
4D2C3935302C2C33300D0A373930322C22464F5244222C22414E414C595354222C373536362C30332D4445432D313938312031323A30303A303020414D2C333030302C2C32300D0A373933342C224D49
4C4C4552222C22434C45524B222C373738322C32332D4A414E2D313938322031323A30303A303020414D2C313330302C2C31300D0A
 
To return the data in text format, we need to cast this raw data to text using the UTL_RAW API.
 
demo_user@ATP19C> select utl_raw.cast_to_varchar2(dbms_cloud.get_object('my_demo_cred',:filename)) from dual;
 
UTL_RAW.CAST_TO_VARCHAR2(DBMS_CLOUD.GET_OBJECT('MY_DEMO_CRED',:FILENAME))
-------------------------------------------------------------------------
"EMPNO","ENAME","JOB","MGR","HIREDATE","SAL","COMM","DEPTNO"
7369,"SMITH","CLERK",7902,17-DEC-1980 12:00:00 AM,800,,20
7499,"ALLEN","SALESMAN",7698,20-FEB-1981 12:00:00 AM,1600,300,30
7521,"WARD","SALESMAN",7698,22-FEB-1981 12:00:00 AM,1250,500,30
7566,"JONES","MANAGER",7839,02-APR-1981 12:00:00 AM,2975,,20
7654,"MARTIN","SALESMAN",7698,28-SEP-1981 12:00:00 AM,1250,1400,30
7698,"BLAKE","MANAGER",7839,01-MAY-1981 12:00:00 AM,2850,,30
7782,"CLARK","MANAGER",7839,09-JUN-1981 12:00:00 AM,2450,,10
7788,"SCOTT","ANALYST",7566,19-APR-0087 12:00:00 AM,3000,,20
7839,"KING","PRESIDENT",,17-NOV-1981 12:00:00 AM,5000,,10
7844,"TURNER","SALESMAN",7698,08-SEP-1981 12:00:00 AM,1500,0,30
7876,"ADAMS","CLERK",7788,23-MAY-0087 12:00:00 AM,1100,,20
7900,"JAMES","CLERK",7698,03-DEC-1981 12:00:00 AM,950,,30
7902,"FORD","ANALYST",7566,03-DEC-1981 12:00:00 AM,3000,,20
7934,"MILLER","CLERK",7782,23-JAN-1982 12:00:00 AM,1300,,10
 
Likewise, to load the file contents into a database table, there is a method called copy_data.
 
demo_user@ATP19C> create table emp_demo as
  2  select *
  3  from demo.emp
  4  where 1 = 0 ;
 
Table created.
 
demo_user@ATP19C> begin
  2     dbms_cloud.copy_data(
  3             table_name =>'emp_demo',
  4             credential_name =>'my_demo_cred',
  5             file_uri_list => :filename,
  6             format => json_object('delimiter' value ',',
  7                             'dateformat' value 'dd-mon-yyyy hh12:mi:ss am',
  8                             'quote' value '"','enablelogs' value false,
  9                             'skipheaders' value '1') );
 10  end;
 11  /
begin
*
ERROR at line 1:
ORA-20000: ORA-06564: object DATA_PUMP_DIR does not exist
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 1693
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 8286
ORA-06512: at "C##CLOUD$SERVICE.DBMS_CLOUD", line 8309
ORA-06512: at line 2
 
The above error occurs because the load writes its logs to the default directory object DATA_PUMP_DIR, and since the schema owner has no privileges on that directory, the call fails.
 
Once we grant the appropriate privileges

 
admin@ATP19C> grant write on directory data_pump_dir to demo_user;
 
Grant succeeded.
 
admin@ATP19C> grant read on directory data_pump_dir to demo_user ;
 
Grant succeeded.
 
the load succeeds.
 
demo_user@ATP19C> begin
  2     dbms_cloud.copy_data(
  3             table_name =>'emp_demo',
  4             credential_name =>'my_demo_cred',
  5             file_uri_list => :filename,
  6             format => json_object('delimiter' value ',',
  7                             'dateformat' value 'dd-mon-yyyy hh12:mi:ss am',
  8                             'quote' value '"','enablelogs' value false,
  9                             'skipheaders' value '1') );
 10  end;
 11  /
 
PL/SQL procedure successfully completed.
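Each copy_data run is also recorded in the USER_LOAD_OPERATIONS view, which is handy for checking the status and row counts of past loads without relying on log files (the columns selected here are a subset; adjust to taste):

demo_user@ATP19C> select type, status, rows_loaded
  2  from user_load_operations
  3  where table_name = 'EMP_DEMO';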
 
demo_user@ATP19C> select * from emp_demo;
 
     EMPNO ENAME      JOB              MGR HIREDATE           SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ----------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-1980        800                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-1981       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-1981       1250        500         30
      7566 JONES      MANAGER         7839 02-APR-1981       2975                    20
      7654 MARTIN     SALESMAN        7698 28-SEP-1981       1250       1400         30
      7698 BLAKE      MANAGER         7839 01-MAY-1981       2850                    30
      7782 CLARK      MANAGER         7839 09-JUN-1981       2450                    10
      7788 SCOTT      ANALYST         7566 19-APR-0087       3000                    20
      7839 KING       PRESIDENT            17-NOV-1981       5000                    10
      7844 TURNER     SALESMAN        7698 08-SEP-1981       1500          0         30
      7876 ADAMS      CLERK           7788 23-MAY-0087       1100                    20
      7900 JAMES      CLERK           7698 03-DEC-1981        950                    30
      7902 FORD       ANALYST         7566 03-DEC-1981       3000                    20
      7934 MILLER     CLERK           7782 23-JAN-1982       1300                    10
 
14 rows selected.
 
demo_user@ATP19C>
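Finally, once the demonstration is complete, the uploaded file and the credential can be cleaned up from SQL as well, using the delete_object and drop_credential procedures (a sketch; run these only if you no longer need the object or the credential):

demo_user@ATP19C> exec dbms_cloud.delete_object('my_demo_cred', :filename);
demo_user@ATP19C> exec dbms_cloud.drop_credential('my_demo_cred');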
 
