Thursday, October 31, 2024

Multi-file loading in an ORDS POST request

One of the many new features introduced in ORDS 24.3 is the ability to upload multiple files as part of a POST request (multipart/form-data) to an ORDS endpoint.
 
Before getting into an example, we first need to familiarize ourselves with the procedures and functions recently added to the ORDS-supplied packages.
 
A procedure to save the file being uploaded into the database session for later use:
 
PROCEDURE add_body_file (
  p_parameter_name    IN varchar2,
  p_file_name         IN varchar2,
  p_content_type      IN varchar2,
  p_file_blob         IN blob);
 
A function to return the number of files saved in the session:
 
FUNCTION body_file_count RETURN NUMBER;
 
A procedure to return the file at the specified index that was saved into the database session:
 
PROCEDURE get_body_file (
  p_file_index        IN NUMBER,
  p_parameter_name    OUT varchar2,
  p_file_name         OUT varchar2,
  p_content_type      OUT varchar2,
  p_file_blob         OUT blob);
 
A procedure to clear the files stored in the session. Files are also cleared automatically when the connection is returned to the pool:
 
PROCEDURE clear_body_files;
 
 
Here is the implementation of the REST module using the above procedures and functions.
 
demo@FREEPDB1> create table t( file_id number generated always as identity
  2     , parameter_name varchar2(80)
  3     , file_name varchar2(80)
  4     , content_type varchar2(80)
  5*    , file_blob blob );
 
Table T created.
 
demo@FREEPDB1> -- Generated by Oracle SQL Developer REST Data Services 24.3.0.284.2209
demo@FREEPDB1> -- Exported REST Definitions from ORDS Schema Version 24.3.0.r2620924
demo@FREEPDB1> -- Schema: DEMO   Date: Fri Oct 25 17:44:53 IST 2024
demo@FREEPDB1> --
  2  BEGIN
  3    ORDS.ENABLE_SCHEMA(
  4        p_enabled             => TRUE,
  5        p_schema              => 'DEMO',
  6        p_url_mapping_type    => 'BASE_PATH',
  7        p_url_mapping_pattern => 'demo',
  8        p_auto_rest_auth      => FALSE);
  9
 10    ORDS.DEFINE_MODULE(
 11        p_module_name    => 'body.files.demo',
 12        p_base_path      => '/V1/',
 13        p_items_per_page =>  25,
 14        p_status         => 'PUBLISHED',
 15        p_comments       => NULL);
 16    ORDS.DEFINE_TEMPLATE(
 17        p_module_name    => 'body.files.demo',
 18        p_pattern        => 'example',
 19        p_priority       => 0,
 20        p_etag_type      => 'HASH',
 21        p_etag_query     => NULL,
 22        p_comments       => NULL);
 23    ORDS.DEFINE_HANDLER(
 24        p_module_name    => 'body.files.demo',
 25        p_pattern        => 'example',
 26        p_method         => 'POST',
 27        p_source_type    => 'plsql/block',
 28        p_items_per_page =>  0,
 29        p_mimes_allowed  => '',
 30        p_comments       => NULL,
 31        p_source         =>
 32  'declare
 33      l_json clob := :body_json;
 34      l_parameter_name varchar2(4000) ;
 35      l_file_name varchar2(4000) ;
 36      l_content_type varchar2(4000) ;
 37      l_file_blob blob;
 38  begin
 39      htp.p(''Number of files received = ''|| ords.body_file_count);
 40
 41      for k in 1..ords.body_file_count
 42      loop
 43          ords.get_body_file
 44              ( p_file_index => k
 45              , p_parameter_name => l_parameter_name
 46              , p_file_name => l_file_name
 47              , p_content_type => l_content_type
 48              , p_file_blob => l_file_blob );
 49
 50         insert into t(
 51              parameter_name
 52             , file_name
 53             , content_type
 54             , file_blob )
 55          values( l_parameter_name
 56              , l_file_name
 57              , l_content_type
 58              , l_file_blob );
 59         htp.p('' Processed file #''||k) ;
 60      end loop;
 61  end;'
 62        );
 63
 64
 65    COMMIT;
 66  END;
 67* /
 
PL/SQL procedure successfully completed.
 
demo@FREEPDB1>
 
In practice, here is how things work. I’m using Postman as a proxy for my application (i.e., the client).
 
 


 
The header part of the request looks something like this

 
 
The output from the database after the above POST call looks like this
 
demo@FREEPDB1> select parameter_name, dbms_lob.getlength(file_blob) from t;
 
PARAMETER_NAME       DBMS_LOB.GETLENGTH(FILE_BLOB)
_________________ ________________________________
file_one                                        70
file_two                                      1604
 
 
When posting a multipart/form-data request, we must bind the :body_json implicit parameter to something, even if that something is an empty property.
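Since the Postman screenshots are not reproduced here, below is a rough sketch of what an equivalent client request could look like when issued from PL/SQL with UTL_HTTP. Treat it as an illustration only: the host and port, the inline file contents, and the way the JSON part is named are my own assumptions (only the endpoint path /ords/demo/V1/example and the parameter names file_one and file_two come from the demo above), and running it from another database session requires a network ACL that allows reaching the ORDS listener.
 
declare
   -- arbitrary multipart boundary; any unique token will do
   l_boundary constant varchar2(30) := '----demo-boundary-42';
   l_crlf     constant varchar2(2)  := chr(13) || chr(10);
   l_body     varchar2(32767);
   l_req      utl_http.req;
   l_resp     utl_http.resp;
   l_line     varchar2(32767);
 
   -- helper: append one "file" part to the multipart body
   procedure add_file_part( p_name         varchar2
                          , p_filename     varchar2
                          , p_content_type varchar2
                          , p_content      varchar2 ) is
   begin
      l_body := l_body
             || '--' || l_boundary || l_crlf
             || 'Content-Disposition: form-data; name="' || p_name
             || '"; filename="' || p_filename || '"' || l_crlf
             || 'Content-Type: ' || p_content_type || l_crlf || l_crlf
             || p_content || l_crlf;
   end add_file_part;
begin
   -- two tiny inline "files", reusing the parameter names seen in table T
   add_file_part('file_one', 'one.txt', 'text/plain', 'hello from file one');
   add_file_part('file_two', 'two.csv', 'text/csv',
                 'empno,ename' || l_crlf || '7839,KING');
 
   -- one plain JSON part so the :body_json bind has something to resolve to
   -- (how this part should be named is an assumption on my part; in Postman
   --  it was simply an additional form-data field)
   l_body := l_body
          || '--' || l_boundary || l_crlf
          || 'Content-Disposition: form-data; name="body_json"' || l_crlf
          || 'Content-Type: application/json' || l_crlf || l_crlf
          || '{}' || l_crlf
          || '--' || l_boundary || '--' || l_crlf;
 
   -- placeholder host/port; adjust to your ORDS listener
   l_req := utl_http.begin_request(
               url    => 'http://localhost:8080/ords/demo/V1/example',
               method => 'POST');
   utl_http.set_header(l_req, 'Content-Type',
                       'multipart/form-data; boundary=' || l_boundary);
   utl_http.set_header(l_req, 'Content-Length', to_char(lengthb(l_body)));
   utl_http.write_text(l_req, l_body);
 
   -- print whatever the handler wrote via htp.p
   l_resp := utl_http.get_response(l_req);
   begin
      loop
         utl_http.read_line(l_resp, l_line, true);
         dbms_output.put_line(l_line);
      end loop;
   exception
      when utl_http.end_of_body then
         utl_http.end_response(l_resp);
   end;
end;
/
 
Whichever client you use – Postman, curl, or code like the above – the shape of the request is the same: each file travels as its own form-data part, plus one extra part so that, as noted above, :body_json has something to bind to.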
 


Sunday, October 27, 2024

External Tables Partitioning in Autonomous database - Part II

This post is a continuation of the previous blog post.
 
To efficiently scan data in the object store, Autonomous database (ADB) provides various optimizations for all common data types, table formats, and directory structures in object storage. The recently introduced implicit partitioning feature for external tables improves query performance by enabling pruning of data based on logical conditions. It also makes these partitioned tables much easier to create and manage: you simply point the table at the root directory of its data files. That is, we can create external tables across thousands of partitions and millions of files with a single command.
 
External tables with implicit partitioning automatically discover the underlying partition structure from the hierarchical layout of the files and add these partition keys as additional columns of the external table.
 
Here is the structure of our object storage.
 
 
demo@ATP19C> select object_name, bytes
  2      from dbms_cloud.list_objects(
  3         credential_name => 'API_KEY_CRED',
  4        location_uri => :uri);
 
OBJECT_NAME                                              BYTES
--------------------------------------------------- ----------
Sales/                                                       0
Sales/Country=UK/                                            0
Sales/Country=UK/year=2024/                                  0
Sales/Country=UK/year=2024/data_deptno_10.parquet         2163
Sales/Country=UK/year=2024/data_deptno_20.parquet         2218
Sales/Country=US/                                            0
Sales/Country=US/year=2023/                                  0
Sales/Country=US/year=2023/data_deptno_30.parquet         2257
 
8 rows selected.
 
To enable implicit partitioning for this external table, we create it using the DBMS_CLOUD API.
 
demo@ATP19C> begin
  2     dbms_cloud.create_external_table(
  3             table_name => 'xt_implicit_hive_demo'
  4             , credential_name => 'API_KEY_CRED'
  5             , file_uri_list => :uri
  6             , format => '{"schema":"first","type":"parquet","implicit_partition_type":"hive"}');
  7  end;
  8  /
 
PL/SQL procedure successfully completed.
 
 
Partition columns are automatically detected, and implicit partitioning is enabled by setting the format option IMPLICIT_PARTITION_TYPE to HIVE.
 
To detect the partition columns, ADB searches the URL path specified by FILE_URI_LIST from the beginning for an "=" sign. When one is found, the text to the left of the "=" is treated as the column name and the text to the right of it as the value; the search then continues beyond the first partition key column for the next "=" sign, and so on. For example, the path Sales/Country=UK/year=2024/ yields the partition columns COUNTRY (value 'UK') and YEAR (value '2024').
 
Query using implicit partitioning.
 
demo@ATP19C> select * from xt_implicit_hive_demo;
 
EMPNO ENAME  JOB            MGR HIREDATE           SAL    COMM  DEPTNO COUNTRY YEAR
_____ ______ ____________ _____ ______________ _______ _______ _______ _______ _____
 7839 KING   PRESIDENT          17-NOV-1981       5000              10 UK      2024
 7782 CLARK  MANAGER       7839 09-JUN-1981       2450              10 UK      2024
 7934 MILLER CLERK         7782 23-JAN-1982       1300              10 UK      2024
 7566 JONES  MANAGER       7839 02-APR-1981       2975              20 UK      2024
 7788 SCOTT  ANALYST       7566 19-APR-1987       3000              20 UK      2024
 7902 FORD   ANALYST       7566 03-DEC-1981       3000              20 UK      2024
 7369 SMITH  CLERK         7902 17-DEC-1980        800              20 UK      2024
 7876 ADAMS  CLERK         7788 23-MAY-1987       1100              20 UK      2024
 7698 BLAKE  MANAGER       7839 01-MAY-1981       2850              30 US      2023
 7499 ALLEN  SALESMAN      7698 20-FEB-1981       1600     300      30 US      2023
 7521 WARD   SALESMAN      7698 22-FEB-1981       1250     500      30 US      2023
 7654 MARTIN SALESMAN      7698 28-SEP-1981       1250    1400      30 US      2023
 7844 TURNER SALESMAN      7698 08-SEP-1981       1500       0      30 US      2023
 7900 JAMES  CLERK         7698 03-DEC-1981        950              30 US      2023
 
demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2023';
 
  COUNT(*)
----------
         6
 
Specifying the predicates country = 'US' and year = '2023' forces the database to read only the underlying directory that corresponds to this filter.
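The original post does not show an execution plan for this query, but if you want to see for yourself how the filter is handled, one option (keeping in mind that plan output varies by database version and statistics) is the usual explain plan approach:
 
explain plan for
   select count(*)
     from xt_implicit_hive_demo
    where country = 'US' and year = '2023';
 
select * from dbms_xplan.display();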
 
Moreover, we can add new files and directories in object storage, and this new data is automatically recognized by ADB, since all partition pruning is done at run time.
 
demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2025';
 
  COUNT(*)
----------
         0
 
Once we upload sample data for the year 2025, ADB picks it up automatically, with partition pruning still happening at run time.
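The post does not show how the new file was uploaded. One possible way, assuming the parquet file has already been generated and placed in a directory object such as DATA_PUMP_DIR (the directory name and the exact object prefix below are assumptions, since :uri is not spelled out in this part), is DBMS_CLOUD.PUT_OBJECT:
 
begin
   dbms_cloud.put_object(
        credential_name => 'API_KEY_CRED'
        -- illustrative prefix: it must resolve to the
        -- Sales/Country=US/year=2025/ folder under the table's root URI
      , object_uri      => :uri || 'Sales/Country=US/year=2025/data_deptno_40.parquet'
      , directory_name  => 'DATA_PUMP_DIR'
      , file_name       => 'data_deptno_40.parquet');
end;
/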
 
demo@ATP19C> select object_name, bytes
  2      from dbms_cloud.list_objects(
  3         credential_name => 'API_KEY_CRED',
  4        location_uri => :uri);
 
OBJECT_NAME                                              BYTES
--------------------------------------------------- ----------
Sales/                                                       0
Sales/Country=UK/                                            0
Sales/Country=UK/year=2024/                                  0
Sales/Country=UK/year=2024/data_deptno_10.parquet         2163
Sales/Country=UK/year=2024/data_deptno_20.parquet         2218
Sales/Country=US/                                            0
Sales/Country=US/year=2023/                                  0
Sales/Country=US/year=2023/data_deptno_30.parquet         2257
Sales/Country=US/year=2025/                                  0
Sales/Country=US/year=2025/data_deptno_40.parquet         2414
 
10 rows selected.
 
 
demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2025';
 
  COUNT(*)
----------
         8
 
This shows how ADB automatically manages partitions, enhancing query performance by skipping unnecessary data segments based on predicates. If the underlying data structure in the object store changes, there is no need to modify or sync anything on ADB.
 
 
 
 
 


Friday, October 18, 2024

External Tables Partitioning in Autonomous database - Part I

Partitioning is a well-established technique for improving the performance and manageability of a database system by dividing larger objects into smaller ones; virtually every large data warehouse takes advantage of it. This is equally true for large objects both inside and outside the database, such as data lakes in object stores. In this blog post we will see how partitioning of external tables has evolved in Autonomous database.
 
Autonomous database has allowed the creation of partitioned external tables through the DBMS_CLOUD API for quite some time. This API simplifies the definition of the external table structure, but it relies on you to provide a lot of table metadata; for example, we have to:
 
  • Specify the list of all columns with data types.
  • Maintain the table definition. If new data – equivalent to new partitions – is added to or removed from object storage, then the table must be manually adjusted using ADD / DROP partitions.
  • Accept that the value of the partition column is not included in the query results.
 
Let's have a look at what all of this means using a relatively small partitioned table as an example.
 
I have a sample of employee data in my object store, organized by department number.
 
demo-user@ATP19C> exec :uri := 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/';
 
PL/SQL procedure successfully completed.
 
demo-user@ATP19C> exec :namespace := 'idcglquusbz6';
 
PL/SQL procedure successfully completed.
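If you are following along, the exec calls above assume the two bind variables were declared earlier in the session, for example (the sizes here are arbitrary):
 
variable uri varchar2(200)
variable namespace varchar2(100)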
 
demo-user@ATP19C> select object_name, bytes
  2  from dbms_cloud.list_objects(
  3     'my_demo_cred',
  4     :uri||:namespace||'/b/MY_DEMO_BUCKET/o/DEMO03/');
 
OBJECT_NAME                              BYTES
----------------------------------- ----------
DEPTNO=10/dept10.csv                       129
DEPTNO=20/dept20.csv                       213
DEPTNO=30/dept30.csv                       258
 
Let’s create an external table partitioned by department using the traditional DBMS_CLOUD interface.
 
demo-user@ATP19C> begin
  2    dbms_cloud.create_external_part_table(
  3               table_name => 'emp_demo',
  4               credential_name => 'my_demo_cred',
  5               file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/*',
  6               column_list => 'empno number,ename varchar2(12),sal number,hiredate date,DEPTNO number',
  7               field_list =>  'empno,ename,sal,hiredate date mask "dd-mon-yyyy hh12:mi:ss am",DEPTNO',
  8               format => json_object( 'type' value 'csv' ,
  9                     'partition_columns' value json_array( json_object('name' value 'DEPTNO', 'type' value 'number') )
 10                        ) );
 11  end;
 12  /
 
PL/SQL procedure successfully completed.
 
Now, using this single external table, we can query the data files in object storage, represented as multiple logical partitions.
 
demo-user@ATP19C> set autotrace traceonly exp
demo-user@ATP19C> select * from emp_demo where deptno = 10;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 636994847
 
----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |    82 |  4592 |     8   (0)| 00:00:01 |       |       |
|   1 |  PX COORDINATOR               |          |       |       |            |          |       |       |
|   2 |   PX SEND QC (RANDOM)         | :TQ10000 |    82 |  4592 |     8   (0)| 00:00:01 |       |       |
|   3 |    PX BLOCK ITERATOR          |          |    82 |  4592 |     8   (0)| 00:00:01 |     1 |     1 |
|   4 |     EXTERNAL TABLE ACCESS FULL| EMP_DEMO |    82 |  4592 |     8   (0)| 00:00:01 |     1 |     1 |
----------------------------------------------------------------------------------------------------------
 
The Pstart/Pstop columns (both 1) show that the database used the predicate and pruned down to only the single partition needed to satisfy the query.
 
Looking at the table metadata, we can see that the files in object storage are mapped to individual partitions of the table.
 
demo-user@ATP19C> select dbms_metadata.get_ddl('TABLE','EMP_DEMO') from dual;
 
DBMS_METADATA.GET_DDL('TABLE','EMP_DEMO')
--------------------------------------------------------------------------------
 
  CREATE TABLE "DEMO_USER"."EMP_DEMO"
   (    "EMPNO" NUMBER,
        "ENAME" VARCHAR2(12) COLLATE "USING_NLS_COMP",
        "SAL" NUMBER,
        "HIREDATE" DATE,
        "DEPTNO" NUMBER
   )  DEFAULT COLLATION "USING_NLS_COMP"
   ORGANIZATION EXTERNAL
    ( TYPE ORACLE_LOADER
      DEFAULT DIRECTORY "DATA_PUMP_DIR"
      ACCESS PARAMETERS
      ( RECORDS DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE
         READSIZE=10000000 CREDENTIAL 'my_demo_cred'
filename_columns="['DEPTNO']"
file_uri_list="https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/*"
credential_schema="DEMO_USER"
credential_name="MY_DEMO_CRED"
    FIELDS CSV WITHOUT EMBEDDED NOTRIM ( empno,ename,sal,hiredate date mask "dd-mon-yyyy hh12:mi:ss am",DEPTNO )
  )
    )
   REJECT LIMIT 0
  PARTITION BY LIST ("DEPTNO")
 (PARTITION "P1"  VALUES ((10))
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=10/*' ),
 PARTITION "P2"  VALUES ((20))
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=20/*' ),
 PARTITION "P3"  VALUES ((30))
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=30/*' ))
  PARALLEL
 
So what happens if we add or remove files from the object storage buckets supporting these partitions? To cope with that situation we still have to adjust the external table definition, but we can simply use another new interface in DBMS_CLOUD to do the job.
 
Say we get a new file for deptno = 40 in the object storage bucket.
 
demo-user@ATP19C> select object_name, bytes
  2  from dbms_cloud.list_objects(
  3     'my_demo_cred',
  4     :uri||:namespace||'/b/MY_DEMO_BUCKET/o/DEMO03/');
 
OBJECT_NAME                              BYTES
----------------------------------- ----------
DEPTNO=10/dept10.csv                       129
DEPTNO=20/dept20.csv                       213
DEPTNO=30/dept30.csv                       258
DEPTNO=40/dept_40.csv                       83
 
And our existing external table still references only three partitions, each pointing to individual files in object storage.
 
demo-user@ATP19C> select deptno,count(*)
  2  from emp_demo
  3  group by deptno;
 
    DEPTNO   COUNT(*)
---------- ----------
        30          6
        10          3
        20          5
 
Now the call to sync the external table looks like this:
 
demo-user@ATP19C> exec dbms_cloud.sync_external_part_table(table_name =>'EMP_DEMO');
 
PL/SQL procedure successfully completed.
 
And after that, the query returns rows for all the departments available in object storage.
 
demo-user@ATP19C> select deptno,count(*)
  2  from emp_demo
  3  group by deptno;
 
    DEPTNO   COUNT(*)
---------- ----------
        40          2
        30          6
        10          3
        20          5
 
Looking at the table metadata again, we can see that the new file is mapped to an additional partition in the table.
 
demo-user@ATP19C> select dbms_metadata.get_ddl('TABLE','EMP_DEMO') from dual;
 
DBMS_METADATA.GET_DDL('TABLE','EMP_DEMO')
--------------------------------------------------------------------------------
 
  CREATE TABLE "DEMO_USER"."EMP_DEMO"
   (    "EMPNO" NUMBER,
              ....
              ....
        "DEPTNO" NUMBER
   )  DEFAULT COLLATION "USING_NLS_COMP"
   ORGANIZATION EXTERNAL
    ( TYPE ORACLE_LOADER
      DEFAULT DIRECTORY "DATA_PUMP_DIR"
      ACCESS PARAMETERS
      ( RECORDS DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE
              ....
              ....  )
    )
   REJECT LIMIT 0
  PARTITION BY LIST ("DEPTNO")
 (PARTITION "P1"  VALUES (10)
              ....
              ....
 PARTITION "P4"  VALUES (40)
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=40/*'
       ))
  PARALLEL
 
 
In the next blog post we will see how this sync process has been improved in recent releases.