Thursday, October 31, 2024

Multi-file loading in an ORDS POST request

One of the many new features introduced in ORDS 24.3 is the ability to upload multiple files as part of a POST request (multipart/form-data) to an ORDS endpoint.
 
Before getting into an example, we first need to familiarize ourselves with the procedures and functions recently added to the ORDS-supplied packages.
 
A procedure to save the file being uploaded into the database session for later use:
 
PROCEDURE add_body_file (
  p_parameter_name    IN varchar2,
  p_file_name         IN varchar2,
  p_content_type      IN varchar2,
  p_file_blob         IN blob);
 
A function to return the number of files saved in the session:
 
FUNCTION body_file_count RETURN NUMBER;
 
A procedure to return the file at the specified index that was saved into the database session:
 
PROCEDURE get_body_file (
  p_file_index        IN NUMBER,
  p_parameter_name    OUT varchar2,
  p_file_name         OUT varchar2,
  p_content_type      OUT varchar2,
  p_file_blob         OUT blob);
 
A procedure to clear the files stored in the session. Files are also cleared automatically when the connection is returned to the pool:
 
PROCEDURE clear_body_files;
 
 
Here is the implementation of the REST module using the above procedures and functions.
 
demo@FREEPDB1> create table t( file_id number generated always as identity
  2     , parameter_name varchar2(80)
  3     , file_name varchar2(80)
  4     , content_type varchar2(80)
  5*    , file_blob blob );
 
Table T created.
 
demo@FREEPDB1> -- Generated by Oracle SQL Developer REST Data Services 24.3.0.284.2209
demo@FREEPDB1> -- Exported REST Definitions from ORDS Schema Version 24.3.0.r2620924
demo@FREEPDB1> -- Schema: DEMO   Date: Fri Oct 25 17:44:53 IST 2024
demo@FREEPDB1> --
  2  BEGIN
  3    ORDS.ENABLE_SCHEMA(
  4        p_enabled             => TRUE,
  5        p_schema              => 'DEMO',
  6        p_url_mapping_type    => 'BASE_PATH',
  7        p_url_mapping_pattern => 'demo',
  8        p_auto_rest_auth      => FALSE);
  9
 10    ORDS.DEFINE_MODULE(
 11        p_module_name    => 'body.files.demo',
 12        p_base_path      => '/V1/',
 13        p_items_per_page =>  25,
 14        p_status         => 'PUBLISHED',
 15        p_comments       => NULL);
 16    ORDS.DEFINE_TEMPLATE(
 17        p_module_name    => 'body.files.demo',
 18        p_pattern        => 'example',
 19        p_priority       => 0,
 20        p_etag_type      => 'HASH',
 21        p_etag_query     => NULL,
 22        p_comments       => NULL);
 23    ORDS.DEFINE_HANDLER(
 24        p_module_name    => 'body.files.demo',
 25        p_pattern        => 'example',
 26        p_method         => 'POST',
 27        p_source_type    => 'plsql/block',
 28        p_items_per_page =>  0,
 29        p_mimes_allowed  => '',
 30        p_comments       => NULL,
 31        p_source         =>
 32  'declare
 33      l_json clob := :body_json;
 34      l_parameter_name varchar2(4000) ;
 35      l_file_name varchar2(4000) ;
 36      l_content_type varchar2(4000) ;
 37      l_file_blob blob;
 38  begin
 39      htp.p(''Number of files received = ''|| ords.body_file_count);
 40
 41      for k in 1..ords.body_file_count
 42      loop
 43          ords.get_body_file
 44              ( p_file_index => k
 45              , p_parameter_name => l_parameter_name
 46              , p_file_name => l_file_name
 47              , p_content_type => l_content_type
 48              , p_file_blob => l_file_blob );
 49
 50         insert into t(
 51              parameter_name
 52             , file_name
 53             , content_type
 54             , file_blob )
 55          values( l_parameter_name
 56              , l_file_name
 57              , l_content_type
 58              , l_file_blob );
 59         htp.p('' Processed file #''||k) ;
 60      end loop;
 61  end;'
 62        );
 63
 64
 65    COMMIT;
 66  END;
 67* /
 
PL/SQL procedure successfully completed.
 
demo@FREEPDB1>
 
In practice, here is how things work. I’m using Postman as a proxy for my application (i.e., the client).
 
 


 
The header part of the request looks something like this

 
 
The output from the database after the above POST call looks like this
 
demo@FREEPDB1> select parameter_name, dbms_lob.getlength(file_blob) from t;
 
PARAMETER_NAME       DBMS_LOB.GETLENGTH(FILE_BLOB)
_________________ ________________________________
file_one                                        70
file_two                                      1604
 
 
When posting a multipart/form-data request, we must bind the :body_json implicit parameter to something, even if that something is an empty property.
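Since the Postman screenshots are not reproduced here, below is a rough sketch of what an equivalent client request could look like when issued from PL/SQL with UTL_HTTP. Treat it as an illustration only: the host and port, the inline file contents, and the way the JSON part is named are my own assumptions (only the endpoint path /ords/demo/V1/example and the parameter names file_one and file_two come from the demo above), and running it from another database session requires a network ACL that allows reaching the ORDS listener.
 
declare
   -- arbitrary multipart boundary; any unique token will do
   l_boundary constant varchar2(30) := '----demo-boundary-42';
   l_crlf     constant varchar2(2)  := chr(13) || chr(10);
   l_body     varchar2(32767);
   l_req      utl_http.req;
   l_resp     utl_http.resp;
   l_line     varchar2(32767);
 
   -- helper: append one "file" part to the multipart body
   procedure add_file_part( p_name         varchar2
                          , p_filename     varchar2
                          , p_content_type varchar2
                          , p_content      varchar2 ) is
   begin
      l_body := l_body
             || '--' || l_boundary || l_crlf
             || 'Content-Disposition: form-data; name="' || p_name
             || '"; filename="' || p_filename || '"' || l_crlf
             || 'Content-Type: ' || p_content_type || l_crlf || l_crlf
             || p_content || l_crlf;
   end add_file_part;
begin
   -- two tiny inline "files", reusing the parameter names seen in table T
   add_file_part('file_one', 'one.txt', 'text/plain', 'hello from file one');
   add_file_part('file_two', 'two.csv', 'text/csv',
                 'empno,ename' || l_crlf || '7839,KING');
 
   -- one plain JSON part so the :body_json bind has something to resolve to
   -- (how this part should be named is an assumption on my part; in Postman
   --  it was simply an additional form-data field)
   l_body := l_body
          || '--' || l_boundary || l_crlf
          || 'Content-Disposition: form-data; name="body_json"' || l_crlf
          || 'Content-Type: application/json' || l_crlf || l_crlf
          || '{}' || l_crlf
          || '--' || l_boundary || '--' || l_crlf;
 
   -- placeholder host/port; adjust to your ORDS listener
   l_req := utl_http.begin_request(
               url    => 'http://localhost:8080/ords/demo/V1/example',
               method => 'POST');
   utl_http.set_header(l_req, 'Content-Type',
                       'multipart/form-data; boundary=' || l_boundary);
   utl_http.set_header(l_req, 'Content-Length', to_char(lengthb(l_body)));
   utl_http.write_text(l_req, l_body);
 
   -- print whatever the handler wrote via htp.p
   l_resp := utl_http.get_response(l_req);
   begin
      loop
         utl_http.read_line(l_resp, l_line, true);
         dbms_output.put_line(l_line);
      end loop;
   exception
      when utl_http.end_of_body then
         utl_http.end_response(l_resp);
   end;
end;
/
 
Whichever client you use – Postman, curl, or code like the above – the shape of the request is the same: each file travels as its own form-data part, plus one extra part so that, as noted above, :body_json has something to bind to.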
 


Sunday, October 27, 2024

External Tables Partitioning in Autonomous database - Part II

This post is a continuation of the previous blog post.
 
To efficiently scan data in the object store, Autonomous database (ADB) provides various optimizations for all common data types, table formats, and directory structures in object storage. The recently introduced implicit partitioning feature for external tables improves query performance by enabling pruning of data based on logical conditions. It also makes these partitioned tables much easier to create and manage: you simply point the table at the root directory of its data files. That is, we can create external tables across thousands of partitions and millions of files with a single command.
 
External tables with implicit partitioning automatically discover the underlying partition structure from the hierarchical layout of the files and add these partition keys as additional columns of the external table.
 
Here is the structure of our object storage.
 
 
demo@ATP19C> select object_name, bytes
  2      from dbms_cloud.list_objects(
  3         credential_name => 'API_KEY_CRED',
  4        location_uri => :uri);
 
OBJECT_NAME                                              BYTES
--------------------------------------------------- ----------
Sales/                                                       0
Sales/Country=UK/                                            0
Sales/Country=UK/year=2024/                                  0
Sales/Country=UK/year=2024/data_deptno_10.parquet         2163
Sales/Country=UK/year=2024/data_deptno_20.parquet         2218
Sales/Country=US/                                            0
Sales/Country=US/year=2023/                                  0
Sales/Country=US/year=2023/data_deptno_30.parquet         2257
 
8 rows selected.
 
To enable implicit partitioning for this external table, we create it using the DBMS_CLOUD API.
 
demo@ATP19C> begin
  2     dbms_cloud.create_external_table(
  3             table_name => 'xt_implicit_hive_demo'
  4             , credential_name => 'API_KEY_CRED'
  5             , file_uri_list => :uri
  6             , format => '{"schema":"first","type":"parquet","implicit_partition_type":"hive"}');
  7  end;
  8  /
 
PL/SQL procedure successfully completed.
 
 
Partition columns are automatically detected, and implicit partitioning is enabled by setting the format option IMPLICIT_PARTITION_TYPE to HIVE.
 
To detect the partition columns, ADB searches the URL path specified by FILE_URI_LIST from the beginning for an "=" sign. When one is found, the text to the left of the "=" is treated as the column name and the text to the right of it as the value; the search then continues beyond the first partition key column for the next "=" sign, and so on. For example, the path Sales/Country=UK/year=2024/ yields the partition columns COUNTRY (value 'UK') and YEAR (value '2024').
 
Query using implicit partitioning.
 
demo@ATP19C> select * from xt_implicit_hive_demo;
 
EMPNO ENAME  JOB            MGR HIREDATE           SAL    COMM  DEPTNO COUNTRY YEAR
_____ ______ ____________ _____ ______________ _______ _______ _______ _______ _____
 7839 KING   PRESIDENT          17-NOV-1981       5000              10 UK      2024
 7782 CLARK  MANAGER       7839 09-JUN-1981       2450              10 UK      2024
 7934 MILLER CLERK         7782 23-JAN-1982       1300              10 UK      2024
 7566 JONES  MANAGER       7839 02-APR-1981       2975              20 UK      2024
 7788 SCOTT  ANALYST       7566 19-APR-1987       3000              20 UK      2024
 7902 FORD   ANALYST       7566 03-DEC-1981       3000              20 UK      2024
 7369 SMITH  CLERK         7902 17-DEC-1980        800              20 UK      2024
 7876 ADAMS  CLERK         7788 23-MAY-1987       1100              20 UK      2024
 7698 BLAKE  MANAGER       7839 01-MAY-1981       2850              30 US      2023
 7499 ALLEN  SALESMAN      7698 20-FEB-1981       1600     300      30 US      2023
 7521 WARD   SALESMAN      7698 22-FEB-1981       1250     500      30 US      2023
 7654 MARTIN SALESMAN      7698 28-SEP-1981       1250    1400      30 US      2023
 7844 TURNER SALESMAN      7698 08-SEP-1981       1500       0      30 US      2023
 7900 JAMES  CLERK         7698 03-DEC-1981        950              30 US      2023
 
demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2023';
 
  COUNT(*)
----------
         6
 
Specifying the predicates country = 'US' and year = '2023' forces the database to read only the underlying directory that corresponds to this filter.
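The original post does not show an execution plan for this query, but if you want to see for yourself how the filter is handled, one option (keeping in mind that plan output varies by database version and statistics) is the usual explain plan approach:
 
explain plan for
   select count(*)
     from xt_implicit_hive_demo
    where country = 'US' and year = '2023';
 
select * from dbms_xplan.display();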
 
Moreover, we can add new files and directories in object storage, and this new data is automatically recognized by ADB, since all partition pruning is done at run time.
 
demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2025';
 
  COUNT(*)
----------
         0
 
Once we upload sample data for the year 2025, ADB picks it up automatically, with partition pruning still happening at run time.
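The post does not show how the new file was uploaded. One possible way, assuming the parquet file has already been generated and placed in a directory object such as DATA_PUMP_DIR (the directory name and the exact object prefix below are assumptions, since :uri is not spelled out in this part), is DBMS_CLOUD.PUT_OBJECT:
 
begin
   dbms_cloud.put_object(
        credential_name => 'API_KEY_CRED'
        -- illustrative prefix: it must resolve to the
        -- Sales/Country=US/year=2025/ folder under the table's root URI
      , object_uri      => :uri || 'Sales/Country=US/year=2025/data_deptno_40.parquet'
      , directory_name  => 'DATA_PUMP_DIR'
      , file_name       => 'data_deptno_40.parquet');
end;
/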
 
demo@ATP19C> select object_name, bytes
  2      from dbms_cloud.list_objects(
  3         credential_name => 'API_KEY_CRED',
  4        location_uri => :uri);
 
OBJECT_NAME                                              BYTES
--------------------------------------------------- ----------
Sales/                                                       0
Sales/Country=UK/                                            0
Sales/Country=UK/year=2024/                                  0
Sales/Country=UK/year=2024/data_deptno_10.parquet         2163
Sales/Country=UK/year=2024/data_deptno_20.parquet         2218
Sales/Country=US/                                            0
Sales/Country=US/year=2023/                                  0
Sales/Country=US/year=2023/data_deptno_30.parquet         2257
Sales/Country=US/year=2025/                                  0
Sales/Country=US/year=2025/data_deptno_40.parquet         2414
 
10 rows selected.
 
 
demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2025';
 
  COUNT(*)
----------
         8
 
This shows how ADB automatically manages partitions, enhancing query performance by skipping unnecessary data segments based on predicates. If the underlying data structure in the object store changes, there is no need to modify or sync anything on ADB.
 
 
 
 
 


Friday, October 18, 2024

External Tables Partitioning in Autonomous database - Part I

Partitioning is a well-established technique for improving the performance and manageability of a database system by dividing larger objects into smaller ones; virtually every large data warehouse takes advantage of it. This is equally true for large objects both inside and outside the database, such as data lakes in object stores. In this blog post we will see how partitioning of external tables has evolved in Autonomous database.
 
Autonomous database has allowed the creation of partitioned external tables through the DBMS_CLOUD API for quite some time. This API simplifies the definition of the external table structure, but it relies on you to provide a lot of table metadata; for example, we have to:
 
  • Specify the list of all columns with data types.
  • Maintain the table definition. If new data – equivalent to new partitions – is added to or removed from object storage, then the table must be manually adjusted using ADD / DROP partitions.
  • Accept that the value of the partition column is not included in the query results.
 
Let's have a look at what all of this means using a relatively small partitioned table as an example.
 
I have a sample of employee data in my object store, organized by department number.
 
demo-user@ATP19C> exec :uri := 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/';
 
PL/SQL procedure successfully completed.
 
demo-user@ATP19C> exec :namespace := 'idcglquusbz6';
 
PL/SQL procedure successfully completed.
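If you are following along, the exec calls above assume the two bind variables were declared earlier in the session, for example (the sizes here are arbitrary):
 
variable uri varchar2(200)
variable namespace varchar2(100)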
 
demo-user@ATP19C> select object_name, bytes
  2  from dbms_cloud.list_objects(
  3     'my_demo_cred',
  4     :uri||:namespace||'/b/MY_DEMO_BUCKET/o/DEMO03/');
 
OBJECT_NAME                              BYTES
----------------------------------- ----------
DEPTNO=10/dept10.csv                       129
DEPTNO=20/dept20.csv                       213
DEPTNO=30/dept30.csv                       258
 
Let’s create an external table partitioned by department using the traditional DBMS_CLOUD interface.
 
demo-user@ATP19C> begin
  2    dbms_cloud.create_external_part_table(
  3               table_name => 'emp_demo',
  4               credential_name => 'my_demo_cred',
  5               file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/*',
  6               column_list => 'empno number,ename varchar2(12),sal number,hiredate date,DEPTNO number',
  7               field_list =>  'empno,ename,sal,hiredate date mask "dd-mon-yyyy hh12:mi:ss am",DEPTNO',
  8               format => json_object( 'type' value 'csv' ,
  9                     'partition_columns' value json_array( json_object('name' value 'DEPTNO', 'type' value 'number') )
 10                        ) );
 11  end;
 12  /
 
PL/SQL procedure successfully completed.
 
Now, using this single external table, we can query the data files in object storage, represented as multiple logical partitions.
 
demo-user@ATP19C> set autotrace traceonly exp
demo-user@ATP19C> select * from emp_demo where deptno = 10;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 636994847
 
----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |    82 |  4592 |     8   (0)| 00:00:01 |       |       |
|   1 |  PX COORDINATOR               |          |       |       |            |          |       |       |
|   2 |   PX SEND QC (RANDOM)         | :TQ10000 |    82 |  4592 |     8   (0)| 00:00:01 |       |       |
|   3 |    PX BLOCK ITERATOR          |          |    82 |  4592 |     8   (0)| 00:00:01 |     1 |     1 |
|   4 |     EXTERNAL TABLE ACCESS FULL| EMP_DEMO |    82 |  4592 |     8   (0)| 00:00:01 |     1 |     1 |
----------------------------------------------------------------------------------------------------------
 
The Pstart/Pstop columns (both 1) show that the database used the predicate and pruned down to only the single partition needed to satisfy the query.
 
Looking at the table metadata, we can see that the files in object storage are mapped to individual partitions of the table.
 
demo-user@ATP19C> select dbms_metadata.get_ddl('TABLE','EMP_DEMO') from dual;
 
DBMS_METADATA.GET_DDL('TABLE','EMP_DEMO')
--------------------------------------------------------------------------------
 
  CREATE TABLE "DEMO_USER"."EMP_DEMO"
   (    "EMPNO" NUMBER,
        "ENAME" VARCHAR2(12) COLLATE "USING_NLS_COMP",
        "SAL" NUMBER,
        "HIREDATE" DATE,
        "DEPTNO" NUMBER
   )  DEFAULT COLLATION "USING_NLS_COMP"
   ORGANIZATION EXTERNAL
    ( TYPE ORACLE_LOADER
      DEFAULT DIRECTORY "DATA_PUMP_DIR"
      ACCESS PARAMETERS
      ( RECORDS DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE
         READSIZE=10000000 CREDENTIAL 'my_demo_cred'
filename_columns="['DEPTNO']"
file_uri_list="https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/*"
credential_schema="DEMO_USER"
credential_name="MY_DEMO_CRED"
    FIELDS CSV WITHOUT EMBEDDED NOTRIM ( empno,ename,sal,hiredate date mask "dd-mon-yyyy hh12:mi:ss am",DEPTNO )
  )
    )
   REJECT LIMIT 0
  PARTITION BY LIST ("DEPTNO")
 (PARTITION "P1"  VALUES ((10))
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=10/*' ),
 PARTITION "P2"  VALUES ((20))
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=20/*' ),
 PARTITION "P3"  VALUES ((30))
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=30/*' ))
  PARALLEL
 
So what happens if we add or remove files from the object storage buckets supporting these partitions? To cope with that situation we still have to adjust the external table definition, but we can simply use another new interface in DBMS_CLOUD to do the job.
 
Say we get a new file for deptno = 40 in the object storage bucket.
 
demo-user@ATP19C> select object_name, bytes
  2  from dbms_cloud.list_objects(
  3     'my_demo_cred',
  4     :uri||:namespace||'/b/MY_DEMO_BUCKET/o/DEMO03/');
 
OBJECT_NAME                              BYTES
----------------------------------- ----------
DEPTNO=10/dept10.csv                       129
DEPTNO=20/dept20.csv                       213
DEPTNO=30/dept30.csv                       258
DEPTNO=40/dept_40.csv                       83
 
And our existing external table still references only three partitions, each pointing to individual files in object storage.
 
demo-user@ATP19C> select deptno,count(*)
  2  from emp_demo
  3  group by deptno;
 
    DEPTNO   COUNT(*)
---------- ----------
        30          6
        10          3
        20          5
 
Now the call to sync the external table looks like this:
 
demo-user@ATP19C> exec dbms_cloud.sync_external_part_table(table_name =>'EMP_DEMO');
 
PL/SQL procedure successfully completed.
 
And after that, the query returns rows for all the departments available in object storage.
 
demo-user@ATP19C> select deptno,count(*)
  2  from emp_demo
  3  group by deptno;
 
    DEPTNO   COUNT(*)
---------- ----------
        40          2
        30          6
        10          3
        20          5
 
Looking at the table metadata again, we can see that the new file is mapped to an additional partition in the table.
 
demo-user@ATP19C> select dbms_metadata.get_ddl('TABLE','EMP_DEMO') from dual;
 
DBMS_METADATA.GET_DDL('TABLE','EMP_DEMO')
--------------------------------------------------------------------------------
 
  CREATE TABLE "DEMO_USER"."EMP_DEMO"
   (    "EMPNO" NUMBER,
              ....
              ....
        "DEPTNO" NUMBER
   )  DEFAULT COLLATION "USING_NLS_COMP"
   ORGANIZATION EXTERNAL
    ( TYPE ORACLE_LOADER
      DEFAULT DIRECTORY "DATA_PUMP_DIR"
      ACCESS PARAMETERS
      ( RECORDS DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE
              ....
              ....  )
    )
   REJECT LIMIT 0
  PARTITION BY LIST ("DEPTNO")
 (PARTITION "P1"  VALUES (10)
              ....
              ....
 PARTITION "P4"  VALUES (40)
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=40/*'
       ))
  PARALLEL
 
 
In the next blog post we will see how this sync process has been improved in recent releases.