THE RAJESH BLOG: External Tables Partitioning in Autonomous database

This will be the continuation of the previous blog post.

To efficiently scan data in object store, Autonomous database (ADB) provides various optimizations for all common data types, table formats and directory structures in object storage. With the recently introduced new feature (called Implicit partitioning) for external tables, this enables query performance by enable pruning of data based on logical conditions. This new feature also make it much easier to create and manage these partitioned tables, by simply pointing the root directory for your tables (data) files. That is, we can easily create external tables across thousands of partitions and millions of files within a single command.

External tables with Implicit partitioning automatically discover the underlying partition structure based on the hierarchical structure of files and add these additional partition keys to external table structures.

Here is the structure of our object storage.

demo@ATP19C> select object_name, bytes
2      from dbms_cloud.list_objects(
3         credential_name => 'API_KEY_CRED',
4        location_uri => :uri);

OBJECT_NAME                                              BYTES
--------------------------------------------------- ----------
Sales/                                                      0
Sales/Country=UK/                                            0
Sales/Country=UK/year=2024/                                  0
Sales/Country=UK/year=2024/data_deptno_10.parquet         2163
Sales/Country=UK/year=2024/data_deptno_20.parquet         2218
Sales/Country=US/                                            0
Sales/Country=US/year=2023/                                  0
Sales/Country=US/year=2023/data_deptno_30.parquet         2257

8 rows selected.

To emphasize this external table as Automatic Implicit partitioning, we need to create it as using DBMS_CLOUD API

demo@ATP19C> begin
2     dbms_cloud.create_external_table(
3             table_name => 'xt_implicit_hive_demo'
4             , credential_name => 'API_KEY_CRED'
5             , file_uri_list => :uri
6             , format => '{"schema":"first","type":"parquet","implicit_partition_type":"hive"}');
7 end;
8 /

PL/SQL procedure successfully completed.

Partition columns are automatically detected, and implicit partitioning is enabled by setting the format option IMPLICIT_PARTITION_TYPE to HIVE.

To detect the partition columns ADB starts searching from the beginning of the URL path, specified by FILE_URI_LIST for "=" , when found the left part of = is treated as the column name and the right part of = as value, then the search continues for next equal sign "=" until after the first partition key column and so on.

Query using implicit partitioning.

demo@ATP19C> select * from xt_implicit_hive_demo;

EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO COUNTRY YEAR

_____ ______ ____________ _____ ______________ _______ _______ _______ _______ _____

7839 KING PRESIDENT 17-NOV-1981 5000 10 UK 2024

7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 UK 2024

7934 MILLER CLERK 7782 23-JAN-1982 1300 10 UK 2024

7566 JONES MANAGER 7839 02-APR-1981 2975 20 UK 2024

7788 SCOTT ANALYST 7566 19-APR-1987 3000 20 UK 2024

7902 FORD ANALYST 7566 03-DEC-1981 3000 20 UK 2024

7369 SMITH CLERK 7902 17-DEC-1980 800 20 UK 2024

7876 ADAMS CLERK 7788 23-MAY-1987 1100 20 UK 2024

7698 BLAKE MANAGER 7839 01-MAY-1981 2850 30 US 2023

7499 ALLEN SALESMAN 7698 20-FEB-1981 1600 300 30 US 2023

7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 US 2023

7654 MARTIN SALESMAN 7698 28-SEP-1981 1250 1400 30 US 2023

7844 TURNER SALESMAN 7698 08-SEP-1981 1500 0 30 US 2023

7900 JAMES CLERK 7698 03-DEC-1981 950 30 US 2023

demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2023';

COUNT(*)
----------
         6

By specifying predicates country ='US' and year = '2023', forces the database to read only the underlying directory that corresponds to this filter.

Moreover, we can add new files and directories in object storage, this new data is automatically recognized by ADB, since all partition pruning is done at run-time.

demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2025';

COUNT(*)
----------
         0

Once we upload a sample data for the year 2025, ADB automatically prunes partitions at runtime.

demo@ATP19C> select object_name, bytes
2      from dbms_cloud.list_objects(
3         credential_name => 'API_KEY_CRED',
4        location_uri => :uri);

OBJECT_NAME                                              BYTES
--------------------------------------------------- ----------
Sales/                                                       0
Sales/Country=UK/                                            0
Sales/Country=UK/year=2024/                                  0
Sales/Country=UK/year=2024/data_deptno_10.parquet         2163
Sales/Country=UK/year=2024/data_deptno_20.parquet         2218
Sales/Country=US/                                            0
Sales/Country=US/year=2023/                                  0
Sales/Country=US/year=2023/data_deptno_30.parquet         2257
Sales/Country=US/year=2025/                                  0
Sales/Country=US/year=2025/data_deptno_40.parquet         2414

10 rows selected.

demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2025';

COUNT(*)
----------
         8

This shows how ADB automatically manages partitions by enhancing query performance by skipping unnecessary data segments based on predicates, if the underlying data structure of the object store changes, then there is no need to modify / Sync anything on ADB.

Sunday, October 27, 2024

External Tables Partitioning in Autonomous database - Part II

No comments:

Post a Comment