Sunday, October 27, 2024

External Tables Partitioning in Autonomous database - Part II

This will be the continuation of the previous blog post.
 
To efficiently scan data in object store, Autonomous database (ADB) provides various optimizations for all common data types, table formats and directory structures in object storage. With the recently introduced new feature (called Implicit partitioning) for external tables, this enables query performance by enable pruning of data based on logical conditions. This new feature also make it much easier to create and manage these partitioned tables, by simply pointing the root directory for your tables (data) files. That is, we can easily create external tables across thousands of partitions and millions of files within a single command.
 
External tables with Implicit partitioning automatically discover the underlying partition structure based on the hierarchical structure of files and add these additional partition keys to external table structures.
 
Here is the structure of our object storage.
 
 
demo@ATP19C> select object_name, bytes
  2      from dbms_cloud.list_objects(
  3         credential_name => 'API_KEY_CRED',
  4        location_uri => :uri);
 
OBJECT_NAME                                              BYTES
--------------------------------------------------- ----------
Sales/                                                       0
Sales/Country=UK/                                            0
Sales/Country=UK/year=2024/                                  0
Sales/Country=UK/year=2024/data_deptno_10.parquet         2163
Sales/Country=UK/year=2024/data_deptno_20.parquet         2218
Sales/Country=US/                                            0
Sales/Country=US/year=2023/                                  0
Sales/Country=US/year=2023/data_deptno_30.parquet         2257
 
8 rows selected.
 
To emphasize this external table as Automatic Implicit partitioning, we need to create it as using DBMS_CLOUD API
 
demo@ATP19C> begin
  2     dbms_cloud.create_external_table(
  3             table_name => 'xt_implicit_hive_demo'
  4             , credential_name => 'API_KEY_CRED'
  5             , file_uri_list => :uri
  6             , format => '{"schema":"first","type":"parquet","implicit_partition_type":"hive"}');
  7  end;
  8  /
 
PL/SQL procedure successfully completed.
 
 
Partition columns are automatically detected, and implicit partitioning is enabled by setting the format option IMPLICIT_PARTITION_TYPE to HIVE.
 
To detect the partition columns ADB starts searching from the beginning of the URL path, specified by FILE_URI_LIST for "=" , when found the left part of = is treated as the column name and the right part of = as value, then the search continues for next equal sign "=" until after the first partition key column and so on.
 
Query using implicit partitioning.
 
demo@ATP19C> select * from xt_implicit_hive_demo;
 
EMPNO ENAME  JOB            MGR HIREDATE           SAL    COMM  DEPTNO COUNTRY YEAR
_____ ______ ____________ _____ ______________ _______ _______ _______ _______ _____
 7839 KING   PRESIDENT          17-NOV-1981       5000              10 UK      2024
 7782 CLARK  MANAGER       7839 09-JUN-1981       2450              10 UK      2024
 7934 MILLER CLERK         7782 23-JAN-1982       1300              10 UK      2024
 7566 JONES  MANAGER       7839 02-APR-1981       2975              20 UK      2024
 7788 SCOTT  ANALYST       7566 19-APR-1987       3000              20 UK      2024
 7902 FORD   ANALYST       7566 03-DEC-1981       3000              20 UK      2024
 7369 SMITH  CLERK         7902 17-DEC-1980        800              20 UK      2024
 7876 ADAMS  CLERK         7788 23-MAY-1987       1100              20 UK      2024
 7698 BLAKE  MANAGER       7839 01-MAY-1981       2850              30 US      2023
 7499 ALLEN  SALESMAN      7698 20-FEB-1981       1600     300      30 US      2023
 7521 WARD   SALESMAN      7698 22-FEB-1981       1250     500      30 US      2023
 7654 MARTIN SALESMAN      7698 28-SEP-1981       1250    1400      30 US      2023
 7844 TURNER SALESMAN      7698 08-SEP-1981       1500       0      30 US      2023
 7900 JAMES  CLERK         7698 03-DEC-1981        950              30 US      2023
 
demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2023';
 
  COUNT(*)
----------
         6
 
By specifying predicates country ='US' and year = '2023', forces the database to read only the underlying directory that corresponds to this filter.
 
Moreover, we can add new files and directories in object storage, this new data is automatically recognized by ADB, since all partition pruning is done at run-time.
 
demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2025';
 
  COUNT(*)
----------
         0
 
Once we upload a sample data for the year 2025, ADB automatically prunes partitions at runtime.
 
demo@ATP19C> select object_name, bytes
  2      from dbms_cloud.list_objects(
  3         credential_name => 'API_KEY_CRED',
  4        location_uri => :uri);
 
OBJECT_NAME                                              BYTES
--------------------------------------------------- ----------
Sales/                                                       0
Sales/Country=UK/                                            0
Sales/Country=UK/year=2024/                                  0
Sales/Country=UK/year=2024/data_deptno_10.parquet         2163
Sales/Country=UK/year=2024/data_deptno_20.parquet         2218
Sales/Country=US/                                            0
Sales/Country=US/year=2023/                                  0
Sales/Country=US/year=2023/data_deptno_30.parquet         2257
Sales/Country=US/year=2025/                                  0
Sales/Country=US/year=2025/data_deptno_40.parquet         2414
 
10 rows selected.
 
 
demo@ATP19C> select count(*) from xt_implicit_hive_demo where country ='US' and year = '2025';
 
  COUNT(*)
----------
         8
 
This shows how ADB automatically manages partitions by enhancing query performance by skipping unnecessary data segments based on predicates, if the underlying data structure of the object store changes, then there is no need to modify / Sync anything on ADB.
 
 
 
 
 


No comments:

Post a Comment