Friday, October 18, 2024

External Tables Partitioning in Autonomous database - Part I

Partitioning is a well-established technique for improving performance and manageability of the database system by diving larger objects into smaller ones, including any large data warehouse takes advantages of it. This is also true for large objects for both inside and outside the database such as data lakes in object stores. In this blogpost we will see about how Partitioning the external tables got evolved in Autonomous database.
 
Autonomous database allows the creation of partitioned external table through DBMS_CLOUD API for quite some time, this API simplifies the definition of external table structure, but it relies on you to provide a lot of table metadata, for example we have to
 
  • Specify the list of all columns with data types.
  • Maintain table definition. If some new data – equivalent to partitions – is added or removed from object storage, then the table must be manually adjusted using ADD / DROP partitions.
  • Value of the partitioned column not in the query results
 
Lets have a look at what all of these means from a relatively small partitioned table in example
 
Got a sample of employee data organized based on department number in my object store.
 
demo-user@ATP19C> exec :uri := 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/';
 
PL/SQL procedure successfully completed.
 
demo-user@ATP19C> exec :namespace := 'idcglquusbz6';
 
PL/SQL procedure successfully completed.
 
demo-user@ATP19C> select object_name, bytes
  2  from dbms_cloud.list_objects(
  3     'my_demo_cred',
  4     :uri||:namespace||'/b/MY_DEMO_BUCKET/o/DEMO03/');
 
OBJECT_NAME                              BYTES
----------------------------------- ----------
DEPTNO=10/dept10.csv                       129
DEPTNO=20/dept20.csv                       213
DEPTNO=30/dept30.csv                       258
 
Let’s create a partitioned external table for each department using the traditional DBMS_CLOUD Interface.
 
demo-user@ATP19C> begin
  2    dbms_cloud.create_external_part_table(
  3               table_name => 'emp_demo',
  4               credential_name => 'my_demo_cred',
  5               file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/*',
  6               column_list => 'empno number,ename varchar2(12),sal number,hiredate date,DEPTNO number',
  7               field_list =>  'empno,ename,sal,hiredate date mask "dd-mon-yyyy hh12:mi:ss am",DEPTNO',
  8               format => json_object( 'type' value 'csv' ,
  9                     'partition_columns' value json_array( json_object('name' value 'DEPTNO', 'type' value 'number') )
 10                        ) );
 11  end;
 12  /
 
PL/SQL procedure successfully completed.
 
Now using this single external table, we can query data files in the object storage and represented as multiple logical partitions.
 
demo-user@ATP19C> set autotrace traceonly exp
demo-user@ATP19C> select * from emp_demo where deptno = 10;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 636994847
 
----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |    82 |  4592 |     8   (0)| 00:00:01 |       |       |
|   1 |  PX COORDINATOR               |          |       |       |            |          |       |       |
|   2 |   PX SEND QC (RANDOM)         | :TQ10000 |    82 |  4592 |     8   (0)| 00:00:01 |       |       |
|   3 |    PX BLOCK ITERATOR          |          |    82 |  4592 |     8   (0)| 00:00:01 |     1 |     1 |
|   4 |     EXTERNAL TABLE ACCESS FULL| EMP_DEMO |    82 |  4592 |     8   (0)| 00:00:01 |     1 |     1 |
----------------------------------------------------------------------------------------------------------
 
It looks like the predicate was used by the database and pruned down to only the necessary partitions to satisfy the query.
 
Looking into the table metadata it shows the files in the object storage are mapped to individual partitions in the table.
 
demo-user@ATP19C> select dbms_metadata.get_ddl('TABLE','EMP_DEMO') from dual;
 
DBMS_METADATA.GET_DDL('TABLE','EMP_DEMO')
--------------------------------------------------------------------------------
 
  CREATE TABLE "DEMO_USER"."EMP_DEMO"
   (    "EMPNO" NUMBER,
        "ENAME" VARCHAR2(12) COLLATE "USING_NLS_COMP",
        "SAL" NUMBER,
        "HIREDATE" DATE,
        "DEPTNO" NUMBER
   )  DEFAULT COLLATION "USING_NLS_COMP"
   ORGANIZATION EXTERNAL
    ( TYPE ORACLE_LOADER
      DEFAULT DIRECTORY "DATA_PUMP_DIR"
      ACCESS PARAMETERS
      ( RECORDS DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE
         READSIZE=10000000 CREDENTIAL 'my_demo_cred'
filename_columns="['DEPTNO']"
file_uri_list="https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/*"
credential_schema="DEMO_USER"
credential_name="MY_DEMO_CRED"
    FIELDS CSV WITHOUT EMBEDDED NOTRIM ( empno,ename,sal,hiredate date mask "dd-mon-yyyy hh12:mi:ss am",DEPTNO )
  )
    )
   REJECT LIMIT 0
  PARTITION BY LIST ("DEPTNO")
 (PARTITION "P1"  VALUES ((10))
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=10/*' ),
 PARTITION "P2"  VALUES ((20))
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=20/*' ),
 PARTITION "P3"  VALUES ((30))
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=30/*' ))
  PARALLEL
 
So what happens in case if we add / remove files from the object storage buckets supporting these partitions , to cope with that situation we do have to manually adjust the external table definition, but simply use another new interface in DBMS_CLOUD to do the job
 
Say we got new file for deptno = 40 in the object storage bucket
 
demo-user@ATP19C> select object_name, bytes
  2  from dbms_cloud.list_objects(
  3     'my_demo_cred',
  4     :uri||:namespace||'/b/MY_DEMO_BUCKET/o/DEMO03/');
 
OBJECT_NAME                              BYTES
----------------------------------- ----------
DEPTNO=10/dept10.csv                       129
DEPTNO=20/dept20.csv                       213
DEPTNO=30/dept30.csv                       258
DEPTNO=40/dept_40.csv                       83
 
And our existing external table got reference to three partitions each got reference to indivjual files in object storage.
 
demo-user@ATP19C> select deptno,count(*)
  2  from emp_demo
  3  group by deptno;
 
    DEPTNO   COUNT(*)
---------- ----------
        30          6
        10          3
        20          5
 
Now the call to SYNC the external table will be like this
 
demo-user@ATP19C> exec dbms_cloud.sync_external_part_table(table_name =>'EMP_DEMO');
 
PL/SQL procedure successfully completed.
 
And post that the query returns for all the departments available in the object storage.
 
demo-user@ATP19C> select deptno,count(*)
  2  from emp_demo
  3  group by deptno;
 
    DEPTNO   COUNT(*)
---------- ----------
        40          2
        30          6
        10          3
        20          5
 
Looking into the table metadata it shows the new files got mapped to additional partitions in the table.
 
demo-user@ATP19C> select dbms_metadata.get_ddl('TABLE','EMP_DEMO') from dual;
 
DBMS_METADATA.GET_DDL('TABLE','EMP_DEMO')
-----------------   ---------------------------------------------------------------
 
  CREATE TABLE "DEMO_USER"."EMP_DEMO"
   (    "EMPNO" NUMBER,
              ....
              ....
        "DEPTNO" NUMBER
   )  DEFAULT COLLATION "USING_NLS_COMP"
   ORGANIZATION EXTERNAL
    ( TYPE ORACLE_LOADER
      DEFAULT DIRECTORY "DATA_PUMP_DIR"
      ACCESS PARAMETERS
      ( RECORDS DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE
              ....
              ....  )
    )
   REJECT LIMIT 0
  PARTITION BY LIST ("DEPTNO")
 (PARTITION "P1"  VALUES (10)
              ....
              ....
 PARTITION "P4"  VALUES (40)
      LOCATION
       ( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=40/*'
       ))
  PARALLEL
 
 
In the next blogpost we will see about how this Sync process got improved in the recent releases.

No comments:

Post a Comment