Partitioning is a well-established
technique for improving performance and manageability of the database system by
diving larger objects into smaller ones, including any large data warehouse
takes advantages of it. This is also true for large objects for both inside and
outside the database such as data lakes in object stores. In this blogpost we
will see about how Partitioning the external tables got evolved in Autonomous
database.
Autonomous database allows the creation
of partitioned external table through DBMS_CLOUD API for quite some time, this
API simplifies the definition of external table structure, but it relies on you
to provide a lot of table metadata, for example we have to
Lets have a look at what all of these
means from a relatively small partitioned table in example
Got a sample of employee data organized
based on department number in my object store.
demo-user@ATP19C> exec
:uri := 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/';
PL/SQL procedure successfully
completed.
demo-user@ATP19C> exec
:namespace := 'idcglquusbz6';
PL/SQL procedure successfully
completed.
demo-user@ATP19C> select
object_name, bytes
2 from dbms_cloud.list_objects(
3 'my_demo_cred',
4 :uri||:namespace||'/b/MY_DEMO_BUCKET/o/DEMO03/');
OBJECT_NAME BYTES
----------------------------------- ----------
DEPTNO=10/dept10.csv 129
DEPTNO=20/dept20.csv 213
DEPTNO=30/dept30.csv 258
Let’s create a partitioned external table for each
department using the traditional DBMS_CLOUD Interface.
demo-user@ATP19C> begin
2 dbms_cloud.create_external_part_table(
3 table_name => 'emp_demo',
4 credential_name => 'my_demo_cred',
5 file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/*',
6 column_list => 'empno number,ename varchar2(12),sal number,hiredate date,DEPTNO number',
7 field_list => 'empno,ename,sal,hiredate date mask "dd-mon-yyyy hh12:mi:ss am",DEPTNO',
8 format => json_object( 'type' value 'csv' ,
9 'partition_columns' value json_array( json_object('name' value 'DEPTNO', 'type' value 'number') )
10 ) );
11 end;
12 /
PL/SQL procedure successfully
completed.
Now using this single external table, we can query
data files in the object storage and represented as multiple logical partitions.
demo-user@ATP19C> set
autotrace traceonly exp
demo-user@ATP19C> select * from emp_demo where deptno = 10;
Execution Plan
----------------------------------------------------------
Plan hash value: 636994847
----------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 82 | 4592 | 8 (0)| 00:00:01 | | |
| 1 | PX COORDINATOR | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 82 | 4592 | 8 (0)| 00:00:01 | | |
| 3 | PX BLOCK ITERATOR | | 82 | 4592 | 8 (0)| 00:00:01 | 1 | 1 |
| 4 | EXTERNAL TABLE ACCESS FULL| EMP_DEMO | 82 | 4592 | 8 (0)| 00:00:01 | 1 | 1 |
----------------------------------------------------------------------------------------------------------
It looks like the predicate was used by the database
and pruned down to only the necessary partitions to satisfy the query.
Looking into the table metadata it shows the files in
the object storage are mapped to individual partitions in the table.
demo-user@ATP19C> select
dbms_metadata.get_ddl('TABLE','EMP_DEMO') from dual;
DBMS_METADATA.GET_DDL('TABLE','EMP_DEMO')
--------------------------------------------------------------------------------
CREATE TABLE
"DEMO_USER"."EMP_DEMO"
( "EMPNO" NUMBER,
"ENAME" VARCHAR2(12) COLLATE "USING_NLS_COMP",
"SAL" NUMBER,
"HIREDATE" DATE,
"DEPTNO" NUMBER
) DEFAULT COLLATION "USING_NLS_COMP"
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "DATA_PUMP_DIR"
ACCESS PARAMETERS
( RECORDS DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE
READSIZE=10000000 CREDENTIAL 'my_demo_cred'
filename_columns="['DEPTNO']"
file_uri_list="https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/*"
credential_schema="DEMO_USER"
credential_name="MY_DEMO_CRED"
FIELDS CSV WITHOUT EMBEDDED NOTRIM ( empno,ename,sal,hiredate date mask "dd-mon-yyyy hh12:mi:ss am",DEPTNO )
)
)
REJECT LIMIT 0
PARTITION BY LIST ("DEPTNO")
(PARTITION "P1" VALUES ((10))
LOCATION
( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=10/*' ),
PARTITION "P2" VALUES ((20))
LOCATION
( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=20/*' ),
PARTITION "P3" VALUES ((30))
LOCATION
( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=30/*' ))
PARALLEL
So what happens in case if we add / remove files from
the object storage buckets supporting these partitions , to cope with that
situation we do have to manually adjust the external table definition, but
simply use another new interface in DBMS_CLOUD to do the job
Say we got new file for deptno = 40 in the object
storage bucket
demo-user@ATP19C> select
object_name, bytes
2 from dbms_cloud.list_objects(
3 'my_demo_cred',
4 :uri||:namespace||'/b/MY_DEMO_BUCKET/o/DEMO03/');
OBJECT_NAME BYTES
----------------------------------- ----------
DEPTNO=10/dept10.csv 129
DEPTNO=20/dept20.csv 213
DEPTNO=30/dept30.csv 258
DEPTNO=40/dept_40.csv 83
And our existing external table got reference to three
partitions each got reference to indivjual files in object storage.
demo-user@ATP19C> select
deptno,count(*)
2 from emp_demo
3 group by deptno;
DEPTNO
COUNT(*)
---------- ----------
30 6
10 3
20 5
Now the call to SYNC the external table will be like
this
demo-user@ATP19C> exec
dbms_cloud.sync_external_part_table(table_name =>'EMP_DEMO');
PL/SQL procedure successfully
completed.
And post that the query returns for all the
departments available in the object storage.
demo-user@ATP19C> select
deptno,count(*)
2 from emp_demo
3 group by deptno;
DEPTNO
COUNT(*)
---------- ----------
40 2
30 6
10 3
20 5
Looking into the table metadata it shows the new files
got mapped to additional partitions in the table.
demo-user@ATP19C> select
dbms_metadata.get_ddl('TABLE','EMP_DEMO') from dual;
DBMS_METADATA.GET_DDL('TABLE','EMP_DEMO')
----------------- ---------------------------------------------------------------
CREATE TABLE
"DEMO_USER"."EMP_DEMO"
( "EMPNO" NUMBER,
....
....
"DEPTNO" NUMBER
) DEFAULT COLLATION "USING_NLS_COMP"
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "DATA_PUMP_DIR"
ACCESS PARAMETERS
( RECORDS DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE
....
.... )
)
REJECT LIMIT 0
PARTITION BY LIST ("DEPTNO")
(PARTITION "P1" VALUES (10)
....
....
PARTITION "P4" VALUES (40)
LOCATION
( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=40/*'
))
PARALLEL
In the next blogpost we will see about how this Sync
process got improved in the recent releases.
- Specify the list of all columns with data types.
- Maintain table definition. If some new data – equivalent to partitions – is added or removed from object storage, then the table must be manually adjusted using ADD / DROP partitions.
- Value of the partitioned column not in the query results
2 from dbms_cloud.list_objects(
3 'my_demo_cred',
4 :uri||:namespace||'/b/MY_DEMO_BUCKET/o/DEMO03/');
----------------------------------- ----------
DEPTNO=10/dept10.csv 129
DEPTNO=20/dept20.csv 213
DEPTNO=30/dept30.csv 258
2 dbms_cloud.create_external_part_table(
3 table_name => 'emp_demo',
4 credential_name => 'my_demo_cred',
5 file_uri_list => 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/*',
6 column_list => 'empno number,ename varchar2(12),sal number,hiredate date,DEPTNO number',
7 field_list => 'empno,ename,sal,hiredate date mask "dd-mon-yyyy hh12:mi:ss am",DEPTNO',
8 format => json_object( 'type' value 'csv' ,
9 'partition_columns' value json_array( json_object('name' value 'DEPTNO', 'type' value 'number') )
10 ) );
11 end;
12 /
demo-user@ATP19C> select * from emp_demo where deptno = 10;
----------------------------------------------------------
Plan hash value: 636994847
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 82 | 4592 | 8 (0)| 00:00:01 | | |
| 1 | PX COORDINATOR | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 82 | 4592 | 8 (0)| 00:00:01 | | |
| 3 | PX BLOCK ITERATOR | | 82 | 4592 | 8 (0)| 00:00:01 | 1 | 1 |
| 4 | EXTERNAL TABLE ACCESS FULL| EMP_DEMO | 82 | 4592 | 8 (0)| 00:00:01 | 1 | 1 |
----------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
( "EMPNO" NUMBER,
"ENAME" VARCHAR2(12) COLLATE "USING_NLS_COMP",
"SAL" NUMBER,
"HIREDATE" DATE,
"DEPTNO" NUMBER
) DEFAULT COLLATION "USING_NLS_COMP"
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "DATA_PUMP_DIR"
ACCESS PARAMETERS
( RECORDS DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE
READSIZE=10000000 CREDENTIAL 'my_demo_cred'
filename_columns="['DEPTNO']"
file_uri_list="https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/*"
credential_schema="DEMO_USER"
credential_name="MY_DEMO_CRED"
FIELDS CSV WITHOUT EMBEDDED NOTRIM ( empno,ename,sal,hiredate date mask "dd-mon-yyyy hh12:mi:ss am",DEPTNO )
)
REJECT LIMIT 0
PARTITION BY LIST ("DEPTNO")
(PARTITION "P1" VALUES ((10))
LOCATION
( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=10/*' ),
PARTITION "P2" VALUES ((20))
LOCATION
( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=20/*' ),
PARTITION "P3" VALUES ((30))
LOCATION
( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=30/*' ))
PARALLEL
2 from dbms_cloud.list_objects(
3 'my_demo_cred',
4 :uri||:namespace||'/b/MY_DEMO_BUCKET/o/DEMO03/');
----------------------------------- ----------
DEPTNO=10/dept10.csv 129
DEPTNO=20/dept20.csv 213
DEPTNO=30/dept30.csv 258
DEPTNO=40/dept_40.csv 83
2 from emp_demo
3 group by deptno;
---------- ----------
30 6
10 3
20 5
2 from emp_demo
3 group by deptno;
---------- ----------
40 2
30 6
10 3
20 5
----------------- ---------------------------------------------------------------
( "EMPNO" NUMBER,
....
....
"DEPTNO" NUMBER
) DEFAULT COLLATION "USING_NLS_COMP"
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "DATA_PUMP_DIR"
ACCESS PARAMETERS
( RECORDS DELIMITED BY DETECTED NEWLINE NOLOGFILE NOBADFILE NODISCARDFILE
....
.... )
)
REJECT LIMIT 0
PARTITION BY LIST ("DEPTNO")
(PARTITION "P1" VALUES (10)
....
....
PARTITION "P4" VALUES (40)
( 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/idcglquusbz6/b/MY_DEMO_BUCKET/o/DEMO03/DEPTNO=40/*'
))
PARALLEL
No comments:
Post a Comment