Partitioned external tables were introduced in Oracle Database 12c Release 2 (12.2), allowing external tables to benefit from partition pruning and partition-wise joins. With the exception of hash partitioning, many partitioning and subpartitioning strategies are supported with some restrictions. In this post I've created a test to get better performance of external table over HDFS.
Pre-requisites
Configuring NFS Gateway for HDFS [HDP] |
Creating Oracle External Table (12c) on HDFS using HDP NFS Gateway |
1- Create some data for the partitions for external table, I've chosen user_objects table for the purpose of this post.
SQL> select distinct trunc(created) from sys.user_objects order by 1;
TRUNC(CREATED)
------------------
26-JAN-17
18-MAY-17
19-MAY-17
29-MAY-17
...
31-DEC-17
01-JAN-18
21 rows selected.
SQL> select count(*) from sys.user_objects where trunc(created) = '26-JAN-17';
COUNT(*)
----------
51020
SQL> select count(*) from sys.user_objects where trunc(created) > '26-JAN-17';
COUNT(*)
----------
470
2- I have selected 2 dates (for 2 partitions to be created ) for our test. Now create the files, use the SQL CREATE TABLE AS SELECT statement to select the correct rows for the partition and then write those rows into the file using ORACLE_DATAPUMP driver.
create table xt_uo_dp_p1
organization external
(type oracle_datapump default directory local_os_dir location('xt_uo_dp_p1.dmp'))
as
select *
from sys.user_objects p1
where trunc(created) = '26-JAN-17'
At this point now data for partition 1 is prepared on local file system. Prepare the data for the partition 2 also.
create table xt_uo_dp_p2
organization external
(type oracle_datapump default directory local_os_dir location('xt_uo_dp_p2.dmp'))
as
select *
from sys.user_objects p2
where trunc(created) > '26-JAN-17'
3- Verify the files created by the above create table statements
SQL> host ls -ls /data/ora_ext_tab_dir/xt*
6836 -rw-r----- 1 oracle oinstall 7000064 Jan 2 08:40 /data/ora_ext_tab_dir/xt_uo_dp_p1.dmp
76 -rw-r----- 1 oracle oinstall 77824 Jan 2 08:43 /data/ora_ext_tab_dir/xt_uo_dp_p2.dmp
You can select from each of these external tables to verify that it has the data you intended to write out to HDFS and to be used by new external table. After you have executed the SQL CREATE TABLE AS SELECT statement, you can drop these external tables.
select count(*) from xt_uo_dp_p1
select count(*) from xt_uo_dp_p2
drop table xt_uo_dp_p1
drop table xt_uo_dp_p2
4- After dropping the above 2 tables,if for some reasons, you want to create the external table on already existing data pump access driver file. You can use below statement.
create table xt_uo_dp_p2
(
OBJECT_NAME VARCHAR2(128)
,SUBOBJECT_NAME VARCHAR2(128)
,OBJECT_ID NUMBER
,DATA_OBJECT_ID NUMBER
,OBJECT_TYPE VARCHAR2(23)
,CREATED DATE
,LAST_DDL_TIME DATE
,TIMESTAMP VARCHAR2(19)
,STATUS VARCHAR2(7)
,TEMPORARY VARCHAR2(1)
,GENERATED VARCHAR2(1)
,SECONDARY VARCHAR2(1)
,NAMESPACE NUMBER
,EDITION_NAME VARCHAR2(128)
,SHARING VARCHAR2(18)
,EDITIONABLE VARCHAR2(1)
,ORACLE_MAINTAINED VARCHAR2(1)
,APPLICATION VARCHAR2(1)
,DEFAULT_COLLATION VARCHAR2(100)
,DUPLICATED VARCHAR2(1)
,SHARDED VARCHAR2(1)
,CREATED_APPID NUMBER
,CREATED_VSNID NUMBER
,MODIFIED_APPID NUMBER
,MODIFIED_VSNID NUMBER
)
organization external
(
type oracle_datapump
default directory local_os_dir
location('xt_uo_dp_p2.dmp')
);
5- Copy the .dmp files created earlier to HDFS directory location (NFS_DIR)
SQL> host cp /data/ora_ext_tab_dir/xt* /data/hdfsloc/data/oraclenfs
6- Create the new partitioned ORACLE_DATAPUMP table, the table is partitioned on the CREATED column. A table may be partitioned by RANGE, HASH, LIST, SYSTEM, or Composite Range-Hash/List/Range (R+H/L/R).
drop table xt_uo_dp_nfs;
create table xt_uo_dp_nfs
(
OBJECT_NAME VARCHAR2(128)
,SUBOBJECT_NAME VARCHAR2(128)
,OBJECT_ID NUMBER
,DATA_OBJECT_ID NUMBER
,OBJECT_TYPE VARCHAR2(23)
,CREATED DATE
,LAST_DDL_TIME DATE
,TIMESTAMP VARCHAR2(19)
,STATUS VARCHAR2(7)
,TEMPORARY VARCHAR2(1)
,GENERATED VARCHAR2(1)
,SECONDARY VARCHAR2(1)
,NAMESPACE NUMBER
,EDITION_NAME VARCHAR2(128)
,SHARING VARCHAR2(18)
,EDITIONABLE VARCHAR2(1)
,ORACLE_MAINTAINED VARCHAR2(1)
,APPLICATION VARCHAR2(1)
,DEFAULT_COLLATION VARCHAR2(100)
,DUPLICATED VARCHAR2(1)
,SHARDED VARCHAR2(1)
,CREATED_APPID NUMBER
,CREATED_VSNID NUMBER
,MODIFIED_APPID NUMBER
,MODIFIED_VSNID NUMBER
)
organization external
(
type oracle_datapump
default directory NFS_DIR
)
partition by RANGE (CREATED)
(
partition p1 VALUES LESS THAN (TO_DATE('27-JAN-2017','DD-MON-YYYY')) location('xt_uo_dp_p1.dmp'),
partition p2 VALUES LESS THAN (TO_DATE('01-JAN-2018','DD-MON-YYYY')) location('xt_uo_dp_p2.dmp')
);
7- Set the parallelism of external table
alter table xt_uo_dp_nfs parallel (degree 8);
8- Check the table with queries
--Query the partition 1
SQL> select count(*) from sys.xt_uo_dp_nfs where created < TO_DATE('27-JAN-2017','DD-MON-YYYY')
2 /
COUNT(*)
----------
51020
Elapsed: 00:00:00.65
--Query partition 2
SQL> select count(*) from sys.xt_uo_dp_nfs where created > TO_DATE('27-JAN-2017','DD-MON-YYYY') and created < TO_DATE('01-JAN-2018','DD-MON-YYYY');
COUNT(*)
----------
471
Elapsed: 00:00:00.40
9- If in future you need to add another partition or modify existing location to existing external table, you need to drop and create external table again with newly added partitions' dmp files or modified locations.
No comments:
Post a Comment