Introduction
Apache Sqoop is a Hadoop ecosystem tool (a Hadoop client) designed to efficiently transfer bulk data between Apache Hadoop and structured datastores such as Oracle. It helps offload certain tasks (such as ETL processing) from the enterprise data warehouse (EDW) to Hadoop for efficient execution at a much lower cost. It can also extract data from Hadoop and export it into external structured datastores.
It provides two main tools, import and export, to facilitate data transfers, along with several other utility tools.
The import tool imports individual tables from an RDBMS into HDFS. Each row of a table is treated as a record in HDFS. Records are stored as text data in text files or as binary data in Avro and Sequence files.
The export tool exports data back from HDFS to the RDBMS. The target table must already exist in the target database. The default operation is to insert all records from the input files into the database table using INSERT statements. In update mode, Sqoop generates UPDATE statements that replace existing records in the database.
Installing/Configuring Sqoop 1.4.6
First download the latest stable Sqoop release; you can find it at the location below.
http://www-eu.apache.org/dist/sqoop/1.4.6/
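For example, a minimal download-and-extract sequence might look like the following (the exact archive name depends on the Hadoop build you pick, and /usr/hadoopsw is simply the install path used throughout this post):
[root@en01 ~]# cd /usr/hadoopsw
[root@en01 hadoopsw]# wget http://www-eu.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-0.23.tar.gz
[root@en01 hadoopsw]# tar -xzf sqoop-1.4.6.bin__hadoop-0.23.tar.gz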
[root@en01 ~]# chown -R hdpclient:hadoop_edge /usr/hadoopsw/sqoop-1.4.6.bin__hadoop-0.23
[root@en01 ~]# chmod -R 777 /usr/hadoopsw/sqoop-1.4.6.bin__hadoop-0.23
Configure the Sqoop-related environment variables.
vi ~/.bash_profile
### SQOOP Variables
export SQOOP_HOME=/usr/hadoopsw/sqoop-1.4.6.bin__hadoop-0.23
export PATH=$PATH:$SQOOP_HOME/bin
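Reload the profile so the new variables take effect in the current session (or simply log in again):
[hdpclient@en01 ~]$ source ~/.bash_profile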
[hdpclient@en01 bin]$ cd $SQOOP_HOME/conf
[hdpclient@en01 conf]$ mv sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/hadoopsw/hadoop-2.7.3
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/hadoopsw/hadoop-2.7.3
#set the path to where bin/hbase is available
#export HBASE_HOME=
#Set the path to where bin/hive is available
#export HIVE_HOME=
#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
[hdpclient@en01 ~]$ sqoop-version
.....
17/06/07 11:21:01 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Sqoop 1.4.6
git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
Compiled by root on Mon Apr 27 14:20:17 CST 2015
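You can also list the available Sqoop tools (import, export, eval, codegen, and so on) with the built-in help command:
[hdpclient@en01 ~]$ sqoop help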
Download the JDBC driver for your database and place it in $SQOOP_HOME/lib. You can download drivers from the locations below. I'm using Oracle for the purposes of this post.
Oracle JDBC Driver
http://www.oracle.com/technetwork/database/enterprise-edition/jdbc-112010-090769.html
Microsoft SQL Server JDBC Driver
http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774
PostgreSQL JDBC Driver
http://jdbc.postgresql.org/download.html
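For example, for the Oracle driver you would copy the downloaded jar into the Sqoop lib directory (the jar name, ojdbc6.jar here, depends on the driver version you download):
[hdpclient@en01 ~]$ cp ojdbc6.jar $SQOOP_HOME/lib/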
Import data from Oracle database to Hadoop HDFS
Create location in HDFS to store imported data
[hdpsysuser@nn01 ~]$ hdfs dfs -mkdir /oradata
[hdpsysuser@nn01 ~]$ hdfs dfs -chown hdpclient:hadoop_edge /oradata
Create a table in Oracle schema (hdfsuser in my case) to test the sqoop import.
SQL> create table test(id number, name varchar2(10));
SQL> insert into test values(&id,&name);
SQL> select * from test;
ID NAME
---------- ----------
1 Ali
2 Abuzar
3 Zeeshan
4 Hafiz
5 Ikram
6 Usman
6 rows selected.
Run the sqoop import now
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --table TEST --username hdfsuser --password hdfsuser --target-dir /oradata/test --columns "id,name" -m 1
If your table has no primary key defined, you must either use the -m 1 option or provide the --split-by argument with a column name when importing; otherwise Sqoop throws an error. You can also import a subset of a table using a where clause.
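For example, a parallel import of a subset of the same TEST table, split on the id column, might look like this (a sketch only; the /oradata/test_subset target directory is just an illustration and this command is not run in this post):
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --table TEST --username hdfsuser --password hdfsuser --target-dir /oradata/test_subset --columns "id,name" --split-by id --where "id > 3" -m 2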
[hdpclient@en01 conf]$ sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --table TEST --username hdfsuser --password hdfsuser --target-dir /oradata/test --columns "id,name" -m 1
Warning: /usr/hadoopsw/sqoop-1.4.6.bin__hadoop-0.23/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /usr/hadoopsw/sqoop-1.4.6.bin__hadoop-0.23/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr/hadoopsw/sqoop-1.4.6.bin__hadoop-0.23/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /usr/hadoopsw/sqoop-1.4.6.bin__hadoop-0.23/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
17/06/07 12:03:50 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
17/06/07 12:03:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/06/07 12:03:50 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
17/06/07 12:03:50 INFO manager.SqlManager: Using default fetchSize of 1000
17/06/07 12:03:50 INFO tool.CodeGenTool: Beginning code generation
17/06/07 12:04:16 INFO manager.OracleManager: Time zone has been set to GMT
17/06/07 12:04:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM TEST t WHERE 1=0
17/06/07 12:04:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hadoopsw/hadoop-2.7.3
Note: /tmp/sqoop-hdpclient/compile/e2e81658abee225ed66486494c9a67b9/TEST.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/06/07 12:04:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdpclient/compile/e2e81658abee225ed66486494c9a67b9/TEST.jar
17/06/07 12:04:17 INFO manager.OracleManager: Time zone has been set to GMT
17/06/07 12:04:17 INFO mapreduce.ImportJobBase: Beginning import of TEST
17/06/07 12:04:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/06/07 12:04:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/06/07 12:04:18 INFO client.RMProxy: Connecting to ResourceManager at nn01/192.168.44.133:8032
17/06/07 12:04:22 INFO db.DBInputFormat: Using read commited transaction isolation
17/06/07 12:04:22 INFO mapreduce.JobSubmitter: number of splits:1
17/06/07 12:04:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1494154575009_0312
17/06/07 12:04:22 INFO impl.YarnClientImpl: Submitted application application_1494154575009_0312
17/06/07 12:04:22 INFO mapreduce.Job: The url to track the job: http://nn01:8088/proxy/application_1494154575009_0312/
17/06/07 12:04:22 INFO mapreduce.Job: Running job: job_1494154575009_0312
17/06/07 12:04:29 INFO mapreduce.Job: Job job_1494154575009_0312 running in uber mode : false
17/06/07 12:04:29 INFO mapreduce.Job: map 0% reduce 0%
17/06/07 12:04:35 INFO mapreduce.Job: map 100% reduce 0%
17/06/07 12:04:36 INFO mapreduce.Job: Job job_1494154575009_0312 completed successfully
17/06/07 12:04:36 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=138171
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=49
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=3686
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=3686
Total vcore-milliseconds taken by all map tasks=3686
Total megabyte-milliseconds taken by all map tasks=3774464
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=60
CPU time spent (ms)=1170
Physical memory (bytes) snapshot=179765248
Virtual memory (bytes) snapshot=2166857728
Total committed heap usage (bytes)=146800640
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=49
17/06/07 12:04:36 INFO mapreduce.ImportJobBase: Transferred 49 bytes in 17.5645 seconds (2.7897 bytes/sec)
17/06/07 12:04:36 INFO mapreduce.ImportJobBase: Retrieved 6 records.
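Note the warning near the top of the log about supplying --password on the command line. Outside a sandbox you can use -P to be prompted for the password interactively, or --password-file to read it from a protected file; for example (a sketch of the same import using -P, not run in this post):
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --table TEST --username hdfsuser -P --target-dir /oradata/test --columns "id,name" -m 1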
Verify imported data in HDFS
[hdpsysuser@nn01 ~]$ hdfs dfs -ls /oradata/test
Found 2 items
-rw-r--r-- 3 hdpclient hadoop_edge 0 2017-06-07 12:03 /oradata/test/_SUCCESS
-rw-r--r-- 3 hdpclient hadoop_edge 49 2017-06-07 12:03 /oradata/test/part-m-00000
[hdpsysuser@nn01 ~]$ hadoop fs -cat /oradata/test/part-m-00000
1,Ali
2,Abuzar
3,Zeeshan
4,Hafiz
5,Ikram
6,Usman
Incremental Import
With an incremental import you import only the newly added rows of a table. You need to add the --incremental, --check-column, and --last-value options to perform an incremental import.
Insert new data in Oracle to test the incremental import
SQL> insert into test values(7,'Hassan');
Run sqoop incremental import
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --table TEST --username hdfsuser --password hdfsuser --target-dir /oradata/test --columns "id,name" -m 1 --incremental append --check-column id --last-value 6
....
....
17/06/07 12:39:49 INFO mapreduce.ImportJobBase: Transferred 9 bytes in 28.3266 seconds (0.3177 bytes/sec)
17/06/07 12:39:49 INFO mapreduce.ImportJobBase: Retrieved 1 records.
17/06/07 12:39:49 INFO util.AppendUtils: Appending to directory test
17/06/07 12:39:49 INFO util.AppendUtils: Using found partition 1
17/06/07 12:39:49 INFO tool.ImportTool: Incremental import complete! To run another incremental import of all data following this import, supply the following arguments:
17/06/07 12:39:49 INFO tool.ImportTool: --incremental append
17/06/07 12:39:49 INFO tool.ImportTool: --check-column id
17/06/07 12:39:49 INFO tool.ImportTool: --last-value 7
17/06/07 12:39:49 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')
Verify incremental import
[hdpsysuser@nn01 ~]$ hdfs dfs -ls /oradata/test
Found 3 items
-rw-r--r-- 3 hdpclient hadoop_edge 0 2017-06-07 12:03 /oradata/test/_SUCCESS
-rw-r--r-- 3 hdpclient hadoop_edge 49 2017-06-07 12:03 /oradata/test/part-m-00000
-rw-r--r-- 3 hdpclient supergroup 9 2017-06-07 12:38 /oradata/test/part-m-00001
[hdpsysuser@nn01 ~]$ hadoop fs -cat /oradata/test/part-m-*
1,Ali
2,Abuzar
3,Zeeshan
4,Hafiz
5,Ikram
6,Usman
7,Hassan
[hdpsysuser@nn01 ~]$ hadoop fs -cat /oradata/test/part-m-*1
7,Hassan
Import all tables
When importing all tables, each table's data is stored in a separate directory whose name is the same as the table name, and every table in the database must have a primary key field. Note that import-all-tables takes --warehouse-dir (the parent directory under which the per-table directories are created) rather than --target-dir.
sqoop import-all-tables --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --username hdfsuser --password hdfsuser --warehouse-dir /oradata
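If you want to skip some tables, import-all-tables also accepts --exclude-tables with a comma-separated list, for example (a sketch; the excluded table names here are just an illustration):
sqoop import-all-tables --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --username hdfsuser --password hdfsuser --warehouse-dir /oradata --exclude-tables TEST2,EXP_FROM_HDFS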
Import data from RDBMS into hive
Create a second table in the Oracle schema to test the import into a Hive table from Oracle.
SQL> create table test2(id int,name varchar2(10));
SQL> insert into test2 select * from test;
SQL> commit;
Create a table in Hive to hold the data from the sqoop import.
If you have not already created the table in Hive, you can instead use the --create-hive-table option in the sqoop command (shown in the second example below).
hive (scott)> create table ora_imp_hive (id int,name string);
Now run the Sqoop import; you need to specify the --hive-import and --hive-table options in the sqoop command.
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --table TEST2 --username hdfsuser --password hdfsuser --target-dir /oradata/testhive --hive-import --hive-table ora_imp_hive -m 1
sqoop import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --table TEST2 --username hdfsuser --password hdfsuser --target-dir /oradata/test3 --hive-import --hive-table ora_imp_hive3 --create-hive-table -m 1
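Once the import finishes you can verify the rows from the Hive shell, for example (a quick check; the output is not shown in this post):
hive (scott)> select * from ora_imp_hive;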
Export data back from HDFS to the RDBMS
We already have a file in HDFS at the location below and want to export it into the Oracle database.
[hdpsysuser@nn01 ~]$ hadoop fs -ls /oradata/test/
Found 3 items
-rw-r--r-- 3 hdpclient hadoop_edge 0 2017-06-07 12:03 /oradata/test/_SUCCESS
-rw-r--r-- 3 hdpclient hadoop_edge 49 2017-06-07 12:03 /oradata/test/part-m-00000
-rw-r--r-- 3 hdpclient supergroup 9 2017-06-07 12:38 /oradata/test/part-m-00001
[hdpsysuser@nn01 ~]$ hadoop fs -cat /oradata/test/part-m-*1
7,Hassan
You need to create the table in the Oracle schema first, and then you can export to this table.
SQL> create table exp_from_hdfs(id number,name varchar2(10));
Run the sqoop export command
sqoop export --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --username hdfsuser --password hdfsuser --table EXP_FROM_HDFS --export-dir /oradata/test
......
......
Map-Reduce Framework
Map input records=7
Map output records=7
Input split bytes=606
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=367
CPU time spent (ms)=8960
Physical memory (bytes) snapshot=772395008
Virtual memory (bytes) snapshot=8645033984
Total committed heap usage (bytes)=588775424
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
17/06/07 16:12:57 INFO mapreduce.ExportJobBase: Transferred 746 bytes in 240.0369 seconds (3.1079 bytes/sec)
17/06/07 16:12:57 INFO mapreduce.ExportJobBase: Exported 7 records.
Verify from SQLPLUS
SQL> select * from exp_from_hdfs;
ID NAME
---------- ----------
7 Hassan
1 Ali
2 Abuzar
5 Ikram
6 Usman
3 Zeeshan
4 Hafiz
7 rows selected.
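As noted in the introduction, exports can also run in update mode. For example, to update existing rows in EXP_FROM_HDFS matched on the ID column instead of inserting new ones, you could use something like the following (a sketch only, not run in this post):
sqoop export --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --username hdfsuser --password hdfsuser --table EXP_FROM_HDFS --export-dir /oradata/test --update-key ID --update-mode updateonly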
Other Sqoop Tools
Sqoop - Job
The sqoop job tool creates and saves import and export commands, along with the parameters needed to identify and recall a saved job. Re-calling or re-executing a saved job is especially useful for incremental imports, which bring only the newly added rows from an RDBMS table into HDFS.
sqoop job --create import_from_oracle -- import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --table TEST --username hdfsuser --password hdfsuser --target-dir /oradata/test --columns "id,name" -m 1 --incremental append --check-column id --last-value 7
List all jobs in the meta store
[hdpclient@en01 ~]$ sqoop job --list
.....
17/06/07 16:45:44 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Available jobs:
import_from_oracle
Shows metadata information about your job
[hdpclient@en01 ~]$ sqoop job --show import_from_oracle
......
Enter password:
Job: import_from_oracle
Tool: import
Options:
----------------------------
verbose = false
incremental.last.value = 7
db.connect.string = jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
hbase.create.table = false
db.require.password = true
hdfs.append.dir = true
db.table = TEST
codegen.input.delimiters.escape = 0
import.fetch.size = null
accumulo.create.table = false
codegen.input.delimiters.enclose.required = false
db.username = hdfsuser
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = id
codegen.input.delimiters.record = 0
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = true
db.column.list = id,name
relaxed.isolation = false
mapreduce.num.mappers = 1
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = AppendRows
hdfs.file.format = TextFile
codegen.compile.dir = /tmp/sqoop-hdpclient/compile/bd7d0040bdbedafdeb502e0f61bcf3e8
direct.import = false
hdfs.target.dir = /oradata/test
hive.fail.table.exists = false
db.batch = false
Execute Job
--exec option is used to execute a saved job.
[hdpclient@en01 ~]$ sqoop job --exec import_from_oracle
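A saved job can also be removed from the metastore when it is no longer needed:
[hdpclient@en01 ~]$ sqoop job --delete import_from_oracle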
Sqoop - Codegen tool
The codegen tool generates the DAO class automatically; it contains the getter and setter methods for the columns of a database table. Its main use is to regenerate the Java code if the class generated during an earlier import has been lost; it creates a new version of the class using the default delimiters between fields.
[hdpclient@en01 lib]$ sqoop codegen -connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --table TEST --username hdfsuser --password hdfsuser
......
......
17/06/08 13:05:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM TEST t WHERE 1=0
17/06/08 13:05:52 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hadoopsw/hadoop-2.7.3
Note: /tmp/sqoop-hdpclient/compile/aa603ae3cb66f71003ac49b95b74136c/TEST.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/06/08 13:05:53 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdpclient/compile/aa603ae3cb66f71003ac49b95b74136c/TEST.jar
Verify generated class
[hdpclient@en01 lib]$ ll /tmp/sqoop-hdpclient/compile/aa603ae3cb66f71003ac49b95b74136c
total 16
-rw-rw-r--. 1 hdpclient hdpclient 8684 Jun 8 13:05 TEST.class
-rw-rw-r--. 1 hdpclient hdpclient 3626 Jun 8 13:05 TEST.jar
Sqoop - Eval tool
The eval tool allows users to execute user-defined queries against the database server and preview the result in the console. You can evaluate any type of SQL, whether a DDL or a DML statement.
[hdpclient@en01 lib]$ sqoop eval -connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --username hdfsuser --password hdfsuser --query "SELECT * FROM test WHERE rownum <= 3"
.....
17/06/08 13:18:19 INFO manager.OracleManager: Time zone has been set to GMT
-------------------------------------
| ID | NAME |
-------------------------------------
| 1 | Ali |
| 2 | Abuzar |
| 3 | Zeeshan |
-------------------------------------
[hdpclient@en01 lib]$ sqoop eval -connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --username hdfsuser --password hdfsuser -e "INSERT INTO test VALUES(11,'Mohsin')"
....
....
17/06/08 13:21:32 INFO tool.EvalSqlTool: 1 row(s) updated.
SQL> select * from test;
ID NAME
---------- ----------
1 Ali
2 Abuzar
3 Zeeshan
4 Hafiz
5 Ikram
6 Usman
7 Hassan
11 Mohsin
8 rows selected.
Sqoop - List Database tool
The sqoop list-databases tool lists the databases present on the server. In the case of Oracle it shows the schemas.
[hdpclient@en01 lib]$ sqoop list-databases -connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --username system --password manager
SYS
SYSTEM
....
....
HDFSUSER
Sqoop - List Tables tool
List Tables tool lists out the tables of a particular database/schema.
[hdpclient@en01 lib]$ sqoop list-tables --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=en01)(port=1521))(connect_data=(service_name=hadob)))" --username hdfsuser --password hdfsuser
...
...
CATALOG_EXT
TEST
TEST2
EXP_FROM_HDFS
Connecting with SQLServer
Put the SQL Server JDBC driver in $SQOOP_HOME/lib and then you can run the same sqoop commands as above against SQL Server; below is one example.
sqoop eval --connect "jdbc:sqlserver://sqlserver01;database=DBA" --username hdfsuser --password hdfsuser --query "SELECT * FROM [DBA].[dbo].[test]"