Streaming offers an alternative way to transform data. During a streaming job, the Hadoop
Streaming API opens an I/O pipe to an external process. Data is passed to the
process, which reads it from standard input, operates on it, and writes the
results to standard output, where the Streaming API job picks them up.
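Conceptually, the external process is just a program that reads tab-delimited rows from standard input and writes tab-delimited rows to standard output. A minimal sketch in Python (a hypothetical stand-in for /bin/cat, not part of the examples below) might look like this:

```python
import sys

def transform(line):
    # Identity transform: echo each tab-delimited input row unchanged.
    # Hive sends one row per line, with columns separated by tabs.
    return line.rstrip("\n")

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform(line))
```

Any executable that follows this read-a-line, write-a-line contract can be plugged into a streaming job.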
Identity Transformation
The most basic streaming job is an identity operation: the /bin/cat command simply echoes
the data sent to it.
hive (scott)> SELECT TRANSFORM (ename, sal) USING '/bin/cat' AS (newEname, newSal) FROM scott.emp;
Changing Types
The return columns from TRANSFORM are typed as strings by default. There is an alternative syntax that casts the results to different types.
hive (scott)> SELECT TRANSFORM (ename, sal) USING '/bin/cat' AS (newEname STRING, newSal DOUBLE) FROM scott.emp;
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.33 sec HDFS Read: 5439 HDFS Write: 435 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 330 msec
OK
newEname newSal
SMITH 800.0
ALLEN 1600.0
WARD 1250.0
JONES 2975.0
MARTIN 1250.0
BLAKE 2850.0
CLARK 2450.0
SCOTT 3000.0
KING 5000.0
TURNER 1500.0
ADAMS 1100.0
JAMES 950.0
FORD 3000.0
MILLER 1300.0
Time taken: 20.392 seconds, Fetched: 14 row(s)
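The transform need not be an identity operation. As an illustrative sketch (the script name and logic here are hypothetical, not from the example above), a Python script that upper-cases each name and applies a 10% raise could be shipped to the cluster with ADD FILE and referenced in the USING clause in place of '/bin/cat':

```python
import sys

def give_raise(line):
    # Parse a tab-delimited (ename, sal) row, upper-case the name,
    # and apply a 10% raise to the salary.
    ename, sal = line.rstrip("\n").split("\t")
    return "%s\t%.1f" % (ename.upper(), float(sal) * 1.1)

if __name__ == "__main__":
    for line in sys.stdin:
        print(give_raise(line))
```

Because the script emits tab-delimited text, the AS clause can still cast the second column to DOUBLE, exactly as in the query above.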