Streaming offers an alternative way to transform data. During a streaming job, the Hadoop
Streaming API opens an I/O pipe to an external process. Data is passed to the
process, which reads it from standard input, operates on it, and writes the
results to standard output, where the Streaming API job picks them up.
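Conceptually, the external process is just a program that reads tab-delimited rows from standard input and writes tab-delimited rows to standard output. A minimal sketch in Python (a hypothetical stand-in for /bin/cat, not part of the examples below) might look like this:

```python
import sys

def transform(line):
    # Identity transform: echo each tab-delimited input row unchanged.
    # Hive sends one row per line, with columns separated by tabs.
    return line.rstrip("\n")

if __name__ == "__main__":
    for line in sys.stdin:
        print(transform(line))
```

Any executable that follows this read-a-line, write-a-line contract can be plugged into a streaming job.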
Identity Transformation
The most basic streaming job is an identity operation: the /bin/cat command simply echoes
the data sent to it.
hive (scott)> SELECT TRANSFORM (ename, sal) USING '/bin/cat' AS (newEname, newSal) FROM scott.emp;
Changing Types
The return columns from TRANSFORM are typed as strings by default. There is an alternative syntax that casts the results to different types.
hive (scott)> SELECT TRANSFORM (ename, sal) USING '/bin/cat' AS (newEname STRING, newSal DOUBLE) FROM scott.emp;
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.33 sec HDFS Read: 5439 HDFS Write: 435 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 330 msec
OK
newEname newSal
SMITH 800.0
ALLEN 1600.0
WARD 1250.0
JONES 2975.0
MARTIN 1250.0
BLAKE 2850.0
CLARK 2450.0
SCOTT 3000.0
KING 5000.0
TURNER 1500.0
ADAMS 1100.0
JAMES 950.0
FORD 3000.0
MILLER 1300.0
Time taken: 20.392 seconds, Fetched: 14 row(s)
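The transform need not be an identity operation. As an illustrative sketch (the script name and logic here are hypothetical, not from the example above), a Python script that upper-cases each name and applies a 10% raise could be shipped to the cluster with ADD FILE and referenced in the USING clause in place of '/bin/cat':

```python
import sys

def give_raise(line):
    # Parse a tab-delimited (ename, sal) row, upper-case the name,
    # and apply a 10% raise to the salary.
    ename, sal = line.rstrip("\n").split("\t")
    return "%s\t%.1f" % (ename.upper(), float(sal) * 1.1)

if __name__ == "__main__":
    for line in sys.stdin:
        print(give_raise(line))
```

Because the script emits tab-delimited text, the AS clause can still cast the second column to DOUBLE, exactly as in the query above.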