DBMentors - Inam Bukhari's Blog: Hive for Oracle Developers and DBAs

In the first Hive post we have discussed the basic usage and functionality of Hive , today we move forward and discuss some advance functionality. I'll cover Collection, Tables and Partitions in this post.

Collection Data Types

STRUCT

Analogous to a C struct or an “object.” Fields can be accessed using the “dot” notation. For example, if a column name is of type STRUCT {first STRING; last STRING}, then the first name field can be referenced using name.first.

MAP

collection of key-value tuples, where the fields are accessed using array notation (e.g., ['key']). For example, if a column name is of type MAP with key→value pairs 'first'→'Abuzar' and 'last'→'Ali', then the last name can be referenced using name['last'].

ARRAY

Ordered sequences of the same type that are indexable using zero-based integers. For example, if a column name is of type ARRAY of strings with the value ['Abuzar', 'Ali'], then the second element can be referenced using name[1].

More about Tables

Hive offers significant extensions to support a wide range of flexibility where the data files for tables are stored, the formats used, etc.

TBLPROPERTIES is to add additional documentation in a key-value format, TBLPROPERTIES can be used to express essential metadata about the database connection.

Managed Tables: Also called internal tables, because Hive controls the life cycle of their data. When we drop a managed table , Hive deletes the data in the table. Managed tables are less convenient for sharing with other tools. For example, suppose we have data that is created and used primarily by Pig or other tools, but we want to run some queries against it.

CREATE TABLE IF NOT EXISTS employees(

name STRING COMMENT 'Employee name',

salary FLOAT COMMENT 'Employee salary',

subordinates ARRAY<STRING> COMMENT 'Names of subordinates',

deductions MAP<STRING, FLOAT> COMMENT 'Keys are deductions names, values are percentages',

address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> COMMENT 'Home address')

COMMENT 'Table holds information about employees'

row format delimited

fields terminated by ','

collection items terminated by '$'

map keys terminated by '#'

LOCATION '/userdata/employees'

TBLPROPERTIES ('creator'='Inam', 'created_at'='2017-02-17 12:55:00','project'='Hive POC')

;

Get Table information:

hive (scott)> desc extended employees;

Replacing EXTENDED with FORMATTED provides more readable but also more verbose output.

hive (scott)> desc formatted employees;

Hive automatically adds two table properties: last_modified_by holds the username of the last user to modify the table, and last_modified_time holds the epoch time in seconds of that modification.

hive (scott)> SHOW TBLPROPERTIES employees;

prpt_name prpt_value

comment Table holds information about employees

created_at 2017-02-17 12:55:00

creator Inam

project Hive POC

transient_lastDdlTime 1493125674

Time taken: 0.251 seconds, Fetched: 5 row(s)

Prepare data for the employees (complex) table (with collection columns)

Create the test csv file for the employees table to be loaded later

employees.csv (located at /tmp/mydata)

Abuzar,1000,Ikram$Usman$,pf#3.5$loan#9.4,College Rd.$Lahore$Punjab$41000

Ali,2000,Ikram$Usman$,pf#5.5$loan#7.4,College Rd.$Lahore$Punjab$41000

Load data into employees (complex) table

hive (scott)> load data local inpath '/tmp/mydata/employees.csv' into table scott.employees;

Loading data to table scott.employees

Time taken: 0.227 seconds

view employees (complex) table data

hive (scott)> select name,salary,subordinates,deductions from employees;

name salary subordinates deductions

Abuzar 1000.0 ["Ikram","Usman",""] {"pf":3.5,"loan":9.4}

Ali 2000.0 ["Ikram","Usman",""] {"pf":5.5,"loan":7.4}

Time taken: 0.177 seconds, Fetched: 2 row(s)

hive (scott)> select name,salary,subordinates,deductions,address from employees;

name salary subordinates deductions address

Abuzar 1000.0 ["Ikram","Usman",""] {"pf":3.5,"loan":9.4} {"street":"College Rd.","city":"Lahore","state":"Punjab","zip":41000}

Ali 2000.0 ["Ikram","Usman",""] {"pf":5.5,"loan":7.4} {"street":"College Rd.","city":"Lahore","state":"Punjab","zip":41000}

Time taken: 0.13 seconds, Fetched: 2 row(s)

Load further data using put (uploading file into HDFS employees table directory)

hive (scott)> !hdfs dfs -put /tmp/mydata/employees2.csv /userdata/employees;

Query again ans you will see more rows (existing in employees2.csv)

hive (scott)> select name,salary,subordinates,deductions from employees;

name salary subordinates deductions

Abuzar 1000.0 ["Ikram","Usman",""] {"pf":3.5,"loan":9.4}

Ali 2000.0 ["Ikram","Usman",""] {"pf":5.5,"loan":7.4}

Ashraf 4000.0 ["Ikram","Usman",""] {"pf":3.5,"loan":9.4}

Mubeen 5000.0 ["Ikram","Usman",""] {"pf":5.5,"loan":7.4}

Time taken: 0.14 seconds, Fetched: 4 row(s)

Copy the schema (but not the data) of an existing table

hive (scott)> CREATE TABLE IF NOT EXISTS employees2 LIKE employees;

List the tables in other database

hive (scott)> SHOW TABLES IN scott;

tab_name

dept

dual

emp

emp2

employees

employees2

By Regular expression, Not all regular expression features are supported

hive (scott)> SHOW TABLES IN scott LIKE 'e.*';

tab_name

emp

emp2

employees

employees2

Time taken: 0.033 seconds, Fetched: 4 row(s)

External Tables: If we want to study data with many tools for different data sources (eg; datasource1,datasource2) , we have to ensure the schema we’ll use next matches the schemas of both these data sources.

Let’s assume the data files are in HDFS directory /data/stocks.

[hdpsysuser@hdpmaster ~]$ hdfs dfs -mkdir -p /data/stocks

[hdpsysuser@hdpmaster ~]$ hdfs dfs -chmod 777 /data/stocks

The following table declaration creates an external table that can read all the data files for this comma-delimited data in /data/stocks:

CREATE EXTERNAL TABLE IF NOT EXISTS stocks(

exchange_name STRING,

price_open FLOAT

)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

LOCATION '/data/stocks';

hive (scott)> select * from stocks; -- -- NO ROWS RETURNED

stocks.exchange_name stocks.price_open

Time taken: 0.119 seconds

Now I created 2 CSVs and placed them in /tmp/mydata folder on hive server to furhter put them in HDFS /data/stocks location

asia.csv

Karachi Stock Exchange,1000

Riyadh Stock Exchange,2000

us.csv

NY Stock Exchange,3000

Put asia.csv to HDFS

[hdpsysuser@hdpmaster ~]$ hdfs dfs -put /tmp/mydata/asia.csv /data/stocks

hive (scott)> select * from stocks;

stocks.exchange_name stocks.price_open

Karachi Stock Exchange 1000.0

Riyadh Stock Exchange 2000.0

Time taken: 0.157 seconds, Fetched: 2 row(s)

Put the second csv also and query again

[hdpsysuser@hdpmaster ~]$ hdfs dfs -put /tmp/mydata/us.csv /data/stocks

hive (scott)> select * from stocks;

stocks.exchange_name stocks.price_open

Karachi Stock Exchange 1000.0

Riyadh Stock Exchange 2000.0

NY Stock Exchange 3000.0

Time taken: 0.168 seconds, Fetched: 3 row(s)

Hive does not assume that external table owns the data. Therefore, dropping the table does not delete the data, although the metadata for the table will be deleted. Some HiveQL constructs are not permitted for external tables.

Partitioned Managed Tables: Often partitioning is used for distributing load horizontally, moving data physically closer to its most frequent users, and other purposes. Partitioned tables have important performance benefits, and they can help organize data in a logical fashion, such as hierarchically. Partitioning tables changes how Hive structures the data storage.

If we create below table in the scott database, there will be an employees_part directory for the table, Hive will create subdirectories reflecting the partitioning structure.

CREATE TABLE IF NOT EXISTS employees_part(

name STRING COMMENT 'Employee name',

salary FLOAT COMMENT 'Employee salary',

subordinates ARRAY<STRING> COMMENT 'Names of subordinates',

deductions MAP<STRING, FLOAT> COMMENT 'Keys are deductions names, values are percentages',

address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> COMMENT 'Home address')

COMMENT 'Table holds information about employees with partitions'

PARTITIONED BY (country STRING, state STRING)

row format delimited

fields terminated by ','

collection items terminated by '$'

map keys terminated by '#'

TBLPROPERTIES ('creator'='Inam', 'created_at'='2017-05-17 12:55:00','project'='Hive POC')

;

hive (scott)> describe employees_part;

col_name data_type comment

name string

salary float

subordinates array<string>

deductions map<string,float>

address struct<street:string,city:string,state:string,zip:int>

country string

state string

# Partition Information

# col_name data_type comment

country string

state string

Time taken: 0.037 seconds, Fetched: 13 row(s)

hive (scott)>

Observe above table description, columns declared in partitioned by clause have been added in the column list automatically.

List the content of the location of employees_part table

hive (scott)> !hdfs dfs -ls /user/hive/warehouse/scott.db/employees_part;

You see no folders as data has not been loaded yet.

hive (scott)> select name,country,state from employees_part;

name country state

Time taken: 0.205 seconds

Prepare data to be loaded into partition table

Create 3 CSVs like below.

employees_part1.csv

Abuzar,1000,Ikram$Usman$,pf#3.5$loan#9.4,College Rd.$Lahore$Punjab$41000,PAK,Punjab

Ali,2000,Ikram$Usman$,pf#5.5$loan#7.4,College Rd.$Lahore$Punjab$41000,PAK,Punjab

employees_part2.csv

Zeeshan,1000,Ikram$Usman$,pf#3.5$loan#9.4,College Rd.$Lahore$Punjab$41000,PAK,Sindh

Hafiz,2000,Ikram$Usman$,pf#5.5$loan#7.4,College Rd.$Lahore$Punjab$41000,PAK,Sindh

employees_part3.csv

Saleem,1000,Ikram$Usman$,pf#3.5$loan#9.4,College Rd.$Lahore$Punjab$41000,Saudia,Riyadh

Danish,2000,Ikram$Usman$,pf#5.5$loan#7.4,College Rd.$Lahore$Punjab$41000,Saudia,Qaseem

Load Data into partitioned table (Dynamic Partitions)

Loading csv data to a partitioned table involves below mentioned two steps:
1- Load csv file to a non-partitioned table.
2- Load non-partitioned table data to partitioned table.

Create non-partitioned table

CREATE TABLE IF NOT EXISTS employees_nonpart(

name STRING COMMENT 'Employee name',

salary FLOAT COMMENT 'Employee salary',

subordinates ARRAY<STRING> COMMENT 'Names of subordinates',

deductions MAP<STRING, FLOAT> COMMENT 'Keys are deductions names, values are percentages',

address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> COMMENT 'Home address',

country STRING,

state STRING

)

COMMENT 'Table holds information about employees to be used for partitions later'

row format delimited

fields terminated by ','

collection items terminated by '$'

map keys terminated by '#'

TBLPROPERTIES ('creator'='Inam', 'created_at'='2017-05-17 12:55:00','project'='Hive POC')

;

Now load data into non-partitioned table

hive (scott)> load data local inpath '/tmp/mydata/employees_part1.csv' into table scott.employees_nonpart;

Loading data to table scott.employees_nonpart

Time taken: 0.589 seconds

hive (scott)> load data local inpath '/tmp/mydata/employees_part2.csv' into table scott.employees_nonpart;

Loading data to table scott.employees_nonpart

Time taken: 0.28 seconds

hive (scott)> load data local inpath '/tmp/mydata/employees_part3.csv' into table scott.employees_nonpart;

Loading data to table scott.employees_nonpart

Time taken: 0.211 seconds

hive (scott)> select * from employees_nonpart;

Now load non-partitioned table data to partitioned table employees_part (created earlier)
To load partitioned table we use below commands:

hive (scott)> SET hive.exec.dynamic.partition = true;

hive (scott)> SET hive.exec.dynamic.partition.mode = nonstrict;

hive (scott)> INSERT OVERWRITE TABLE employees_part PARTITION(country, state) SELECT * from employees_nonpart;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Query ID = hdpclient_20170426145142_d2035d89-2b7c-4a26-a9f1-c34b60bbb2e7

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1493031715875_0014, Tracking URL = http://nn01:8088/proxy/application_1493031715875_0014/

Kill Command = /usr/hadoopsw/hadoop-2.7.3/bin/hadoop job -kill job_1493031715875_0014

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-04-26 14:51:49,155 Stage-1 map = 0%, reduce = 0%

2017-04-26 14:51:54,361 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.75 sec

MapReduce Total cumulative CPU time: 2 seconds 750 msec

Ended Job = job_1493031715875_0014

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to directory hdfs://nn01:9000/user/hive/warehouse/scott.db/employees_part/.hive-staging_hive_2017-04-26_14-51-42_297_7475299094026344956-1/-ext-10000

Loading data to table scott.employees_part partition (country=null, state=null)

Loaded : 4/4 partitions.

Time taken to load dynamic partitions: 0.475 seconds

Time taken for adding to write entity : 0.001 seconds

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 2.75 sec HDFS Read: 7354 HDFS Write: 751 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 750 msec

employees_nonpart.name employees_nonpart.salary employees_nonpart.subordinates employees_nonpart.deductions employees_nonpart.address employees_nonpart.country employees_nonpart.state

Time taken: 14.228 seconds

Now verify your dynamic partitions are there

hive (scott)> !hdfs dfs -ls /user/hive/warehouse/scott.db/employees_part;

Found 2 items

drwxrwxrwx - hdpclient supergroup 0 2017-04-26 14:52 /user/hive/warehouse/scott.db/employees_part/country=Pakistan

drwxrwxrwx - hdpclient supergroup 0 2017-04-26 14:52 /user/hive/warehouse/scott.db/employees_part/country=Saudia

Show partitions

hive (scott)> SHOW PARTITIONS employees_part;

partition

country=Pakistan/state=Punjab

country=Pakistan/state=Sindh

country=Saudia/state=Qaseem

country=Saudia/state=Riyadh

Time taken: 0.084 seconds, Fetched: 4 row(s)

hive (scott)> SHOW PARTITIONS employees_part PARTITION(country='Pakistan');

partition

country=Pakistan/state=Punjab

country=Pakistan/state=Sindh

Time taken: 0.086 seconds, Fetched: 2 row(s)

hive (scott)> SELECT * FROM employees_part

> WHERE country = 'Saudia' AND state = 'Riyadh';

employees_part.name employees_part.salary employees_part.subordinates employees_part.deductions employees_part.address employees_part.country employees_part.state

Saleem 1000.0 ["Ikram","Usman",""] {"pf":3.5,"loan":9.4} {"street":"College Rd.","city":"xxxx","state":"yyyyy","zip":41000} Saudia Riyadh

Time taken: 0.496 seconds, Fetched: 1 row(s)

Once created, the partition keys (country and state) behave like regular columns. Perhaps the most important reason to partition data is for faster queries. In the previous

query, which limits the results to employees in Riyadh, it is only necessary to scan the

contents of one directory even if we have thousands of country and state directories. For very large data sets, partitioning can dramatically improve query performance, but only if the partitioning scheme reflects common range filtering (e.g., by locations, timestamp ranges).

Hive Strict mode

Remember, a query across all partitions could trigger an enormous MapReduce job if the

table data and number of partitions are large. A highly suggested safety measure is

putting Hive into “strict” mode, which prohibits queries of partitioned tables without

a WHERE clause that filters on partitions. You can set the mode to “nonstrict,” as in the following session:

hive (scott)> set hive.mapred.mode=strict;

hive (scott)> SELECT ep.name, ep.salary FROM employees_part ep LIMIT 100;

FAILED: SemanticException Queries against partitioned tables without a partition filter are disabled for safety reasons. If you know what you are doing, please make sure that hive.strict.checks.large.query is set to false and that hive.mapred.mode is not set to 'strict' to enable them. No partition predicate for Alias "ep" Table "employees_part"

hive (scott)> set hive.mapred.mode=nonstrict;

hive (scott)> SELECT ep.name, ep.salary FROM employees_part ep LIMIT 100;

ep.name ep.salary

Abuzar 1000.0

Ali 2000.0

Zeeshan 1000.0

Hafiz 2000.0

Danish 2000.0

Saleem 1000.0

Time taken: 0.171 seconds, Fetched: 6 row(s)

Static Partition
We have below csv and we want to create the partition for it. For static partition you don't need to have the partition columns' data in your csv.
employee_part4.csv
Aslam,8000,Ikram$Usman$,pf#3.5$loan#9.4,College Rd.$Lahore$Punjab$41000
Akram,9000,Ikram$Usman$,pf#5.5$loan#7.4,College Rd.$Lahore$Punjab$41000

hive (scott)> LOAD DATA LOCAL INPATH '/tmp/mydata/employees_part4.csv'
> INTO TABLE employees_part
> PARTITION (country = 'UAE', state = 'Dubai');
Loading data to table scott.employees_part partition (country=UAE, state=Dubai)
OK
Time taken: 0.409 seconds

Verify partitions

hive (scott)> !hdfs dfs -ls /user/hive/warehouse/scott.db/employees_part;
Found 3 items
drwxrwxrwx - hdpclient supergroup 0 2017-04-26 14:52 /user/hive/warehouse/scott.db/employees_part/country=Pakistan
drwxrwxrwx - hdpclient supergroup 0 2017-04-26 14:52 /user/hive/warehouse/scott.db/employees_part/country=Saudia
drwxrwxrwx - hdpclient supergroup 0 2017-04-26 15:54 /user/hive/warehouse/scott.db/employees_part/country=UAE

External Partitioned Tables:

You can use partitioning with external tables. In fact, you may find that this is your most common scenario for managing large production data sets. The combination gives you a way to “share” data with other tools, while still optimizing query performance. Consider an example that fits this scenario well ie; logfile analysis.

I've a transaction log (5GB), first I checked the content to know what data is available.

hive (scott)> !head -2 /tmp/mydata/translog.csv;

"RECORD_ID","EMP_ID","REQUEST_TYPE","JSON_INPUT","RESULT_CODE","RESULT_DESCRIPTION","ERROR_TYPE","START_TIME","PROCCESSING_TIME","REQUEST_CHANNEL","CUSTOM_1","CUSTOM_2","CUSTOM_3","CUSTOM_4","CUSTOM_5","CUSTOM_6"

273483692,1084721533,"LOAD_EMP_PROFILE",1084721533,0,"?? ??? ??????? ?????","",03-DEC-16 02.47.57.856000000 AM,7,"WEB","(Google Chrome): Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36","","","192.168.155.140 ","",""

hive (scott)>

Observe that first line contains columns and rest is data. In order to load into HDFS (Table) we need to remove the first line in this csv file and then create table for this file.

Delete first line from the file
[root@en01 ~]# sed -i 1d /tmp/mydata/translog.csv

Verify first line is deleted

hive (scott)> !head -1 /tmp/mydata/translog.csv;

CREATE EXTERNAL TABLE translog(
RECORD_ID STRING,
EMP_ID STRING,
REQUEST_TYPE STRING,
JSON_INPUT STRING,
RESULT_CODE STRING,
RESULT_DESCRIPTION STRING,
ERROR_TYPE STRING,
START_TIME STRING,
PROCCESSING_TIME STRING,
REQUEST_CHANNEL STRING,
CUSTOM_1 STRING,
CUSTOM_2 STRING,
CUSTOM_3 STRING,
CUSTOM_4 STRING,
CUSTOM_5 STRING,
CUSTOM_6 STRING
)
PARTITIONED BY (YEAR int)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

;

See here we are not determining any location for this external table, we will specify the location while loading data to a partition of this table. You need to add partition to this table first and then you will load data.

Adding Partition: An ALTER TABLE statement is used to add each partition separately. It must specify a value for each partition key, the year in our case.

hive (scott)> ALTER TABLE translog ADD PARTITION(YEAR = 2016) LOCATION '/userdata/translog/2016';

Time taken: 0.073 seconds

hive (scott)>

The directory convention we used above is completely up to us. Here, we follow a hierarchical directory structure, because it’s a logical way to organize our data, but there is no requirement to do so.

Now load data to this partition

hive (scott)> LOAD DATA LOCAL INPATH '/tmp/mydata/translog.csv' into table scott.translog PARTITION (YEAR=2016);
Loading data to table scott.translog partition (year=2016)
OK
Time taken: 48.713 seconds
hive (scott)>

hive (scott)> select count(RECORD_ID) from translog;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Query ID = hdpclient_20170427104243_2de52449-d8c8-4798-bf22-f4f46f96f4d2

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1493031715875_0016, Tracking URL = http://nn01:8088/proxy/application_1493031715875_0016/

Kill Command = /usr/hadoopsw/hadoop-2.7.3/bin/hadoop job -kill job_1493031715875_0016

Hadoop job information for Stage-1: number of mappers: 20; number of reducers: 1

2017-04-27 10:42:49,988 Stage-1 map = 0%, reduce = 0%

2017-04-27 10:42:58,326 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 48.43 sec

2017-04-27 10:42:59,358 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 65.38 sec

2017-04-27 10:43:00,386 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 109.71 sec

2017-04-27 10:43:02,451 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 116.71 sec

2017-04-27 10:43:03,478 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 151.69 sec

2017-04-27 10:43:04,508 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 172.22 sec

2017-04-27 10:43:05,540 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 175.18 sec

MapReduce Total cumulative CPU time: 2 minutes 55 seconds 180 msec

Ended Job = job_1493031715875_0016

MapReduce Jobs Launched:

Stage-Stage-1: Map: 20 Reduce: 1 Cumulative CPU: 175.18 sec HDFS Read: 5194481859 HDFS Write: 108 SUCCESS

Total MapReduce CPU Time Spent: 2 minutes 55 seconds 180 msec

14531811

Time taken: 24.327 seconds, Fetched: 1 row(s)

hive (scott)>

Add another partition

hive (scott)> ALTER TABLE translog ADD PARTITION(YEAR = 2017) LOCATION '/userdata/translog/2017';

Time taken: 0.065 seconds

hive (scott)> show partitions translog;

partition

year=2016

year=2017

Time taken: 0.077 seconds, Fetched: 2 row(s)

Now you can load data to this new partition.

hive (scott)> LOAD DATA LOCAL INPATH '/tmp/mydata/translog.csv' into table scott.translog PARTITION (YEAR=2017);

Loading data to table scott.translog partition (year=2017)

Time taken: 47.846 seconds

See some performance benefits of partition

hive (scott)> select count(RECORD_ID) from translog;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Query ID = hdpclient_20170427113327_6b0bdbc7-3c90-4f49-8d28-7b6ad464da0c

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1493031715875_0024, Tracking URL = http://nn01:8088/proxy/application_1493031715875_0024/

Kill Command = /usr/hadoopsw/hadoop-2.7.3/bin/hadoop job -kill job_1493031715875_0024

Hadoop job information for Stage-1: number of mappers: 39; number of reducers: 1

2017-04-27 11:33:33,067 Stage-1 map = 0%, reduce = 0%

2017-04-27 11:33:41,325 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 47.02 sec

2017-04-27 11:33:44,428 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 55.76 sec

2017-04-27 11:33:45,459 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 121.97 sec

2017-04-27 11:33:46,489 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 138.21 sec

2017-04-27 11:33:47,516 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 185.94 sec

2017-04-27 11:33:48,543 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 240.19 sec

2017-04-27 11:33:51,625 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 250.42 sec

2017-04-27 11:33:52,649 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 293.86 sec

2017-04-27 11:33:54,698 Stage-1 map = 90%, reduce = 30%, Cumulative CPU 294.7 sec

2017-04-27 11:33:55,729 Stage-1 map = 92%, reduce = 30%, Cumulative CPU 305.05 sec

2017-04-27 11:33:56,757 Stage-1 map = 97%, reduce = 30%, Cumulative CPU 324.86 sec

2017-04-27 11:33:57,782 Stage-1 map = 97%, reduce = 32%, Cumulative CPU 324.94 sec

2017-04-27 11:33:58,807 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 337.91 sec

2017-04-27 11:34:00,855 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 340.31 sec

MapReduce Total cumulative CPU time: 5 minutes 40 seconds 310 msec

Ended Job = job_1493031715875_0024

MapReduce Jobs Launched:

Stage-Stage-1: Map: 39 Reduce: 1 Cumulative CPU: 340.31 sec HDFS Read: 10388960288 HDFS Write: 108 SUCCESS

Total MapReduce CPU Time Spent: 5 minutes 40 seconds 310 msec

29063622

Time taken: 34.146 seconds, Fetched: 1 row(s)

hive (scott)> select count(RECORD_ID) from translog where year=2016;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Query ID = hdpclient_20170427113617_2f9d44ec-202a-49c2-92b5-d416c82801b9

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

Starting Job = job_1493031715875_0025, Tracking URL = http://nn01:8088/proxy/application_1493031715875_0025/

Kill Command = /usr/hadoopsw/hadoop-2.7.3/bin/hadoop job  -kill job_1493031715875_0025

Hadoop job information for Stage-1: number of mappers: 20; number of reducers: 1

2017-04-27 11:36:22,653 Stage-1 map = 0%,  reduce = 0%

2017-04-27 11:36:29,880 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 7.67 sec

2017-04-27 11:36:30,908 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 38.78 sec

2017-04-27 11:36:31,934 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 46.55 sec

2017-04-27 11:36:32,958 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 73.47 sec

2017-04-27 11:36:33,984 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 110.84 sec

2017-04-27 11:36:35,008 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 117.97 sec

2017-04-27 11:36:36,032 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 145.64 sec

2017-04-27 11:36:37,058 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 174.74 sec

2017-04-27 11:36:39,112 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 177.62 sec

MapReduce Total cumulative CPU time: 2 minutes 57 seconds 620 msec

Ended Job = job_1493031715875_0025

MapReduce Jobs Launched:

Stage-Stage-1: Map: 20  Reduce: 1   Cumulative CPU: 177.62 sec   HDFS Read: 5194481084 HDFS Write: 108 SUCCESS

Total MapReduce CPU Time Spent: 2 minutes 57 seconds 620 msec

OK

c0

14531811

Time taken: 22.753 seconds, Fetched: 1 row(s)

hive (scott)>

Drop a partition

hive (scott)> ALTER TABLE translog drop PARTITION(YEAR = 2017);

Dropped the partition year=2017

Time taken: 0.095 seconds

5 comments:

flourish loan said...: Do you need a Loan?
Are you looking for Finance?
Are you looking for a Loan to enlarge your business?
I think you have come to the right place.
We offer Loans at low interest rate.
Interested people should please contact us on
For immediate response to your application, Kindly
reply to this emails below only:
flourishloancredite@gmail.com

Please, do provide us with the Following information if interested.
LOAN APPLICATION INFORMATION FORM
First name:
Middle name:
Date of birth (yyyy-mm-dd):
Gender:
Marital status:
Total Amount Needed:
Time Duration:
Address:
City:
State/province:
Zip/postal code:
Country:
Phone:
Mobile/cellular:
Monthly Income:
Occupation:
Which sites did you know about us.....
flourishloancredite@gmail.com; March 06, 2018 11:56 PM
veera cynixit said...: Excellent Blog.
Thank you,Keep updating.

hadoop admin training
hadoop admin online course; July 13, 2020 3:29 PM
Mohamed said...: Dear Sir/Madam, We Offer all types of loans * Commercial Loan *Personal Loans* Business Loan* Payday Loan* Student Loan* Mortgage Loan* Auto Loans* Bad credit Loan* Home equity Loan*Project/Contract Loan* Debt Consolidation The following types of loans are also offered Instant Loans -$3,000- $5,000 (1-2 Hours)Cash Loans - $3,000 - $10'000 (2-24 Hours)Personal Loans -$10,000 - $150,000 (1-2 Days) We offer loans with very Minimal annual Interest Rates as Low as 3% within 1 year to 25 years maximum repayment plan to any part of the world. We give out loans within the range of $3,000 USD to $50,000,000 USD. Our loans are well insured for maximum security is our priority. For Enquires, Interested Persons should contact us via E-mail: rooseveltduncan74@gmail.com Call or add us on what's app +44-7424223395; May 12, 2021 3:33 PM
patialalegitimate said...: DO YOU NEED A LOAN? patialalegitimate515@gmail.com
We give out loans with an affordable interest rate of 2% Captain One provide Excellent and Professional Financial Services and we are known and genuine money lenders across the globe
Our services include the following:
*Student Loans
Truck Loan
Personal Loan
Debt consolidation loan
Car Loans
Business Loan
Student Loan
Mortgage Loan
Refinancing Loan
Home Loan
Improvement loan
Construction Loan
Looking forward to receive your reply and I will advise you on how to get the fund to your bank account no matter your location no collateral for more information on how to get started: (Whats App) number: +919394133968 patialalegitimate515@gmail.com Mr Jeffery; August 21, 2021 8:03 AM
KITS Technologies said...: I think this is a real great article post.Thanks Again. Really Cool.
hadoop training
hadoop online training; December 21, 2021 3:17 PM

DBMentors - Inam Bukhari's Blog

Pages

Please see my other blog for Oracle EBusiness Suite Posts - EBMentors

Search This Blog

Thursday, April 27, 2017

Hive for Oracle Developers and DBAs - Part II

5 comments:

Translate

Followers

Labels

Blog Archive

About Me

Total Pageviews