Note: All posts on this blog take a practical approach and avoid lengthy theory. Everything has been tested on development servers. Please don't try any post on production servers until you are sure.

Sunday, April 23, 2017

Hive Installation and Configuration


What is Hive?


Apache Hive (which originated at Facebook) is a data warehouse system built on top of Hadoop to manage large datasets residing in HDFS. Hive is designed to enable easy data summarization, ad-hoc querying, and analysis of large volumes of data using SQL. At the same time, Hive's SQL gives users multiple places to integrate their own functionality for custom analysis, such as user-defined functions (UDFs).

Please note that Hadoop is a batch processing system: Hadoop jobs tend to have high latency and incur substantial overhead in job submission and scheduling. As a result, latency for Hive queries is generally high (minutes), even when the data sets involved are very small (say, a few hundred megabytes). Hive therefore cannot be compared with systems such as Oracle, where analyses run on significantly smaller amounts of data but proceed much more iteratively, with response times between iterations of less than a few minutes. Hive aims to provide acceptable (but not optimal) latency for interactive data browsing, queries over small data sets, and test queries. Hive is not designed for online transaction processing and does not offer real-time queries or row-level updates. It is best used for batch jobs over large sets of immutable data (such as web logs).

Hive needs a metastore for its table and partition metadata, so it requires an RDBMS, e.g. Oracle or MySQL; here we will use Apache Derby instead. Hive provides a SQL dialect, called Hive Query Language (abbreviated HiveQL or just HQL), for querying data stored in a Hadoop cluster.

Environment
- Already running 3 node Hadoop cluster 2.7.3 (1 name node, 2 data nodes) on RHEL 7.3
- We have 1 more node (hadoopedge1) as Hadoop client 
- We want to install Hive on hadoopedge1 node for analyzing data in HDFS using SQL


Installing Hive                                         


Prerequisites

Step 1: Check Java Installation
[hdpclient@hadoopedge1 ~]$ java -version
openjdk version "1.8.0_102"
OpenJDK Runtime Environment (build 1.8.0_102-b14)


Step 2: Verifying Hadoop Installation
[hdpclient@hadoopedge1 ~]$ hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/hadoopsw/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar


Step 3: Downloading and extracting Hive 
Download hive from http://www-us.apache.org/dist/hive/hive-2.1.1/

Place downloaded file in the folder /usr/hadoopsw
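If the edge node has internet access, you could also fetch the archive directly from the mirror above (a sketch; mirrors and versions change over time, so adjust the URL as needed):

[root@hadoopedge1 hadoopsw]# cd /usr/hadoopsw
[root@hadoopedge1 hadoopsw]# wget http://www-us.apache.org/dist/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz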


Extracting and verifying the Hive Archive
The following commands extract the Hive archive and verify the result:
[root@hadoopedge1 hadoopsw]# pwd
/usr/hadoopsw

[root@hadoopedge1 hadoopsw]# ls -ltr
total 355900
drwxr-xr-x. 9 hdpclient hadoop_edge       149 Aug 18 04:49 hadoop-2.7.3
-rw-r--r--. 1 root      root        214092195 Feb  1 12:08 hadoop-2.7.3.tar.gz
-rw-r--r--. 1 root      root        149756462 Feb 12 14:07 apache-hive-2.1.1-bin.tar.gz
-rw-r--r--. 1 root      root           587484 Feb 12 14:10 db-derby-10.13.1.1-bin.tar.gz

[root@hadoopedge1 hadoopsw]# tar zxvf apache-hive-2.1.1-bin.tar.gz
[root@hadoopedge1 hadoopsw]# ls -ltr
total 355900
drwxr-xr-x. 9 hdpclient hadoop_edge       149 Aug 18 04:49 hadoop-2.7.3
-rw-r--r--. 1 root      root        214092195 Feb  1 12:08 hadoop-2.7.3.tar.gz
-rw-r--r--. 1 root      root        149756462 Feb 12 14:07 apache-hive-2.1.1-bin.tar.gz
-rw-r--r--. 1 root      root           587484 Feb 12 14:10 db-derby-10.13.1.1-bin.tar.gz
drwxr-xr-x. 9 root      root              171 Feb 12 18:08 apache-hive-2.1.1-bin

[root@hadoopedge1 hadoopsw]# chown -R hdpclient:hadoop_edge apache-hive-2.1.1-bin
[root@hadoopedge1 hadoopsw]# ls -ltr
total 355900
drwxr-xr-x. 9 hdpclient hadoop_edge       149 Aug 18 04:49 hadoop-2.7.3
-rw-r--r--. 1 root      root        214092195 Feb  1 12:08 hadoop-2.7.3.tar.gz
-rw-r--r--. 1 root      root        149756462 Feb 12 14:07 apache-hive-2.1.1-bin.tar.gz
-rw-r--r--. 1 root      root           587484 Feb 12 14:10 db-derby-10.13.1.1-bin.tar.gz
drwxr-xr-x. 9 hdpclient hadoop_edge       171 Feb 12 18:08 apache-hive-2.1.1-bin


Step 4: Setting up environment for Hive
You need to set up the Hive environment by appending the following lines to .bash_profile
[hdpclient@hadoopedge1 ~]$ vi .bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
##Hadoop Variables
export HADOOP_HOME=/usr/hadoopsw/hadoop-2.7.3
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
#Hive Variables
export HIVE_HOME=/usr/hadoopsw/apache-hive-2.1.1-bin
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib/*:.
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/*:.

[hdpclient@hadoopedge1 ~]$ source .bash_profile
[hdpclient@hadoopedge1 ~]$ echo $HIVE_HOME
/usr/hadoopsw/apache-hive-2.1.1-bin
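As an optional sanity check, you can ask the Hive launcher for its version; this does not need the metastore (which we have not configured yet) and should report Hive 2.1.1:

[hdpclient@hadoopedge1 ~]$ hive --version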


The Hive software installation is now complete. Next, you need an external database server to configure the metastore; we will use the Apache Derby database.

Installing/Configuring Apache Derby      

Apache Derby, an Apache DB subproject, is a relational database implemented in Java. Its footprint is so small that you can easily embed it in any Java-based solution. In addition to its embedded framework, Derby supports a more familiar client/server framework with the Derby Network Server. 

Step 1: Extracting and verifying Derby archive


The following commands are used for extracting and verifying the Derby archive:

[root@hadoopedge1 hadoopsw]# tar zxvf db-derby-10.13.1.1-bin.tar.gz
[root@hadoopedge1 hadoopsw]# ls -ltr
total 373416
drwxr-xr-x. 9 hdpclient hadoop_edge       149 Aug 18 04:49 hadoop-2.7.3
-rw-r--r--. 1 root      root        214092195 Feb  1 12:08 hadoop-2.7.3.tar.gz
-rw-r--r--. 1 root      root        149756462 Feb 12 14:07 apache-hive-2.1.1-bin.tar.gz
-rw-r--r--. 1 root      root         18523564 Feb 12 15:53 db-derby-10.13.1.1-bin.tar.gz
drwxr-xr-x. 9 hdpclient hadoop_edge       171 Feb 12 18:08 apache-hive-2.1.1-bin
drwxr-xr-x. 8 root      root              164 Feb 12 18:32 db-derby-10.13.1.1-bin

[root@hadoopedge1 hadoopsw]# chown -R hdpclient:hadoop_edge db-derby-10.13.1.1-bin

Step 2: Setting up environment for Derby
[hdpclient@hadoopedge1 ~]$ vi .bash_profile

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
##Hadoop Variables
export HADOOP_HOME=/usr/hadoopsw/hadoop-2.7.3
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
#Hive Variables
export HIVE_HOME=/usr/hadoopsw/apache-hive-2.1.1-bin
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib/*:.
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/*:.
#Derby Variables
export DERBY_HOME=/usr/hadoopsw/db-derby-10.13.1.1-bin
export PATH=$PATH:$DERBY_HOME/bin
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbyclient.jar:$DERBY_HOME/lib/derbytools.jar
export DERBY_OPTS=-Dij.protocol=jdbc:derby://hadoopedge1/

[hdpclient@hadoopedge1 ~]$ source .bash_profile
[hdpclient@hadoopedge1 ~]$ echo $DERBY_HOME
/usr/hadoopsw/db-derby-10.13.1.1-bin
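Optionally, confirm the Derby installation with the sysinfo script bundled in the Derby distribution, which prints the Derby version and the Java environment it will run with:

[hdpclient@hadoopedge1 ~]$ $DERBY_HOME/bin/sysinfo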

Edit the hosts file and add an entry for the hadoopedge1 node where we are installing Hive:
-- added in hosts file
[hdpsysuser@hdpmaster ~]$ sudo vi /etc/hosts
[hdpsysuser@hdpmaster ~]$ cat /etc/hosts
192.168.44.170  hdpmaster
192.168.44.171  hdpslave1
192.168.44.172  hdpslave2
192.168.44.161  hadoopedge1


Step 3: Run/verify the Derby network server

You can run Derby as a network server so that the same database can be accessed from any node on the network.

[hdpclient@hadoopedge1 ~]$ /usr/hadoopsw/db-derby-10.13.1.1-bin/bin/startNetworkServer -h hadoopedge1 -p 1527 &


--get info about network derby server
[hdpclient@hadoopedge1 bin]$ ./NetworkServerControl sysinfo -h hadoopedge1

--get db info on network derby server
[hdpclient@hadoopedge1 bin]$ ./dblook -d 'jdbc:derby://hadoopedge1/hive_metastore_db' -- created in next step
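
When you need to stop the network server later, the same control utility can shut it down (a sketch; adjust the host and port to your setup):

--stop the Derby network server
[hdpclient@hadoopedge1 bin]$ ./NetworkServerControl shutdown -h hadoopedge1 -p 1527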


Step 4: Create a database for Hive in Derby

--run the ij utility located in $DERBY_HOME/bin
[hdpclient@hadoopedge1 bin]$ ./ij
ij version 10.13
ij>

--create database on derby
ij> CONNECT 'jdbc:derby://hadoopedge1:1527/hive_metastore_db;create=true';

hive_metastore_db is now created under /home/hdpclient (the working directory from which the network server was started)

--create a table using ij for testing purposes
ij> CREATE TABLE TABLE1(ID INT PRIMARY KEY,NAME VARCHAR(14));
0 rows inserted/updated/deleted

-- Insert three records into the table.
ij> INSERT INTO TABLE1 VALUES
(100,'ONE HUNDRED'),(200,'TWO HUNDRED'),(300,'THREE HUNDRED');
3 rows inserted/updated/deleted

-- Select all of the records in the table.
ij> SELECT * FROM TABLE1;
ID         |NAME
--------------------------
100        |ONE HUNDRED
200        |TWO HUNDRED
300        |THREE HUNDRED

3 rows selected
ij>

ij> SELECT * FROM TABLE1 WHERE ID=200;
ID         |NAME
--------------------------
200        |TWO HUNDRED

1 row selected

OPTIONAL: Now that the Derby database (hive_metastore_db) is functioning, you can access it from any other machine on the network if you want; you just need a client, or you can install Derby there and use its ij utility. I used the same Derby database from another machine in my network, hdpslave1, as I wanted to use it as my Hive client. I installed Hive and Derby on that node and then performed the steps below.

[hdpsysuser@hdpslave1 ~]$ tar -xzf apache-hive-2.1.1-bin.tar.gz
[hdpsysuser@hdpslave1 ~]$ tar -xzf db-derby-10.13.1.1-bin.tar.gz

-- Check connectivity from this node (as a client) to hadoopedge1 (the Derby server)
[hdpsysuser@hdpslave1 ~]$ nc -v hadoopedge1  1527
Ncat: Version 6.40 ( http://nmap.org/ncat )
Ncat: Could not resolve hostname "hadoopedge1": Name or service not known. QUITTING.

-- edit the hosts file, add the hadoopedge1 entry, and now it connects
[hdpsysuser@hdpslave1 ~]$ nc -v hadoopedge1  1527
Ncat: Version 6.40 ( http://nmap.org/ncat )
Ncat: Connected to 192.168.44.161:1527.

-- ping the Derby network server on hadoopedge1
[hdpsysuser@hdpslave1 ~]$ cd /usr/hadoopsw/db-derby-10.13.1.1-bin/bin/
[hdpsysuser@hdpslave1 bin]$ ./NetworkServerControl ping -h hadoopedge1 -p 1527

Thu Feb 23 11:52:06 AST 2017 : Connection obtained for host: hadoopedge1, port number 1527.

-- run the ij utility and check the hive_metastore_db 
[hdpsysuser@hdpslave1 bin]$ ./ij
ij version 10.13
ij>

ij> CONNECT 'jdbc:derby://hadoopedge1:1527/hive_metastore_db;';

ij> show tables;
TABLE_SCHEM         |TABLE_NAME                    |REMARKS
------------------------------------------------------------------
SYS                 |SYSALIASES                    |
SYS                 |SYSCHECKS                     |
SYS                 |SYSCOLPERMS                   |
SYS                 |SYSCOLUMNS                    |

ij> select * from table1;
ID         |NAME
--------------------------
100        |ONE HUNDRED
200        |TWO HUNDRED
300        |THREE HUNDRED

3 rows selected

ij>
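
Once you are done testing, you could drop the test table so it does not linger in the metastore database:

ij> DROP TABLE TABLE1;
0 rows inserted/updated/deleted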

As you can see, the Derby database is accessible from a network node; you can configure Hive on that node as well.


Configuring Metastore for Hive               

Configuring the metastore means telling Hive where the database is stored. You do this by editing the hive-site.xml file, which is in the $HIVE_HOME/conf directory.

Step 1: Set the hive-env.sh

To configure Hive with Hadoop, you need to edit the hive-env.sh file, which is placed in the $HIVE_HOME/conf directory. Use the following commands to change to the Hive config directory and copy the template file:

[hdpclient@hadoopedge1 ~]$ cd $HIVE_HOME/conf
[hdpclient@hadoopedge1 conf]$ cp hive-env.sh.template hive-env.sh

Edit the hive-env.sh file by appending the following line (pointing at your Hadoop installation):
vi hive-env.sh
export HADOOP_HOME=/usr/hadoopsw/hadoop-2.7.3

Step 2: Set properties in hive-site.xml and initialize schema for hive

Edit hive-site.xml and append/modify the following lines between the <configuration> and </configuration> tags:

[hdpclient@hadoopedge1 conf]$ vi hive-site.xml
[hdpclient@hadoopedge1 ~]$ cat /usr/hadoopsw/apache-hive-2.1.1-bin/conf/hive-site.xml

<configuration>
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>

  </property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby://hadoopedge1:1527/hive_metastore_db;create=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.ClientDriver</value>
  <description>Driver class name for a JDBC metastore</description>

</property>
</configuration>

The hive.metastore.warehouse.dir property determines where Hive table contents will be stored in HDFS. You should create this directory in HDFS before working with Hive, as shown below.

[hdpsysuser@hdpmaster ~]$ hdfs dfs -mkdir -p /user/hive/warehouse

[hdpsysuser@hdpmaster ~]$ hdfs dfs -mkdir -p /tmp/hive
[hdpsysuser@hdpmaster ~]$ hdfs dfs -chmod 777 /tmp
[hdpsysuser@hdpmaster ~]$ hdfs dfs -chmod 777 /user/hive/warehouse

[hdpsysuser@hdpmaster ~]$ hdfs dfs -chmod 777 /tmp/hive
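
You can verify the directories and their permissions afterwards, for example:

[hdpsysuser@hdpmaster ~]$ hdfs dfs -ls -R /user/hive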

schematool is an offline tool for manipulating the Hive metastore schema. Use it to initialize the metastore schema for Hive.



[hdpclient@hadoopedge1 ~]$ schematool -dbType derby -initSchema
which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/hdpclient/.local/bin:/home/hdpclient/bin:/usr/java/default/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/apache-hive-2.1.1-bin/bin:/usr/hadoopsw/db-derby-10.13.1.1-bin/bin)
Metastore connection URL:        jdbc:derby://hadoopedge1:1527/hive_metastore_db;create=true
Metastore Connection Driver :    org.apache.derby.jdbc.ClientDriver
Metastore connection User:       APP
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.derby.sql
Initialization script completed
schemaTool completed
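
Optionally, verify the initialized schema with schematool's -info option, which prints the connection details and schema version:

[hdpclient@hadoopedge1 ~]$ schematool -dbType derby -info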


Optional: Local storage for Hive (no Hadoop required if using Presto)

If you wish to use the local file system instead of HDFS, you can change the properties below. Working with local storage makes sense if you use Presto as your query layer; that way you can also use other storage backends such as MinIO or Ceph.

--Configure Hive to use local storage in hive-site.xml
<property>
       <name>hive.metastore.schema.verification</name> 
       <value>false</value> 
 </property> 

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>file:///data/hivelocalfs</value>
    <description>location of default database for the warehouse</description>

</property>

 <property> 
      <name>fs.default.name</name> 
      <value>file:///data/hivelocalfs</value> 
   </property>
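
If you go this route, create the local warehouse directory first and make sure the Hive user can write to it; a sketch, assuming the /data/hivelocalfs path above and the users from this setup:

[root@hadoopedge1 ~]# mkdir -p /data/hivelocalfs
[root@hadoopedge1 ~]# chown hdpclient:hadoop_edge /data/hivelocalfs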

Optional: MySQL as metastore
You can use MySQL instead of Derby if you want.

Download the mysql-connector-java-5.0.5.jar file and copy it to the $HIVE_HOME/lib directory.

Create a database in MySQL to serve as the metastore for Hive.

[root@en01 hdpclient]# mysql -u root -p
mysql> create database hivemetastore;
mysql> CREATE USER 'hiveuser'@'en01' IDENTIFIED BY 'P@ssw0rd';
mysql> GRANT ALL ON *.* TO 'hiveuser'@'en01' IDENTIFIED BY 'P@ssw0rd' WITH GRANT OPTION;
mysql> flush privileges;
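
Before moving on, you can confirm that the new account works; a quick check, assuming the grants above:

[hdpclient@en01 ~]$ mysql -h en01 -u hiveuser -p -e 'show databases;'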


[hdpclient@en01 ~]$ schematool -initSchema -dbType mysql -userName hiveuser -passWord 'P@ssw0rd' -verbose

Note: run schematool only after hive-site.xml (below) points to MySQL; otherwise it will attempt to initialize the default Derby metastore.

Edit hive-site.xml and add the configuration related to MySQL:

<configuration>
<property>
<name>hive.metastore.local</name>
<value>false</value>
</property>

<property>
   <name>hive.metastore.warehouse.dir</name>
   <value>/user/hive/warehouse</value>
   <description>location of default database for the warehouse</description>
</property>



<property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://en01:3306/hivemetastore?createDatabaseIfNotExist=true</value>
            <description>metadata is stored in a MySQL server</description>
         </property>

         <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
            <description>MySQL JDBC driver class</description>
         </property>

         <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hiveuser</value>
            <description>user name for connecting to mysql server </description>
         </property>

         <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>P@ssw0rd</value>
            <description>password for connecting to mysql server </description>
         </property>



<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
<description>   Setting this property to true will have HiveServer2 execute   Hive operations as the user making the calls to it. </description>
</property>



</configuration>


Hive metastore service
The Hive metastore can also be run as a standalone service. All you need to do is run the following command in a terminal on your machine:

hive --service metastore &

In a remote metastore configuration, all clients connect to the metastore service, which in turn queries the underlying datastore (MySQL, Oracle, and so on). The communication is done through the Thrift protocol. On the client side, you need at least the following property to make the client connect to the metastore service:

hive.metastore.uris=thrift://localhost:9083
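
In hive-site.xml form, the same client setting looks like this (the host and port should point at wherever your metastore service actually runs):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
  <description>Thrift URI for the remote metastore</description>
</property>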

Step 3: Run hive now

[hdpclient@hadoopedge1 ~]$ hive


which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/db-derby-10.13.1.1-bin/bin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/apache-hive-2.1.1-bin/bin:/usr/hadoopsw/db-derby-10.13.1.1-bin/bin)

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/usr/hadoopsw/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoopsw/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
[Fatal Error] hive-site.xml:19:1: XML document structures must start and end within the same entity.

You get the SLF4J multiple-bindings warning because SLF4J binding jars exist at two locations that are both on the classpath: log4j-slf4j-impl-2.4.1.jar under HIVE_HOME and slf4j-log4j12-1.7.10.jar under HADOOP_HOME. (The "which: no hbase" line is harmless; the hive launcher script just checks whether HBase is on the PATH.) The [Fatal Error] means hive-site.xml contains malformed XML; make sure every element in the file is properly closed.

Delete log4j-slf4j-impl-2.4.1.jar from /usr/hadoopsw/apache-hive-2.1.1-bin/lib:

[hdpclient@hadoopedge1 ~]$ rm /usr/hadoopsw/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar

Try running hive again:


[hdpclient@hadoopedge1 ~]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/db-derby-10.13.1.1-bin/bin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/apache-hive-2.1.1-bin/bin:/usr/hadoopsw/db-derby-10.13.1.1-bin/bin)

Logging initialized using configuration in jar:file:/usr/hadoopsw/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:591)
... <<output truncated>>
... <<output truncated>>

Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "BONECP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("org.apache.derby.jdbc.ClientDriver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

The error points to a classpath issue: the Derby client JDBC driver is missing from the classpath.


[hdpclient@hadoopedge1 ~]$ ll /usr/hadoopsw/hadoop-2.7.3/share/hadoop/common/lib/derb*
ls: cannot access /usr/hadoopsw/hadoop-2.7.3/share/hadoop/common/lib/derb*: No such file or directory

$HADOOP_HOME/share/hadoop/common/lib does not contain the required Derby client jar (derbyclient.jar), so copy it from the Derby distribution:

[hdpclient@hadoopedge1 ~]$ cp /usr/hadoopsw/db-derby-10.13.1.1-bin/lib/derbyclient.jar /usr/hadoopsw/hadoop-2.7.3/share/hadoop/common/lib/

[hdpclient@hadoopedge1 ~]$ ll /usr/hadoopsw/apache-hive-2.1.1-bin/lib/derb*

-rw-r--r--. 1 hdpsysuser hdpsysuser 2838580 Nov 28 21:11 /usr/hadoopsw/apache-hive-2.1.1-bin/lib/derby-10.10.2.0.jar

Now run hive again:

[hdpsysuser@hdpslave1 bin]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/db-derby-10.13.1.1-bin/bin:/usr/hadoopsw/.local/bin:/usr/hadoopsw/bin:/usr/hadoopsw/hadoop-2.7.3/sbin:/usr/hadoopsw/hadoop-2.7.3/bin:/usr/hadoopsw/apache-hive-2.1.1-bin/bin:/usr/hadoopsw/db-derby-10.13.1.1-bin/bin)

Logging initialized using configuration in jar:file:/usr/hadoopsw/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive>

You have the hive prompt now; run some Hive commands.

Note: If you still encounter any Hive-related error, run hive in debug mode to get more details, e.g.:

[hdpclient@hadoopedge1 ~]$ hive -hiveconf hive.root.logger=DEBUG,console
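
The dept table queried below was created earlier on this cluster; if your warehouse is still empty, here is a minimal, hypothetical sketch that produces a similar table (the missing third field in the data file reads back as NULL, matching the output below):

[hdpclient@hadoopedge1 ~]$ printf '1,Ali\n2,Zeeshan\n3,Saleem\n' > /tmp/dept.csv

hive> CREATE TABLE dept(id INT, name STRING, loc STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> LOAD DATA LOCAL INPATH '/tmp/dept.csv' INTO TABLE dept;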

hive> show tables;
OK
dept
Time taken: 1.189 seconds, Fetched: 1 row(s)
hive>

hive> select * from dept;
OK
1       Ali     NULL
2       Zeeshan NULL
3       Saleem  NULL
Time taken: 1.403 seconds, Fetched: 3 row(s)
hive>


Congratulations! Your Hive installation is now working perfectly.
