Note: All posts take a practical approach and avoid lengthy theory. Everything has been tested on development servers. Please don't try any post on production servers until you are sure.

Tuesday, February 21, 2017

Setting up Hadoop Edge/Gateway Node (Hadoop Client)



We already have a 3-node Hadoop cluster (2.7.3; one master and two slaves) running in our environment. Now we want to set up a fourth machine as a client (analogous to an Oracle client) and submit commands from it to the Hadoop cluster.


The procedure below explains setting up an edge node for clients to access the Hadoop Cluster for submitting commands.


1- Creating a Hadoop client user

In order to access HDFS and MapReduce from the edge node, we need to create a Hadoop client user.

Create user group

[root@hadoopedge1 ~]# groupadd hadoop_edge

Add user hdpclient to the group hadoop_edge

[root@hadoopedge1 ~]# useradd hdpclient -G hadoop_edge

[root@hadoopedge1 ~]# passwd hdpclient

Changing password for user hdpclient.

New password:

BAD PASSWORD: The password is shorter than 8 characters

Retype new password:

passwd: all authentication tokens updated successfully.
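
A quick membership check can confirm that the account landed in the intended group. The following is a minimal sketch; the function name user_in_group is hypothetical and not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if USER belongs to GROUP.
user_in_group() {
    local user="$1" group="$2"
    # id -nG prints the user's group names separated by spaces;
    # grep -x matches the group name as a whole line only.
    id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$group"
}

# Example with this post's names (only meaningful on the edge node)
if user_in_group hdpclient hadoop_edge; then
    echo "hdpclient is in hadoop_edge"
else
    echo "hdpclient is missing from hadoop_edge"
fi
```

If the check fails, usermod -aG hadoop_edge hdpclient (as root) adds the group without touching the user's other groups.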


2- Enable ssh for client user 'hdpclient'

On Hadoop Edge client


For additional information on SSH configuration, please check the section "Configuring SSH for ‘hdpsysuser’ Hadoop account" in the previous post.

[hdpclient@hadoopedge1 ~]$ ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/home/hdpclient/.ssh/id_rsa):

/home/hdpclient/.ssh/id_rsa already exists.

Overwrite (y/n)? y

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /home/hdpclient/.ssh/id_rsa.

Your public key has been saved in /home/hdpclient/.ssh/id_rsa.pub.

The key fingerprint is:

38:1b:e9:8d:4e:5b:0b:a5:5e:16:d6:b7:1f:10:da:ef hdpclient@hadoopedge1.localdomain

The key's randomart image is:

+--[ RSA 2048]----+

|                 |

|                 |

|            .    |

|       o . o .   |

|      = S o +    |

|     . X . . +   |

|      B =   . o  |

|     + * .   o . |

|      + .     E  |

+-----------------+


[hdpclient@hadoopedge1 ~]$ ssh-copy-id hdpclient@localhost

/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hdpclient@localhost's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hdpclient@localhost'"
and check to make sure that only the key(s) you wanted were added.

 
[hdpclient@hadoopedge1 ~]$ ssh hdpclient@localhost

Last login: Wed Feb  8 19:05:15 2017 from localhost

[hdpclient@hadoopedge1 ~]$ exit

logout

Connection to localhost closed.

[hdpclient@hadoopedge1 ~]$

We will copy this key to hdpmaster to enable passwordless SSH. Before we can do this, the hdpclient user must exist on hdpmaster and the edge node must be able to resolve the hdpmaster host name.

On Hadoop Master (hdpmaster)
Add 'hdpclient' as a user on hdpmaster. Log in to hdpmaster as root (or as a user with sudo rights) and run the commands below:
[hdpsysuser@hdpmaster ~]$ sudo useradd hdpclient
[hdpsysuser@hdpmaster ~]$ sudo groupadd hadoop_edge
[hdpsysuser@hdpmaster ~]$ sudo usermod hdpclient -aG hadoop_edge
[hdpsysuser@hdpmaster ~]$ sudo passwd hdpclient
Changing password for user hdpclient.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.

On Hadoop Edge Node (hadoopedge1) 
Take the IP address of hdpmaster (the one reachable from the edge node) and add it, as root, to the /etc/hosts file on the Hadoop client (edge) node, i.e. hadoopedge1 in our case.

[root@hadoopedge1 ~]# vi /etc/hosts
192.168.44.170 hdpmaster ##added in hosts
[root@hadoopedge1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.44.161 hadoopedge1
192.168.44.170 hdpmaster
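
Editing /etc/hosts by hand works, but repeated runs can leave duplicate lines. The sketch below shows one way to add an entry only when the host name is not yet mapped. The function name add_host_entry is hypothetical, and the demonstration runs against a scratch file rather than the real /etc/hosts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is already mapped.
add_host_entry() {
    local file="$1" ip="$2" name="$3"
    # Skip comment lines and look for NAME as a whole field after the address column
    if awk -v n="$name" '
            !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file"; then
        echo "$name already present in $file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
        echo "added $ip $name to $file"
    fi
}

# Demonstrate on a scratch copy, not on the real /etc/hosts
scratch=$(mktemp)
printf '127.0.0.1 localhost\n' > "$scratch"
add_host_entry "$scratch" 192.168.44.170 hdpmaster
add_host_entry "$scratch" 192.168.44.170 hdpmaster   # second call changes nothing
cat "$scratch"
```

On the edge node the same call would target /etc/hosts and has to run as root.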

Now switch to hdpclient on the Hadoop client node and run ssh-copy-id hdpclient@hdpmaster (it will ask for the hdpclient password on hdpmaster):
[root@hadoopedge1 ~]# su - hdpclient
Last login: Wed Feb  8 19:06:38 AST 2017 from localhost on pts/4

[hdpclient@hadoopedge1 ~]$ ssh-copy-id hdpclient@hdpmaster

The authenticity of host 'hdpmaster (192.168.44.170)' can't be established.
ECDSA key fingerprint is 04:86:d2:4c:2d:3e:38:1c:61:f4:39:24:52:f4:09:4c.
Are you sure you want to continue connecting (yes/no)? yes
/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hdpclient@hdpmaster's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'hdpclient@hdpmaster'"
and check to make sure that only the key(s) you wanted were added.

[hdpclient@hadoopedge1 ~]$ ssh hdpmaster
[hdpclient@hdpmaster ~]$

[hdpclient@hdpmaster ~]$ exit
logout
Connection to hdpmaster closed.

[hdpclient@hadoopedge1 ~]$
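
The interactive login above can also be checked non-interactively, which is handy in scripts. This is a minimal sketch using the standard OpenSSH options BatchMode and ConnectTimeout; the function name check_passwordless is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if key-based login to TARGET works without any prompt.
check_passwordless() {
    local target="$1"
    # BatchMode=yes makes ssh fail instead of asking for a password;
    # ConnectTimeout bounds the wait if the host is unreachable.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true 2>/dev/null
}

# Run as hdpclient on the edge node, for example:
#   check_passwordless hdpclient@hdpmaster && echo "key login OK"
```

A non-zero return means ssh would have needed a password or could not connect at all.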

3- Install Java JDK on Hadoop Client
For details, please see the Java installation section of the previous post. The commands below are just for reference.



[root@hadoopedge1 java]# rpm -Uvh  jdk-8u121-linux-x64.rpm
[root@hadoopedge1 java]# java -version
[root@hadoopedge1 java]# cd ~
[root@hadoopedge1 ~]# vi .bash_profile  ###edit the environment variable
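
To confirm which JDK the shell actually picks up, the version string can be extracted from the java -version output. A small sketch follows, with parse_java_version as a hypothetical helper name. Note that java -version writes to stderr, so the output has to be redirected before it can be piped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `java -version` output on stdin and print the quoted version string.
parse_java_version() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# java -version prints to stderr, hence the 2>&1
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | parse_java_version
else
    echo "java not found on PATH"
fi
```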


4- Install Hadoop on Client Node (edge)

We need Hadoop on the client node so that hdpclient can compile jars and submit jobs, but we will not run any Hadoop daemons there.


For details, please see the section "Installing Hadoop" in the previous post.
Note: Don’t create the Hadoop temp directories for the NameNode and DataNode, because the edge client is not a participating node in the Hadoop cluster.

[root@hadoopedge1 ~]# cd /usr/hadoopsw/
[root@hadoopedge1 hadoopsw]# tar xfz hadoop-2.7.3.tar.gz
[root@hadoopedge1 hadoopsw]# chown -R hdpclient:hadoop_edge /usr/hadoopsw/hadoop-2.7.3
[root@hadoopedge1 hadoopsw]# ll
total 209076
drwxr-xr-x. 9 hdpclient hadoop_edge       149 Aug 18 04:49 hadoop-2.7.3

Configuring Hadoop Environment Variables
Switch to the hdpclient user and add the Hadoop environment variables.

[root@hadoopedge1 hadoopsw]# su - hdpclient


Below is the .bash_profile for hdpclient

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# User specific environment and startup programs
## JAVA env variables
export JAVA_HOME=/usr/java/default

export PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar

## Hadoop Variables
export HADOOP_HOME=/usr/hadoopsw/hadoop-2.7.3
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

[hdpclient@hadoopedge1 ~]$ source .bash_profile
[hdpclient@hadoopedge1 ~]$ echo $HADOOP_HOME
/usr/hadoopsw/hadoop-2.7.3
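
Echoing HADOOP_HOME only shows that the variable is set. A slightly stronger check, sketched below, also looks for the hdfs launcher and the configuration directory inside it. The function name check_hadoop_home is hypothetical; the bin/hdfs and etc/hadoop paths follow the Hadoop 2.7.3 layout used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a Hadoop home directory looks usable for client commands.
check_hadoop_home() {
    local home="$1"
    [ -n "$home" ]             || { echo "HADOOP_HOME is empty"; return 1; }
    [ -d "$home" ]             || { echo "$home is not a directory"; return 1; }
    [ -x "$home/bin/hdfs" ]    || { echo "$home/bin/hdfs is not executable"; return 1; }
    [ -d "$home/etc/hadoop" ]  || { echo "$home/etc/hadoop is missing"; return 1; }
    echo "$home looks OK"
}

# Check whatever the current shell has in HADOOP_HOME
check_hadoop_home "${HADOOP_HOME:-}" || true
```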

5- Update Hadoop configuration files On client node

Update the Hadoop configuration files on the client (edge) node (only hadoop-env.sh, core-site.xml and mapred-site.xml) with the same parameters as on hdpmaster (the Hadoop master node).

For details, please see the post on Hadoop configuration.

[hdpclient@hadoopedge1 ~]$ vi /usr/hadoopsw/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/default/


Configuration file : core-site.xml
[hdpclient@hadoopedge1 ~]$ vi /usr/hadoopsw/hadoop-2.7.3/etc/hadoop/core-site.xml
<configuration>
      <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hdpmaster:9000/</value>
      </property>
</configuration>

Configuration file : mapred-site.xml

[hdpclient@hadoopedge1 ~]$ vi /usr/hadoopsw/hadoop-2.7.3/etc/hadoop/mapred-site.xml

<configuration>
     <property>
           <name>mapreduce.framework.name</name>
           <value>yarn</value>
     </property>
</configuration>
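
Once the files are edited, it is worth confirming that the client really resolves the same default file system as the master. The sketch below relies on hdfs getconf -confKey, which prints the effective value of a configuration key; the function name default_fs_matches is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the client's effective fs.defaultFS with an expected URI.
default_fs_matches() {
    local expected="$1" actual
    # hdfs getconf reads the configuration the client would actually use
    actual=$(hdfs getconf -confKey fs.defaultFS 2>/dev/null) || return 1
    # Ignore a trailing slash so hdfs://host:9000 and hdfs://host:9000/ compare equal
    [ "${actual%/}" = "${expected%/}" ]
}

# Run as hdpclient on the edge node, for example:
#   default_fs_matches hdfs://hdpmaster:9000 && echo "client points at hdpmaster"
```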

6- Running HDFS commands from Client on Hadoop Cluster

[hdpclient@hadoopedge1 ~]$ hdfs dfs -ls hdfs://hdpmaster:9000/userdata
Found 2 items
drwxr-xr-x   - hdpsysuser supergroup          0 2017-02-08 17:28 hdfs://hdpmaster:9000/userdata/bukhari
drwxr-xr-x   - hdpsysuser supergroup          0 2017-02-08 17:03 hdfs://hdpmaster:9000/userdata/zeeshan

On hdpmaster: create a folder in HDFS and, for testing purposes only, open it to everyone using chmod:

[hdpsysuser@hdpmaster ~]$ hdfs dfs -mkdir hdfs://hdpmaster:9000/hadoopedge1_data
[hdpsysuser@hdpmaster ~]$ hdfs dfs -chmod 777 hdfs://hdpmaster:9000/hadoopedge1_data
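
World-writable permissions are fine for a quick test. A narrower alternative, shown here as an assumption rather than something the original setup did, is to give the client user its own HDFS home directory; relative HDFS paths resolve against /user/<username>. The function name make_hdfs_home is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create /user/NAME in HDFS and hand ownership to NAME.
# Intended to be run by the HDFS superuser (hdpsysuser in this post).
make_hdfs_home() {
    local name="$1"
    hdfs dfs -mkdir -p "/user/$name" &&
    hdfs dfs -chown "$name" "/user/$name"
}

# On hdpmaster, for example:
#   make_hdfs_home hdpclient
```
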
Check on the hadoopedge1 node using hdpclient user:

[hdpclient@hadoopedge1 ~]$ hdfs dfs -touchz hdfs://hdpmaster:9000/hadoopedge1_data/t1.txt
[hdpclient@hadoopedge1 ~]$ hdfs dfs -ls hdfs://hdpmaster:9000/hadoopedge1_data/t1.txt

-rw-r--r--   3 hdpclient supergroup          0 2017-02-09 17:06 hdfs://hdpmaster:9000/hadoopedge1_data/t1.txt

Put a file into HDFS as the client user, i.e. hdpclient, on the Hadoop edge node.
[hdpclient@hadoopedge1 ~]$ hdfs dfs -put /tmp/test.txt hdfs://hdpmaster:9000/hadoopedge1_data
[hdpclient@hadoopedge1 ~]$ hdfs dfs -cat hdfs://hdpmaster:9000/hadoopedge1_data/test.txt 
This is the TestFile on hadoopedg1 client machine 
This is the TestFile on hadoopedg1 client machine 
This is the TestFile on hadoopedg1 client machine 
This is the TestFile on hadoopedg1 client machine 
This is the TestFile on hadoopedg1 client machine 
This is the TestFile on hadoopedg1 client machine

[hdpclient@hadoopedge1 ~]$ hdfs dfs -mkdir hdfs://hdpmaster:9000/hadoopedge1_data/folder1
[hdpclient@hadoopedge1 ~]$ hdfs dfs -mkdir /hadoopedge1_data/folder2
[hdpclient@hadoopedge1 ~]$ hdfs dfs -ls /hadoopedge1_data 
Found 4 items
drwxr-xr-x   - hdpclient supergroup          0 2017-02-09 17:23 /hadoopedge1_data/folder1
drwxr-xr-x   - hdpclient supergroup          0 2017-02-09 17:24 /hadoopedge1_data/folder2
-rw-r--r--   3 hdpclient supergroup          0 2017-02-09 17:06 /hadoopedge1_data/t1.txt
-rw-r--r--   3 hdpclient supergroup        300 2017-02-09 17:12 /hadoopedge1_data/test.txt
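
The manual steps above (put, cat, ls) can be folded into one repeatable smoke test. The following is a sketch under the assumption that the target HDFS directory is writable by the calling user; the function name hdfs_roundtrip and the roundtrip_<pid>.txt file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a small file to an HDFS directory, read it back, and clean up.
hdfs_roundtrip() {
    local dir="$1" local_file remote status=0
    local_file=$(mktemp)
    remote="$dir/roundtrip_$$.txt"
    echo "roundtrip check from $(uname -n)" > "$local_file"

    # -put -f overwrites a leftover file from an earlier run
    hdfs dfs -put -f "$local_file" "$remote" || status=1
    # The content read back from HDFS should equal what was written
    if [ "$status" -eq 0 ] && [ "$(hdfs dfs -cat "$remote")" != "$(cat "$local_file")" ]; then
        status=1
    fi
    # Remove the test file from HDFS and the local temp file in every case
    hdfs dfs -rm -skipTrash "$remote" >/dev/null 2>&1
    rm -f "$local_file"
    return "$status"
}

# Run as hdpclient on the edge node, for example:
#   hdfs_roundtrip /hadoopedge1_data && echo "HDFS read/write OK"
```

Because fs.defaultFS is set in core-site.xml, the short path /hadoopedge1_data and the full URI hdfs://hdpmaster:9000/hadoopedge1_data refer to the same directory, as the folder1 and folder2 examples above show.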