The NFS Gateway for HDFS allows clients to mount HDFS and interact with it through NFS, as if it were part of their local file system. The gateway supports NFSv3.
After mounting HDFS, a user can:
• Browse the HDFS file system through their local file system on NFSv3 client-compatible operating systems.
• Upload and download files between the HDFS file system and their local file system.
• Stream data directly to HDFS through the mount point.
Prerequisites
A Hadoop client already installed on the node that will run the NFS gateway
Configure the HDFS NFS Gateway
1- Configuration on NN
The user running the NFS gateway must be able to proxy all users that use NFS mounts. For example, if user "hdpclient" is running the gateway and the users belong to groups "nfsgrp1" and "nfsgrp2", set the following values in the core-site.xml file on the NameNode. On HDP, all these configuration files are located in /etc/hadoop/conf/.
In Ambari, you can use the "Custom core-site" link on the Advanced tab of the HDFS configuration, where these properties are added as key-value pairs.
<property>
<name>hadoop.proxyuser.hdpclient.groups</name>
<value>nfsgrp1,nfsgrp2</value>
<description>
The 'hdpclient' user is allowed to proxy all members of the 'nfsgrp1' and 'nfsgrp2' groups. Set this to '*' to allow the hdpclient user to proxy any group.
</description>
</property>
<property>
<name>hadoop.proxyuser.hdpclient.hosts</name>
<value>en01</value>
<description>
This is the host where the NFS gateway is running. Set this to '*' to allow requests from any host to be proxied.
</description>
</property>
The preceding properties are the only required configuration settings for the NFS gateway in non-secure mode. Making this change in Ambari requires a restart of all affected services; Ambari then updates core-site.xml located at /etc/hadoop/2.6.1.0-129/0 on the NameNode.
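To confirm the new proxy settings are in place, you can query the configuration from the NameNode host (this reads the local config files, so run it on the node where core-site.xml was updated; substitute your own proxy user for hdpclient). Each command should print the value you configured above:
hdfs getconf -confKey hadoop.proxyuser.hdpclient.groups
hdfs getconf -confKey hadoop.proxyuser.hdpclient.hosts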
Keep in mind that you also need to set dfs.namenode.accesstime.precision to 3600000 in the NameNode configuration (use the Ambari UI); otherwise you will get the error below.
[oracle@te1-hdp-rp-en01 oraclenfs]$ cp -p /data/mydata/emp.csv .
cp: cannot create regular file ‘./emp.csv’: Input/output error
2- Configuration on HDFS NFS gateway (Edge Node)
The NFS gateway uses the same settings that are used by the NameNode and DataNode.
Configure the following properties based on your application's requirements:
a) Edit the hdfs-site.xml file on your NFS gateway machine. Modify the following property:
<property>
<name>dfs.namenode.accesstime.precision</name>
<value>3600000</value>
<description>
The access time for an HDFS file is precise up to this value. The default value is 1 hour. Setting a value of 0 disables access times for HDFS.
</description>
</property>
[I copied hdfs-site.xml from a DataNode (/etc/hadoop/conf) to the edge node (en01) under /usr/hadoopsw/hadoop2.7.3; adjust this for your own environment.]
b) Add the following property to the hdfs-site.xml file:
<property>
<name>dfs.nfs3.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>
The NFS client often reorders writes, so sequential writes can arrive at the NFS gateway in random order. This directory is used to temporarily save out-of-order writes before they are written to HDFS. Make sure the directory has enough space. For example, if the application uploads 10 files of 100 MB each, this directory should have at least 1 GB of space in case a worst-case write reorder happens to every file.
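A minimal sketch of preparing this dump directory on the gateway node (assuming the gateway runs as the hdpclient user from the proxy-user example; adjust the owner to whichever user actually runs the gateway):
mkdir -p /tmp/.hdfs-nfs
chown hdpclient:hdpclient /tmp/.hdfs-nfs
df -h /tmp    # confirm there is enough free space for worst-case write reordering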
c) Update the following property in the hdfs-site.xml file:
<property>
<name>dfs.nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
By default, the export can be mounted by any client. You must update this property to control access. The value string contains the machine name and access privilege, separated by whitespace characters. The machine name can be in single-host, wildcard, or IPv4-network format. The access privilege uses rw or ro to specify read-write or read-only access to the exports. If you do not specify an access privilege, the default machine access to exports is read-only. Separate machine entries with ;. For example: 192.168.0.0/22 rw ; host*.example.com ; host1.test.org ro;.
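For instance, to allow read-write mounts only from the gateway host and read-only mounts from a local subnet (a hypothetical restriction; en01 and the subnet are placeholders for your own environment):
<property>
<name>dfs.nfs.exports.allowed.hosts</name>
<value>en01 rw ; 192.168.44.0/24 ro</value>
</property>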
Restart the NFS gateway after this property is updated.
d) Specify JVM heap space
Specify JVM heap space (HADOOP_NFS3_OPTS) for the NFS Gateway. You can increase the JVM heap allocation for the NFS gateway using this option.
export HADOOP_NFS3_OPTS="-Xms2048m -Xmx2048m"
vi /home/hdpclient/.bash_profile
##NFS related
export HADOOP_NFS3_OPTS="-Xms1024m -Xmx2048m"
The above .bash_profile example specifies a heap with a 1 GB starting size and a 2 GB maximum.
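Alternatively, the same export can be added to hadoop-env.sh so it takes effect for the daemon scripts regardless of the login shell (a sketch; the path assumes the Hadoop install location used elsewhere in this walkthrough):
echo 'export HADOOP_NFS3_OPTS="-Xms1024m -Xmx2048m"' >> /usr/hadoopsw/hadoop-2.7.3/etc/hadoop/hadoop-env.sh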
3- Start and Verify the NFS Gateway Service
Three daemons are required to provide NFS service: rpcbind (or portmap), mountd, and nfsd. The NFS gateway process includes both nfsd and mountd. It shares the HDFS root "/" as the only export. We recommend using the portmap included in the NFS gateway package, as shown below. The included portmap must be used on some Linux systems, for example SLES 11 and RHEL 6.2.
a) Stop nfs/rpcbind/portmap services provided by the platform if running:
[root@en01 ~]# service nfs stop
Redirecting to /bin/systemctl stop nfs.service
[root@en01 ~]# service rpcbind stop
Redirecting to /bin/systemctl stop rpcbind.service
Warning: Stopping rpcbind.service, but it can still be activated by:
rpcbind.socket
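Before starting the included portmap, you can optionally confirm that nothing is still listening on port 111 (assuming ss is available, as it is on recent RHEL/CentOS releases; no output means the port is free):
ss -tulpn | grep ':111 '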
b) Start the included portmap (root privileges are needed), using one of the following commands:
[root@en01 ~]# hadoop-daemon.sh start portmap
starting portmap, logging to /usr/hadoopsw/hadoop-2.7.3/logs/hadoop-root-portmap-en01.out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[root@en01 ~]# hadoop-daemon.sh stop portmap
stopping portmap
[root@en01 ~]# hadoop portmap
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
.....
^C17/12/24 11:50:40 ERROR portmap.Portmap: RECEIVED SIGNAL 2: SIGINT
17/12/24 11:50:40 INFO portmap.Portmap: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down Portmap at en01/192.168.44.134
************************************************************/
[root@en01 ~]# hdfs portmap
...
...
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff; compiled by 'root' on 2016-08-18T01:41Z
STARTUP_MSG: java = 1.8.0_121
************************************************************/
17/12/24 12:04:42 INFO portmap.Portmap: registered UNIX signal handlers for [TERM, HUP, INT]
17/12/24 12:04:42 INFO portmap.Portmap: Portmap server started at tcp:///0.0.0.0:111, udp:///0.0.0.0:111
c) Start mountd and nfsd.
No root privileges are required for this command. However, verify that the user starting the Hadoop cluster and the user starting the NFS gateway are the same.
[root@en01 ~]# hdfs nfs3
17/12/24 12:06:19 INFO nfs3.Nfs3Base: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting Nfs3
STARTUP_MSG: host = en01/192.168.44.134
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.7.3
...
...
17/12/24 12:06:20 INFO http.HttpServer2: Jetty bound to port 50079
17/12/24 12:06:20 INFO mortbay.log: jetty-6.1.26
17/12/24 12:06:20 INFO mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50079
17/12/24 12:06:20 INFO oncrpc.SimpleTcpServer: Started listening to TCP requests at port 2049 for Rpc program: NFS3 at localhost:2049 with workerCount 0
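Running hdfs nfs3 as above keeps the gateway in the foreground. To run it as a background daemon instead, the same daemon script used for portmap can be used (as with portmap, expect the deprecation notice about using the hdfs command):
hadoop-daemon.sh start nfs3
hadoop-daemon.sh stop nfs3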
Verify the validity of NFS-related services
a) Execute the following command to verify that all the services are up and running:
[root@en01 ~]# rpcinfo -p en01
program vers proto port service
100005 3 udp 4242 mountd
100005 1 tcp 4242 mountd
100000 2 udp 111 portmapper
100000 2 tcp 111 portmapper
100005 3 tcp 4242 mountd
100005 2 tcp 4242 mountd
100003 3 tcp 2049 nfs
100005 2 udp 4242 mountd
100005 1 udp 4242 mountd
b) Verify that the HDFS namespace is exported and can be mounted:
[root@en01 ~]# showmount -e en01
Export list for en01:
/ *
Access HDFS
To access HDFS, first mount the export "/". Currently only NFSv3 is supported, and TCP is used as the transport protocol.
a) Mount the HDFS namespace as follows:
Create a folder to be used as the NFS mount point:
[root@en01 ~]# mkdir -p /data/hdfsloc
The following command syntax is used to mount:
mount -t nfs -o vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576 $server:/ $mount_point
Because NLM is not supported, the mount option nolock is needed.
Use the sync option for performance when writing large files. The sync mount option on the NFS client improves the performance and reliability of writing large files to HDFS through the NFS gateway. If sync is specified, the NFS client machine flushes write operations to the NFS gateway before returning control to the client application. A useful side effect of sync is that the client does not issue reordered writes, which reduces buffering requirements on the NFS gateway. sync is specified on the client machine when mounting the NFS share.
[root@en01 ~]# mount -t nfs -o vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576 en01:/ /data/hdfsloc
Check the mount point
[root@en01 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
.....
....
tmpfs 2038280 0 2038280 0% /run/user/1005
en01:/ 4264275968 188922880 4075353088 5% /data/hdfsloc
[root@en01 hdfsloc]# df -h
Filesystem Size Used Avail Use% Mounted on
...
tmpfs 2.0G 0 2.0G 0% /run/user/1005
en01:/ 4.0T 181G 3.8T 5% /data/hdfsloc
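To make the mount persistent across reboots, a matching line can be added to /etc/fstab on the client (a sketch using the same server, options, and mount point as above):
en01:/  /data/hdfsloc  nfs  vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576  0 0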
List folders and files on HDFS
[root@en01 ~]# cd /data/hdfsloc/
[root@en01 hdfsloc]# ll
total 7
drwxrwxrwx 10 3701572 3070102565 320 Dec 20 14:57 app-logs
drwxr-xr-x 5 hdfs hdfs 160 Nov 8 09:18 apps
drwxr-xr-x 4 3701572 3070102565 128 Jul 26 17:24 ats
drwxr-xr-x 3 hdfs hdfs 96 Dec 21 12:28 catalog
drwxrwxrwx 6 hdfs hdfs 192 Dec 21 14:47 data
drwxrwxrwx 6 hdfs hdfs 192 Aug 9 12:50 flume
drwxr-xr-x 3 hdfs hdfs 96 Jul 26 17:24 hdp
drwxr-xr-x 3 3213608373 hdfs 96 Jul 26 17:24 mapred
drwxrwxrwx 4 3213608373 3070102565 128 Jul 26 17:24 mr-history
drwxrwxrwx 17 109638365 3070102565 544 Dec 24 2017 spark2-history
drwxrwxrwx 20 hdfs hdfs 640 Nov 21 16:03 tmp
drwxr-xr-x 14 hdfs hdfs 448 Dec 21 17:02 user
[root@en01 10]# pwd
/data/hdfsloc/data/flume/syslogs2/2017/10
Cat any file on HDFS
[root@en01 10]# cat /data/hdfsloc/data/flume/syslogs2/2017/10/syslog.1509483515603
<37>Oct 31 23:58:33 nn01 su: (to ambari-qa) root on none
<86>Oct 31 23:58:33 nn01 su: pam_unix(su-l:session): session opened for user ambari-qa by (uid=0)
<86>Oct 31 23:58:43 nn01 su: pam_unix(su-l:session): session closed for user ambari-qa
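You can also verify writes through the mount point by copying a local file into HDFS and checking it with the hdfs client (a sketch; /data/mydata/emp.csv is the same sample file used in the earlier error example, and /data/hdfsloc/tmp is the HDFS /tmp directory seen through the mount):
cp -p /data/mydata/emp.csv /data/hdfsloc/tmp/
hdfs dfs -ls /tmp/emp.csv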
Congrats, your HDFS is now available over NFS.