The NFS Gateway for HDFS allows clients to mount HDFS and interact with it through NFS, as if it were part of their local file system. The gateway supports NFSv3.
After mounting HDFS, a user can:
• Browse the HDFS file system through their local file system on NFSv3 client-compatible operating systems.
• Upload and download files between the HDFS file system and their local file system.
• Stream data directly to HDFS through the mount point.
Prerequisites
A Hadoop client already installed on the node that will run the NFS gateway
Configure the HDFS NFS Gateway
1- Configuration on NN
The user running the NFS gateway must be able to proxy all users that use NFS mounts. For example, if user "hdpclient" is running the gateway and the users belong to groups "nfsgrp1" and "nfsgrp2", set the following values in the core-site.xml file on the NameNode. On HDP, all these configuration files are located in /etc/hadoop/conf/.
In Ambari, you can use the "Custom core-site" link on the Advanced tab of the HDFS configuration, where these properties are added as key-value pairs.
<property>
<name>hadoop.proxyuser.hdpclient.groups</name>
<value>nfsgrp1,nfsgrp2</value>
<description>
The 'hdpclient' user is allowed to proxy all members of the 'nfsgrp1' and 'nfsgrp2' groups. Set this to '*' to allow the hdpclient user to proxy any group.
</description>
</property>
<property>
<name>hadoop.proxyuser.hdpclient.hosts</name>
<value>en01</value>
<description>
This is the host where the NFS gateway is running. Set this to '*' to allow requests from any host to be proxied.
</description>
</property>
The preceding properties are the only required configuration settings for the NFS gateway in non-secure mode. Making this change in Ambari requires a restart of all affected services; Ambari then updates core-site.xml located at /etc/hadoop/2.6.1.0-129/0 on the NameNode.
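To confirm the new proxy settings are in place, you can query the configuration from the NameNode host (this reads the local config files, so run it on the node where core-site.xml was updated; substitute your own proxy user for hdpclient). Each command should print the value you configured above:
hdfs getconf -confKey hadoop.proxyuser.hdpclient.groups
hdfs getconf -confKey hadoop.proxyuser.hdpclient.hosts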
Keep in mind that you also need to set dfs.namenode.accesstime.precision to 3600000 in the NameNode configuration (use the Ambari UI); otherwise you will get the error below.
[oracle@te1-hdp-rp-en01 oraclenfs]$ cp -p /data/mydata/emp.csv .
cp: cannot create regular file ‘./emp.csv’: Input/output error
2- Configuration on HDFS NFS gateway (Edge Node)
The NFS gateway uses the same settings that are used by the NameNode and DataNode.
Configure the following properties based on your application's requirements:
a) Edit the hdfs-site.xml file on your NFS gateway machine. Modify the following property:
<property>
<name>dfs.namenode.accesstime.precision</name>
<value>3600000</value>
<description>
The access time for an HDFS file is precise up to this value. The default value is 1 hour. Setting a value of 0 disables access times for HDFS.
</description>
</property>
[I copied hdfs-site.xml from a DataNode (/etc/hadoop/conf) to the edge node (en01) under /usr/hadoopsw/hadoop2.7.3; adjust this for your own environment.]
b) Add the following property to the hdfs-site.xml file:
<property>
<name>dfs.nfs3.dump.dir</name>
<value>/tmp/.hdfs-nfs</value>
</property>
The NFS client often reorders writes, so sequential writes can arrive at the NFS gateway in random order. This directory is used to temporarily save out-of-order writes before they are written to HDFS. Make sure the directory has enough space. For example, if the application uploads 10 files of 100 MB each, this directory should have at least 1 GB of space in case a worst-case write reorder happens to every file.
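A minimal sketch of preparing this dump directory on the gateway node (assuming the gateway runs as the hdpclient user from the proxy-user example; adjust the owner to whichever user actually runs the gateway):
mkdir -p /tmp/.hdfs-nfs
chown hdpclient:hdpclient /tmp/.hdfs-nfs
df -h /tmp    # confirm there is enough free space for worst-case write reordering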
c) Update the following property in the hdfs-site.xml file:
<property>
<name>dfs.nfs.exports.allowed.hosts</name>
<value>* rw</value>
</property>
By default, the export can be mounted by any client. You must update this property to control access. The value string contains the machine name and access privilege, separated by whitespace characters. The machine name can be in single-host, wildcard, or IPv4-network format. The access privilege uses rw or ro to specify read-write or read-only access to the exports. If you do not specify an access privilege, the default machine access to exports is read-only. Separate machine entries with ;. For example: 192.168.0.0/22 rw ; host*.example.com ; host1.test.org ro;.
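For instance, to allow read-write mounts only from the gateway host and read-only mounts from a local subnet (a hypothetical restriction; en01 and the subnet are placeholders for your own environment):
<property>
<name>dfs.nfs.exports.allowed.hosts</name>
<value>en01 rw ; 192.168.44.0/24 ro</value>
</property>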
Restart the NFS gateway after this property is updated.
d) Specify JVM heap space
Specify JVM heap space (HADOOP_NFS3_OPTS) for the NFS Gateway. You can increase the JVM heap allocation for the NFS gateway using this option.
export HADOOP_NFS3_OPTS="-Xms2048m -Xmx2048m"
vi /home/hdpclient/.bash_profile
##NFS related
export HADOOP_NFS3_OPTS="-Xms1024m -Xmx2048m"
The above .bash_profile example specifies a heap with a 1 GB starting size and a 2 GB maximum.
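Alternatively, the same export can be added to hadoop-env.sh so it takes effect for the daemon scripts regardless of the login shell (a sketch; the path assumes the Hadoop install location used elsewhere in this walkthrough):
echo 'export HADOOP_NFS3_OPTS="-Xms1024m -Xmx2048m"' >> /usr/hadoopsw/hadoop-2.7.3/etc/hadoop/hadoop-env.sh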
3- Start and Verify the NFS Gateway Service
Three daemons are required to provide NFS service: rpcbind (or portmap), mountd, and nfsd. The NFS gateway process includes both nfsd and mountd. It shares the HDFS root "/" as the only export. We recommend using the portmap included in the NFS gateway package, as shown below. The included portmap must be used on some Linux systems, for example SLES 11 and RHEL 6.2.
a) Stop nfs/rpcbind/portmap services provided by the platform if running:
[root@en01 ~]# service nfs stop
Redirecting to /bin/systemctl stop nfs.service
[root@en01 ~]# service rpcbind stop
Redirecting to /bin/systemctl stop rpcbind.service
Warning: Stopping rpcbind.service, but it can still be activated by:
rpcbind.socket
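Before starting the included portmap, you can optionally confirm that nothing is still listening on port 111 (assuming ss is available, as it is on recent RHEL/CentOS releases; no output means the port is free):
ss -tulpn | grep ':111 '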
b) Start the included portmap (root privileges are needed), using one of the following commands:
[root@en01 ~]# hadoop-daemon.sh start portmap
starting portmap, logging to /usr/hadoopsw/hadoop-2.7.3/logs/hadoop-root-portmap-en01.out
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[root@en01 ~]# hadoop-daemon.sh stop portmap
stopping portmap
[root@en01 ~]# hadoop portmap
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
.....
^C17/12/24 11:50:40 ERROR portmap.Portmap: RECEIVED SIGNAL 2: SIGINT
17/12/24 11:50:40 INFO portmap.Portmap: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down Portmap at en01/192.168.44.134
************************************************************/
[root@en01 ~]# hdfs portmap
...
...
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff; compiled by 'root' on 2016-08-18T01:41Z
STARTUP_MSG: java = 1.8.0_121
************************************************************/
17/12/24 12:04:42 INFO portmap.Portmap: registered UNIX signal handlers for [TERM, HUP, INT]
17/12/24 12:04:42 INFO portmap.Portmap: Portmap server started at tcp:///0.0.0.0:111, udp:///0.0.0.0:111
c) Start mountd and nfsd.
No root privileges are required for this command. However, verify that the user starting the Hadoop cluster and the user starting the NFS gateway are the same.
[root@en01 ~]# hdfs nfs3
17/12/24 12:06:19 INFO nfs3.Nfs3Base: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting Nfs3
STARTUP_MSG: host = en01/192.168.44.134
STARTUP_MSG: args = []
STARTUP_MSG: version = 2.7.3
...
...
17/12/24 12:06:20 INFO http.HttpServer2: Jetty bound to port 50079
17/12/24 12:06:20 INFO mortbay.log: jetty-6.1.26
17/12/24 12:06:20 INFO mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:50079
17/12/24 12:06:20 INFO oncrpc.SimpleTcpServer: Started listening to TCP requests at port 2049 for Rpc program: NFS3 at localhost:2049 with workerCount 0
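Running hdfs nfs3 as above keeps the gateway in the foreground. To run it as a background daemon instead, the same daemon script used for portmap can be used (as with portmap, expect the deprecation notice about using the hdfs command):
hadoop-daemon.sh start nfs3
hadoop-daemon.sh stop nfs3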
Verify the validity of NFS-related services
a) Execute the following command to verify that all the services are up and running:
[root@en01 ~]# rpcinfo -p en01
program vers proto port service
100005 3 udp 4242 mountd
100005 1 tcp 4242 mountd
100000 2 udp 111 portmapper
100000 2 tcp 111 portmapper
100005 3 tcp 4242 mountd
100005 2 tcp 4242 mountd
100003 3 tcp 2049 nfs
100005 2 udp 4242 mountd
100005 1 udp 4242 mountd
b) Verify that the HDFS namespace is exported and can be mounted:
[root@en01 ~]# showmount -e en01
Export list for en01:
/ *
Access HDFS
To access HDFS, first mount the export "/". Currently only NFSv3 is supported, and TCP is used as the transport protocol.
a) Mount the HDFS namespace as follows:
Create a folder to be used as the NFS mount point:
[root@en01 ~]# mkdir -p /data/hdfsloc
The following command syntax is used to mount:
mount -t nfs -o vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576 $server:/ $mount_point
Because NLM is not supported, the mount option nolock is needed.
Use the sync option for performance when writing large files. The sync mount option on the NFS client improves the performance and reliability of writing large files to HDFS through the NFS gateway. If sync is specified, the NFS client machine flushes write operations to the NFS gateway before returning control to the client application. A useful side effect of sync is that the client does not issue reordered writes, which reduces buffering requirements on the NFS gateway. sync is specified on the client machine when mounting the NFS share.
[root@en01 ~]# mount -t nfs -o vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576 en01:/ /data/hdfsloc
Check the mount point
[root@en01 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
.....
....
tmpfs 2038280 0 2038280 0% /run/user/1005
en01:/ 4264275968 188922880 4075353088 5% /data/hdfsloc
[root@en01 hdfsloc]# df -h
Filesystem Size Used Avail Use% Mounted on
...
tmpfs 2.0G 0 2.0G 0% /run/user/1005
en01:/ 4.0T 181G 3.8T 5% /data/hdfsloc
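To make the mount persistent across reboots, a matching line can be added to /etc/fstab on the client (a sketch using the same server, options, and mount point as above):
en01:/  /data/hdfsloc  nfs  vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576  0 0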
List folders and files on HDFS
[root@en01 ~]# cd /data/hdfsloc/
[root@en01 hdfsloc]# ll
total 7
drwxrwxrwx 10 3701572 3070102565 320 Dec 20 14:57 app-logs
drwxr-xr-x 5 hdfs hdfs 160 Nov 8 09:18 apps
drwxr-xr-x 4 3701572 3070102565 128 Jul 26 17:24 ats
drwxr-xr-x 3 hdfs hdfs 96 Dec 21 12:28 catalog
drwxrwxrwx 6 hdfs hdfs 192 Dec 21 14:47 data
drwxrwxrwx 6 hdfs hdfs 192 Aug 9 12:50 flume
drwxr-xr-x 3 hdfs hdfs 96 Jul 26 17:24 hdp
drwxr-xr-x 3 3213608373 hdfs 96 Jul 26 17:24 mapred
drwxrwxrwx 4 3213608373 3070102565 128 Jul 26 17:24 mr-history
drwxrwxrwx 17 109638365 3070102565 544 Dec 24 2017 spark2-history
drwxrwxrwx 20 hdfs hdfs 640 Nov 21 16:03 tmp
drwxr-xr-x 14 hdfs hdfs 448 Dec 21 17:02 user
[root@en01 10]# pwd
/data/hdfsloc/data/flume/syslogs2/2017/10
Cat any file on HDFS
[root@en01 10]# cat /data/hdfsloc/data/flume/syslogs2/2017/10/syslog.1509483515603
<37>Oct 31 23:58:33 nn01 su: (to ambari-qa) root on none
<86>Oct 31 23:58:33 nn01 su: pam_unix(su-l:session): session opened for user ambari-qa by (uid=0)
<86>Oct 31 23:58:43 nn01 su: pam_unix(su-l:session): session closed for user ambari-qa
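You can also verify writes through the mount point by copying a local file into HDFS and checking it with the hdfs client (a sketch; /data/mydata/emp.csv is the same sample file used in the earlier error example, and /data/hdfsloc/tmp is the HDFS /tmp directory seen through the mount):
cp -p /data/mydata/emp.csv /data/hdfsloc/tmp/
hdfs dfs -ls /tmp/emp.csv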
Congrats, your HDFS is now available over NFS.