Introduction
You may experience poor performance when using NFS. Careful analysis of your environment, from both the client and the server point of view, is the first step toward optimal NFS performance. Aside from the general network configuration (appropriate network capacity, faster NICs, full duplex settings to reduce collisions, agreement on network speed among the switches and hubs, etc.), one of the most important client optimization settings is the NFS data transfer buffer size, specified by the mount command options rsize and wsize.
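As a quick sanity check of those general network settings, you can verify the negotiated speed and duplex of a NIC with ethtool; the interface name eth0 below is only an example, so substitute your own:
[root@en01 ~]# ethtool eth0 | grep -iE "speed|duplex"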
Pre-requisites:
You should have completed the posts below before working on this one.
Configuring NFS Gateway for HDFS [HDP]
Creating Oracle External Table (12c) on HDFS using HDP NFS Gateway
Setting Block Size to Optimize Transfer Speeds
The mount command options rsize and wsize specify the size of the chunks of data that the client and server pass back and forth to each other. If no rsize and wsize options are specified, the default varies with the version of NFS in use. The most common default is 4K (4096 bytes). The theoretical limit for the NFS v2 protocol is 8K. For the v3 protocol, the limit is specific to the server; my server runs RHEL 7. You can run the commands below to identify your server's kernel.
[hdpclient@en01 ~]$ uname -r
3.10.0-327.el7.x86_64
[hdpclient@en01 ~]$ uname -mrsn
Linux en01 3.10.0-327.el7.x86_64 x86_64
[hdpclient@en01 ~]$ rpm -q kernel
kernel-3.10.0-327.el7.x86_64
You can test the speed of your options with some simple commands, if your network environment is not heavily used.
Write to Local FS
*********************
The first of these commands transfers 16384 blocks of 16k each from the special file /dev/zero (which, if you read it, just spits out zeros really fast) to the mounted partition. We will time it to see how long it takes. So, from the client machine, run a timed dd of this form.
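A sketch of the command, assuming a local test path such as /data/mydata/local_testfile (the same path used in the larger local test below), is:
$ time dd if=/dev/zero of=/data/mydata/local_testfile bs=16k count=16384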
16384+0 records in
16384+0 records out
268435456 bytes (268 MB) copied, 0.359405 s, 747 MB/s
real 0m0.402s
user 0m0.001s
sys 0m0.241s
You can also test with a bigger file:
$ time dd if=/dev/zero of=/data/mydata/local_testfile bs=1M count=1024
Write to HDFS
*****************
[oracle@en01 ~]$ time dd if=/dev/zero of=/data/hdfsloc/data/oraclenfs/hdfs_testfile bs=16k count=16384
16384+0 records in
16384+0 records out
268435456 bytes (268 MB) copied, 815.912 s, 329 kB/s
real 13m35.936s
user 0m0.039s
sys 0m0.909s
Compare the time taken by the two tests.
You can also test with a bigger file:
$ time dd if=/dev/zero of=/data/hdfsloc/data/oraclenfs/hdfs_testfile bs=10M count=512
You can see the significant time taken while writing to HDFS. The first test created a 256 MB file of zeroed bytes. In general, you should create a test file that is at least twice as large as the system RAM on the server, but make sure you have enough disk space!
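As a rough guide, you can check the server's RAM with free and size the test file from that. A sketch, assuming a server with 16 GB of RAM (so a roughly 32 GB test file) and an assumed file name hdfs_bigtest; adjust bs and count for your own system:
$ free -g
$ time dd if=/dev/zero of=/data/hdfsloc/data/oraclenfs/hdfs_bigtest bs=1M count=32768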
Read back from HDFS
***************************
Now read the file back into the great black hole on the client machine (/dev/null) with the following command:
[oracle@en01 ~]$ time dd if=/data/hdfsloc/data/oraclenfs/hdfs_testfile of=/dev/null bs=16k
16384+0 records in
16384+0 records out
268435456 bytes (268 MB) copied, 4.76191 s, 56.4 MB/s
real 0m4.787s
user 0m0.003s
sys 0m0.147s
View/modify existing rsize,wsize
****************************************
You can use the mount command to see the existing mount points.
[hdpclient@en01 ~]$ mount
.....
en01:/ on /data/hdfsloc type nfs (rw,relatime,sync,vers=3,rsize=65536,wsize=65536,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.44.134,mountvers=3,mountport=4242,mountproto=tcp,local_lock=all,addr=192.168.48.143)
[root@en01 ~]# grep "nfs " /proc/mounts
en01:/ /data/hdfsloc nfs rw,sync,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.44.134,mountvers=3,mountport=4242,mountproto=tcp,local_lock=all,addr=192.168.48.143 0 0
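As an alternative to grepping /proc/mounts, nfsstat -m lists each mounted NFS file system together with its effective mount options, including rsize and wsize:
[root@en01 ~]# nfsstat -m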
Unmount your NFS file system, mount it again with modified rsize and wsize values, and then re-run the tests above.
[root@en01 ~]# umount /data/hdfsloc
[root@en01 ~]# mount -t nfs -o vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576 en01:/ /data/hdfsloc
Note: On RHEL 7 with kernel 3.10.0, I found the maximum rsize/wsize to be 1048576. As per the Red Hat note (https://access.redhat.com/solutions/753853), the server maximum is calculated based on the amount of memory specified in totalram_pages. totalram_pages is the amount of 'usable' memory and does not include memory reserved by the BIOS, reserved by the kernel for global data structures, etc. Therefore a server with exactly 4GB of RAM will use 524288 for the maximum block size, not 1048576.
[root@dn04 ~]# cat /proc/fs/nfsd/max_block_size
1048576
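If you want the modified rsize and wsize to survive a reboot, you could also put them in /etc/fstab instead of mounting manually. A sketch of such an entry, built from the same mount options used above:
en01:/    /data/hdfsloc    nfs    vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576    0 0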
Read/Write after rsize,wsize change
*******************************************
I performed different tests after the rsize/wsize change, as shown below.
--Write to HDFS
[oracle@en01 ~]$ time dd if=/dev/zero of=/data/hdfsloc/data/oraclenfs/hdfs_testfile2 bs=10M count=512
512+0 records in
512+0 records out
5368709120 bytes (5.4 GB) copied, 162.752 s, 33.0 MB/s
real 2m42.780s
user 0m0.002s
sys 0m5.216s
--Read from HDFS with 10M block size
[oracle@en01 ~]$ time dd if=/data/hdfsloc/data/oraclenfs/hdfs_testfile2 of=/dev/null bs=10M
512+0 records in
512+0 records out
5368709120 bytes (5.4 GB) copied, 337.526 s, 15.9 MB/s
real 5m37.553s
user 0m0.003s
sys 0m2.996s
--Read from HDFS with 1M block size
[oracle@en01 ~]$ time dd if=/data/hdfsloc/data/oraclenfs/hdfs_testfile2 of=/dev/null bs=1M
5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 0.931983 s, 5.8 GB/s
real 0m0.973s
user 0m0.000s
sys 0m0.934s
--Read from HDFS with 16k block size
[oracle@en01 ~]$ time dd if=/data/hdfsloc/data/oraclenfs/hdfs_testfile2 of=/dev/null bs=16k
327680+0 records in
327680+0 records out
5368709120 bytes (5.4 GB) copied, 0.935105 s, 5.7 GB/s
real 0m0.955s
user 0m0.029s
sys 0m0.909s
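Note that the multi-GB/s figures for the 1M and 16k reads are very likely served from the client page cache, because the same file had just been read in the 10M test. To make sure a read test really goes over NFS, flush the cache between runs (as root), for example:
[root@en01 ~]# sync; echo 3 > /proc/sys/vm/drop_caches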
Packet Size and Network Drivers
It is worth experimenting with your network card directly to find out how it can best handle traffic. Try pinging back and forth between the two machines with large packets, using the -f and -s options of ping, and see if a lot of packets get dropped or if they take a long time for a reply. If so, you may have a problem with the performance of your network card.
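For example, a flood ping with a large payload (flood ping requires root; the host name nn01, the packet size, and the count are only examples from this environment):
[root@en01 ~]# ping -f -s 8192 -c 1000 nn01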
NFS Transactions
**********************
Use the nfsstat command to look at nfs transactions, client and server statistics, network statistics, and so forth. The -o net option will show you the number of dropped packets in relation to the total number of transactions.
[oracle@en01 ~]$ nfsstat
Server rpc stats:
calls badcalls badclnt badauth xdrcall
0 0 0 0 0
Client rpc stats:
calls retrans authrefrsh
271347 0 271347
Client nfs v3:
null getattr setattr lookup access readlink
0 0% 6268 2% 152 0% 74025 27% 1692 0% 0 0%
read write create mkdir symlink mknod
19718 7% 155172 57% 118 0% 5 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
310 0% 5 0% 0 0% 6 0% 959 0% 933 0%
fsstat fsinfo pathconf commit
882 0% 22 0% 11 0% 11062 4%
[root@en01 ~]# nfsstat -o net
Server packet stats:
packets udp tcp tcpconn
0 0 0 0
Client packet stats:
packets udp tcp tcpconn
0 0 0 0
--Show NFS client statistics
[root@en01 ~]# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
465836 0 465837
Client nfs v3:
null getattr setattr lookup access readlink
0 0% 9100 1% 162 0% 74671 16% 2207 0% 0 0%
read write create mkdir symlink mknod
34983 7% 294531 63% 140 0% 5 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
328 0% 5 0% 1 0% 15 0% 959 0% 1284 0%
fsstat fsinfo pathconf commit
1513 0% 22 0% 11 0% 45891 9%
--Show NFS server statistics
[root@en01 ~]# nfsstat -s
Server rpc stats:
calls badcalls badclnt badauth xdrcall
0 0 0 0 0
Test for the network packet size
***************************************
You can test for the network packet size using the tracepath command. It traces the path to the destination, discovering the MTU along the way.
[oracle@en01 ~]$ tracepath nn01
1?: [LOCALHOST] pmtu 1500
1: nn01 0.360ms reached
1: nn01 0.310ms reached
Resume: pmtu 1500 hops 1 back 1
The path MTU is reported at the bottom. You can then set the MTU on your network card equal to the path MTU by using the mtu option of ifconfig.
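For example, assuming the reported path MTU of 1500 and an interface named eth0 (both values are placeholders for your environment):
[root@en01 ~]# ifconfig eth0 mtu 1500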
MTU stands for Maximum Transmission Unit, and it is the largest amount of data that can be passed in one Ethernet frame.
Typically the MTU is set to 1500, because raising it can cause problems with routers and similar devices. If you are on a modern LAN, though, you can increase the size to allow more data to be transmitted per frame.
You can see your MTU size in the output of the command below.
[root@en01 ~]# ip link show virbr0
6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT
link/ether 52:54:00:35:6b:15 brd ff:ff:ff:ff:ff:ff
You can set the MTU as below:
ip link set dev eth0 mtu 5000
In addition, netstat -s will give the statistics collected for traffic across all supported protocols. You may also look at /proc/net/snmp for information about current network behavior (see the example after the netstat output below).
[root@en01 ~]# netstat -s
...
...
Tcp:
4659184 active connections openings
448742 passive connection openings
3931677 failed connection attempts
6169 connection resets received
79 connections established
886467265 segments received
1195352418 segments send out
100875 segments retransmited
3 bad segments received.
6372235 resets sent
....
....
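For example, to pull just the raw TCP counters out of /proc/net/snmp mentioned above:
[root@en01 ~]# grep "^Tcp:" /proc/net/snmp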
-o displays active TCP connections and includes networking timer information for each connection.
[root@en01 ~]# netstat -o | grep nfs
tcp 0 0 en01:29509 en01:nfs ESTABLISHED off (0.00/0/0)
tcp 0 0 en01:nfs en01:795 ESTABLISHED keepalive (6958.89/0/0)
tcp 0 0 en01:795 en01:nfs ESTABLISHED keepalive (58.41/0/0)
tcp 0 0 en01:nfs en01:56617 ESTABLISHED keepalive (5582.61/0/0)
tcp 0 0 en01:56617 en01:nfs ESTABLISHED off (0.00/0/0)
tcp 0 0 en01:nfs en01:29509 ESTABLISHED keepalive (5582.61/0/0)