
Thursday, January 04, 2018

Optimizing NFS Performance [HDP NFS]


Introduction

You may experience poor performance when using NFS. Careful analysis of your environment, from both the client and the server point of view, is the first step toward optimal NFS performance. Aside from the general network configuration - appropriate network capacity, faster NICs, full-duplex settings to reduce collisions, agreement on network speed among the switches and hubs, etc. - one of the most important client optimization settings is the NFS data transfer buffer size, specified by the mount command options rsize and wsize.
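
For example, a client can request larger transfer buffers at mount time (a sketch only; the export and mount point are the ones used later in this post, and the server may negotiate the values down):

# mount -t nfs -o vers=3,proto=tcp,rsize=65536,wsize=65536 en01:/ /data/hdfsloc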



Pre-requisites:

You should have completed the posts below before working on this one.

Configuring NFS Gateway for HDFS [HDP]
Creating Oracle External Table (12c) on HDFS using HDP NFS Gateway


Setting Block Size to Optimize Transfer Speeds


The mount command options rsize and wsize specify the size of the chunks of data that the client and server pass back and forth to each other. If no rsize and wsize options are specified, the default varies with the version of NFS in use. The most common default is 4K (4096 bytes). The theoretical limit for the NFS v2 protocol is 8K. For the v3 protocol, the limit is server-specific; my server is RHEL 7. You can run the commands below to identify your server's kernel.

[hdpclient@en01 ~]$ uname -r
3.10.0-327.el7.x86_64
[hdpclient@en01 ~]$ uname -mrsn
Linux en01 3.10.0-327.el7.x86_64 x86_64
[hdpclient@en01 ~]$ rpm -q kernel
kernel-3.10.0-327.el7.x86_64

If your network environment is not heavily used, you can test the effect of these options with some simple commands.

Write to Local FS
*********************
The first of these commands transfers 16384 blocks of 16k each from the special file /dev/zero (which, when read, simply produces zeros as fast as possible) to the mounted partition, and times how long it takes. From the client machine, type:

[oracle@en01 ~]$ time dd if=/dev/zero of=/data/mydata/local_testfile bs=16k count=16384
16384+0 records in
16384+0 records out
268435456 bytes (268 MB) copied, 0.359405 s, 747 MB/s

real    0m0.402s
user    0m0.001s
sys     0m0.241s

You can also test with a larger file.

$ time dd if=/dev/zero of=/data/mydata/local_testfile bs=1M count=1024


Write to HDFS
*****************
[oracle@en01 ~]$ time dd if=/dev/zero of=/data/hdfsloc/data/oraclenfs/hdfs_testfile bs=16k count=16384
16384+0 records in
16384+0 records out
268435456 bytes (268 MB) copied, 815.912 s, 329 kB/s

real    13m35.936s
user    0m0.039s
sys     0m0.909s


Compare the elapsed times of the two tests.

You can also test with a larger file.

$ time dd if=/dev/zero of=/data/hdfsloc/data/oraclenfs/hdfs_testfile bs=10M count=512


Notice the significantly longer time taken when writing to HDFS. The command created a 256 MB file of zeroed bytes. In general, you should create a test file that is at least twice as large as the system RAM on the server, so that server-side caching does not skew the result, but make sure you have enough disk space!
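
As a sketch, you could check the server's RAM first and size the test file accordingly (the 16 GB figure and the file name hdfs_testfile_big are illustrative only; adjust count to roughly twice your own RAM):

[root@en01 ~]# free -g
[oracle@en01 ~]$ time dd if=/dev/zero of=/data/hdfsloc/data/oraclenfs/hdfs_testfile_big bs=1M count=32768    # 32 GB file for a 16 GB server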

Read back from HDFS
***************************
Now read the file back into the great black hole on the client machine (/dev/null) with the following command:

[oracle@en01 ~]$ time dd if=/data/hdfsloc/data/oraclenfs/hdfs_testfile of=/dev/null bs=16k
16384+0 records in
16384+0 records out
268435456 bytes (268 MB) copied, 4.76191 s, 56.4 MB/s

real    0m4.787s
user    0m0.003s
sys     0m0.147s


View/modify existing rsize,wsize
****************************************
You can use the mount command to see the existing mount points.

[hdpclient@en01 ~]$ mount
.....
en01:/ on /data/hdfsloc type nfs (rw,relatime,sync,vers=3,rsize=65536,wsize=65536,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.44.134,mountvers=3,mountport=4242,mountproto=tcp,local_lock=all,addr=192.168.48.143)


[root@en01 ~]# grep "nfs " /proc/mounts

en01:/ /data/hdfsloc nfs rw,sync,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.44.134,mountvers=3,mountport=4242,mountproto=tcp,local_lock=all,addr=192.168.48.143 0 0
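
Alternatively, nfsstat -m prints each mounted NFS file system together with the mount options that were actually negotiated, which is a quick way to confirm the effective rsize and wsize:

[root@en01 ~]# nfsstat -m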


Unmount your NFS file system and mount it again with modified rsize and wsize values, then rerun the tests above.

[root@en01 ~]# umount /data/hdfsloc
[root@en01 ~]# mount -t nfs -o vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576 en01:/ /data/hdfsloc
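
To make these options persistent across reboots, you could also add an equivalent entry to /etc/fstab (a sketch; adjust the server, mount point and sizes for your environment):

en01:/   /data/hdfsloc   nfs   vers=3,proto=tcp,nolock,sync,rsize=1048576,wsize=1048576   0 0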


Note: In RHEL 7 with kernel 3.10.0, the maximum rsize/wsize I could find was 1048576. As per the RHEL note (https://access.redhat.com/solutions/753853), the server maximum is calculated based on the amount of memory specified in totalram_pages. totalram_pages is the amount of 'usable' memory and does not include memory reserved by the BIOS, reserved by the kernel for global data structures, etc. Therefore a server with exactly 4GB of RAM will use 524288 for the maximum block size, not 1048576.
[root@dn04 ~]# cat /proc/fs/nfsd/max_block_size
1048576


Read/Write after rsize,wsize change
*******************************************
I performed different tests after the rsize/wsize change, as shown below.

--write to HDFS
[oracle@en01 ~]$ time dd if=/dev/zero of=/data/hdfsloc/data/oraclenfs/hdfs_testfile2 bs=10M count=512

512+0 records in
512+0 records out
5368709120 bytes (5.4 GB) copied, 162.752 s, 33.0 MB/s

real    2m42.780s
user    0m0.002s
sys     0m5.216s

--Read from HDFS with 10M block size
[oracle@en01 ~]$ time dd if=/data/hdfsloc/data/oraclenfs/hdfs_testfile2 of=/dev/null bs=10M

512+0 records in
512+0 records out
5368709120 bytes (5.4 GB) copied, 337.526 s, 15.9 MB/s

real    5m37.553s
user    0m0.003s

sys     0m2.996s

--Read from HDFS with 1M block size
[oracle@en01 ~]$ time dd if=/data/hdfsloc/data/oraclenfs/hdfs_testfile2 of=/dev/null bs=1M

5120+0 records in
5120+0 records out
5368709120 bytes (5.4 GB) copied, 0.931983 s, 5.8 GB/s

real    0m0.973s
user    0m0.000s
sys     0m0.934s

--Read from HDFS with 16k block size
[oracle@en01 ~]$ time dd if=/data/hdfsloc/data/oraclenfs/hdfs_testfile2 of=/dev/null bs=16k

327680+0 records in
327680+0 records out
5368709120 bytes (5.4 GB) copied, 0.935105 s, 5.7 GB/s

real    0m0.955s
user    0m0.029s
sys     0m0.909s



Packet Size and Network Drivers


It is worth experimenting with your network card directly to find out how it best handles traffic. Try pinging back and forth between the two machines with large packets, using the -f and -s options of ping, and see whether many packets get dropped or take a long time to get a reply. If so, you may have a problem with the performance of your network card.
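
For example (a sketch; nn01 is the remote host used elsewhere in this post, 1472 bytes is the largest payload that fits in a standard 1500-byte MTU frame, and the flood option -f requires root):

[root@en01 ~]# ping -c 100 -s 8192 nn01        # large packets, watch for loss or slow replies
[root@en01 ~]# ping -f -c 1000 -s 1472 nn01    # flood ping with near-MTU-sized packets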

NFS Transactions
**********************
Use the nfsstat command to look at NFS transactions, client and server statistics, network statistics, and so forth. The -o net option will show you the number of dropped packets in relation to the total number of transactions.

[oracle@en01 ~]$ nfsstat

Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
0          0          0          0          0

Client rpc stats:
calls      retrans    authrefrsh
271347     0          271347

Client nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 6268      2% 152       0% 74025    27% 1692      0% 0         0%
read         write        create       mkdir        symlink      mknod
19718     7% 155172   57% 118       0% 5         0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
310       0% 5         0% 0         0% 6         0% 959       0% 933       0%
fsstat       fsinfo       pathconf     commit
882       0% 22        0% 11        0% 11062     4%


[root@en01 ~]# nfsstat -o net
Server packet stats:
packets    udp        tcp        tcpconn
0          0          0          0

Client packet stats:
packets    udp        tcp        tcpconn
0          0          0          0

--Show NFS client statistics

[root@en01 ~]# nfsstat -c
Client rpc stats:
calls      retrans    authrefrsh
465836     0          465837

Client nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 9100      1% 162       0% 74671    16% 2207      0% 0         0%
read         write        create       mkdir        symlink      mknod
34983     7% 294531   63% 140       0% 5         0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
328       0% 5         0% 1         0% 15        0% 959       0% 1284      0%
fsstat       fsinfo       pathconf     commit
1513      0% 22        0% 11        0% 45891     9%

Higher rates for retrans might indicate a problem.
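
As a rough check, you can compare retrans with the total number of calls (a sketch; it assumes the values line is the first line of nfsstat -rc output starting with a digit):

[root@en01 ~]# nfsstat -rc | awk '/^[0-9]/ {printf "calls=%s retrans=%s (%.3f%% retransmitted)\n", $1, $2, ($1 ? 100*$2/$1 : 0); exit}'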


--Show NFS server statistics
[root@en01 ~]# nfsstat -s
Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
0          0          0          0          0

Test for the network packet size
***************************************
You can test the network packet size using the tracepath command, which traces the path to a destination and discovers the MTU along that path.

[oracle@en01 ~]$ tracepath nn01
 1?: [LOCALHOST]                                         pmtu 1500
 1:  nn01                                       0.360ms reached
 1:  nn01                                       0.310ms reached
     Resume: pmtu 1500 hops 1 back 1


The path MTU is reported at the bottom of the output. You can then set the MTU of your network card equal to the path MTU, using the mtu option of ifconfig.
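
For example (a sketch; eth0 is an assumed interface name, and a change made with ifconfig does not survive a reboot):

[root@en01 ~]# ifconfig eth0 mtu 1500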


MTU stands for Maximum Transmission Unit, and it is the largest amount of data that can be passed in one Ethernet frame.


Typically the MTU is set to 1500, because raising it can cause problems with routers and similar devices. On a modern LAN, though, you can increase the size to allow more data to be transmitted per frame.

You can see your MTU size in the output of the command below.

[root@en01 ~]# ip link show virbr0
6: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT
    link/ether 52:54:00:35:6b:15 brd ff:ff:ff:ff:ff:ff

You can set the MTU as below:


ip link set dev eth0 mtu 5000
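
To keep a non-default MTU after a reboot on RHEL 7, a common approach is to add an MTU line to the interface's ifcfg file (a sketch; eth0 and the 9000-byte jumbo-frame value are illustrative, and every device on the path must support that size):

# /etc/sysconfig/network-scripts/ifcfg-eth0
MTU=9000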


In addition, netstat -s gives the statistics collected for traffic across all supported protocols. You may also look at /proc/net/snmp for information about current network behavior.

[root@en01 ~]# netstat -s

...
...
Tcp:
    4659184 active connections openings
    448742 passive connection openings
    3931677 failed connection attempts
    6169 connection resets received
    79 connections established
    886467265 segments received
    1195352418 segments send out
    100875 segments retransmited
    3 bad segments received.
    6372235 resets sent
....
....
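
If you prefer the raw counters behind this output, you can pull the TCP lines straight out of /proc/net/snmp:

[root@en01 ~]# grep '^Tcp:' /proc/net/snmp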

The -o option displays active TCP connections and includes networking timer information (the last column) for each connection.
[root@en01 ~]# netstat -o | grep nfs
tcp        0      0 en01:29509   en01:nfs     ESTABLISHED off (0.00/0/0)
tcp        0      0 en01:nfs     en01:795     ESTABLISHED keepalive (6958.89/0/0)
tcp        0      0 en01:795     en01:nfs     ESTABLISHED keepalive (58.41/0/0)
tcp        0      0 en01:nfs     en01:56617   ESTABLISHED keepalive (5582.61/0/0)
tcp        0      0 en01:56617   en01:nfs     ESTABLISHED off (0.00/0/0)
tcp        0      0 en01:nfs     en01:29509   ESTABLISHED keepalive (5582.61/0/0)

