

Tuesday, March 20, 2018

HDFS Centralized Cache Management



Due to increasing memory capacity, many interesting working sets can fit entirely in aggregate cluster memory. By using HDFS centralized cache management, applications can take advantage of the performance benefits of in-memory computation. Cluster cache state is aggregated and controlled by the NameNode, allowing application schedulers to place their tasks for cache locality.


If your applications repeatedly access the same data, you can get improved performance by enabling centralized cache management in HDFS. You need to specify paths to directories or files that will be cached by HDFS. The NameNode will communicate with DataNodes that have the desired blocks available on disk, and instruct the DataNodes to cache the blocks in off-heap caches.

Caching Use Cases

Files that are accessed repeatedly: For example, a small fact table in Hive that is often used for joins is a good candidate for caching.


Mixed workloads with performance SLAs: Caching the working set of a high priority workload ensures that it does not compete with low priority workloads for disk I/O.

Caching Terminology


Cache Directive: Defines a path that will be cached; the path can be either a directory or a file. Directories are cached non-recursively, meaning only the files in the first-level listing of the directory will be cached.
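
For example, a directive on a directory caches only its first-level files; files in subdirectories need their own directives. A minimal sketch, using the /data/employee path and the cachePool1 pool that appear later in this post:

-- Caches files directly under /data/employee, but nothing in its subdirectories
hdfs cacheadmin -addDirective -path /data/employee -pool cachePool1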


Cache Pool: An administrative entity used to manage groups of Cache Directives. Cache Pools have UNIX-like permissions that restrict which users and groups have access to the pool. Write permissions allow users to add Cache Directives to the pool and remove them. Read permissions allow users to list the Cache Directives in a pool, as well as additional metadata. Execute permissions are unused.
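
All of these pool attributes can be set when the pool is created. A sketch of a locked-down pool (the pool name, owner, and values here are illustrative, not from this cluster):

-- Mode 0750 (rwxr-x---): group members can list directives, others have no access
hdfs cacheadmin -addPool reportsPool -owner hive -group hadoop -mode 0750 -limit 104857600 -maxTtl 7d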

Configuring Centralized Caching


Native Libraries: In order to lock block files into memory, the DataNode relies on native JNI code found in libhadoop.so ($HADOOP_HOME/lib/native). Be sure to enable JNI if you are using HDFS centralized cache management.


JNI is the Java Native Interface. It defines a way for managed code (written in the Java programming language) to interact with native code (written in C/C++).
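
Before enabling caching, you can confirm that the native library is actually loadable with the hadoop checknative command; look for "hadoop: true" in its output:

[hdfs@nn01 ~]$ hadoop checknative -a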

Configuration Properties: Configuration properties for centralized caching are specified in the hdfs-site.xml file.

Required Properties


Currently, only one property, dfs.datanode.max.locked.memory, is required; it determines the maximum amount of memory (in bytes) that a DataNode will use for caching. The "locked-in-memory size" ulimit (ulimit -l) of the DataNode user also needs to be increased to match or exceed this value. On HDP 2.6, add this property to the HDFS service in custom hdfs-site.xml and restart the service. The example below sets it to 268435456 bytes (256 MB).

<property>
  <name>dfs.datanode.max.locked.memory</name>
  <value>268435456</value>
</property>



[root@nn01 ~]# ulimit -l
64

[root@nn01 ~]# ulimit -Ha
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127526
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 127526
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited



Set memlock limits on each DataNode. The change takes effect after you log out and log back in.

# On each DataNode; limits.conf values are in KB, so 65536 KB = 64 MB of lockable memory
[root@dn01 ~]# echo "* hard  memlock 65536" >> /etc/security/limits.conf
[root@dn01 ~]# echo "* soft  memlock 65536" >> /etc/security/limits.conf

[root@nn01 ~]# ulimit -Sa

The -H option instructs ulimit to display hard resource limits.

The -S option instructs ulimit to display soft resource limits.
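
After logging back in on a DataNode, a quick sanity check that the new memlock value took effect:

# Both should now report 65536 (limits.conf values are in KB)
ulimit -Sl
ulimit -Hl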


Optional Properties

The following properties are not required, but can be specified for tuning.

dfs.namenode.path.based.cache.refresh.interval.ms : NameNode will use this value as the number of milliseconds between subsequent cache path re-scans.

<property>
  <name>dfs.namenode.path.based.cache.refresh.interval.ms</name>
  <value>300000</value>
</property>

dfs.time.between.resending.caching.directives.ms : NameNode will use this value as the number of milliseconds between resending caching directives.

<property>
  <name>dfs.time.between.resending.caching.directives.ms</name>
  <value>300000</value>
</property>

dfs.datanode.fsdatasetcache.max.threads.per.volume : DataNode will use this value as the maximum number of threads per volume to use for caching new data.

<property>
  <name>dfs.datanode.fsdatasetcache.max.threads.per.volume</name>
  <value>4</value>
</property>

dfs.cachereport.intervalMsec : DataNode will use this value as the number of milliseconds between sending a full report of its cache state to the NameNode.

<property>
  <name>dfs.cachereport.intervalMsec</name>
  <value>10000</value>
</property>


dfs.namenode.path.based.cache.block.map.allocation.percent : The percentage of the Java heap that will be allocated to the cached blocks map. 

<property>
  <name>dfs.namenode.path.based.cache.block.map.allocation.percent</name>
  <value>0.25</value>

</property>

OS Limits


If you get the error "Cannot start datanode because the configured max locked memory size...is more than the datanode's available RLIMIT_MEMLOCK ulimit," that means that the operating system is imposing a lower limit on the amount of memory that you can lock than what you have configured. To fix this, you must adjust the ulimit -l value that the DataNode runs with. 


Set memlock (the maximum locked-in-memory address space) to a value slightly higher than the dfs.datanode.max.locked.memory property. Note that limits.conf values are in KB. For example, to give the hdfs user a 130 KB memlock limit, you would add the following line to /etc/security/limits.conf.


hdfs - memlock 130
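
Since the DataNode runs as the hdfs user, it is that user's limit that matters; a quick check from a fresh login shell (assuming the hdfs account has a login shell):

[root@dn01 ~]# su - hdfs -c 'ulimit -l'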



Using Cache Pools and Directives

You can use the command-line interface (CLI) to create, modify, and list Cache Pools and Cache Directives via the hdfs cacheadmin subcommand. Cache Directives are identified by a unique 64-bit integer ID; IDs are not reused even after a Cache Directive is removed. Cache Pools are identified by a unique string name. You must first create a Cache Pool, and then add Cache Directives to it.

-- Add a new Cache Pool

hdfs cacheadmin -addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>] 

[hdfs@nn01 ~]$ hdfs cacheadmin -addPool cachePool1
Successfully added cache pool cachePool1.

--List existing pools

[hdfs@nn01 ~]$ hdfs cacheadmin -listPools
Found 1 result.
NAME        OWNER  GROUP   MODE            LIMIT  MAXTTL
cachePool1  hdfs   hadoop  rwxr-xr-x   unlimited   never

--List with additional stats

[hdfs@nn01 ~]$ hdfs cacheadmin -listPools -stats
Found 1 result.
NAME        OWNER  GROUP   MODE            LIMIT  MAXTTL  BYTES_NEEDED  BYTES_CACHED  BYTES_OVERLIMIT  FILES_NEEDED  FILES_CACHED
cachePool1  hdfs   hadoop  rwxr-xr-x   unlimited   never             0             0                0             0             0

--List for specific cache pool

[hdfs@en01 ~]$ hdfs cacheadmin -listPools -stats cachePool1
Found 1 result.
NAME        OWNER  GROUP   MODE            LIMIT  MAXTTL  BYTES_NEEDED  BYTES_CACHED  BYTES_OVERLIMIT  FILES_NEEDED  FILES_CACHED
cachePool1  hdfs   hadoop  rwxr-xr-x   unlimited   never             0             0                0             0             0

-- Get help
[hdfs@en01 ~]$ hdfs cacheadmin -help

-- Add a new Cache Directive
hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>] 

[hdfs@nn01 ~]$ hdfs cacheadmin -addDirective -path /data/employee/emp.csv -pool cachePool1
Added cache directive 1

[hdfs@en01 ~]$ hdfs cacheadmin -addDirective -path /data/employee/emp.csv -pool cachePool1 -replication 3
Added cache directive 2

--List directives

[hdfs@nn01 ~]$ hdfs cacheadmin -listDirectives
Found 1 entry
 ID POOL         REPL EXPIRY  PATH
  1 cachePool1      1 never   /data/employee/emp.csv


[hdfs@nn01 ~]$ hdfs cacheadmin -listPools -stats cachePool1
Found 1 result.
NAME        OWNER  GROUP   MODE            LIMIT  MAXTTL  BYTES_NEEDED  BYTES_CACHED  BYTES_OVERLIMIT  FILES_NEEDED  FILES_CACHED
cachePool1  hdfs   hadoop  rwxr-xr-x   unlimited   never           617             0                0             1             0

-- Remove a Cache Directive
[hdfs@en01 ~]$ hdfs cacheadmin -removeDirective 1
Removed cached directive 1


-- Remove all of the Cache Directives in a specified path
hdfs cacheadmin -removeDirectives -path <path>
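
For example, to drop every directive on the /data/employee path used above (this requires write permission on the pools containing the directives):

[hdfs@nn01 ~]$ hdfs cacheadmin -removeDirectives -path /data/employee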


-- Look at the DataNode stats; note that one of the DataNodes is using one 4 KB page of cache, since our directive was created with replication 1

[hdfs@nn01 ~]$ hdfs dfsadmin -report
Configured Capacity: 4366617690624 (3.97 TB)
Present Capacity: 4186504465137 (3.81 TB)
DFS Remaining: 4052661449784 (3.69 TB)
DFS Used: 133843015353 (124.65 GB)
DFS Used%: 3.20%
Under replicated blocks: 63
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.xx.xxx:50010 (dn02)
Hostname: dn02
Decommission Status : Normal
Configured Capacity: 1455539230208 (1.32 TB)
DFS Used: 44614710332 (41.55 GB)
Non DFS Used: 59216794052 (55.15 GB)
DFS Remaining: 1351170855272 (1.23 TB)
DFS Used%: 3.07%
DFS Remaining%: 92.83%
Configured Cache Capacity: 65536 (64 KB)
Cache Used: 0 (0 B)
Cache Remaining: 65536 (64 KB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 10
Last contact: Tue Mar 20 14:03:33 AST 2018


Name: 192.168.xx.xxx:50010 (dn03)
Hostname: dn03
Decommission Status : Normal
Configured Capacity: 1455539230208 (1.32 TB)
DFS Used: 44614280252 (41.55 GB)
Non DFS Used: 59214389700 (55.15 GB)
DFS Remaining: 1351173689704 (1.23 TB)
DFS Used%: 3.07%
DFS Remaining%: 92.83%
Configured Cache Capacity: 65536 (64 KB)
Cache Used: 4096 (4 KB)
Cache Remaining: 61440 (60 KB)
Cache Used%: 6.25%
Cache Remaining%: 93.75%
Xceivers: 10
Last contact: Tue Mar 20 14:03:33 AST 2018


Name: 192.168.xx.xxx:50010 (dn01)
Hostname: dn01
Decommission Status : Normal
Configured Capacity: 1455539230208 (1.32 TB)
DFS Used: 44614024769 (41.55 GB)
Non DFS Used: 60071430079 (55.95 GB)
DFS Remaining: 1350316904808 (1.23 TB)
DFS Used%: 3.07%
DFS Remaining%: 92.77%
Configured Cache Capacity: 65536 (64 KB)
Cache Used: 0 (0 B)
Cache Remaining: 65536 (64 KB)
Cache Used%: 0.00%
Cache Remaining%: 100.00%
Xceivers: 10
Last contact: Tue Mar 20 14:03:34 AST 2018
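
As a final check, -listDirectives also accepts -stats and -pool filters, which show how much of each directive is actually resident in cache:

-- BYTES_CACHED and FILES_CACHED indicate how much of each directive has been cached
[hdfs@nn01 ~]$ hdfs cacheadmin -listDirectives -stats -pool cachePool1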