DBMentors - Inam Bukhari's Blog: March 2018

Tuesday, March 20, 2018

HDFS Centralized Cache Management

Due to increasing memory capacity, many interesting working sets are able to fit in aggregate cluster memory. By using HDFS centralized cache management, applications can take advantage of the performance benefits of in-memory computation. Cluster cache state is aggregated and controlled by the NameNode, allowing application schedulers to place their tasks for cache locality.
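As a rough illustration of what working with the centralized cache looks like, here is a minimal Python sketch that only assembles the `hdfs cacheadmin` command lines for creating a cache pool and caching a path in it. The pool name `analytics_pool` and the path `/data/hot/lookup` are hypothetical, and the helper `cache_commands` is my own naming for illustration; nothing here is taken from the post itself.

```python
import shlex


def cache_commands(pool, path, replication=1):
    """Build the hdfs cacheadmin command lines for caching one path.

    Returns argument lists (not strings) so they can be passed to
    subprocess.run() without shell quoting problems.
    """
    return [
        # 1. Create a cache pool, the unit used to manage cache directives.
        ["hdfs", "cacheadmin", "-addPool", pool],
        # 2. Ask the NameNode to cache the given path in that pool.
        ["hdfs", "cacheadmin", "-addDirective",
         "-path", path, "-pool", pool, "-replication", str(replication)],
        # 3. List the directives in the pool to confirm the request.
        ["hdfs", "cacheadmin", "-listDirectives", "-pool", pool],
    ]


if __name__ == "__main__":
    # Hypothetical pool and path, printed rather than executed.
    for argv in cache_commands("analytics_pool", "/data/hot/lookup"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

The script prints the commands instead of running them, so it can be tried on a machine without a Hadoop client; on a cluster node each list could be handed to `subprocess.run`.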

Configuring ACLs on HDFS

ACLs extend the HDFS permission model to support more granular file access based on arbitrary combinations of users and groups. We will discuss how to use Access Control Lists (ACLs) on the Hadoop Distributed File System (HDFS).
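To make the idea of per-user and per-group entries concrete, the following Python sketch builds ACL entries in the `scope:name:permissions` form and assembles the matching `hdfs dfs -setfacl` and `hdfs dfs -getfacl` command lines. The user `alice`, the group `analysts`, and the path `/data/sales` are hypothetical, the helper names are mine, and the sketch assumes ACLs are enabled on the NameNode (`dfs.namenode.acls.enabled` set to `true`).

```python
import re

# Permissions are three characters: r or -, w or -, x or -.
_PERMS = re.compile(r"^[r-][w-][x-]$")
_SCOPES = ("user", "group", "other", "mask")


def acl_entry(scope, name, perms):
    """Return one ACL entry such as 'user:alice:r-x'.

    Use an empty name for the owner, owning group, other, or mask entries.
    """
    if scope not in _SCOPES:
        raise ValueError("unknown ACL scope: %r" % scope)
    if not _PERMS.match(perms):
        raise ValueError("bad permission string: %r" % perms)
    return "%s:%s:%s" % (scope, name, perms)


def setfacl_command(path, entries):
    """Build 'hdfs dfs -setfacl -m <entries> <path>' as an argument list."""
    return ["hdfs", "dfs", "-setfacl", "-m", ",".join(entries), path]


def getfacl_command(path):
    """Build 'hdfs dfs -getfacl <path>' to inspect the resulting ACL."""
    return ["hdfs", "dfs", "-getfacl", path]


if __name__ == "__main__":
    # Hypothetical user, group, and directory.
    entries = [acl_entry("user", "alice", "r-x"),
               acl_entry("group", "analysts", "rwx")]
    print(" ".join(setfacl_command("/data/sales", entries)))
    print(" ".join(getfacl_command("/data/sales")))
```

Validating the permission string before building the command is a small design choice that catches typos such as `rx` early, before anything is sent to the cluster.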

Spooling Files to HBase using Flume

Scenario:

One of my teams wants to load the contents of files placed in a specific directory (the spooling directory) into HBase for analysis. For this purpose we will use Flume's spooldir source, which lets users and applications drop files into the spooling directory and processes each line as one event to be written to HBase. It is assumed that the Hadoop cluster and HBase are running; our environment is HDP 2.6.
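The scenario above can be sketched as a Flume agent with a spooldir source, a memory channel, and an HBase sink. The Python snippet below only generates the text of such an agent configuration; the agent name `a1`, the component names, the directory `/tmp/spool`, the table `logs`, and the column family `cf` are all hypothetical placeholders, not values from the actual setup, and a real deployment would likely need further tuning (channel capacity, serializer, and so on).

```python
def flume_spool_to_hbase(agent, spool_dir, table, column_family):
    """Return Flume agent properties wiring spooldir -> memory channel -> HBase."""
    props = [
        # Declare the agent's components.
        (agent + ".sources", "spool_src"),
        (agent + ".channels", "mem_ch"),
        (agent + ".sinks", "hbase_sink"),
        # Source: watches the directory and emits one event per line.
        (agent + ".sources.spool_src.type", "spooldir"),
        (agent + ".sources.spool_src.spoolDir", spool_dir),
        (agent + ".sources.spool_src.channels", "mem_ch"),
        # Channel: buffers events in memory between source and sink.
        (agent + ".channels.mem_ch.type", "memory"),
        # Sink: writes each event to the given HBase table and column family.
        (agent + ".sinks.hbase_sink.type", "hbase"),
        (agent + ".sinks.hbase_sink.table", table),
        (agent + ".sinks.hbase_sink.columnFamily", column_family),
        (agent + ".sinks.hbase_sink.channel", "mem_ch"),
    ]
    return "\n".join("%s = %s" % (key, value) for key, value in props)


if __name__ == "__main__":
    # Hypothetical values; redirect the output to a .conf file to use it.
    print(flume_spool_to_hbase("a1", "/tmp/spool", "logs", "cf"))
```

The generated text could be saved to a file and passed to `flume-ng agent` with the matching agent name; the target HBase table and column family would need to exist beforehand.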

Tuesday, March 13, 2018

Spooling Files to HBase using Flume
