

Note: All posts take a practical approach and avoid lengthy theory. Everything has been tested on development servers. Please do not try any post on production servers until you are sure of the result.

Tuesday, March 20, 2018

Configuring ACLs on HDFS


Access Control Lists (ACLs) extend the HDFS permission model to support more granular file access based on arbitrary combinations of users and groups. This post shows how to use ACLs on the Hadoop Distributed File System (HDFS).



Use Cases for ACLs on HDFS


Multiple Users: In this use case, multiple users require Read access to a file. None of the users are the owner of the file. The users are not members of a common group, so it is impossible to use group Permission Bits.


Multiple Groups: In this use case, multiple groups require Read and Write access to a file. There is no group containing all of the group members, so it is impossible to use group Permission Bits.
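Both use cases above come down to adding one named entry per user or group. A minimal sketch, assuming a bash-like shell: build_acl_spec and grant_acl are helper names made up for this example, and the users, groups and path in the usage comment are placeholders. It builds the comma-separated <acl_spec> and applies it with a single setfacl call.

```shell
# build_acl_spec TYPE PERMS NAME...
# Prints a comma-separated ACL spec, e.g. user:alice:r--,user:bob:r--
# TYPE is "user" or "group"; PERMS is a three-character string such as r-- or rw-.
build_acl_spec() {
  local type="$1" perms="$2" spec="" name
  shift 2
  for name in "$@"; do
    spec="${spec:+$spec,}${type}:${name}:${perms}"
  done
  printf '%s\n' "$spec"
}

# grant_acl PATH TYPE PERMS NAME...
# Adds the named entries to PATH and keeps the existing entries (-m).
grant_acl() {
  local path="$1"
  shift
  hdfs dfs -setfacl -m "$(build_acl_spec "$@")" "$path"
}

# Usage (hypothetical users, groups and path):
#   grant_acl /data/reports/q1.csv user  r-- analyst1 analyst2
#   grant_acl /data/reports/q1.csv group rw- finance audit
```

Passing all entries in one spec keeps the change to a single NameNode operation per path instead of one call per user or group.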

Hive Partitioned Tables: In this use case, Hive contains a partitioned table of your data. Hive persists a partitioned table as a separate subdirectory for each distinct value of the partition key, and you want different users or groups to have different permissions depending on the partition key value.
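A sketch of per-partition ACLs, assuming a hypothetical table directory /apps/hive/warehouse/sales partitioned by region and hypothetical groups emea_analysts and apac_analysts (grant_partition_acl is a helper name invented here). Each partition subdirectory gets its own named group entry. The entry uses r-x because listing a directory needs Execute as well as Read, as the hdpclient walkthrough later in this post demonstrates.

```shell
# grant_partition_acl TABLE_DIR PARTITION_KEY VALUE GROUP
# Gives GROUP read-only access to one partition subdirectory, recursively.
grant_partition_acl() {
  local table_dir="$1" key="$2" value="$3" group="$4"
  hdfs dfs -setfacl -R -m "group:${group}:r-x" "${table_dir}/${key}=${value}"
}

# Usage (all names are placeholders):
#   grant_partition_acl /apps/hive/warehouse/sales region emea emea_analysts
#   grant_partition_acl /apps/hive/warehouse/sales region apac apac_analysts
```

The groups also need Execute on the table directory itself and on its parent directories, otherwise they cannot reach the partition subdirectories.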


Default ACLs: In this use case, a file system administrator or sub-tree owner would like to define an access policy that will be applied to the entire sub-tree. This access policy must apply not only to the current set of files and directories, but also to any new files and directories that are added later.


This use case can be addressed by setting a default ACL on the directory. The default ACL can contain any arbitrary combination of entries. For example:

default:user::rwx
default:group::r--


Remember that default ACLs are never considered during permission enforcement. They only define the ACL that new files and subdirectories receive automatically when they are created.
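As a sketch, the two default entries shown above could be applied and inspected as follows. The function names and the path in the usage comment are placeholders, not part of any Hadoop tooling. Files and directories that already exist in the sub-tree are not affected by a default ACL, so they would still need a separate recursive setfacl -R.

```shell
# set_default_acl DIR
# Sets the two default entries on DIR. Children created under DIR
# afterwards inherit them; existing children are left unchanged.
set_default_acl() {
  hdfs dfs -setfacl -m "default:user::rwx,default:group::r--" "$1"
}

# show_default_acl DIR
# Prints only the default entries from the getfacl output.
show_default_acl() {
  hdfs dfs -getfacl "$1" | grep '^default:'
}

# Usage (hypothetical path):
#   set_default_acl /data/shared
#   show_default_acl /data/shared
```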


Minimal ACL/Permissions Only: HDFS ACLs support deployments that may want to use only Permission Bits and not ACLs with named user and group entries. Permission Bits are equivalent to a minimal ACL containing only 3 entries. For example:

user::rw-
group::r--
other::---
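To make the equivalence concrete, here is a small sketch in plain shell that turns a three-digit octal mode into the matching minimal ACL entries. It is only an illustration of the mapping. The function names are invented for this example and no HDFS command is involved.

```shell
# digit_to_rwx DIGIT
# Prints the rwx string for one octal digit (0-7).
digit_to_rwx() {
  local r=- w=- x=-
  if [ $(( $1 & 4 )) -ne 0 ]; then r=r; fi
  if [ $(( $1 & 2 )) -ne 0 ]; then w=w; fi
  if [ $(( $1 & 1 )) -ne 0 ]; then x=x; fi
  printf '%s%s%s' "$r" "$w" "$x"
}

# octal_to_minimal_acl MODE
# Prints the three minimal ACL entries for a mode such as 640.
# Special bits (for example the sticky bit) are not handled.
octal_to_minimal_acl() {
  local mode="$1" u g o
  case "$mode" in
    [0-7][0-7][0-7]) ;;
    *) echo "expected a three-digit octal mode, got: $mode" >&2; return 1 ;;
  esac
  u="${mode%??}"              # first digit: owner
  g="${mode#?}"; g="${g%?}"   # middle digit: group
  o="${mode#??}"              # last digit: other
  printf 'user::%s\ngroup::%s\nother::%s\n' \
    "$(digit_to_rwx "$u")" "$(digit_to_rwx "$g")" "$(digit_to_rwx "$o")"
}

octal_to_minimal_acl 640
# user::rw-
# group::r--
# other::---
```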


Block Access to a Sub-Tree for a Specific User:  In this use case, a deeply nested file system sub-tree was created as world-readable, followed by a subsequent requirement to block access for a specific user to all files in that sub-tree.
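One way to meet this requirement is a named user entry with no permissions on the root of the sub-tree. A named user entry takes precedence over the "other" entry, and without Execute on the root directory the user cannot traverse into anything beneath it, so the nested files need no individual changes. A sketch, with a hypothetical helper name, user name and path:

```shell
# block_user USER ROOT_DIR
# Adds a named entry with no permissions for USER on ROOT_DIR only.
block_user() {
  hdfs dfs -setfacl -m "user:$1:---" "$2"
}

# Usage (hypothetical user and path):
#   block_user diana /data/archive
#   hdfs dfs -getfacl /data/archive    # should now list user:diana:---
```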


ACLs with Sticky Bit: In this use case, multiple named users or named groups require full access to a shared directory, such as "/tmp". However, Write and Execute permissions on the directory also give users the ability to delete or rename any files in the directory, even files created by other users. Users must be restricted so that they are only allowed to delete or rename files that they created.


This use case can be addressed by combining an ACL with the sticky bit. The sticky bit already works with Permission Bits, and it works the same way in combination with ACLs.
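A minimal sketch of that combination, using a hypothetical helper name and placeholder directory, user and group names. Mode 1770 is an assumption chosen for this example: the leading 1 sets the sticky bit, and "other" gets no access so that the named entries actually matter.

```shell
# share_with_sticky_bit DIR ACL_SPEC
# 1) sets mode 1770: rwx for owner and group, nothing for other, sticky bit on
# 2) adds the named entries from ACL_SPEC on top of that
share_with_sticky_bit() {
  hdfs dfs -chmod 1770 "$1" && hdfs dfs -setfacl -m "$2" "$1"
}

# Usage (hypothetical directory, user and group):
#   share_with_sticky_bit /data/shared user:alice:rwx,group:etl:rwx
```

With the sticky bit set, alice and members of etl can create files in the directory, but only a file's owner, the directory's owner or the superuser can delete or rename it.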

Configuration



ACLs are disabled by default. When ACLs are disabled, the NameNode rejects all attempts to set an ACL. To enable support for ACLs, set the property dfs.namenode.acls.enabled to "true" in the hdfs-site.xml file.

<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>

On HDP 2.6, add the property under Custom hdfs-site and restart the HDFS service.
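After the restart, a quick check might look like the sketch below. The function name acls_enabled is made up for this example. Note that hdfs getconf reads the configuration files visible on the machine where it runs, so it confirms that the setting was deployed there, not that the NameNode has picked it up.

```shell
# acls_enabled
# Succeeds (exit status 0) when the local configuration enables ACLs.
acls_enabled() {
  [ "$(hdfs getconf -confKey dfs.namenode.acls.enabled)" = "true" ]
}

# Usage:
#   if acls_enabled; then echo "ACLs enabled"; else echo "ACLs disabled"; fi
```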


FsShell provides two sub-commands for ACLs: setfacl and getfacl. setfacl sets ACLs for files and directories. getfacl displays the ACLs of files and directories and, if a directory has a default ACL, displays the default ACL as well.

ACL Options
-b Remove all entries, but retain the base ACL entries.
-k Remove the default ACL.
-R Apply operations to all files and directories recursively.
-m Modify the ACL. New entries are added to the ACL, and existing entries are retained.
-x Remove the specified ACL entries. All other ACL entries are retained.
--set Fully replace the ACL and discard all existing entries.
<acl_spec> A comma-separated list of ACL entries.

Syntax

-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]
-setfacl [-R] --set <acl_spec> <path>

Examples:

hdfs dfs -setfacl -m user:hadoop:rw- /file
hdfs dfs -setfacl -x user:hadoop /file
hdfs dfs -setfacl -b /file
hdfs dfs -setfacl -k /dir
hdfs dfs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- /file
hdfs dfs -setfacl -R -m user:hadoop:r-x /dir
hdfs dfs -setfacl -m default:user:hadoop:r-x /dir


-- get the ACL

[hdfs@en01 ~]$ hdfs dfs -getfacl /data/employee
# file: /data/employee
# owner: hdfs
# group: hdfs
user::rwx
user:hadoop:rw-
group::rwx
mask::rwx
other::r-x

-- Remove all entries, but retain the base ACL entries.

[hdfs@en01 ~]$ hdfs dfs -setfacl -b /data/employee

-- verify ACL removal
[hdfs@en01 ~]$ hdfs dfs -getfacl /data/employee
# file: /data/employee
# owner: hdfs
# group: hdfs
user::rwx
group::rwx
other::r-x

-- give user hdpclient read-only permission on the folder

[hdfs@rp-en01 ~]$ hdfs dfs -setfacl -m -R user:hdpclient:r-- /data/employee

-- verify the permission

[hdfs@en01 ~]$ hdfs dfs -getfacl  /data/employee
# file: /data/employee
# owner: hdfs
# group: hdfs
user::rwx
user:hdpclient:r--
group::rwx
mask::rwx
other::r-x

-- verify as user hdpclient

[hdpclient@en01 ~]$ hdfs dfs -ls /data/employee
ls: Permission denied: user=hdpclient, access=READ_EXECUTE, inode="/data/employee":hdfs:hdfs:drwxrwxr-x

-- also grant execute permission, which is needed to list a directory
[hdfs@en01 ~]$ hdfs dfs -setfacl -m -R user:hdpclient:r-x /data/employee

-- verify as hdpclient again; this time the listing succeeds

[hdpclient@en01 ~]$ hdfs dfs -ls /data/employee
Found 1 items
-rwxrwxr-x+  3 hdfs hdfs      12288 2017-12-21 14:50 /data/employee/employee.dmp

-- try to copy a file into the folder
[hdpclient@en01 ~]$ hdfs dfs -put /data/mydata/emp.csv /data/employee
put: Permission denied: user=hdpclient, access=WRITE, inode="/data/employee/emp.csv._COPYING_":hdfs:hdfs:drwxrwxr-x

-- set ACL for write
[hdfs@en01 ~]$ hdfs dfs -setfacl -m -R user:hdpclient:rwx /data/employee

-- try the copy again; this time it succeeds
[hdpclient@en01 ~]$ hdfs dfs -put /data/mydata/emp.csv /data/employee

Note: A plus sign (+) is appended to the listed permissions of any file or directory that has an associated ACL. It is visible in the output of hdfs dfs -ls:

[hdfs@en01 ~]$ hdfs dfs -ls /data
Found 5 items
drwxrwxr-x+  - hdfs   hdfs              0 2018-03-20 10:55 /data/employee
drwxrwxrwx   - flume  hdfs              0 2017-10-29 10:43 /data/flume
drwxrwxrwx   - hdfs   hdfs              0 2017-11-22 11:56 /data/images
drwxrwxr-x   - oracle oinstall          0 2018-01-10 10:30 /data/oraclenfs
drwxrwxrwx   - hdfs   hdfs              0 2017-08-20 12:16 /data/talend
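The indicator also makes it easy to pick out the entries that carry an ACL. A small sketch, with paths_with_acl as a helper name invented for this example: it filters hdfs dfs -ls output on the trailing "+" of the permission column and prints the last column. It assumes paths without spaces.

```shell
# paths_with_acl
# Reads `hdfs dfs -ls` output on stdin and prints the paths whose
# permission string ends with "+", i.e. the entries that carry an ACL.
paths_with_acl() {
  awk '$1 ~ /\+$/ { print $NF }'
}

# Usage:
#   hdfs dfs -ls /data | paths_with_acl
```

Run against the listing above, this would print only /data/employee.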


ACLs do affect NameNode performance, so use Permission Bits wherever they are adequate and turn to ACLs only when they are not.