

Note: All the posts are based on practical approach avoiding lengthy theory. All have been tested on some development servers. Please don’t test any post on production servers until you are sure.

Thursday, February 23, 2017

Hadoop Administration: Accessing HDFS (File system & Shell Commands)

You can access HDFS in many different ways. HDFS provides a native Java application programming interface (API) and a native C-language wrapper for the Java API. In addition, you can use a web browser to browse HDFS files. I'll be using the CLI only in this post.



FileSystem (FS) shell: A command-line interface similar to common Linux® and UNIX® shells (bash, csh, etc.) that allows interaction with HDFS data.
DFSAdmin: A command set that you can use to administer an HDFS cluster.
fsck: A subcommand of the hadoop command/application. You can use fsck to check for inconsistencies with files, such as missing blocks, but you cannot use it to correct those inconsistencies.
Name nodes and data nodes: These have built-in web servers that let administrators check the current status of a cluster.
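The FS shell is the focus of this post, but for completeness here is a quick sketch of the DFSAdmin and fsck entries from the table above. Both assume a running HDFS cluster, so the snippet checks for the hdfs CLI first and skips gracefully where it is absent:

```shell
# DFSAdmin and fsck examples; both need a running HDFS cluster,
# so skip gracefully where the hdfs CLI is not installed.
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfsadmin -report || true           # capacity and per-datanode status
    hdfs fsck / -files -blocks || true      # report (not repair) inconsistencies
    msg="ran against cluster"
else
    msg="hdfs CLI not found; skipping"
fi
echo "$msg"
```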

Shell Commands


Hadoop includes various shell-like commands that directly interact with HDFS and the other file systems that Hadoop supports. The command bin/hdfs dfs -help lists the commands supported by the Hadoop shell, and bin/hdfs dfs -help command-name displays more detailed help for a specific command. These commands support most normal file system operations, such as copying files and changing file permissions, as well as a few HDFS-specific operations such as changing the replication factor of files.

FS relates to a generic file system and can point to any supported file system (local, HDFS, etc.), whereas DFS is specific to HDFS. So hadoop fs can perform operations from/to the local file system or HDFS, while hdfs dfs always operates on HDFS.


The FileSystem (FS) shell is invoked by bin/hadoop fs. All FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional; if not specified, the default scheme from the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration points to hdfs://namenodehost). Most FS shell commands behave like their corresponding Unix commands. Error information is sent to stderr and output is sent to stdout.
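To make the scheme://authority/path split concrete, here is a small bash sketch (plain parameter expansion, no Hadoop needed) that picks a sample URI apart:

```shell
# Split an HDFS path URI into scheme, authority, and path
# using bash parameter expansion (illustration only).
uri="hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt"
scheme=${uri%%://*}       # hdfs
rest=${uri#*://}          # hdpmaster:9000/userdata/bukhari/mydata.txt
authority=${rest%%/*}     # hdpmaster:9000
path="/${rest#*/}"        # /userdata/bukhari/mydata.txt
echo "$scheme | $authority | $path"
```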


Please review the HDFS shell commands below to work with both the local file system and HDFS.

Read (cat) from local file system
[hdpsysuser@hdpslave1 ~]$ hadoop fs -cat file:///etc/hosts
192.168.44.170 hdpmaster
192.168.44.171 hdpslave1
192.168.44.172 hdpslave2


List from local file system
[hdpsysuser@hdpslave1 ~]$ hadoop fs -ls file:///usr/hadoopsw/
Found 26 items
-rw-------   1 hdpsysuser hdpsysuser        306 2017-02-07 13:21 file:///usr/hadoopsw/.ICEauthority
-rw-------   1 hdpsysuser hdpsysuser        397 2017-02-08 14:33 file:///usr/hadoopsw/.Xauthority
-rw-------   1 hdpsysuser hdpsysuser       4029 2017-02-08 14:58 file:///usr/hadoopsw/.bash_history
-rw-r--r--   1 hdpsysuser hdpsysuser         18 2016-07-12 18:17 file:///usr/hadoopsw/.bash_logout
-rw-r--r--   1 hdpsysuser hdpsysuser        572 2017-02-06 18:12 file:///usr/hadoopsw/.bash_profile
-rw-r--r--   1 hdpsysuser hdpsysuser        231 2016-07-12 18:17 file:///usr/hadoopsw/.bashrc
drwxrwxr-x   - hdpsysuser hdpsysuser       4096 2017-02-08 11:55 file:///usr/hadoopsw/.cache

Create folder in HDFS
[hdpsysuser@hdpslave1 ~]$ hadoop fs -mkdir -p hdfs://hdpmaster:9000/userdata/bukhari
[hdpsysuser@hdpslave1 ~]$ hadoop dfs -mkdir -p hdfs://hdpmaster:9000/userdata/zeeshan

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

As the warning indicates, hadoop dfs is deprecated in favor of hdfs dfs:


[hdpsysuser@hdpslave1 ~]$ hdfs dfs -mkdir -p hdfs://hdpmaster:9000/userdata/Zeeshan

List from HDFS

[hdpsysuser@hdpslave1 ~]$ hadoop fs -ls hdfs://hdpmaster:9000/


Found 1 items
drwxr-xr-x - hdpsysuser supergroup 0 2017-02-08 16:14 hdfs://hdpmaster:9000/userdata

[hdpsysuser@hdpslave1 ~]$ hadoop fs -ls hdfs://hdpmaster:9000/userdata
Found 2 items
drwxr-xr-x - hdpsysuser supergroup 0 2017-02-08 16:13 hdfs://hdpmaster:9000/userdata/bukhari
drwxr-xr-x - hdpsysuser supergroup 0 2017-02-08 16:14 hdfs://hdpmaster:9000/userdata/Zeeshan

The examples above make clear how the hadoop and hdfs commands are used; now let's look at the other options available.

Put (copy a file from the local file system to the destination file system)
First create a file, then put it into HDFS.

[hdpsysuser@hdpslave1 ~]$ vi /tmp/mydata.txt
[hdpsysuser@hdpslave1 ~]$ cat /tmp/mydata.txt ##reads from local file system

Name: Inam Ullah Bukhari
Location: Riyadh, Saudi Arabia

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -put /tmp/mydata.txt hdfs://hdpmaster:9000/userdata/bukhari
or, equivalently:
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -put /tmp/mydata.txt /userdata/bukhari

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -cat hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt
Name: Inam Ullah Bukhari
Location: Riyadh, Saudi Arabia


checksum - Returns the checksum information of a file.

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -checksum hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt

hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt MD5-of-0MD5-of-512CRC32C 0000020000000000000000008a94c02ab16c6a5dec8f0ce9cca5beba

Count the number of directories, files and bytes under the path. The output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME.

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -count hdfs://hdpmaster:9000/userdata/bukhari

1 2 93 hdfs://hdpmaster:9000/userdata/bukhari
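
To see what -count tallies without touching a cluster, here is a local analogue on a scratch directory (paths and contents are made up for the demo):

```shell
# Local analogue of `hdfs dfs -count`: directories (including the
# root directory itself), regular files, and total content bytes.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'hello' > "$demo/a.txt"        # 5 bytes
printf 'world!!' > "$demo/sub/b.txt"  # 7 bytes
dirs=$(find "$demo" -type d | wc -l)                  # root + sub
files=$(find "$demo" -type f | wc -l)
bytes=$(find "$demo" -type f -exec cat {} + | wc -c)
echo "$dirs $files $bytes $demo"
rm -rf "$demo"
```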

Displays free space

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -df hdfs://hdpmaster:9000/userdata/bukhari 


Filesystem Size Used Available Use%
hdfs://hdpmaster:9000 96841113600 41190 88159330304 0%


Displays sizes of files and directories 

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -du hdfs://hdpmaster:9000/userdata/bukhari
59 hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt
34 hdfs://hdpmaster:9000/userdata/bukhari/test.txt


Finds all files that match the specified expression (quote the pattern so your local shell does not expand it)

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -find hdfs://hdpmaster:9000/userdata/ -name 'tes*'
hdfs://hdpmaster:9000/userdata/bukhari/test.txt

Copy files to the local file system 

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -get hdfs://hdpmaster:9000/userdata/bukhari/test.txt /tmp/


Displays the Access Control Lists (ACLs) of files and directories

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -getfacl hdfs://hdpmaster:9000/userdata/bukhari/test.txt

# file: hdfs://hdpmaster:9000/userdata/bukhari/test.txt
# owner: hdpsysuser
# group: supergroup
getfacl: The ACL operation has been rejected. Support for ACLs has been disabled by setting dfs.namenode.acls.enabled to false.
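
The rejection above is expected when ACL support is turned off. If you want ACLs, the property would be enabled in hdfs-site.xml and the NameNode restarted (a sketch; adjust to your distribution's configuration management):

```xml
<!-- hdfs-site.xml: enable HDFS ACL support (requires a NameNode restart) -->
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
```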


Move files

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -mv hdfs://hdpmaster:9000/userdata/bukhari/test.txt hdfs://hdpmaster:9000/userdata/zeeshan/test.txt


Remove files 

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -rm hdfs://hdpmaster:9000/userdata/zeeshan/mydata.txt


17/02/08 17:02:13 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted hdfs://hdpmaster:9000/userdata/zeeshan/mydata.txt


Delete a directory 

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -rmdir hdfs://hdpmaster:9000/userdata/zeeshan


rmdir: `hdfs://hdpmaster:9000/userdata/zeeshan': Directory is not empty

The -rmdir option only removes empty directories, so delete the files inside first and try again (or remove the directory and its contents in one go with hdfs dfs -rm -r).

Print statistics about the file/directory at <path> in the specified format
[hdpsysuser@hdpslave1 ~]$ hadoop fs -stat "%F %u:%g %b %y %n" hdfs://hdpmaster:9000/userdata/zeeshan/test.txt

regular file hdpsysuser:supergroup 34 2017-02-08 13:41:08 test.txt


Format accepts filesize in blocks (%b), type (%F), group name of owner (%g), name (%n), block size (%o), replication (%r), user name of owner(%u), and modification date (%y, %Y). %y shows UTC date as “yyyy-MM-dd HH:mm:ss” and %Y shows milliseconds since January 1, 1970 UTC. If the format is not specified, %y is used by default.
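As a quick sanity check of the %y/%Y relationship, the "%y" string from the stat output above converts to "%Y" epoch milliseconds like this (GNU date assumed for the -d option):

```shell
# Convert the UTC "%y" string to "%Y" (milliseconds since the epoch).
# Requires GNU date for the -d option.
secs=$(date -u -d '2017-02-08 13:41:08' +%s)
millis=$((secs * 1000))
echo "$millis"
```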
Displays last kilobyte of the file to stdout 

[hdpsysuser@hdpslave1 ~]$ hadoop fs -tail hdfs://hdpmaster:9000/userdata/zeeshan/test.txt

This is 1st line in test.txt file

The -f option outputs appended data as the file grows, like Unix tail -f.
Create a file of zero length 

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -touchz hdfs://hdpmaster:9000/userdata/bukhari/file1.txt
Download file from HDFS to local file system 

[hdpclient@hadoopedge1 ~]$ hdfs dfs -get /hadoopedge1_data/test.txt /tmp/

Return the usage (syntax summary) for an individual command

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -usage mkdir

Usage: hadoop fs [generic options] -mkdir [-p] <path> ...
