Hadoop Administration: Accessing HDFS (File system & Shell Commands)

You can access HDFS in many different ways. HDFS provides a native Java application programming interface (API) and a native C-language wrapper for the Java API. In addition, you can use a web browser to browse HDFS files. I'll use only the command-line interface (CLI) in this post.

Application	Description
FileSystem (FS) shell	A command-line interface similar to common Linux® and UNIX® shells (bash, csh, etc.) that allows interaction with HDFS data.
DFSAdmin	A command set that you can use to administer an HDFS cluster.
fsck	A subcommand of the Hadoop command/application. You can use the fsck command to check for inconsistencies with files, such as missing blocks, but you cannot use the fsck command to correct these inconsistencies.
Name nodes and data nodes	These have built-in web servers that let administrators check the current status of a cluster.

Shell Commands

Hadoop includes various shell-like commands that interact directly with HDFS and the other file systems that Hadoop supports. The command bin/hdfs dfs -help lists the commands supported by the Hadoop shell, and bin/hdfs dfs -help command-name displays more detailed help for a single command. These commands support most normal file system operations, such as copying files and changing file permissions. They also support a few HDFS-specific operations, such as changing the replication factor of files.

FS refers to a generic file system, which can point to any file system that Hadoop supports, such as the local file system or HDFS. DFS is specific to HDFS. So an fs command can read from or write to either the local file system or HDFS, whereas a dfs command is intended for HDFS operations.

FS Shell: The FileSystem (FS) shell is invoked by bin/hadoop fs. All FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local file system the scheme is file. The scheme and authority are optional. If they are not specified, the default scheme from the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration points to hdfs://namenodehost). Most FS shell commands behave like the corresponding Unix commands. Error information is sent to stderr and the output is sent to stdout.
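
To make the URI format concrete, here is a minimal sketch that splits a path URI into its scheme, authority, and path using plain shell parameter expansion. The function name parse_uri is hypothetical and is not part of Hadoop; it only illustrates the format and does not reproduce Hadoop's own parsing or its fallback to the configured default.

```shell
# Split a path URI of the form scheme://authority/path into its parts.
# parse_uri is a hypothetical helper, not part of Hadoop.
parse_uri() {
    uri=$1
    case $uri in
        *://*)
            scheme=${uri%%://*}      # text before "://"
            rest=${uri#*://}         # text after "://"
            case $rest in
                */*) authority=${rest%%/*}; path=/${rest#*/} ;;
                *)   authority=$rest; path=/ ;;
            esac
            ;;
        *)
            # No scheme given: Hadoop would use the default from its configuration.
            scheme=; authority=; path=$uri
            ;;
    esac
    printf 'scheme=%s authority=%s path=%s\n' "$scheme" "$authority" "$path"
}

parse_uri hdfs://hdpmaster:9000/userdata/bukhari
parse_uri file:///etc/hosts
parse_uri /parent/child
```

Note that file:///etc/hosts has an empty authority, which is why three slashes appear after the scheme.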

The examples below use the shell commands against both the local file system and HDFS.

Read (cat) from local file system

[hdpsysuser@hdpslave1 ~]$ hadoop fs -cat file:///etc/hosts

192.168.44.170 hdpmaster
192.168.44.171 hdpslave1
192.168.44.172 hdpslave2

List from local file system

[hdpsysuser@hdpslave1 ~]$ hadoop fs -ls file:///usr/hadoopsw/

Found 26 items
-rw------- 1 hdpsysuser hdpsysuser 306 2017-02-07 13:21 file:///usr/hadoopsw/.ICEauthority
-rw------- 1 hdpsysuser hdpsysuser 397 2017-02-08 14:33 file:///usr/hadoopsw/.Xauthority
-rw------- 1 hdpsysuser hdpsysuser 4029 2017-02-08 14:58 file:///usr/hadoopsw/.bash_history
-rw-r--r-- 1 hdpsysuser hdpsysuser 18 2016-07-12 18:17 file:///usr/hadoopsw/.bash_logout
-rw-r--r-- 1 hdpsysuser hdpsysuser 572 2017-02-06 18:12 file:///usr/hadoopsw/.bash_profile
-rw-r--r-- 1 hdpsysuser hdpsysuser 231 2016-07-12 18:17 file:///usr/hadoopsw/.bashrc
drwxrwxr-x - hdpsysuser hdpsysuser 4096 2017-02-08 11:55 file:///usr/hadoopsw/.cache

Create folder in HDFS
[hdpsysuser@hdpslave1 ~]$ hadoop fs -mkdir -p hdfs://hdpmaster:9000/userdata/bukhari
[hdpsysuser@hdpslave1 ~]$ hadoop dfs -mkdir -p hdfs://hdpmaster:9000/userdata/zeeshan

DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -mkdir -p hdfs://hdpmaster:9000/userdata/Zeeshan

List from HDFS

[hdpsysuser@hdpslave1 ~]$ hadoop fs -ls hdfs://hdpmaster:9000/

Found 1 items
drwxr-xr-x - hdpsysuser supergroup 0 2017-02-08 16:14 hdfs://hdpmaster:9000/userdata

[hdpsysuser@hdpslave1 ~]$ hadoop fs -ls hdfs://hdpmaster:9000/userdata
Found 2 items
drwxr-xr-x - hdpsysuser supergroup 0 2017-02-08 16:13 hdfs://hdpmaster:9000/userdata/bukhari
drwxr-xr-x - hdpsysuser supergroup 0 2017-02-08 16:14 hdfs://hdpmaster:9000/userdata/Zeeshan

The examples above show how to use the ‘hadoop’ and ‘hdfs’ commands. Now let's look at the other available options.

Put (file from local file system to the destination file system)

First create a file, then put it into HDFS.

[hdpsysuser@hdpslave1 ~]$ vi /tmp/mydata.txt
[hdpsysuser@hdpslave1 ~]$ cat /tmp/mydata.txt ##reads from local file system
Name: Inam Ullah Bukhari
Location: Riyadh, Saudi Arabia

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -put /tmp/mydata.txt hdfs://hdpmaster:9000/userdata/bukhari

OR

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -put /tmp/mydata.txt /userdata/bukhari
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -cat hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt
Name: Inam Ullah Bukhari
Location: Riyadh, Saudi Arabia

checksum - Returns the checksum information of a file.

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -checksum hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt

hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt MD5-of-0MD5-of-512CRC32C 0000020000000000000000008a94c02ab16c6a5dec8f0ce9cca5beba

Count the number of directories, files and bytes under the paths

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -count hdfs://hdpmaster:9000/userdata/bukhari
1 2 93 hdfs://hdpmaster:9000/userdata/bukhari
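
The three numbers in the -count output are the directory count, the file count, and the content size in bytes, followed by the path. As a small sketch, the hypothetical helper describe_count below labels those columns with awk; it reads sample text rather than calling a cluster.

```shell
# Label the columns of one line of `hdfs dfs -count` output.
# Columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME.
# describe_count is a hypothetical helper name.
describe_count() {
    awk '{ printf "dirs=%s files=%s bytes=%s path=%s\n", $1, $2, $3, $4 }'
}

# Sample line copied from the output above; on a cluster you could pipe
# `hdfs dfs -count <path>` into the function instead.
echo "1 2 93 hdfs://hdpmaster:9000/userdata/bukhari" | describe_count
```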

Displays free space

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -df hdfs://hdpmaster:9000/userdata/bukhari

Filesystem Size Used Available Use%
hdfs://hdpmaster:9000 96841113600 41190 88159330304 0%

Displays sizes of files and directories

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -du hdfs://hdpmaster:9000/userdata/bukhari
59 hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt
34 hdfs://hdpmaster:9000/userdata/bukhari/test.txt
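
The two sizes listed by -du add up to the 93 bytes that -count reported for the same directory. The sketch below checks that by summing the first column with awk. The helper name sum_du is hypothetical, and the input is the sample output above rather than a live cluster.

```shell
# Sum the first column (file size in bytes) of `hdfs dfs -du` output.
# sum_du is a hypothetical helper name.
sum_du() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Sample lines copied from the output above.
printf '%s\n' \
    "59 hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt" \
    "34 hdfs://hdpmaster:9000/userdata/bukhari/test.txt" | sum_du
```

On a cluster, hdfs dfs -du -s <path> prints the summarized size directly, so the helper is only useful when you already have the per-file listing.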

Finds all files that match the specified expression

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -find hdfs://hdpmaster:9000/userdata/ -name "tes*"
hdfs://hdpmaster:9000/userdata/bukhari/test.txt

Copy files to the local file system

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -get hdfs://hdpmaster:9000/userdata/bukhari/test.txt /tmp/

Displays the Access Control Lists (ACLs) of files and directories

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -getfacl hdfs://hdpmaster:9000/userdata/bukhari/test.txt

# file: hdfs://hdpmaster:9000/userdata/bukhari/test.txt
# owner: hdpsysuser
# group: supergroup
getfacl: The ACL operation has been rejected. Support for ACLs has been disabled by setting dfs.namenode.acls.enabled to false.

Move files

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -mv hdfs://hdpmaster:9000/userdata/bukhari/test.txt hdfs://hdpmaster:9000/userdata/zeeshan/test.txt

Remove files

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -rm hdfs://hdpmaster:9000/userdata/zeeshan/mydata.txt

17/02/08 17:02:13 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted hdfs://hdpmaster:9000/userdata/zeeshan/mydata.txt
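
A deletion interval of 0 minutes in the message above means trash is disabled, so the file is removed immediately instead of being moved to a trash directory. The sketch below checks the fs.trash.interval setting with hdfs getconf. The function name trash_enabled is hypothetical, and the value it reads is the configuration visible to the client, which may differ from what the NameNode uses.

```shell
# Report whether HDFS trash is enabled, based on fs.trash.interval (minutes).
# trash_enabled is a hypothetical helper; it needs the hdfs client on PATH.
trash_enabled() {
    interval=$(hdfs getconf -confKey fs.trash.interval) || return 2
    [ "$interval" -gt 0 ]
}

# Usage on a cluster node:
#   if trash_enabled; then echo "deleted files go to trash"; else echo "deletes are permanent"; fi
```

When trash is enabled, hdfs dfs -rm -skipTrash bypasses it for a single delete.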

Delete a directory

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -rmdir hdfs://hdpmaster:9000/userdata/zeeshan

rmdir: `hdfs://hdpmaster:9000/userdata/zeeshan': Directory is not empty

The directory must be empty before rmdir can remove it. Delete the files in it and try again.

Print statistics about the file/directory at <path> in the specified format

[hdpsysuser@hdpslave1 ~]$ hadoop fs -stat "%F %u:%g %b %y %n" hdfs://hdpmaster:9000/userdata/zeeshan/test.txt

regular file hdpsysuser:supergroup 34 2017-02-08 13:41:08 test.txt

The format accepts file size in bytes (%b), type (%F), group name of owner (%g), name (%n), block size (%o), replication (%r), user name of owner (%u), and modification date (%y, %Y). %y shows the UTC date as “yyyy-MM-dd HH:mm:ss” and %Y shows milliseconds since January 1, 1970 UTC. If the format is not specified, %y is used by default.
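
Because values such as "regular file" and the %y date contain spaces, output separated by spaces is awkward to split in a script. One option is to put your own delimiter into the format string. The sketch below assumes a "|" delimiter, which is a choice made for this example and not a Hadoop default, and the helper name parse_stat is hypothetical.

```shell
# Parse one line produced by: hadoop fs -stat "%F|%u|%g|%b|%y|%n" <path>
# The "|" delimiter is a choice made for this sketch, not a Hadoop default.
# parse_stat is a hypothetical helper name.
parse_stat() {
    while IFS='|' read -r type user group size mtime name; do
        printf 'name=%s type=%s owner=%s:%s size=%s modified=%s\n' \
            "$name" "$type" "$user" "$group" "$size" "$mtime"
    done
}

# Sample line built from the values shown above.
echo "regular file|hdpsysuser|supergroup|34|2017-02-08 13:41:08|test.txt" | parse_stat
```

This breaks if a file name itself contains the delimiter, so pick a character that cannot appear in your paths.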

Displays last kilobyte of the file to stdout

[hdpsysuser@hdpslave1 ~]$ hadoop fs -tail hdfs://hdpmaster:9000/userdata/zeeshan/test.txt
This is 1st line in test.txt file

The -f option will output appended data as the file grows, as in Unix.

Create a file of zero length

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -touchz hdfs://hdpmaster:9000/userdata/bukhari/file1.txt

Download file from HDFS to local file system

[hdpclient@hadoopedge1 ~]$ hdfs dfs -get /hadoopedge1_data/test.txt /tmp/

File count in an HDFS directory
1-
[hdfs@te1-hdp-rp-en01 ~]$ hdfs dfs -count /flume/twitter
1 11172 293202863 /flume/twitter

2-
[hdfs@te1-hdp-rp-en01 ~]$ hadoop fs -count /flume/twitter
1 11174 293299721 /flume/twitter

3-
[hdfs@te1-hdp-rp-en01 ~]$ hadoop fs -count -q /flume/twitter
none inf none inf 1 11175 293332991 /flume/twitter

4-
for i in `hdfs dfs -ls -R <DIRECTORY_PATH> | awk '{print $8}'`; do echo $i ; hdfs dfs -cat $i | wc -l; done

[hdfs@te1-hdp-rp-en01 ~]$ for i in `hdfs dfs -ls -R /flume/twitter | awk '{print $8}'`; do echo $i ; hdfs dfs -cat $i | wc -l; done
/flume/twitter/FlumeData.1514276106140
12
/flume/twitter/FlumeData.1514276143686
11

It will recursively list the files in <DIRECTORY_PATH> and then print the number of lines in each file.
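
The recursive listing also contains directories, and running -cat on a directory produces an error. A possible variant, sketched below, keeps only regular files by checking that the permission string starts with "-". The helper names list_files and count_lines are hypothetical, and paths containing spaces are not handled.

```shell
# Print only the paths of regular files from `hdfs dfs -ls -R` output.
# Regular files have a permission string starting with "-"; directories start with "d".
# list_files is a hypothetical helper; paths containing spaces are not handled.
list_files() {
    awk '$1 ~ /^-/ { print $8 }'
}

# Print each file path followed by its line count; needs the hdfs client on PATH.
# count_lines is a hypothetical helper name.
count_lines() {
    hdfs dfs -ls -R "$1" | list_files | while read -r f; do
        printf '%s %s\n' "$f" "$(hdfs dfs -cat "$f" < /dev/null | wc -l)"
    done
}

# Usage on a cluster node:
#   count_lines /flume/twitter
```

Reading the paths with while read instead of a for loop over backticks avoids building one long word list when the directory holds many files.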

Check the locations of file blocks

hdfs fsck / -files -blocks -locations

Check the locations of file blocks containing rack information

hdfs fsck / -files -blocks -racks

Delete corrupted files

hdfs fsck / -delete

Move corrupted files to /lost+found

hdfs fsck / -move
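
Because -delete and -move change data, it is worth looking at a plain fsck report before running them. The sketch below checks a saved report for a healthy status. It assumes the report contains a status line with the text "is HEALTHY", which is typical fsck output but should be verified against your Hadoop version. The helper name fsck_healthy is hypothetical.

```shell
# Decide from saved fsck output whether the checked path was reported healthy.
# Assumption: the report contains a status line with the text "is HEALTHY";
# check the output of your Hadoop version before relying on this.
fsck_healthy() {
    grep -q 'is HEALTHY'
}

# Usage on a cluster node (review the report before running -move or -delete):
#   hdfs fsck / > /tmp/fsck.report
#   if fsck_healthy < /tmp/fsck.report; then echo "nothing to repair"; fi
```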

List all the active TaskTrackers

mapred job -list-active-trackers

List all the running jobs

mapred job -list

List all the submitted jobs since the start of the cluster

mapred job -list all

List the job queues and their scheduling information

mapred queue -list

Show the queue ACLs for the current user

mapred queue -showacls

Show all the jobs in the default queue

mapred queue -info default -showJobs

Check the status of a job

mapred job -status job_201302152353_0001

Set the job job_201302152353_0001 to high priority

mapred job -set-priority job_201302152353_0001 HIGH

Empty the trash

hdfs dfs -expunge

Back up block locations of the data on HDFS

hdfs fsck / -files -blocks -locations > dfs.block.locations.fsck.backup

Save the list of all files on the HDFS filesystem

hdfs dfs -ls -R / > dfs.namespace.lsr.backup

Dump Hadoop config

hdfs org.apache.hadoop.conf.Configuration

Return the help for an individual command

[hdpsysuser@hdpslave1 ~]$ hdfs dfs -usage mkdir
Usage: hadoop fs [generic options] -mkdir [-p] <path> ...
