You can access HDFS in many different ways. HDFS provides a native Java application programming interface (API) and a native C-language wrapper for the Java API. In addition, you can use a web browser to browse HDFS files. In this post I'll be using the command-line interface (CLI) only. The table below summarizes the main ways of interacting with HDFS; after that, we review the HDFS shell commands for working with both the local file system and HDFS.
Application | Description
FileSystem (FS) shell | A command-line interface similar to common Linux® and UNIX® shells (bash, csh, etc.) that allows interaction with HDFS data.
DFSAdmin | A command set that you can use to administer an HDFS cluster.
fsck | A subcommand of the Hadoop command/application. You can use the fsck command to check for inconsistencies with files, such as missing blocks, but you cannot use it to correct these inconsistencies.
Name nodes and data nodes | These have built-in web servers that let administrators check the current status of a cluster.
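As a quick illustration of these entry points (a rough sketch; the fsck path and the web UI address are only examples for this cluster, and the NameNode web port depends on your Hadoop version and configuration):
hdfs dfs -ls /                      ## FS shell: browse HDFS data
hdfs dfsadmin -report               ## DFSAdmin: report capacity and datanode status
hdfs fsck /userdata -files -blocks  ## fsck: check files for problems such as missing blocks (read-only)
## NameNode web UI, e.g. http://hdpmaster:50070/ (50070 is the usual Hadoop 2.x default)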
Shell Commands
Hadoop includes various shell-like commands that directly interact with HDFS and the other file systems that Hadoop supports. The command bin/hdfs dfs -help lists the commands supported by the Hadoop shell. Furthermore, the command bin/hdfs dfs -help command-name displays more detailed help for a command. These commands support most of the normal file system operations, like copying files, changing file permissions, etc. They also support a few HDFS-specific operations, like changing the replication of files.
FS relates to a generic file system, which can point to any file system such as the local file system or HDFS, whereas DFS is specific to HDFS. So an FS command can operate on the local file system as well as on HDFS, but a DFS command always operates on HDFS.
FS Shell: The FileSystem (FS) shell is invoked by bin/hadoop fs. All the FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional; if not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in the FS shell behave like the corresponding Unix commands. Error information is sent to stderr and output is sent to stdout.
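For example, assuming the configured default file system (fs.defaultFS) points to hdfs://hdpmaster:9000, the following are equivalent ways of listing the same directory, plus an explicit local-file-system URI:
hadoop fs -ls hdfs://hdpmaster:9000/userdata   ## fully qualified URI
hdfs dfs -ls /userdata                         ## default scheme/authority from the configuration
hadoop fs -ls file:///tmp                      ## local file system via the file scheme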
Read (cat) from local file system
[hdpsysuser@hdpslave1 ~]$ hadoop fs -cat file:///etc/hosts
192.168.44.170 hdpmaster
192.168.44.171 hdpslave1
192.168.44.172 hdpslave2
List from local file system
[hdpsysuser@hdpslave1 ~]$ hadoop fs -ls file:///usr/hadoopsw/
Found 26 items
-rw------- 1 hdpsysuser hdpsysuser 306 2017-02-07 13:21 file:///usr/hadoopsw/.ICEauthority
-rw------- 1 hdpsysuser hdpsysuser 397 2017-02-08 14:33 file:///usr/hadoopsw/.Xauthority
-rw------- 1 hdpsysuser hdpsysuser 4029 2017-02-08 14:58 file:///usr/hadoopsw/.bash_history
-rw-r--r-- 1 hdpsysuser hdpsysuser 18 2016-07-12 18:17 file:///usr/hadoopsw/.bash_logout
-rw-r--r-- 1 hdpsysuser hdpsysuser 572 2017-02-06 18:12 file:///usr/hadoopsw/.bash_profile
-rw-r--r-- 1 hdpsysuser hdpsysuser 231 2016-07-12 18:17 file:///usr/hadoopsw/.bashrc
drwxrwxr-x - hdpsysuser hdpsysuser 4096 2017-02-08 11:55 file:///usr/hadoopsw/.cache
Create folder in HDFS
[hdpsysuser@hdpslave1 ~]$ hadoop fs -mkdir -p hdfs://hdpmaster:9000/userdata/bukhari
[hdpsysuser@hdpslave1 ~]$ hadoop dfs -mkdir -p hdfs://hdpmaster:9000/userdata/zeeshan
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -mkdir -p hdfs://hdpmaster:9000/userdata/Zeeshan
List from HDFS
[hdpsysuser@hdpslave1 ~]$ hadoop fs -ls hdfs://hdpmaster:9000/
Found 1 items
drwxr-xr-x - hdpsysuser supergroup 0 2017-02-08 16:14 hdfs://hdpmaster:9000/userdata
[hdpsysuser@hdpslave1 ~]$ hadoop fs -ls hdfs://hdpmaster:9000/userdata
Found 2 items
drwxr-xr-x - hdpsysuser supergroup 0 2017-02-08 16:13 hdfs://hdpmaster:9000/userdata/bukhari
drwxr-xr-x - hdpsysuser supergroup 0 2017-02-08 16:14 hdfs://hdpmaster:9000/userdata/Zeeshan
The examples above make it clear how the 'hadoop' and 'hdfs' commands are used; now let's look at the other operations available.
Put (copy a file from the local file system to the destination file system)
First create a file, then put it into HDFS.
[hdpsysuser@hdpslave1 ~]$ vi /tmp/mydata.txt
[hdpsysuser@hdpslave1 ~]$ cat /tmp/mydata.txt ##reads from local file system
Name: Inam Ullah Bukhari
Location: Riyadh, Saudi Arabia
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -put /tmp/mydata.txt hdfs://hdpmaster:9000/userdata/bukhari
or, relying on the default scheme and authority:
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -put /tmp/mydata.txt /userdata/bukhari
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -cat hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt
Name: Inam Ullah Bukhari
Location: Riyadh, Saudi Arabia
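As an aside (not shown in the original session), -copyFromLocal behaves like -put but only accepts a local source, and -moveFromLocal additionally deletes the local copy after a successful upload; a sketch with the same file:
hdfs dfs -copyFromLocal /tmp/mydata.txt /userdata/bukhari   ## like -put, source must be local
hdfs dfs -moveFromLocal /tmp/mydata.txt /userdata/bukhari   ## uploads, then removes the local file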
Returns the checksum information of a file
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -checksum hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt
hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt MD5-of-0MD5-of-512CRC32C 0000020000000000000000008a94c02ab16c6a5dec8f0ce9cca5beba
Count the number of directories, files and bytes under the path
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -count hdfs://hdpmaster:9000/userdata/bukhari
1 2 93 hdfs://hdpmaster:9000/userdata/bukhari
Displays free space
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -df hdfs://hdpmaster:9000/userdata/bukhari
Filesystem Size Used Available Use%
hdfs://hdpmaster:9000 96841113600 41190 88159330304 0%
Displays sizes of files and directories
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -du hdfs://hdpmaster:9000/userdata/bukhari
59 hdfs://hdpmaster:9000/userdata/bukhari/mydata.txt
34 hdfs://hdpmaster:9000/userdata/bukhari/test.txt
Finds all files that match the specified expression
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -find hdfs://hdpmaster:9000/userdata/ -name tes*
hdfs://hdpmaster:9000/userdata/bukhari/test.txt
Copy files to the local file system
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -get hdfs://hdpmaster:9000/userdata/bukhari/test.txt /tmp/
Displays the Access Control Lists (ACLs) of files and directories
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -getfacl hdfs://hdpmaster:9000/userdata/bukhari/test.txt
# file: hdfs://hdpmaster:9000/userdata/bukhari/test.txt
# owner: hdpsysuser
# group: supergroup
getfacl: The ACL operation has been rejected. Support for ACLs has been disabled by setting dfs.namenode.acls.enabled to false.
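If you need ACLs, dfs.namenode.acls.enabled must be set to true in hdfs-site.xml and the NameNode restarted; after that, something like the following sketch should work (the extra user name here is only an example):
## in hdfs-site.xml: set dfs.namenode.acls.enabled to true, then restart the NameNode
hdfs dfs -setfacl -m user:zeeshan:r-- /userdata/bukhari/test.txt   ## grant read access to an extra user
hdfs dfs -getfacl /userdata/bukhari/test.txt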
Move files
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -mv hdfs://hdpmaster:9000/userdata/bukhari/test.txt hdfs://hdpmaster:9000/userdata/zeeshan/test.txt
Remove files
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -rm hdfs://hdpmaster:9000/userdata/zeeshan/mydata.txt
17/02/08 17:02:13 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted hdfs://hdpmaster:9000/userdata/zeeshan/mydata.txt
Delete a directory
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -rmdir hdfs://hdpmaster:9000/userdata/zeeshan
rmdir: `hdfs://hdpmaster:9000/userdata/zeeshan': Directory is not empty
Delete the files inside the directory first, then try again (see the example below).
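For example, using the file moved into the zeeshan directory earlier (a sketch):
hdfs dfs -rm hdfs://hdpmaster:9000/userdata/zeeshan/test.txt
hdfs dfs -rmdir hdfs://hdpmaster:9000/userdata/zeeshan
## or remove the non-empty directory in one step:
hdfs dfs -rm -r hdfs://hdpmaster:9000/userdata/zeeshan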
Print statistics about the file/directory at <path> in the specified format
[hdpsysuser@hdpslave1 ~]$ hadoop fs -stat "%F %u:%g %b %y %n" hdfs://hdpmaster:9000/userdata/zeeshan/test.txt
regular file hdpsysuser:supergroup 34 2017-02-08 13:41:08 test.txt
The format accepts the file size in bytes (%b), type (%F), group name of owner (%g), name (%n), block size (%o), replication (%r), user name of owner (%u), and modification date (%y, %Y). %y shows the UTC date as "yyyy-MM-dd HH:mm:ss" and %Y shows milliseconds since January 1, 1970 UTC. If the format is not specified, %y is used by default.
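For instance, an illustrative invocation that prints the name, replication factor, block size, and modification time in milliseconds:
hadoop fs -stat "%n %r %o %Y" hdfs://hdpmaster:9000/userdata/zeeshan/test.txt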
Displays the last kilobyte of the file to stdout
[hdpsysuser@hdpslave1 ~]$ hadoop fs -tail hdfs://hdpmaster:9000/userdata/zeeshan/test.txt
This is 1st line in test.txt file
The -f option will output appended data as the file grows, as in Unix
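For example:
hdfs dfs -tail -f hdfs://hdpmaster:9000/userdata/zeeshan/test.txt   ## follows the file as it grows, like tail -f on Linux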
Create a file of zero length
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -touchz hdfs://hdpmaster:9000/userdata/bukhari/file1.txt
Download file from HDFS to local file system
[hdpclient@hadoopedge1 ~]$ hdfs dfs -get /hadoopedge1_data/test.txt /tmp/
File count in an HDFS directory
1-
[hdfs@te1-hdp-rp-en01 ~]$ hdfs dfs -count /flume/twitter
1 11172 293202863 /flume/twitter
2-
[hdfs@te1-hdp-rp-en01 ~]$ hadoop fs -count /flume/twitter
1 11174 293299721 /flume/twitter
3-
[hdfs@te1-hdp-rp-en01 ~]$ hadoop fs -count -q /flume/twitter
none inf none inf 1 11175 293332991 /flume/twitter
4-
for i in `hdfs dfs -ls -R <DIRECTORY_PATH> | awk '{print $8}'`; do echo $i ; hdfs dfs -cat $i | wc -l; done
[hdfs@te1-hdp-rp-en01 ~]$ for i in `hdfs dfs -ls -R /flume/twitter | awk '{print $8}'`; do echo $i ; hdfs dfs -cat $i | wc -l; done
/flume/twitter/FlumeData.1514276106140
12
/flume/twitter/FlumeData.1514276143686
11
It will recursively list the files in <DIRECTORY_PATH> and then print the number of lines in each file.
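For reference, the plain -count output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME, and -count -q prefixes them with the quota columns (QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA). If you only need the total number of lines across a flat directory such as /flume/twitter, a simpler (illustrative) variant is:
hdfs dfs -cat /flume/twitter/* | wc -l   ## total line count across all files in the directory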
Check the locations of file blocks
hdfs fsck / -files -blocks -locations
Check the locations of file blocks containing rack information
hdfs fsck / -files -blocks -racks
Delete corrupted files
hdfs fsck / -delete
Move corrupted files to /lost+found
hdfs fsck / -move
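Before deleting or moving anything, it is usually worth listing what fsck actually considers corrupt (a sketch):
hdfs fsck / -list-corruptfileblocks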
List all the active TaskTrackers
mapred job -list-active-trackers
List all the running jobs
mapred job -list
List all the submitted jobs since the start of the cluster
mapred job -list all
Check the status of the default queue
mapred queue -list
Check the status of a queue ACL
hadoop queue -showacls
Show all the jobs in the default queue
hadoop queue -info default -showJobs
Check the status of a job
hadoop job -status job_201302152353_0001
Set the job job_201302152353_0003 to high priority
hadoop job -set-priority job_201302152353_0003 HIGH
Empty the trash
hdfs dfs -expunge
Back up block locations of the data on HDFS
hdfs fsck / -files -blocks -locations > dfs.block.locations.fsck.backup
Save the list of all files on the HDFS filesystem
hdfs dfs -ls -R / > dfs.namespace.lsr.backup
Dump Hadoop config
hdfs org.apache.hadoop.conf.Configuration
Return the help for an individual command
[hdpsysuser@hdpslave1 ~]$ hdfs dfs -usage mkdir
Usage: hadoop fs [generic options] -mkdir [-p] <path> ...
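As mentioned above, -help gives the longer description of the same command:
hdfs dfs -help mkdir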