
Thursday, November 09, 2017

Diagnostics: Fix Under replicated blocks [Ambari Dashboard]


I see the following in the Ambari dashboard, under the HDFS Summary section:

[Screenshot: Ambari HDFS Summary showing the under-replicated blocks count]
First, get the full details of the files that are causing the problem:

[hdfs@te1-hdp-rp-nn01 root]$ hdfs fsck / -files -blocks -locations

...
...
/user/zeppelin/.sparkStaging/application_1503219907931_0037/pyspark.zip 455033 bytes, 1 block(s):  OK
0. BP-1135333773-192.168.44.133-1501079051032:blk_1073843531_102745 len=455033 repl=3 [DatanodeInfoWithStorage[192.168.44.137:50010,DS-6cee9ae8-5113-40df-bd7c-982974892993,DISK], DatanodeInfoWithStorage[192.168.44.135:50010,DS-e9e9bfd5-f4b5-4829-b6d6-b54065b25275,DISK], DatanodeInfoWithStorage[192.168.44.136:50010,DS-5a3944bc-417f-4fc9-8f51-189aba424bc0,DISK]]

/user/zeppelin/.sparkStaging/application_1503219907931_0037/sparkr.zip 682117 bytes, 1 block(s):  OK
0. BP-1135333773-192.168.44.133-1501079051032:blk_1073843533_102747 len=682117 repl=3 [DatanodeInfoWithStorage[192.168.44.137:50010,DS-6cee9ae8-5113-40df-bd7c-982974892993,DISK], DatanodeInfoWithStorage[192.168.44.135:50010,DS-e9e9bfd5-f4b5-4829-b6d6-b54065b25275,DISK], DatanodeInfoWithStorage[192.168.44.136:50010,DS-5a3944bc-417f-4fc9-8f51-189aba424bc0,DISK]]

/user/zeppelin/test <dir>
Status: HEALTHY
 Total size:    3093146787 B (Total open files size: 742 B)
 Total dirs:    469
 Total files:   22098
 Total symlinks:                0 (Files currently being written: 6)
 Total blocks (validated):      22095 (avg. block size 139993 B) (Total open file blocks (not validated): 5)
 Minimally replicated blocks:   22095 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       9 (0.040733196 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              63 (0.09495388 %)
 Number of data-nodes:          3
 Number of racks:               1
FSCK ended at Thu Nov 09 12:14:57 AST 2017 in 1186 milliseconds


Regarding under-replicated blocks, HDFS is supposed to recover them automatically by creating the missing copies until the replication factor is met. If it hasn't after a few days, you can trigger the recovery manually by running the commands below. Note that in this case HDFS can never complete the recovery on its own: the affected job staging files request 10 replicas (the mapreduce.client.submit.file.replication default), but the cluster only has 3 datanodes.


[hdfs@te1-hdp-rp-nn01 root]$ hdfs fsck / | grep 'Under replicated'
Connecting to namenode via http://te1-hdp-rp-nn01:50070/fsck?ugi=hdfs&path=%2F
/user/hive/.staging/job_1501079109321_0010/job.jar:  Under replicated BP-1135333773-192.168.44.133-1501079051032:blk_1073741932_1108. Target Replicas is 10 but found 3 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).
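These numbers tie out with the fsck summary above: each of the 9 under-replicated blocks has a target of 10 replicas but only 3 live ones, so each is short 7 replicas, and 9 × 7 = 63 is exactly the "Missing replicas" figure fsck reported. A quick arithmetic check:

```shell
# Figures taken from the fsck output above: 9 under-replicated blocks,
# each with a target of 10 replicas but only 3 live replicas.
echo $(( 10 - 3 ))        # missing replicas per block: 7
echo $(( 9 * (10 - 3) ))  # total missing replicas: 63, matching fsck
```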


[hdfs@te1-hdp-rp-nn01 root]$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}'
Connecting to namenode via http://te1-hdp-rp-nn01:50070/fsck?ugi=hdfs&path=%2F
/user/hive/.staging/job_1501079109321_0010/job.jar
/user/hive/.staging/job_1501079109321_0010/job.split
/user/hive/.staging/job_1501079109321_0010/libjars/hive-hcatalog-core.jar
/user/hive/.staging/job_1501079109321_0011/job.jar
/user/hive/.staging/job_1501079109321_0011/job.split
/user/hive/.staging/job_1501079109321_0011/libjars/hive-hcatalog-core.jar
/user/hive/.staging/job_1501079109321_0018/job.jar
/user/hive/.staging/job_1501079109321_0018/job.split
/user/hive/.staging/job_1501079109321_0018/libjars/hive-hcatalog-core.jar

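The grep/awk extraction can be verified offline, without touching the cluster, by feeding it one of the 'Under replicated' lines captured from the real fsck run above. Since the file path is everything before the first colon, awk with ':' as the field separator prints just the path:

```shell
# Sample input: one real 'Under replicated' line from the fsck output above.
sample='/user/hive/.staging/job_1501079109321_0010/job.jar:  Under replicated BP-1135333773-192.168.44.133-1501079051032:blk_1073741932_1108. Target Replicas is 10 but found 3 live replica(s), 0 decommissioned replica(s) and 0 decommissioning replica(s).'

# Same pipeline as above: keep matching lines, print the path before the first ':'.
echo "$sample" | grep 'Under replicated' | awk -F':' '{print $1}'
```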

[hdfs@te1-hdp-rp-nn01 root]$ hdfs fsck / | grep 'Under replicated' | awk -F':' '{print $1}' >> /tmp/under_replicated_files
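One caveat with the command above: '>>' appends, so if you re-run the extraction, paths from the previous pass stay in the list and get processed twice. Using '>' truncates the file on each run instead. A minimal sketch with hypothetical paths:

```shell
# Hypothetical paths, just to illustrate '>' vs '>>' behavior.
echo "/user/hive/stale_from_first_run.jar" > /tmp/under_replicated_files
echo "/user/hive/current_run.jar"          > /tmp/under_replicated_files

# Only the latest run's path survives, because '>' truncated the file.
cat /tmp/under_replicated_files
```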

[hdfs@te1-hdp-rp-nn01 root]$ for hdfsfile in `cat /tmp/under_replicated_files`; do echo "Fixing $hdfsfile :" ;  hadoop fs -setrep 3 $hdfsfile; done

Fixing /user/hive/.staging/job_1501079109321_0010/job.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0010/job.jar
Fixing /user/hive/.staging/job_1501079109321_0010/job.split :
Replication 3 set: /user/hive/.staging/job_1501079109321_0010/job.split
Fixing /user/hive/.staging/job_1501079109321_0010/libjars/hive-hcatalog-core.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0010/libjars/hive-hcatalog-core.jar
Fixing /user/hive/.staging/job_1501079109321_0011/job.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0011/job.jar
Fixing /user/hive/.staging/job_1501079109321_0011/job.split :
Replication 3 set: /user/hive/.staging/job_1501079109321_0011/job.split
Fixing /user/hive/.staging/job_1501079109321_0011/libjars/hive-hcatalog-core.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0011/libjars/hive-hcatalog-core.jar
Fixing /user/hive/.staging/job_1501079109321_0018/job.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0018/job.jar
Fixing /user/hive/.staging/job_1501079109321_0018/job.split :
Replication 3 set: /user/hive/.staging/job_1501079109321_0018/job.split
Fixing /user/hive/.staging/job_1501079109321_0018/libjars/hive-hcatalog-core.jar :
Replication 3 set: /user/hive/.staging/job_1501079109321_0018/libjars/hive-hcatalog-core.jar
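The loop above starts one hadoop client JVM per file, which gets slow with a long list. An alternative sketch batches many paths into a single setrep invocation with xargs; shown here as a dry run with hypothetical paths (the leading 'echo' prints the command instead of executing it; drop it on a real cluster):

```shell
# Hypothetical two-entry list, standing in for the real /tmp/under_replicated_files.
printf '%s\n' /user/hive/a.jar /user/hive/b.jar > /tmp/under_replicated_files

# xargs passes many paths to one 'hadoop fs -setrep 3' call; the leading
# 'echo' turns this into a dry run that just prints the command it would run.
xargs echo hadoop fs -setrep 3 < /tmp/under_replicated_files
```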
