Computing memory parameters for Namenode
Source: https://discuss.pivotal.io/hc/en-us/articles/203272527-Namenode-failed-while-loading-fsimage-with-GC-overhead-limit-exceeded
Namenode failed while loading fsimage with GC overhead limit exceeded
Problem
During startup, the namenode failed to load the fsimage into memory:
2014-05-14 17:36:56,806 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /data/hadoop/nn/dfs/name/current/fsimage_0000000000252211550 using no compression
2014-05-14 17:36:56,806 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 29486731
2014-05-14 17:54:40,401 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Arrays.java:2271)
        at java.util.zip.ZipCoder.getBytes(ZipCoder.java:89)
Cause
Every file, directory, and block in HDFS is stored as an object in the namenode's memory and occupies roughly 150 bytes. The 150-byte figure is not exact, but it is a widely used rule of thumb. In this particular instance:
- There were 29+ million files, of which around 25 million had been generated during a namenode stress benchmarking test (nnbench).
- The fsimage file was 2.7 GB, which averages out to roughly 90-100 bytes per file in the image.
- The heap size was the default 1 GB.
Because the heap was too small for the amount of fsimage data being loaded into memory, the namenode's garbage collector spent nearly all of its time trying to reclaim memory, triggering the "GC overhead limit exceeded" error.
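The mismatch above can be checked with back-of-the-envelope arithmetic. This sketch applies the ~150-bytes-per-object rule of thumb to an assumed object count of ~29.5 million (the "Number of files" figure from the log):

```shell
# Rough heap estimate: ~150 bytes per namespace object (rule of thumb,
# not an exact figure), against the default 1 GB namenode heap.
awk 'BEGIN {
    objects = 29500000          # files + directories + blocks (assumed)
    bytes_per_object = 150      # rule-of-thumb estimate
    need_gb = objects * bytes_per_object / (1024^3)
    printf "Estimated heap needed: %.1f GB (default heap: 1 GB)\n", need_gb
}'
```

With ~4 GB of live objects squeezed into a 1 GB heap, the collector cannot make progress, which is exactly the condition the GC overhead limit check detects.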
Fix
Step 1: Estimate appropriate values for the -Xms / -Xmx parameters from the number of namespace objects, allowing roughly 1 GB of heap per million objects. Example:
[root@hdm1 current]# sudo -u hdfs hdfs oiv -p XML -printToScreen -i fsimage_0000000000252211550 -o /tmp/a | egrep "BLOCK_ID|INODE_ID" | wc -l | awk '{printf "Objects=%d : Suggested Xms=%0dm Xmx=%0dm\n", $1, (($1 / 1000000 )*1024), (($1 / 1000000 )*1024)}'
Example output with 29 million records:
Objects=29000000 : Suggested Xms=29696m Xmx=29696m
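The same rule of thumb can be applied standalone once the object count is known. This is a minimal sketch; OBJECTS stands in for the count reported by the oiv scan above:

```shell
# ~1 GB (1024 MB) of heap per million namespace objects.
OBJECTS=29000000                          # from the oiv scan (assumed here)
HEAP_MB=$(( OBJECTS / 1000000 * 1024 ))
echo "Suggested -Xms/-Xmx: ${HEAP_MB}m"
```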
Step 2: Edit the HADOOP_NAMENODE_OPTS parameter in /etc/gphd/hadoop/conf/hadoop-env.sh on the namenode to use the suggested -Xmx and -Xms values.
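For example, the edit might look like the following. The exact existing contents of HADOOP_NAMENODE_OPTS vary by distribution; keep whatever flags are already there and only add or adjust the heap sizes (the 29696m values come from the formula above):

```shell
# In /etc/gphd/hadoop/conf/hadoop-env.sh on the namenode:
export HADOOP_NAMENODE_OPTS="-Xms29696m -Xmx29696m ${HADOOP_NAMENODE_OPTS}"
```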
- With these values, the cluster on which this issue occurred loaded the image file in ~150 seconds.
Step 3: Start the namenode:
service hadoop-hdfs-namenode start
Best Practices
- Once the namenode is started, delete obsolete or unnecessary files. In this case, the files created during the namenode benchmarking test were deleted with the skipTrash option.
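A deletion of that kind might look like the following sketch. The path /benchmarks/NNBench is hypothetical; substitute the directory that actually holds the obsolete files:

```shell
# -skipTrash frees the namespace objects immediately instead of
# moving the files to the .Trash directory (where they would still
# consume namenode memory until trash is emptied).
sudo -u hdfs hdfs dfs -rm -r -skipTrash /benchmarks/NNBench   # hypothetical path
```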
- After deleting the files, perform a saveNamespace operation to save the namespace into the storage directory(s) and reset the namenode journal (edits file). This also reduces namenode startup time, because the edits no longer need to be digested. Alternatively, you can leave this to the regular checkpoint process.
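The saveNamespace step can be sketched as follows; it must be run while the namenode is in safe mode:

```shell
# Run on the namenode as the hdfs user.
sudo -u hdfs hdfs dfsadmin -safemode enter   # saveNamespace requires safe mode
sudo -u hdfs hdfs dfsadmin -saveNamespace    # write fsimage, reset the edits log
sudo -u hdfs hdfs dfsadmin -safemode leave   # resume normal operation
```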
- In situations where GC overhead limit errors occur, the check can be disabled with the JVM option "-XX:-UseGCOverheadLimit", and for extreme cases with very high transaction volumes, consider temporarily increasing the -Xmx heap size.
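Disabling the check could be done in hadoop-env.sh along these lines. Note this only suppresses the early-warning error; the underlying shortage of heap remains, so treat it as a stopgap:

```shell
# In hadoop-env.sh: disable the GC overhead limit check (stopgap only;
# the real fix is sizing the heap for the number of namespace objects).
export HADOOP_NAMENODE_OPTS="-XX:-UseGCOverheadLimit ${HADOOP_NAMENODE_OPTS}"
```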