Sunday, November 20, 2016

Checkpoint Status on Name Node


Checkpoint Status on SNN

While monitoring a Cloudera-managed cluster, I came across an unhealthy node flagging the issue below, which deals with checkpointing.

“The filesystem checkpoint is 16 hour(s), 40 minute(s) old. This is 1600.75% of the configured checkpoint period of 1 hour(s). Critical threshold: 400.00%. 10,775 transactions have occurred since the last filesystem checkpoint. This is 1.08% of the configured checkpoint transaction target of 1,000,000.”

A checkpoint is supposed to fire when either of the following thresholds is reached:

Filesystem Checkpoint Period (dfs.namenode.checkpoint.period): 1 hour
or
Filesystem Checkpoint Transaction Threshold (dfs.namenode.checkpoint.txns): 1,000,000

Checkpointing is triggered by one of two conditions: if enough time has elapsed since the last checkpoint (dfs.namenode.checkpoint.period), or if enough new edit log transactions have accumulated (dfs.namenode.checkpoint.txns). The checkpointing node periodically checks whether either of these conditions is met (dfs.namenode.checkpoint.check.period), and if so, kicks off the checkpointing process.
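
A quick way to confirm what the cluster is actually running with is hdfs getconf. A minimal sketch (the commented values are illustrative; the command reads the client-side configuration, so cross-check against what Cloudera Manager shows):

hdfs getconf -confKey dfs.namenode.checkpoint.period        # seconds between checkpoints, e.g. 3600
hdfs getconf -confKey dfs.namenode.checkpoint.txns          # transaction threshold, e.g. 1000000
hdfs getconf -confKey dfs.namenode.checkpoint.check.period  # how often the conditions are checked, e.g. 60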

Now, before drilling down to the solution, let’s first understand what checkpointing is all about.

Checkpointing is a critical part of maintaining and persisting filesystem metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and is an important indicator of overall cluster health.
The NameNode’s primary responsibility is storing the HDFS namespace: things like the directory tree, file permissions, and the mapping of files to block IDs. It’s important that this metadata (and all changes to it) are safely persisted to stable storage for fault tolerance.
This filesystem metadata is stored in two different constructs: the fsimage and the edit log. The fsimage is a file that represents a point-in-time snapshot of the filesystem’s metadata. However, while the fsimage file format is very efficient to read, it’s unsuitable for making small incremental updates like renaming a single file. Thus, rather than writing a new fsimage every time the namespace is modified, the NameNode instead records the modifying operation in the edit log for durability. This way, if the NameNode crashes, it can restore its state by first loading the fsimage then replaying all the operations (also called edits or transactions) in the edit log to catch up to the most recent state of the namesystem. The edit log comprises a series of files, called edit log segments, that together represent all the namesystem modifications made since the creation of the fsimage.
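
If you want to see what actually lives inside these files, Hadoop ships offline viewers for both. A rough sketch (the file names are taken from the example listing later in this post, so substitute your own):

# Dump a point-in-time fsimage to XML with the Offline Image Viewer
hdfs oiv -p XML -i fsimage_0000000000000086447 -o fsimage.xml

# Dump an edit log segment to XML with the Offline Edits Viewer
hdfs oev -i edits_0000000000000086448-0000000000000087020 -o edits.xml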
Why is Checkpointing Important?

A typical edit ranges from 10s to 100s of bytes, but over time enough edits can accumulate to become unwieldy. A couple of problems can arise from these large edit logs. In extreme cases, it can fill up all the available disk capacity on a node, but more subtly, a large edit log can substantially delay NameNode startup as the NameNode reapplies all the edits. This is where checkpointing comes in.
Checkpointing is a process that takes an fsimage and edit log and compacts them into a new fsimage. This way, instead of replaying a potentially unbounded edit log, the NameNode can load the final in-memory state directly from the fsimage. This is a far more efficient operation and reduces NameNode startup time.
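
As an aside, a checkpoint can also be forced by hand from the NameNode side using dfsadmin. A sketch only (run as the HDFS superuser; safe mode blocks client writes, so treat this as an emergency measure, not a replacement for regular checkpointing):

# saveNamespace only works while the NameNode is in safe mode
sudo -u hdfs hdfs dfsadmin -safemode enter
sudo -u hdfs hdfs dfsadmin -saveNamespace    # writes a fresh fsimage and rolls the edit log
sudo -u hdfs hdfs dfsadmin -safemode leave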


Checkpointing creates a new fsimage from an old fsimage and edit log.


However, creating a new fsimage is an I/O- and CPU-intensive operation, sometimes taking minutes to perform. During a checkpoint, the namesystem also needs to restrict concurrent access from other users. So, rather than pausing the active NameNode to perform a checkpoint, HDFS defers it to either the SecondaryNameNode or the Standby NameNode, depending on whether NameNode high availability is configured; the mechanics of checkpointing differ between the two setups.
Back to the issue: what is causing this, and how can I get it to stop? There can be multiple reasons. Either a failed or improper upgrade procedure, or the Secondary NameNode having an incorrect ${dfs.namenode.checkpoint.dir}/current/VERSION file. In the second scenario, everything under the SNN’s ${dfs.namenode.checkpoint.dir} directory needs to be wiped out and rebuilt so that checkpointing works again.
For me the second seems the more probable reason, as there were:
a) No edit logs being generated.
b) No fsimage files at all.
c) No VERSION file.

These files live in the current directory under the checkpoint location on disk, for example /bigdata/dfs/snn/current (here snn is the Secondary NameNode’s checkpoint directory).
A healthy current directory looks something like this:

-rw-r--r-- 1 hdfs hdfs   79098 Nov 20 23:01 edits_0000000000000086448-0000000000000087020
-rw-r--r-- 1 hdfs hdfs  297866 Nov 20 22:01 fsimage_0000000000000086447
-rw-r--r-- 1 hdfs hdfs      62 Nov 20 22:01 fsimage_0000000000000086447.md5
-rw-r--r-- 1 hdfs hdfs     172 Nov 20 23:01 VERSION
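
For reference, VERSION is a small Java-properties style file. The values below are placeholders (the exact fields vary slightly between Hadoop versions); what matters is that namespaceID and clusterID match the NameNode’s own VERSION file, and an incorrect or mismatched VERSION file is what the second failure scenario above refers to:

#Sun Nov 20 23:01:00 IST 2016
namespaceID=1234567890
clusterID=CID-00000000-0000-0000-0000-000000000000
cTime=0
storageType=NAME_NODE
blockpoolID=BP-000000000-127.0.0.1-0000000000000
layoutVersion=-60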

Resolution

Follow these steps to resolve the issue (a shell sketch of the same steps follows the list):
1.   Shut down the HDFS service(s).
2.   Log in to the Secondary NameNode host.
3.   cd to the value of ${dfs.namenode.checkpoint.dir}; you can find this under the Configuration tab in Cloudera Manager.
4.   mv current current.bad
5.   Start up the HDFS service(s) only.
6.   Wait for the HDFS services to come online.
7.   Start the remaining Hadoop services.
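
Translated into commands on the Secondary NameNode host, the fix looks roughly like this (a sketch only: /bigdata/dfs/snn is the example checkpoint directory used earlier in this post, so substitute your own value of dfs.namenode.checkpoint.dir, and use Cloudera Manager for the service stop/start):

# After stopping HDFS from Cloudera Manager
cd /bigdata/dfs/snn                  # your dfs.namenode.checkpoint.dir
sudo -u hdfs mv current current.bad

# Start HDFS again from Cloudera Manager, then confirm the directory is rebuilt
# (fresh fsimage_*, edits_* and VERSION files should appear after the next checkpoint)
ls -l /bigdata/dfs/snn/current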

Make sure not to suppress the checkpoint alert, as it is critical for keeping the NameNode healthy.

