Checkpoint Status on SNN
While monitoring a Cloudera cluster, I came across an unhealthy node reporting
the following checkpointing issue:
“The filesystem checkpoint is 16 hour(s), 40 minute(s) old. This is 1600.75% of the configured
checkpoint period of 1 hour(s). Critical threshold: 400.00%. 10,775
transactions have occurred since the last filesystem checkpoint. This is 1.08%
of the configured checkpoint transaction target of 1,000,000.”
A checkpoint is supposed to fire when either of these thresholds is reached:
Filesystem Checkpoint Period <dfs.namenode.checkpoint.period>: 1 hour
or
Filesystem Checkpoint Transaction Threshold <dfs.namenode.checkpoint.txns>: 1,000,000
Checkpointing is triggered by one of two conditions: if enough time has
elapsed since the last checkpoint (dfs.namenode.checkpoint.period), or
if enough new edit log transactions have accumulated (dfs.namenode.checkpoint.txns).
The checkpointing node periodically (every dfs.namenode.checkpoint.check.period
seconds) checks whether either of these conditions is met and, if so, kicks off the
checkpointing process.
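To make the two trigger conditions concrete, here is a minimal Python sketch of the decision, fed with the numbers from the alert above. The function names are hypothetical and this is not the actual HDFS code; it only assumes the thresholds shown earlier (1 hour, 1,000,000 transactions).

```python
def should_checkpoint(seconds_since_last, txns_since_last,
                      period_seconds=3600, txn_threshold=1_000_000):
    """Return True when either checkpoint condition is met (illustrative only)."""
    time_due = seconds_since_last >= period_seconds
    txns_due = txns_since_last >= txn_threshold
    return time_due or txns_due


def percent_of_target(value, target):
    """Express value as a percentage of target."""
    return 100.0 * value / target


# Numbers taken from the alert: 16 h 40 min old, 10,775 transactions.
age_seconds = 16 * 3600 + 40 * 60
txns = 10_775

# The time condition alone is enough to make a checkpoint due.
print(should_checkpoint(age_seconds, txns))
print(percent_of_target(txns, 1_000_000))
```

In other words, the transaction count was nowhere near its target, but the time condition had been satisfied for many hours, so a healthy checkpointing node should have fired long ago.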
Before drilling down to the solution, let’s first understand what
checkpointing is all about.
Checkpointing is a critical part of maintaining and persisting filesystem
metadata in HDFS. It’s crucial for efficient NameNode recovery and restart, and
is an important indicator of cluster health.
The NameNode’s primary responsibility is storing the HDFS namespace. This means things like
the directory tree, file permissions, and the mapping of files to block IDs.
It’s important that this metadata (and all changes to it) are safely persisted
to stable storage for fault tolerance.
This filesystem metadata is stored in two different constructs: the fsimage and the
edit log. The fsimage is a file that represents a point-in-time snapshot of the
filesystem’s metadata. However, while the fsimage file format is very efficient
to read, it’s unsuitable for making small incremental updates like renaming a
single file. Thus, rather than writing a new fsimage every time the namespace
is modified, the NameNode instead records the modifying operation in the edit
log for durability. This way, if the NameNode crashes, it can restore its state
by first loading the fsimage then replaying all the operations (also called
edits or transactions) in the edit log to catch up to the most recent state of
the namesystem. The edit log comprises a series of files, called edit log
segments, that together represent all the namesystem modifications made since
the creation of the fsimage.
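The recovery path described above (load the fsimage, then replay the edits) can be sketched with a toy model. Everything here is an assumption made for illustration: the namespace is a plain dict, and the operation names and tuple layout are invented, not the real fsimage or edit log formats.

```python
def apply_edit(namespace, edit):
    """Apply one toy edit-log operation to the in-memory namespace."""
    op = edit[0]
    if op == "mkdir":
        namespace[edit[1]] = "dir"
    elif op == "create":
        namespace[edit[1]] = "file"
    elif op == "rename":
        _, src, dst = edit
        namespace[dst] = namespace.pop(src)
    elif op == "delete":
        namespace.pop(edit[1], None)
    else:
        raise ValueError(f"unknown op: {op}")


def recover(fsimage, edit_segments):
    """Load the snapshot, then replay every edit segment in order."""
    namespace = dict(fsimage)
    for segment in edit_segments:
        for edit in segment:
            apply_edit(namespace, edit)
    return namespace


# Point-in-time snapshot plus two edit log segments written afterwards.
fsimage = {"/data": "dir", "/data/a.txt": "file"}
segments = [
    [("create", "/data/b.txt"), ("rename", "/data/a.txt", "/data/c.txt")],
    [("delete", "/data/b.txt")],
]
state = recover(fsimage, segments)
print(state)
```

The snapshot itself is never modified; only the cheap, append-only edits are written between snapshots, which is the trade-off the paragraph describes.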
Why is Checkpointing Important?
A typical edit ranges from 10s to 100s of bytes, but over time enough edits can
accumulate to become unwieldy. A couple of problems can arise from these large
edit logs. In extreme cases, it can fill up all the available disk capacity on
a node, but more subtly, a large edit log can substantially delay NameNode
startup as the NameNode reapplies all the edits. This is where checkpointing
comes in.
Checkpointing is a process that takes an fsimage and edit log and compacts them into a new
fsimage. This way, instead of replaying a potentially unbounded edit log, the
NameNode can load the final in-memory state directly from the fsimage. This is
a far more efficient operation and reduces NameNode startup time.
Checkpointing creates a new fsimage from an old fsimage and edit log.
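As a rough illustration of this compaction, the sketch below folds a list of edits into a new image and records the last transaction ID it contains. The data layout (a dict for the image, `(txid, path, value)` tuples for edits, `None` meaning delete) is hypothetical and chosen only to keep the example short.

```python
def checkpoint(image_txid, image, edits):
    """Fold edits newer than image_txid into a new image.

    Returns (new_txid, new_image). Toy model: each edit is a
    (txid, path, value) tuple, and value None means "delete path".
    """
    new_image = dict(image)
    last_txid = image_txid
    for txid, path, value in edits:
        if txid <= image_txid:
            continue  # already reflected in the old image
        if value is None:
            new_image.pop(path, None)
        else:
            new_image[path] = value
        last_txid = txid
    return last_txid, new_image


old_txid = 86447
old_image = {"/": "dir"}
edits = [
    (86448, "/a", "file"),
    (86449, "/b", "file"),
    (86450, "/a", None),
]

new_txid, new_image = checkpoint(old_txid, old_image, edits)
print(new_txid, new_image)
```

After the checkpoint, a restart only needs the new image plus whatever edits arrive after `new_txid`, instead of the whole edit history.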
However, creating a new fsimage is an I/O- and CPU-intensive operation, sometimes taking
minutes to perform. During a checkpoint, the namesystem also needs to restrict
concurrent access from other users. So, rather than pausing the active NameNode
to perform a checkpoint, HDFS defers it to either the SecondaryNameNode or
Standby NameNode, depending on whether NameNode high-availability is
configured. The mechanics of checkpointing differ depending on whether NameNode high availability
is configured.
Back to the issue: what is causing this, and how can I get it to stop? There can be multiple
reasons. Either an upgrade procedure failed or was carried out improperly,
or the Secondary NameNode has an incorrect
${dfs.namenode.checkpoint.dir}/current/VERSION file. In the second scenario,
everything under the SNN's ${dfs.namenode.checkpoint.dir}
directory needs to be wiped out and rebuilt so that
checkpointing can work again.
For me, the second seemed the more probable reason, because there were:
a) No edit logs being generated.
b) No fsimage files at all.
c) No VERSION file.
These files are part of the current directory under the SNN's DFS directory,
in my case /bigdata/dfs/snn/current
(here snn stands for Secondary NameNode).
Files under the current directory normally look like this, e.g.:
-rw-r--r-- 1 hdfs hdfs  79098 Nov 20 23:01 edits_0000000000000086448-0000000000000087020
-rw-r--r-- 1 hdfs hdfs 297866 Nov 20 22:01 fsimage_0000000000000086447
-rw-r--r-- 1 hdfs hdfs     62 Nov 20 22:01 fsimage_0000000000000086447.md5
-rw-r--r-- 1 hdfs hdfs    172 Nov 20 23:01 VERSION
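The three symptoms listed earlier (no edit logs, no fsimage, no VERSION file) can be checked with a small script. This is only a sketch: the function name and the returned dict are made up, and the filename patterns are inferred from the listing above rather than taken from HDFS documentation. The demo runs against temporary directories; on a real host you would point it at a path such as /bigdata/dfs/snn/current.

```python
import os
import re
import tempfile

# Patterns inferred from the example listing (assumption).
FSIMAGE_RE = re.compile(r"^fsimage_\d+$")
EDITS_RE = re.compile(r"^edits_\d+-\d+$")


def inspect_checkpoint_dir(current_dir):
    """Report which checkpoint files are present in current_dir."""
    names = os.listdir(current_dir) if os.path.isdir(current_dir) else []
    return {
        "has_version": "VERSION" in names,
        "fsimages": sorted(n for n in names if FSIMAGE_RE.match(n)),
        "edit_segments": sorted(n for n in names if EDITS_RE.match(n)),
    }


# Demo 1: a directory populated like the listing above.
with tempfile.TemporaryDirectory() as d:
    for name in (
        "VERSION",
        "fsimage_0000000000000086447",
        "fsimage_0000000000000086447.md5",
        "edits_0000000000000086448-0000000000000087020",
    ):
        open(os.path.join(d, name), "w").close()
    populated = inspect_checkpoint_dir(d)

# Demo 2: an empty directory, matching the symptoms I saw.
with tempfile.TemporaryDirectory() as d:
    empty = inspect_checkpoint_dir(d)

print(populated)
print(empty)
```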
Resolution
Follow these steps to resolve the issue:
1. Shut down the HDFS service(s).
2. Log in to the Secondary NameNode host.
3. cd to the value of ${dfs.namenode.checkpoint.dir}; you can find it under the Configuration tab of
Cloudera Manager.
4. mv current current.bad
5. Start up the HDFS service(s) only.
6. Wait for the HDFS services to come online.
7. Start the remaining Hadoop services.
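If you prefer to script step 4 rather than type the mv by hand, a guarded version might look like the sketch below. The helper name is hypothetical, it only covers the rename (stopping and starting services still happens in Cloudera Manager), and it should only be run while HDFS is shut down. The demo uses a temporary directory instead of a real checkpoint directory.

```python
import os
import tempfile


def set_aside_current(checkpoint_dir, suffix=".bad"):
    """Rename <checkpoint_dir>/current to current<suffix>, refusing to overwrite."""
    src = os.path.join(checkpoint_dir, "current")
    dst = src + suffix
    if not os.path.isdir(src):
        raise FileNotFoundError(src)
    if os.path.exists(dst):
        raise FileExistsError(dst)
    os.rename(src, dst)
    return dst


# Demo against a throwaway directory standing in for
# ${dfs.namenode.checkpoint.dir}.
with tempfile.TemporaryDirectory() as checkpoint_dir:
    os.mkdir(os.path.join(checkpoint_dir, "current"))
    moved_to = set_aside_current(checkpoint_dir)
    current_gone = not os.path.exists(os.path.join(checkpoint_dir, "current"))
    backup_exists = os.path.isdir(moved_to)

print(os.path.basename(moved_to), current_gone, backup_exists)
```

Keeping the old directory as current.bad instead of deleting it leaves a way back if the fresh checkpoint does not get rebuilt as expected.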
Make sure not to suppress the checkpoint alert, as it is critical
for keeping the node healthy.