
What Is the Difference Between the Secondary NameNode, the Checkpoint Node, and the Backup Node?

The Secondary NameNode is a poorly named component of Hadoop.

inode


Files and directories are represented on the NameNode by inodes. Inodes record attributes such as permissions, modification and access times, and namespace and disk space quotas.
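
As an illustration of the attributes listed above, an inode can be pictured as a small record. This is only a sketch: the class and field names are hypothetical and do not mirror Hadoop's actual Java classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inode:
    """Hypothetical, simplified model of a NameNode inode (not Hadoop's real class)."""
    path: str
    is_directory: bool
    permissions: int = 0o644                 # POSIX-style mode bits
    modification_time: float = 0.0           # seconds since the epoch
    access_time: float = 0.0                 # seconds since the epoch
    namespace_quota: Optional[int] = None    # max number of names below a directory; None = no quota
    diskspace_quota: Optional[int] = None    # max bytes below a directory; None = no quota


# A directory limited to 1000 names and 10 GiB of disk space.
home = Inode(
    path="/user/alice",
    is_directory=True,
    permissions=0o755,
    namespace_quota=1000,
    diskspace_quota=10 * 1024**3,
)
```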


NameNode


The NameNode stores the metadata of HDFS. The state of HDFS is stored in a file called fsimage, which is the base of the metadata. At runtime, modifications are only appended to a log file called edits. On the next start-up of the NameNode, the state is read from fsimage, the changes from edits are applied to it, and the new state is written back to fsimage. After this, edits is cleared and is ready for new log entries.
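
The start-up sequence described above can be sketched roughly as follows. This is a toy model, not Hadoop code: the namespace is a plain dictionary, and the operation names (create, chmod, delete) and the functions apply_edit and start_namenode are assumptions made for illustration.

```python
import copy


def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace (path -> metadata)."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = {"perm": args[0]}
    elif op == "chmod":
        namespace[path]["perm"] = args[0]
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def start_namenode(fsimage, edits):
    """Read fsimage, replay edits on top of it, and return the new state."""
    namespace = copy.deepcopy(fsimage)       # 1. state is read from fsimage
    for edit in edits:                       # 2. changes from edits are applied
        apply_edit(namespace, edit)
    new_fsimage = copy.deepcopy(namespace)   # 3. new state is written back to fsimage
    new_edits = []                           # 4. edits is cleared for new log entries
    return namespace, new_fsimage, new_edits


old_fsimage = {"/a": {"perm": 0o644}}
old_edits = [("create", "/b", 0o600), ("chmod", "/a", 0o755), ("delete", "/b")]
namespace, new_fsimage, new_edits = start_namenode(old_fsimage, old_edits)
```

In this model the time spent in step 2 grows with the length of edits, which is why a long edit log makes start-up slow.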


Secondary NameNode
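Because edits is merged into fsimage only when the NameNode starts, the edits file can grow very large on a long-running cluster, and the next restart takes correspondingly longer. The Secondary NameNode is the solution to this issue. It runs on a separate machine that has connectivity to the NameNode. It periodically copies the fsimage and the edit log from the NameNode, merges the edit log into the fsimage, and copies the updated fsimage back to the NameNode. The Secondary NameNode is not meant to provide high availability for the NameNode. At a high level, it performs the following tasks: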


1.    Receives the edit log from the NameNode and merges it into the fsimage

2.    Copies the updated fsimage back to the NameNode

3.    The updated fsimage reduces the NameNode's start-up time


The whole purpose of the Secondary NameNode is to create checkpoints in HDFS.
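
One checkpoint cycle of this kind might look like the sketch below. It is a simplified model under stated assumptions: the classes NameNode and SecondaryNameNode and their methods are hypothetical, the namespace maps paths to permission bits, and the real transfer of files between machines is replaced by in-process copies.

```python
import copy


class NameNode:
    """Toy NameNode: a checkpointed image plus a log of later operations."""

    def __init__(self):
        self.fsimage = {}   # last checkpoint: path -> permission bits
        self.edits = []     # operations logged since that checkpoint

    def log(self, op, path, perm=None):
        self.edits.append((op, path, perm))


class SecondaryNameNode:
    """Toy Secondary NameNode that performs the merge on behalf of the NameNode."""

    def checkpoint(self, namenode):
        # 1. Copy the fsimage and the edit log from the NameNode.
        image = copy.deepcopy(namenode.fsimage)
        edits = list(namenode.edits)
        # 2. Merge the edit log into the copied image.
        for op, path, perm in edits:
            if op in ("create", "chmod"):
                image[path] = perm
            elif op == "delete":
                image.pop(path, None)
            else:
                raise ValueError(f"unknown operation: {op}")
        # 3. Copy the updated image back and drop the edits that were merged.
        namenode.fsimage = image
        namenode.edits = namenode.edits[len(edits):]
        return image


nn = NameNode()
nn.log("create", "/a", 0o644)
nn.log("create", "/b", 0o600)
nn.log("delete", "/a")
SecondaryNameNode().checkpoint(nn)
```

After the checkpoint, the NameNode in this model would have nothing left to replay on its next start-up, which is the start-up saving named in task 3.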


Backup Node


The Backup Node in Hadoop is an extended Checkpoint Node that performs checkpointing and also supports online streaming of file system edits.
Its advantage over the Checkpoint Node is that the namespace held in its main memory is always in sync with the primary NameNode's file system namespace, because it applies the streamed edits to an in-memory, up-to-date copy of that namespace.
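
The streaming behaviour can be pictured with the following sketch. It is an assumption-laden toy, not Hadoop's implementation: the BackupNode class, its receive_edit and checkpoint methods, and the operation names are all hypothetical.

```python
import copy


class BackupNode:
    """Toy Backup Node that keeps its in-memory namespace current via streamed edits."""

    def __init__(self, initial_image):
        self.namespace = copy.deepcopy(initial_image)   # in-memory copy of the namespace
        self.saved_image = None                         # last checkpoint written locally

    def receive_edit(self, op, path, perm=None):
        """Apply one edit as soon as the NameNode streams it."""
        if op in ("create", "chmod"):
            self.namespace[path] = perm
        elif op == "delete":
            self.namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")

    def checkpoint(self):
        """Save the current in-memory namespace; nothing has to be downloaded first."""
        self.saved_image = copy.deepcopy(self.namespace)
        return self.saved_image


backup = BackupNode({"/a": 0o644})
for edit in [("create", "/b", 0o600), ("delete", "/a")]:
    backup.receive_edit(*edit)
image = backup.checkpoint()
```

Note that checkpoint() in this sketch only writes out state the node already holds, with no download or merge step.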


Checkpoint Node
A Checkpoint Node creates checkpoints on its local file system: it downloads the fsimage and edits files from the active primary NameNode, merges the two, and saves the new image on its local file system.
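The Backup Node, by contrast, already holds the current namespace in memory and does not need to download these files first, so checkpoint creation on a Backup Node is faster than on a Checkpoint Node.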