
What is the difference between the Secondary NameNode, the Checkpoint Node and the Backup Node? The Secondary NameNode is a poorly named component of Hadoop.

inode


Files and directories are represented on the NameNode by inodes. Inodes record attributes such as permissions, modification and access times, and namespace and disk space quotas.


NameNode


The NameNode stores the metadata of HDFS. The state of HDFS is stored in a file called fsimage, which is the base of the metadata. At runtime, modifications are only appended to a log file called edits. On the next start-up of the NameNode, the state is read from fsimage, the changes from edits are applied to it, and the new state is written back to fsimage. After this, edits is cleared and is ready for new log entries.
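The start-up sequence can be sketched as a toy model. This is only an illustration: the operation names and the dictionary layout are assumptions made for the example and do not reflect Hadoop's real on-disk formats.

```python
# Toy model of NameNode start-up: replay the edit log on top of fsimage.
# Operation names and the dict layout are illustrative, not Hadoop's format.

def apply_edits(fsimage, edits):
    """Return a new namespace with every logged change applied in order."""
    namespace = dict(fsimage)  # work on a copy of the loaded image
    for op, path, value in edits:
        if op == "create":
            namespace[path] = value
        elif op == "delete":
            namespace.pop(path, None)
        elif op == "rename":
            namespace[value] = namespace.pop(path)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


fsimage = {"/data/a.txt": {"replication": 3}}
edits = [
    ("create", "/data/b.txt", {"replication": 2}),
    ("rename", "/data/a.txt", "/data/c.txt"),
    ("delete", "/data/b.txt", None),
]

new_fsimage = apply_edits(fsimage, edits)  # state written back as fsimage
edits = []                                 # the log starts empty again
print(sorted(new_fsimage))
```

The longer the edits list, the longer this replay takes, which is why a NameNode that has run for a long time without a checkpoint is slow to restart.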


Secondary Namenode
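Because edits is merged into fsimage only when the NameNode starts, the edits file can grow very large on a long-running cluster, and the next restart then takes a long time. The Secondary NameNode is the solution to this issue. It is another machine with connectivity to the NameNode. It periodically copies fsimage and the edit log from the NameNode, merges the edit log into fsimage, and copies the updated fsimage back to the NameNode. The Secondary NameNode is not meant to provide NameNode high availability. The high-level tasks performed by the Secondary NameNode are: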


1.    Receives the edit logs from the NameNode and merges them into fsimage


2.    Copies the updated fsimage back to the NameNode


3.    The updated fsimage reduces the NameNode start-up time


The whole purpose of the Secondary NameNode is to create checkpoints in HDFS.
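The checkpoint cycle listed above can be sketched as follows. The class names, method names and the `txn_threshold` parameter are hypothetical and exist only for this example; real HDFS controls checkpoint frequency through settings such as `dfs.namenode.checkpoint.period` and `dfs.namenode.checkpoint.txns`, and transfers the files over the network.

```python
# Toy model of one Secondary NameNode checkpoint cycle.
# Class and method names are hypothetical, chosen for illustration only.

class NameNode:
    def __init__(self):
        self.fsimage = {}   # last checkpointed namespace
        self.edits = []     # changes logged since that checkpoint

    def log(self, path, attrs):
        self.edits.append((path, attrs))


class SecondaryNameNode:
    def __init__(self, txn_threshold):
        self.txn_threshold = txn_threshold

    def maybe_checkpoint(self, namenode):
        """Checkpoint once enough edits have accumulated; return True if done."""
        if len(namenode.edits) < self.txn_threshold:
            return False
        # 1. Fetch copies of fsimage and the edit log from the NameNode.
        image = dict(namenode.fsimage)
        edits = list(namenode.edits)
        # 2. Merge the edits into the image.
        for path, attrs in edits:
            image[path] = attrs
        # 3. Ship the merged image back; the merged edits can be dropped.
        namenode.fsimage = image
        namenode.edits = namenode.edits[len(edits):]
        return True


nn = NameNode()
snn = SecondaryNameNode(txn_threshold=2)
nn.log("/a", {"owner": "alice"})
first = snn.maybe_checkpoint(nn)    # below threshold, nothing happens
nn.log("/b", {"owner": "bob"})
second = snn.maybe_checkpoint(nn)   # threshold reached, checkpoint runs
```

After the second call, the NameNode holds a merged fsimage and an empty edit log, so a restart at that point would have nothing to replay.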


Backup Node


The Backup Node in Hadoop is an extended Checkpoint Node that performs checkpointing and also supports online streaming of file system edits.
Its advantage over the Checkpoint Node is that the namespace held in its main memory is always in sync with the primary NameNode's namespace, because it applies the streamed edits to an up-to-date in-memory copy.


Checkpoint Node
The Checkpoint Node creates checkpoints on its local file system: it downloads the fsimage and edits files from the active primary NameNode, merges the two, and saves the new image locally.
The Backup Node can skip the download because it already holds the current namespace in memory, so creating a checkpoint on a Backup Node is faster than on a Checkpoint Node.
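The difference between the two node types can be illustrated with a small sketch. Both classes and their methods are hypothetical names invented for this example; they only model where the work happens, not how Hadoop implements it.

```python
# Toy comparison of the two checkpointing styles (hypothetical classes).

class CheckpointNode:
    def checkpoint(self, nn_fsimage, nn_edits):
        """Download image and edits from the NameNode, then merge them."""
        image = dict(nn_fsimage)          # download fsimage
        for path, attrs in list(nn_edits):  # download and replay edits
            image[path] = attrs
        return image


class BackupNode:
    def __init__(self, fsimage):
        self.namespace = dict(fsimage)    # in-memory copy of the namespace

    def on_edit(self, path, attrs):
        """Apply each edit as soon as the NameNode streams it."""
        self.namespace[path] = attrs

    def checkpoint(self):
        """No download needed: just persist what is already in memory."""
        return dict(self.namespace)


nn_fsimage = {"/a": {"owner": "alice"}}
nn_edits = [("/b", {"owner": "bob"}), ("/a", {"owner": "carol"})]

backup = BackupNode(nn_fsimage)
for path, attrs in nn_edits:              # edits arrive as a live stream
    backup.on_edit(path, attrs)

from_checkpoint_node = CheckpointNode().checkpoint(nn_fsimage, nn_edits)
from_backup_node = backup.checkpoint()
```

Both nodes end up with the same image; the Backup Node simply does the merge incrementally while edits arrive instead of all at once at checkpoint time.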

HDFS High Availability Architecture



To provide a hot backup and a consistent solution for NameNode failure, the concept of running two NameNodes (one Active and one Standby) was introduced. The diagram below describes the architecture of HDFS high availability.





In a cluster, two nodes can be configured as NameNodes. Each NameNode is assigned a role, either Active or Standby. The Active NameNode handles the client requests in the cluster, while the Standby NameNode acts as a backup and maintains enough state to provide a consistent fsimage if the Active NameNode fails.

To keep the state of the two NameNodes in sync, the edit logs from the Active NameNode need to be shared with the Standby NameNode. Hadoop offers two state synchronization methods: the Quorum Journal Manager or a shared Network File System (NFS) directory.
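As a rough illustration of the quorum idea, the sketch below treats an edit as durable only when a majority of journal nodes have accepted it. The function and the dictionary layout are assumptions made for this example; the real Quorum Journal Manager also handles recovery of partially written edits, which is not modelled here.

```python
# Toy majority write across journal nodes (hypothetical data layout).

def quorum_write(journal_nodes, edit):
    """Append edit to every reachable journal; succeed on a majority of acks."""
    acks = 0
    for journal in journal_nodes:
        if journal["up"]:
            journal["log"].append(edit)
            acks += 1
    return acks > len(journal_nodes) // 2


journals = [
    {"up": True, "log": []},
    {"up": True, "log": []},
    {"up": False, "log": []},
]

ok_with_one_down = quorum_write(journals, "mkdir /a")   # 2 of 3 acknowledge
journals[1]["up"] = False
ok_with_two_down = quorum_write(journals, "mkdir /b")   # only 1 of 3 acknowledges
```

With three journal nodes the write survives the loss of one node but not two, which is why an odd number of journal nodes is the usual choice.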

The DataNodes send block location information and heartbeats to both NameNodes. At any point in time, exactly one of the NameNodes should be in the Active state; if both NameNodes are Active at once, the result is a “split-brain” scenario. To avoid this, an administrator should configure a fencing method.

If the Active NameNode fails, the Standby NameNode's state is changed to Active. This transition from Standby to Active is either manual or automatic. After a successful transition, client requests are redirected to the new Active NameNode.
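The interaction between fencing and failover can be sketched as below. All names here (`failover`, `fence_by_marking`, the state strings) are hypothetical and exist only for the example; the point is the ordering, where the Standby is promoted only after the old Active has been fenced, so two Active NameNodes never coexist.

```python
# Toy failover with fencing (hypothetical names, not Hadoop's API).

class NameNode:
    def __init__(self, name, state):
        self.name = name
        self.state = state  # "active", "standby" or "fenced"


def failover(old_active, standby, fence):
    """Promote the standby only after the old active has been fenced."""
    if not fence(old_active):
        raise RuntimeError("fencing failed; refusing to promote standby")
    standby.state = "active"
    return standby


def fence_by_marking(node):
    """Stand-in for a real fencing method: mark the node as cut off."""
    node.state = "fenced"
    return True


nn1 = NameNode("nn1", "active")
nn2 = NameNode("nn2", "standby")

new_active = failover(nn1, nn2, fence_by_marking)
active_nodes = [n.name for n in (nn1, nn2) if n.state == "active"]
print(active_nodes)
```

If the fencing step reports failure, the sketch refuses to promote the Standby, which trades availability for protection against split-brain.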