In order
to provide a HOT back-up and consistent solution for NameNode failure, a
concept of using two NameNodes (one Active and one StandBy) was
introduced. The below diagram describes the architecture of HDFS high
availability.
In a cluster, two nodes can be configured as NameNodes. Each
NameNode is assigned a role, either Active or StandBy. The Active NameNode
handles the client requests in the cluster, and the Standby NameNode acts as a
back-up node and maintains enough state to provide a consistent FS-Image during
failure of Active NameNode.
In order of sync the state of the NameNodes, the edit logs from the
Active NameNode needs to be shared to the StandBy NameNode. There are two state
synchronization methods available with Hadoop, Quorum Journal Manager or using
a Network File System.
The DataNodes send block location information and heartbeats to
both the NameNodes. At any point in time, exactly one of the NameNodes
should be in Active state, or if both the NameNodes are in Active state, then
it’ll result in “split-brain scenario“. To avoid this scenario, an
administrator should configure a fencing method.
If Active NameNode failure occurs, the StandBy NameNode state is
changed to Active. This state transition from StandBy to Active is either
manual or automatic. After successful transition, the client requests will be
redirected to the new Active NameNode.