Hadoop DP Notes: HDFS High Availability Architecture

In order to provide a HOT back-up and consistent solution for NameNode failure, a concept of using two NameNodes (one Active and one StandBy) was introduced. The below diagram describes the architecture of HDFS high availability.

In a cluster, two nodes can be configured as NameNodes. Each NameNode is assigned a role, either Active or StandBy. The Active NameNode handles the client requests in the cluster, and the Standby NameNode acts as a back-up node and maintains enough state to provide a consistent FS-Image during failure of Active NameNode.

In order of sync the state of the NameNodes, the edit logs from the Active NameNode needs to be shared to the StandBy NameNode. There are two state synchronization methods available with Hadoop, Quorum Journal Manager or using a Network File System.

The DataNodes send block location information and heartbeats to both the NameNodes. At any point in time, exactly one of the NameNodes should be in Active state, or if both the NameNodes are in Active state, then it’ll result in “split-brain scenario“. To avoid this scenario, an administrator should configure a fencing method.

If Active NameNode failure occurs, the StandBy NameNode state is changed to Active. This state transition from StandBy to Active is either manual or automatic. After successful transition, the client requests will be redirected to the new Active NameNode.

Hadoop DP Notes

Search

HDFS High Availability Architecture

Visitors