Hadoop Risks

When Hadoop development began in 2004, no effort was expended on creating a secure distributed computing environment. The framework performed insufficient authentication and authorization of both users and services: any user could impersonate any other user, and because the framework performed no mutual authentication, a malicious network user could impersonate cluster services as well. The Hadoop Distributed File System (HDFS) enforced authorization so laxly that anyone could write data and any data could be read. Deploying a secure Hadoop cluster was essentially impossible.
These weaknesses ran through both HDFS and the MapReduce engine. Arbitrary Java code could be submitted to the JobTracker, which executed it under the JobTracker's own user account. HDFS file permissions were easily circumvented: a malicious user who discovered a data block's ID could read that block directly, and write access was essentially unlimited.
The only way to deploy Hadoop securely was to enforce strict network segregation; in that scenario, every user granted access to the cluster had to be trusted absolutely.