A New Approach to Security

In 2009, discussion about Hadoop security reached a boiling point, and security was made a high priority. The Hadoop developers’ goals for 2010 included strong mutual authentication of users and services that would be transparent to end users. In addition to the changes to Hadoop core, a new workflow manager, Oozie, was introduced.
The developers chose to use the Simple Authentication and Security Layer (SASL) with Kerberos, via GSSAPI, to authenticate users to the edge services. When a user connects to a Job Tracker, that connection is mutually authenticated using Kerberos. Operating system principals are matched to a set of user and group access control lists maintained in flat configuration files.
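The matching of an authenticated principal against user and group access control lists can be sketched as follows. This is a minimal illustration, not Hadoop's implementation: the ACL string layout (comma-separated users, a space, then comma-separated groups, with "*" meaning everyone) and the names `Acl`, `parse_acl` and `is_allowed` are assumptions made for the example.

```python
from typing import FrozenSet, Iterable, NamedTuple


class Acl(NamedTuple):
    """Parsed access control list (hypothetical structure for this sketch)."""
    allow_all: bool
    users: FrozenSet[str]
    groups: FrozenSet[str]


def parse_acl(value: str) -> Acl:
    """Parse an ACL string of the assumed form 'user1,user2 group1,group2'."""
    if value.strip() == "*":
        # Wildcard: every authenticated principal is allowed.
        return Acl(True, frozenset(), frozenset())
    # Everything before the first space lists users, everything after lists groups.
    users_part, _, groups_part = value.partition(" ")
    users = frozenset(u.strip() for u in users_part.split(",") if u.strip())
    groups = frozenset(g.strip() for g in groups_part.split(",") if g.strip())
    return Acl(False, users, groups)


def is_allowed(user: str, user_groups: Iterable[str], acl: Acl) -> bool:
    """Allow if the ACL is a wildcard, names the user, or names one of the user's groups."""
    return (
        acl.allow_all
        or user in acl.users
        or not acl.groups.isdisjoint(user_groups)
    )


# Example ACL as it might appear as a value in a flat configuration file.
job_submit_acl = parse_acl("alice,bob admins")
```

With this sketch, `is_allowed("carol", ["admins"], job_submit_acl)` grants access through group membership, while a user who is neither listed nor in a listed group is refused.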
To improve performance and ensure that the KDC does not become a bottleneck, the developers chose to use a number of tokens for communication secured with an RPC Digest scheme. The new Hadoop security design makes use of Delegation Tokens, Job Tokens and Block Access Tokens. Each of these tokens is similar in structure and based on HMAC-SHA1. Delegation Tokens are used by clients to communicate with the Name Node in order to gain access to HDFS data. Block Access Tokens are used to secure communication between the Name Node and Data Nodes and to enforce HDFS filesystem permissions. The Job Token is used to secure communication between the MapReduce engine Task Tracker and individual tasks. It is important to note that this scheme relies on symmetric-key cryptography and that, depending upon the token type, the shared key may be distributed to hundreds or even thousands of hosts.
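The general idea of an HMAC-SHA1 token can be sketched as below: the issuer computes a keyed digest over a token identifier, and any party holding the same secret can recompute and check it. This is an illustration under stated assumptions, not Hadoop's actual token format; the identifier fields (`owner`, `renewer`, `maxDate`) and the function names are hypothetical.

```python
import hashlib
import hmac
import os


def issue_token(secret_key: bytes, token_id: bytes) -> bytes:
    """Return the token authenticator, HMAC-SHA1(secret_key, token_id)."""
    return hmac.new(secret_key, token_id, hashlib.sha1).digest()


def verify_token(secret_key: bytes, token_id: bytes, authenticator: bytes) -> bool:
    """Recompute the HMAC with the shared key and compare in constant time."""
    expected = hmac.new(secret_key, token_id, hashlib.sha1).digest()
    return hmac.compare_digest(expected, authenticator)


# The shared secret: every host that must verify tokens needs a copy of it.
master_key = os.urandom(20)

# Hypothetical identifier layout, chosen only for this example.
token_id = b"owner=alice;renewer=jobtracker;maxDate=1300000000"
authenticator = issue_token(master_key, token_id)
```

Because verification requires the same key that was used for issuing, any host holding `master_key` can also mint valid tokens, which is why the wide distribution of the shared key matters.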
At the same time that the new Kerberos and RPC Digest security mechanisms were unveiled, the Hadoop developers at Yahoo open-sourced a new workflow manager called Oozie. Oozie allows users to streamline the submission and management of MapReduce jobs. To perform its function, Oozie has been designated a superuser and can perform actions on behalf of any Hadoop user. Authentication to Oozie has not been implemented. There is a pluggable authentication interface for Oozie, but there are no public authentication mechanisms ready to plug in. Anyone planning to make use of Oozie will need to develop their own authentication mechanism. According to the Hadoop Security Design whitepaper, the Hadoop developers considered writing an authentication plugin based on SPNEGO, to support browser-based Kerberos authentication, but the limitations of Jetty 6 and uneven browser support dissuaded them from this effort. In subsequent presentations, Hadoop developers have discussed the need for a default authentication plugin, with a preference for SPNEGO.
To meet their development schedule and maintain backwards compatibility with previous versions of Hadoop, the developers made several compromises. The new design requires that end users not have administrative rights on any machine in the cluster. If end users had administrative access to cluster machines, they could discover Delegation Tokens, Job Tokens, Block Access Tokens or symmetric keys and subvert the security guarantees of the system. In developing the new security features, it was decided that they must not degrade GridMix performance by more than 3%. This decision guided the developers toward the use of symmetric-key algorithms and did not encourage the use of secure network transports.