Hadoop Security: Overview
1. Private Property: No Trespassing
Hadoop Security Explained
Aaron T. Myers
atm@cloudera.com
@atm
2. Who am I?
• Aaron T. Myers – Software Engineer, Cloudera
• Hadoop HDFS, Common Committer
• Master's thesis on security sandboxing in the Linux kernel
• Primarily works on the Core Platform Team
3. Outline
• Hadoop Security Overview
• Hadoop Security pre CDH3
• Hadoop Security with CDH3
• Details of Deploying Secure Hadoop
• Summary
5. Why do we care about security?
• SecureCommerceWebSite, Inc. has a product that offers both
paid ads and search
• “Payment Fraud” team needs logs of all credit card
payments
• “Search Quality” team needs all search logs and click
history
• “Ads Fraud” team needs to access both search logs and
payment info
• So we can't segregate these datasets into separate clusters
• If they can share a cluster, we also get better utilization!
6. Security pre CDH3: User Authentication
• Authentication is by vigorous assertion
• Trivial to impersonate other user:
• Just set property “hadoop.job.ugi” when
running job or command
• Group resolution is done client side
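To make "authentication by vigorous assertion" concrete, here is a minimal Python sketch of a server that simply believes the client. It assumes the `hadoop.job.ugi` value has the form `user,group1,group2`; the functions `identity_from_client_conf` and `may_read`, and all user and group names, are hypothetical and model the idea only, not Hadoop's actual server code.

```python
# Hypothetical model of pre-CDH3 "authentication by assertion".
# Assumption: hadoop.job.ugi holds "user,group1,group2,..." set by the client.


def identity_from_client_conf(client_conf: dict) -> tuple:
    """Return (user, groups) exactly as the client asserted them."""
    user, *groups = client_conf["hadoop.job.ugi"].split(",")
    # Nothing is verified: no password, no ticket, and the group list was
    # resolved on the client, so the client controls it too.
    return user.strip(), [g.strip() for g in groups]


def may_read(user: str, groups: list, owner: str, group: str) -> bool:
    """Toy check: the file owner or any member of the file's group may read."""
    return user == owner or group in groups


honest = {"hadoop.job.ugi": "bob,users"}
forged = {"hadoop.job.ugi": "alice,users,paymentfraud"}

# bob asking as himself is refused; bob claiming to be alice succeeds.
print(may_read(*identity_from_client_conf(honest), owner="alice", group="paymentfraud"))  # False
print(may_read(*identity_from_client_conf(forged), owner="alice", group="paymentfraud"))  # True
```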
8. Security pre CDH3: HDFS
• Unix-like file permissions were introduced in
Hadoop v0.16.1
• Provides standard user/group/other r/w/x
• Protects well-meaning users from accidents
• Does nothing to prevent malicious users from
causing harm (weak authentication)
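A user/group/other r/w/x check of this kind can be sketched in a few lines of Python. This is a simplified model under stated assumptions (POSIX-style "first matching class wins" semantics, a superuser hypothetically named `hdfs`); the `Inode` class and `permitted` function are illustrative, not HDFS's implementation.

```python
# Minimal model of a user/group/other r/w/x check in the style of HDFS
# permissions. Assumption: the superuser is named "hdfs".
from dataclasses import dataclass

READ, WRITE, EXECUTE = 0o4, 0o2, 0o1


@dataclass
class Inode:
    owner: str
    group: str
    mode: int  # permission bits, e.g. 0o640


def permitted(inode: Inode, user: str, user_groups: set, access: int,
              superuser: str = "hdfs") -> bool:
    if user == superuser:
        return True  # the superuser skips permission checks
    # Exactly one class applies: owner first, then group, then other.
    if user == inode.owner:
        bits = (inode.mode >> 6) & 0o7
    elif inode.group in user_groups:
        bits = (inode.mode >> 3) & 0o7
    else:
        bits = inode.mode & 0o7
    return (bits & access) == access


payments = Inode(owner="alice", group="paymentfraud", mode=0o640)
print(permitted(payments, "alice", {"users"}, READ | WRITE))  # True
print(permitted(payments, "bob", {"paymentfraud"}, READ))     # True
print(permitted(payments, "bob", {"paymentfraud"}, WRITE))    # False
print(permitted(payments, "carol", {"users"}, READ))          # False
```

The check is only as trustworthy as the `user` and `user_groups` passed in; with weak authentication, a malicious client simply supplies alice's identity.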
9. Security pre CDH3: Job Control
• ACLs per job queue for job submission / killing
• No ACLs for viewing counters / logs
• Does nothing to prevent malicious users from
causing harm (weak authentication)
10. Security pre CDH3: Tasks
• Individual tasks all run as the same user
• Whichever user the TaskTracker (TT) runs as (usually 'hadoop')
• Tasks not isolated from each other
• Tasks that read from or write to local storage can
interfere with each other
• Malicious tasks can kill each other
• Hadoop is designed to execute arbitrary code
12. Security with CDH3: User Authentication
• Authentication is secured by Kerberos v5
• RPC connections secured with SASL “GSSAPI”
mechanism
• Provides proven, strong authentication and
single sign-on
• Hadoop servers can ensure that users are who
they say they are
• Group resolution is done on the server side
13. Security with CDH3: Server Authentication
• Kerberos authentication is bi-directional
• Users can be sure that they are communicating
with the Hadoop server they think they are
14. Security with CDH3: HDFS
• Same general permissions model
• Added sticky bit for directories (e.g. /tmp)
• But a user can no longer trivially impersonate
other users (strong authentication)
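The sticky bit matters because /tmp is world-writable: without it, any user could delete anyone else's files there. Below is a minimal Python sketch of the usual sticky-bit deletion rule, assuming POSIX-style semantics and a superuser hypothetically named `hdfs`; `may_delete` is an illustrative helper, not an HDFS API.

```python
# Sketch of the sticky-bit rule for directories such as /tmp (mode 0o1777).
# Assumption: the superuser is named "hdfs".
STICKY = 0o1000


def may_delete(user: str, dir_owner: str, dir_mode: int, entry_owner: str,
               can_write_dir: bool, superuser: str = "hdfs") -> bool:
    """Can `user` delete or rename an entry inside this directory?"""
    if user == superuser:
        return True
    if not can_write_dir:
        return False  # removing an entry always needs write access to the parent
    if dir_mode & STICKY:
        # Sticky bit: only the entry's owner or the directory's owner may remove it.
        return user in (entry_owner, dir_owner)
    return True  # no sticky bit: write access to the directory is enough


print(may_delete("bob", "hdfs", 0o1777, "alice", can_write_dir=True))    # False
print(may_delete("alice", "hdfs", 0o1777, "alice", can_write_dir=True))  # True
print(may_delete("bob", "hdfs", 0o777, "alice", can_write_dir=True))     # True
```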
15. Security with CDH3: Job Control
• A job now has its own ACLs, including a view ACL
• Job can now specify who can view logs, counters,
configuration, and who can modify (kill) it
• The JobTracker (JT) enforces these ACLs (strong authentication)
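A per-job view or modify ACL can be modelled as below. This sketch assumes Hadoop's ACL string convention (comma-separated users, a space, then comma-separated groups, with `*` meaning everyone) and treats the job owner as always allowed, which is a simplification; `parse_acl` and `job_access_allowed` are hypothetical helpers, and the user and group names are examples.

```python
# Sketch of a job ACL check. Assumed ACL string format:
# "user1,user2 group1,group2"; "*" allows everyone.


def parse_acl(acl: str):
    """Return None for the '*' wildcard, else (users, groups) as sets."""
    if acl.strip() == "*":
        return None
    users_part, _, groups_part = acl.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.split(",") if g}
    return users, groups


def job_access_allowed(acl: str, job_owner: str, user: str, user_groups: set) -> bool:
    if user == job_owner:
        return True  # simplification: the owner can always view/modify their own job
    parsed = parse_acl(acl)
    if parsed is None:
        return True  # wildcard ACL
    users, groups = parsed
    return user in users or bool(groups & set(user_groups))


view_acl = "carol adsfraud"  # carol, plus anyone in the adsfraud group
print(job_access_allowed(view_acl, "alice", "bob", {"adsfraud"}))  # True
print(job_access_allowed(view_acl, "alice", "carol", {"users"}))   # True
print(job_access_allowed(view_acl, "alice", "dave", {"users"}))    # False
```

Because the JobTracker now knows the caller's identity from Kerberos, `user` and `user_groups` in a check like this can no longer be forged by the client.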
16. Security with CDH3: Tasks
• Tasks now run as the user who launched the job
• Probably the most complex part of Hadoop's
security implementation
• Ensures isolation of tasks that run on the same TaskTracker
• Local file permissions enforced
• Local system permissions enforced (e.g. signals)
• Can take advantage of per-user system limits
• e.g. Linux ulimits
17. Security with CDH3: Web Interfaces
• Out-of-the-box Kerberized SSL support
• Pluggable servlet filters (more on this later)
18. Security with CDH3: Threat Model
• The Hadoop security system assumes that:
• Users do not have root access to cluster
machines
• Users do not have root access to shared user
machines (e.g. bastion box)
• Users cannot read or inject packets on the
network
21. Requirements: Kerberos Infrastructure
• Kerberos realm with a KDC
• e.g. MIT Krb5 in RHEL, or Microsoft Active Directory
• Kerberos principals (SPNs) for every daemon
• hdfs/hostname@REALM for DN, NN, 2NN
• mapred/hostname@REALM for TT and JT
• host/hostname@REALM for web UIs
• Keytabs for service principals distributed to
correct hosts
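The service principals above all follow the `service/hostname@REALM` pattern. The Python sketch below splits a principal into its parts and builds the per-daemon names from the slide; the host names and realm are made-up examples, the daemon-to-service table simply restates the bullets, and the parser handles only the simple one- or two-component form (no escapes).

```python
# Sketch: split a Kerberos principal into its parts and build the
# per-daemon service principals listed on the slide. Host names and the
# realm used below are made-up examples.
import re

PRINCIPAL_RE = re.compile(r"^([^/@]+)(?:/([^/@]+))?@([^/@]+)$")

SERVICE_FOR_DAEMON = {
    "NN": "hdfs", "2NN": "hdfs", "DN": "hdfs",  # HDFS daemons
    "JT": "mapred", "TT": "mapred",             # MapReduce daemons
    "web": "host",                              # Kerberized SSL web UIs
}


def parse_principal(principal: str):
    """Return (primary, instance or None, realm)."""
    m = PRINCIPAL_RE.match(principal)
    if m is None:
        raise ValueError(f"not a principal: {principal!r}")
    return m.groups()


def service_principal(daemon: str, hostname: str, realm: str) -> str:
    return f"{SERVICE_FOR_DAEMON[daemon]}/{hostname}@{realm}"


print(service_principal("TT", "worker7.example.com", "EXAMPLE.COM"))
# mapred/worker7.example.com@EXAMPLE.COM
print(parse_principal("hdfs/nn1.example.com@EXAMPLE.COM"))
# ('hdfs', 'nn1.example.com', 'EXAMPLE.COM')
```

Each host then needs a keytab containing exactly the principals for the daemons it runs.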
22. Configuring daemons for security
• Most daemons have two configs:
• Keytab location (e.g. dfs.datanode.keytab.file)
• Kerberos principal (e.g. dfs.datanode.kerberos.principal)
• Principal can use the special token '_HOST' to substitute
hostname of the daemon (e.g. 'hdfs/_HOST@MYREALM')
• Several other configs to enable security in the first place
• See example-confs/conf.secure in CDH3
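The point of `_HOST` is that one configuration file can be shared by every DataNode while each daemon still logs in as its own host-specific principal. A minimal Python sketch of the substitution, assuming the host name is lower-cased (the usual convention for Kerberos service principals) and that the local FQDN is used when no host name is given; `server_principal` is a hypothetical helper, not Hadoop's own function.

```python
# Sketch of '_HOST' substitution in a configured principal.
# Assumption: the host name is lower-cased, since Kerberos service
# principals conventionally use lower-case host names.
import socket


def server_principal(configured: str, hostname: str = "") -> str:
    service_and_host, at, realm = configured.partition("@")
    service, _, host = service_and_host.partition("/")
    if host != "_HOST":
        return configured  # nothing to substitute
    if not hostname:
        hostname = socket.getfqdn()  # fall back to this machine's FQDN
    return f"{service}/{hostname.lower()}{at}{realm}"


print(server_principal("hdfs/_HOST@MYREALM", "DN3.Example.COM"))
# hdfs/dn3.example.com@MYREALM
```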
23. Setting up users
• Each user must have a Kerberos principal
• May want some shared accounts:
• sharedaccount/alice and sharedaccount/bob
principals both act as sharedaccount on HDFS; you
can use this!
• hdfs/alice is also useful for alice to act as a superuser
• Users running MR jobs must also have unix accounts on
each of the slaves
• Centralized user database (e.g. LDAP) is a practical
necessity
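Why do sharedaccount/alice and sharedaccount/bob both act as sharedaccount, and hdfs/alice as the hdfs superuser? Under the commonly used default mapping, a principal from the cluster's own realm maps to its first component. A minimal Python sketch, assuming that default behaviour and the example realm CLUSTER.FOOCORP.COM; `default_short_name` is hypothetical.

```python
# Sketch of the default principal-to-user mapping, assuming principals in
# the cluster's own realm map to their first component; anything else
# needs an explicit rule.


def default_short_name(principal: str, default_realm: str) -> str:
    name, _, realm = principal.partition("@")
    if realm != default_realm:
        raise LookupError(f"no rule maps {principal!r}")
    return name.split("/", 1)[0]


REALM = "CLUSTER.FOOCORP.COM"
for p in ("sharedaccount/alice@" + REALM, "sharedaccount/bob@" + REALM,
          "hdfs/alice@" + REALM, "alice@" + REALM):
    print(p, "->", default_short_name(p, REALM))
# sharedaccount/alice@CLUSTER.FOOCORP.COM -> sharedaccount
# sharedaccount/bob@CLUSTER.FOOCORP.COM -> sharedaccount
# hdfs/alice@CLUSTER.FOOCORP.COM -> hdfs
# alice@CLUSTER.FOOCORP.COM -> alice
```

Each person keeps their own credentials (and their own audit trail in the KDC), while HDFS sees a single shared identity.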
24. Installing Secure Hadoop
• MapReduce and HDFS services should run as
separate users (e.g. 'hdfs' and 'mapred')
• New task-controller setuid executable allows
tasks to run as the user who submitted the job
• New JNI code in libhadoop.so to plug subtle
security holes
• Install CDH3 with the hadoop-0.20-sbin and
hadoop-0.20-native packages to get this all set up
25. Securing higher-level services
• Many “middle tier” applications need to act on
behalf of their clients when interacting with
Hadoop
• e.g. Oozie, Hive Server, Hue/Beeswax
• “Proxy User” feature provides secure
impersonation (think sudo).
• hadoop.proxyuser.oozie.hosts - IPs where
“oozie” may act as an impersonator
• hadoop.proxyuser.oozie.groups - groups whose
users “oozie” may impersonate
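The two proxy-user settings combine into a single yes/no decision, made after the service itself (here "oozie") has authenticated with Kerberos. The Python sketch below models that decision under simplifying assumptions: no wildcards, hosts given as IPs, and the impersonated user's groups supplied by the caller. `may_impersonate`, the IPs, and the group names are hypothetical.

```python
# Sketch of a proxy-user ("impersonation") check, modelled on the two
# settings on the slide. Simplifications: no wildcards, hosts given as IPs,
# and the caller supplies the impersonated user's groups.


def _as_set(value: str) -> set:
    return {v.strip() for v in value.split(",") if v.strip()}


def may_impersonate(conf: dict, real_user: str, remote_ip: str,
                    target_user_groups: set) -> bool:
    prefix = f"hadoop.proxyuser.{real_user}"
    allowed_hosts = _as_set(conf.get(prefix + ".hosts", ""))
    allowed_groups = _as_set(conf.get(prefix + ".groups", ""))
    # The request must come from an allowed host...
    if remote_ip not in allowed_hosts:
        return False
    # ...and the impersonated user must belong to at least one allowed group.
    return bool(allowed_groups & set(target_user_groups))


conf = {
    "hadoop.proxyuser.oozie.hosts": "10.1.1.5",
    "hadoop.proxyuser.oozie.groups": "analysts,etl",
}
print(may_impersonate(conf, "oozie", "10.1.1.5", {"users", "analysts"}))  # True
print(may_impersonate(conf, "oozie", "10.1.1.9", {"analysts"}))           # False: wrong host
print(may_impersonate(conf, "oozie", "10.1.1.5", {"users"}))              # False: wrong group
print(may_impersonate(conf, "hive", "10.1.1.5", {"analysts"}))            # False: not configured
```

Like sudo, the impersonator's power is bounded twice: where it may connect from, and whom it may act as.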
26. Customizing Security
• Current plug-in points:
• hadoop.http.filter.initializers - may configure a
custom ServletFilter to integrate with existing
enterprise web SSO
• hadoop.security.group.mapping - map a
Kerberos principal (alice@FOOCORP.COM) to a
set of groups
(users,engstaff,searchquality,adsdata)
• hadoop.security.auth_to_local - regex
mappings of Kerberos principals to usernames
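The auth_to_local rules are evaluated in order and the first rule that produces a name wins. The Python sketch below is a simplified, hypothetical evaluator for rules of the form `RULE:[n:format](regex)s/pattern/replacement/[g]` plus `DEFAULT`, where `$0` is the realm and `$1`, `$2` are the principal's components. Assumptions: the replacement is taken literally (no group references), and the example rules and realms (taken from the AD-integration layout with CLUSTER.FOOCORP.COM and FOOCORP.COM) are illustrative, not a recommended configuration.

```python
# Simplified evaluator for hadoop.security.auth_to_local-style rules.
# Supported: "DEFAULT" and "RULE:[n:format](regex)s/pattern/replacement/[g]".
# Simplification: the replacement is taken literally (no group references).
import re

RULE_RE = re.compile(
    r"RULE:\[(\d+):([^\]]*)\]"        # [n:format]; $0 = realm, $1.. = components
    r"(?:\(([^)]*)\))?"               # optional acceptance regex
    r"(?:s/([^/]*)/([^/]*)/(g)?)?$"   # optional sed-style substitution
)


def _expand(fmt: str, params: list) -> str:
    return re.sub(r"\$(\d+)", lambda m: params[int(m.group(1))], fmt)


def short_name(principal: str, rules: list, default_realm: str) -> str:
    name, _, realm = principal.partition("@")
    params = [realm] + name.split("/")
    for rule in rules:
        result = None
        if rule == "DEFAULT":
            if realm == default_realm:
                result = params[1]
        else:
            m = RULE_RE.match(rule)
            if m is None:
                raise ValueError(f"bad rule: {rule!r}")
            n, fmt, accept, pattern, replacement, glob = m.groups()
            if len(params) - 1 == int(n):  # rule applies only to n-component names
                base = _expand(fmt, params)
                if accept is None or re.fullmatch(accept, base):
                    if pattern is None:
                        result = base
                    else:
                        result = re.sub(pattern, lambda _m: replacement, base,
                                        count=0 if glob else 1)
        if result is not None:
            if re.search(r"[/@]", result):
                raise ValueError(f"{principal!r} mapped to non-simple name {result!r}")
            return result  # first matching rule wins
    raise LookupError(f"no rule applied to {principal!r}")


RULES = [
    r"RULE:[1:$1@$0](.*@FOOCORP\.COM)s/@.*//",                # alice@FOOCORP.COM -> alice
    r"RULE:[2:$1@$0](hdfs@CLUSTER\.FOOCORP\.COM)s/.*/hdfs/",  # hdfs/<host>@... -> hdfs
    "DEFAULT",
]
print(short_name("alice@FOOCORP.COM", RULES, "CLUSTER.FOOCORP.COM"))  # alice
```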
27. Deployment Gotchas
• MIT Kerberos 1.8.1 (in Ubuntu, RHEL 5.6+) is
incompatible with the Java Krb5 implementation
• Run “kinit -R” after kinit to work around
• Enable allow_weak_crypto in /etc/krb5.conf;
required for Kerberized SSL
• Must deploy “unlimited security policy JAR” in
JAVA_HOME/jre/lib/security
• Lifesaver: HADOOP_OPTS="-Dsun.security.krb5.debug=true" hadoop ...
28. Best Practices for AD Integration
• MIT Kerberos realm inside cluster:
• CLUSTER.FOOCORP.COM
• Existing Active Directory domain:
• FOOCORP.COM or maybe AD.FOOCORP.COM
• Set up one-way cross-realm trust
• Cluster realm must trust corporate AD realm
• See “Step by Step Guide to Kerberos 5
Interoperability” in Windows Server docs
30. What Hadoop Security Is
• Strong authentication
• Malicious impersonation no longer possible (under the threat model above)
• Better authorization
• More control over who can view/control jobs
• Ensure isolation between running tasks
• An ongoing development priority
31. What Hadoop Security Is Not
• Encryption on the wire
• Encryption on disk
• Protection against DoS attacks
• Enabled by default
32. Security Beyond Core Hadoop
• Comprehensive documentation and best
practices
• https://ccp.cloudera.com/display/CDHDOC/CDH3+Security+Guide
• All components of CDH3 are capable of
interacting with a secure Hadoop cluster
• Hive 0.7 (included in CDH3) added a rich set of
access controls
• Much easier deployment if you use Cloudera
Enterprise
33. Security Roadmap
• Pluggable “edge authentication” (e.g. PKI, SAML)
• More authorization features across CDH
components
• e.g. HBase access controls
• Data encryption support