SlideShare a Scribd company logo
1 of 34
Private Property: No Trespassing
   Hadoop Security Explained




        Aaron T. Myers
      atm@cloudera.com
            @atm
Who am I?




• Aaron T. Myers – Software Engineer, Cloudera
• Hadoop HDFS, Common Committer
• Masters thesis on security sandboxing in Linux kernel
• Primarily works on the Core Platform Team
Outline

• Hadoop Security Overview
 • Hadoop Security pre CDH3
 • Hadoop Security with CDH3
• Details of Deploying Secure Hadoop
• Summary
Hadoop Security: Overview
Why do we care about security?
• SecureCommerceWebSite, Inc has a product that has both
   paid ads and search

• “Payment Fraud” team needs logs of all credit card
   payments

• “Search Quality” team needs all search logs and click
   history

• “Ads Fraud” team needs to access both search logs and
   payment info
  •   So we can't segregate these datasets to different clusters

• If they can share a cluster, we also get better utilization!
Security pre CDH3: User Authentication



• Authentication is by vigorous assertion
• Trivial to impersonate other user:
 • Just set property “hadoop.job.ugi” when
    running job or command
• Group resolution is done client side
Security pre CDH3: Server Authentication




                None
Security pre CDH3: HDFS


• Unix-like file permissions were introduced in
  Hadoop v16.1
• Provides standard user/group/other r/w/x
• Protects well-meaning users from accidents
• Does nothing to prevent malicious users from
  causing harm (weak authentication)
Security pre CDH3: Job Control



• ACLs per job queue for job submission / killing
• No ACLs for viewing counters / logs
• Does nothing to prevent malicious users from
  causing harm (weak authentication)
Security pre CDH3: Tasks


• Individual tasks all run as the same user
 • Whoever the TT is running as (usually 'hadoop')
• Tasks not isolated from each other
 • Tasks which read/write from local storage can
    interfere with each other
 • Malicious tasks can kill each other
• Hadoop is designed to execute arbitrary code
Security pre CDH3: Web interfaces




             None
Security with CDH3: User Authentication

• Authentication is secured by Kerberos v5
 • RPC connections secured with SASL “GSSAPI”
    mechanism
  • Provides proven, strong authentication and
    single-sign-on
• Hadoop servers can ensure that users are who
  they say they are
• Group resolution is done on the server side
Security with CDH3: Server Authentication




• Kerberos authentication is bi-directional
• Users can be sure that they are communicating
  with the Hadoop server they think they are
Security with CDH3: HDFS




• Same general permissions model
 • Added sticky bit for directories (e.g. /tmp)
• But, a user can no longer trivially impersonate
  other users (strong authentication)
Security with CDH3: Job Control



• A job now has its own ACLs, including a view ACL
• Job can now specify who can view logs, counters,
  configuration, and who can modify (kill) it
• JT enforces these ACLs (strong authentication)
Security with CDH3: Tasks

• Tasks now run as the user who launched the job
 • Probably the most complex part of Hadoop's
    security implementation
• Ensures isolation of tasks which run on the same TT
 • Local file permissions enforced
 • Local system permissions enforced (e.g. signals)
• Can take advantage of per-user system limits
 • e.g. Linux ulimits
Security with CDH3: Web Interfaces



• Out of the box Kerberized SSL support
• Pluggable servlet filters (more on this later)
Security with CDH3: Threat Model


• The Hadoop security system assumes that:
 • Users do not have root access to cluster
    machines
 • Users do not have root access to shared user
    machines (e.g. bastion box)
 • Users cannot read or inject packets on the
    network
Thanks, Yahoo!




Yahoo! did the vast majority of the
   core Hadoop security work
Hadoop Security:
Deployment Details
Requirements: Kerberos Infrastructure

• Kerberos domain (KDC)
 • eg. MIT Krb5 in RHEL, or MS Active Directory
• Kerberos principals (SPNs) for every daemon
 • hdfs/hostname@REALM for DN, NN, 2NN
 • mapred/hostname@REALM for TT and JT
 • host/hostname@REALM for web UIs
• Keytabs for service principals distributed to
  correct hosts
Configuring daemons for security

• Most daemons have two configs:
 • Keytab location (eg dfs.datanode.keytab.file)
 • Kerberos principal (eg dfs.datanode.kerberos.principal)
• Principal can use the special token '_HOST' to substitute
  hostname of the daemon (eg 'hdfs/_HOST@MYREALM')

• Several other configs to enable security in the first place
 • See example-confs/conf.secure in CDH3
Setting up users
• Each user must have a Kerberos principal
• May want some shared accounts:
 • sharedaccount/alice and sharedaccount/bob
    principals both act as sharedaccount on HDFS - you
    can use this!

  • hdfs/alice is also useful for alice to act as a superuser
• Users running MR jobs must also have unix accounts on
  each of the slaves

• Centralized user database (eg LDAP) is a practical
  necessity
Installing Secure Hadoop

• MapReduce and HDFS services should run as
  separate users (e.g. 'hdfs' and 'mapred')
• New task-controller setuid executable allows
  tasks to run as a user
• New JNI code in libhadoop.so to plug subtle
  security holes
• Install CDH3 with hadoop-0.20-sbin and hadoop-
  0.20-native packages to get this all set up
Securing higher-level services
• Many “middle tier” applications need to act on
  behalf of their clients when interacting with
  Hadoop
  • e.g: Oozie, Hive Server, Hue/Beeswax
• “Proxy User” feature provides secure
  impersonation (think sudo).
  • hadoop.proxyuser.oozie.hosts - IPs where
    “oozie” may act as an impersonator
  • hadoop.proxyuser.oozie.groups - groups whose
    users “oozie” may impersonate
Customizing Security

• Current plug-in points:
 • hadoop.http.filter.initializers - may configure a
    custom ServletFilter to integrate with existing
    enterprise web SSO
  • hadoop.security.group.mapping - map a
    kerberos principal (alice@FOOCORP.COM) to a
    set of groups
    (users,engstaff,searchquality,adsdata)
  • hadoop.security.auth_to_local - regex
    mappings of Kerberos principals to usernames
Deployment Gotchas

• MIT Kerberos 1.8.1 (in Ubuntu, RHEL 5.6+)
  incompatible with Java Krb5 implementation
  • Run “kinit -R” after kinit to work around
• Enable allow_weak_crypto in /etc/krb5.conf -
  necessary for kerberized SSL
• Must deploy “unlimited security policy JAR” in
  JAVA_HOME/jre/lib/security
• Lifesaver: HADOOP_OPTS=
    ”-Dsun.security.krb5.debug=true” hadoop ...
Best Practices for AD Integration

• MIT Kerberos realm inside cluster:
 • CLUSTER.FOOCORP.COM
• Existing Active Directory domain:
 • FOOCORP.COM or maybe AD.FOOCORP.COM
• Set up one-way cross-realm trust
 • Cluster realm must trust corporate AD realm
 • See “Step by Step Guide to Kerberos 5
    Interoperability” in Windows Server docs
Hadoop Security:
   Summary
What Hadoop Security Is


• Strong authentication
 • Malicious impersonation now impossible
• Better authorization
 • More control over who can view/control jobs
• Ensure isolation between running tasks
• An ongoing development priority
What Hadoop Security Is Not



• Encryption on the wire
• Encryption on disk
• Protection against DOS attacks
• Enabled by default
Security Beyond Core Hadoop

• Comprehensive documentation and best
  practices
  •   https://ccp.cloudera.com/display/CDHDOC/CDH3+Security+Guide

• All components of CDH3 are capable of
  interacting with a secure Hadoop cluster
• Hive 0.7 (included in CDH3) added a rich set of
  access controls
• Much easier deployment if you use Cloudera
  Enterprise
Security Roadmap


• Pluggable “edge authentication” (eg PKI, SAML)
• More authorization features across CDH
  components
 • e.g. HBase access controls
• Data encryption support
Questions?



  Aaron T. Myers
atm@cloudera.com
      @atm

More Related Content

What's hot

Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Abhiraj Butala
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureUwe Printz
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyAnurag Shrivastava
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessCloudera, Inc.
 
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Cloudera, Inc.
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revJason Shih
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with HadoopCloudera, Inc.
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - RangerIsheeta Sanghi
 
Apache ranger meetup
Apache ranger meetupApache ranger meetup
Apache ranger meetupnvvrajesh
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaCaserta
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overviewTushar Dudhatra
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 
Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...
Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...
Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...Cloudera, Inc.
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesBolke de Bruin
 

What's hot (20)

Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happy
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
Hadoop Security, Cloudera - Todd Lipcon and Aaron Myers - Hadoop World 2010
 
Hadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117revHadoop security overview_hit2012_1117rev
Hadoop security overview_hit2012_1117rev
 
Big Data Security with Hadoop
Big Data Security with HadoopBig Data Security with Hadoop
Big Data Security with Hadoop
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Hadoop Security
Hadoop SecurityHadoop Security
Hadoop Security
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
April 2014 HUG : Apache Sentry
April 2014 HUG : Apache SentryApril 2014 HUG : Apache Sentry
April 2014 HUG : Apache Sentry
 
Apache ranger meetup
Apache ranger meetupApache ranger meetup
Apache ranger meetup
 
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by ClouderaBig Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
Big Data Warehousing Meetup: Securing the Hadoop Ecosystem by Cloudera
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overview
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...
Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...
Comprehensive Security for the Enterprise III: Protecting Data at Rest and In...
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 

Similar to Hadoop Security: Overview

Secure Hadoop clusters on Windows platform
Secure Hadoop clusters on Windows platformSecure Hadoop clusters on Windows platform
Secure Hadoop clusters on Windows platformRemus Rusanu
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataGreat Wide Open
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access SecurityCloudera, Inc.
 
Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Neeraj Shrimali
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextHellmar Becker
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityChris Nauroth
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCloudIDSummit
 
Road to Opscon (Pisa '15) - DevOoops
Road to Opscon (Pisa '15) - DevOoopsRoad to Opscon (Pisa '15) - DevOoops
Road to Opscon (Pisa '15) - DevOoopsGianluca Varisco
 
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...Vincent Giersch
 
Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Hellmar Becker
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedCloudera, Inc.
 
Project Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopProject Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopCloudera, Inc.
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSpark Summit
 
Securing Your Apache Spark Applications
Securing Your Apache Spark ApplicationsSecuring Your Apache Spark Applications
Securing Your Apache Spark ApplicationsCloudera, Inc.
 
Unraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production CloudUnraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production CloudSalman Baset
 
Tokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker SecurityTokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker SecurityPhil Estes
 
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)Codemotion
 

Similar to Hadoop Security: Overview (20)

Secure Hadoop clusters on Windows platform
Secure Hadoop clusters on Windows platformSecure Hadoop clusters on Windows platform
Secure Hadoop clusters on Windows platform
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup. Linux container, namespaces & CGroup.
Linux container, namespaces & CGroup.
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the EnterpriseDeploying Big-Data-as-a-Service (BDaaS) in the Enterprise
Deploying Big-Data-as-a-Service (BDaaS) in the Enterprise
 
Head in the clouds
Head in the cloudsHead in the clouds
Head in the clouds
 
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding EdgeCIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
CIS13: Big Data Platform Vendor’s Perspective: Insights from the Bleeding Edge
 
Road to Opscon (Pisa '15) - DevOoops
Road to Opscon (Pisa '15) - DevOoopsRoad to Opscon (Pisa '15) - DevOoops
Road to Opscon (Pisa '15) - DevOoops
 
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...
UKC - Feb 2013 - Analyzing the security of Windows 7 and Linux for cloud comp...
 
Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)Securing Hadoop in an Enterprise Context (v2)
Securing Hadoop in an Enterprise Context (v2)
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and Governed
 
Project Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopProject Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for Hadoop
 
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo VanzinSecuring Spark Applications by Kostas Sakellis and Marcelo Vanzin
Securing Spark Applications by Kostas Sakellis and Marcelo Vanzin
 
Securing Your Apache Spark Applications
Securing Your Apache Spark ApplicationsSecuring Your Apache Spark Applications
Securing Your Apache Spark Applications
 
Unraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production CloudUnraveling Docker Security: Lessons From a Production Cloud
Unraveling Docker Security: Lessons From a Production Cloud
 
Tokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker SecurityTokyo OpenStack Summit 2015: Unraveling Docker Security
Tokyo OpenStack Summit 2015: Unraveling Docker Security
 
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)
Gianluca Varisco - DevOoops (Increase awareness around DevOps infra security)
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 

Recently uploaded (20)

Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

Hadoop Security: Overview

  • 1. Private Property: No Trespassing Hadoop Security Explained Aaron T. Myers atm@cloudera.com @atm
  • 2. Who am I? • Aaron T. Myers – Software Engineer, Cloudera • Hadoop HDFS, Common Committer • Masters thesis on security sandboxing in Linux kernel • Primarily works on the Core Platform Team
  • 3. Outline • Hadoop Security Overview • Hadoop Security pre CDH3 • Hadoop Security with CDH3 • Details of Deploying Secure Hadoop • Summary
  • 5. Why do we care about security? • SecureCommerceWebSite, Inc has a product that has both paid ads and search • “Payment Fraud” team needs logs of all credit card payments • “Search Quality” team needs all search logs and click history • “Ads Fraud” team needs to access both search logs and payment info • So we can't segregate these datasets to different clusters • If they can share a cluster, we also get better utilization!
  • 6. Security pre CDH3: User Authentication • Authentication is by vigorous assertion • Trivial to impersonate other user: • Just set property “hadoop.job.ugi” when running job or command • Group resolution is done client side
  • 7. Security pre CDH3: Server Authentication None
  • 8. Security pre CDH3: HDFS • Unix-like file permissions were introduced in Hadoop v16.1 • Provides standard user/group/other r/w/x • Protects well-meaning users from accidents • Does nothing to prevent malicious users from causing harm (weak authentication)
  • 9. Security pre CDH3: Job Control • ACLs per job queue for job submission / killing • No ACLs for viewing counters / logs • Does nothing to prevent malicious users from causing harm (weak authentication)
  • 10. Security pre CDH3: Tasks • Individual tasks all run as the same user • Whoever the TT is running as (usually 'hadoop') • Tasks not isolated from each other • Tasks which read/write from local storage can interfere with each other • Malicious tasks can kill each other • Hadoop is designed to execute arbitrary code
  • 11. Security pre CDH3: Web interfaces None
  • 12. Security with CDH3: User Authentication • Authentication is secured by Kerberos v5 • RPC connections secured with SASL “GSSAPI” mechanism • Provides proven, strong authentication and single-sign-on • Hadoop servers can ensure that users are who they say they are • Group resolution is done on the server side
  • 13. Security with CDH3: Server Authentication • Kerberos authentication is bi-directional • Users can be sure that they are communicating with the Hadoop server they think they are
  • 14. Security with CDH3: HDFS • Same general permissions model • Added sticky bit for directories (e.g. /tmp) • But, a user can no longer trivially impersonate other users (strong authentication)
  • 15. Security with CDH3: Job Control • A job now has its own ACLs, including a view ACL • Job can now specify who can view logs, counters, configuration, and who can modify (kill) it • JT enforces these ACLs (strong authentication)
  • 16. Security with CDH3: Tasks • Tasks now run as the user who launched the job • Probably the most complex part of Hadoop's security implementation • Ensures isolation of tasks which run on the same TT • Local file permissions enforced • Local system permissions enforced (e.g. signals) • Can take advantage of per-user system limits • e.g. Linux ulimits
  • 17. Security with CDH3: Web Interfaces • Out of the box Kerberized SSL support • Pluggable servlet filters (more on this later)
  • 18. Security with CDH3: Threat Model • The Hadoop security system assumes that: • Users do not have root access to cluster machines • Users do not have root access to shared user machines (e.g. bastion box) • Users cannot read or inject packets on the network
  • 19. Thanks, Yahoo! Yahoo! did the vast majority of the core Hadoop security work
  • 21. Requirements: Kerberos Infrastructure • Kerberos domain (KDC) • eg. MIT Krb5 in RHEL, or MS Active Directory • Kerberos principals (SPNs) for every daemon • hdfs/hostname@REALM for DN, NN, 2NN • mapred/hostname@REALM for TT and JT • host/hostname@REALM for web UIs • Keytabs for service principals distributed to correct hosts
  • 22. Configuring daemons for security • Most daemons have two configs: • Keytab location (eg dfs.datanode.keytab.file) • Kerberos principal (eg dfs.datanode.kerberos.principal) • Principal can use the special token '_HOST' to substitute hostname of the daemon (eg 'hdfs/_HOST@MYREALM') • Several other configs to enable security in the first place • See example-confs/conf.secure in CDH3
  • 23. Setting up users • Each user must have a Kerberos principal • May want some shared accounts: • sharedaccount/alice and sharedaccount/bob principals both act as sharedaccount on HDFS - you can use this! • hdfs/alice is also useful for alice to act as a superuser • Users running MR jobs must also have unix accounts on each of the slaves • Centralized user database (eg LDAP) is a practical necessity
  • 24. Installing Secure Hadoop • MapReduce and HDFS services should run as separate users (e.g. 'hdfs' and 'mapred') • New task-controller setuid executable allows tasks to run as a user • New JNI code in libhadoop.so to plug subtle security holes • Install CDH3 with hadoop-0.20-sbin and hadoop- 0.20-native packages to get this all set up
  • 25. Securing higher-level services • Many “middle tier” applications need to act on behalf of their clients when interacting with Hadoop • e.g: Oozie, Hive Server, Hue/Beeswax • “Proxy User” feature provides secure impersonation (think sudo). • hadoop.proxyuser.oozie.hosts - IPs where “oozie” may act as an impersonator • hadoop.proxyuser.oozie.groups - groups whose users “oozie” may impersonate
  • 26. Customizing Security • Current plug-in points: • hadoop.http.filter.initializers - may configure a custom ServletFilter to integrate with existing enterprise web SSO • hadoop.security.group.mapping - map a kerberos principal (alice@FOOCORP.COM) to a set of groups (users,engstaff,searchquality,adsdata) • hadoop.security.auth_to_local - regex mappings of Kerberos principals to usernames
  • 27. Deployment Gotchas • MIT Kerberos 1.8.1 (in Ubuntu, RHEL 5.6+) incompatible with Java Krb5 implementation • Run “kinit -R” after kinit to work around • Enable allow_weak_crypto in /etc/krb5.conf - necessary for kerberized SSL • Must deploy “unlimited security policy JAR” in JAVA_HOME/jre/lib/security • Lifesaver: HADOOP_OPTS= ”-Dsun.security.krb5.debug=true” hadoop ...
  • 28. Best Practices for AD Integration • MIT Kerberos realm inside cluster: • CLUSTER.FOOCORP.COM • Existing Active Directory domain: • FOOCORP.COM or maybe AD.FOOCORP.COM • Set up one-way cross-realm trust • Cluster realm must trust corporate AD realm • See “Step by Step Guide to Kerberos 5 Interoperability” in Windows Server docs
  • 29. Hadoop Security: Summary
  • 30. What Hadoop Security Is • Strong authentication • Malicious impersonation now impossible • Better authorization • More control over who can view/control jobs • Ensure isolation between running tasks • An ongoing development priority
  • 31. What Hadoop Security Is Not • Encryption on the wire • Encryption on disk • Protection against DOS attacks • Enabled by default
  • 32. Security Beyond Core Hadoop • Comprehensive documentation and best practices • https://ccp.cloudera.com/display/CDHDOC/CDH3+Security+Guide • All components of CDH3 are capable of interacting with a secure Hadoop cluster • Hive 0.7 (included in CDH3) added a rich set of access controls • Much easier deployment if you use Cloudera Enterprise
  • 33. Security Roadmap • Pluggable “edge authentication” (eg PKI, SAML) • More authorization features across CDH components • e.g. HBase access controls • Data encryption support
  • 34. Questions? Aaron T. Myers atm@cloudera.com @atm