Secure Hadoop Application
Ecosystem
Boston Application Security
Conference
Oct 3 2015
[Charts: Google Trends for "Big Data"; Big Data job trends]
Hadoop Ecosystem
Flume
Sqoop
ZooKeeper
HBase
Hive
Pig
MapReduce
Spark
YARN – Resource Manager
HDFS – Distributed File System
Kafka
Storm
Why
• Hadoop is a storage/processing infrastructure
– Whether Big Data is hype or not
• Fits a lot of use cases well
• Inherently distributed storage/processing
– Provides scalability at a relatively low cost
• There is a lot of industry backing
– IBM, Microsoft, Amazon, Google, Intel …
• Various distributions and companies
Hadoop Distributed File System
HDFS Directory on the Master Host (NN)
File     Block locations
FileA    H1:blk0, H2:blk1
FileB    H3:blk0, H1:blk1
FileC    H2:blk0, H3:blk1

Local file system on each host (every HDFS block is a local file with its own inode on local disk)
Host     Local files       Local FS directory entries
Host 1   FileA0, FileB1    Inode-x, Inode-y
Host 2   FileA1, FileC0    Inode-a, Inode-n
Host 3   FileB0, FileC1    Inode-r, Inode-c

• Each local file holds one HDFS block, so it is at most the HDFS blksize in size
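The mapping on this slide can be sketched as a small in-memory model. This is only an illustration, not HDFS code: the class `BlockMapSketch`, the `locate` helper, and the map layout are hypothetical, and only the file, host, and block names are taken from the slide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the HDFS directory kept on the Name Node:
// file name -> ordered list of hosts, where the list index is the block number.
final class BlockMapSketch {
    static final Map<String, List<String>> HDFS_DIRECTORY = new LinkedHashMap<>();
    static {
        HDFS_DIRECTORY.put("FileA", List.of("H1", "H2"));
        HDFS_DIRECTORY.put("FileB", List.of("H3", "H1"));
        HDFS_DIRECTORY.put("FileC", List.of("H2", "H3"));
    }

    // Returns "host:localFileName" for the given file and block number.
    static String locate(String file, int block) {
        List<String> hosts = HDFS_DIRECTORY.get(file);
        if (hosts == null || block < 0 || block >= hosts.size()) {
            throw new IllegalArgumentException("unknown block: " + file + "#" + block);
        }
        return hosts.get(block) + ":" + file + block;
    }

    public static void main(String[] args) {
        // Print where every block of every file lives.
        HDFS_DIRECTORY.forEach((file, hosts) -> {
            for (int b = 0; b < hosts.size(); b++) {
                System.out.println(file + " blk" + b + " -> " + locate(file, b));
            }
        });
    }
}
```

The point of the model is that the Name Node only knows which host holds which block; the block contents themselves live in ordinary files on each host's local file system.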
HDFS - Write Flow
[Diagram: Client, Name Node (namespace, metadata, and block map, persisted as fsimage and edit files), and three Data Nodes. Arrows numbered 1 to 8 correspond to the steps below.]
1. Client opens a file for writing through the fs.create() call. By default this overwrites an existing file.
2. Name Node responds with a lease on the file path.
3. Client buffers data locally and, when the data reaches the block size, asks the Name Node for a block to write.
4. Name Node responds with a new block ID and the destination Data Nodes for the write and its replicas.
5. Client sends the data to the first Data Node, together with a checksum generated over that data.
6. First Data Node writes the data and checksum and, in parallel, pipelines the replication to the other Data Nodes.
7. Each Data Node that receives a replica reports success or failure back to the first Data Node.
8. First Data Node in turn informs the Name Node that the write for the block is complete, and the Name Node updates its block map.
Note: Only one writer at a time can write to a file.
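Steps 5 to 7 above can be sketched with in-memory stand-ins. This is a simplified illustration under stated assumptions: `WritePipelineSketch` and its `DataNode` class are hypothetical, CRC32 stands in for whatever checksum the cluster is configured to use, and the forwarding is sequential here rather than parallel.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

final class WritePipelineSketch {
    // Hypothetical stand-in for a Data Node: keeps one block and its checksum in memory.
    static final class DataNode {
        final String name;
        byte[] block;
        long checksum;

        DataNode(String name) {
            this.name = name;
        }

        // Verify the data against the client's checksum, store both, then forward
        // to the rest of the pipeline. Returns true only if this node and every
        // downstream node stored the block (the acknowledgement of step 7).
        boolean write(byte[] data, long expected, List<DataNode> downstream) {
            if (crc(data) != expected) {
                return false;
            }
            this.block = data.clone();
            this.checksum = expected;
            if (downstream.isEmpty()) {
                return true;
            }
            return downstream.get(0).write(data, expected, downstream.subList(1, downstream.size()));
        }
    }

    // Checksum generated by the client over the data to be written (step 5).
    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    public static void main(String[] args) {
        List<DataNode> pipeline = List.of(new DataNode("dn1"), new DataNode("dn2"), new DataNode("dn3"));
        byte[] data = "block contents".getBytes(StandardCharsets.UTF_8);
        boolean ok = pipeline.get(0).write(data, crc(data), pipeline.subList(1, pipeline.size()));
        System.out.println("block stored on all nodes: " + ok);
    }
}
```

Only the first Data Node talks to the client; the success or failure of the replicas flows back through it, which is what lets it report completion to the Name Node in step 8.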
HDFS - Read Flow
[Diagram: Client, Name Node (namespace, metadata, and block map, persisted as fsimage and edit files), and three Data Nodes. Numbered arrows correspond to the steps below.]
1. Client opens a file for reading through the fs.open() call.
2. Name Node verifies that the file exists and that the client may read it. Unlike the write flow, no lease is issued; HDFS leases apply only to writers.
3. Client requests to read the data in the file.
4. Name Node responds with the block IDs in sequence and the Data Nodes that hold each block.
5. Client reaches out directly to the Data Nodes for each block of data in the file.
6. When a Data Node sends back data along with its checksum, the client verifies it by generating a checksum over the received data and comparing the two.
7. If the checksum verification fails, the client reaches out to other Data Nodes that hold a replica of the block.
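Steps 6 and 7 above can be sketched as follows. This is a minimal illustration, not the HDFS client: `ReadFallbackSketch` and its `Replica` class are hypothetical, and CRC32 is an assumed stand-in for the cluster's checksum algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.zip.CRC32;

final class ReadFallbackSketch {
    // Hypothetical replica as returned by one Data Node: block bytes plus stored checksum.
    static final class Replica {
        final String dataNode;
        final byte[] data;
        final long storedChecksum;

        Replica(String dataNode, byte[] data, long storedChecksum) {
            this.dataNode = dataNode;
            this.data = data;
            this.storedChecksum = storedChecksum;
        }
    }

    static long crc(byte[] data) {
        CRC32 c = new CRC32();
        c.update(data);
        return c.getValue();
    }

    // Try replicas in order and return the first one whose recomputed
    // checksum matches the checksum sent along with the data.
    static Optional<Replica> readBlock(List<Replica> replicas) {
        for (Replica r : replicas) {
            if (crc(r.data) == r.storedChecksum) {
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] good = "block contents".getBytes(StandardCharsets.UTF_8);
        byte[] corrupt = "block c0ntents".getBytes(StandardCharsets.UTF_8);
        long sum = crc(good);
        List<Replica> replicas = List.of(
                new Replica("dn1", corrupt, sum),  // fails verification
                new Replica("dn2", good, sum));    // passes verification
        System.out.println(readBlock(replicas).map(r -> r.dataNode).orElse("no valid replica"));
    }
}
```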
Authorization
• POSIX-like model for file and directory permissions
– Each file and directory has an owner and a group
– Separate permissions for owner, group, and others
– Files: r to read, w to write or append
– Directories: r to list contents, w to create or delete entries
– Directories: x to access children
– Sticky bit on a directory prevents others from deleting files in it
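The owner, group, and others evaluation above can be sketched with the JDK's POSIX permission types. This is an illustration only: `PermissionSketch`, the `allowed` helper, and the user and group names are hypothetical, and the sticky bit is not modeled.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

final class PermissionSketch {
    enum Action { READ, WRITE, EXECUTE }

    // Pick exactly one class (owner first, then group, then others)
    // and test only that class's permission bits.
    static boolean allowed(String mode, String owner, String group,
                           String user, Set<String> userGroups, Action action) {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(mode);
        String cls;
        if (user.equals(owner)) {
            cls = "OWNER";
        } else if (userGroups.contains(group)) {
            cls = "GROUP";
        } else {
            cls = "OTHERS";
        }
        return perms.contains(PosixFilePermission.valueOf(cls + "_" + action));
    }

    public static void main(String[] args) {
        // Hypothetical directory owned by alice:analysts with mode rwxr-x---
        String mode = "rwxr-x---";
        System.out.println(allowed(mode, "alice", "analysts", "alice", Set.of("analysts"), Action.WRITE));
        System.out.println(allowed(mode, "alice", "analysts", "bob", Set.of("analysts"), Action.WRITE));
        System.out.println(allowed(mode, "alice", "analysts", "bob", Set.of("analysts"), Action.READ));
        System.out.println(allowed(mode, "alice", "analysts", "carol", Set.of("guests"), Action.EXECUTE));
    }
}
```

With mode rwxr-x---, the owner can create entries in the directory, group members can only list and traverse it, and everyone else is locked out.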
Kerberos
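• KDC components: Authentication Service (AS), Ticket Granting Service (TGS), Kerberos database (KDB)
1. Create the principal in the KDC
2. User runs kinit
3. User receives a TGT
4. User requests a service ticket
5. User receives the service ticket for the service
• For service principals, keytabs are used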
Secure HDFS Cluster - Authentication
[Diagram: the Namenode (master) and each of the three Datanodes (slaves) holds its own keytab and authenticates against the KDC.]
Secure HDFS - Client Authentication
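[Diagram: HDFS client, KDC, Namenode, and three Datanodes (slaves). The Namenode and each Datanode hold a key.]
1. HDFS client authenticates to the Namenode with a Kerberos (KRB) token
2. Namenode issues a delegation token to the client
3. Client presents the delegation token and the Namenode issues block tokens
4. Client presents the block tokens to the Datanodes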
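The role of the keys held by the Namenode and Datanodes can be sketched as a keyed hash over a block token. This is a simplified illustration, not Hadoop's token format: `BlockTokenSketch`, the field layout, and the choice of HmacSHA256 are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class BlockTokenSketch {
    // Namenode side: compute a keyed hash over the token fields using the
    // key it shares with the Datanodes.
    static byte[] sign(byte[] sharedKey, String user, long blockId, String mode) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(sharedKey, "HmacSHA256"));
            return mac.doFinal((user + "|" + blockId + "|" + mode).getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Datanode side: recompute the keyed hash and compare in constant time.
    // No call to the Namenode or the KDC is needed to validate the token.
    static boolean verify(byte[] sharedKey, String user, long blockId, String mode, byte[] tag) {
        return MessageDigest.isEqual(sign(sharedKey, user, blockId, mode), tag);
    }

    public static void main(String[] args) {
        byte[] key = "demo-shared-key".getBytes(StandardCharsets.UTF_8);
        byte[] tag = sign(key, "ubuntu", 42L, "READ");
        System.out.println(verify(key, "ubuntu", 42L, "READ", tag));   // same fields
        System.out.println(verify(key, "ubuntu", 42L, "WRITE", tag));  // tampered access mode
    }
}
```

Because validation needs only the shared key, a Datanode can check every block request locally, which keeps the KDC and Namenode off the data path.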
Authentication Configuration
• Set up Kerberos infrastructure
– It may already be available through Active Directory (AD)
• Define service principals
• Create keytabs for service principals
– E.g. HDFS, YARN
• Copy keytabs to the master and slave nodes
• Update the *-site.xml files (e.g. core-site.xml, hdfs-site.xml, yarn-site.xml)
• Restart the services
HDFS Data Encryption
[Diagram: HDFS client, Key Management Server backed by Key Trustee, Namenode, and Datanode. Labeled arrows: 1 – encryption zone (EZ); 2 – EZ key; 2 – create, EZ EDEK; 3 – EDEK; 4 – read/write (R/W); 5.]
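The EZ key and EDEK labels refer to envelope encryption: a per-file data encryption key (DEK) encrypts the file contents, and the encryption zone key encrypts the DEK to produce the EDEK that is kept with the file metadata. The following is a minimal sketch of that idea using the JDK's AES support; `EnvelopeEncryptionSketch`, the key sizes, and the variable names are assumptions for illustration, and the key server is reduced to a local variable.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class EnvelopeEncryptionSketch {
    static final SecureRandom RNG = new SecureRandom();

    static byte[] randomBytes(int n) {
        byte[] b = new byte[n];
        RNG.nextBytes(b);
        return b;
    }

    // AES in CTR mode; the same helper encrypts or decrypts depending on the mode.
    static byte[] aesCtr(int mode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return c.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ezKey = randomBytes(16);  // encryption zone key, held by the key server only
        byte[] dek = randomBytes(16);    // per-file data encryption key
        byte[] dekIv = randomBytes(16);
        // EDEK: the DEK encrypted under the EZ key, safe to keep with file metadata.
        byte[] edek = aesCtr(Cipher.ENCRYPT_MODE, ezKey, dekIv, dek);

        byte[] fileIv = randomBytes(16);
        byte[] plain = "sensitive record".getBytes(StandardCharsets.UTF_8);
        // What the Datanodes store: ciphertext only.
        byte[] stored = aesCtr(Cipher.ENCRYPT_MODE, dek, fileIv, plain);

        // Read path: the key server unwraps the EDEK, then the client decrypts the data.
        byte[] unwrapped = aesCtr(Cipher.DECRYPT_MODE, ezKey, dekIv, edek);
        byte[] roundTrip = aesCtr(Cipher.DECRYPT_MODE, unwrapped, fileIv, stored);
        System.out.println(Arrays.equals(plain, roundTrip));
    }
}
```

In this arrangement neither the Namenode nor the Datanodes ever need the plaintext DEK, so compromising storage alone does not expose file contents.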
YARN
[Diagram: a client submits a MapReduce job to the Resource Manager; the App Master and task containers run on the Node Managers. The Resource Manager and each of the three Node Managers hold their own keytab.]
Controlling Resource Usage
• Schedulers
– Fair
– Capacity
• Queues are defined to use a percentage of cluster resources
– Queues can be nested in a hierarchy
• Users and groups are attached to queues
– Administer
– Submit
YARN Queue
Root 100%
– Sec 70%: sadmin, suser
– Adhoc 30%: Aadmin, auser
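The queue layout above can be sketched as data plus two checks. This is an illustration, not scheduler configuration: `QueueSketch` and its helpers are hypothetical, and reading the "admin" names as administer rights and the "user" names as submit rights is an assumption based on the names alone.

```java
import java.util.Map;
import java.util.Set;

final class QueueSketch {
    // Hypothetical queue definition: percentage of the parent queue plus two access lists.
    static final class Queue {
        final int percentOfParent;
        final Set<String> submitters;
        final Set<String> admins;

        Queue(int percentOfParent, Set<String> submitters, Set<String> admins) {
            this.percentOfParent = percentOfParent;
            this.submitters = submitters;
            this.admins = admins;
        }
    }

    // Children of the Root queue, which itself owns 100% of the cluster.
    static final Map<String, Queue> ROOT_CHILDREN = Map.of(
            "Sec", new Queue(70, Set.of("suser"), Set.of("sadmin")),
            "Adhoc", new Queue(30, Set.of("auser"), Set.of("Aadmin")));

    // True if the user is on the queue's submit list.
    static boolean canSubmit(String queue, String user) {
        Queue q = ROOT_CHILDREN.get(queue);
        return q != null && q.submitters.contains(user);
    }

    // Share of the cluster's containers that the queue is configured to receive.
    static long configuredShare(String queue, long clusterContainers) {
        return clusterContainers * ROOT_CHILDREN.get(queue).percentOfParent / 100;
    }

    public static void main(String[] args) {
        System.out.println("Sec share of 200 containers: " + configuredShare("Sec", 200));
        System.out.println("suser may submit to Sec: " + canSubmit("Sec", "suser"));
        System.out.println("suser may submit to Adhoc: " + canSubmit("Adhoc", "suser"));
    }
}
```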
Hadoop Cluster - Secure Perimeter
[Diagram: the cluster (master and three slave nodes) sits in a DMZ or separate network; clients reach it only through IPS/IDS/firewall layers.]
HDFS Services & Ports
HDFS Service              Port
Name Node                 8020
Name Node UI              50070
Secondary Name Node UI    50090
Data Node                 50020
Data Node UI              50075
Journal Node              8480, 8485
HttpFS                    14000, 14001
Principle of Least Privilege
• hdfs-site.xml
– dfs.permissions.superusergroup
– dfs.cluster.administrators
• core-site.xml
– Set hadoop.security.authorization to true
• hadoop-policy.xml
– security.client.protocol.acl
– security.client.datanode.protocol.acl
– security.get.user.mappings.protocol.acl
Application Code Change
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

Unsecure Hadoop
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://NN:PORT/user/hbase");
FileSystem fs = FileSystem.get(conf);

Secure Hadoop
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://NN:PORT/user/hbase");
// Tell the client that the cluster expects Kerberos authentication
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
// Log in as the principal from its keytab instead of an interactive kinit
UserGroupInformation.loginUserFromKeytab("ubuntu/hostname@REALM", "ubuntu.keytab");
FileSystem fs = FileSystem.get(conf);
Key Takeaways
• New infrastructure will be part of enterprises
– May not be as big as the hype
• Adherence to application security principles
– Complexity and maturity may be a roadblock
• Constant follow-up on latest developments
References & Acknowledgements
• Hadoop Security
– https://issues.apache.org/jira/browse/HADOOP-4487
– Hadoop Project – Securing Hadoop Page
• HDFS Encryption
– https://issues.apache.org/jira/browse/HDFS-6134
– Hadoop Project Transparent Encryption Page
– http://www.slideshare.net/Hadoop_Summit/transparent-encryption-in-hdfs
• Hadoop service level authorization
• YARN
– Fair Scheduler
– Capacity Scheduler
• Hadoop Security Book
Thank You!!
bnair@asquareb.com
blog.asquareb.com
https://github.com/bijugs
@gsbiju
http://www.slideshare.net/bijugs

More Related Content

What's hot

Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Shravan (Sean) Pabba
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: OverviewCloudera, Inc.
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowDataWorks Summit
 
Hadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosHadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosSarvesh Meena
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemDataWorks Summit
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environmentsDataWorks Summit
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyDataWorks Summit
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldUwe Printz
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessCloudera, Inc.
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateSteve Loughran
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesBolke de Bruin
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyAnurag Shrivastava
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayDataWorks Summit
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop EcosystemDataWorks Summit
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop securitybigdatagurus_meetup
 
Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...DataWorks Summit
 

What's hot (20)

Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Hadoop Security: Overview
Hadoop Security: OverviewHadoop Security: Overview
Hadoop Security: Overview
 
Hadoop security
Hadoop securityHadoop security
Hadoop security
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Hadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using KerberosHadoop ClusterClient Security Using Kerberos
Hadoop ClusterClient Security Using Kerberos
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystem
 
Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environments
 
Hadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happyHadoop Security Features That make your risk officer happy
Hadoop Security Features That make your risk officer happy
 
Hadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the fieldHadoop Operations - Best practices from the field
Hadoop Operations - Best practices from the field
 
Hadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster AccessHadoop Operations: How to Secure and Control Cluster Access
Hadoop Operations: How to Secure and Control Cluster Access
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Nl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenchesNl HUG 2016 Feb Hadoop security from the trenches
Nl HUG 2016 Feb Hadoop security from the trenches
 
Hadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happyHadoop Security Features that make your risk officer happy
Hadoop Security Features that make your risk officer happy
 
Hadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox GatewayHadoop REST API Security with Apache Knox Gateway
Hadoop REST API Security with Apache Knox Gateway
 
Securing the Hadoop Ecosystem
Securing the Hadoop EcosystemSecuring the Hadoop Ecosystem
Securing the Hadoop Ecosystem
 
Apache Sentry for Hadoop security
Apache Sentry for Hadoop securityApache Sentry for Hadoop security
Apache Sentry for Hadoop security
 
Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...Running secured Spark job in Kubernetes compute cluster and integrating with ...
Running secured Spark job in Kubernetes compute cluster and integrating with ...
 

Viewers also liked

NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaBiju Nair
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsBiju Nair
 
Chef patterns
Chef patternsChef patterns
Chef patternsBiju Nair
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk ManagementBiju Nair
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User ReferenceBiju Nair
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance ImprovementBiju Nair
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload managementBiju Nair
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceBiju Nair
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developersBiju Nair
 
Managing Websphere Application Server certificates
Managing Websphere Application Server certificatesManaging Websphere Application Server certificates
Managing Websphere Application Server certificatesPiyush Chordia
 
It was just Open Source - TEDx Novara
It was just Open Source - TEDx NovaraIt was just Open Source - TEDx Novara
It was just Open Source - TEDx NovaraFabio Mora
 
Hadoop Security Now and Future
Hadoop Security Now and FutureHadoop Security Now and Future
Hadoop Security Now and Futuretcloudcomputing-tw
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar DatabaseBiju Nair
 
Big Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBig Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBlue Coat
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterEdureka!
 
WebSphere Message Broker Training | IBM WebSphere Message Broker Online Training
WebSphere Message Broker Training | IBM WebSphere Message Broker Online TrainingWebSphere Message Broker Training | IBM WebSphere Message Broker Online Training
WebSphere Message Broker Training | IBM WebSphere Message Broker Online Trainingecorptraining2
 
GSM/UMTS network architecture tutorial (Indonesia)
GSM/UMTS network architecture tutorial (Indonesia)GSM/UMTS network architecture tutorial (Indonesia)
GSM/UMTS network architecture tutorial (Indonesia)ejlp12
 

Viewers also liked (20)

NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezza
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentals
 
Chef patterns
Chef patternsChef patterns
Chef patterns
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk Management
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload management
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve Performace
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developers
 
Managing Websphere Application Server certificates
Managing Websphere Application Server certificatesManaging Websphere Application Server certificates
Managing Websphere Application Server certificates
 
It was just Open Source - TEDx Novara
It was just Open Source - TEDx NovaraIt was just Open Source - TEDx Novara
It was just Open Source - TEDx Novara
 
THE BIG PRINT - Showing the Action
THE BIG PRINT - Showing the ActionTHE BIG PRINT - Showing the Action
THE BIG PRINT - Showing the Action
 
Hadoop Security Now and Future
Hadoop Security Now and FutureHadoop Security Now and Future
Hadoop Security Now and Future
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar Database
 
Big Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat ProtectionBig Data Security Intelligence and Analytics for Advanced Threat Protection
Big Data Security Intelligence and Analytics for Advanced Threat Protection
 
Advanced Security In Hadoop Cluster
Advanced Security In Hadoop ClusterAdvanced Security In Hadoop Cluster
Advanced Security In Hadoop Cluster
 
WebSphere Message Broker Training | IBM WebSphere Message Broker Online Training
WebSphere Message Broker Training | IBM WebSphere Message Broker Online TrainingWebSphere Message Broker Training | IBM WebSphere Message Broker Online Training
WebSphere Message Broker Training | IBM WebSphere Message Broker Online Training
 
GSM/UMTS network architecture tutorial (Indonesia)
GSM/UMTS network architecture tutorial (Indonesia)GSM/UMTS network architecture tutorial (Indonesia)
GSM/UMTS network architecture tutorial (Indonesia)
 

Similar to Hadoop security

Similar to Hadoop security (20)

Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Aziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jhaAziksa hadoop architecture santosh jha
Aziksa hadoop architecture santosh jha
 
Hadoop and HDFS
Hadoop and HDFSHadoop and HDFS
Hadoop and HDFS
 
Hadoop
HadoopHadoop
Hadoop
 
Unit-3.pptx
Unit-3.pptxUnit-3.pptx
Unit-3.pptx
 
Borthakur hadoop univ-research
Borthakur hadoop univ-researchBorthakur hadoop univ-research
Borthakur hadoop univ-research
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Giraffa - November 2014
Giraffa - November 2014Giraffa - November 2014
Giraffa - November 2014
 
HDFS tiered storage
HDFS tiered storageHDFS tiered storage
HDFS tiered storage
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Tutorial Haddop 2.3
Tutorial Haddop 2.3Tutorial Haddop 2.3
Tutorial Haddop 2.3
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapa
 
HDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed FilesystemHDFS: Hadoop Distributed Filesystem
HDFS: Hadoop Distributed Filesystem
 
Hadoop File System.pptx
Hadoop File System.pptxHadoop File System.pptx
Hadoop File System.pptx
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
Hdfs
HdfsHdfs
Hdfs
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 

More from Biju Nair

Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleBiju Nair
 
HBase Internals And Operations
HBase Internals And OperationsHBase Internals And Operations
HBase Internals And OperationsBiju Nair
 
Apache Kafka Reference
Apache Kafka ReferenceApache Kafka Reference
Apache Kafka ReferenceBiju Nair
 
Serving queries at low latency using HBase
Serving queries at low latency using HBaseServing queries at low latency using HBase
Serving queries at low latency using HBaseBiju Nair
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalBiju Nair
 
Cursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixCursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixBiju Nair
 

More from Biju Nair (6)

Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scale
 
HBase Internals And Operations
HBase Internals And OperationsHBase Internals And Operations
HBase Internals And Operations
 
Apache Kafka Reference
Apache Kafka ReferenceApache Kafka Reference
Apache Kafka Reference
 
Serving queries at low latency using HBase
Serving queries at low latency using HBaseServing queries at low latency using HBase
Serving queries at low latency using HBase
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-final
 
Cursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixCursor Implementation in Apache Phoenix
Cursor Implementation in Apache Phoenix
 

Recently uploaded

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Recently uploaded (20)

Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Hadoop security

  • 1. Secure Hadoop Application Ecosystem. Boston Application Security Conference, Oct 3 2015.
  • 2. Charts: Google Trends – Big Data; Big Data Job Trends
  • 4. Hadoop Ecosystem: Flume, Sqoop, ZooKeeper, HBase, Hive, Pig, MapReduce, Spark, YARN (Resource Manager), HDFS (Distributed File System), Kafka, Storm
  • 5. Why
      • Hadoop is a storage/processing infrastructure
          – Whether Big Data is hype or not
      • Fits a lot of use cases well
      • Inherently distributed storage/processing
          – Provides scalability at a relatively low cost
      • There is a lot of backing
          – IBM, Microsoft, Amazon, Google, Intel …
      • Various distributions and companies
  • 6. Hadoop Distributed File System
      HDFS directory on the master host (NN):
          FileA: H1:blk0, H2:blk1
          FileB: H3:blk0, H1:blk1
          FileC: H2:blk0, H3:blk1
      Local file system directory on each host (local file → inode, stored on that host's disk):
          Host 1: FileA0 → Inode-x, FileB1 → Inode-y
          Host 2: FileA1 → Inode-a, FileC0 → Inode-n
          Host 3: FileB0 → Inode-r, FileC1 → Inode-c
      The local files created are of size equal to the HDFS blksize.
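The split of one HDFS file into block-sized local files (slide 6) can be illustrated with a little arithmetic. This is a minimal sketch, not Hadoop code: the class name `BlockSplitSketch`, the 128 MiB block size, and the 300 MiB file are assumptions for illustration. It also shows that the last block of a file may be shorter than the block size.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits a file length into fixed-size blocks. */
final class BlockSplitSketch {

    /** Returns the length in bytes of each block of a file of fileLen bytes. */
    static List<Long> blockLengths(long fileLen, long blockSize) {
        if (fileLen < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLen must be >= 0 and blockSize > 0");
        }
        List<Long> lengths = new ArrayList<>();
        for (long offset = 0; offset < fileLen; offset += blockSize) {
            // Every block is full-sized except possibly the last one.
            lengths.add(Math.min(blockSize, fileLen - offset));
        }
        return lengths;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // assumed block size (128 MiB)
        long fileLen = 300L * 1024 * 1024;   // hypothetical 300 MiB file
        List<Long> lengths = blockLengths(fileLen, blockSize);
        for (int i = 0; i < lengths.size(); i++) {
            System.out.println("blk" + i + ": " + lengths.get(i) + " bytes");
        }
    }
}
```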
  • 7. HDFS - Write Flow
      Diagram: a Client, a Name Node (namespace metadata and block map, persisted as fsimage and edit files), and three Data Nodes, with arrows numbered 1 to 8 matching the steps below.
      1. The client requests to open a file for writing through an fs.create() call. This overwrites any existing file.
      2. The Name Node responds with a lease on the file path.
      3. The client writes to local storage and, when the data reaches the block size, asks the Name Node for a write.
      4. The Name Node responds with a new block ID and the destination Data Nodes for the write and its replication.
      5. The client sends the first Data Node the data together with a checksum generated over that data.
      6. The first Data Node writes the data and checksum and, in parallel, pipelines the replicas to the other Data Nodes.
      7. Each Data Node that receives a replica reports success or failure back to the first Data Node.
      8. The first Data Node in turn informs the Name Node that the write for the block is complete, and the Name Node updates its block map.
      Note: there can be only one writer at a time on a file.
  • 8. HDFS - Read Flow
      Diagram: a Client, a Name Node (namespace metadata and block map, persisted as fsimage and edit files), and three Data Nodes, with numbered arrows matching the steps below.
      1. The client requests to open a file for reading through an fs.open() call.
      2. The Name Node responds with a lease on the file path.
      3. The client requests to read the data in the file.
      4. The Name Node responds with the block IDs in sequence and the corresponding Data Nodes.
      5. The client reaches out directly to the Data Nodes for each block of data in the file.
      6. When a Data Node sends back data along with its checksum, the client verifies it by generating a checksum of its own over the received data.
      7. If the checksum verification fails, the client reaches out to another Data Node that holds a replica.
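Steps 6 and 7 of the read flow (verify the checksum, fall back to another replica on mismatch) can be sketched with the JDK alone. This is a simplified illustration, not the HDFS client: the class `ReplicaReadSketch`, its `Replica` type, and the use of `java.util.zip.CRC32` over a whole block are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

/** Illustrative only: client-side checksum verification with replica fallback. */
final class ReplicaReadSketch {

    /** What one data node returns for a block: the payload and the checksum stored with it. */
    static final class Replica {
        final String host;
        final byte[] data;
        final long storedChecksum;

        Replica(String host, byte[] data, long storedChecksum) {
            this.host = host;
            this.data = data;
            this.storedChecksum = storedChecksum;
        }
    }

    /** Recomputes a checksum over the received bytes. */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Returns the host of the first replica that passes verification, or null if none does. */
    static String firstGoodReplica(List<Replica> replicas) {
        for (Replica replica : replicas) {
            if (checksum(replica.data) == replica.storedChecksum) {
                return replica.host;
            }
            // Mismatch: treat this replica as corrupt and try the next data node.
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] good = "block-0 contents".getBytes(StandardCharsets.UTF_8);
        long sum = checksum(good);
        byte[] corrupt = good.clone();
        corrupt[0] ^= 0x01; // flip one bit to simulate corruption on one host

        List<Replica> replicas = Arrays.asList(
                new Replica("H1", corrupt, sum),
                new Replica("H2", good, sum));
        System.out.println("block served by " + firstGoodReplica(replicas));
    }
}
```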
  • 9. Authorization
      • POSIX model for file and directory permissions
          – Each file and directory is associated with an owner and a group
          – Separate permissions for the owner, the group, and others
          – On files: r to read, w to write or append
          – On directories: r to list files, w to create or delete files
          – x to access child directories
          – The sticky bit on a directory prevents deletions by other users
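The owner/group/others model on slide 9 comes down to picking exactly one permission class and testing its bits. The sketch below shows that selection under stated simplifications: `PosixCheckSketch` and its method are hypothetical names, and the superuser bypass, the sticky bit, and ACLs are not modeled.

```java
import java.util.Collections;
import java.util.Set;

/** Illustrative only: POSIX-style owner/group/other permission check. */
final class PosixCheckSketch {
    static final int READ = 4;
    static final int WRITE = 2;
    static final int EXECUTE = 1;

    /**
     * mode is the usual octal triple (for example 0750). Exactly one class applies:
     * owner if the user owns the object, else group if the user belongs to its group,
     * else other.
     */
    static boolean allowed(int mode, String owner, String group,
                           String user, Set<String> userGroups, int wanted) {
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;
        } else {
            bits = mode & 7;
        }
        return (bits & wanted) == wanted;
    }

    public static void main(String[] args) {
        // Hypothetical file: mode 0750, owner "hdfs", group "hadoop".
        Set<String> aliceGroups = Collections.singleton("hadoop");
        System.out.println("alice may read:  "
                + allowed(0750, "hdfs", "hadoop", "alice", aliceGroups, READ));
        System.out.println("alice may write: "
                + allowed(0750, "hdfs", "hadoop", "alice", aliceGroups, WRITE));
    }
}
```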
  • 10. Kerberos
      Diagram: a User, a Service, and the KDC, which comprises the AS, the TGS, and the KDB.
      1. Create principal
      2. User runs kinit
      3. User receives a TGT
      4. User requests a service ticket
      5. User receives the service ticket
      For service principals, keytabs are used.
  • 11. Secure HDFS Cluster - Authentication
      Diagram: a master Namenode and three slave Datanodes, each holding a keytab, alongside the KDC.
  • 12. Secure HDFS - Client Authentication
      Diagram: an HDFS Client, the KDC, the Namenode, and three slave Datanodes. The Namenode and each Datanode hold a key.
      Arrow labels: 1 KRB token; 2 delegation token; 3 block tokens; 4 (no label survives).
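Slide 12 shows a key at the Namenode and at every Datanode, plus block tokens handed to the client. One way to picture why a shared key helps is a token carrying a message authentication code: the Namenode signs a grant, and a Datanode can check it without a round trip. The sketch below is conceptual only. The class `BlockTokenSketch`, the text layout of the token, and the choice of HMAC-SHA256 are assumptions, not Hadoop's actual token format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative only: a block-access grant signed with a key shared by name node and data nodes. */
final class BlockTokenSketch {
    // Assumption: HMAC-SHA256 is picked for the sketch; it says nothing about Hadoop's wire format.
    private static final String ALGO = "HmacSHA256";

    /** Name node side: describe what is granted (hypothetical text layout). */
    static String describe(String user, long blockId, String mode, long expiryMillis) {
        return user + "|" + blockId + "|" + mode + "|" + expiryMillis;
    }

    /** Computes the MAC over the description with the shared key. */
    static byte[] sign(byte[] sharedKey, String description) {
        try {
            Mac mac = Mac.getInstance(ALGO);
            mac.init(new SecretKeySpec(sharedKey, ALGO));
            return mac.doFinal(description.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC unavailable", e);
        }
    }

    /** Data node side: accept only an untampered, unexpired grant. */
    static boolean verify(byte[] sharedKey, String description, byte[] mac, long nowMillis) {
        if (!MessageDigest.isEqual(sign(sharedKey, description), mac)) {
            return false; // forged or modified token
        }
        long expiry = Long.parseLong(description.substring(description.lastIndexOf('|') + 1));
        return nowMillis < expiry;
    }

    public static void main(String[] args) {
        byte[] key = "demo-shared-key".getBytes(StandardCharsets.UTF_8); // hypothetical key
        long now = System.currentTimeMillis();
        String token = describe("alice", 1073741825L, "READ", now + 60_000);
        byte[] mac = sign(key, token);

        System.out.println("valid token accepted:    " + verify(key, token, mac, now));
        String tampered = token.replace("READ", "WRITE");
        System.out.println("tampered token accepted: " + verify(key, tampered, mac, now));
    }
}
```

The client never sees the shared key, so it cannot upgrade a READ grant to WRITE or extend the expiry without invalidating the MAC.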
  • 13. Authentication Configuration
      • Set up the Kerberos infrastructure
          – It may already be available through AD
      • Define service principals
      • Create keytabs for the service principals
          – E.g. HDFS, YARN
      • Copy the keytabs to the master and slave nodes
      • Update the site.xml files
      • Restart the services
  • 14. HDFS Data Encryption
      Diagram: an HDFS Client, a Key Mgmt Server (Key Trusty), the Namenode, and a Datanode.
      Arrow labels: 1 EZ; 2 EZ key; 2 create EZ; 3 EDEK; 4 EDEK; 5 R/W.
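The EZ key and EDEK labels on slide 14 describe envelope encryption: each file gets its own data encryption key (DEK), and only an encrypted copy of it (the EDEK, wrapped with the encryption zone key) is stored with the file metadata. The sketch below walks through that idea with the JDK's `javax.crypto`. It is not the Hadoop KMS API: the class `EnvelopeEncryptionSketch`, its method names, the 128-bit keys, and AES in CTR mode are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Illustrative only: zone key wraps a per-file key, which in turn encrypts the data. */
final class EnvelopeEncryptionSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    static byte[] randomBytes(int n) {
        byte[] bytes = new byte[n];
        RANDOM.nextBytes(bytes);
        return bytes;
    }

    /** AES in CTR mode; encryption and decryption differ only in the cipher mode flag. */
    static byte[] aesCtr(int cipherMode, byte[] key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(cipherMode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES/CTR unavailable", e);
        }
    }

    /** Key server side: wrap a per-file DEK with the zone key, producing the EDEK. */
    static byte[] wrapDek(byte[] zoneKey, byte[] iv, byte[] dek) {
        return aesCtr(Cipher.ENCRYPT_MODE, zoneKey, iv, dek);
    }

    /** Key server side: unwrap an EDEK on behalf of an authorized client. */
    static byte[] unwrapDek(byte[] zoneKey, byte[] iv, byte[] edek) {
        return aesCtr(Cipher.DECRYPT_MODE, zoneKey, iv, edek);
    }

    public static void main(String[] args) {
        byte[] zoneKey = randomBytes(16); // stays on the key management server
        byte[] dek = randomBytes(16);     // per-file data encryption key
        byte[] edekIv = randomBytes(16);
        byte[] edek = wrapDek(zoneKey, edekIv, dek); // what is stored with the file metadata

        // Client: receives the EDEK, has the key server unwrap it, then reads/writes data.
        byte[] clientDek = unwrapDek(zoneKey, edekIv, edek);
        byte[] fileIv = randomBytes(16);
        byte[] plain = "sensitive record".getBytes(StandardCharsets.UTF_8);
        byte[] stored = aesCtr(Cipher.ENCRYPT_MODE, clientDek, fileIv, plain);
        byte[] readBack = aesCtr(Cipher.DECRYPT_MODE, clientDek, fileIv, stored);

        System.out.println("round trip ok: " + Arrays.equals(plain, readBack));
    }
}
```

In this arrangement the data nodes only ever hold ciphertext, and the zone key never leaves the key server.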
  • 16. Controlling Resource Usage
      • Schedulers
          – Fair
          – Capacity
      • Queues are defined to use a percentage of the resources
          – Queues can be arranged in a hierarchy
      • Users and groups are attached to queues
          – Administer
          – Submit
  • 17. YARN Queue
      Root: 100%
          Sec: 70% (sadmin, suser)
          Adhoc: 30% (Aadmin, auser)
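The queue tree on slide 17 combines two ideas from slide 16: each queue gets a percentage of its parent's resources, and users are attached to a queue to administer it or to submit to it. A minimal model of both is sketched below. `QueueSketch` is a hypothetical class, not a YARN API, and real schedulers apply further rules (for example, ACL inheritance from parent queues) that are not modeled here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Illustrative only: a queue hierarchy with percentage shares and two user lists. */
final class QueueSketch {
    final String name;
    final QueueSketch parent;       // null for the root queue
    final double percentOfParent;   // share of the parent's resources, 0 to 100
    final Set<String> administrators;
    final Set<String> submitters;

    QueueSketch(String name, QueueSketch parent, double percentOfParent,
                Set<String> administrators, Set<String> submitters) {
        this.name = name;
        this.parent = parent;
        this.percentOfParent = percentOfParent;
        this.administrators = administrators;
        this.submitters = submitters;
    }

    static Set<String> users(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Share of the whole cluster: the product of the shares along the path to the root. */
    double percentOfCluster() {
        if (parent == null) {
            return percentOfParent;
        }
        return parent.percentOfCluster() * percentOfParent / 100.0;
    }

    boolean maySubmit(String user) {
        return submitters.contains(user);
    }

    boolean mayAdminister(String user) {
        return administrators.contains(user);
    }

    public static void main(String[] args) {
        QueueSketch root = new QueueSketch("root", null, 100.0, users(), users());
        QueueSketch sec = new QueueSketch("Sec", root, 70.0, users("sadmin"), users("suser"));
        QueueSketch adhoc = new QueueSketch("Adhoc", root, 30.0, users("Aadmin"), users("auser"));

        System.out.println(sec.name + " share of cluster: " + sec.percentOfCluster() + "%");
        System.out.println(adhoc.name + " share of cluster: " + adhoc.percentOfCluster() + "%");
        System.out.println("suser may submit to Adhoc: " + adhoc.maySubmit("suser"));
    }
}
```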
  • 18. Hadoop Cluster - Secure Perimeter
      Diagram: a Master and three Slave nodes in a DMZ or separate network, with IPS/IDS/firewall layers between the cluster and the Clients.
  • 19. HDFS Services & Ports
      HDFS Service               Port
      Name Node                  8020
      Name Node UI               50070
      Secondary Name Node UI     50090
      Data Node                  50020
      Data Node UI               50075
      Journal Node               8480, 8485
      HttpFS                     14000, 14001
  • 20. Principle of Least Privilege
      • hdfs-site.xml
          – dfs.permissions.superusergroup
          – dfs.cluster.administrators
      • core-site.xml
          – Set hadoop.security.authorization to true
      • hadoop-policy.xml
          – security.client.protocol.acl
          – security.client.datanode.protocol.acl
          – security.get.user.mappings.protocol.acl
  • 21. Application Code Change
      Unsecure Hadoop:
          Configuration conf = new Configuration();
          conf.set("fs.defaultFS", "hdfs://NN:PORT/user/hbase");
          conf.set("hadoop.security.authentication", "Kerberos");
          FileSystem fs = FileSystem.get(conf);
      Secure Hadoop:
          // Needs org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem,
          // and org.apache.hadoop.security.UserGroupInformation.
          Configuration conf = new Configuration();
          conf.set("fs.defaultFS", "hdfs://NN:PORT/user/hbase");
          conf.set("hadoop.security.authentication", "Kerberos");
          // Log in from the keytab before the first FileSystem call.
          UserGroupInformation.setConfiguration(conf);
          UserGroupInformation.loginUserFromKeytab("ubuntu/hostname@REALM", "ubuntu.keytab");
          FileSystem fs = FileSystem.get(conf);
  • 22. Key Takeaways
      • New infrastructure will be part of enterprises
          – May not be as big as the hype
      • Adherence to application security principles
          – Complexity and maturity may be a roadblock
      • Constant follow-up on the latest developments
  • 23. References & Acknowledgements
      • Hadoop Security
          – https://issues.apache.org/jira/browse/HADOOP-4487
          – Hadoop Project: Securing Hadoop page
      • HDFS Encryption
          – https://issues.apache.org/jira/browse/HDFS-6134
          – Hadoop Project: Transparent Encryption page
          – http://www.slideshare.net/Hadoop_Summit/transparent-encryption-in-hdfs
      • Hadoop service level authorization
      • YARN
          – Fair Scheduler
          – Capacity Scheduler
      • Hadoop Security book