SlideShare a Scribd company logo
1 of 15
© Cloudera, Inc. All rights reserved.
Hadoop Encryption
Wei-Chiu Chuang, Cloudera
© Cloudera, Inc. All rights reserved.
Why Encryption
• Information leaks affect 10s to 100s of millions of people
• Personally identifiable information (PII)
• Credit cards, SSNs, account logins
• Encryption would have prevented some of these leaks
• Encryption is a regulatory requirement for many business sectors
• Finance (PCI DSS)
• Government (Data Protection Directive)
• Healthcare (DPD, HIPAA)
© Cloudera, Inc. All rights reserved.
Related Technologies
Data In-Motion Encryption
• SSL/TLS
• Hadoop Data Transfer Encryption
• Hadoop RPC Encryption
Data At-Rest Encryption
• At Linux volume level
• Transparent Encryption at HDFS level
• HBase Column Family level
• Parquet Column level Encryption Xinli
© Cloudera, Inc. All rights reserved.
HDFS Transparent Encryption: In a Nutshell
HDFS
Namespace
/
/data /tmp
/data/1 /data/f2
Encryption
zone
Encryption
zone key Data Encryption
Key (per file)
© Cloudera, Inc. All rights reserved.
HDFS Transparent Encryption: In a Nutshell
Client
KMS
MS
NN
DN
NameNode
DataNode
Key
Management
Server
in EZ?
© Cloudera, Inc. All rights reserved.
Features
• Minor performance impact on HDFS reads and writes
• OpenSSL and AES-NI acceleration
• 7.5% for reads, ~0% for writes
• Key ACLs
• Warm-up/Caching (*)
• Key rollover
© Cloudera, Inc. All rights reserved.
Dev History
• First released in Hadoop 2.6.0/ CDH5.3 in 2014 December
• Many, many bug fixes and enhancements
• Functional bugs, failure handling bugs, scale bugs
• Stable after Hadoop 2.8 / CDH5.11-ish
© Cloudera, Inc. All rights reserved.
Lesson Learned
Scale-out is not easy to deploy
Security
Endurance, scale tests are essential
Too little emphasis on KMS as a performance bottleneck
FileSystem#getDelegationToken() API/integration
High throughput REST API layer is hard
© Cloudera, Inc. All rights reserved.
Status Quo
Among Cloudera’s customers (pre-merger):
• 14% Data Transfer Encryption
• 16% Data at Rest Encryption
• 19% RPC Encryption
• 44% Kerberized
Largest at-rest encryption cluster: ~1,000 nodes, > 50PB
© Cloudera, Inc. All rights reserved.
Troubleshooting
Performance anomaly
• Openssl-devel lib
• Entropy
• rng-tools
• Secure Random
• hadoop.security.secure.random.impl = org.apache.hadoop.crypto.random
.OpensslSecureRandom
Proxy user configuration
© Cloudera, Inc. All rights reserved.
Bad Practices
• No KMS HA
• KMS enabled, RPC encryption not enabled
• KMS enabled, but no Kerberos
• KMS w/o SSL
• Data transfer encryption is enabled, but using an unoptimized crypto algorithm
• 3DES, RC4, AES-NI
© Cloudera, Inc. All rights reserved.
Challenges
KMS Low Throughput
• NN can sustain > 100 thousand RPC ops/second
• namespace ops, block reports
• KMS: at most a few thousand RPC ops/second
• create, append, read, reencrypt
• 3-4 KMS servers not uncommon
• Jetty
• SSL Handshake
• Impala/Parquet with wide tables (> 100 columns)
© Cloudera, Inc. All rights reserved.
Future
Pluggable KMS ACL Framework (HADOOP-14951)
WebHDFS At Rest Encryption Support (HDFS-12355)
NFS Gateway At Rest Encryption Support (HDFS-13521)
Performance Improvements (HADOOP-15743, HADOOP-15811)
KMS Benchmark Tool (HADOOP-15967)
KMS over Hadoop RPC?
© Cloudera, Inc. All rights reserved.
●Current KMS Transport Layer
KMSClient
Jetty
http
client
Name
Node
http
client
REST API/HTTP
© Cloudera, Inc. All rights reserved.
KMS over Hadoop RPC
Benefit of KMS over Hadoop RPC:
• Proven performance
• Code reuse
KMSClient
Name
Node
Hadoop RPCHadoop
RPC
Hadoop
RPC
Hadoop
RPC

More Related Content

What's hot

HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red_Hat_Storage
 
Red Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewRed Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewMarcel Hergaarden
 
CEPH DAY BERLIN - UNLIMITED FILESERVER WITH SAMBA CTDB AND CEPHFS
CEPH DAY BERLIN - UNLIMITED FILESERVER WITH SAMBA CTDB AND CEPHFSCEPH DAY BERLIN - UNLIMITED FILESERVER WITH SAMBA CTDB AND CEPHFS
CEPH DAY BERLIN - UNLIMITED FILESERVER WITH SAMBA CTDB AND CEPHFSCeph Community
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redisDaeMyung Kang
 
Glusterfs and openstack
Glusterfs  and openstackGlusterfs  and openstack
Glusterfs and openstackopenstackindia
 
Red Hat Storage for Mere Mortals
Red Hat Storage for Mere MortalsRed Hat Storage for Mere Mortals
Red Hat Storage for Mere MortalsRed_Hat_Storage
 
Building Scalable, Real Time Applications for Financial Services with DataStax
Building Scalable, Real Time Applications for Financial Services with DataStaxBuilding Scalable, Real Time Applications for Financial Services with DataStax
Building Scalable, Real Time Applications for Financial Services with DataStaxDataStax
 
CEPH DAY BERLIN - CEPH MANAGEMENT THE EASY AND RELIABLE WAY
CEPH DAY BERLIN - CEPH MANAGEMENT THE EASY AND RELIABLE WAYCEPH DAY BERLIN - CEPH MANAGEMENT THE EASY AND RELIABLE WAY
CEPH DAY BERLIN - CEPH MANAGEMENT THE EASY AND RELIABLE WAYCeph Community
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveDustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveGluster.org
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...Red_Hat_Storage
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017 Karan Singh
 
Red Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed_Hat_Storage
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongCeph Community
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsCeph Community
 
What's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon ValleyWhat's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon ValleyCeph Community
 

What's hot (20)

HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Ceph as software define storage
Ceph as software define storageCeph as software define storage
Ceph as software define storage
 
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
 
Red Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) OverviewRed Hat Storage 2014 - Product(s) Overview
Red Hat Storage 2014 - Product(s) Overview
 
Redis vs Memcached
Redis vs MemcachedRedis vs Memcached
Redis vs Memcached
 
CEPH DAY BERLIN - UNLIMITED FILESERVER WITH SAMBA CTDB AND CEPHFS
CEPH DAY BERLIN - UNLIMITED FILESERVER WITH SAMBA CTDB AND CEPHFSCEPH DAY BERLIN - UNLIMITED FILESERVER WITH SAMBA CTDB AND CEPHFS
CEPH DAY BERLIN - UNLIMITED FILESERVER WITH SAMBA CTDB AND CEPHFS
 
Troubleshooting redis
Troubleshooting redisTroubleshooting redis
Troubleshooting redis
 
Glusterfs and openstack
Glusterfs  and openstackGlusterfs  and openstack
Glusterfs and openstack
 
Red Hat Storage for Mere Mortals
Red Hat Storage for Mere MortalsRed Hat Storage for Mere Mortals
Red Hat Storage for Mere Mortals
 
Building Scalable, Real Time Applications for Financial Services with DataStax
Building Scalable, Real Time Applications for Financial Services with DataStaxBuilding Scalable, Real Time Applications for Financial Services with DataStax
Building Scalable, Real Time Applications for Financial Services with DataStax
 
CEPH DAY BERLIN - CEPH MANAGEMENT THE EASY AND RELIABLE WAY
CEPH DAY BERLIN - CEPH MANAGEMENT THE EASY AND RELIABLE WAYCEPH DAY BERLIN - CEPH MANAGEMENT THE EASY AND RELIABLE WAY
CEPH DAY BERLIN - CEPH MANAGEMENT THE EASY AND RELIABLE WAY
 
Dustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep DiveDustin Black - Red Hat Storage Server Administration Deep Dive
Dustin Black - Red Hat Storage Server Administration Deep Dive
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
Scaling HDFS at Xiaomi
Scaling HDFS at XiaomiScaling HDFS at Xiaomi
Scaling HDFS at Xiaomi
 
Red Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep DiveRed Hat Storage Server Administration Deep Dive
Red Hat Storage Server Administration Deep Dive
 
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu YongUnlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
Unlock Bigdata Analytic Efficiency with Ceph Data Lake - Zhang Jian, Fu Yong
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah Watkins
 
What's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon ValleyWhat's New with Ceph - Ceph Day Silicon Valley
What's New with Ceph - Ceph Day Silicon Valley
 

Similar to Hadoop Meetup Jan 2019 - Hadoop Encryption

Project Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopProject Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopCloudera, Inc.
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoopNiel Dunnage
 
PCI Compliane With Hadoop
PCI Compliane With HadoopPCI Compliane With Hadoop
PCI Compliane With HadoopRommel Garcia
 
The Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data HubThe Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data HubDataWorks Summit
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014Cloudera, Inc.
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Shravan (Sean) Pabba
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedCloudera, Inc.
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Big Data Spain
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Cloudera, Inc.
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not laterDataWorks Summit
 
Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFSDataWorks Summit
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of ViewKaran Alang
 
Hadoop security implementationon 20171003
Hadoop security implementationon 20171003Hadoop security implementationon 20171003
Hadoop security implementationon 20171003lee tracie
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoopWei-Chiu Chuang
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming ArchitecturesCloudera, Inc.
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataRommel Garcia
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big DataGreat Wide Open
 
Owasp Indy Q2 2012 Cheat Sheet Overview
Owasp Indy Q2 2012 Cheat Sheet OverviewOwasp Indy Q2 2012 Cheat Sheet Overview
Owasp Indy Q2 2012 Cheat Sheet Overviewowaspindy
 

Similar to Hadoop Meetup Jan 2019 - Hadoop Encryption (20)

Project Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for HadoopProject Rhino: Enhancing Data Protection for Hadoop
Project Rhino: Enhancing Data Protection for Hadoop
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoop
 
PCI Compliane With Hadoop
PCI Compliane With HadoopPCI Compliane With Hadoop
PCI Compliane With Hadoop
 
The Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data HubThe Future of Data Management - the Enterprise Data Hub
The Future of Data Management - the Enterprise Data Hub
 
The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014The Future of Hadoop Security - Hadoop Summit 2014
The Future of Hadoop Security - Hadoop Summit 2014
 
Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015Hadoop security @ Philly Hadoop Meetup May 2015
Hadoop security @ Philly Hadoop Meetup May 2015
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and Governed
 
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
Securing Big Data at rest with encryption for Hadoop, Cassandra and MongoDB o...
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
 
Transparent Encryption in HDFS
Transparent Encryption in HDFSTransparent Encryption in HDFS
Transparent Encryption in HDFS
 
BigData Security - A Point of View
BigData Security - A Point of ViewBigData Security - A Point of View
BigData Security - A Point of View
 
Hadoop security implementationon 20171003
Hadoop security implementationon 20171003Hadoop security implementationon 20171003
Hadoop security implementationon 20171003
 
Security implementation on hadoop
Security implementation on hadoopSecurity implementation on hadoop
Security implementation on hadoop
 
End to End Streaming Architectures
End to End Streaming ArchitecturesEnd to End Streaming Architectures
End to End Streaming Architectures
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Open Source Security Tools for Big Data
Open Source Security Tools for Big DataOpen Source Security Tools for Big Data
Open Source Security Tools for Big Data
 
Owasp Indy Q2 2012 Cheat Sheet Overview
Owasp Indy Q2 2012 Cheat Sheet OverviewOwasp Indy Q2 2012 Cheat Sheet Overview
Owasp Indy Q2 2012 Cheat Sheet Overview
 
Big data security
Big data securityBig data security
Big data security
 

Recently uploaded

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 

Recently uploaded (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 

Hadoop Meetup Jan 2019 - Hadoop Encryption

  • 1. © Cloudera, Inc. All rights reserved. Hadoop Encryption Wei-Chiu Chuang, Cloudera
  • 2. © Cloudera, Inc. All rights reserved. Why Encryption • Information leaks affect 10s to 100s of millions of people • Personally identifiable information (PII) • Credit cards, SSNs, account logins • Encryption would have prevented some of these leaks • Encryption is a regulatory requirement for many business sectors • Finance (PCI DSS) • Government (Data Protection Directive) • Healthcare (DPD, HIPAA)
  • 3. © Cloudera, Inc. All rights reserved. Related Technologies Data In-Motion Encryption • SSL/TLS • Hadoop Data Transfer Encryption • Hadoop RPC Encryption Data At-Rest Encryption • At Linux volume level • Transparent Encryption at HDFS level • HBase Column Family level • Parquet Column level Encryption Xinli
  • 4. © Cloudera, Inc. All rights reserved. HDFS Transparent Encryption: In a Nutshell HDFS Namespace / /data /tmp /data/1 /data/f2 Encryption zone Encryption zone key Data Encryption Key (per file)
  • 5. © Cloudera, Inc. All rights reserved. HDFS Transparent Encryption: In a Nutshell Client KMS MS NN DN NameNode DataNode Key Management Server in EZ?
  • 6. © Cloudera, Inc. All rights reserved. Features • Minor performance impact on HDFS reads and writes • OpenSSL and AES-NI acceleration • 7.5% for reads, ~0% for writes • Key ACLs • Warm-up/Caching (*) • Key rollover
  • 7. © Cloudera, Inc. All rights reserved. Dev History • First released in Hadoop 2.6.0/ CDH5.3 in 2014 December • Many, many bug fixes and enhancements • Functional bugs, failure handling bugs, scale bugs • Stable after Hadoop 2.8 / CDH5.11-ish
  • 8. © Cloudera, Inc. All rights reserved. Lesson Learned Scale-out is not easy to deploy Security Endurance, scale tests are essential Too little emphasis on KMS as a performance bottleneck FileSystem#getDelegationToken() API/integration High throughput REST API layer is hard
  • 9. © Cloudera, Inc. All rights reserved. Status Quo Among Cloudera’s customers (pre-merger): • 14% Data Transfer Encryption • 16% Data at Rest Encryption • 19% RPC Encryption • 44% Kerberized Largest at-rest encryption cluster: ~1,000 nodes, > 50PB
  • 10. © Cloudera, Inc. All rights reserved. Troubleshooting Performance anomaly • Openssl-devel lib • Entropy • rng-tools • Secure Random • hadoop.security.secure.random.impl = org.apache.hadoop.crypto.random .OpensslSecureRandom Proxy user configuration
  • 11. © Cloudera, Inc. All rights reserved. Bad Practices • No KMS HA • KMS enabled, RPC encryption not enabled • KMS enabled, but no Kerberos • KMS w/o SSL • Data transfer encryption is enabled, but using an unoptimized crypto algorithm • 3DES, RC4, AES-NI
  • 12. © Cloudera, Inc. All rights reserved. Challenges KMS Low Throughput • NN can sustain > 100 thousand RPC ops/second • namespace ops, block reports • KMS: at most a few thousand RPC ops/second • create, append, read, reencrypt • 3-4 KMS servers not uncommon • Jetty • SSL Handshake • Impala/Parquet with wide tables (> 100 columns)
  • 13. © Cloudera, Inc. All rights reserved. Future Pluggable KMS ACL Framework (HADOOP-14951) WebHDFS At Rest Encryption Support (HDFS-12355) NFS Gateway At Rest Encryption Support (HDFS-13521) Performance Improvements (HADOOP-15743, HADOOP-15811) KMS Benchmark Tool (HADOOP-15967) KMS over Hadoop RPC?
  • 14. © Cloudera, Inc. All rights reserved. ●Current KMS Transport Layer KMSClient Jetty http client Name Node http client REST API/HTTP
  • 15. © Cloudera, Inc. All rights reserved. KMS over Hadoop RPC Benefit of KMS over Hadoop RPC: • Proven performance • Code reuse KMSClient Name Node Hadoop RPCHadoop RPC Hadoop RPC Hadoop RPC

Editor's Notes

  1. Caching improves performance. But some users’ environment prohibit caching due to security concerns.
  2. KMS was designed to be horizontally scalable. However, because Cloudera recommend 2 KMS-HA and 2 Keytrustee Servers for production workload, the cost for HA is high.