Evolving HDFS to a Generalized Storage Subsystem
DataWorks Summit/Hadoop Summit
1.
Evolving HDFS to a Generalized Storage Subsystem
Sanjay Radia, Chief Architect, Founder, Hortonworks
2.
© Hortonworks Inc. 2011 – 2016. All Rights Reserved. © Hortonworks Inc. 2013 - Confidential
Hello, my name is Sanjay Radia
• Chief Architect, Founder, Hortonworks
• Part of the original Hadoop team at Yahoo! since 2007
  – Chief Architect of Hadoop Core at Yahoo!
  – Apache Hadoop PMC and Committer
• Prior
  – Data center automation, virtualization, Java, HA, OSs, file systems
  – Startup, Sun Microsystems, Inria …
  – Ph.D., University of Waterloo
Architecting the Future of Big Data
3.
Overview
• HDFS: evolution in the past and motivations for the future
• Scaling HDFS
  – Where we do well (number of clients/cluster size, raw storage)
  – Where we have challenges (small files and blocks)
  – Solution
    – Partial namespace (briefly)
    – Block containers (but we are generalizing the storage layer to support this)
• Storage Containers to generalize the storage layer
4.
Background: HDFS Layering
[Figure: layered HDFS architecture. Namespace layer: NameNodes NN-1 … NN-k … NN-n serving namespaces NS1 … NS k, plus a foreign NS n. Block Storage layer: a Block Management Layer with Block Pool 1 … Block Pool k … Block Pool n, over Common Storage provided by DataNodes DN 1, DN 2 … DN m.]
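The layering above can be illustrated with a small in-memory model. This is a hypothetical sketch, not HDFS code: the class names and the `(pool_id, block_id)` replica key are assumptions chosen to show that each NameNode owns one namespace and one block pool, while every DataNode provides common storage for all pools.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Common storage: a single DataNode holds replicas for every block pool."""
    name: str
    # (block_pool_id, block_id) -> replica bytes
    replicas: dict = field(default_factory=dict)

    def store(self, pool_id: str, block_id: int, data: bytes) -> None:
        self.replicas[(pool_id, block_id)] = data

    def block_report(self, pool_id: str) -> list:
        """Report only one pool's blocks, since each NameNode manages its own pool."""
        return sorted(b for (p, b) in self.replicas if p == pool_id)


@dataclass
class NameNode:
    """Owns one namespace and exactly one block pool (hypothetical model)."""
    pool_id: str
    next_block_id: int = 1
    files: dict = field(default_factory=dict)  # path -> list of block ids

    def add_block(self, path: str, datanodes: list, data: bytes) -> int:
        block_id = self.next_block_id
        self.next_block_id += 1
        self.files.setdefault(path, []).append(block_id)
        for dn in datanodes:
            dn.store(self.pool_id, block_id, data)
        return block_id


dns = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
nn1 = NameNode("BP-1")
nnk = NameNode("BP-k")
nn1.add_block("/user/a/file", dns[:2], b"x")
nnk.add_block("/logs/b", dns[1:], b"y")
# dn2 stores block 1 of both pools; block ids only need to be unique within a pool
print(dns[1].block_report("BP-1"), dns[1].block_report("BP-k"))
```

Keying replicas by pool as well as block id is what lets independent NameNodes allocate block ids without coordinating with each other.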
5.
HDFS Dimensions
• Large number of compute clients: 100K cores
• Reliability: disk/DataNode fault tolerance, HA, DR, snapshots …
• PBs of data (Big Data): horizontal scaling
• Bad apps
• Multi-tenancy: resource management/isolation, audit
• Large number of files and blocks
• Beyond files: optimized storage
• Heterogeneous storage
• Erasure codes (in beta)
• Performance: file co-location
• Fat DataNodes: block reports (BRs)
• Security: transparent encryption; security in virtualized compute environments
6.
Scalability: The Problems and the Solutions
7.
Scalability – What HDFS Does Well
• The HDFS NameNode (NN) stores all namespace metadata in memory (as in GFS)
• Scales to large clusters (5K nodes) since all metadata is in memory
  – 60K–100K tasks can share the NameNode
  – Low latency
• Large data, if files are large
• Proof points of large data and large clusters
  – Single organizations have over 600 PB in HDFS
  – Single clusters with over 200 PB using federation
  – Large clusters of over 4K multi-core nodes bombarding a single NN
Metadata in memory is the strength of the original GFS and HDFS design, but also its weakness in scaling the number of files and blocks.
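A back-of-the-envelope estimate shows why keeping every file and block in NameNode memory eventually limits the file count. The sketch assumes the commonly quoted rule of thumb of roughly 150 bytes of heap per namespace object (file, directory, or block); that constant is an approximation, not a figure from this talk or a measurement of any particular release.

```python
# Rough NameNode heap estimate.
# ASSUMPTION: ~150 bytes of heap per namespace object (file, directory or block),
# a commonly quoted rule of thumb rather than a measured value.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(files: int, blocks_per_file: float, directories: int = 0) -> float:
    """Estimate heap for `files` files averaging `blocks_per_file` blocks each."""
    objects = files + files * blocks_per_file + directories
    return objects * BYTES_PER_OBJECT


# 350 million single-block files: the file count slide 8 calls a challenge
heap = namenode_heap_bytes(files=350_000_000, blocks_per_file=1)
print(f"{heap / 1e9:.0f} GB of heap, before counting directories")
```

Because the estimate grows linearly with both files and blocks, many small files cost far more heap per byte stored than a few large ones, which is the "small files and blocks" challenge in the overview.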
8.
Scalability – The Challenges
• Challenges
  – Large number of files (> 350 million)
  – The NN's strength has become a limitation
  – Number of file operations
  – Need to improve concurrency; move to multiple name servers
• HDFS Federation is the current solution
  – Add NameNodes to scale the number of files and operations
  – Deployed at Twitter: a cluster of > 5,000 nodes with three NameNodes (plans to grow to 10,000 nodes)
  – Back-ported and used at Facebook to scale HDFS
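With federation, clients need some way to route a path to the NameNode that owns it. Hadoop's client-side mount table (ViewFs) is one such mechanism; the sketch below is not ViewFs code, only a minimal illustration of longest-prefix mount resolution. The mount points and NameNode URIs are hypothetical.

```python
import posixpath

# Hypothetical mount table: path prefix -> NameNode that owns that subtree.
MOUNT_TABLE = {
    "/user": "hdfs://nn1:8020",
    "/logs": "hdfs://nn2:8020",
    "/logs/archive": "hdfs://nn3:8020",
}


def resolve(path: str, table: dict = MOUNT_TABLE) -> tuple:
    """Return (namenode, normalized path) using the longest mount point that prefixes `path`."""
    path = posixpath.normpath(path)
    best = None
    for mount in table:
        # Match whole path components only, so "/userdata" does not match "/user".
        if path == mount or path.startswith(mount.rstrip("/") + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise FileNotFoundError(f"no mount point for {path}")
    return table[best], path


print(resolve("/logs/archive/2016/a.gz"))  # served by nn3, not nn2
```

Longest-prefix matching lets a busy subtree such as `/logs/archive` be split off to its own NameNode without changing the paths applications use.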
9.
Scaling Files and Blocks
1. Scale Namespace
   • Keep only a partial namespace in memory: the working set
   • Of the last 3–5 years of data, only a small portion is actively used, so the working-set metadata fits in memory
     – We do not want to page the working set => still need a large NN memory to scale to 100K tasks
2. Scale Block Management
   • Keeping only part of the BlockMap in memory does not work
   • Solution: containers of blocks (2 GB–16 GB+)
     – Will reduce the BlockMap
     – Reduces the number of block/container reports
But extend the DataNode (DN) to support a generalized Storage Container
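The reduction from grouping blocks into containers is simple arithmetic. This sketch assumes 128 MB blocks (the default HDFS block size in Hadoop 2.x) and completely full containers, so it shows the best case; `blockmap_entries` is a hypothetical helper, not an HDFS API. Per-block metadata does not disappear: it moves into the container on the DataNode.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def blockmap_entries(num_blocks: int, block_size: int, container_size: int) -> int:
    """Entries a central map must track when blocks are grouped into containers.

    ASSUMPTION: every block and container is full (best case).
    """
    blocks_per_container = container_size // block_size
    if blocks_per_container == 0:
        raise ValueError("container must hold at least one block")
    # Ceiling division: a partially filled container still needs an entry.
    return -(-num_blocks // blocks_per_container)


one_billion = 1_000_000_000
for size_gb in (2, 16):
    entries = blockmap_entries(one_billion, 128 * MB, size_gb * GB)
    print(f"{size_gb} GB containers: {entries:,} central entries for {one_billion:,} blocks")
```

With 2 GB containers each entry covers 16 blocks; with 16 GB containers it covers 128, and block/container reports shrink by the same factor.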
10.
Big Picture: A Brief Interlude on Partial Namespace + Volumes
(Partial namespace in memory is not the focus of this talk)
11.
Partial Namespace – Briefly
• Has been prototyped
  – Benchmarks show that the model works well
• Most file systems keep only a partial namespace in memory, but not at this scale
  – Hence the cache replacement policy for the working set is important
• Work in progress to get it into HDFS
• Namespace volumes: a better way to federate the namespace service
  – Partial namespace in memory will allow multiple namespace volumes
  – Scale both the namespace and the number of operations using multiple servers
• BTW, NameServers can run on DataNodes if you prefer …
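Since only the working set stays in memory, the replacement policy decides how often a lookup must go to slower storage. The sketch below uses LRU purely as one example policy; the talk does not say which policy the prototype uses. The `WorkingSetNamespace` class is hypothetical, the backing store is a plain dict standing in for on-disk namespace storage, and writes are left out.

```python
from collections import OrderedDict


class WorkingSetNamespace:
    """Keep at most `capacity` inodes in memory; the rest stay in a backing store.

    Hypothetical sketch: LRU is one possible policy, and the backing store is a
    dict standing in for on-disk namespace storage. Read-only for brevity.
    """

    def __init__(self, backing_store: dict, capacity: int):
        self.store = backing_store
        self.capacity = capacity
        self.cache = OrderedDict()  # path -> inode, least recently used first
        self.misses = 0

    def lookup(self, path: str):
        if path in self.cache:
            self.cache.move_to_end(path)  # mark as most recently used
            return self.cache[path]
        self.misses += 1
        inode = self.store[path]  # "page in" from the backing store
        self.cache[path] = inode
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used inode
        return inode


store = {f"/data/day{i}": {"blocks": [i]} for i in range(1000)}
ns = WorkingSetNamespace(store, capacity=3)
for p in ["/data/day1", "/data/day2", "/data/day1", "/data/day3", "/data/day4", "/data/day1"]:
    ns.lookup(p)
print(ns.misses, list(ns.cache))
```

Here `/data/day1` stays resident because it keeps being touched, while `/data/day2` is evicted; at NameNode scale the same effect decides whether hot metadata remains in memory.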
12.
© Hortonworks Inc. 2013 - Confidential
Big Picture on HDFS Namespace + Volumes
• Only the working set of the namespace in memory
  › Scale beyond the memory of the NN
• NameServers – containers for namespaces
  › More namespace volumes
    – Chosen per user/tenant/DB
    – Management policies (quota, backup, DR …)
    – Mount tables for a unified namespace
  • Can be managed by a central volume server
• Number of NameServers is driven by both
  › Sum of (namespace working sets)
  › Sum of (namespace throughput)
• Move namespaces for balancing
• N+K failover amongst NameServers
Figure: NameServers as containers of namespaces, running on top of a storage layer of DataNodes
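A mount table stitches many namespace volumes into one unified namespace: each path is routed to the volume whose mount point is the longest matching prefix. The sketch below is similar in spirit to HDFS's client-side ViewFs mount tables, but the `resolve` function and the volume names are hypothetical, not the design from the talk.

```python
from typing import Dict, Tuple


def resolve(mount_table: Dict[str, str], path: str) -> Tuple[str, str]:
    """Map a path in the unified namespace to (volume, path inside volume).

    Uses longest-prefix matching on whole path components, so a mount at
    /user does not capture /users.
    """
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/")  # "/" becomes "", matching everything
        if prefix == "" or path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best.rstrip("/")):
                best = mount_point
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    remainder = path[len(best.rstrip("/")):] or "/"
    return mount_table[best], remainder


if __name__ == "__main__":
    # Hypothetical mount table: volumes chosen per user/tenant/DB.
    table = {"/": "vol-root", "/user": "vol-users",
             "/user/alice": "vol-alice", "/hbase": "vol-hbase"}
    print(resolve(table, "/user/alice/logs/a.txt"))  # ('vol-alice', '/logs/a.txt')
    print(resolve(table, "/users/bob"))              # ('vol-root', '/users/bob')
```

Because routing depends only on the table, moving a namespace between NameServers for balancing or failover means updating one mount entry rather than every client path.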
13.
Storage Containers: Better HDFS and Beyond
14.
DataNodes Big Picture
Support multiple data layout structures
• Indexing
• Caching
Use cases
• HDFS Block Container (scale blocks) + co-location
• Object Store Container
• Local replica + S3 replica
• HBase
• Block Store (e.g. Cinder for OpenStack)
Common shared infrastructure for
• Replication
• Consistency
• Cluster membership
• Container location
Other container benefits
• A place to put protocol enhancements
• Smaller, riskier features
Figure: applications (HDFS, HBase, object store metadata) over Block, Object Store, HBase and Table containers, which share Container Management Services running on DataNodes (cluster membership, replication management, container location service) and shared physical storage
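The key architectural idea is that many data layouts sit behind one container abstraction, while location (and, in a full system, replication and membership) is shared infrastructure that does not care what kind of container it is tracking. The sketch below illustrates only that separation; every class name is hypothetical, and in-memory dicts stand in for real on-disk layouts.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class StorageContainer(ABC):
    """Unit of placement and replication; subclasses choose the data layout."""

    def __init__(self, container_id: int):
        self.container_id = container_id

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class BlockContainer(StorageContainer):
    """HDFS-style layout: block id -> block data."""

    def __init__(self, container_id: int):
        super().__init__(container_id)
        self._blocks: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._blocks[key] = value

    def get(self, key: str) -> bytes:
        return self._blocks[key]


class ObjectStoreContainer(StorageContainer):
    """Object layout that also supports listing keys by prefix."""

    def __init__(self, container_id: int):
        super().__init__(container_id)
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._objects[key] = value

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_prefix(self, prefix: str) -> List[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


class ContainerLocationService:
    """Shared infrastructure: where each container lives, whatever its type."""

    def __init__(self) -> None:
        self._locations: Dict[int, List[str]] = {}

    def register(self, container: StorageContainer, datanodes: List[str]) -> None:
        self._locations[container.container_id] = list(datanodes)

    def locate(self, container_id: int) -> List[str]:
        return self._locations[container_id]
```

Adding a new layout (for example a table container) means writing one more subclass; the location service needs no changes.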
15.
Current vs New World (Storage Containers)
Current
• Namespace (in NameNode)
  – File = BlockIds[]
• BlockManager (in NameNode)
  – BlockMap: BlockId -> locations
  – Pipeline repair
  – Replication management
• BlockData in DataNode
  – BlockId -> Data
• Other
  – Generation Id (note BlockId = Gen# + Number)
  – File/block completion coordination
New World
• Namespace (in NameNode)
  – File = BlockIds[] (but BlockId = ContainerId + LocalBid)
• ContainerManager (logically central)
  – ContainerMap: ContainerId -> locations
  – Replication management
  – Cluster membership
• Containers (in DataNode)
  – Container's block metadata + data
  – BlockId -> Data
  – Pipeline repair
  – Block completion
  – Generation Id equivalent? (epoch of Raft?)
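With BlockId = ContainerId + LocalBid, anyone holding a block id can compute which container it lives in, so the central manager only needs a ContainerMap instead of a per-block BlockMap. The slides do not say how the two parts are packed; the sketch below assumes a hypothetical 32/32 split of a 64-bit id purely for illustration.

```python
from typing import Tuple

# ASSUMPTION: a hypothetical 32/32 split of a 64-bit block id; the slides do
# not say how ContainerId and LocalBid are packed.
LOCAL_BITS = 32
LOCAL_MASK = (1 << LOCAL_BITS) - 1
MAX_CONTAINER_ID = (1 << (64 - LOCAL_BITS)) - 1


def make_block_id(container_id: int, local_id: int) -> int:
    """Pack (container id, local block id) into one 64-bit block id."""
    if not 0 <= container_id <= MAX_CONTAINER_ID:
        raise ValueError("container_id out of range")
    if not 0 <= local_id <= LOCAL_MASK:
        raise ValueError("local_id out of range")
    return (container_id << LOCAL_BITS) | local_id


def split_block_id(block_id: int) -> Tuple[int, int]:
    """Recover (container id, local block id) without any lookup table."""
    return block_id >> LOCAL_BITS, block_id & LOCAL_MASK


if __name__ == "__main__":
    bid = make_block_id(7, 42)
    print(bid, split_block_id(bid))  # 30064771114 (7, 42)
```

The bit split is a real design trade-off: more local bits allow more (smaller) blocks per container, and fewer local bits allow more containers in the cluster.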
16.
Storage Container
• Contains data for many blocks with different block ids
• Recall how the client will perform the mapping:
  – file -> blockId[] (NN)
  – blockId -> container location (Container Manager)
  – Container maps the blockId to data (DataNode)
• A container can be viewed as a local key-value store
  – Block id is the key and block data is the value
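The three-step client mapping can be sketched with plain dictionaries: the NameNode returns block ids, the Container Manager returns replica locations for the block's container, and each DataNode treats a container as a key-value store. This is a hypothetical illustration; it assumes, following BlockId = ContainerId + LocalBid from the earlier slide, that a block id is a `(container_id, local_block_id)` pair, and it simply tries replicas in order.

```python
from typing import Dict, List, Tuple

# Hypothetical types: a block id is (container_id, local_block_id).
BlockId = Tuple[int, int]


def read_file(
    path: str,
    namespace: Dict[str, List[BlockId]],                   # NN: file -> blockId[]
    container_locations: Dict[int, List[str]],             # Container Manager
    datanodes: Dict[str, Dict[int, Dict[BlockId, bytes]]],  # DN -> containers
) -> bytes:
    data = bytearray()
    for block_id in namespace[path]:                  # step 1: ask the NN
        container_id = block_id[0]
        for dn in container_locations[container_id]:  # step 2: locate container
            container = datanodes.get(dn, {}).get(container_id)
            if container is not None and block_id in container:
                data += container[block_id]           # step 3: K-V lookup on DN
                break
        else:
            raise IOError(f"no live replica for block {block_id}")
    return bytes(data)


if __name__ == "__main__":
    namespace = {"/logs/a": [(1, 0), (2, 0), (1, 1)]}
    locations = {1: ["dn1", "dn2"], 2: ["dn3"]}
    datanodes = {
        "dn1": {},  # lost its replica of container 1; the reader falls back to dn2
        "dn2": {1: {(1, 0): b"hel", (1, 1): b"world"}},
        "dn3": {2: {(2, 0): b"lo "}},
    }
    print(read_file("/logs/a", namespace, locations, datanodes))  # b'hello world'
```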
17.
Container Structure (Using LevelDB/RocksDB)
Figure: a container index (an LSM key-value store such as LevelDB/RocksDB) maps keys 1 to N to (chunk data file name, offset, length) entries that point into several chunk data files
• LevelDB/RocksDB: an embeddable key-value store
• BlockId is the key and the filename of the local chunk file is the value
• Optimizations
  – Small blocks (< 1MB) can be stored directly in RocksDB
  – Compaction of block data to avoid lots of files
  – But this can be evolved over time
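A minimal sketch of this layout, including the small-block optimization: blocks under 1 MB go straight into the index, and larger blocks are written to a chunk file with a (file name, offset, length) pointer stored as the value. A plain dict stands in for LevelDB/RocksDB here (an assumption made to keep the example self-contained), and the one-chunk-file-per-block naming is hypothetical; compaction is not shown.

```python
import os
import tempfile
from typing import Dict, Tuple, Union

SMALL_BLOCK_LIMIT = 1 << 20  # blocks under 1 MB are stored inline (per the slide)

IndexValue = Union[bytes, Tuple[str, int, int]]


class Container:
    """Sketch of a storage container.

    ASSUMPTION: a dict stands in for the embedded LSM key-value store; each
    value is either inline bytes or a (chunk_file_name, offset, length) pointer.
    """

    def __init__(self, directory: str):
        self.directory = directory
        self.index: Dict[int, IndexValue] = {}

    def put_block(self, block_id: int, data: bytes) -> None:
        if len(data) < SMALL_BLOCK_LIMIT:
            self.index[block_id] = data  # small block: the value is the data
            return
        chunk_name = f"chunk_{block_id}.dat"  # hypothetical naming scheme
        with open(os.path.join(self.directory, chunk_name), "wb") as f:
            f.write(data)
        self.index[block_id] = (chunk_name, 0, len(data))

    def get_block(self, block_id: int) -> bytes:
        entry = self.index[block_id]
        if isinstance(entry, bytes):
            return entry
        chunk_name, offset, length = entry
        with open(os.path.join(self.directory, chunk_name), "rb") as f:
            f.seek(offset)
            return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        c = Container(d)
        c.put_block(1, b"tiny")
        c.put_block(2, b"x" * (2 * SMALL_BLOCK_LIMIT))
        print(type(c.index[1]).__name__, c.index[2])
```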
18.
Container Structure Can Support Random Writes
• 4KB chunks can be atomically updated in the K-V store
• Chunk data can be appended at the end of a chunk file (as in log-structured file systems)
Figure: the container index (an LSM store, e.g. LevelDB/RocksDB) maps keys 1 to N to (chunk data file name, offset, length) entries in chunk data files
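Random writes fall out of combining the two bullets: a 4KB overwrite never touches old bytes; it appends the new data to the chunk file and then atomically repoints one index entry. The sketch assumes a single writer and uses a dict in place of the K-V store (with LevelDB/RocksDB the repoint would be one put or write batch); reclaiming space from superseded chunks is left out.

```python
import os
import tempfile
from typing import Dict, Tuple

CHUNK_SIZE = 4096  # 4 KB chunks, as on the slide


class LogStructuredChunkStore:
    """Random writes on top of an append-only chunk file.

    ASSUMPTIONS: a dict stands in for the K-V store; one chunk file per
    container; a single writer; garbage collection is not shown.
    """

    def __init__(self, chunk_file: str):
        self.chunk_file = chunk_file
        open(chunk_file, "ab").close()  # make sure the file exists
        # (block_id, chunk_no) -> (offset, length) in the chunk file
        self.index: Dict[Tuple[int, int], Tuple[int, int]] = {}

    def write_chunk(self, block_id: int, chunk_no: int, data: bytes) -> None:
        if len(data) > CHUNK_SIZE:
            raise ValueError("data larger than one chunk")
        offset = os.path.getsize(self.chunk_file)  # end of log (single writer)
        with open(self.chunk_file, "ab") as f:
            f.write(data)  # never overwrite in place: always append
            f.flush()
            os.fsync(f.fileno())
        # Repoint the index only after the new bytes are durable.
        self.index[(block_id, chunk_no)] = (offset, len(data))

    def read_chunk(self, block_id: int, chunk_no: int) -> bytes:
        offset, length = self.index[(block_id, chunk_no)]
        with open(self.chunk_file, "rb") as f:
            f.seek(offset)
            return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = LogStructuredChunkStore(os.path.join(d, "chunks.dat"))
        store.write_chunk(1, 0, b"A" * CHUNK_SIZE)
        store.write_chunk(1, 0, b"B" * CHUNK_SIZE)  # random overwrite of chunk 0
        print(store.read_chunk(1, 0)[:4], os.path.getsize(store.chunk_file))
```

A crash between the append and the index update leaves the old chunk visible and some unreferenced bytes at the end of the file, which is the usual log-structured guarantee.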
19.
Replication: Possible Approaches
Data pipeline
– The data pipeline, a form of chain replication, has been used successfully for data
– However, its correctness depended on a central coordinator
– It needs to be extended for block metadata, but that is hard to get right without a central coordinator
Use Raft replication instead of the data pipeline, for both data and metadata
– Proven to be correct
– Has been used primarily for small updates and transactions, so it fits metadata well
– Performance concerns for large streaming writes; needs prototyping
Hybrid: Raft + pipeline
– Can be viewed as replacing the central coordinator with Raft
– Data pipeline for the data + the Raft protocol (under discussion)
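The toy model below shows the chain-replication shape of the data pipeline: each replica applies a write and forwards it downstream, and acknowledgements flow back from the tail to the head. It also shows why a coordinator matters: when a middle replica fails, the replicas disagree and something must decide how to repair the chain. Class and function names are hypothetical, and the model is in-process and synchronous, unlike the real streaming pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Replica:
    name: str
    downstream: Optional["Replica"] = None
    store: Dict[str, bytes] = field(default_factory=dict)
    failed: bool = False  # hypothetical fault-injection flag

    def write(self, key: str, data: bytes) -> List[str]:
        """Apply locally, forward down the chain, return acks (tail first)."""
        if self.failed:
            raise IOError(f"{self.name} is down")
        self.store[key] = data
        acks = self.downstream.write(key, data) if self.downstream else []
        return acks + [self.name]  # ack flows back from the tail to the head


def build_pipeline(names: List[str]) -> Replica:
    """Link replicas head -> ... -> tail and return the head."""
    if not names:
        raise ValueError("pipeline needs at least one replica")
    head: Optional[Replica] = None
    for name in reversed(names):
        head = Replica(name, downstream=head)
    return head


if __name__ == "__main__":
    head = build_pipeline(["dn1", "dn2", "dn3"])
    print(head.write("blk_1", b"data"))  # ['dn3', 'dn2', 'dn1']
    head.downstream.failed = True
    try:
        head.write("blk_2", b"more")
    except IOError as e:
        print(e)  # dn2 is down
    # dn1 now holds blk_2 but dn3 does not: repair needs a coordinator.
    print("blk_2" in head.store, "blk_2" in head.downstream.downstream.store)
```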
20.
Next steps
• Remove the Block Management layer's locking with the Namespace
  – Reduce lock contention and remove the tight coupling (immediate benefit)
  – Allows us to implement a cleanly separated Container Management layer
• Block containers (to support tens of billions of blocks)
  – 2-4GB block containers initially => a 40-80x reduction in block reports (BR) and block map size
  – Reduce BR pressure on the NN
  – Early release: single-replica containers for a cloud storage caching FS (similar to HDFS-9806)
• Partial namespace (to billions of files per volume)
  – Will take us to 2B files initially, and then more as we gain experience with file working-set management
• Volumes + N+K failover
  – Scale both operations and namespace, plus operational improvements for HA
• Other containers
  – Local replica & cloud storage (e.g. S3) replica (caching mount)
  – Object store, HBase …
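The 40-80x figure is simple arithmetic: the reduction equals container size divided by average block size, independent of total cluster capacity. The slides do not state the average block size; the sketch assumes a hypothetical 50 MiB average, which is one value that reproduces the quoted range, and uses the >200PB cluster from the slides only to give a sense of scale.

```python
MIB, GIB, PIB = 2**20, 2**30, 2**50


def entries(total_bytes: int, unit_bytes: int) -> int:
    """Map/report entries needed to track total_bytes in units of unit_bytes."""
    return -(-total_bytes // unit_bytes)  # integer ceiling division


def reduction(total_bytes: int, avg_block_bytes: int, container_bytes: int) -> float:
    """How many times fewer entries containers need than individual blocks."""
    return entries(total_bytes, avg_block_bytes) / entries(total_bytes, container_bytes)


if __name__ == "__main__":
    total = 200 * PIB       # the >200PB federated cluster from the slides
    avg_block = 50 * MIB    # ASSUMPTION: hypothetical average block size
    for container in (2 * GIB, 4 * GIB):
        print(f"{container // GIB} GiB containers: "
              f"{entries(total, avg_block):,} blocks -> "
              f"{entries(total, container):,} containers "
              f"({reduction(total, avg_block, container):.0f}x fewer entries)")
```

Under that assumption, about 4.3 billion block entries shrink to roughly 105 million (2 GiB containers, ~41x) or 52 million (4 GiB containers, ~82x).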
21.
Summary
• HDFS scale proven in real production systems
  – 4K+ node clusters
  – Raw storage >200PB in a single federated NN cluster and >30PB in non-federated clusters
• But a very large number of small files is a challenge
• Important area of current focus: scaling the number of files and blocks
  – Partial namespace: initially scale to 2B files, later 5-10B files per volume + multiple volumes
  – Block containers: initially scale to 6B-12B blocks, later to 100B+ blocks
  – However, we are implementing this in a way that extends the storage layer
• Restructuring the storage layer to support generalized storage containers
  – Support storage needs beyond HDFS: object store, better HBase support, etc.
22.
Q&A
Thank You