SlideShare a Scribd company logo
1 of 16
1© Copyright 2015 EMC Corporation. All rights reserved.
HADOOP EVERYWHERE
GEO-DISTRIBUTED STORAGE FOR BIG DATA
2© Copyright 2015 EMC Corporation. All rights reserved.
ABOUT US
Nikhil Joshi
Consultant Product Manager
EMC
nikhil.joshi@emc.com
@nikhilj0shi
Vishrut Shah
Director of Engineering
EMC
vishrut.shah@emc.com
3© Copyright 2015 EMC Corporation. All rights reserved.
CLOUD APPS & THEIR DATA ARE GLOBAL
ParisDenver Beijing
…BUT ANALYTICS IS CONFINED TO A SINGLE DATACENTER
4© Copyright 2015 EMC Corporation. All rights reserved.
Workflow
– Real-time analytics on transactional system
– Push to Risk-Analytics infra via distcp
– Push to a cold-archive for compliance
Scale
– 3 datacenters, 7 tenants
– 200gb a day. Expect 3x increase in 2 years
– 27 Hadoop clusters (test, dev, prod, DR)
– Regain control from the ‘shadow IT’
BEYOND YOUR FIRST HADOOP CLUSTER
What we heard from a large European bank…
5© Copyright 2015 EMC Corporation. All rights reserved.
GEO-DISTRIBUTED ANALYTICS INFRASTRUCTURE
Polyglot
Hadoop-ready Storage
3x Replication
Exabyte Scale
Multi-tenant Large/Small Files
Geo-accessible
WAN-efficient
Active-active
Strong Consistency
NEEDS TO ADDRESS SILOS, UNDERUTILIZATION, RUNAWAY REPLICATION
6© Copyright 2015 EMC Corporation. All rights reserved.
ECS: A THOROUGHBRED GEO-PLATFORM
Global Namespace
Denver Beijing Paris
STORAGE EFFICIENCYGEO CACHINGSTRONG CONSISTENCY ACTIVE-ACTIVE w/ FAILOVER
Geo-distributed
Data
Geo-distributed
Data
Apps
7© Copyright 2015 EMC Corporation. All rights reserved.
ECS AS A HADOOP STORAGE BACKEND
HCFS*
Vendor
Agnostic
Primary
Filesystem
Geo-distributed
Kerberos
Enabled
Apache
Hadoop 2.7
Ambari
Compatible
Multi-tenant
8© Copyright 2015 EMC Corporation. All rights reserved.
SHARED STORAGE
Hadoop Compute Cluster
Shared Storage Backend
9© Copyright 2015 EMC Corporation. All rights reserved.
HCFS CLIENT LIBRARY
Client
Library
10© Copyright 2015 EMC Corporation. All rights reserved.
HaaS
Tenant 1
Tenant 2
Tenant 3
11© Copyright 2015 EMC Corporation. All rights reserved.
Hadoop as a Service
Analysts
Independently scalable primary
storage to run Hadoop analytics
Site
Compute Grid
COMMON CONFIGURATIONS
Disaster Recovery
Geo-distribution
Site 1
Site 2
Compute Grid
Archive/failover tier to Hadoop
Multi-site analytics
Geo-distribution
Site 1
Site 2
Compute Grid
Distributed storage to run in-place
Hadoop analytics across sites
Compute Grid
12© Copyright 2015 EMC Corporation. All rights reserved.
HORTONWORKS CERTIFIED
HDP 2.3 – ECS 2.2
Certified and Validated
Ambari 2.2
Joint Support
Collateral
Training
13© Copyright 2015 EMC Corporation. All rights reserved.
ENOUGH TALK.
SHOW ME THE DEMO.
14© Copyright 2015 EMC Corporation. All rights reserved.
15© Copyright 2015 EMC Corporation. All rights reserved.
Hadoop Everywhere

More Related Content

What's hot

Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Databricks
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data InsightsDataWorks Summit
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...DataWorks Summit
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceDataWorks Summit/Hadoop Summit
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is DistributedAlluxio, Inc.
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataDataWorks Summit
 
Protecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against DisastersProtecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against DisastersDataWorks Summit
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...DataWorks Summit
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsDataWorks Summit/Hadoop Summit
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...DataWorks Summit/Hadoop Summit
 
Powering Data Science and AI with Apache Spark, Alluxio, and IBM
Powering Data Science and AI with Apache Spark, Alluxio, and IBMPowering Data Science and AI with Apache Spark, Alluxio, and IBM
Powering Data Science and AI with Apache Spark, Alluxio, and IBMAlluxio, Inc.
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Databricks
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightDataWorks Summit
 
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...Spark Summit
 
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...Spark Summit
 

What's hot (20)

Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
Deep Learning on Apache Spark at CERN’s Large Hadron Collider with Intel Tech...
 
Accelerating Big Data Insights
Accelerating Big Data InsightsAccelerating Big Data Insights
Accelerating Big Data Insights
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
Big Data Platform Industrialization
Big Data Platform Industrialization Big Data Platform Industrialization
Big Data Platform Industrialization
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government data
 
Protecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against DisastersProtecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against Disasters
 
Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...Present and future of unified, portable, and efficient data processing with A...
Present and future of unified, portable, and efficient data processing with A...
 
Big Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the ExpertsBig Data in the Cloud - The What, Why and How from the Experts
Big Data in the Cloud - The What, Why and How from the Experts
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
Powering Data Science and AI with Apache Spark, Alluxio, and IBM
Powering Data Science and AI with Apache Spark, Alluxio, and IBMPowering Data Science and AI with Apache Spark, Alluxio, and IBM
Powering Data Science and AI with Apache Spark, Alluxio, and IBM
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsight
 
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
 
On Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and AmbariOn Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and Ambari
 
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
BigDL: A Distributed Deep Learning Library on Spark: Spark Summit East talk b...
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 

Viewers also liked

Why PTC for SLM?
Why PTC for SLM?Why PTC for SLM?
Why PTC for SLM?Tom Kenslea
 
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmSolving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmDataWorks Summit
 
Cloud- A Technical or Organisational Challenge? Or Both?
Cloud- A Technical or Organisational Challenge? Or Both?Cloud- A Technical or Organisational Challenge? Or Both?
Cloud- A Technical or Organisational Challenge? Or Both?Justin Pirie
 
Designing the Industrial Internet
Designing the Industrial InternetDesigning the Industrial Internet
Designing the Industrial InternetDane Petersen
 
The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!DataWorks Summit/Hadoop Summit
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHumza Naseer
 
Selective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopSelective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopDataWorks Summit
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksDataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

Empower Data-Driven Organizations
Empower Data-Driven OrganizationsEmpower Data-Driven Organizations
Empower Data-Driven Organizations
 
Why PTC for SLM?
Why PTC for SLM?Why PTC for SLM?
Why PTC for SLM?
 
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos AlgorithmSolving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
 
Cloud- A Technical or Organisational Challenge? Or Both?
Cloud- A Technical or Organisational Challenge? Or Both?Cloud- A Technical or Organisational Challenge? Or Both?
Cloud- A Technical or Organisational Challenge? Or Both?
 
Designing the Industrial Internet
Designing the Industrial InternetDesigning the Industrial Internet
Designing the Industrial Internet
 
Retaam_ThingWorx
Retaam_ThingWorxRetaam_ThingWorx
Retaam_ThingWorx
 
Data Process Systems, connecting everything
Data Process Systems, connecting everythingData Process Systems, connecting everything
Data Process Systems, connecting everything
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!
 
Cooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython NotebookCooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython Notebook
 
Powering a Virtual Power Station with Big Data
Powering a Virtual Power Station with Big DataPowering a Virtual Power Station with Big Data
Powering a Virtual Power Station with Big Data
 
Protecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache HadoopProtecting Enterprise Data in Apache Hadoop
Protecting Enterprise Data in Apache Hadoop
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
The Heterogeneous Data lake
The Heterogeneous Data lakeThe Heterogeneous Data lake
The Heterogeneous Data lake
 
IoT13: Thingworx showcase
IoT13: Thingworx showcaseIoT13: Thingworx showcase
IoT13: Thingworx showcase
 
A Continuously Deployed Hadoop Analytics Platform?
A Continuously Deployed Hadoop Analytics Platform?A Continuously Deployed Hadoop Analytics Platform?
A Continuously Deployed Hadoop Analytics Platform?
 
Selective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed HadoopSelective Data Replication with Geographically Distributed Hadoop
Selective Data Replication with Geographically Distributed Hadoop
 
Practical advice to build a data driven company
Practical advice to build a data driven companyPractical advice to build a data driven company
Practical advice to build a data driven company
 
NLP Structured Data Investigation on Non-Text
NLP Structured Data Investigation on Non-TextNLP Structured Data Investigation on Non-Text
NLP Structured Data Investigation on Non-Text
 
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics FrameworksOverview of Apache Flink: the 4G of Big Data Analytics Frameworks
Overview of Apache Flink: the 4G of Big Data Analytics Frameworks
 

Similar to Hadoop Everywhere

Disaggregated Hadoop Stacks
Disaggregated Hadoop StacksDisaggregated Hadoop Stacks
Disaggregated Hadoop StacksDataWorks Summit
 
Emc ecs 2 technical deep dive workshop
Emc ecs 2 technical deep dive workshopEmc ecs 2 technical deep dive workshop
Emc ecs 2 technical deep dive workshopsolarisyougood
 
Running Cloudbreak on Kubernetes
Running Cloudbreak on KubernetesRunning Cloudbreak on Kubernetes
Running Cloudbreak on KubernetesKrisztián Horváth
 
Running Stateful Apps on Kubernetes
Running Stateful Apps on KubernetesRunning Stateful Apps on Kubernetes
Running Stateful Apps on KubernetesYugabyte
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...EMC
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14John Sing
 
EMC HADOOP Storage Strategy
EMC HADOOP Storage StrategyEMC HADOOP Storage Strategy
EMC HADOOP Storage Strategywalshe1
 
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...Srivatsan Ramanujam
 
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of KubernetesDevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of KubernetesDevOps.com
 
Operating Flink on Mesos at Scale
Operating Flink on Mesos at ScaleOperating Flink on Mesos at Scale
Operating Flink on Mesos at ScaleBiswajit Das
 
EMC Big Data Solutions Overview
EMC Big Data Solutions OverviewEMC Big Data Solutions Overview
EMC Big Data Solutions Overviewwalshe1
 
Emc vi pr hdfs data service technical overview
Emc vi pr hdfs data service technical overviewEmc vi pr hdfs data service technical overview
Emc vi pr hdfs data service technical overviewsolarisyougood
 
Emc vi pr hdfs data service technical overview
Emc vi pr hdfs data service technical overviewEmc vi pr hdfs data service technical overview
Emc vi pr hdfs data service technical overviewsolarisyougood
 
In-Place analytics with Unified Data Access
In-Place analytics with Unified Data AccessIn-Place analytics with Unified Data Access
In-Place analytics with Unified Data AccessDataWorks Summit
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonDataWorks Summit/Hadoop Summit
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese天青 王
 

Similar to Hadoop Everywhere (20)

Disaggregated Hadoop Stacks
Disaggregated Hadoop StacksDisaggregated Hadoop Stacks
Disaggregated Hadoop Stacks
 
EMC config Hadoop
EMC config HadoopEMC config Hadoop
EMC config Hadoop
 
Emc ecs 2 technical deep dive workshop
Emc ecs 2 technical deep dive workshopEmc ecs 2 technical deep dive workshop
Emc ecs 2 technical deep dive workshop
 
EMC EC Overview
EMC EC OverviewEMC EC Overview
EMC EC Overview
 
Running Cloudbreak on Kubernetes
Running Cloudbreak on KubernetesRunning Cloudbreak on Kubernetes
Running Cloudbreak on Kubernetes
 
Running Cloudbreak on Kubernetes
Running Cloudbreak on KubernetesRunning Cloudbreak on Kubernetes
Running Cloudbreak on Kubernetes
 
Running Stateful Apps on Kubernetes
Running Stateful Apps on KubernetesRunning Stateful Apps on Kubernetes
Running Stateful Apps on Kubernetes
 
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins...
 
Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14Hadoop_Its_Not_Just_Internal_Storage_V14
Hadoop_Its_Not_Just_Internal_Storage_V14
 
EMC HADOOP Storage Strategy
EMC HADOOP Storage StrategyEMC HADOOP Storage Strategy
EMC HADOOP Storage Strategy
 
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
PyMADlib - A Python wrapper for MADlib : in-database, parallel, machine learn...
 
Greenplum feature
Greenplum featureGreenplum feature
Greenplum feature
 
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of KubernetesDevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
DevOps vs. Site Reliability Engineering (SRE) in Age of Kubernetes
 
Operating Flink on Mesos at Scale
Operating Flink on Mesos at ScaleOperating Flink on Mesos at Scale
Operating Flink on Mesos at Scale
 
EMC Big Data Solutions Overview
EMC Big Data Solutions OverviewEMC Big Data Solutions Overview
EMC Big Data Solutions Overview
 
Emc vi pr hdfs data service technical overview
Emc vi pr hdfs data service technical overviewEmc vi pr hdfs data service technical overview
Emc vi pr hdfs data service technical overview
 
Emc vi pr hdfs data service technical overview
Emc vi pr hdfs data service technical overviewEmc vi pr hdfs data service technical overview
Emc vi pr hdfs data service technical overview
 
In-Place analytics with Unified Data Access
In-Place analytics with Unified Data AccessIn-Place analytics with Unified Data Access
In-Place analytics with Unified Data Access
 
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC IsilonImproving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
 
Virtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In ChineseVirtual Hadoop Introduction In Chinese
Virtual Hadoop Introduction In Chinese
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Recently uploaded

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Recently uploaded (20)

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Hadoop Everywhere

  • 1. 1© Copyright 2015 EMC Corporation. All rights reserved. HADOOP EVERYWHERE GEO-DISTRIBUTED STORAGE FOR BIG DATA
  • 2. 2© Copyright 2015 EMC Corporation. All rights reserved. ABOUT US Nikhil Joshi Consultant Product Manager EMC nikhil.joshi@emc.com @nikhilj0shi Vishrut Shah Director of Engineering EMC vishrut.shah@emc.com
  • 3. 3© Copyright 2015 EMC Corporation. All rights reserved. CLOUD APPS & THEIR DATA ARE GLOBAL ParisDenver Beijing …BUT ANALYTICS IS CONFINED TO A SINGLE DATACENTER
  • 4. 4© Copyright 2015 EMC Corporation. All rights reserved. Workflow – Real-time analytics on transactional system – Push to Risk-Analytics infra via distcp – Push to a cold-archive for compliance Scale – 3 datacenters, 7 tenants – 200gb a day. Expect 3x increase in 2 years – 27 Hadoop clusters (test, dev, prod, DR) – Regain control from the ‘shadow IT’ BEYOND YOUR FIRST HADOOP CLUSTER What we heard from a large European bank…
  • 5. 5© Copyright 2015 EMC Corporation. All rights reserved. GEO-DISTRIBUTED ANALYTICS INFRASTRUCTURE Polyglot Hadoop-ready Storage 3x Replication Exabyte Scale Multi-tenant Large/Small Files Geo-accessible WAN-efficient Active-active Strong Consistency NEEDS TO ADDRESS SILOS, UNDERUTILIZATION, RUNAWAY REPLICATION
  • 6. 6© Copyright 2015 EMC Corporation. All rights reserved. ECS: A THOROUGHBRED GEO-PLATFORM Global Namespace Denver Beijing Paris STORAGE EFFICIENCYGEO CACHINGSTRONG CONSISTENCY ACTIVE-ACTIVE w/ FAILOVER Geo-distributed Data Geo-distributed Data Apps
  • 7. 7© Copyright 2015 EMC Corporation. All rights reserved. ECS AS A HADOOP STORAGE BACKEND HCFS* Vendor Agnostic Primary Filesystem Geo-distributed Kerberos Enabled Apache Hadoop 2.7 Ambari Compatible Multi-tenant
  • 8. 8© Copyright 2015 EMC Corporation. All rights reserved. SHARED STORAGE Hadoop Compute Cluster Shared Storage Backend
  • 9. 9© Copyright 2015 EMC Corporation. All rights reserved. HCFS CLIENT LIBRARY Client Library
  • 10. 10© Copyright 2015 EMC Corporation. All rights reserved. HaaS Tenant 1 Tenant 2 Tenant 3
  • 11. 11© Copyright 2015 EMC Corporation. All rights reserved. Hadoop as a Service Analysts Independently scalable primary storage to run Hadoop analytics Site Compute Grid COMMON CONFIGURATIONS Disaster Recovery Geo-distribution Site 1 Site 2 Compute Grid Archive/failover tier to Hadoop Multi-site analytics Geo-distribution Site 1 Site 2 Compute Grid Distributed storage to run in-place Hadoop analytics across sites Compute Grid
  • 12. 12© Copyright 2015 EMC Corporation. All rights reserved. HORTONWORKS CERTIFIED HDP 2.3 – ECS 2.2 Certified and Validated Ambari 2.2 Joint Support Collateral Training
  • 13. 13© Copyright 2015 EMC Corporation. All rights reserved. ENOUGH TALK. SHOW ME THE DEMO.
  • 14. 14© Copyright 2015 EMC Corporation. All rights reserved.
  • 15. 15© Copyright 2015 EMC Corporation. All rights reserved.

Editor's Notes

  1. Good afternoon everyone, welcome to the session on Geo-Distributed storage for Big Data. Hope you have enjoyed the conference so far. I would like to start our session today with a couple of “show of hands” questions: How many of you have clusters in more than than one region/GEO? How many of you use distcp today? Other tools apart from “distcp”?
  2. About us…
  3. 1. Today’s applications generate and consume data from all across the globe. But Hadoop has mostly remained confined to a single datacenter. 2. Hadoop has grown and matured around HDFS as its filesystem. HDFS did a great job in providing a petabyte scale, reliable storage system built on commodity hardware. 3. It’s storage layer is yet to evolve into a global filesystem with characteristics like geo-access, active-active access, selective replication. 4. Most vendor tools have relied on distcp based mechanisms… ---------- New age applications such as Mobile, social and IoT apps create and consume data across the globe. This gives rise to multiple apps creating silos of data that are accessed via different protocols like S3, NFS, Swift etc depending on the application. This data is then further copied into other silos specifically for analytics purposes. This leads to overall operational complexity as well higher storage costs.
  4. [Animated] Spend a lot of time on this slide. This will be framed as “desirable characteristics of a general geo-hadoop platform”. Stress of active-active and strong consistency. Not ECS specific yet.
  5. Reiterate the concept from previous slide – Mention WAN distances
  6. Spend time here. Hopefully, HCFS has already been explained before.
  7. Vishrut can take over from here. I will try to do a humorous hand off. Something like “Now, my technical colleague will show you that this is not all smoke and mirrors. He has also graciously agreed to field all hard questions”.
  8. [Animated] Point audience to some material available for further reading about this partnership
  9. Yet to decide if we want to move this inside our video demo, or do this via a screenshot-based walkthough
  10. [Animated] Simple animation to ensure that audience understands “global hadoop” only applies to storage. Apps (MR/Hbase) are not geo-accessible. And to demonstrate this solution only replaces the storage layer (HDFS)