Docker-Based Hadoop Provisioning on Cisco InterCloud
Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco
Rakesh Saha, Product Management, Hortonworks
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Cautionary Statement Regarding Forward-Looking Statements
This presentation contains forward-looking statements involving risks and uncertainties. Such forward-looking statements in this presentation generally relate to future events, our ability to increase the number of support subscription customers, the growth in usage of the Hadoop framework, our ability to innovate and develop the various open source projects that will enhance the capabilities of the Hortonworks Data Platform, anticipated customer benefits and general business outlook. In some cases, you can identify forward-looking statements because they contain words such as “may,” “will,” “should,” “expects,” “plans,” “anticipates,” “could,” “intends,” “target,” “projects,” “contemplates,” “believes,” “estimates,” “predicts,” “potential” or “continue” or similar terms or expressions that concern our expectations, strategy, plans or intentions. You should not rely upon forward-looking statements as predictions of future events. We have based the forward-looking statements contained in this presentation primarily on our current expectations and projections about future events and trends that we believe may affect our business, financial condition and prospects. We cannot assure you that the results, events and circumstances reflected in the forward-looking statements will be achieved or occur, and actual results, events, or circumstances could differ materially from those described in the forward-looking statements.
The forward-looking statements made in this presentation relate only to events as of the date on which the statements are made, and we undertake no obligation to update any of the information in this presentation.
Trademarks
Hortonworks is a trademark of Hortonworks, Inc. in the United States and other jurisdictions. Other names used herein may be trademarks of their respective owners.
Speakers
Rakesh Saha, Product Management, Hortonworks
Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco
Agenda
• About Hortonworks
• Cloudbreak – Docker-based Hadoop provisioning tool
• Introduction to Docker
• Hadoop Provisioning using Docker
• Cisco and Hortonworks Collaboration
About Hortonworks
• The ONLY 100% open source Apache Hadoop data platform
• Founded in 2011
• 1st Hadoop distribution to go public: IPO Fall 2014 (NASDAQ: HDP)
• 322 subscription customers
• 600+ employees across 17 countries
• 1000+ technology partners
Hortonworks Mission:
Power your Modern Data Architecture with HDP and Enterprise Apache Hadoop
Customer Momentum
• 300+ customers in seven quarters, growing at 75+ per quarter
• Two-thirds of customers come from the F1000
Hortonworks and Hadoop at Scale
• HDP in production on the largest clusters on the planet
• Multiple 1,000+ node clusters, including 35,000 nodes at Yahoo! and 800 nodes at Spotify
• Founded in 2011
• Original 24 architects, developers and operators of Hadoop from Yahoo!
• We are leaders in the Hadoop community
• 500+ employees
HDP is deeply integrated in the data center
[Diagram: Sources: existing systems, clickstream, web & social, geolocation, sensor & machine, server logs, unstructured. Data systems: RDBMS, EDW, MPP. HDP: Governance & Integration, Security, Operations, Data Access, Data Management, YARN. Also shown: applications, dev & data tools, operational tools, infrastructure.]
Deep Partnerships
Hortonworks engages in deep engineering relationships with the leaders in the data center, such as Cisco, Microsoft, EMC, Pivotal, Teradata, Red Hat, SAS & SAP.
Broad Partnerships
Over 1,000 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users.
Agenda
Cloudbreak | Docker | Provisioning | Collaboration
Cloudbreak
• Developed by SequenceIQ
• Open source under the Apache 2.0 license (Apache project soon)
• Deploys selected services to public and private clouds via Ambari Blueprints
• Elastic: can spin up any number of nodes and add or remove them on the fly
• Provides full cloud lifecycle management post-deployment
Launch HDP on Any Cloud for Any Application
Cloudbreak:
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!
Example Ambari Blueprints:
• BI / Analytics (Hive)
• IoT Apps (Storm, HBase, Hive)
• Dev / Test (all HDP services)
• Data Science (Spark)
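Picking a blueprint is the first step in Cloudbreak. An Ambari Blueprint is a JSON document that names the stack and lists host groups, each with a cardinality and the components it runs. The following is a minimal Python sketch that builds a blueprint for the Data Science (Spark) case. The blueprint name, the two host groups, the component selection and the HDP 2.3 stack version are illustrative assumptions. This is not one of the blueprints Cloudbreak ships.

```python
import json


def components(*names: str) -> list[dict]:
    """Ambari lists each component of a host group as {"name": ...}."""
    return [{"name": name} for name in names]


def spark_blueprint(stack_version: str = "2.3", workers: int = 3) -> dict:
    """Build a hypothetical Ambari Blueprint for a small Spark-on-YARN cluster."""
    return {
        "Blueprints": {
            "blueprint_name": "data-science-spark",  # hypothetical name
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                # one master node carrying the HDFS/YARN/Spark masters
                "name": "master",
                "cardinality": "1",
                "components": components(
                    "NAMENODE", "SECONDARY_NAMENODE", "RESOURCEMANAGER",
                    "HISTORYSERVER", "APP_TIMELINE_SERVER",
                    "ZOOKEEPER_SERVER", "SPARK_JOBHISTORYSERVER",
                ),
            },
            {
                # workers store data and run YARN containers
                "name": "worker",
                "cardinality": str(workers),
                "components": components(
                    "DATANODE", "NODEMANAGER", "ZOOKEEPER_CLIENT",
                    "HDFS_CLIENT", "YARN_CLIENT", "MAPREDUCE2_CLIENT",
                    "SPARK_CLIENT",
                ),
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(spark_blueprint(), indent=2))
```

A real blueprint may also need configuration overrides and monitoring components (for example the Ambari Metrics service); these are omitted here. In plain Ambari, you register a document like this with `POST /api/v1/blueprints/<name>`. You then instantiate it with a separate cluster template that maps host groups to hosts.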
Hadoop in Cloud Provisioning with Cloudbreak
1. Create Templates
2. Provide Blueprint
3. Associate Credentials
4. Launch Cluster
Provisioning: Template
Provisioning: Blueprint
Provisioning: Provider Credentials
Provisioning: Launch
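The four steps produce three reusable resources (a template, a blueprint and a credential), which the launch step then combines. The following sketch models that dependency with hypothetical Python classes. The class and field names, the provider strings and the validation rules are assumptions made for illustration. They do not reproduce Cloudbreak's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Step 1 (hypothetical model): the VM shape for one cloud provider."""
    name: str
    provider: str
    instance_type: str
    volume_count: int


@dataclass(frozen=True)
class Blueprint:
    """Step 2: an Ambari blueprint, reduced here to its host-group names."""
    name: str
    host_groups: tuple[str, ...]


@dataclass(frozen=True)
class Credential:
    """Step 3: access to one cloud provider account."""
    name: str
    provider: str


def build_cluster_request(blueprint: Blueprint, credential: Credential,
                          groups: dict[str, tuple[Template, int]]) -> dict:
    """Step 4: combine the pieces into one launch request.

    `groups` maps each blueprint host group to (template, node_count).
    """
    declared = set(blueprint.host_groups)
    if set(groups) != declared:
        raise ValueError(f"host groups must match the blueprint: {sorted(declared)}")
    for group, (template, count) in groups.items():
        if template.provider != credential.provider:
            raise ValueError(f"{group}: template is for {template.provider}, "
                             f"credential is for {credential.provider}")
        if count < 1:
            raise ValueError(f"{group}: node count must be at least 1")
    return {
        "blueprint": blueprint.name,
        "credential": credential.name,
        "instance_groups": {
            group: {"template": template.name, "nodes": count}
            for group, (template, count) in groups.items()
        },
    }


if __name__ == "__main__":
    small = Template("m-large", "aws", "m4.large", 2)  # example values only
    bp = Blueprint("data-science-spark", ("master", "worker"))
    cred = Credential("my-aws", "aws")
    print(build_cluster_request(bp, cred, {"master": (small, 1), "worker": (small, 3)}))
```

Templates and credentials are kept as separate objects. This lets the same blueprint be launched on another cloud by swapping only the credential and the templates.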
Specialized Blueprints
Quick productivity with pre-configured cluster blueprints:
• Lambda Architecture
• Machine Learning
• Batch ETL
• BI / Analytics (Hive)
• IoT Apps (Storm, HBase, Hive)
• Dev / Test (all HDP services)
• Data Science (Spark)
• …
Autoscaling Policy
• Policies based on any Ambari metric
• Coordinates with YARN
• Policies are based on metrics or time
• Scaling can be specific to a service or component type
Optimize cloud usage via Elastic Clusters
[Diagram: Scaling for Static and Dynamic Clusters. Ambari metrics and alerts feed Cloudbreak, which enforces the auto-scale policies, handles provisioning, and scales the cluster and YARN apps.]
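The slide lists two kinds of policy, metric-based and time-based, applied per host group. The following minimal sketch shows how such policies could be combined into a target size for one host group. It assumes that time policies set a baseline and metric policies then adjust it. The class names, the metric name, the evaluation order and the min/max bounds are all hypothetical. This is not Cloudbreak's autoscaling implementation.

```python
import operator
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MetricPolicy:
    """Scale a host group by `adjustment` nodes while a metric crosses a threshold."""
    host_group: str
    metric: str          # hypothetical metric name, e.g. "yarn.pending_containers"
    op: str              # ">" or "<"
    threshold: float
    adjustment: int      # positive adds nodes, negative removes them


@dataclass(frozen=True)
class TimePolicy:
    """Hold a host group at `target_nodes` during [start_hour, end_hour)."""
    host_group: str
    start_hour: int
    end_hour: int
    target_nodes: int


OPS = {">": operator.gt, "<": operator.lt}


def desired_size(group, current, metrics, now, metric_policies=(), time_policies=(),
                 min_nodes=1, max_nodes=100):
    """Return the target node count for one host group.

    Time policies set the baseline; matching metric policies then adjust it;
    the result is clamped to [min_nodes, max_nodes].
    """
    size = current
    for p in time_policies:
        if p.host_group == group and p.start_hour <= now.hour < p.end_hour:
            size = p.target_nodes
    for p in metric_policies:
        value = metrics.get(p.metric)
        if p.host_group == group and value is not None and OPS[p.op](value, p.threshold):
            size += p.adjustment
    return max(min_nodes, min(max_nodes, size))


if __name__ == "__main__":
    busy = MetricPolicy("worker", "yarn.pending_containers", ">", 10, 2)
    office_hours = TimePolicy("worker", 9, 18, 6)
    now = datetime(2015, 6, 10, 11, 0)
    print(desired_size("worker", 4, {"yarn.pending_containers": 25}, now,
                       [busy], [office_hours]))
```

With these inputs, the worker group's target becomes 8. The 11:00 window sets a baseline of 6, and the pending-containers policy adds 2. A production version would also need a cooldown period, so that a single metric spike does not trigger repeated scaling. It would also need a graceful decommission path for nodes that still run YARN containers. Both are left out here.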
Provisioning – How it works
1. Start VMs with a running Docker daemon
2. Cloudbreak bootstrap:
   • Start a Consul cluster
   • Start a Swarm cluster (Consul for discovery)
3. Start Ambari servers/agents through the Swarm API
4. Register Ambari services in Consul (Registrator)
5. Post the Blueprint
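The bootstrap sequence is a chain of dependencies. Swarm needs Consul for discovery, and the Ambari containers are started through the Swarm API. The following sketch encodes the steps as a dependency graph and derives a valid execution order with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The step names paraphrase the slide. Starting Registrator before the Ambari containers is our assumption.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it starts.
# These dependencies are our reading of the slide, not Cloudbreak's source code.
PROVISIONING_STEPS = {
    "start_vms_with_docker": set(),
    "start_consul_cluster": {"start_vms_with_docker"},
    "start_swarm_cluster": {"start_consul_cluster"},     # Swarm uses Consul for discovery
    "start_registrator": {"start_consul_cluster"},       # registers services in Consul
    "start_ambari_server_and_agents": {"start_swarm_cluster", "start_registrator"},
    "post_blueprint": {"start_ambari_server_and_agents"},
}


def provisioning_order(steps=PROVISIONING_STEPS):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    for i, step in enumerate(provisioning_order(), start=1):
        print(i, step)
```

Encoding the order as a graph, rather than a fixed list, shows which steps could run in parallel. Here, the Swarm and Registrator steps both depend only on Consul. `graphlib` also raises `CycleError` if a dependency loop is introduced.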
Agenda
Cloudbreak | Docker | Provisioning | Collaboration
Multiplicity of stacks: static website, web frontend, user DB, queue, analytics DB
Multiplicity of hardware environments: development VM, QA server, public cloud, contributor’s laptop, production cluster, customer data center
Docker is a “Shipping Container” System for Code
An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container
• Lightweight, portable
• Build once, run anywhere
• Like a VM, without the overhead of a VM
• Isolated containers
• Automated and scripted
Why Is Docker So Exciting?
For Developers: Build once…run anywhere
• A clean, safe, and portable runtime environment for your app
• No missing dependencies, packages, etc.
• Run each app in its own isolated container
• Automate testing, integration and packaging
• Reduce or eliminate concerns about compatibility on different platforms
• Cheap, zero-penalty containers to deploy services
For DevOps: Configure once…run anything
• Make the entire lifecycle more efficient, consistent, and repeatable
• Eliminate inconsistencies between SDLC stages
• Support segregation of duties
• Significantly improve the speed and reliability of CI/CD
• Significantly more lightweight than VMs
Docker: Containers vs. VMs
[Diagram: In the VM stack, a server runs a host OS and a Type 2 hypervisor, and every app (App A, App A’, App B) carries its own guest OS plus bins/libs. In the container stack, a server runs the host OS kernel and Docker, and each container holds only its app and its bins/libs (App A and several copies of App B).]
Containers are isolated and share only the kernel. The result is significantly faster deployment, much less overhead, easier migration and faster restart.
Agenda
Cloudbreak | Docker | Provisioning | Collaboration
HDP as Docker Containers via Cloudbreak
• Run the Ambari cluster in containers
• Use a Blueprint to define services
• All HDP services share a single container
[Diagram: Cloudbreak provisions VMs from cloud providers (or bare metal), installs Ambari on the VMs, where it runs in Docker on Linux, and instructs Ambari to build the HDP cluster.]
Run Hadoop as Docker Containers
Swarm + Consul for Placement and Discovery
[Diagram, built up over three slides: Cloudbreak starts six Docker hosts. One runs the Ambari server container (amb-ser) and five run Ambari agent containers (amb-agn). The Blueprint then assigns HDP components to the agents: hdfs + hbase, hdfs + hive, hdfs + yarn, hdfs + zookpr, and nmnode + hdfs.]
Benefits of running Hadoop on Docker
• Quick installation with pre-pulled RPMs
• Same process and images for dev, QA and prod
• Same process for single-node and multi-node clusters
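Swarm decides which Docker host runs each container. The default `spread` strategy in classic Docker Swarm favors the least-loaded node. The following simplified sketch shows spread-style placement for the Ambari server and agent containers. It ranks hosts only by container count. Real Swarm also weighs CPU and memory, and it supports constraints and affinities. The host and container names are hypothetical.

```python
from collections import Counter


def spread_place(containers, hosts, existing=None):
    """Assign each container to the host currently running the fewest containers.

    Simplified spread-style placement: ignores CPU/memory, constraints and
    affinities; ties are broken by host name so the result is deterministic.
    """
    load = Counter({h: 0 for h in hosts})
    if existing:
        load.update(existing)          # containers already running per host
    placement = {}
    for c in containers:
        host = min(hosts, key=lambda h: (load[h], h))
        placement[c] = host
        load[host] += 1
    return placement


if __name__ == "__main__":
    hosts = [f"docker-{i}" for i in range(1, 7)]
    containers = ["amb-server"] + [f"amb-agent-{i}" for i in range(1, 6)]
    for container, host in spread_place(containers, hosts).items():
        print(container, "->", host)
```

Six containers on six hosts land one per host, which matches the one-container-per-host layout in the diagram. If there are more containers than hosts, they are distributed evenly.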
Demo
Hadoop on Docker
Agenda
Cloudbreak | Docker | Provisioning | Collaboration
Cisco and Hortonworks’ Partnership
Hortonworks: 100% open source Hadoop distribution, support and training
Cisco: integrated infrastructures for Big Data
Cisco and Hortonworks are partnering to help you build your big data solution and reach massive scalability, superior efficiency and dramatically lower total cost of ownership thanks to a validated joint architecture.
Results of the collaboration
• Efficient Hadoop as a service
• Adoption of Docker for enterprise Hadoop deployment
Task (times in mm:ss) | Cisco InterCloud | Public Cloud Provider
HDP installation | 15:04 | 11:55
Teragen (avg of 3 executions) | 7:08 | 22:15
Terasort (avg of 3 executions) | 32:09 | 60:12
Teravalidate (avg of 3 executions) | 2:31 | 10:40
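The table is easier to compare as ratios. The following small sketch converts each mm:ss value to seconds and divides the public-cloud time by the InterCloud time, so a value above 1 means InterCloud finished faster. The numbers are copied from the table. The ratios only restate it, because the slides do not describe the instance types behind the runs.

```python
# task: (Cisco InterCloud, public cloud provider), both as "mm:ss"
RESULTS = {
    "HDP installation": ("15:04", "11:55"),
    "Teragen": ("7:08", "22:15"),
    "Terasort": ("32:09", "60:12"),
    "Teravalidate": ("2:31", "10:40"),
}


def to_seconds(mm_ss: str) -> int:
    """Convert "mm:ss" to a number of seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedups(results=RESULTS):
    """Public-cloud time divided by InterCloud time (>1 means InterCloud was faster)."""
    return {task: to_seconds(public) / to_seconds(intercloud)
            for task, (intercloud, public) in results.items()}


if __name__ == "__main__":
    for task, ratio in speedups().items():
        print(f"{task:<18} {ratio:.2f}x")
```

Running it gives about 0.79 for HDP installation, where the public cloud was faster. The other ratios are about 3.12 for Teragen, 1.87 for Terasort and 4.24 for Teravalidate.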
Observations
• Docker is maturing inside enterprises
• Interest in running Docker on top of bare metal
• Big data app developers are leaning towards containerization of apps
• YARN is becoming an application deployment platform beyond big data apps
• Demand for native, containerized, fully managed apps on YARN
Future Collaboration
• Run Docker natively on OpenStack
• Run Docker on YARN
• OpenStack bare metal
Conclusion
Blueprints (Data Science, IoT, BI / Analytics, Dev / Test) → HDP
HDP + Cisco InterCloud: efficient Hadoop-as-a-Service
Learn More
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about Cisco & Hortonworks
http://hortonworks.com/partner/cisco/
More about Hortonworks’ Acquisition of SequenceIQ
http://bit.ly/1R1ktxO

Your Vision, Our Expertise: TECUNIQUE's Tailored Software TeamsYour Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
Your Vision, Our Expertise: TECUNIQUE's Tailored Software Teams
 
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine HarmonyLeveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
Leveraging DxSherpa's Generative AI Services to Unlock Human-Machine Harmony
 
Watermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security ChallengesWatermarking in Source Code: Applications and Security Challenges
Watermarking in Source Code: Applications and Security Challenges
 
Generative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-CouncilGenerative AI for Cybersecurity - EC-Council
Generative AI for Cybersecurity - EC-Council
 
20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기20240330_고급진 코드를 위한 exception 다루기
20240330_고급진 코드를 위한 exception 다루기
 
About .NET 8 and a first glimpse into .NET9
About .NET 8 and a first glimpse into .NET9About .NET 8 and a first glimpse into .NET9
About .NET 8 and a first glimpse into .NET9
 
Introduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptxIntroduction-to-Software-Development-Outsourcing.pptx
Introduction-to-Software-Development-Outsourcing.pptx
 
Growing Oxen: channel operators and retries
Growing Oxen: channel operators and retriesGrowing Oxen: channel operators and retries
Growing Oxen: channel operators and retries
 
Why Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdfWhy Choose Brain Inventory For Ecommerce Development.pdf
Why Choose Brain Inventory For Ecommerce Development.pdf
 
OpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS CalculatorOpenChain Webinar: Universal CVSS Calculator
OpenChain Webinar: Universal CVSS Calculator
 
ERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptxERP For Electrical and Electronics manufecturing.pptx
ERP For Electrical and Electronics manufecturing.pptx
 
How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?How Does the Epitome of Spyware Differ from Other Malicious Software?
How Does the Epitome of Spyware Differ from Other Malicious Software?
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Sustainable Web Design - Claire Thornewill
Sustainable Web Design - Claire ThornewillSustainable Web Design - Claire Thornewill
Sustainable Web Design - Claire Thornewill
 
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
Transforming PMO Success with AI - Discover OnePlan Strategic Portfolio Work ...
 
Kubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptxKubernetes go-live checklist for your microservices.pptx
Kubernetes go-live checklist for your microservices.pptx
 
Top Software Development Trends in 2024
Top Software Development Trends in  2024Top Software Development Trends in  2024
Top Software Development Trends in 2024
 
online pdf editor software solutions.pdf
online pdf editor software solutions.pdfonline pdf editor software solutions.pdf
online pdf editor software solutions.pdf
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 

Hadoop on Docker

  • 1. Docker-Based Hadoop Provisioning on Cisco InterCloud. Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco; Rakesh Saha, Product Management, Hortonworks
  • 2. © Hortonworks Inc. 2011 – 2015. All Rights Reserved Cautionary Statement Regarding Forward-Looking Statements This presentation contains forward-looking statements involving risks and uncertainties. Such forward-looking statements in this presentation generally relate to future events, our ability to increase the number of support subscription customers, the growth in usage of the Hadoop framework, our ability to innovate and develop the various open source projects that will enhance the capabilities of the Hortonworks Data Platform, anticipated customer benefits and general business outlook. In some cases, you can identify forward-looking statements because they contain words such as “may,” “will,” “should,” “expects,” “plans,” “anticipates,” “could,” “intends,” “target,” “projects,” “contemplates,” “believes,” “estimates,” “predicts,” “potential” or “continue” or similar terms or expressions that concern our expectations, strategy, plans or intentions. You should not rely upon forward-looking statements as predictions of future events. We have based the forward-looking statements contained in this presentation primarily on our current expectations and projections about future events and trends that we believe may affect our business, financial condition and prospects. We cannot assure you that the results, events and circumstances reflected in the forward-looking statements will be achieved or occur, and actual results, events, or circumstances could differ materially from those described in the forward-looking statements. The forward-looking statements made in this prospectus relate only to events as of the date on which the statements are made and we undertake no obligation to update any of the information in this presentation. Trademarks Hortonworks is a trademark of Hortonworks, Inc. in the United States and other jurisdictions. Other names used herein may be trademarks of their respective owners.
  • 3. Speakers: Rakesh Saha, Product Management, Hortonworks; Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco
  • 4. Agenda • About Hortonworks • Cloudbreak – Docker-based Hadoop provisioning tool • Introduction to Docker • Hadoop Provisioning using Docker • Cisco and Hortonworks Collaboration
  • 5. About Hortonworks • The only 100% open source Apache Hadoop data platform • Founded in 2011 • First Hadoop distribution to go public (IPO Fall 2014, NASDAQ: HDP) • 322 subscription customers • 600+ employees across 17 countries • 1000+ technology partners
  • 6. Hortonworks Mission: Power your Modern Data Architecture with HDP and Enterprise Apache Hadoop. Customer Momentum • 300+ customers in seven quarters, growing at 75+ per quarter • Two-thirds of customers come from the Fortune 1000. Hortonworks and Hadoop at Scale • HDP in production on the largest clusters on the planet • Multiple 1000+ node clusters, including 35,000 nodes at Yahoo!; 800 nodes at Spotify • Founded in 2011 • Original 24 architects, developers and operators of Hadoop from Yahoo! • We are leaders in the Hadoop community • 500+ employees
  • 7. HDP is deeply integrated in the data center. [Diagram: data sources (existing systems, clickstream, web & social, geolocation, sensor & machine, server logs, unstructured) feed a data system of RDBMS, EDW, MPP and HDP (YARN with data access, data management, governance & integration, security and operations), surrounded by applications, dev & data tools, operational tools and infrastructure.] Deep Partnerships: Hortonworks engages in deep engineered relationships with the leaders in the data center, such as Cisco, Microsoft, EMC, Pivotal, Teradata, Red Hat, SAS & SAP. Broad Partnerships: Over 1,000 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users.
  • 9. Cloudbreak • Developed by SequenceIQ • Open source with Apache 2.0 license [ Apache project soon ] • Deploys selected services to public and private cloud via Ambari Blueprints • Elastic – can spin up any number of nodes, add/remove on the fly • Provides full cloud lifecycle management post-deployment
  • 10. Launch HDP on Any Cloud for Any Application. Cloudbreak: 1. Pick a Blueprint 2. Choose a Cloud 3. Launch HDP! Example Ambari Blueprints: IoT Apps (Storm, HBase, Hive), BI / Analytics (Hive), Data Science (Spark), Dev / Test (all HDP services)
  • 11. Hadoop in Cloud: Provisioning with Cloudbreak • Create Templates • Provide Blueprint • Associate Credentials • Launch Cluster
  • 16. Specialized Blueprints: quick productivity with pre-configured cluster blueprints • Lambda Architecture • Machine Learning • Batch ETL • …
  • 17. Optimize cloud usage via Elastic Clusters. Autoscaling Policy • Policies based on any Ambari metric • Coordinates with YARN • Policies can be metric-based or time-based • Scaling can be specific to a service or component type
  • 19. Provisioning – How it works • Start VMs with a running Docker daemon • Cloudbreak bootstrap: start a Consul cluster, then start a Swarm cluster (using Consul for discovery) • Start Ambari servers/agents through the Swarm API • Ambari services are registered in Consul (Registrator) • Post the Blueprint
  • 21. Docker is a "Shipping Container" System for Code. A multiplicity of stacks (static website, web frontend, user DB, queue, analytics DB) meets a multiplicity of hardware environments (development VM, QA server, public cloud, contributor's laptop, production cluster, customer data center). Docker is an engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container.
  • 22. Docker • Lightweight, portable • Build once, run anywhere • VM-like isolation without the overhead of a VM • Isolated containers • Automated and scripted
  • 23. Why Is Docker So Exciting? For Developers: Build once…run anywhere • A clean, safe, and portable runtime environment for your app • No missing dependencies, packages, etc. • Run each app in its own isolated container • Automate testing, integration, packaging • Reduce or eliminate concerns about compatibility on different platforms • Cheap, zero-penalty containers to deploy services. For DevOps: Configure once…run anything • Make the entire lifecycle more efficient, consistent, and repeatable • Eliminate inconsistencies between SDLC stages • Support segregation of duties • Significantly improve the speed and reliability of CI/CD • Significantly more lightweight than VMs
  • 24. Docker: Containers vs. VMs. [Diagram: in the VM stack, a server runs a host OS and a Type 2 hypervisor, and every app (App A, App A', App B) carries its own guest OS plus bins/libs. In the container stack, a server runs the host OS kernel and Docker, and apps run directly on it with only their bins/libs.] Containers are isolated but share only the kernel; the result is significantly faster deployment, much less overhead, easier migration and faster restart.
  • 26. Run Hadoop as Docker Containers. HDP as Docker containers via Cloudbreak • Running an Ambari cluster in containers • Use a Blueprint to define services • All HDP services on a host share a single container. [Diagram: Cloudbreak provisions VMs from cloud providers or bare metal, installs Ambari on the VMs (each Linux VM runs Docker), and instructs Ambari to build the HDP cluster.]
  • 27. Swarm + Consul for Placement and Discovery
  • 28. Cloudbreak: Run Hadoop as Docker containers. [Diagram: Cloudbreak managing six Docker hosts.]
  • 29. Cloudbreak: Run Hadoop as Docker containers. [Diagram: one host runs the Ambari server container (amb-ser); the other five run Ambari agent containers (amb-agn); a Blueprint is posted to the Ambari server.]
  • 30. Cloudbreak: Run Hadoop as Docker containers. [Diagram: after the Blueprint is applied, each agent container hosts its HDP services: amb-agn (HDFS, HBase), amb-agn (HDFS, Hive), amb-agn (HDFS, YARN), amb-agn (HDFS, ZooKeeper), amb-agn (NameNode, HDFS), alongside amb-ser.]
  • 31. Benefits of running Hadoop on Docker • Quick installation with pre-pulled RPMs • Same process/images for dev/QA/prod • Same process for single-node and multi-node clusters
  • 32. Demo
  • 43. Cisco and Hortonworks' Partnership • 100% open source Hadoop distribution, support and training • Integrated infrastructures for big data. Cisco and Hortonworks are partnering to help you build your big data solution and reach massive scalability, superior efficiency and dramatically lower total cost of ownership, thanks to a validated joint architecture.
  • 44. Results of the collaboration • Efficient Hadoop as a service • Adoption of Docker for enterprise Hadoop deployment
      Task | Cisco InterCloud | Public Cloud Provider
      HDP installation | 15:04 min | 11:55 min
      Teragen (average of 3 runs) | 7:08 min | 22:15 min
      Terasort (average of 3 runs) | 32:09 min | 60:12 min
      Teravalidate (average of 3 runs) | 2:31 min | 10:40 min
  • 45. Observations • Docker is maturing inside enterprises • Interest in running Docker on bare metal • Big data app developers are leaning towards containerization of apps • YARN is becoming an application deployment platform beyond big data apps • Demand for natively containerized, fully managed apps on YARN. Future Collaboration • Run Docker natively on OpenStack • Run Docker on YARN • OpenStack bare metal
  • 46. Conclusion: Blueprints (Data Science, IoT, BI / Analytics, Dev / Test) + HDP + Cisco InterCloud = Efficient Hadoop-as-a-service
  • 47. Learn More • Download the Hortonworks Sandbox • Learn Hadoop • Build Your Analytic App • Try Hadoop 2 • More about Cisco & Hortonworks: http://hortonworks.com/partner/cisco/ • More about Hortonworks' Acquisition of SequenceIQ: http://bit.ly/1R1ktxO
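Slides 9 to 11, 19 and 26 to 30 describe Cloudbreak posting an Ambari Blueprint and then asking Ambari to build the cluster. The sketch below shows roughly what those two calls could look like against Ambari's REST API (`POST /api/v1/blueprints/{name}` and `POST /api/v1/clusters/{name}`), using only the Python standard library. It is an illustration under stated assumptions, not Cloudbreak's code: the host `ambari.example.com`, the `admin`/`admin` credentials, HDP stack version 2.2, the cluster name `hdp-demo` and the two-host-group component layout are all hypothetical. The blueprint name `multinode-hdfs-yarn` is taken from note 17. Requests are only built, not sent.

```python
import base64
import json
import urllib.request

# Hypothetical Ambari endpoint; in Cloudbreak the server address would come from Consul.
AMBARI_URL = "http://ambari.example.com:8080/api/v1"


def make_blueprint(name: str) -> dict:
    """Minimal HDFS + YARN blueprint with one master and N workers (illustrative layout)."""
    master = ["NAMENODE", "SECONDARY_NAMENODE", "RESOURCEMANAGER",
              "HISTORYSERVER", "APP_TIMELINE_SERVER", "ZOOKEEPER_SERVER"]
    worker = ["DATANODE", "NODEMANAGER", "HDFS_CLIENT", "YARN_CLIENT",
              "MAPREDUCE2_CLIENT", "ZOOKEEPER_CLIENT"]
    return {
        "Blueprints": {"blueprint_name": name, "stack_name": "HDP", "stack_version": "2.2"},
        "host_groups": [
            {"name": "master", "cardinality": "1",
             "components": [{"name": c} for c in master]},
            {"name": "worker", "cardinality": "1+",
             "components": [{"name": c} for c in worker]},
        ],
    }


def make_cluster_template(blueprint: str, master: str, workers: list) -> dict:
    """Map the blueprint's host groups onto concrete hosts (FQDNs)."""
    return {
        "blueprint": blueprint,
        "default_password": "change-me",  # placeholder value, not a real default
        "host_groups": [
            {"name": "master", "hosts": [{"fqdn": master}]},
            {"name": "worker", "hosts": [{"fqdn": h} for h in workers]},
        ],
    }


def ambari_post(path: str, body: dict, user: str = "admin",
                password: str = "admin") -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to the Ambari REST API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{AMBARI_URL}/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Requested-By": "ambari", "Authorization": f"Basic {token}"},
        method="POST",
    )


if __name__ == "__main__":
    bp_name = "multinode-hdfs-yarn"
    workers = [f"node-{i}.example.com" for i in range(1, 11)]
    pending = [
        ambari_post(f"blueprints/{bp_name}", make_blueprint(bp_name)),
        ambari_post("clusters/hdp-demo",
                    make_cluster_template(bp_name, "node-0.example.com", workers)),
    ]
    for req in pending:
        # urllib.request.urlopen(req) would submit it to a live Ambari server.
        print(req.get_method(), req.full_url)
```

Keeping the blueprint (what runs) separate from the cluster template (where it runs) mirrors the slide 11 split between "Provide Blueprint" and "Launch Cluster": the same blueprint can be reused for a single-node dev cluster or a ten-worker production cluster.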
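The slide 44 timings are mm:ss durations, which makes the relative differences hard to read at a glance. The small helper below converts them to seconds and computes how many times longer each task took on the public cloud provider than on Cisco InterCloud; a ratio below 1 means InterCloud was slower. The numbers are copied from the slide; the function names are illustrative.

```python
# Timings from slide 44 as (Cisco InterCloud, public cloud provider), in mm:ss.
TIMINGS = {
    "HDP installation": ("15:04", "11:55"),
    "Teragen": ("7:08", "22:15"),
    "Terasort": ("32:09", "60:12"),
    "Teravalidate": ("2:31", "10:40"),
}


def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def ratios(timings: dict) -> dict:
    """Public-cloud time divided by InterCloud time, per task."""
    return {task: to_seconds(public) / to_seconds(intercloud)
            for task, (intercloud, public) in timings.items()}


if __name__ == "__main__":
    for task, r in ratios(TIMINGS).items():
        print(f"{task:16} {r:.2f}x")
```

Run as-is, this gives about 0.79x for HDP installation, 3.12x for Teragen, 1.87x for Terasort and 4.24x for Teravalidate: installation was somewhat faster on the public cloud provider, while the TeraSort suite ran roughly 2 to 4 times faster on InterCloud.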

Editor's Notes

  1. Deploying Hadoop on OpenStack has never been easy, but the Hortonworks and Cisco collaboration over the last few months has made it completely automated and seamless.
  2. This is a cautionary statement, as this presentation may describe product and collaboration directions that are subject to change.
  3. We were founded in 2011 by 24 developers from Yahoo, where Hadoop was conceived to address data challenges at internet scale. What we now know as Hadoop really started in 2005, when a team at Yahoo was directed to build out a large-scale data storage and processing technology that would allow them to improve their most critical application, Search. Their challenge was essentially two-fold: first, they needed to capture and archive the contents of the internet, and then they needed to process the data so that users could search through it effectively and efficiently. Traditional approaches were clearly impractical, both technically (due to the size of the data) and commercially (due to the cost). The result was the Apache Hadoop project, which delivered large-scale storage (HDFS) and processing (MapReduce). Today we have over 600 employees and have partnered with over 1,000 companies, including the leaders in the data center. We have also been very fortunate to achieve significant customer adoption, with over 330 customers as of the end of 2014, spanning nearly every vertical. Hortonworks was founded with the sole intent of making Hadoop an enterprise data platform. With YARN as its foundation, HDP delivers a centralized architecture with true multi-tenancy for data processing, plus shared services for security, governance and operations to satisfy enterprise requirements, all deeply integrated and certified with leading data center technologies. We are uniquely focused on this transformation of Hadoop and do our work completely in open source. This is all predicated on our leadership in the community, which not only enables us to best support Hadoop users but also lets us uniquely represent customer requirements within this open, thriving community.
  4. Hortonworks' approach is quite clear: we are focused on delivering enterprise-grade Hadoop as a reliable data platform that will enable your transition to a modern data architecture. To this end, we work solely within the broad open source community, focusing on innovation at the core of Apache Hadoop with YARN as a foundation, and then within all the related projects that deliver on key enterprise requirements such as governance, security and operations. Since our inception just three years ago, we have grown to more than 450 employees and have partnered closely with the leaders in the data center, all of whom share this vision: to enable a modern data architecture with Hadoop so that their customers can address the architectural challenge they all face due to exploding data volumes.
  5. Hortonworks' open platform approach enables us to partner and co-exist with other data center technologies. Our deep engineering relationships with data center leaders like Cisco make it possible for customers to augment their data centers with Hadoop technologies for their next-generation modern data architecture.
  6. Hortonworks' Hadoop platform already enabled Hadoop deployment in any environment, from Linux to Windows and from bare metal to cloud, so that the choice of deployment environment is a business decision rather than a technical one. Continuing this Hadoop Everywhere vision, Hortonworks' recent acquisition of SequenceIQ added a provisioning and auto-scaling toolset that makes it even easier to deploy Hadoop in private and public clouds and accelerates time-to-value for Hadoop deployments.
  7. Cloudbreak was developed by SequenceIQ, a company from the beautiful city of Budapest. Hortonworks acquired SequenceIQ in April. Cloudbreak is open source under the Apache 2.0 license and uses many other open source technologies as building blocks, including Docker. It is a Hadoop cluster deployment and management tool that can deploy an app- or use-case-specific Hadoop cluster to public and private cloud environments in a matter of minutes. It also provides ongoing cluster infrastructure management, including policy-based auto-scaling of clusters to optimize infrastructure usage.
  8. Cloudbreak enables launching a Hadoop cluster in four easy steps.
  9. Creating a template captures your Hadoop cluster infrastructure definition: node size and network setup. Cloudbreak supports heterogeneous instances for building the Hadoop cluster, because not all services or service components have the same resource requirements.
  10. Cloudbreak not only simplifies Hadoop cluster provisioning in the Cisco OpenStack cloud but also automatically scales Hadoop clusters based on SLA-based or time-based policies. The SLA is monitored through Hadoop service metrics captured by Ambari. This way, Cloudbreak lets you get elastic Hadoop clusters very quickly in the Cisco OpenStack cloud.
  11. Cloudbreak actively monitors Ambari metrics to assess the health of every Hadoop service. It allows defining policies based on these metrics for every cluster that is deployed and enabled for auto-scaling. Based on these metrics and user-defined policies, Cloudbreak can scale clusters or services by adding nodes or allocating more YARN containers, depending on the type of Hadoop service.
  12. A view from 10,000 ft: the only thing it needs is a Docker daemon. Cloud providers, including Cisco InterCloud, are all moving towards Docker.
  13. Quick question: how many of you have used Docker before? Docker is a container-based virtualization framework. It is an open platform for developers and admins to build, ship, and run distributed applications.
  14. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. Docker offers VM-like isolation in a lightweight, portable form, without the overhead of a VM.
  15. Unlike traditional virtualization, Docker is fast, lightweight and easy to use. Docker allows you to create containers holding all the dependencies for an application. Each container is kept isolated from the others and shares only the host kernel.
  16. Swarm can spawn Docker containers remotely on hosts, taking into account: 1. Resource management: it is aware of the cluster's resources and can schedule containers with bin packing (e.g. anywhere 1 GB of memory is available) or randomly. 2. Constraints using labels: label a node and start the container based on those labels. 3. Affinity: containers can be co-scheduled on the same host (link, volumes-from, net=container).
  17. The best of Hadoop, Docker and OpenStack in a single cloud platform for our joint customers.
      Description | Texas 3 | GCP
      VM type | GP2-2Xlarge | n1-standard-8
      Cores | 8 | 8
      Memory | 32 GB | 30 GB
      Volume size | 2 x 400 GB | 2 x 400 GB
      Volume type | HDD (magnetic) | generic (magnetic)
      Data node count | 10 | 10
      HDFS size | 8 TB | 8 TB
      YARN memory | 240 GB | 240 GB
      HDP blueprint | multinode-hdfs-yarn | multinode-hdfs-yarn
  18. We are expanding our cloud strategy to meet enterprise customer demand. Look at the top first. We've done a great job of taking our platform for private cloud and provisioning enterprise workloads. We've done a great job with UCS, with VBlock, with FlexPod. As a matter of fact, we are the leader in converged infrastructure today, and that market is expanding as customers look to Cisco and our partners to deliver enterprise workloads and the benefits of private cloud. They're also asking for DevOps models. They want to create truly native applications for the public cloud. They want to harness the value of Hadoop, big data analytics and HANA. And they want to leverage the collaborative platform present today. We are the leader in private cloud infrastructure. Along the left-hand side, our partners have done some amazing things: 3 million seats of HCS, and the IaaS platforms they've invested in, small, medium, large, local community-based infrastructure platforms. Some partners have enabled a PaaS platform. Some partners are hosting Microsoft applications, as Dimension Data does today around the world. Some partners have managed to build a Citrix or VMware virtual desktop offer. So what Cisco Cloud Services offer is an engine to generate more services that augment the capabilities we've invested in, and to do so in a way that only we could do together. You'll see us leverage extensions through innovations in the WebEx platform. You'll see that Meraki is a very powerful model to continue to expand. You'll hear more about the Unified Threat Defense portfolio and the comprehensive threat defense that we think only we can bring to the cloud. You'll see more about analytics and the platforms we have in store. You'll soon see more about HANA-as-a-Service. And all the capabilities we can bring will accelerate the offers we can bring to you.
Why not accelerate all of our capabilities together, using them in a way that no one else has? And by the way, we can't ignore the big public clouds. Let's use the Intercloud Fabric manager, when appropriate, to simply move a workload out to that public cloud. I don't care if it's Azure, Amazon or Google. Only Cisco can do this, through some of the innovations we have. How are we going to do this?
  19. Cisco Intercloud Fabric: Solution Overview
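Notes 10 and 11 and slide 17 describe scaling policies that fire either on an Ambari metric or on a time window, with the result applied to the cluster's node count. The following is a minimal, hypothetical evaluator for that idea, not Cloudbreak's actual policy API: the `Policy` fields, the metric name `pending_yarn_containers`, the threshold values and the 3 to 20 node limits are assumptions chosen for illustration, and time windows that cross midnight are deliberately not handled.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Dict, List, Optional


@dataclass
class Policy:
    """Hypothetical scaling policy: metric-based if `metric` is set, else time-based."""
    adjustment: int                  # nodes to add (>0) or remove (<0)
    metric: Optional[str] = None     # name of an Ambari-reported metric (assumed)
    threshold: float = 0.0           # fire when the metric value exceeds this
    start: Optional[time] = None     # time window [start, end), same day only
    end: Optional[time] = None


def triggered(policy: Policy, metrics: Dict[str, float], now: datetime) -> bool:
    """True if the policy's metric or time condition currently holds."""
    if policy.metric is not None:
        return metrics.get(policy.metric, 0.0) > policy.threshold
    if policy.start is not None and policy.end is not None:
        return policy.start <= now.time() < policy.end
    return False


def target_size(current: int, policies: List[Policy], metrics: Dict[str, float],
                now: datetime, min_nodes: int = 3, max_nodes: int = 20) -> int:
    """Apply every triggered policy, then clamp to the cluster's size limits."""
    size = current + sum(p.adjustment for p in policies if triggered(p, metrics, now))
    return max(min_nodes, min(max_nodes, size))


if __name__ == "__main__":
    policies = [
        Policy(adjustment=2, metric="pending_yarn_containers", threshold=10),
        Policy(adjustment=-1, start=time(22, 0), end=time(23, 59)),
    ]
    print(target_size(5, policies, {"pending_yarn_containers": 25},
                      datetime(2015, 6, 1, 14, 0)))   # prints 7
    print(target_size(5, policies, {"pending_yarn_containers": 0},
                      datetime(2015, 6, 1, 22, 30)))  # prints 4
```

Clamping after summing all triggered adjustments keeps a burst of simultaneous policies from shrinking the cluster below a safe minimum or growing it past a budget ceiling.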
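Note 16 lists the three things Swarm weighs when placing a container: free resources (bin packing or random), label constraints, and affinity with containers already on a host. The sketch below models those filters in a simplified way to show how they combine. It is a hypothetical illustration, not Swarm's scheduler code: the `Node` type, memory-only resource accounting and the host and container names are assumptions.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A Docker host as seen by a simplified, Swarm-style scheduler."""
    name: str
    free_mem_mb: int
    labels: dict = field(default_factory=dict)
    containers: set = field(default_factory=set)


def place(nodes, container: str, mem_mb: int, constraints: Optional[dict] = None,
          affinity: Optional[str] = None, strategy: str = "binpack",
          rng: Optional[random.Random] = None) -> Optional[Node]:
    """Pick a host for `container`, record the placement, and return the host."""
    constraints = constraints or {}
    candidates = [
        n for n in nodes
        if n.free_mem_mb >= mem_mb                                      # 1. resources
        and all(n.labels.get(k) == v for k, v in constraints.items())  # 2. label constraints
        and (affinity is None or affinity in n.containers)             # 3. affinity
    ]
    if not candidates:
        return None
    if strategy == "binpack":
        # Bin packing: fill the fullest host that still fits, keeping others free.
        chosen = min(candidates, key=lambda n: n.free_mem_mb)
    else:
        # Random strategy: any host that passed all three filters.
        chosen = (rng or random.Random()).choice(candidates)
    chosen.free_mem_mb -= mem_mb
    chosen.containers.add(container)
    return chosen


if __name__ == "__main__":
    nodes = [Node("host-1", 4096, {"role": "master"}),
             Node("host-2", 2048, {"role": "worker"}),
             Node("host-3", 8192, {"role": "worker"})]
    print(place(nodes, "amb-ser", 1024, constraints={"role": "master"}).name)    # host-1
    print(place(nodes, "amb-agn-1", 1024, constraints={"role": "worker"}).name)  # host-2
    print(place(nodes, "amb-agn-2", 1536, constraints={"role": "worker"}).name)  # host-3
    print(place(nodes, "consul-agent", 128, affinity="amb-ser").name)            # host-1
```

In the usage run, the second agent lands on host-3 only because bin packing already filled host-2 down to 1024 MB, and the affinity filter pins the helper container next to the Ambari server.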