© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Docker based Hadoop provisioning - anywhere
April 16th, 2015
Janos Matyas
Overview
• Introduction
• Goals and motivations
• Technology stack
• How it works
• Results/achievements/future plans
• Demo and Q&A
Goals and motivations
• Full Hadoop stack provisioning – everywhere
• Automate and unify the process
• Zero-configuration approach
• Same process through a cluster lifecycle (Dev, QA, UAT, Prod)
• Provide tooling - UI, REST API and CLI/shell
• Secure and multi-tenant
• SLA policy based autoscaling
Technology stack
• Docker
• Swarm
• Consul
• Apache Ambari
Docker
• Container based virtualization
• Lightweight and portable
• Build once, run anywhere
• Ease of packaging applications
• Automated and scripted
• Isolated
Docker – How it works
• Containers are isolated, but share the host OS kernel and, where appropriate, bins/libraries
• No need to emulate hardware
Swarm
• Native clustering for Docker
• Distributed container orchestration
• Same API as Docker
Swarm – How it works
• Swarm managers/agents
• Discovery services
• Advanced scheduling
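As a rough illustration of the scheduling bullet: standalone Swarm first filters out nodes that cannot host a container and then ranks the rest with a strategy such as spread or binpack. The sketch below is a toy model of that idea, not Swarm's code; `Node`, `pick_node` and the memory-only filter are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical, simplified view of a Docker host in the cluster.
    name: str
    total_memory_mb: int
    used_memory_mb: int = 0
    containers: int = 0

    def free_memory_mb(self) -> int:
        return self.total_memory_mb - self.used_memory_mb


def pick_node(nodes, memory_mb, strategy="spread"):
    """Return the node a new container would be placed on, or None."""
    # Filter step: keep only nodes that can still fit the container.
    candidates = [n for n in nodes if n.free_memory_mb() >= memory_mb]
    if not candidates:
        return None
    if strategy == "spread":
        # Spread: prefer the node running the fewest containers.
        return min(candidates, key=lambda n: n.containers)
    if strategy == "binpack":
        # Binpack: prefer the fullest node that still fits the container.
        return min(candidates, key=lambda n: n.free_memory_mb())
    raise ValueError(f"unknown strategy: {strategy}")


nodes = [
    Node("a", 4096, used_memory_mb=3072, containers=3),
    Node("b", 4096, used_memory_mb=1024, containers=1),
    Node("c", 4096, used_memory_mb=4000, containers=5),
]
print(pick_node(nodes, 512, "spread").name)
print(pick_node(nodes, 512, "binpack").name)
```

Spread keeps the blast radius of a lost host small, while binpack leaves whole hosts free for large containers.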
Consul
• Service discovery/registry
• Health checking
• Key/Value store
• DNS
• Multi datacenter aware
Consul – How it works
• Consul servers/agents
• Consistency through a quorum (Raft)
• Scalability through a gossip-based protocol (SWIM)
• Decentralized and fault tolerant
• Highly available
• Consistency over availability (CP)
• Multiple interfaces - HTTP and DNS
• Support for watches
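The quorum bullet can be made concrete with a little arithmetic: a Raft group of N servers needs a majority of N // 2 + 1 to commit a write, so it tolerates N minus that many server failures. A minimal sketch (the function names are illustrative, not part of Consul):

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a Raft group with the given number of servers."""
    if servers < 1:
        raise ValueError("a Raft group needs at least one server")
    return servers // 2 + 1


def failures_tolerated(servers: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return servers - quorum_size(servers)


for n in (1, 3, 4, 5):
    print(n, quorum_size(n), failures_tolerated(n))
```

Going from 3 to 4 servers raises the quorum without adding fault tolerance, which is why odd server counts are the usual choice.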
Apache Ambari
• Easy Hadoop cluster provisioning
• Management and monitoring
• Key feature - Blueprints
• REST API, CLI shell
• Extensible
• Stacks
• Services
• Views
Apache Ambari – How it works
• Ambari server/agents
• Define a blueprint (blueprint.json)
• Define a host mapping (hostmapping.json)
• Post the cluster create
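The three steps above can be sketched as two REST calls against Ambari's blueprint API. The example only builds the requests and does not send them. The server URL, host names, blueprint and cluster names, credentials and the heavily abbreviated component lists are all placeholder assumptions; a real blueprint needs a complete set of components and configurations.

```python
import base64
import json
import urllib.request

AMBARI_URL = "http://ambari.example.com:8080"  # hypothetical server address

# blueprint.json: which components run in which host group (abbreviated).
blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.2"},
    "host_groups": [
        {"name": "master", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
        {"name": "worker", "cardinality": "2",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
}

# hostmapping.json: which concrete hosts belong to each host group.
host_mapping = {
    "blueprint": "demo-blueprint",
    "host_groups": [
        {"name": "master", "hosts": [{"fqdn": "master1.example.com"}]},
        {"name": "worker", "hosts": [{"fqdn": "worker1.example.com"},
                                     {"fqdn": "worker2.example.com"}]},
    ],
}


def ambari_post(path, body, user="admin", password="admin"):
    """Build (but do not send) an authenticated POST request for Ambari."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        AMBARI_URL + path,
        data=json.dumps(body).encode(),
        method="POST",
        # Ambari rejects modifying requests that lack X-Requested-By.
        headers={"Authorization": "Basic " + token, "X-Requested-By": "ambari"},
    )


requests_to_send = [
    ambari_post("/api/v1/blueprints/demo-blueprint", blueprint),  # register blueprint
    ambari_post("/api/v1/clusters/demo-cluster", host_mapping),   # post cluster create
]
# Each request could then be sent with urllib.request.urlopen(request).
```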
Cloudbreak
Cloudbreak is a cloud-agnostic Hadoop as a Service API. It abstracts provisioning and eases management and monitoring of on-demand clusters.
The name comes from Cloudbreak, a powerful left surf break over a coral reef, a mile southwest of the island of Tavarua, Fiji.
Cloudbreak
• Benefits
• Zero configuration
• Elastic
• Secure
• Infrastructure agnostic
• Heterogeneous clusters
• Auto-scaling
• Main REST resources
• /template – specify an instance group infrastructure
• /stack – creates an infrastructure based on a template
• /blueprint – describes a Hadoop cluster
• /cluster – creates a Hadoop cluster
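One way to read the four resources is as a dependency order: a stack is built from a template, and a cluster is then installed on a stack according to a blueprint. The sketch below only checks that ordering. The four paths come from the slide, but every payload field and resource name is a hypothetical placeholder, not Cloudbreak's actual request schema.

```python
# Only the paths are taken from the slide; the payload fields are hypothetical.
PLAN = [
    ("POST", "/template", {"name": "worker-template"}),
    ("POST", "/stack", {"name": "demo-stack", "template": "worker-template"}),
    ("POST", "/blueprint", {"name": "demo-blueprint"}),
    ("POST", "/cluster", {"name": "demo-cluster", "stack": "demo-stack",
                          "blueprint": "demo-blueprint"}),
]


def check_order(plan):
    """Verify that every referenced resource is created before it is used."""
    created = set()
    for _method, path, body in plan:
        for kind in ("template", "stack", "blueprint"):
            ref = body.get(kind)
            if ref is not None and (kind, ref) not in created:
                raise ValueError(f"{path} refers to missing {kind} '{ref}'")
        created.add((path.lstrip("/"), body["name"]))
    return True


print(check_order(PLAN))
```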
Cloudbreak – How it works
• Start VMs - with a running Docker daemon
• Cloudbreak Bootstrap
• Start Consul Cluster
• Start Swarm Cluster (Consul for discovery)
• Start Ambari servers/agents - Swarm API
• Ambari services registered in Consul (Registrator)
• Post Blueprint
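The registration step above can be pictured as a Registrator-like process telling the local Consul agent about each container's published port, after which the service is resolvable through Consul DNS. A minimal sketch, assuming a simplified service ID scheme and placeholder addresses (Registrator's real ID format and options differ); the payload keys follow Consul's agent service registration endpoint, `/v1/agent/service/register`.

```python
import json


def registration_payload(container_name, address, port, tags=()):
    """Body for a PUT to the Consul agent's /v1/agent/service/register."""
    return {
        "ID": f"{container_name}:{port}",  # hypothetical ID scheme
        "Name": container_name,
        "Address": address,
        "Port": port,
        "Tags": list(tags),
    }


def dns_name(service, domain="consul"):
    """Name under which Consul's DNS interface answers for a service."""
    return f"{service}.service.{domain}"


# Placeholder container address; 8080 is Ambari server's default port.
payload = registration_payload("ambari-server", "10.0.0.5", 8080, tags=["ambari"])
print(json.dumps(payload, indent=2))
print(dns_name("ambari-server"))
```

With this in place, agents can reach the server by a stable DNS name instead of a container IP that changes on every restart.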
Cloudbreak - Features
• Support Kerberized clusters
• Cloudbreak “recipes”
• Automate host configuration
• Pre/post Ambari lifecycle hooks
• Services reconfiguration
• Automate/execute custom actions
• Side effects
• Ambari CLI/shell and Groovy based client
• Cloud Foundry’s UAA Dockerized
• Munchausen – bootstrap Swarm with Consul
• Dockerized full Hadoop stack (Apache Hadoop 42K+, Ambari 8K+, Spark 6K+ downloads)
Cloudbreak - Hadoop as a Service API
• Public tech preview
• Microsoft Azure
• Amazon AWS
• Google Cloud Platform
• OpenStack
• Private tech preview – R&D
• Bare metal
• Rackspace Managed Cloud
• HP Helion Public Cloud
*integration SPI is available
Periscope
Periscope is a heuristic Hadoop scheduler associated with a QoS profile. Built on YARN schedulers and on cloud and VM resource management APIs, it allows SLAs to be associated with applications and customers.
The name comes from Periscope, a powerful, fast, thick, top-to-bottom right-hander, eastward from Sumbawa's famous west coast.
Periscope
• Benefits
• Zero configuration
• Metric and time based alarms
• SLA policy based autoscaling
• Secure
• Hostgroup specific
• Main REST resources
• /clusters – specify a cluster to be monitored
• /alerts – time and metric based
• /policies – specify an SLA policy for a cluster based on an alarm
• /applications – specify an SLA policy for an application (under development)
Periscope – How it works
• Configures/monitors alarms in Ambari
• Sets up alarms and cooldown periods
• Manages cluster sizes
• Allows SLA scaling policies to be associated with alarms
• Orchestrates Cloudbreak to up/downscale the cluster
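How an alarm, a cooldown period and a scaling policy might interact can be sketched as a single decision function. This is an illustrative model under assumed semantics, not Periscope's implementation; `ScalingPolicy`, `next_size` and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    adjustment: int   # nodes to add (positive) or remove (negative)
    min_size: int     # lower bound on the cluster size
    max_size: int     # upper bound on the cluster size
    cooldown_s: int   # minimum seconds between two scaling actions


def next_size(policy, current_size, alarm_firing, now_s, last_scaling_s):
    """Return the cluster size to request after evaluating one alarm."""
    if not alarm_firing:
        return current_size
    # During the cooldown the previous scaling is still taking effect.
    if last_scaling_s is not None and now_s - last_scaling_s < policy.cooldown_s:
        return current_size
    target = current_size + policy.adjustment
    # Clamp to the bounds the policy allows.
    return max(policy.min_size, min(policy.max_size, target))


policy = ScalingPolicy(adjustment=2, min_size=3, max_size=10, cooldown_s=600)
print(next_size(policy, 4, True, now_s=1000, last_scaling_s=None))
print(next_size(policy, 6, True, now_s=1300, last_scaling_s=1000))
```

The cooldown prevents a still-firing alarm from triggering repeated upscaling while the newly added nodes are still joining the cluster.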
Demo and Q&A

More Related Content

What's hot

Designing Telco Scaled OpenStack Architectures
Designing Telco Scaled OpenStack ArchitecturesDesigning Telco Scaled OpenStack Architectures
Designing Telco Scaled OpenStack ArchitecturesSriram Subramanian
 
RedHat OpenStack Platform Overview
RedHat OpenStack Platform OverviewRedHat OpenStack Platform Overview
RedHat OpenStack Platform Overviewindevlab
 
Designing OpenStack Architectures
Designing OpenStack ArchitecturesDesigning OpenStack Architectures
Designing OpenStack ArchitecturesMirantis
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldSean Roberts
 
OpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2DOpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2DAlessandro Pilotti
 
Avishay Traeger & Shimshon Zimmerman, Stratoscale - Deploying OpenStack Cinde...
Avishay Traeger & Shimshon Zimmerman, Stratoscale - Deploying OpenStack Cinde...Avishay Traeger & Shimshon Zimmerman, Stratoscale - Deploying OpenStack Cinde...
Avishay Traeger & Shimshon Zimmerman, Stratoscale - Deploying OpenStack Cinde...Cloud Native Day Tel Aviv
 
Red Hat OpenStack - Open Cloud Infrastructure
Red Hat OpenStack - Open Cloud InfrastructureRed Hat OpenStack - Open Cloud Infrastructure
Red Hat OpenStack - Open Cloud InfrastructureAlex Baretto
 
An Intrudction to OpenStack 2017
An Intrudction to OpenStack 2017An Intrudction to OpenStack 2017
An Intrudction to OpenStack 2017Haim Ateya
 
OpenStack Explained: Learn OpenStack architecture and the secret of a success...
OpenStack Explained: Learn OpenStack architecture and the secret of a success...OpenStack Explained: Learn OpenStack architecture and the secret of a success...
OpenStack Explained: Learn OpenStack architecture and the secret of a success...Giuseppe Paterno'
 
Stratoscale Latest and Greatest
Stratoscale Latest and GreatestStratoscale Latest and Greatest
Stratoscale Latest and GreatestZach Lanksbury
 
February 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerFebruary 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerYahoo Developer Network
 
Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemOwen O'Malley
 
Better, Faster, Cheaper Infrastructure: Apache CloudStack and Riak CS
Better, Faster, Cheaper Infrastructure: Apache CloudStack and Riak CSBetter, Faster, Cheaper Infrastructure: Apache CloudStack and Riak CS
Better, Faster, Cheaper Infrastructure: Apache CloudStack and Riak CSJohn Burwell
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittalbuildacloud
 
Enterprise Ready OpenStack, Wiekus Beukes, Oracle
Enterprise Ready OpenStack,  Wiekus Beukes, OracleEnterprise Ready OpenStack,  Wiekus Beukes, Oracle
Enterprise Ready OpenStack, Wiekus Beukes, OracleSriram Subramanian
 
OpenStack 101 Presentation
OpenStack 101 PresentationOpenStack 101 Presentation
OpenStack 101 PresentationEVault
 
Introduction To OpenStack
Introduction To OpenStackIntroduction To OpenStack
Introduction To OpenStackHaim Ateya
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila finalWei Ting Chen
 
NoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessNoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessArun Gupta
 

What's hot (20)

Designing Telco Scaled OpenStack Architectures
Designing Telco Scaled OpenStack ArchitecturesDesigning Telco Scaled OpenStack Architectures
Designing Telco Scaled OpenStack Architectures
 
RedHat OpenStack Platform Overview
RedHat OpenStack Platform OverviewRedHat OpenStack Platform Overview
RedHat OpenStack Platform Overview
 
Designing OpenStack Architectures
Designing OpenStack ArchitecturesDesigning OpenStack Architectures
Designing OpenStack Architectures
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
AWS-compared-to-OpenStack
AWS-compared-to-OpenStackAWS-compared-to-OpenStack
AWS-compared-to-OpenStack
 
OpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2DOpenStack + Nano Server + Hyper-V + S2D
OpenStack + Nano Server + Hyper-V + S2D
 
Avishay Traeger & Shimshon Zimmerman, Stratoscale - Deploying OpenStack Cinde...
Avishay Traeger & Shimshon Zimmerman, Stratoscale - Deploying OpenStack Cinde...Avishay Traeger & Shimshon Zimmerman, Stratoscale - Deploying OpenStack Cinde...
Avishay Traeger & Shimshon Zimmerman, Stratoscale - Deploying OpenStack Cinde...
 
Red Hat OpenStack - Open Cloud Infrastructure
Red Hat OpenStack - Open Cloud InfrastructureRed Hat OpenStack - Open Cloud Infrastructure
Red Hat OpenStack - Open Cloud Infrastructure
 
An Intrudction to OpenStack 2017
An Intrudction to OpenStack 2017An Intrudction to OpenStack 2017
An Intrudction to OpenStack 2017
 
OpenStack Explained: Learn OpenStack architecture and the secret of a success...
OpenStack Explained: Learn OpenStack architecture and the secret of a success...OpenStack Explained: Learn OpenStack architecture and the secret of a success...
OpenStack Explained: Learn OpenStack architecture and the secret of a success...
 
Stratoscale Latest and Greatest
Stratoscale Latest and GreatestStratoscale Latest and Greatest
Stratoscale Latest and Greatest
 
February 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with DockerFebruary 2016 HUG: Running Spark Clusters in Containers with Docker
February 2016 HUG: Running Spark Clusters in Containers with Docker
 
Running An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid ThemRunning An Apache Project: 10 Traps and How to Avoid Them
Running An Apache Project: 10 Traps and How to Avoid Them
 
Better, Faster, Cheaper Infrastructure: Apache CloudStack and Riak CS
Better, Faster, Cheaper Infrastructure: Apache CloudStack and Riak CSBetter, Faster, Cheaper Infrastructure: Apache CloudStack and Riak CS
Better, Faster, Cheaper Infrastructure: Apache CloudStack and Riak CS
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittal
 
Enterprise Ready OpenStack, Wiekus Beukes, Oracle
Enterprise Ready OpenStack,  Wiekus Beukes, OracleEnterprise Ready OpenStack,  Wiekus Beukes, Oracle
Enterprise Ready OpenStack, Wiekus Beukes, Oracle
 
OpenStack 101 Presentation
OpenStack 101 PresentationOpenStack 101 Presentation
OpenStack 101 Presentation
 
Introduction To OpenStack
Introduction To OpenStackIntroduction To OpenStack
Introduction To OpenStack
 
20151027 sahara + manila final
20151027 sahara + manila final20151027 sahara + manila final
20151027 sahara + manila final
 
NoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern SuccessNoSQL - Vital Open Source Ingredient for Modern Success
NoSQL - Vital Open Source Ingredient for Modern Success
 

Similar to Docker based Hadoop provisioning - anywhere

One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)DataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureDataWorks Summit
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxDataWorks Summit
 
Oracle IaaS including OCM and Ravello
Oracle IaaS including OCM and RavelloOracle IaaS including OCM and Ravello
Oracle IaaS including OCM and RavelloAndrey Akulov
 
The Power of Java and Oracle WebLogic Server in the Public Cloud (OpenWorld, ...
The Power of Java and Oracle WebLogic Server in the Public Cloud (OpenWorld, ...The Power of Java and Oracle WebLogic Server in the Public Cloud (OpenWorld, ...
The Power of Java and Oracle WebLogic Server in the Public Cloud (OpenWorld, ...jeckels
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureDataWorks Summit
 
Embracing SOA and the Cloud
Embracing SOA and the CloudEmbracing SOA and the Cloud
Embracing SOA and the CloudHeba Fouad
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureVinod Kumar Vavilapalli
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureDataWorks Summit
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNHortonworks
 
Hybrid and On-premise AWS workloads using HP Helion Eucalyptus
Hybrid and On-premise AWS workloads using HP Helion EucalyptusHybrid and On-premise AWS workloads using HP Helion Eucalyptus
Hybrid and On-premise AWS workloads using HP Helion EucalyptusVedanta Barooah
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudDataWorks Summit
 
D-DAY 2015 Paas ORACLE
D-DAY 2015 Paas ORACLED-DAY 2015 Paas ORACLE
D-DAY 2015 Paas ORACLEDEVOPS D-DAY
 
Cloud for agile_sw_projects-final
Cloud for agile_sw_projects-finalCloud for agile_sw_projects-final
Cloud for agile_sw_projects-finalAlain Delafosse
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
Oracle - Continuous Delivery NYC meetup, June 07, 2018
Oracle - Continuous Delivery NYC meetup, June 07, 2018Oracle - Continuous Delivery NYC meetup, June 07, 2018
Oracle - Continuous Delivery NYC meetup, June 07, 2018Oracle Developers
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitDataWorks Summit
 
Accumulo Summit 2014: Monitoring Apache Accumulo
Accumulo Summit 2014: Monitoring Apache AccumuloAccumulo Summit 2014: Monitoring Apache Accumulo
Accumulo Summit 2014: Monitoring Apache AccumuloAccumulo Summit
 

Similar to Docker based Hadoop provisioning - anywhere (20)

One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
 
Oracle IaaS including OCM and Ravello
Oracle IaaS including OCM and RavelloOracle IaaS including OCM and Ravello
Oracle IaaS including OCM and Ravello
 
The Power of Java and Oracle WebLogic Server in the Public Cloud (OpenWorld, ...
The Power of Java and Oracle WebLogic Server in the Public Cloud (OpenWorld, ...The Power of Java and Oracle WebLogic Server in the Public Cloud (OpenWorld, ...
The Power of Java and Oracle WebLogic Server in the Public Cloud (OpenWorld, ...
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and Future
 
Embracing SOA and the Cloud
Embracing SOA and the CloudEmbracing SOA and the Cloud
Embracing SOA and the Cloud
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARN
 
Hybrid and On-premise AWS workloads using HP Helion Eucalyptus
Hybrid and On-premise AWS workloads using HP Helion EucalyptusHybrid and On-premise AWS workloads using HP Helion Eucalyptus
Hybrid and On-premise AWS workloads using HP Helion Eucalyptus
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And CloudYARN Containerized Services: Fading The Lines Between On-Prem And Cloud
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
 
D-DAY 2015 Paas ORACLE
D-DAY 2015 Paas ORACLED-DAY 2015 Paas ORACLE
D-DAY 2015 Paas ORACLE
 
Apache Slider
Apache SliderApache Slider
Apache Slider
 
Cloud for agile_sw_projects-final
Cloud for agile_sw_projects-finalCloud for agile_sw_projects-final
Cloud for agile_sw_projects-final
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
Oracle - Continuous Delivery NYC meetup, June 07, 2018
Oracle - Continuous Delivery NYC meetup, June 07, 2018Oracle - Continuous Delivery NYC meetup, June 07, 2018
Oracle - Continuous Delivery NYC meetup, June 07, 2018
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Accumulo Summit 2014: Monitoring Apache Accumulo
Accumulo Summit 2014: Monitoring Apache AccumuloAccumulo Summit 2014: Monitoring Apache Accumulo
Accumulo Summit 2014: Monitoring Apache Accumulo
 

Recently uploaded

DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....ShaimaaMohamedGalal
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 

Recently uploaded (20)

DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Right Money Management App For Your Financial Goals

Docker based Hadoop provisioning - anywhere

  • 1. Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Docker based Hadoop provisioning - anywhere April 16th, 2015 Janos Matyas
  • 2. Overview • Introduction • Goals and motivations • Technology stack • How it works • Results/achievements/future plans • Demo and Q&A
  • 3. Goals and motivations • Full Hadoop stack provisioning – everywhere • Automate and unify the process • Zero-configuration approach • Same process through a cluster lifecycle (Dev, QA, UAT, Prod) • Provide tooling - UI, REST API and CLI/shell • Secure and multi-tenant • SLA policy based autoscaling
  • 4. Technology stack • Docker • Swarm • Consul • Apache Ambari
  • 5. Docker • Container based virtualization • Lightweight and portable • Build once, run anywhere • Ease of packaging applications • Automated and scripted • Isolated
  • 6. Docker – How it works • Containers are isolated, but share OS and bins/libraries • No need to emulate hardware
  • 7. Swarm • Native clustering for Docker • Distributed container orchestration • Same API as Docker
  • 8. Swarm – How it works • Swarm managers/agents • Discovery services • Advanced scheduling
  • 9. Consul • Service discovery/registry • Health checking • Key/Value store • DNS • Multi datacenter aware
  • 10. Consul – How it works • Consul servers/agents • Consistency through a quorum (RAFT) • Scalability due to gossip based protocol (SWIM) • Decentralized and fault tolerant • Highly available • Consistency over availability (CP) • Multiple interfaces - HTTP and DNS • Support for watches
  • 11. Apache Ambari • Easy Hadoop cluster provisioning • Management and monitoring • Key feature - Blueprints • REST API, CLI shell • Extensible • Stacks • Services • Views
  • 12. Apache Ambari – How it works • Ambari server/agents • Define a blueprint (blueprint.json) • Define a host mapping (hostmapping.json) • Post the cluster create
  • 13. Cloudbreak Cloudbreak is a cloud-agnostic Hadoop as a Service API. It abstracts provisioning and eases the management and monitoring of on-demand clusters. Cloudbreak is a powerful left surf that breaks over a coral reef, a mile southwest of the island of Tavarua, Fiji.
  • 14. Cloudbreak • Benefits • Zero configuration • Elastic • Secure • Infrastructure agnostic • Heterogeneous clusters • Auto-scaling • Main REST resources • /template – specify an instance group infrastructure • /stack – creates an infrastructure based on a template • /blueprint – describes a Hadoop cluster • /cluster – creates a Hadoop cluster
  • 15. Cloudbreak – How it works • Start VMs - with a running Docker daemon • Cloudbreak Bootstrap • Start Consul Cluster • Start Swarm Cluster (Consul for discovery) • Start Ambari servers/agents - Swarm API • Ambari services registered in Consul (Registrator) • Post Blueprint
  • 16. Cloudbreak - Features • Support Kerberized clusters • Cloudbreak “recipes” • Automate host configuration • Pre/post Ambari lifecycle hooks • Services reconfiguration • Automate/execute custom actions • Side effects • Ambari CLI/shell and Groovy based client • Cloud Foundry’s UAA Dockerized • Munchausen – bootstrap Swarm with Consul • Dockerized full Hadoop stack (Apache Hadoop 42K+, Ambari 8K+, Spark 6K+ downloads)
  • 17. Cloudbreak - Hadoop as a Service API • Public tech preview • Microsoft Azure • Amazon AWS • Google Cloud Platform • OpenStack • Private tech preview – R&D • Bare metal • Rackspace Managed Cloud • HP Helion Public Cloud *integration SPI is available
  • 18. Periscope Periscope is a heuristic Hadoop scheduler associated with a QoS profile. Built on YARN schedulers and on cloud and VM resource management APIs, it allows SLAs to be associated with applications and customers. Periscope is a powerful, fast, thick and top-to-bottom right-hander, eastward from Sumbawa's famous west coast.
  • 19. Periscope • Benefits • Zero configuration • Metric and time based alarms • SLA policy based autoscaling • Secure • Hostgroup specific • Main REST resources • /clusters – specify a cluster to be monitored • /alerts – time and metric based • /policies – specify an SLA policy for a cluster based on an alarm • /applications – specify an SLA policy for an application (under development)
  • 20. Periscope – How it works • Configures/monitors alarms in Ambari • Sets up alarm and cooldown periods • Manages cluster sizes • Allows SLA scaling policies to be associated with alarms • Orchestrates Cloudbreak to up/downscale the cluster
  • 21. Demo and Q&A
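
The Ambari flow on slide 12 (define a blueprint, define a host mapping, post the cluster create) can be sketched roughly as follows. This is a minimal illustration, not the presenter's code: the blueprint name, cluster name, host FQDNs and component lists are hypothetical, and the sketch only builds the request paths and JSON bodies in the general shape used by Ambari's Blueprints REST API (`/api/v1/blueprints/<name>` and `/api/v1/clusters/<name>`) without sending anything.

```python
import json

AMBARI_API = "/api/v1"  # Ambari REST API prefix


def build_blueprint(name, stack_name, stack_version, host_groups):
    """Return (path, body) for registering a blueprint (blueprint.json)."""
    body = {
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
        "host_groups": [
            {
                "name": group,
                "cardinality": str(spec["cardinality"]),
                "components": [{"name": c} for c in spec["components"]],
            }
            for group, spec in host_groups.items()
        ],
    }
    return f"{AMBARI_API}/blueprints/{name}", body


def build_host_mapping(cluster_name, blueprint_name, hosts_by_group):
    """Return (path, body) for the cluster create call (hostmapping.json)."""
    body = {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": h} for h in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }
    return f"{AMBARI_API}/clusters/{cluster_name}", body


if __name__ == "__main__":
    # Hypothetical two-group layout; all names below are illustrative only.
    bp_path, bp_body = build_blueprint(
        "demo-blueprint", "HDP", "2.2",
        {
            "master": {"cardinality": 1, "components": ["NAMENODE", "RESOURCEMANAGER"]},
            "worker": {"cardinality": 2, "components": ["DATANODE", "NODEMANAGER"]},
        },
    )
    cl_path, cl_body = build_host_mapping(
        "demo-cluster", "demo-blueprint",
        {"master": ["m1.example.internal"],
         "worker": ["w1.example.internal", "w2.example.internal"]},
    )
    # Each pair would be POSTed to the Ambari server, blueprint first.
    print("POST", bp_path, json.dumps(bp_body, indent=2))
    print("POST", cl_path, json.dumps(cl_body, indent=2))
```

Keeping the blueprint (what runs in each host group) separate from the host mapping (which machines fill each group) is what lets the same blueprint be reused across environments.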

Editor's Notes

  1. Two days ago I was working for SequenceIQ, as the CTO.
  2. ----- Meeting Notes (10/04/15 20:35) ----- SequenceIQ has been acquired. Started in February, quickly gained traction around June.
  3. ----- Meeting Notes (10/04/15 20:38) ----- We were doing this over and over again. Scripted, Ansible, tried everything and all existing tools.
  4. ----- Meeting Notes (10/04/15 20:38) ----- Architecturally most important components
  5. ----- Meeting Notes (10/04/15 20:56) ----- Under the hood, Docker is built on: 1. cgroup and namespacing capabilities of the Linux kernel 2. Docker image specification - filesystem composed of layers, presented as one cohesive filesystem. Recommended 3.8, works from 2.6.2 3. Libcontainer specification - namespacing, filesystem, resources (cgroups)
  6. ----- Meeting Notes (10/04/15 20:56) ----- Docker simplifies things - on one host. We spin up containers remotely on many hosts - how? Swarm pulls together many Docker engines and presents them as one virtual Docker Engine.
  7. ----- Meeting Notes (10/04/15 20:56) ----- Steps: Can spin up Docker containers remotely on hosts considering: 1. Resource management - aware of the cluster resources (e.g. can schedule it with bin packing - anywhere where 1GB memory is available) or randomly 2. Constraints using labels (label one node and start the container based on labels) 3. Affinity - containers can be co-scheduled (link, volumes-from, net=container on the same host)
  8. ----- Meeting Notes (10/04/15 21:05) ----- We have a dynamically scaling cluster where nodes are coming/leaving but also failing. Register services in Consul, like Ambari services. Zookeeper, doozerd, etcd – same as Consul, require a quorum, offer strong consistency, but are not datacenter aware. Zookeeper: no service discovery, offers primitive K/V, no DNS, does not go through DC. Zookeeper provides ephemeral nodes – but clients still need to have keep-alive connections.
  9. Agent – long running daemon, serves DNS and HTTP interface, runs on every node. Client – an agent that forwards all RPC to a server. Takes part in LAN gossip. Server - participates in RAFT quorum, responds to RPC, WAN gossip. Datacenter – low latency, high bandwidth private network. Gossip – TCP and UDP unicast. Usually broadcast/multicast does not work in the cloud. Strong consistency: the service catalog stores all the nodes, service instances, health check data, ACLs, and Key/Value information. It is strongly consistent, and replicated using the consensus protocol. Gossip – eventual consistency; updates to the catalog come through gossip, thus state can lag behind until it is reconciled.
  10. Most likely you’ve seen an Ambari session. It's extensible: Stacks – set of services, multiple versions (e.g. HDP 2.1, HDP 2.2, Bigtop). Services – e.g. HDFS, Kafka, Zeppelin. Views – capability to add visualization, management and monitoring capabilities of a new “application”.
  11. Pre-install the server and agents.
  12. Combining all these – welcome Cloudbreak. Zero configuration way to provision HDP clusters – anywhere by the push of a button, CLI or API. One consistent infrastructure agnostic API.
  13. ----- Meeting Notes (10/04/15 21:47) ----- Expand on points. No configuration, need to have a running infrastructure. Any size - 200 nodes in 8 min. OAuth2, gateway (Knox will come), TLS. Since YARN - different services - different instance types: e.g. Spark - high memory, Kafka - high disk throughput but memory as well to buffer active reads/writes. Scale based on load.
  14. View from 10,000 meters high. The only thing we need is a Docker daemon. All cloud providers are going towards Docker.
  15. Kerberos – we take the pain (Dockerized a Kerberos server). Recipes – built on Consul events, read results from the K/V store. Anybody can push their own plugin: we use plugn – install your plugin, and use it from Cloudbreak. We did different projects, fixed quite a few interesting problems.
  16. Zero config, does not require pre-installation. Can set alarms – SLA policies are based on alarms. ----- Meeting Notes (10/04/15 22:04) ----- New features in Hadoop 2.6. Our contribution, plus lots of others (move applications between queues), admission control - reserve capacity over time. Most likely Vinod explained all these.
  17. Mention Baywatch. ELK (Elasticsearch, Logstash, Kibana) – aggregate logs and metrics.
  18. Will be a Webex
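
The scheduling considerations in note 7 (resource awareness with bin packing, plus label constraints) can be illustrated with a small sketch. This is not Swarm's implementation: the `Node` class, the `schedule` function and the "least free memory that still fits" rule are simplifying assumptions chosen to show the idea, and affinity is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical Docker host as seen by the scheduler."""
    name: str
    free_memory_mb: int
    labels: dict = field(default_factory=dict)


def schedule(nodes, memory_mb, constraints=None):
    """Pick a node for a container, or return None if nothing fits.

    1. Filter: keep nodes with enough free memory whose labels match
       every requested constraint.
    2. Bin-pack: among those, choose the node with the least free memory,
       so that emptier nodes stay available for larger containers.
    """
    constraints = constraints or {}
    candidates = [
        n for n in nodes
        if n.free_memory_mb >= memory_mb
        and all(n.labels.get(k) == v for k, v in constraints.items())
    ]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda n: n.free_memory_mb)
    chosen.free_memory_mb -= memory_mb  # reserve the memory on that node
    return chosen


if __name__ == "__main__":
    cluster = [
        Node("node-a", 4096, {"storage": "ssd"}),
        Node("node-b", 2048),
        Node("node-c", 1024, {"storage": "ssd"}),
    ]
    first = schedule(cluster, 1024)                       # tightest fit
    second = schedule(cluster, 1024, {"storage": "ssd"})  # label constraint
    print(first.name, second.name)
```

A random strategy, which the note also mentions, would replace the `min(...)` selection with a random choice among the candidates.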