SlideShare a Scribd company logo
1 of 23
Cloudbreak – Technical Deep Dive
Janos Matyas & Krisztian Horvath
Hortonworks
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Presenters
Krisztian Horvath
Senior Member of technical staff, Cloudbreak
Former Co-Founder at SequenceIQ
Janos Matyas
Senior Director of Engineering, Cloudbreak
Former Co-Founder and CTO for SequenceIQ
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Goals and Motivations
Technology Stack + Deep Dive
Lessons Learned + Best Practices
Demo + Q & A
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Goals and Motivations – What We Wanted to Do…
 Declarative/full Hadoop stack provisioning in all major cloud providers
 Automate and unify the process
 Zero-configuration approach
 Same process through a cluster lifecycle (Dev, QA, UAT, Prod)
 Provide tooling - UI, REST API and CLI/shell
 Secure and multi-tenant
 SLA policy based autoscaling
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Goals and Motivations – What We Wanted to Do…
 All cloud providers are fundamentally different…
 Compute, network, security, performance
 We want to share what we found, and how we made it work!
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Goals and Motivations
Technology Stack + Deep Dive
Lessons Learned + Best Practices
Demo + Q & A
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Technology Stack
 Apache Ambari
 Cloud provider API
 Salt
 Docker
 Packer
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Deep Dive - Overview
 Cloudbreak Deployer (CBD)
– Tool to deploy the Cloudbreak application
– Microservice architecture (using Docker)
– DevOps friendly
 Cloudbreak Application
– Extensible, available through UI, CLI, REST API
– SLA auto-scaling policy management
 Cluster deployed with Cloudbreak
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Deep Dive – Cloudbreak Deployer
 Installation
– Single binary, written in Go
– Requires Docker 1.9.1+
– DIY installation on any RHEL / CentOS / Oracle Linux 7 (64-bit) distro
– Use one of the pre-built cloud images (AWS, Azure, GCP, OpenStack)
 Operations
– Easy upgrades/downgrades, automatic schema migration
 Cloud provider support
– AWS – generates IAM roles
– Azure – ARM and DASH config
 Utilities
– Cloudbreak shell support - interactive, remote, automated execution, OAuth2 token generation
– Local development environment setup
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Deep Dive – Cloudbreak Application
 Installation
– Done with Cloudbreak Deployer (CBD)
 Operations
– Consistent feature set through UI, CLI and secure REST API
– Multi-tenant, ACL setup, usage reports
– Custom stack repositories, failure actions
– Event history, cluster management
– SLA based auto-scaling policy configs, enforcement
 Cloud provider support
– Agnostic API
– AWS, Azure, GCP, OpenStack, Mesos
– SPI interface – bring your own provider, stack under Cloudbreak management
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Deep Dive – Cluster deployed with Cloudbreak
 Installation
– Managed by Cloudbreak using cloud provider API
– Default (optimized) configs – specific to cloud provider
 Operations
– Default, custom configs for stacks, services, network, storage, security
– Declarative Hadoop cluster
– Custom instance types (heterogeneous clusters)
– Different storage types
– Configurable network
– Security (access, Kerberos, SSSD, FreeIPA)
 Utilities
– Ambari Views
– Metadata/shared clusters support
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Goals and Motivations
Technology Stack + Deep Dive
Lessons Learned + Best Practices
Demo + Q & A
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons Learned
 Not all cloud providers are the same
– Difference in performance, storage and functionality
 (Capacity) planning
– Based on workload type (batch / interactive and ad-hoc / long running)
– Use heterogeneous clusters
– Trial and error – mistakes are cheap, iterate until you find your best fit
– Leverage the cloud - scale your cluster on demand
 Number one consideration – storage
– Multiple choices (ephemeral, block storage and BLOB store)
– Bring compute to storage – might not work (everywhere) – in cloud everything is as a service
– Independently scale storage from compute, partition your data
 Security
– Consider using strict security rules (private subnets, access, etc) and use edge nodes
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons Learned - AWS
 Compute
– Find your instance types for the workload, use heterogeneous clusters
– Different instance types for transient (e.g. C4, M4) and long running (e.g. H2, D2) clusters
– Dedicated instances (to avoid noise, regulations e.g. HIPPA)
 Storage
– Use latest version of Hadoop (Hortonworks contributed cloud specific optimizations)
– Note that S3 gives you only eventual consistency
– Different driver implementation: S3n (native, jets3t based), S3a (successor of n) , S3 (block based)
 Network
– Use enhanced networking (Amazon Linux by default, RHEL based – apply patch)
– Placement groups
– Not all instance types can use the 10Gbit network (e.g. use 8x)
 Security
– Use instance roles to access S3, deploy in a private subnet/VPC
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons Learned - AWS
* D28xlarge used as instance type
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons Learned - AWS
* D28xlarge used as instance type
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons Learned - Azure
 Compute
– Find your instance types for the workload, use heterogeneous clusters
– Different instance types for transient (e.g. A and D family) and long running (e.g. Dv2) clusters
– Use ARM instead of old API
 Storage
– Use latest version of Hadoop (Hortonworks contributed cloud specific optimizations)
– Storage account scaling limitations
– Use WASB or WASB with DASH (default with Cloudbreak)
– Azure Data Lake Store – soon
– Ephemeral disk is faster than root disk – does not survive auto-updates
 Network
– No PTR record/reverse lookup support
 Security
– Integrate/sync with your corporate AD
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons Learned - Azure
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons Learned - GCP
 Compute
– Find your instance types for the workload, use heterogeneous clusters
– No template based provisioning
 Storage
– Use latest version of Hadoop (Hortonworks contributed cloud specific optimizations)
– Use Google Cloud Storage Connector
 Network
– Network isolation/DNS problem
 Security
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons Learned - OpenStack
 Compute
– Find your instance types for the workload, use heterogeneous clusters
– Use Heat templates instead of API calls (we support both)
 Storage
– Currently we support only Cinder volumes
– Swift and Ceph is planned
– Data locality through Cloudbreak – let us know your topology or rack/hypervisor mapping
 Network
– Configure DNS properly
– Use multiple network (Neutron) nodes in case of a large cluster
 Security
– Use Keystone 3 (support for OAuth, Federation, introduction of groups/domains)
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Lessons Learned - Mesos
 In Tech Preview
– come and talk to us after the talk
– Or @Hortonworks boot
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Goals and Motivations
Technology Stack + Deep Dive
Lessons Learned + Best Practices
Demo + Q & A
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thank You

More Related Content

What's hot

Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryDataWorks Summit/Hadoop Summit
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Cloudera, Inc.
 
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopSuccesses, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopDataWorks Summit/Hadoop Summit
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionCloudera, Inc.
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Cécile Poyet
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...DataWorks Summit
 
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Cedric CARBONE
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetDataWorks Summit
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataDataWorks Summit
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHortonworks
 
Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationmattlieber
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingDataWorks Summit
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Cloudera, Inc.
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseDataWorks Summit
 

What's hot (20)

Hybrid is the New Normal
Hybrid is the New NormalHybrid is the New Normal
Hybrid is the New Normal
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
 
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to HadoopSuccesses, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
Successes, Challenges, and Pitfalls Migrating a SAAS business to Hadoop
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
 
ebay
ebayebay
ebay
 
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
Startup Case Study: Leveraging the Broad Hadoop Ecosystem to Develop World-Fi...
 
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
 
Format Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and ParquetFormat Wars: from VHS and Beta to Avro and Parquet
Format Wars: from VHS and Beta to Avro and Parquet
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Enabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government dataEnabling Modern Application Architecture using Data.gov open government data
Enabling Modern Application Architecture using Data.gov open government data
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
 
Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluation
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral Processing
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
 

Viewers also liked

Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere Janos Matyas
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Janos Matyas
 
Introduction to Hortonworks Data Cloud for AWS
Introduction to Hortonworks Data Cloud for AWSIntroduction to Hortonworks Data Cloud for AWS
Introduction to Hortonworks Data Cloud for AWSYifeng Jiang
 
Apache Mesos: a simple explanation of basics
Apache Mesos: a simple explanation of basicsApache Mesos: a simple explanation of basics
Apache Mesos: a simple explanation of basicsGladson Manuel
 
Datacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCDatacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCPaco Nathan
 
DockerCon SF 2015: Scaling New Services
DockerCon SF 2015: Scaling New ServicesDockerCon SF 2015: Scaling New Services
DockerCon SF 2015: Scaling New ServicesDocker, Inc.
 
AWS Lambda: Event-driven Code in the Cloud
AWS Lambda: Event-driven Code in the CloudAWS Lambda: Event-driven Code in the Cloud
AWS Lambda: Event-driven Code in the CloudAmazon Web Services
 
Are you paying attention
Are you paying attentionAre you paying attention
Are you paying attentionHiba Hamdan
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureDataWorks Summit
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosDataWorks Summit
 
Data encoding and Metadata for Streams
Data encoding and Metadata for StreamsData encoding and Metadata for Streams
Data encoding and Metadata for Streamsunivalence
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNDataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere Docker based Hadoop provisioning - anywhere
Docker based Hadoop provisioning - anywhere
 
Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?Ingest and Stream Processing - What will you choose?
Ingest and Stream Processing - What will you choose?
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014
 
On Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and AmbariOn Demand HDP Clusters using Cloudbreak and Ambari
On Demand HDP Clusters using Cloudbreak and Ambari
 
Introduction to Hortonworks Data Cloud for AWS
Introduction to Hortonworks Data Cloud for AWSIntroduction to Hortonworks Data Cloud for AWS
Introduction to Hortonworks Data Cloud for AWS
 
Empower Data-Driven Organizations
Empower Data-Driven OrganizationsEmpower Data-Driven Organizations
Empower Data-Driven Organizations
 
Apache Mesos: a simple explanation of basics
Apache Mesos: a simple explanation of basicsApache Mesos: a simple explanation of basics
Apache Mesos: a simple explanation of basics
 
Datacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCDatacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DC
 
DockerCon SF 2015: Scaling New Services
DockerCon SF 2015: Scaling New ServicesDockerCon SF 2015: Scaling New Services
DockerCon SF 2015: Scaling New Services
 
7+1 myths of the new os
7+1 myths of the new os7+1 myths of the new os
7+1 myths of the new os
 
AWS Lambda: Event-driven Code in the Cloud
AWS Lambda: Event-driven Code in the CloudAWS Lambda: Event-driven Code in the Cloud
AWS Lambda: Event-driven Code in the Cloud
 
Are you paying attention
Are you paying attentionAre you paying attention
Are you paying attention
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Big Data at your Desk with KNIME
Big Data at your Desk with KNIMEBig Data at your Desk with KNIME
Big Data at your Desk with KNIME
 
The EDW Ecosystem
The EDW EcosystemThe EDW Ecosystem
The EDW Ecosystem
 
Scalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and MesosScalable On-Demand Hadoop Clusters with Docker and Mesos
Scalable On-Demand Hadoop Clusters with Docker and Mesos
 
Data encoding and Metadata for Streams
Data encoding and Metadata for StreamsData encoding and Metadata for Streams
Data encoding and Metadata for Streams
 
Sql Stream Intro
Sql Stream IntroSql Stream Intro
Sql Stream Intro
 
Scale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARNScale-Out Resource Management at Microsoft using Apache YARN
Scale-Out Resource Management at Microsoft using Apache YARN
 

Similar to Cloudbreak - Technical Deep Dive

Cloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsCloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsDataWorks Summit/Hadoop Summit
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoopGergely Devenyi
 
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseCloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseMingliang Liu
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...DataWorks Summit
 
Running Cloudbreak on Kubernetes
Running Cloudbreak on KubernetesRunning Cloudbreak on Kubernetes
Running Cloudbreak on KubernetesKrisztián Horváth
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudDataWorks Summit
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakSean Roberts
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureDataWorks Summit
 
Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS Hortonworks
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureDataWorks Summit
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxDataWorks Summit
 
DCOS Presentation
DCOS PresentationDCOS Presentation
DCOS PresentationJan Repnak
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Hortonworks
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
An overview of OpenStack for the VMware community
An overview of OpenStack for the VMware communityAn overview of OpenStack for the VMware community
An overview of OpenStack for the VMware communityAnthony Chow
 
CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 

Similar to Cloudbreak - Technical Deep Dive (20)

Cloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsCloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World Considerations
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoop
 
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseCloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 
Running Cloudbreak on Kubernetes
Running Cloudbreak on KubernetesRunning Cloudbreak on Kubernetes
Running Cloudbreak on Kubernetes
 
Running Cloudbreak on Kubernetes
Running Cloudbreak on KubernetesRunning Cloudbreak on Kubernetes
Running Cloudbreak on Kubernetes
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and Future
 
Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS Hortonworks Data Cloud for AWS
Hortonworks Data Cloud for AWS
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache KnoxFortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
 
DCOS Presentation
DCOS PresentationDCOS Presentation
DCOS Presentation
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
An overview of OpenStack for the VMware community
An overview of OpenStack for the VMware communityAn overview of OpenStack for the VMware community
An overview of OpenStack for the VMware community
 
CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Recently uploaded

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Cloudbreak - Technical Deep Dive

  • 1. Cloudbreak – Technical Deep Dive Janos Matyas & Krisztian Horvath Hortonworks
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Presenters Krisztian Horvath Senior Member of technical staff, Cloudbreak Former Co-Founder at SequenceIQ Janos Matyas Senior Director of Engineering, Cloudbreak Former Co-Founder and CTO for SequenceIQ
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda Goals and Motivations Technology Stack + Deep Dive Lessons Learned + Best Practices Demo + Q & A
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Goals and Motivations – What We Wanted to Do…  Declarative/full Hadoop stack provisioning in all major cloud providers  Automate and unify the process  Zero-configuration approach  Same process through a cluster lifecycle (Dev, QA, UAT, Prod)  Provide tooling - UI, REST API and CLI/shell  Secure and multi-tenant  SLA policy based autoscaling
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Goals and Motivations – What We Wanted to Do…  All cloud providers are fundamentally different…  Compute, network, security, performance  We want to share what we found, and how we made it work!
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda Goals and Motivations Technology Stack + Deep Dive Lessons Learned + Best Practices Demo + Q & A
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Technology Stack  Apache Ambari  Cloud provider API  Salt  Docker  Packer
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Deep Dive - Overview  Cloudbreak Deployer (CBD) – Tool to deploy the Cloudbreak application – Microservice architecture (using Docker) – DevOps friendly  Cloudbreak Application – Extensible, available through UI, CLI, REST API – SLA auto-scaling policy management  Cluster deployed with Cloudbreak
  • 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Deep Dive – Cloudbreak Deployer  Installation – Single binary, written in Go – Requires Docker 1.9.1+ – DIY installation on any RHEL / CentOS / Oracle Linux 7 (64-bit) distro – Use one of the pre-built cloud images (AWS, Azure, GCP, OpenStack)  Operations – Easy upgrades/downgrades, automatic schema migration  Cloud provider support – AWS – generates IAM roles – Azure – ARM and DASH config  Utilities – Cloudbreak shell support - interactive, remote, automated execution, OAuth2 token generation – Local development environment setup
  • 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Deep Dive – Cloudbreak Application  Installation – Done with Cloudbreak Deployer (CBD)  Operations – Consistent feature set through UI, CLI and secure REST API – Multi-tenant, ACL setup, usage reports – Custom stack repositories, failure actions – Event history, cluster management – SLA based auto-scaling policy configs, enforcement  Cloud provider support – Agnostic API – AWS, Azure, GCP, OpenStack, Mesos – SPI interface – bring your own provider, stack under Cloudbreak management
  • 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Deep Dive – Cluster deployed with Cloudbreak  Installation – Managed by Cloudbreak using cloud provider API – Default (optimized) configs – specific to cloud provider  Operations – Default, custom configs for stacks, services, network, storage, security – Declarative Hadoop cluster – Custom instance types (heterogeneous clusters) – Different storage types – Configurable network – Security (access, Kerberos, SSSD, FreeIPA)  Utilities – Ambari Views – Metadata/shared clusters support
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda Goals and Motivations Technology Stack + Deep Dive Lessons Learned + Best Practices Demo + Q & A
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lessons Learned  Not all cloud providers are the same – Difference in performance, storage and functionality  (Capacity) planning – Based on workload type (batch / interactive and ad-hoc / long running) – Use heterogeneous clusters – Trial and error – mistakes are cheap, iterate until you find your best fit – Leverage the cloud - scale your cluster on demand  Number one consideration – storage – Multiple choices (ephemeral, block storage and BLOB store) – Bring compute to storage – might not work (everywhere) – in cloud everything is as a service – Independently scale storage from compute, partition your data  Security – Consider using strict security rules (private subnets, access, etc) and use edge nodes
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lessons Learned - AWS  Compute – Find your instance types for the workload, use heterogeneous clusters – Different instance types for transient (e.g. C4, M4) and long running (e.g. H2, D2) clusters – Dedicated instances (to avoid noise, regulations e.g. HIPPA)  Storage – Use latest version of Hadoop (Hortonworks contributed cloud specific optimizations) – Note that S3 gives you only eventual consistency – Different driver implementation: S3n (native, jets3t based), S3a (successor of n) , S3 (block based)  Network – Use enhanced networking (Amazon Linux by default, RHEL based – apply patch) – Placement groups – Not all instance types can use the 10Gbit network (e.g. use 8x)  Security – Use instance roles to access S3, deploy in a private subnet/VPC
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lessons Learned - AWS * D28xlarge used as instance type
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lessons Learned - AWS * D28xlarge used as instance type
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lessons Learned - Azure  Compute – Find your instance types for the workload, use heterogeneous clusters – Different instance types for transient (e.g. A and D family) and long running (e.g. Dv2) clusters – Use ARM instead of old API  Storage – Use latest version of Hadoop (Hortonworks contributed cloud specific optimizations) – Storage account scaling limitations – Use WASB or WASB with DASH (default with Cloudbreak) – Azure Data Lake Store – soon – Ephemeral disk is faster than root disk – does not survive auto-updates  Network – No PTR record/reverse lookup support  Security – Integrate/sync with your corporate AD
  • 18. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lessons Learned - Azure
  • 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lessons Learned - GCP  Compute – Find your instance types for the workload, use heterogeneous clusters – No template based provisioning  Storage – Use latest version of Hadoop (Hortonworks contributed cloud specific optimizations) – Use Google Cloud Storage Connector  Network – Network isolation/DNS problem  Security
  • 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lessons Learned - OpenStack  Compute – Find your instance types for the workload, use heterogeneous clusters – Use Heat templates instead of API calls (we support both)  Storage – Currently we support only Cinder volumes – Swift and Ceph is planned – Data locality through Cloudbreak – let us know your topology or rack/hypervisor mapping  Network – Configure DNS properly – Use multiple network (Neutron) nodes in case of a large cluster  Security – Use Keystone 3 (support for OAuth, Federation, introduction of groups/domains)
  • 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Lessons Learned - Mesos  In Tech Preview – come and talk to us after the talk – Or @Hortonworks boot
  • 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda Goals and Motivations Technology Stack + Deep Dive Lessons Learned + Best Practices Demo + Q & A
  • 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Thank You