SlideShare a Scribd company logo
1 of 35
Download to read offline
Anirudh Ramananathan (foxish@google.com)
Software Engineer (Kubernetes)
Timothy Chen (tim@hyperpilot.io)
Co-founder & CTO (HyperPilot)
Apache Spark on
Kubernetes
Agenda
• Kubernetes & Containers
• Motivation
• Design
• Demo
• Deep Dive
• Roadmap
What is Kubernetes?
Kubernetes
Kubernetes is an open-source system
Kubernetes
Kubernetes is an open-source system for automating
deployment, scaling, and management
Kubernetes
Kubernetes is an open-source system for automating
deployment, scaling, and management of containerized
applications.
‘Containerized’
Containers
libs
app
kernel
libs
app
libs
app
libs
app
• Repeatable Builds and Workflows
• Application Portability
• High Degree of Control over
Software
• Faster Development Cycle
• Reduced dev-ops load
• Improved Infrastructure Utilization
• Large OSS Community - 1200+
contributors and 45k+ commits
• Ecosystem and Partners - 100+
organizations involved
• One of the top 100 projects overall on
GitHub - 23k+ stars
Kubernetes
Overview
At a Glance
kubelet
UI
kubeletCLI
API
users master nodes
etcd
kubelet
scheduler
controllers
apiserver
Nodes and Pods
Pod
Volume
Containers
Pod
Containers
8080 8080 8080
Volume
Node
• A pod is a set of co-located containers
• Created by a declarative specification
supplied to the master
• Each pod has its own IP address
• Volumes can be local or
network-attached
Motivation
Why Spark on Kubernetes?
• Resource sharing between batch, serving and stateful workloads
– Streamlined developer experience
– Reduced operational costs
– Improved infrastructure utilization
• Kubernetes and the Container Ecosystem
– Lots of addon services: third-party logging, monitoring, and security tools
– For example, the Istio project, announced May 24, by IBM, Google and Lyft,
provides a service mesh for authenticating, authorizing, tracing, and timing,
and rate-limiting container-to-container communication, and more.
Design
Spark, meet Kubernetes!
Spark Core
Kubernetes Standalone YARN Mesos
GraphX SparkSQL MLlib Streaming
Spark, meet Kubernetes!
Spark Core Kubernetes Scheduler Backend
Kubernetes Clusternew executors
remove executors
configuration
• Resource Requests
• Authnz
• Communication with K8s
Kubernetes, meet Spark!
Kubernetes Cluster
File Staging Server
• Staging server: component to
stage local files
• Spark Shuffle service:
component to store shuffle data
for dynamic allocation
• ThirdParty/CustomResources:
extend Kubernetes API with
Spark Knowledge
Shuffle Service
SparkJob API
Endpoint
Kubernetes
Integration
Dependencies
Container images with dependencies
baked in
Files from GCS/S3/HDFS/HTTP
File Staging Server
Staged files and
JARs
Several ways of running Spark Jobs along with their dependencies on
Kubernetes
Administration
Namespaces
Resource
Accounting
Logging
Monitoring
Resource
Quota
Pluggable
Authorization
Admission
Control
RBAC
• Launch Spark Jobs as a particular user
into a specific namespace
• RBAC and Namespace-level resource
quotas
• Audit logging for clusters
• Several monitoring solutions to see
node, cluster and pod-level statistics
Focus Areas
Wordcloud of the command-line options we added to spark-submit
on Kubernetes
Demo
Deep Dive
Deep Dive
spark-subm
it
kubernetes cluster
apiserver
scheduler
• Spark Submit submits job to K8s
• Spark Submit submits job to K8s
• K8s schedules the driver for job
Deep Dive
kubernetes cluster
apiserver
scheduler schedule driver pod
spark driver
• Spark Submit submits job to K8s
• K8s schedules the driver for job
Deep Dive
• Spark Submit submits job to K8s
• K8s schedules the driver for job
• Driver requests executors as needed
kubernetes cluster
apiserver
scheduler
spark driver
create executor
pods
• Spark Submit submits job to K8s
• K8s schedules the driver for job
Deep Dive
• Spark Submit submits job to K8s
• K8s schedules the driver for job
• Driver requests executors as needed
• Executors scheduled and created
kubernetes cluster
apiserver
scheduler
spark driver
schedule
executorpods
executors
• Spark Submit submits job to K8s
• K8s schedules the driver for job
Deep Dive
• Spark Submit submits job to K8s
• K8s schedules the driver for job
• Driver requests executors as needed
• Executors scheduled and created
• Executors run tasks
kubernetes cluster
apiserver
scheduler
spark driver
executors
• Spark Submit submits job to K8s
• K8s schedules the driver for job
Deep Dive
• Spark Submit submits job to K8s
• K8s schedules the driver for job
• Driver requests executors as needed
• Executors scheduled and created
• Executors run tasks
• Driver “completes” job and persists
logs
kubernetes cluster
apiserver
scheduler
spark driver
Roadmap
Spark Streaming
Spark Roadmap
Spark Shell
Client Mode
Python/R support
Cluster Mode
Java/Scala Support
Dynamic Allocation Local File Staging High Availability
Spark SQL
GraphX MLlib
Dec 2016
Development
Began
Mar 2017
Alpha
Release
June 2017
Beta
Release
Nov 2016
Design
= supported but untested = not yet supported
We’re just getting started...
• Kubernetes CustomResources
• Priorities and Preemption for Pods
• Batch Scheduling and Resource Sharing
• Cluster Federation and Multi-cloud deployments
• Ecosystem: Kafka, Cassandra, HDFS, etc
Contributors
Organizations Alphabetically:
• Google
• Haiwen
• Hyperpilot
• Intel
• Palantir
• Pepperdata
• Red Hat
Links:
• Spark 2.2.0 Documentation
• https://issues.apache.org/jira/browse
/SPARK-18278
• https://github.com/kubernetes/kubern
etes/issues/34377
Try it out today!
https://github.com/apache-spark-on-k8s/spark
Thank You.
HDFS on Kubernetes - Lessons Learned
June 7 at 11:00 AM in Room 2003
Join us Wednesdays at 10am PT at the SIG BigData meeting
https://github.com/kubernetes/community/

More Related Content

What's hot

CI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar DemriCI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar DemriDoiT International
 
Setup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes FederationSetup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes Federationinwin stack
 
What's new in Kubernetes
What's new in KubernetesWhat's new in Kubernetes
What's new in KubernetesDaniel Smith
 
Webcast - Making kubernetes production ready
Webcast - Making kubernetes production readyWebcast - Making kubernetes production ready
Webcast - Making kubernetes production readyApplatix
 
Running and Managing Kubernetes on OpenStack
Running and Managing Kubernetes on OpenStackRunning and Managing Kubernetes on OpenStack
Running and Managing Kubernetes on OpenStackVictor Palma
 
Spinnaker on Kubernetes
Spinnaker on KubernetesSpinnaker on Kubernetes
Spinnaker on KubernetesJinwoong Kim
 
Implement Advanced Scheduling Techniques in Kubernetes
Implement Advanced Scheduling Techniques in Kubernetes Implement Advanced Scheduling Techniques in Kubernetes
Implement Advanced Scheduling Techniques in Kubernetes Kublr
 
Kubernetes for Serverless - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless  - Serverless Summit 2017 - Krishna KumarKubernetes for Serverless  - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless - Serverless Summit 2017 - Krishna KumarCodeOps Technologies LLP
 
DevOps with Azure, Kubernetes, and Helm Webinar
DevOps with Azure, Kubernetes, and Helm WebinarDevOps with Azure, Kubernetes, and Helm Webinar
DevOps with Azure, Kubernetes, and Helm WebinarCodefresh
 
Autoscaling Kubernetes
Autoscaling KubernetesAutoscaling Kubernetes
Autoscaling Kubernetescraigbox
 
Centralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container OperationsCentralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container OperationsKublr
 
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019confluent
 
The Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes ClusterThe Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes ClusterKublr
 
Cloud spanner architecture and use cases
Cloud spanner architecture and use casesCloud spanner architecture and use cases
Cloud spanner architecture and use casesGDG Cloud Bengaluru
 
Container Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and KubernetesContainer Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and KubernetesWill Hall
 

What's hot (20)

CI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar DemriCI Implementation with Kubernetes at LivePerson by Saar Demri
CI Implementation with Kubernetes at LivePerson by Saar Demri
 
Setup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes FederationSetup Hybrid Clusters Using Kubernetes Federation
Setup Hybrid Clusters Using Kubernetes Federation
 
What's new in Kubernetes
What's new in KubernetesWhat's new in Kubernetes
What's new in Kubernetes
 
Webcast - Making kubernetes production ready
Webcast - Making kubernetes production readyWebcast - Making kubernetes production ready
Webcast - Making kubernetes production ready
 
Running and Managing Kubernetes on OpenStack
Running and Managing Kubernetes on OpenStackRunning and Managing Kubernetes on OpenStack
Running and Managing Kubernetes on OpenStack
 
Spinnaker on Kubernetes
Spinnaker on KubernetesSpinnaker on Kubernetes
Spinnaker on Kubernetes
 
Implement Advanced Scheduling Techniques in Kubernetes
Implement Advanced Scheduling Techniques in Kubernetes Implement Advanced Scheduling Techniques in Kubernetes
Implement Advanced Scheduling Techniques in Kubernetes
 
AKS
AKSAKS
AKS
 
Kubernetes for Serverless - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless  - Serverless Summit 2017 - Krishna KumarKubernetes for Serverless  - Serverless Summit 2017 - Krishna Kumar
Kubernetes for Serverless - Serverless Summit 2017 - Krishna Kumar
 
DevOps with Azure, Kubernetes, and Helm Webinar
DevOps with Azure, Kubernetes, and Helm WebinarDevOps with Azure, Kubernetes, and Helm Webinar
DevOps with Azure, Kubernetes, and Helm Webinar
 
Autoscaling Kubernetes
Autoscaling KubernetesAutoscaling Kubernetes
Autoscaling Kubernetes
 
DevOps with Kubernetes
DevOps with KubernetesDevOps with Kubernetes
DevOps with Kubernetes
 
Centralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container OperationsCentralizing Kubernetes and Container Operations
Centralizing Kubernetes and Container Operations
 
K8S in prod
K8S in prodK8S in prod
K8S in prod
 
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
Kafka on Kubernetes: Keeping It Simple (Nikki Thean, Etsy) Kafka Summit SF 2019
 
The Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes ClusterThe Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes Cluster
 
From Code to Kubernetes
From Code to KubernetesFrom Code to Kubernetes
From Code to Kubernetes
 
Cloud spanner architecture and use cases
Cloud spanner architecture and use casesCloud spanner architecture and use cases
Cloud spanner architecture and use cases
 
Crafting Kubernetes Operators
Crafting Kubernetes OperatorsCrafting Kubernetes Operators
Crafting Kubernetes Operators
 
Container Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and KubernetesContainer Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and Kubernetes
 

Similar to [Spark Summit 2017 NA] Apache Spark on Kubernetes

Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenDatabricks
 
Kubernetes Architecture - beyond a black box - Part 1
Kubernetes Architecture - beyond a black box - Part 1Kubernetes Architecture - beyond a black box - Part 1
Kubernetes Architecture - beyond a black box - Part 1Hao H. Zhang
 
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMwareVMUG IT
 
Serverless spark
Serverless sparkServerless spark
Serverless sparkMamathaBusi
 
[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...
[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...
[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...DevDay.org
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...DataWorks Summit
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesDataWorks Summit
 
Spark volume requirements 2018
Spark volume requirements 2018Spark volume requirements 2018
Spark volume requirements 2018Rachit Arora
 
Kubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd について
Kubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd についてKubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd について
Kubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd についてLINE Corporation
 
SpringOne Tour: An Introduction to Azure Spring Apps Enterprise
SpringOne Tour: An Introduction to Azure Spring Apps EnterpriseSpringOne Tour: An Introduction to Azure Spring Apps Enterprise
SpringOne Tour: An Introduction to Azure Spring Apps EnterpriseVMware Tanzu
 
Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learningAntje Barth
 
Azure Kubernetes Service 2019 ふりかえり
Azure Kubernetes Service 2019 ふりかえりAzure Kubernetes Service 2019 ふりかえり
Azure Kubernetes Service 2019 ふりかえりToru Makabe
 
Kubernetes overview 101
Kubernetes overview 101Kubernetes overview 101
Kubernetes overview 101Boskey Savla
 
Docker kubernetes fundamental(pod_service)_190307
Docker kubernetes fundamental(pod_service)_190307Docker kubernetes fundamental(pod_service)_190307
Docker kubernetes fundamental(pod_service)_190307Inhye Park
 
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...Vietnam Open Infrastructure User Group
 
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup
 
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on KubernetesAthens Big Data
 

Similar to [Spark Summit 2017 NA] Apache Spark on Kubernetes (20)

Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
 
Big data and Kubernetes
Big data and KubernetesBig data and Kubernetes
Big data and Kubernetes
 
Kubernetes Architecture - beyond a black box - Part 1
Kubernetes Architecture - beyond a black box - Part 1Kubernetes Architecture - beyond a black box - Part 1
Kubernetes Architecture - beyond a black box - Part 1
 
Webinar kubernetes and-spark
Webinar  kubernetes and-sparkWebinar  kubernetes and-spark
Webinar kubernetes and-spark
 
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
 
Serverless spark
Serverless sparkServerless spark
Serverless spark
 
[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...
[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...
[DevDay 2017] OpenShift Enterprise - Speaker: Linh Do - DevOps Engineer at Ax...
 
Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...Why Kubernetes as a container orchestrator is a right choice for running spar...
Why Kubernetes as a container orchestrator is a right choice for running spar...
 
CNCF Projects Overview
CNCF Projects OverviewCNCF Projects Overview
CNCF Projects Overview
 
Storage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on KubernetesStorage Requirements and Options for Running Spark on Kubernetes
Storage Requirements and Options for Running Spark on Kubernetes
 
Spark volume requirements 2018
Spark volume requirements 2018Spark volume requirements 2018
Spark volume requirements 2018
 
Kubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd について
Kubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd についてKubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd について
Kubernetes上で動作する機械学習モジュールの配信&管理基盤Rekcurd について
 
SpringOne Tour: An Introduction to Azure Spring Apps Enterprise
SpringOne Tour: An Introduction to Azure Spring Apps EnterpriseSpringOne Tour: An Introduction to Azure Spring Apps Enterprise
SpringOne Tour: An Introduction to Azure Spring Apps Enterprise
 
Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learning
 
Azure Kubernetes Service 2019 ふりかえり
Azure Kubernetes Service 2019 ふりかえりAzure Kubernetes Service 2019 ふりかえり
Azure Kubernetes Service 2019 ふりかえり
 
Kubernetes overview 101
Kubernetes overview 101Kubernetes overview 101
Kubernetes overview 101
 
Docker kubernetes fundamental(pod_service)_190307
Docker kubernetes fundamental(pod_service)_190307Docker kubernetes fundamental(pod_service)_190307
Docker kubernetes fundamental(pod_service)_190307
 
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...
 
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
Tokyo Azure Meetup #7 - Introduction to Serverless Architectures with Azure F...
 
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
 

Recently uploaded

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 

Recently uploaded (20)

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 

[Spark Summit 2017 NA] Apache Spark on Kubernetes

  • 1. Anirudh Ramananathan (foxish@google.com) Software Engineer (Kubernetes) Timothy Chen (tim@hyperpilot.io) Co-founder & CTO (HyperPilot) Apache Spark on Kubernetes
  • 2. Agenda • Kubernetes & Containers • Motivation • Design • Demo • Deep Dive • Roadmap
  • 4. Kubernetes Kubernetes is an open-source system
  • 5. Kubernetes Kubernetes is an open-source system for automating deployment, scaling, and management
  • 6. Kubernetes Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
  • 8. Containers libs app kernel libs app libs app libs app • Repeatable Builds and Workflows • Application Portability • High Degree of Control over Software • Faster Development Cycle • Reduced dev-ops load • Improved Infrastructure Utilization
  • 9. • Large OSS Community - 1200+ contributors and 45k+ commits • Ecosystem and Partners - 100+ organizations involved • One of the top 100 projects overall on GitHub - 23k+ stars Kubernetes
  • 11. At a Glance kubelet UI kubeletCLI API users master nodes etcd kubelet scheduler controllers apiserver
  • 12. Nodes and Pods Pod Volume Containers Pod Containers 8080 8080 8080 Volume Node • A pod is a set of co-located containers • Created by a declarative specification supplied to the master • Each pod has its own IP address • Volumes can be local or network-attached
  • 14. Why Spark on Kubernetes? • Resource sharing between batch, serving and stateful workloads – Streamlined developer experience – Reduced operational costs – Improved infrastructure utilization • Kubernetes and the Container Ecosystem – Lots of addon services: third-party logging, monitoring, and security tools – For example, the Istio project, announced May 24, by IBM, Google and Lyft, provides a service mesh for authenticating, authorizing, tracing, and timing, and rate-limiting container-to-container communication, and more.
  • 16. Spark, meet Kubernetes! Spark Core Kubernetes Standalone YARN Mesos GraphX SparkSQL MLlib Streaming
  • 17. Spark, meet Kubernetes! Spark Core Kubernetes Scheduler Backend Kubernetes Clusternew executors remove executors configuration • Resource Requests • Authnz • Communication with K8s
  • 18. Kubernetes, meet Spark! Kubernetes Cluster File Staging Server • Staging server: component to stage local files • Spark Shuffle service: component to store shuffle data for dynamic allocation • ThirdParty/CustomResources: extend Kubernetes API with Spark Knowledge Shuffle Service SparkJob API Endpoint
  • 19. Kubernetes Integration Dependencies Container images with dependencies baked in Files from GCS/S3/HDFS/HTTP File Staging Server Staged files and JARs Several ways of running Spark Jobs along with their dependencies on Kubernetes
  • 20. Administration Namespaces Resource Accounting Logging Monitoring Resource Quota Pluggable Authorization Admission Control RBAC • Launch Spark Jobs as a particular user into a specific namespace • RBAC and Namespace-level resource quotas • Audit logging for clusters • Several monitoring solutions to see node, cluster and pod-level statistics
  • 21. Focus Areas Wordcloud of the command-line options we added to spark-submit on Kubernetes
  • 22. Demo
  • 25. • Spark Submit submits job to K8s • K8s schedules the driver for job Deep Dive kubernetes cluster apiserver scheduler schedule driver pod spark driver
  • 26. • Spark Submit submits job to K8s • K8s schedules the driver for job Deep Dive • Spark Submit submits job to K8s • K8s schedules the driver for job • Driver requests executors as needed kubernetes cluster apiserver scheduler spark driver create executor pods
  • 27. • Spark Submit submits job to K8s • K8s schedules the driver for job Deep Dive • Spark Submit submits job to K8s • K8s schedules the driver for job • Driver requests executors as needed • Executors scheduled and created kubernetes cluster apiserver scheduler spark driver schedule executorpods executors
  • 28. • Spark Submit submits job to K8s • K8s schedules the driver for job Deep Dive • Spark Submit submits job to K8s • K8s schedules the driver for job • Driver requests executors as needed • Executors scheduled and created • Executors run tasks kubernetes cluster apiserver scheduler spark driver executors
  • 29. • Spark Submit submits job to K8s • K8s schedules the driver for job Deep Dive • Spark Submit submits job to K8s • K8s schedules the driver for job • Driver requests executors as needed • Executors scheduled and created • Executors run tasks • Driver “completes” job and persists logs kubernetes cluster apiserver scheduler spark driver
  • 31. Spark Streaming Spark Roadmap Spark Shell Client Mode Python/R support Cluster Mode Java/Scala Support Dynamic Allocation Local File Staging High Availability Spark SQL GraphX MLlib Dec 2016 Development Began Mar 2017 Alpha Release June 2017 Beta Release Nov 2016 Design = supported but untested = not yet supported
  • 32. We’re just getting started... • Kubernetes CustomResources • Priorities and Preemption for Pods • Batch Scheduling and Resource Sharing • Cluster Federation and Multi-cloud deployments • Ecosystem: Kafka, Cassandra, HDFS, etc
  • 33. Contributors Organizations Alphabetically: • Google • Haiwen • Hyperpilot • Intel • Palantir • Pepperdata • Red Hat Links: • Spark 2.2.0 Documentation • https://issues.apache.org/jira/browse /SPARK-18278 • https://github.com/kubernetes/kubern etes/issues/34377
  • 34. Try it out today! https://github.com/apache-spark-on-k8s/spark
  • 35. Thank You. HDFS on Kubernetes - Lessons Learned June 7 at 11:00 AM in Room 2003 Join us Wednesdays at 10am PT at the SIG BigData meeting https://github.com/kubernetes/community/