SlideShare a Scribd company logo
1 of 66
Download to read offline
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
$30,000/month Savings Open your eyes Robust
@ItaiYaffe, @RTeveth
●
●
●
●
●
●
@ItaiYaffe, @RTeveth
ETLs with
Airflow &
01. = 02.
03. -on-
Airflow Integration
& Contribution04.
@ItaiYaffe, @RTeveth
ETLs with
Airflow &
01. = 02.
03. -on-
Airflow Integration
& Contribution04.
@ItaiYaffe, @RTeveth
●
●
●
●
●
@ItaiYaffe, @RTeveth
~30TB ingested/day
S3
~3000 nodes/day
10’s of TB
ingested/day
druid$100K’s/month
@ItaiYaffe, @RTeveth
Scalability
Cost Efficiency
Fault-tolerance
@ItaiYaffe, @RTeveth
●
●
●
○
○
○
@ItaiYaffe, @RTeveth
~20 automatic DAG
deployments/day
~1000 DAG
Runs/day
~2 years in
production
Met all
requirements &
more
~40 users across 4
groups
6 contributions
to open-source
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
●
●
●
@ItaiYaffe, @RTeveth
●
●
●
○
●
○
@ItaiYaffe, @RTeveth
EMR is an AWS
managed service to
run Hadoop & Spark
clusters
Allows you to reduce
costs by using Spot
instances
Charges management
cost for each
instance in a cluster
@ItaiYaffe, @RTeveth
●
●
○
■
○ …
■
■
emr_create_job_flow_operator emr_add_steps_operator emr_step_sensor
Creates new emr cluster Adds Spark step to the cluster Checks if the step succeeded
@ItaiYaffe, @RTeveth
●
●
○
■
○ …
■
■
emr_create_job_flow_operator emr_add_steps_operator emr_step_sensor
Creates new emr cluster Adds Spark step to the cluster Checks if the step succeeded
@ItaiYaffe, @RTeveth
$$$ Visibility Robustness
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
●
●
○
○
●
@ItaiYaffe, @RTeveth
ClusterControl plane Worker nodes
(EC2 in our case)
Pods
group of one or more
containers (such as
Docker containers)
@ItaiYaffe, @RTeveth
●
@ItaiYaffe, @RTeveth
●
@ItaiYaffe, @RTeveth
●
●
○
○
○
■
@ItaiYaffe, @RTeveth
●
●
○
○
○
■
@ItaiYaffe, @RTeveth
●
●
○
○
○
■
@ItaiYaffe, @RTeveth
ClusterControl plane
@ItaiYaffe, @RTeveth
ClusterControl plane
@ItaiYaffe, @RTeveth
ClusterControl plane
@ItaiYaffe, @RTeveth
ClusterControl plane
@ItaiYaffe, @RTeveth
ClusterControl plane
@ItaiYaffe, @RTeveth
ClusterControl plane
@ItaiYaffe, @RTeveth
●
●
○
○
○
○
●
●
@ItaiYaffe, @RTeveth
…
@ItaiYaffe, @RTeveth
●
●
○
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
./bin/spark-submit 
--master
k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> 
--deploy-mode cluster 
--name spark-pi 
--class org.apache.spark.examples.SparkPi 
--conf spark.executor.instances =3 
--conf spark.kubernetes.container.image =<spark-image>

local:///path/to/examples.jar
Kubernetes control plane
Kubernetes cluster
SparkPi
driver
Executor
1
Executor
2
Executor
3
@ItaiYaffe, @RTeveth
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
●
●
●
@ItaiYaffe, @RTeveth
apiVersion:
"sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: "spark-pi”
namespace: default
spec:
...
driver:
...
executor:
...
Spark-pi.yaml
Kubernetes control plane
Kubernetes cluster
SparkPi
driver
Executor
1
Executor
2
Executor
3
kubectl
Spark-
on-K8s
operator
@ItaiYaffe, @RTeveth
VS
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
…
@ItaiYaffe, @RTeveth
…
https://github.com/apache/airflow/pull/7163
@ItaiYaffe, @RTeveth
ł
@ItaiYaffe, @RTeveth
KubernetesHook
SparkKubernetes
operator
SparkKubernetes
sensor
@ItaiYaffe, @RTeveth
●
○
●
○
@ItaiYaffe, @RTeveth
●
●
●
apiVersion:
"sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
name: "spark-pi-{{ ds }}-{{
task_instance.try_number }}"
namespace: default
spec:
...
driver:
...
executor:
...
SparkKubernetes
operator
@ItaiYaffe, @RTeveth
●
●
@ItaiYaffe, @RTeveth
●
●
○
●
○
●
○
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
●
○
●
○
●
○
@ItaiYaffe, @RTeveth
●
○
○
●
●
@ItaiYaffe, @RTeveth
●
○
●
@ItaiYaffe, @RTeveth
●
○ on_success_callback
on_failure_callback default_args
●
○ 😉
@ItaiYaffe, @RTeveth
@ItaiYaffe, @RTeveth
Reduce costs Open your eyes Robust
●
●
@ItaiYaffe, @RTeveth
●
○
■
○
○
●
○
○
Airflow Summit 2020 - Migrating airflow based spark jobs to kubernetes - the native way
Airflow Summit 2020 - Migrating airflow based spark jobs to kubernetes - the native way

More Related Content

More from Itai Yaffe

A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
Itai Yaffe
 

More from Itai Yaffe (20)

Mastering Partitioning for High-Volume Data Processing
Mastering Partitioning for High-Volume Data ProcessingMastering Partitioning for High-Volume Data Processing
Mastering Partitioning for High-Volume Data Processing
 
Solving Data Engineers Velocity - Wix's Data Warehouse Automation
Solving Data Engineers Velocity - Wix's Data Warehouse AutomationSolving Data Engineers Velocity - Wix's Data Warehouse Automation
Solving Data Engineers Velocity - Wix's Data Warehouse Automation
 
Lessons Learnt from Running Thousands of On-demand Spark Applications
Lessons Learnt from Running Thousands of On-demand Spark ApplicationsLessons Learnt from Running Thousands of On-demand Spark Applications
Lessons Learnt from Running Thousands of On-demand Spark Applications
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?
 
Planning a data solution - "By Failing to prepare, you are preparing to fail"
Planning a data solution - "By Failing to prepare, you are preparing to fail"Planning a data solution - "By Failing to prepare, you are preparing to fail"
Planning a data solution - "By Failing to prepare, you are preparing to fail"
 
Evaluating Big Data & ML Solutions - Opening Notes
Evaluating Big Data & ML Solutions - Opening NotesEvaluating Big Data & ML Solutions - Opening Notes
Evaluating Big Data & ML Solutions - Opening Notes
 
Big data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeBig data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real time
 
Data Lakes on Public Cloud: Breaking Data Management Monoliths
Data Lakes on Public Cloud: Breaking Data Management MonolithsData Lakes on Public Cloud: Breaking Data Management Monoliths
Data Lakes on Public Cloud: Breaking Data Management Monoliths
 
Unleashing the Power of your Data
Unleashing the Power of your DataUnleashing the Power of your Data
Unleashing the Power of your Data
 
Data Lake on Public Cloud - Opening Notes
Data Lake on Public Cloud - Opening NotesData Lake on Public Cloud - Opening Notes
Data Lake on Public Cloud - Opening Notes
 
DevTalks Reimagined 2020 - Funnel Analysis with Spark and Druid
DevTalks Reimagined 2020 - Funnel Analysis with Spark and DruidDevTalks Reimagined 2020 - Funnel Analysis with Spark and Druid
DevTalks Reimagined 2020 - Funnel Analysis with Spark and Druid
 
Virtual Apache Druid Meetup: AIADA (Ask Itai and David Anything)
Virtual Apache Druid Meetup: AIADA (Ask Itai and David Anything)Virtual Apache Druid Meetup: AIADA (Ask Itai and David Anything)
Virtual Apache Druid Meetup: AIADA (Ask Itai and David Anything)
 
Introducing Kafka Connect and Implementing Custom Connectors
Introducing Kafka Connect and Implementing Custom ConnectorsIntroducing Kafka Connect and Implementing Custom Connectors
Introducing Kafka Connect and Implementing Custom Connectors
 
A Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's RoadmapA Day in the Life of a Druid Implementor and Druid's Roadmap
A Day in the Life of a Druid Implementor and Druid's Roadmap
 
Scalable Incremental Index for Druid
Scalable Incremental Index for DruidScalable Incremental Index for Druid
Scalable Incremental Index for Druid
 
Funnel Analysis with Spark and Druid
Funnel Analysis with Spark and DruidFunnel Analysis with Spark and Druid
Funnel Analysis with Spark and Druid
 
The benefits of running Spark on your own Docker
The benefits of running Spark on your own DockerThe benefits of running Spark on your own Docker
The benefits of running Spark on your own Docker
 
Optimizing Spark-based data pipelines - are you up for it?
Optimizing Spark-based data pipelines - are you up for it?Optimizing Spark-based data pipelines - are you up for it?
Optimizing Spark-based data pipelines - are you up for it?
 
Scheduling big data workloads on serverless infrastructure
Scheduling big data workloads on serverless infrastructureScheduling big data workloads on serverless infrastructure
Scheduling big data workloads on serverless infrastructure
 
GraphQL API on a Serverless Environment
GraphQL API on a Serverless EnvironmentGraphQL API on a Serverless Environment
GraphQL API on a Serverless Environment
 

Recently uploaded

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 

Recently uploaded (20)

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 

Airflow Summit 2020 - Migrating airflow based spark jobs to kubernetes - the native way