Multi-tenant Spark workflows in Auto Scalable Mesos clusters
Pablo Delgado
Prathima Donapudi
@pablete
Agenda
● Netflix intro. Context for the talk
● Spark on Mesos
● Mesos cluster configuration
● Autoscaling Mesos clusters
Netflix Scale
● Started streaming 10 years ago
● > 130M members
● > 190 countries
● > 1000 device types
● 1/3 of peak US downstream traffic
● 15% of global downstream traffic
The value of recommendations
● A few seconds to find something great to watch…
● Can only show a few titles
● Enjoyment directly impacts customer satisfaction
● How? Personalize everything, for 130M members across 190+ countries
Everything is a recommendation!
From how to construct the page: selection and placement of the row types is personalized.
...to what shows to recommend: ordering of the titles in each row is personalized.
...to what artwork to present: artwork is personalized per profile (Profile 1 vs Profile 2).
ML for Recommendations
Member streaming data feeds a training pipeline, which produces models consumed by a precompute system.
With A/B tests, an allocation step routes member streaming data into several parallel training pipelines, each producing its own models for the precompute system.
Running Experiments
• Try an idea offline using historical data to see if it would have improved recommendations
• If it would, deploy a live A/B test to see if it performs well in production
Machine Learning workflows
Workflow: a directed graph of steps, global parameters, triggers...
Step: describes a job and its configuration
Python DSL / Scala DSL / REST API / UI
Meson as a Mesos Framework
The Meson scheduler runs as a Mesos framework: Fenzo (Netflix OSS) makes the scheduling decisions, while Mesos offers resources and runs the steps via Meson executors on the Mesos agents.
Spark on Mesos
Minimal Spark as a Mesos framework: the Spark driver runs as a service in a Docker container on a Mesos agent, alongside the Meson executors; the Spark executors run on the other Mesos agents.
Spark in short
Spark Physical Cluster (Dynamic Resource Allocation)
A shuffle service runs on every node of the cluster, so executors can be added and removed dynamically while their shuffle files remain available.
Mesos Cluster Configuration
Thinking about Spark executors: executor shape, memory, cores.
Available resources in a Mesos node
A cluster node machine, e.g. r4.2xlarge, has 8 cpus and 61 GB of memory. 2 cpus and 12 GB are reserved for the agent daemons, leaving 6 cpus and 48 GB available for Spark executors. The node therefore initially advertises Offer: (6 cpu, 48 GB).
Mesos Offers
When a task (i.e. a Spark executor) is launched on an agent, an offer is created with an updated view of the available resources. Launch Task: (4 cpu, 16 GB) against a (6 cpu, 48 GB) node yields Offer: (2 cpu, 32 GB).
Launch Task: (2 cpu, 36 GB) against the same (6 cpu, 48 GB) node yields Offer: (4 cpu, 12 GB).
When a task uses all of the resources of one type (e.g. cpus), the resulting offer is unusable, since no other task can execute with only one of the resources present. Launch Task: (6 cpu, 16 GB) yields Offer: (0 cpu, 32 GB).
Likewise, if you consume all the available ram, the resulting offer cannot be used by other tasks. Launch Task: (2 cpu, 48 GB) yields Offer: (4 cpu, 0 GB).
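The offer arithmetic in the last few slides can be sketched as plain resource subtraction (`launch` and `usable` are hypothetical helpers for illustration, not the Mesos API):

```python
def launch(offer, task):
    """Subtract a task's (cpu, mem_gb) from an offer, yielding the next offer."""
    cpu, mem = offer
    task_cpu, task_mem = task
    if task_cpu > cpu or task_mem > mem:
        raise ValueError("task does not fit in offer")
    return (cpu - task_cpu, mem - task_mem)

def usable(offer):
    """An offer with zero of either resource cannot host any further task."""
    cpu, mem = offer
    return cpu > 0 and mem > 0

node = (6, 48)                         # 6 cpus, 48 GB available
print(launch(node, (4, 16)))           # (2, 32): still usable
print(usable(launch(node, (6, 16))))   # False: 0 cpus left
print(usable(launch(node, (2, 48))))   # False: 0 GB left
```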
Idea: Fixed Size Executors

Spark Memory Model (Dynamic Assignment)
The memory share of each task depends on the number of actively running tasks (N): each task is assigned 1/N of the available memory. In an executor with 4 cores available, with N=2 each task gets ½ of the memory; with N=4 each task gets ¼.
https://www.slideshare.net/databricks/deep-dive-memory-management-in-apache-spark
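The 1/N dynamic assignment can be written out directly (a sketch of the policy described above, not Spark's internal implementation):

```python
def memory_per_task(available_mem_gb, active_tasks):
    """Each of the N actively running tasks gets an equal 1/N share."""
    if active_tasks <= 0:
        raise ValueError("need at least one active task")
    return available_mem_gb / active_tasks

# Executor with 4 cores and, for illustration, 16 GB of task memory:
print(memory_per_task(16, 2))  # 8.0 -- each task gets 1/2 with N=2
print(memory_per_task(16, 4))  # 4.0 -- each task gets 1/4 with N=4
```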
Equivalent executors
On a (6 cpu, 48 GB) node, executor shapes of (3 cpu, 24 GB), (2 cpu, 16 GB), and (1 cpu, 8 GB) are equivalent: each keeps the same 8 GB per cpu.

Proposed Fixed Size Executors
spark.executor.cores = 2
spark.executor.memory = 16g
Candidate 2-cpu shapes: (2 cpu, 16 GB), (2 cpu, 24 GB), (2 cpu, 32 GB).

Ideal case: three (2 cpu, 16 GB) executors pack a (6 cpu, 48 GB) node exactly.
Bad case (wasted cpu): two (2 cpu, 24 GB) executors, or one (2 cpu, 32 GB) plus one (2 cpu, 16 GB) executor, exhaust the 48 GB of memory while leaving 2 cpus unused.
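The packing arithmetic behind the ideal and bad cases can be checked with a small sketch (`pack` is a hypothetical helper, not part of Spark or Mesos):

```python
def pack(node, executor):
    """How many fixed-size executors fit on a node, and what is left over."""
    node_cpu, node_mem = node
    exe_cpu, exe_mem = executor
    count = min(node_cpu // exe_cpu, node_mem // exe_mem)
    return count, (node_cpu - count * exe_cpu, node_mem - count * exe_mem)

node = (6, 48)
print(pack(node, (2, 16)))  # (3, (0, 0)): ideal, node packed exactly
print(pack(node, (2, 24)))  # (2, (2, 0)): bad case, 2 cpus wasted
```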
Autoscaling Spark Mesos Clusters
Thinking about executors: executor shape, memory, cores.
What is the effect of having more resources on your Spark job?
● A Spark job that consumes 1000 cpu-hours costs $1000 either way:
○ Run the job with 1 cpu over 1000 hours.
○ Run the job with 1000 cpus over 1 hour.
Resources vs Time to completion: plotted as cores against minutes, time to completion drops as resources increase.
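Assuming a flat price per cpu-hour ($1 here, matching the example above), the total cost is invariant to how the work is split between parallelism and wall-clock time:

```python
PRICE_PER_CPU_HOUR = 1.0  # assumed rate from the example

def job_cost(cpus, hours):
    """Cost is cpu-hours times the hourly rate, however the work is sliced."""
    return cpus * hours * PRICE_PER_CPU_HOUR

print(job_cost(1, 1000))   # 1000.0 -- 1 cpu for 1000 hours
print(job_cost(1000, 1))   # 1000.0 -- 1000 cpus for 1 hour
```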
AWS Autoscaling.
Reporting Free CPU/Memory
Four agents with available resources of (2 cpu, 32 GB), (4 cpu, 12 GB), (0 cpu, 32 GB), and (4 cpu, 0 GB) report a total of (10 cpu, 76 GB) free. But only (2 cpu, 32 GB) + (4 cpu, 12 GB) is actually usable, since offers with 0 cpus or 0 GB cannot run any task.
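Aggregating the four example agents shows the gap between reported and usable free resources (a sketch; real Mesos offers carry more resource types):

```python
def summarize(offers):
    """Total reported free (cpu, mem) vs. the subset that can host a task."""
    total = (sum(c for c, m in offers), sum(m for c, m in offers))
    live = [(c, m) for c, m in offers if c > 0 and m > 0]
    usable = (sum(c for c, m in live), sum(m for c, m in live))
    return total, usable

offers = [(2, 32), (4, 12), (0, 32), (4, 0)]
total, usable = summarize(offers)
print(total)   # (10, 76) -- what naive free-resource reporting shows
print(usable)  # (6, 44)  -- only the offers that can actually run executors
```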
Reporting Usage as number of executors in use
Four (6 cpu, 48 GB) agents each hold up to three (2 cpu, 16 GB) executors, so the cluster has 12 executor slots. With agents at 0%, 33%, 66%, and 100% occupancy, 6 executors are used and 6 available out of 12 total: average capacity utilization 50%.
Reporting Usage as binary used / unused
Counting the same four agents as simply used or unused (100%, 100%, 100%, 0%): 3 agents used, 1 agent available, 4 agents total, for an average of 75%.
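The two utilization metrics from the last two slides can be computed side by side (a sketch; the slot count assumes the fixed (2 cpu, 16 GB) executor shape):

```python
SLOTS_PER_AGENT = 3  # a (6 cpu, 48 GB) agent fits three (2 cpu, 16 GB) executors

def executor_utilization(executors_per_agent):
    """Fraction of executor slots in use across the cluster."""
    return sum(executors_per_agent) / (SLOTS_PER_AGENT * len(executors_per_agent))

def binary_utilization(executors_per_agent):
    """Fraction of agents with at least one executor running."""
    return sum(1 for n in executors_per_agent if n > 0) / len(executors_per_agent)

agents = [1, 2, 3, 0]  # executors running on each of the four agents
print(executor_utilization(agents))  # 0.5  -- 6 of 12 slots in use
print(binary_utilization(agents))    # 0.75 -- 3 of 4 agents busy
```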
Scale UP policy
Controls the slope of scaling up (scale out)
Scale up is sorted. How about scaling down?
● Scaling down means terminating some instances to reduce the size of your ASG
● AWS Autoscaling has a default termination policy:
a. Balance instances across multiple Availability Zones (zones a/b/c of one region)
b. Pick unprotected instances with the oldest launch configuration
c. If there are multiple such unprotected instances, pick the one closest to the next billing hour
d. Out of those, select an instance at random
● If the terminated instance has running executors, those executors will be rescheduled somewhere else in the cluster and the lost portion of the computation will be reprocessed. [SLOW]
● If the terminated instance has running drivers, the entire Spark job needs to be restarted. [BAD]
Schedule Spark drivers in a different ASG
Run drivers in one ASG (ASG 1) and executors in another (ASG 2), so scaling down executor instances never terminates a driver.

Instance Protection
Set instance protection on agents that hold shuffle files, so the terminate-instance signal skips them.
Scale DOWN policy
Controls the slope of scaling down (scale in)
When does the Spark computation start?
Timeline of executors in Spark:
--conf spark.scheduler.minRegisteredResourcesRatio={0.15, 0.3, 0.55, 1.0}
--conf spark.scheduler.maxRegisteredResourcesWaitingTime={30s, 600s, 1200s}
The computation starts once the configured ratio of resources (15%, 30%, 55%, 100%) has registered, or once the waiting time (30 s, 600 s, 1200 s = 20 min) has elapsed, whichever comes first.
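The start condition can be simulated as follows (a simplification of Spark's documented behavior for these two settings; `computation_start` is a hypothetical helper):

```python
import math

def computation_start(register_times_s, total_executors, min_ratio, max_wait_s):
    """Computation starts when min_ratio of executors have registered,
    or when max_wait_s elapses, whichever comes first."""
    needed = math.ceil(total_executors * min_ratio)
    registered = sorted(register_times_s)
    if needed == 0 or needed > len(registered):
        return max_wait_s
    return min(registered[needed - 1], max_wait_s)

# 10 executors registering one every 100 s:
times = [100 * (i + 1) for i in range(10)]
print(computation_start(times, 10, 0.3, 600))  # 300 -- 30% ratio reached first
print(computation_start(times, 10, 1.0, 600))  # 600 -- wait time expires first
```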
Mesos Clusters
Adding extra capacity takes less than 5 minutes
--conf spark.scheduler.maxRegisteredResourcesWaitingTime=600s
Mesos receiving Offers
Mesos agents periodically send a message to the Mesos master advertising how many resources they have available to execute tasks, e.g. (5 cpus, 2 GB free), (2 cpus, 8 GB free), (1 cpu, 1 GB free). These available resources are called offers. Resources normally consist of cpus, gpus, memory, disk, and network bandwidth; we will focus on cpus and memory for now.
Mesos docs: decline resources using a large timeout
Sort offers by max share ascending, and decline unwanted offers for a long duration:
--conf spark.mesos.rejectOfferDuration=120s
--conf spark.mesos.rejectOfferDurationForReachedExecutorLimit=120s
http://mesos.apache.org/documentation/latest/app-framework-development-guide/

Mesos docs: “Do not revive frequently”
Get rid of the frequent revive calls in Spark's coarse-grained Mesos scheduler backend:
https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala#L640-L663
http://mesos.apache.org/documentation/latest/app-framework-development-guide/
Timeline of acquiring Executors in Spark
Main Cluster size (# of instances)
● Number of ec2 instances
● Actually used ec2 instances
Anatomy of our managed clusters
Agent pools provide physical separation of concerns across the managed Mesos Spark clusters.

Trion Family of clusters
● TRION
● TRION CI
● TRION PLAY
DRA enabled: this allows maximizing the usage of the shared clusters.

Trion with High SLA pool
Alongside TRION, TRION CI, and TRION PLAY, an UNBOUNDED HIGH SLA pool.
Thank You
Questions?
We are Hiring...
http://bit.ly/NetflixSpark
Pablo Delgado
Prathima Donapudi
@pablete pdelgado@netflix.com