Dynamic Resource Allocation in Apache Spark

Dynamic Resource Allocation
in Apache Spark
Yuta Imai
@imai_factory

1. RDD Graph
val text = "Hello Spark, this is my first Spark application."
val textArray = text.split(" ").map(_.replaceAll(" ",""))

val result = sc.parallelize(textArray)
.map(item => (item, 1))
.reduceByKey((x,y) => x + y)
.collect()

[Diagram: sc.parallelize() turns the input Array into a ParallelCollectionRDD (Partition0 to Partition3); .map(…) produces a MapPartitionsRDD (Partition0 to Partition3); .reduceByKey(…) produces a ShuffledRDD (Partition0, Partition1); .collect() returns the result as an Array.]

2. DAG Scheduler

[Diagram: the first RDD (Partition0 to Partition3) feeds MapPartitionsRDD (Partition0 to Partition3) through a narrow dependency; MapPartitionsRDD feeds ShuffledRDD (Partition0, Partition1) through a shuffle dependency. The shuffle dependency is the stage boundary between Stage0 and Stage1. Tasks are labeled Task0 to Task5.]
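The stage split shown in the diagram can be sketched in plain Scala. This is a toy model, not Spark's actual DAGScheduler code: the `Node`, `Narrow`, `Shuffle`, and `splitIntoStages` names are hypothetical and exist only for this illustration. It assumes a linear lineage and starts a new stage at each shuffle dependency, with one task per partition of the last RDD in a stage.

```scala
// Toy model of cutting a linear lineage into stages; all names are hypothetical.
object StageSketch {
  sealed trait Dep
  case object Narrow extends Dep   // each child partition reads one parent partition
  case object Shuffle extends Dep  // each child partition reads from all parent partitions

  // One RDD in the lineage: name, partition count, and its dependency on the parent.
  final case class Node(name: String, partitions: Int, dep: Option[Dep])

  // Walk the lineage in order; a narrow dependency extends the current stage,
  // anything else (no parent, or a shuffle) starts a new stage.
  def splitIntoStages(lineage: List[Node]): List[List[Node]] =
    lineage.foldLeft(List.empty[List[Node]]) { (stages, node) =>
      node.dep match {
        case Some(Narrow) if stages.nonEmpty => stages.init :+ (stages.last :+ node)
        case _                               => stages :+ List(node)
      }
    }

  // A stage runs one task per partition of its last RDD.
  def taskCounts(stages: List[List[Node]]): List[Int] =
    stages.map(_.last.partitions)

  // The lineage from the word-count example in this deck.
  val lineage: List[Node] = List(
    Node("ParallelCollectionRDD", 4, None),
    Node("MapPartitionsRDD", 4, Some(Narrow)),
    Node("ShuffledRDD", 2, Some(Shuffle))
  )

  def main(args: Array[String]): Unit =
    splitIntoStages(lineage).zipWithIndex.foreach { case (stage, i) =>
      println(s"Stage$i: ${stage.map(_.name).mkString(" -> ")} (${stage.last.partitions} tasks)")
    }
}
```

With the lineage above, the model yields two stages with 4 and 2 tasks, which matches the six tasks in the diagram.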

3. Task Scheduler
[Diagram: the task scheduler hands Task0 to Task3 to the executors; each task processes one partition (Partition0 to Partition3) through both RDDs of its stage.]

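[Diagram: two worker nodes, each running an executor. Inside an executor, a thread runs a task as a chained iterator, iterator.map(…).map(...)..., and the resulting shuffle file is written to storage on the worker node.]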

Dynamic Resource Allocation
•  Adds extra executors to an app that has pending tasks.
– Removes the need to plan an app's resources exactly in advance.
•  Removes idle executors from an app.
– Lets a long-running app release executors it is not using.

Overview
[Diagram: tasks compared with executors across four phases: Insufficient capacity → Optimal capacity → Idle executors → Optimal capacity. Dynamic allocation adds executors in the first phase and removes them in the third.]

Request Policy
•  An app starts with a user-specified number of executors.
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --num-executors <# of executors>
•  If the app has had pending tasks for
spark.dynamicAllocation.schedulerBacklogTimeout (seconds), it starts
requesting new executors.
•  While the backlog persists, the app requests more executors every
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout (seconds),
doubling the number requested each round: 1, 2, 4, 8, 16, …
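The ramp-up described above can be sketched as a small plain-Scala simulation. It is a simplified model, not Spark's allocation code: the object and method names are hypothetical, the `maxExecutors` cap stands in for spark.dynamicAllocation.maxExecutors (a setting the slide does not mention), and the model ignores that real requests are also limited by the number of pending tasks.

```scala
// Simplified model of the request policy; all names here are hypothetical.
object RequestPolicySketch {
  // Executors requested in each round: 1, 2, 4, 8, ...
  def requestsPerRound(rounds: Int): Seq[Int] =
    Seq.iterate(1, rounds)(_ * 2)

  // Seconds after the backlog begins at which each request round fires:
  // the first after backlogTimeout, then one every sustainedTimeout.
  def requestTimes(backlogTimeout: Int, sustainedTimeout: Int, rounds: Int): Seq[Int] =
    Seq.tabulate(rounds)(i => backlogTimeout + i * sustainedTimeout)

  // Executor count after the given number of rounds, capped at maxExecutors
  // (assumed cap, modeled on spark.dynamicAllocation.maxExecutors).
  def totalAfter(initial: Int, rounds: Int, maxExecutors: Int): Int =
    math.min(maxExecutors, initial + requestsPerRound(rounds).sum)

  def main(args: Array[String]): Unit = {
    // Illustrative values only: 5 s backlog timeout, 2 s sustained timeout.
    requestTimes(5, 2, 4).zip(requestsPerRound(4)).foreach { case (t, n) =>
      println(s"t=${t}s: request $n more executor(s)")
    }
  }
}
```

Doubling lets an app with a large backlog reach a useful size in a few rounds, while an app with only a brief backlog asks for just one or two extra executors.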

Remove Policy
•  An app removes an executor when it has been idle for more
than spark.dynamicAllocation.executorIdleTimeout seconds.
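The idle check can be sketched in a few lines of plain Scala. This is an assumption-laden illustration rather than Spark's implementation: `ExecutorState` and `idleExecutors` are hypothetical names, and time is modeled as plain seconds.

```scala
// Toy model of the remove policy; the types and names are hypothetical.
object RemovePolicySketch {
  // When the executor last finished a task, and how many tasks it runs now.
  final case class ExecutorState(id: String, lastTaskFinishedAt: Long, runningTasks: Int)

  // Ids of executors that run nothing and have been idle longer than the timeout.
  def idleExecutors(executors: Seq[ExecutorState], now: Long, idleTimeoutSec: Long): Seq[String] =
    executors.collect {
      case e if e.runningTasks == 0 && now - e.lastTaskFinishedAt > idleTimeoutSec => e.id
    }

  def main(args: Array[String]): Unit = {
    val executors = Seq(
      ExecutorState("exec-1", lastTaskFinishedAt = 0L, runningTasks = 0),  // idle for 100 s
      ExecutorState("exec-2", lastTaskFinishedAt = 50L, runningTasks = 0), // idle for 50 s
      ExecutorState("exec-3", lastTaskFinishedAt = 0L, runningTasks = 2)   // still busy
    )
    println(idleExecutors(executors, now = 100L, idleTimeoutSec = 60L))
  }
}
```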

External Shuffle Service
[Diagram: two worker nodes, each running an executor (thread) with storage. In the second build, a shuffle service runs on each worker node alongside the executor.]
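Because the shuffle service runs outside the executor, shuffle files stay readable after an idle executor is removed, which is why it is normally enabled together with dynamic allocation. The sketch below collects the settings named in this deck, plus the two enabling flags and the min/max bounds, and renders them as spark-submit `--conf` flags. The property names are real Spark settings; the values, the `DynamicAllocationConf` object, and the `toSubmitFlags` helper are illustrative assumptions, not recommendations from the slides.

```scala
// Illustrative settings for dynamic allocation; values are assumptions, not tuned advice.
object DynamicAllocationConf {
  val settings: Seq[(String, String)] = Seq(
    "spark.dynamicAllocation.enabled" -> "true",
    // Serve shuffle files from the worker node so executors can be removed safely.
    "spark.shuffle.service.enabled" -> "true",
    "spark.dynamicAllocation.minExecutors" -> "1",
    "spark.dynamicAllocation.maxExecutors" -> "20",
    // Wait this long with pending tasks before the first request.
    "spark.dynamicAllocation.schedulerBacklogTimeout" -> "1s",
    // Interval between later requests while the backlog persists.
    "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout" -> "1s",
    // Remove an executor after it has been idle this long.
    "spark.dynamicAllocation.executorIdleTimeout" -> "60s"
  )

  // Hypothetical helper: render the settings as spark-submit flags.
  def toSubmitFlags(conf: Seq[(String, String)]): String =
    conf.map { case (key, value) => s"--conf $key=$value" }.mkString(" ")

  def main(args: Array[String]): Unit =
    println(toSubmitFlags(settings))
}
```

The shuffle service itself also has to be started on each worker node by the cluster manager; how that is done depends on the deployment and is not covered here.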
