Dynamic Resource Allocation in Apache Spark
Yuta Imai
@imai_factory
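1. RDD Graph
// Run in spark-shell, where the SparkContext `sc` is predefined (enter the multi-line expression with :paste).
val text = "Hello Spark, this is my first Spark application."
// Split into words and strip punctuation so that "Spark," and "Spark" count as the same word.
val textArray = text.split(" ").map(_.replaceAll("[,.]", ""))

val result = sc.parallelize(textArray)
               .map(item => (item, 1))
               .reduceByKey((x, y) => x + y)
               .collect()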
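[Figure: RDD graph. sc.parallelize() turns the Array into a ParallelCollectionRDD with 4 partitions (Partition0 to Partition3); .map(…) produces a MapPartitionsRDD with the same 4 partitions; .reduceByKey(…) produces a ShuffledRDD with 2 partitions (Partition0, Partition1); .collect() returns the result to the driver as an Array.]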
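To make the graph concrete, here is a minimal Spark-free sketch that reproduces the three RDDs with plain Scala collections: 4 input slices, a per-slice map to (word, 1), and a hash-routed "shuffle" into 2 reducer partitions. The object and method names (RddGraphSketch, slice, wordCount) are hypothetical, and the hash routing only imitates the idea of hash partitioning; it is not Spark's implementation.

```scala
// Spark-free simulation of the word-count RDD graph; names are illustrative, not Spark APIs.
object RddGraphSketch {
  // Cut data into n contiguous slices, similar in spirit to sc.parallelize(data, n).
  def slice[T](data: Seq[T], n: Int): Seq[Seq[T]] =
    (0 until n).map(i => data.slice(i * data.length / n, (i + 1) * data.length / n))

  def wordCount(text: String, numSlices: Int, numReducers: Int): Map[String, Int] = {
    val words = text.split(" ").map(_.replaceAll("[,.]", "")).toSeq

    // "ParallelCollectionRDD": numSlices partitions of raw words.
    val parallelized: Seq[Seq[String]] = slice(words, numSlices)

    // "MapPartitionsRDD": same partitioning, each word mapped to (word, 1).
    val mapped: Seq[Seq[(String, Int)]] = parallelized.map(_.map(w => (w, 1)))

    // "ShuffledRDD": each pair is routed to a reducer partition by key hash,
    // then every reducer partition sums the counts per key.
    val shuffled: Seq[Map[String, Int]] = (0 until numReducers).map { p =>
      mapped.flatten
        .filter { case (w, _) => Math.floorMod(w.hashCode, numReducers) == p }
        .groupBy(_._1)
        .map { case (w, pairs) => (w, pairs.map(_._2).sum) }
    }

    // "collect()": bring every reducer partition back to the driver.
    shuffled.flatten.toMap
  }

  def main(args: Array[String]): Unit =
    println(wordCount("Hello Spark, this is my first Spark application.", 4, 2))
}
```

Because every key is routed to exactly one reducer partition, merging the partitions at the end never has to combine counts for the same word twice.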
2. DAG Scheduler

[Figure: the same RDD graph, with the dependency between ParallelCollectionRDD and MapPartitionsRDD marked as a narrow dependency and the dependency into ShuffledRDD marked as a shuffle dependency. The graph is cut at the shuffle dependency: Stage0 covers ParallelCollectionRDD and MapPartitionsRDD and runs Task0 to Task3 (one per partition); Stage1 covers ShuffledRDD and runs Task4 and Task5.]
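The stage cut can be sketched with a toy model, assuming a linear lineage in which each RDD has exactly one parent (real lineages are DAGs, which Spark's DAGScheduler walks recursively). The types Dep, Node, and Stage and the function splitIntoStages are hypothetical. Applied to the word-count lineage, it yields a first stage with 4 tasks and a second with 2, matching Task0 to Task5 in the figure.

```scala
// Hypothetical, simplified model of stage splitting; not Spark's DAGScheduler code.
object StageSplitSketch {
  sealed trait Dep
  case object Narrow extends Dep
  case object Shuffle extends Dep

  // One RDD in a linear lineage: name, partition count, and its dependency on the previous RDD.
  final case class Node(name: String, partitions: Int, depOnParent: Option[Dep])

  final case class Stage(id: Int, rdds: Seq[String], numTasks: Int)

  def splitIntoStages(chain: Seq[Node]): Seq[Stage] = {
    // Open a new stage at the first RDD and at every shuffle dependency;
    // a narrow dependency keeps the RDD in the current stage.
    val groups = chain.foldLeft(Vector.empty[Vector[Node]]) { (acc, node) =>
      if (acc.isEmpty || node.depOnParent.contains(Shuffle)) acc :+ Vector(node)
      else acc.init :+ (acc.last :+ node)
    }
    // A stage runs one task per partition of its last RDD.
    groups.zipWithIndex.map { case (g, i) => Stage(i, g.map(_.name), g.last.partitions) }
  }

  def main(args: Array[String]): Unit = {
    val lineage = Seq(
      Node("ParallelCollectionRDD", 4, None),
      Node("MapPartitionsRDD", 4, Some(Narrow)),
      Node("ShuffledRDD", 2, Some(Shuffle)))
    splitIntoStages(lineage).foreach(println)
  }
}
```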
3. Task Scheduler

[Figure: the task scheduler sends Stage0's tasks, Task0 to Task3 (one per partition, Partition0 to Partition3), to the executors.]
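A much-simplified sketch of the launch step, assuming each executor advertises a number of free cores and each task needs one core. Tasks are spread round-robin across executors, and whatever does not fit stays pending, which is the signal dynamic allocation reacts to. The names are hypothetical, and the real scheduler also weighs data locality, which this sketch ignores.

```scala
// Hypothetical round-robin task placement; ignores locality, blacklisting, and multi-core tasks.
object TaskAssignSketch {
  // Returns the tasks launched on each executor and the tasks left pending.
  def assign(tasks: Seq[String], freeCores: Map[String, Int]): (Map[String, Seq[String]], Seq[String]) = {
    val execs = freeCores.keys.toSeq.sorted
    val maxCores = if (freeCores.isEmpty) 0 else freeCores.values.max
    // One slot per free core, ordered: first core of every executor, then the second, ...
    val slots: Seq[String] = for {
      round <- 0 until maxCores
      e <- execs
      if round < freeCores(e)
    } yield e
    val launched = tasks.zip(slots)
    val pending = tasks.drop(slots.length)
    val byExecutor = launched.groupBy(_._2).map { case (e, ts) => e -> ts.map(_._1) }
    (byExecutor, pending)
  }

  def main(args: Array[String]): Unit =
    println(assign(Seq("Task0", "Task1", "Task2", "Task3"), Map("exec1" -> 1, "exec2" -> 1)))
}
```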
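[Figure: two worker nodes, each running an executor in which a task thread applies the pipelined iterator.map(…).map(...)... chain to its partition; the figure also shows the executor's storage and a shuffle file written by the map side.]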
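The iterator.map(…).map(...) chain in the figure is why narrow transformations need no intermediate collections: within one task, each record passes through every map before the next record is read. Plain Scala iterators behave the same way; this sketch logs the evaluation order. The function name and the two example transformations are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

// Illustrates pipelining with plain Scala iterators; not Spark code.
object PipelineSketch {
  // Simulates one task: two chained maps over a partition's iterator, logging each step.
  def runTask(partition: Seq[String]): (List[(String, Int)], List[String]) = {
    val log = ArrayBuffer.empty[String]
    val out = partition.iterator
      .map { w => log += s"lower:$w"; w.toLowerCase }
      .map { w => log += s"pair:$w"; (w, 1) }
      .toList // iterators are lazy; toList drives the whole chain one record at a time
    (out, log.toList)
  }

  def main(args: Array[String]): Unit = {
    val (out, log) = runTask(Seq("Hello", "Spark"))
    println(out)
    println(log.mkString(" -> "))
  }
}
```

The log interleaves the two steps per record instead of finishing the first map over the whole partition before starting the second.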
DYNAMIC RESOURCE ALLOCATION
Dynamic Resource Allocation
•  Adds executors to an app that has pending tasks.
– Relieves the user of having to plan the app's exact resource needs up front.
•  Removes idle executors from an app.
– Lets a long-running app release executors it no longer uses.
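How many executors is "enough"? A rough sketch, assuming each task needs cpusPerTask cores and each executor has coresPerExecutor cores: the app needs enough executors to run every pending and running task at once. This is similar in spirit to the upper bound Spark's allocation manager places on its target, but the function is a hypothetical simplification and ignores limits such as spark.dynamicAllocation.maxExecutors.

```scala
// Hypothetical helper: executors needed to run all pending and running tasks concurrently.
object ExecutorDemandSketch {
  def executorsNeeded(pendingTasks: Int, runningTasks: Int, coresPerExecutor: Int, cpusPerTask: Int): Int = {
    require(cpusPerTask > 0 && coresPerExecutor >= cpusPerTask, "each executor must fit at least one task")
    val tasksPerExecutor = coresPerExecutor / cpusPerTask
    val totalTasks = pendingTasks + runningTasks
    // Integer ceiling division: a partially used executor still counts as a whole one.
    (totalTasks + tasksPerExecutor - 1) / tasksPerExecutor
  }

  def main(args: Array[String]): Unit =
    println(executorsNeeded(pendingTasks = 10, runningTasks = 2, coresPerExecutor = 4, cpusPerTask = 1))
}
```

With 12 tasks and 4 task slots per executor, the sketch asks for 3 executors; fewer means insufficient capacity, more means idle executors.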
Overview

[Figure: tasks vs. executors in three situations: insufficient capacity, optimal capacity (✔), and idle executors.]
Request Policy
•  An app starts with the user-specified number of executors.
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --num-executors <number-of-executors> \
  <application-jar>
•  If the app has had pending tasks for spark.dynamicAllocation.schedulerBacklogTimeout seconds, it starts requesting new executors.
•  While tasks remain pending, the app requests more executors every spark.dynamicAllocation.sustainedSchedulerBacklogTimeout seconds, doubling the number requested each round: 1, 2, 4, 8, 16, …
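The doubling can be simulated round by round. This sketch assumes the backlog persists for every round, that nothing else changes the target, and that the only cap is a maximum executor count (in the spirit of spark.dynamicAllocation.maxExecutors); Spark additionally limits the target to the number of executors the pending tasks actually need. Object and parameter names are hypothetical.

```scala
// Hypothetical simulation of the exponential request policy.
object RequestPolicySketch {
  // Executors newly requested in each round while tasks stay pending.
  // Round 0 fires after schedulerBacklogTimeout; later rounds every sustainedSchedulerBacklogTimeout.
  def requestRounds(initialExecutors: Int, maxExecutors: Int, rounds: Int): List[Int] = {
    var target = initialExecutors
    var toAdd = 1
    val requested = List.newBuilder[Int]
    for (_ <- 0 until rounds) {
      val newTarget = math.min(target + toAdd, maxExecutors)
      requested += newTarget - target
      target = newTarget
      toAdd *= 2 // 1, 2, 4, 8, 16, ...
    }
    requested.result()
  }

  def main(args: Array[String]): Unit = {
    println(requestRounds(initialExecutors = 2, maxExecutors = 100, rounds = 5))
    println(requestRounds(initialExecutors = 2, maxExecutors = 10, rounds = 5))
  }
}
```

Starting from 2 executors with a cap of 10, the requests run 1, 2, 4, then only 1 more (the cap is reached), then 0.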
Remove Policy
•  An app removes an executor when it has been idle for more than spark.dynamicAllocation.executorIdleTimeout seconds.
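A sketch of the removal check, assuming we track, for each idle executor, the time it became idle (busy executors are simply absent from the map). The minExecutors floor reflects spark.dynamicAllocation.minExecutors; everything else, including the names, is hypothetical, and Spark treats executors holding cached blocks separately (spark.dynamicAllocation.cachedExecutorIdleTimeout).

```scala
// Hypothetical idle-executor removal check.
object RemovePolicySketch {
  // idleSince: executor id -> time (seconds) at which it became idle; busy executors are absent.
  def executorsToRemove(
      allExecutors: Set[String],
      idleSince: Map[String, Double],
      now: Double,
      executorIdleTimeout: Double,
      minExecutors: Int): Seq[String] = {
    // Executors that have been idle for longer than the timeout.
    val expired = allExecutors.toSeq.sorted.filter { e =>
      idleSince.get(e).exists(since => now - since > executorIdleTimeout)
    }
    // Never shrink the app below minExecutors.
    val removable = math.max(0, allExecutors.size - minExecutors)
    expired.take(removable)
  }

  def main(args: Array[String]): Unit = {
    val all = Set("exec1", "exec2", "exec3")
    val idle = Map("exec1" -> 0.0, "exec2" -> 50.0) // exec3 is busy
    println(executorsToRemove(all, idle, now = 100.0, executorIdleTimeout = 60.0, minExecutors = 0))
  }
}
```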
External Shuffle Service

[Figure: the same two-worker-node diagram (executor, task thread running the iterator.map(…) pipeline, storage), now with a Shuffle Service on each worker node.]

The shuffle service runs on each node independently of the executors and serves the shuffle files they have written, so an executor can be removed without losing shuffle output that later stages still need to fetch.
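Why the service matters for removal can be shown with a toy model, assuming each map task's output lives on the node where its executor ran and that the nodes themselves stay up. Without the service, a reducer can only fetch output whose writing executor is still alive; with it, the node's service keeps serving the files after the executor is gone. The types and function are hypothetical; in Spark the feature is enabled with spark.shuffle.service.enabled.

```scala
// Hypothetical model of shuffle-output availability after executor removal.
object ShuffleServiceSketch {
  // Output of one map task: which executor wrote it and on which node the file lives.
  final case class MapOutput(mapTask: Int, executor: String, node: String)

  // Map tasks whose output reducers can still fetch after the `removed` executors are gone.
  // Without the service, only the writing executor can serve its files;
  // with it, the (still running) node-level service serves them.
  def fetchable(outputs: Seq[MapOutput], removed: Set[String], shuffleServiceEnabled: Boolean): Seq[Int] =
    outputs
      .filter(o => shuffleServiceEnabled || !removed.contains(o.executor))
      .map(_.mapTask)

  def main(args: Array[String]): Unit = {
    val outputs = Seq(
      MapOutput(0, "exec1", "node1"), MapOutput(1, "exec1", "node1"),
      MapOutput(2, "exec2", "node2"), MapOutput(3, "exec2", "node2"))
    println(fetchable(outputs, removed = Set("exec2"), shuffleServiceEnabled = false)) // maps 2 and 3 would need recomputing
    println(fetchable(outputs, removed = Set("exec2"), shuffleServiceEnabled = true))
  }
}
```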
