SlideShare a Scribd company logo
1 of 32
Download to read offline
Tuning Apache Spark for large-
scale workloads
Sital Kedia

Gaoxiang Liu Facebook
Agenda
• Apache Spark at Facebook
• Scaling Spark Driver
• Scaling Spark Executor
• Scaling External Shuffle
• Application tuning
• Tools
• Used for large scale batch workload
• Tens of thousands of jobs/day and growing
• Running on in the order of thousands of nodes
• Job scalability -
• Processes hundreds of TBs of compressed input data and
shuffle data
• Runs hundreds of thousands of tasks
Apache Spark at Facebook
Spark Architecture
Cluster

Manager
shuffle

service
Executor
Driver shuffle

service
Executor
shuffle

service
Executor
Scaling Spark Driver
Dynamic Executor Allocation
Cluster

Manager
Task Queue
Scheduler
Worker Node
shuffle

service
shuffle

service
shuffle

service
Executor
Request 

Executor
Release

Executor
• Better Resource utilization
• Good for multi-tenant environment
spark.dynamicAllocation.enabled = true

spark.dynamicAllocation.executorIdleTimeout = 2m
spark.dynamicAllocation.minExecutors = 1
spark.dynamicAllocation.maxExecutors = 2000
Multi-threaded Event Processor
[SPARK-18838]
Task

Scheduler
Single

threaded

event 

processor
Event

listeners
Event

Queue
Single threaded event processor
architecture
Task

Scheduler
Single threaded

Executor service

per listener
Multi-threaded event processor
architecture
Better Fetch Failure handling
[SPARK-19753] Avoid multiple retries of stages in case of Fetch Failure
Single fetch failure causing

single retriesSingle fetch failure causing 

multiple retries
• Avoid duplicate task run in case of Fetch Failure (SPARK-20163)
• Configurable max number of Fetch Failures (SPARK-13369)
• Ongoing effort (SPARK-20178)
Better Fetch Failure handling
spark.max.fetch.failures.per.stage = 10
• Frequent driver OOM when running many tasks in parallel
• Huge backlog of RPC requests built on Netty server of the
driver
• Increase RPC server thread to fix OOM
Tune RPC Server threads
Scaling Spark Driver
spark.rpc.io.serverThreads = 64
Scaling Spark Executor
Executor memory layout
Shuffle

Memory
User

Memory
Reserved

Memory 

(300 MB)
Memory

Buffer
}
}
}
spark.memory.fraction * (spark.executor.memory - 300 MB)
(1- spark.memory.fraction) * (spark.executor.memory - 300 MB)
spark.yarn.executor.memoryOverhead =
0.1 * (spark.executor.memory)
Tuning memory configurations
Shuffle

Memory
User

Memory
Reserved

Memory 

(300 MB)
Memory

Buffer
}
}
}
spark.memory.offHeap.enabled = true

spark.memory.offHeap.size = 3g
spark.executor.memory = 3g
spark.yarn.executor.memoryOverhead = 0.1 * 

(spark.executor.memory + spark.memory.offHeap.size)
Enable off-heap memory
Tuning memory configurations
• Large contiguous in-memory buffers allocated by Spark's shuffle
internals.
• G1GC suffers from fragmentation due to Humongous
Allocations, if object size is more than 32 MB (Maximum region
size of G1GC)
• Use parallel GC instead of G1GC
Garbage collection tuning
spark.executor.extraJavaOptions = -XX:ParallelGCThreads=4 -XX:+UseParallelGC
Eliminating disk I/O bottleneck
Sort & 

Spill to disk
























Temporary spill files on disk
Shuffle 

Partition
Final shuffle files on disk
In-memory

records
Eliminating disk I/O bottleneck
• Disk access is 10 - 100K times slower than memory access
• Make write buffer sizes for disk I/O configurable (SPARK-20074)
• Amortize disk I/O cost by doing buffered read/write
Tune Shuffle file buffer
spark.shuffle.file.buffer = 1 MB

spark.unsafe.sorter.spill.reader.buffer.size = 1MB

Eliminating disk I/O bottleneck
























Temporary spill 

files on disk
[SPARK-20014] Optimize spill files merging
















In-memory spill

file buffers


In-memory

spill merge
Shuffle 

Partition
Final shuffle 

file on disk
spark.file.transferTo = false

spark.shuffle.file.buffer = 1 MB

spark.shuffle.unsafe.file

.output.buffer = 5 MB
Eliminating disk I/O bottleneck
Tune compression block size
Compression block size vs size of shuffle files
SizeofShufflefiles(inTB)
130
155
180
205
230
Compression block size
32kb 128kb 512kb 1mb
• Default compression
block size of 32 kb is sub-
optimal
• Upto 20% reduction in
shuffle/spill file size by
increasing the block size
spark.io.compression.lz4.blockSize = 512KB
Various Memory leak fixes and
improvements
• Memory leak fixes (SPARK-14363, SPARK-17113, SPARK-18208)
• Snappy optimization (SPARK-14277)
• Reduce update frequency of shuffle bytes written metrics
(SPARK-15569)
• Configurable initial buffer size for Sorter(SPARK-15958)
Scaling External Shuffle Service
Cache Index files on Shuffle Server
SPARK-15074
index file
partition
partition
partition
Shuffle
service
Reducer
Reducer
Shuffle fetch
Shuffle fetch
Read index 

file
Read 

partition
Read 

partition
Read index 

file index file
partition
partition
partition
Shuffle
service
Reducer
Reducer
Shuffle fetch
Shuffle fetch
Read and cache

index file
Read 

partition
Read 

partition
spark.shuffle.service.index.cache.entries = 2048
Scaling External Shuffle Service
• Tune shuffle service worker thread and backlog
• Configurable shuffle registration timeout and retry
(SPARK-20640)
spark.shuffle.io.serverThreads = 128

spark.shuffle.io.backLog = 8192

spark.shuffle.registration.timeout = 2m

spark.shuffle.registration.maxAttempts = 5
https://code.facebook.com/posts/1671373793181703
Application tuning
• Improve performance of job latency (under same amount of
resource)
• Improve usability - eliminate manual tuning as much as possible
to achieve comparable job performance with manually tuned
parameters
Motivation
• Heuristics-based approach based on table input size
Auto tuning of mapper and reducer
• Max cap due to the
constrain of the scalability
of shuffle service and
drivers
• Min cap due to the
minimum guarantee of
resource to user's job
Tools
Tools
Spark UI metrics
Tools
Flame Graph
Executo Periodic
Jstack/PerfExecutoExecutor
Filter executor
threads
Worker Jstack
aggregator
service
Tools
Analysis of Task metrics using Facebook's Scuba
Task

Scheduler
Scuba

Event

listeners
Event

Queue
• Enables us to do complex queries
like -
• How many job failed because of OOM in
ExternalSorter in last 1 hour?
• What percentage of total execution time is
being spent in shuffle read?
• Did fetch failure rate go up after the last
Spark release?
• Set up monitoring and alerting to
catch regression
Resources
• Scuba: Diving into Data at Facebook
• Apache Spark @Scale: A 60 TB+ production use case
Questions?

More Related Content

What's hot

Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsSpark Summit
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudDatabricks
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkDatabricks
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Spark Summit
 
Flash for Apache Spark Shuffle with Cosco
Flash for Apache Spark Shuffle with CoscoFlash for Apache Spark Shuffle with Cosco
Flash for Apache Spark Shuffle with CoscoDatabricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache SparkDatabricks
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceDatabricks
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkKazuaki Ishizaki
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark Aakashdata
 
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale PlatformsBest Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale PlatformsDatabricks
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkDatabricks
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceDatabricks
 

What's hot (20)

Top 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark ApplicationsTop 5 Mistakes When Writing Spark Applications
Top 5 Mistakes When Writing Spark Applications
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
How to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache SparkHow to Automate Performance Tuning for Apache Spark
How to Automate Performance Tuning for Apache Spark
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
 
Flash for Apache Spark Shuffle with Cosco
Flash for Apache Spark Shuffle with CoscoFlash for Apache Spark Shuffle with Cosco
Flash for Apache Spark Shuffle with Cosco
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle Service
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
 
Introduction to apache spark
Introduction to apache spark Introduction to apache spark
Introduction to apache spark
 
Best Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale PlatformsBest Practices for Enabling Speculative Execution on Large Scale Platforms
Best Practices for Enabling Speculative Execution on Large Scale Platforms
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
 

Similar to Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia

Metrics-driven tuning of Apache Spark at scale
Metrics-driven tuning of Apache Spark at scaleMetrics-driven tuning of Apache Spark at scale
Metrics-driven tuning of Apache Spark at scaleDataWorks Summit
 
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouMetrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouDatabricks
 
Spark Summit EU talk by Sital Kedia
Spark Summit EU talk by Sital KediaSpark Summit EU talk by Sital Kedia
Spark Summit EU talk by Sital KediaSpark Summit
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)DataWorks Summit
 
Apache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareApache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareDatabricks
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingOh Chan Kwon
 
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Spark Summit
 
Optimizing spark based data pipelines - are you up for it?
Optimizing spark based data pipelines - are you up for it?Optimizing spark based data pipelines - are you up for it?
Optimizing spark based data pipelines - are you up for it?Etti Gur
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at FacebookDatabricks
 
Spark / Mesos Cluster Optimization
Spark / Mesos Cluster OptimizationSpark / Mesos Cluster Optimization
Spark / Mesos Cluster Optimizationebiznext
 
Tuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsTuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsSamir Bessalah
 
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...Flink Forward
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogVadim Semenov
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Productionconfluent
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New FeaturesAmazon Web Services
 

Similar to Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia (20)

Metrics-driven tuning of Apache Spark at scale
Metrics-driven tuning of Apache Spark at scaleMetrics-driven tuning of Apache Spark at scale
Metrics-driven tuning of Apache Spark at scale
 
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye ZhouMetrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
Metrics-Driven Tuning of Apache Spark at Scale with Edwina Lu and Ye Zhou
 
Spark Summit EU talk by Sital Kedia
Spark Summit EU talk by Sital KediaSpark Summit EU talk by Sital Kedia
Spark Summit EU talk by Sital Kedia
 
Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)Kafka to the Maxka - (Kafka Performance Tuning)
Kafka to the Maxka - (Kafka Performance Tuning)
 
Apache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why CareApache Spark 3.0: Overview of What’s New and Why Care
Apache Spark 3.0: Overview of What’s New and Why Care
 
Spark cep
Spark cepSpark cep
Spark cep
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
 
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
Migrating Complex Data Aggregation from Hadoop to Spark-(Ashish Singh andPune...
 
Optimizing spark based data pipelines - are you up for it?
Optimizing spark based data pipelines - are you up for it?Optimizing spark based data pipelines - are you up for it?
Optimizing spark based data pipelines - are you up for it?
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 
Spark / Mesos Cluster Optimization
Spark / Mesos Cluster OptimizationSpark / Mesos Cluster Optimization
Spark / Mesos Cluster Optimization
 
Tuning tips for Apache Spark Jobs
Tuning tips for Apache Spark JobsTuning tips for Apache Spark Jobs
Tuning tips for Apache Spark Jobs
 
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
Using apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at DatadogUsing apache spark for processing trillions of records each day at Datadog
Using apache spark for processing trillions of records each day at Datadog
 
Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
Spark on Yarn @ Netflix
Spark on Yarn @ NetflixSpark on Yarn @ Netflix
Spark on Yarn @ Netflix
 
Producing Spark on YARN for ETL
Producing Spark on YARN for ETLProducing Spark on YARN for ETL
Producing Spark on YARN for ETL
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfSubhamKumar3239
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 

Recently uploaded (20)

FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 

Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia

  • 1. Tuning Apache Spark for large- scale workloads Sital Kedia
 Gaoxiang Liu Facebook
  • 2. Agenda • Apache Spark at Facebook • Scaling Spark Driver • Scaling Spark Executor • Scaling External Shuffle • Application tuning • Tools
  • 3. • Used for large scale batch workload • Tens of thousands of jobs/day and growing • Running on in the order of thousands of nodes • Job scalability - • Processes hundreds of TBs of compressed input data and shuffle data • Runs hundreds of thousands of tasks Apache Spark at Facebook
  • 6. Dynamic Executor Allocation Cluster
 Manager Task Queue Scheduler Worker Node shuffle
 service shuffle
 service shuffle
 service Executor Request 
 Executor Release
 Executor • Better Resource utilization • Good for multi-tenant environment spark.dynamicAllocation.enabled = true
 spark.dynamicAllocation.executorIdleTimeout = 2m spark.dynamicAllocation.minExecutors = 1 spark.dynamicAllocation.maxExecutors = 2000
  • 7. Multi-threaded Event Processor [SPARK-18838] Task
 Scheduler Single
 threaded
 event 
 processor Event
 listeners Event
 Queue Single threaded event processor architecture Task
 Scheduler Single threaded
 Executor service
 per listener Multi-threaded event processor architecture
  • 8. Better Fetch Failure handling [SPARK-19753] Avoid multiple retries of stages in case of Fetch Failure Single fetch failure causing
 single retriesSingle fetch failure causing 
 multiple retries
  • 9. • Avoid duplicate task run in case of Fetch Failure (SPARK-20163) • Configurable max number of Fetch Failures (SPARK-13369) • Ongoing effort (SPARK-20178) Better Fetch Failure handling spark.max.fetch.failures.per.stage = 10
  • 10. • Frequent driver OOM when running many tasks in parallel • Huge backlog of RPC requests built on Netty server of the driver • Increase RPC server thread to fix OOM Tune RPC Server threads Scaling Spark Driver spark.rpc.io.serverThreads = 64
  • 12. Executor memory layout Shuffle
 Memory User
 Memory Reserved
 Memory 
 (300 MB) Memory
 Buffer } } } spark.memory.fraction * (spark.executor.memory - 300 MB) (1- spark.memory.fraction) * (spark.executor.memory - 300 MB) spark.yarn.executor.memoryOverhead = 0.1 * (spark.executor.memory)
  • 13. Tuning memory configurations Shuffle
 Memory User
 Memory Reserved
 Memory 
 (300 MB) Memory
 Buffer } } } spark.memory.offHeap.enabled = true
 spark.memory.offHeap.size = 3g spark.executor.memory = 3g spark.yarn.executor.memoryOverhead = 0.1 * 
 (spark.executor.memory + spark.memory.offHeap.size) Enable off-heap memory
  • 14. Tuning memory configurations • Large contiguous in-memory buffers allocated by Spark's shuffle internals. • G1GC suffers from fragmentation due to Humongous Allocations, if object size is more than 32 MB (Maximum region size of G1GC) • Use parallel GC instead of G1GC Garbage collection tuning spark.executor.extraJavaOptions = -XX:ParallelGCThreads=4 -XX:+UseParallelGC
  • 15. Eliminating disk I/O bottleneck Sort & 
 Spill to disk 
 
 
 
 
 
 
 
 
 
 
 
 Temporary spill files on disk Shuffle 
 Partition Final shuffle files on disk In-memory
 records
  • 16. Eliminating disk I/O bottleneck • Disk access is 10 - 100K times slower than memory access • Make write buffer sizes for disk I/O configurable (SPARK-20074) • Amortize disk I/O cost by doing buffered read/write Tune Shuffle file buffer spark.shuffle.file.buffer = 1 MB
 spark.unsafe.sorter.spill.reader.buffer.size = 1MB

  • 17. Eliminating disk I/O bottleneck 
 
 
 
 
 
 
 
 
 
 
 
 Temporary spill 
 files on disk [SPARK-20014] Optimize spill files merging 
 
 
 
 
 
 
 
 In-memory spill
 file buffers 
 In-memory
 spill merge Shuffle 
 Partition Final shuffle 
 file on disk spark.file.transferTo = false
 spark.shuffle.file.buffer = 1 MB
 spark.shuffle.unsafe.file
 .output.buffer = 5 MB
  • 18. Eliminating disk I/O bottleneck Tune compression block size Compression block size vs size of shuffle files SizeofShufflefiles(inTB) 130 155 180 205 230 Compression block size 32kb 128kb 512kb 1mb • Default compression block size of 32 kb is sub- optimal • Upto 20% reduction in shuffle/spill file size by increasing the block size spark.io.compression.lz4.blockSize = 512KB
  • 19. Various Memory leak fixes and improvements • Memory leak fixes (SPARK-14363, SPARK-17113, SPARK-18208) • Snappy optimization (SPARK-14277) • Reduce update frequency of shuffle bytes written metrics (SPARK-15569) • Configurable initial buffer size for Sorter(SPARK-15958)
  • 21. Cache Index files on Shuffle Server SPARK-15074 index file partition partition partition Shuffle service Reducer Reducer Shuffle fetch Shuffle fetch Read index 
 file Read 
 partition Read 
 partition Read index 
 file index file partition partition partition Shuffle service Reducer Reducer Shuffle fetch Shuffle fetch Read and cache
 index file Read 
 partition Read 
 partition spark.shuffle.service.index.cache.entries = 2048
  • 22. Scaling External Shuffle Service • Tune shuffle service worker thread and backlog • Configurable shuffle registration timeout and retry (SPARK-20640) spark.shuffle.io.serverThreads = 128
 spark.shuffle.io.backLog = 8192
 spark.shuffle.registration.timeout = 2m
 spark.shuffle.registration.maxAttempts = 5
  • 25. • Improve performance of job latency (under same amount of resource) • Improve usability - eliminate manual tuning as much as possible to achieve comparable job performance with manually tuned parameters Motivation
  • 26. • Heuristics-based approach based on table input size Auto tuning of mapper and reducer • Max cap due to the constrain of the scalability of shuffle service and drivers • Min cap due to the minimum guarantee of resource to user's job
  • 27. Tools
  • 29. Tools Flame Graph Executo Periodic Jstack/PerfExecutoExecutor Filter executor threads Worker Jstack aggregator service
  • 30. Tools Analysis of Task metrics using Facebook's Scuba Task
 Scheduler Scuba
 Event
 listeners Event
 Queue • Enables us to do complex queries like - • How many job failed because of OOM in ExternalSorter in last 1 hour? • What percentage of total execution time is being spent in shuffle read? • Did fetch failure rate go up after the last Spark release? • Set up monitoring and alerting to catch regression
  • 31. Resources • Scuba: Diving into Data at Facebook • Apache Spark @Scale: A 60 TB+ production use case