This document summarizes a presentation on Spark tuning for enterprise system administrators. It covers how to configure Spark applications from the start so they run reliably and efficiently: in particular, it recommends sizing executor memory, choosing appropriate storage levels, and using Kryo serialization to reduce memory usage. It also offers guidance on gracefully handling memory limitations, for example by increasing the storage fraction. The overall goal is to help applications complete faster and stay within the cluster's memory limits.
6. Default != Recommended
Example: By default, spark.executor.memory = 1g
1g allows small jobs to finish out of the box.
Spark assumes you'll increase this parameter.
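For instance, the setting can be raised when the application is launched (or passed with spark-submit --conf); the 4g value in this sketch is purely illustrative, not a recommendation for any particular cluster:

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: override the 1g default when the application starts.
    // "4g" is an illustrative value, not a recommendation.
    val spark = SparkSession.builder()
      .appName("mySparkApp")
      .config("spark.executor.memory", "4g")   // default: 1g
      .getOrCreate()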
7. Which parameters are important?
How do I configure them?
Default != Recommended
8. The Spark Tuning Cheat-Sheet
Filter* data before an expensive reduce or aggregation; consider* coalesce()
Use* data structures that require less memory (tuning.html#tuning-data-structures)
Serialize*: PySpark serializing is built-in; Scala/Java? persist(StorageLevel.[*]_SER). Recommended: KryoSerializer *
* See "Optimize partitions.", "GC investigation.", and "Checkpointing."
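As a rough sketch of how a Scala/Java application might apply the cheat-sheet advice (filter and coalesce before an expensive aggregation, cache in serialized form, use Kryo), something like the following; the Event class, input path, and partition count are invented for the example:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    // Hypothetical record type, defined only so Kryo has a class to register.
    case class Event(userId: Long, action: String)

    // Kryo is the recommended serializer; registering classes keeps the
    // serialized form compact.
    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[Event]))

    val spark = SparkSession.builder().appName("mySparkApp").config(conf).getOrCreate()
    val sc = spark.sparkContext

    // Filter before the expensive aggregation, then coalesce to fewer partitions.
    val events = sc.textFile("hdfs:///data/events")   // path is illustrative
      .map(_.split(","))
      .map(f => Event(f(0).toLong, f(1)))
    val clicks = events.filter(_.action == "click").coalesce(64)

    // Serialize: cache in serialized form so the cached blocks use less memory.
    clicks.persist(StorageLevel.MEMORY_ONLY_SER)
    val clicksPerUser = clicks.map(e => (e.userId, 1L)).reduceByKey(_ + _)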
28. What is the memory limit for mySparkApp?
Max Memory in "pool" x 3/4 = mySparkApp_mem_limit
mySparkApp_mem_limit > driver.memory + (executor.memory x dynamicAllocation.maxExecutors)
30. Limitation: Driver must not be larger than a single node.
34. Verify my calculations respect this limitation.
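A quick worked example of that check; the 400 GB pool, 8g driver, 4g executors, and 50-executor cap are made-up numbers, not figures from the presentation:

    // Hypothetical numbers, used only to illustrate the check above.
    val poolMaxGb  = 400.0                      // max memory in the "pool" (e.g. a YARN queue)
    val appLimitGb = poolMaxGb * 3.0 / 4.0      // mySparkApp_mem_limit = 300 GB

    val driverMemGb   = 8.0                     // spark.driver.memory
    val executorMemGb = 4.0                     // spark.executor.memory
    val maxExecutors  = 50                      // spark.dynamicAllocation.maxExecutors

    // mySparkApp_mem_limit must exceed driver.memory + executor.memory x maxExecutors.
    val requestedGb = driverMemGb + executorMemGb * maxExecutors   // 8 + 200 = 208 GB
    assert(requestedGb < appLimitGb)            // 208 GB < 300 GB, so these settings fit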
39. mySparkApp memory issues
Reduce the memory needed for mySparkApp. How?
Gracefully handle memory limitations. How?
41. Here, let's talk about one scenario.
43. persist(StorageLevel.[*]_SER)
45. Recommended: KryoSerializer *
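The slides above cover the "reduce the memory needed" side with serialized caching and Kryo. For the "gracefully handle memory limitations" side, the summary at the top of this document mentions increasing the storage fraction; one possible sketch of that direction, with purely illustrative values, is:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    // Give cached blocks a larger slice of unified memory (default is 0.5),
    // and let anything that still does not fit spill to local disk.
    val spark = SparkSession.builder()
      .appName("mySparkApp")
      .config("spark.memory.storageFraction", "0.6")   // illustrative value
      .getOrCreate()

    val lines = spark.sparkContext.textFile("hdfs:///data/events")   // path is illustrative
    lines.persist(StorageLevel.MEMORY_AND_DISK_SER)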