Alluxio Community Office Hour
Dec 17, 2019
Speakers:
Bin Fan & Jiacheng Liu, Alluxio
While adoption of the Cloud & Kubernetes has made it exceptionally easy to scale compute, the increasing spread of data across different systems and clouds has created new challenges for data engineers. Accessing data effectively from AWS S3 or on-premises HDFS becomes harder, and data locality is lost: how do you move data to compute workers efficiently, how do you unify data across multiple or remote clouds, and so on. The open source project Alluxio approaches this problem in a new way. It helps elastic compute workloads, such as Apache Spark, realize the true benefits of the cloud while bringing data locality and data accessibility to workloads orchestrated by Kubernetes.
One important performance optimization in Apache Spark is to schedule tasks on the nodes where HDFS DataNodes locally serve the task input data. However, more users are running Apache Spark natively on Kubernetes, where HDFS is not an option. This office hour describes the concepts and data flow of running the Spark/Alluxio stack in Kubernetes with enhanced data locality, even when the storage service is outside the cluster or remote.
In this Office Hour, we will go over:
- Why Spark is able to make locality-aware scheduling decisions when working with Alluxio in a K8s environment using the host network
- Why a pod running Alluxio can share data efficiently with a pod running Spark on the same host using domain socket and host path volume
- The roadmap to improve this Spark / Alluxio stack in the context of K8s
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
1. Achieving Data Locality
Using Spark + Alluxio
with and without K8s
Bin Fan, Jiacheng Liu
2019/12/17 Office Hour
Questions at https://alluxio.io/slack
2. Speakers
• Bin Fan
○ Founding engineer & Alluxio PMC @ Alluxio
○ Email: binfan@alluxio.com
• Jiacheng Liu
○ Core maintainer @ Alluxio
○ Email: jiacheng@alluxio.com
For Q&A, join https://alluxio.io/slack
3. Agenda
• Alluxio Overview
• Spark + Alluxio Data Locality without K8s
• K8s Overview
• Spark + Alluxio Data Locality with K8s
• Limitations and Future Work
5. The Alluxio Story
2013: Originated as the Tachyon project at UC Berkeley AMPLab by Ph.D. student Haoyuan (H.Y.) Li, now Alluxio CTO
2015: Open source project established & company founded to commercialize Alluxio
2019: Top 10 Big Data
2019: Top 10 Cloud Software
Goal: Orchestrate Data at Memory Speed for the Cloud for data-driven apps such as Big Data Analytics, ML and AI.
6. Fast-growing Open Source Community
1000+ Contributors, 4000+ GitHub Stars
Apache 2.0 Licensed
Join the community on Slack (FAQ for this office hour): alluxio.io/slack
Contribute to source code: github.com/alluxio/alluxio
7. Alluxio is Open-Source Data Orchestration
Data Orchestration for the Cloud
Interfaces: Java File API, HDFS Interface, S3 Interface, REST API, POSIX Interface
Storage drivers: HDFS Driver, GCS Driver, S3 Driver, Azure Driver
8. Why Alluxio in K8s?
Elastic Data for Elastic Compute with Tight Locality
● Improve data locality
○ Big data analytics or ML
○ Cache data close to compute
● Enable high-speed data sharing across jobs
○ A closer staging storage layer for compute jobs
● Provide unification of persistent storage
○ Same data abstraction across different storage systems
[Diagram: two co-location layouts of Alluxio and Spark on each host]
Read more at https://www.alluxio.io/blog/kubernetes-alluxio-and-the-disaggregated-analytics-stack/
9. Data Challenges
● Data Copy
○ Multiple Copies of Data
● A Stateful Application on K8s can be hard
○ Data Migration and Rebalancing on elasticity
● Changing applications for a new storage system can be hard
○ Lack of familiar API
○ Tuning can be challenging
11. Spark Workflow (Without Alluxio)
1) Application runs the Spark job via the Spark Context
2) Cluster Manager (e.g. YARN/Mesos) allocates executors on Worker Nodes
3) Launch executors and launch tasks
4) Spark Executors access data (s3://data/) and compute
Takeaway: Remote data, no locality
12. Step 2: Schedule compute to data cache location
● Setup: HostA (196.0.0.7) and HostB (196.0.0.8) each run an Alluxio Worker; block1 and block2 are cached on HostA
● 2.1) The Spark Context asks the Alluxio Masters via the Alluxio Client: where is block1? Answer: block1 is served at [HostA]
● 2.2) The Cluster Manager (e.g. YARN/Mesos) allocates the executor on [HostA]
● The Alluxio client implements an HDFS-compatible API with block location info
● The Alluxio masters keep track of and serve the list of worker nodes that currently have the cache.
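The lookup-then-schedule flow above can be sketched in a few lines of Python. All names here are hypothetical; the real logic lives in Spark's scheduler and the Alluxio client's HDFS-compatible block-location API:

```python
# Sketch of locality-aware executor placement (hypothetical names; the real
# logic lives in Spark's scheduler and the Alluxio HDFS-compatible client).

# 2.1) The Alluxio masters track which workers currently cache each block.
block_locations = {
    "block1": ["HostA"],   # cached on the worker at 196.0.0.7
    "block2": ["HostA"],
}

def preferred_hosts(blocks, locations):
    """Return the hosts that cache any of the given input blocks."""
    hosts = []
    for block in blocks:
        for host in locations.get(block, []):
            if host not in hosts:
                hosts.append(host)
    return hosts

def allocate_executor(candidate_nodes, preferred):
    """2.2) Prefer a node that already caches the data; else fall back."""
    for node in candidate_nodes:
        if node in preferred:
            return node
    return candidate_nodes[0] if candidate_nodes else None
```

For example, with block1 cached on HostA, `allocate_executor(["HostB", "HostA"], preferred_hosts(["block1"], block_locations))` picks HostA, while a cluster without HostA falls back to whatever node is available.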
13. Step 4: Find local Alluxio Worker and Efficient Data Exchange
● Setup: the Spark Executor runs on HostA (196.0.0.7) next to the Alluxio Worker caching block1 (backed by s3://data/); HostB (196.0.0.8) runs another Alluxio Worker
● The Alluxio Masters answer block1? with [HostA]
● The Spark Executor finds the local Alluxio Worker by hostname comparison
● The Spark Executor talks to the local Alluxio Worker using either short-circuit access via the local fs (e.g., /mnt/ramdisk/) or a local domain socket (/opt/domain)
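A minimal sketch of this worker discovery and I/O-path choice, with hypothetical names (the Alluxio client performs this selection internally):

```python
# Sketch of executor-side local-worker discovery (hypothetical names and
# dict layout; the Alluxio client performs this selection internally).

def find_local_worker(workers, executor_hostname):
    """Identify a host-local worker by hostname comparison."""
    for worker in workers:
        if worker["hostname"] == executor_hostname:
            return worker
    return None

def choose_io_path(worker):
    """Prefer short-circuit reads through the local FS (e.g. a ramdisk),
    then a local domain socket, and fall back to network I/O otherwise."""
    if worker is None:
        return "network"
    if worker.get("ramdisk_path"):        # e.g. /mnt/ramdisk/
        return "short-circuit"
    if worker.get("domain_socket_path"):  # e.g. /opt/domain
        return "domain-socket"
    return "network"

workers = [
    {"hostname": "HostA", "ramdisk_path": "/mnt/ramdisk/"},
    {"hostname": "HostB", "domain_socket_path": "/opt/domain"},
]
```

An executor on HostA resolves the first worker and reads via short-circuit; an executor on a host with no co-located worker falls back to network I/O.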
14. Recap: Spark Architecture with Alluxio
1) Application runs the Spark job via the Spark Context
2.1) The Spark Context talks to Alluxio (Alluxio Client -> Alluxio Masters) for where the data cache is
2.2) The Cluster Manager (e.g. YARN/Mesos) allocates executors on the Worker Nodes holding the cache
3) Launch executors and launch tasks
4) Executors access Alluxio (Alluxio Client -> local Alluxio Worker, backed by s3://data/) for data and compute
Step 2: Help Spark schedule compute to the data cache
Step 4: Enable Spark to access the local Alluxio cache
16. Kubernetes (k8s) is...
“an open-source container-orchestration system for automating
application deployment, scaling, and management.”
17. Container Orchestration Platform
● Platform agnostic cluster management
● Service discovery and load balancing
● Storage orchestration
● Horizontal scaling
● Self-monitoring and self-healing
● Automated rollouts and rollbacks
18. K8s Terms
● Node
○ A VM or physical machine
● Container
○ Docker Image = Lightweight OS and application execution environment
○ Container = Image once running on Docker Engine
● Pod
○ Schedulable unit of one or more containers running together
○ Containers share storage, network, and other resources
● Controller
○ Controls the desired state such as copies of a Pod
● DaemonSet
○ A Controller that ensures each Node runs exactly one copy of a given Pod
● Volume
○ A directory accessible to the Containers in a Pod
● Persistent Volume
○ A storage resource with lifecycle independent of Pods
20. Spark on K8s architecture
● Spark 2.3 added native K8s support
● Spark application runs in K8s as a custom K8s Controller
● spark-submit talks to K8s API Server to create Driver
● Spark Driver creates Executor Pods
● When the application completes, the Executor Pods
terminate and are cleaned up
● The Driver Pod persists logs and remains in
“COMPLETED” state
21. Deployment model: Co-location
● Co-locate Spark Executor Pod with Alluxio Worker Pod
● Lifecycle
○ Spark Executors are ephemeral
○ Alluxio Worker serves cache across all Spark jobs
● Deployment order:
○ Deploy Alluxio cluster first (masters+workers)
○ 1 Alluxio Worker on each Node, by DaemonSet
○ Spark-submit to launch Spark Driver + Executors
[Diagram: co-location without K8s (Spark Executor + Alluxio Worker/Client on the same Worker Node) vs. co-location on K8s (Spark Executor Pod + Alluxio Worker Pod with Alluxio Client on the same Worker Node)]
22. Challenge 1: Spark Driver fails to allocate Executors correctly
● Setup: AlluxioWorkerPodA (10.0.1.1) runs on HostA (196.0.0.7) and caches block1 and block2; AlluxioWorkerPodB (10.0.1.2) runs on HostB (196.0.0.8)
● The Alluxio Master Pods answer block1? with [AlluxioWorkerPodA]
● The Spark Driver Pod then asks to allocate on [AlluxioWorkerPodA], which matches no Node
● Problem: Alluxio Worker Pods have different addresses from the Nodes they run on
23. Current Solution: Launch Alluxio Workers with hostNetwork
● With hostNetwork: true, each Alluxio Worker Pod uses its Node's network adapter and reports the Node's address: HostA (196.0.0.7) and HostB (196.0.0.8)
● The Alluxio Master Pods answer block1? with [HostA]
● The Spark Driver Pod allocates on [HostA], which now matches a Node
24. Challenge 2: Spark Executor fails to identify if the Alluxio Pod is host-local
Problem:
● The Spark Executor Pod (SparkExecPod1, 10.0.1.1) has a different hostname from the Node
○ Hostname HostA != SparkExecPod1
○ The local Alluxio Worker on HostA (196.0.0.7), which caches block1, is not identified even though the Alluxio Master answered block1? with [HostA]
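The mismatch can be illustrated with a toy version of the naive check (hypothetical sketch; the real comparison happens inside the Alluxio client):

```python
# Toy illustration of Challenge 2: inside a pod, the executor's hostname is
# the pod name rather than the node name, so the naive comparison misses
# the host-local worker.

def is_worker_local(executor_hostname, worker_hostname):
    """Naive pre-K8s check: locality means identical hostnames."""
    return executor_hostname == worker_hostname

# Without K8s, executor and worker both report the node hostname.
# On K8s pod networking, the executor reports its pod name instead,
# so the host-local worker is missed.
```

For example, `is_worker_local("HostA", "HostA")` is True, but `is_worker_local("SparkExecPod1", "HostA")` is False even though both run on the same machine.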
25. Challenge 3: Spark Executor fails to find the domain socket
Problem:
● Pods don't share the file system
● The domain socket /opt/domain/ is inside the Alluxio Worker Pod on HostA (196.0.0.7), so the Spark Executor Pod (SparkExecPod1, 10.0.1.1) cannot open it
26. Current Solution: Use UUID to identify the Alluxio Worker and share a hostPath Volume between Pods
● Each Alluxio Worker has a UUID; the Alluxio Master answers block1? with [HostA: UUIDABC]
● Share the domain socket through a hostPath Volume (/opt/domain/UUIDABC on HostA, 196.0.0.7)
● The Alluxio Client finds the local worker's domain socket by finding the file matching the worker UUID
● Worker domain socket path: /opt/domain/d -> /opt/domain/UUIDABC
Enabled by alluxio.worker.data.server.domain.socket.address
● Mount the hostPath Volume to the Spark Executor
Enabled by Spark 2.4 (PR)
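The UUID-matching step can be sketched as follows. The helper name and directory layout are illustrative; the actual matching is done by the Alluxio client when alluxio.worker.data.server.domain.socket.address points at the shared hostPath directory:

```python
import os

# Sketch of UUID-based domain-socket discovery (illustrative helper; the
# actual matching is performed by the Alluxio client against the hostPath
# directory shared by the worker and executor Pods).

def find_local_domain_socket(domain_dir, local_worker_uuids):
    """Scan the shared hostPath directory for a socket file whose name
    matches the UUID of a worker reported for this host."""
    for name in sorted(os.listdir(domain_dir)):
        if name in local_worker_uuids:
            return os.path.join(domain_dir, name)
    return None  # no host-local worker found: fall back to network I/O
```

If the Master reports block1 at [HostA: UUIDABC] and /opt/domain/UUIDABC exists in the shared volume, the client can switch to domain-socket I/O; otherwise it reads over the network.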
27. Recap: Spark + Alluxio Architecture on K8s
1) Application runs the Spark job via the Spark Context
2.1) The Spark Context (via the Alluxio Client) talks to the Alluxio Master Pods for where the data is (backed by s3://data/)
2.2) Create the Spark Driver Pod
3) Launch executors and launch tasks: each Spark Executor Pod (with an Alluxio Client) is co-located with an Alluxio Worker Pod on its Worker Node
4) Executors access Alluxio for data and compute
Recap:
Step 2: Help Spark schedule compute to the data location
Step 4: Enable Spark to access data locally
28. Limitations and future work
● Many enterprise environments have strict control on hostNetwork and
hostPath.
● Alluxio workers need hostNetwork
○ Plan: workarounds like supporting container network translation
● The domain socket file requires a hostPath volume
○ Plan: workarounds like using Local Persistent Volumes
● Feedback/collaboration are welcome!
29. Additional Resources
Spark Native K8s Support
https://spark.apache.org/docs/latest/running-on-kubernetes.html
https://kubernetes.io/blog/2018/03/apache-spark-23-with-native-kubernetes/
Spark + Alluxio Advanced Data Locality
https://www.alluxio.io/blog/top-10-tips-for-making-the-spark-alluxio-stack-blazing-fast/
Running Alluxio in K8s
https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-Kubernetes.html
Running Spark with Alluxio in K8s
https://docs.alluxio.io/os/user/stable/en/compute/Spark-On-Kubernetes.html
How Alluxio Helps Analytics Stack in Kubernetes
https://www.alluxio.io/blog/kubernetes-alluxio-and-the-disaggregated-analytics-stack/