Alluxio Community Office Hour
Dec 17, 2019
Speakers:
Bin Fan & Jiacheng Liu, Alluxio
While adoption of the Cloud & Kubernetes has made it exceptionally easy to scale compute, the increasing spread of data across different systems and clouds has created new challenges for data engineers. Accessing data effectively from AWS S3 or on-premises HDFS becomes harder, and data locality is lost: how do you move data to compute workers efficiently, how do you unify data across multiple or remote clouds, and so on. The open source project Alluxio approaches this problem in a new way. It helps elastic compute workloads, such as Apache Spark, realize the true benefits of the cloud while bringing data locality and data accessibility to workloads orchestrated by Kubernetes.
One important performance optimization in Apache Spark is to schedule tasks on the nodes where HDFS DataNodes locally serve the task input data. However, more users are running Apache Spark natively on Kubernetes, where HDFS is not an option. This office hour describes the concepts and data flow of running the Spark/Alluxio stack in Kubernetes with enhanced data locality, even when the storage service is outside the cluster or remote.
In this Office Hour, we will go over:
- Why Spark is able to make locality-aware scheduling decisions when working with Alluxio in a K8s environment using the host network
- Why a pod running Alluxio can share data efficiently with a pod running Spark on the same host using domain socket and host path volume
- The roadmap to improve this Spark / Alluxio stack in the context of K8s
Improving Data Locality for Spark Jobs on Kubernetes Using Alluxio
1. Achieving Data Locality
Using Spark + Alluxio
with and without K8s
Bin Fan, Jiacheng Liu
2019/12/17 Office Hour
Questions at https://alluxio.io/slack
2. Speakers
• Bin Fan
○ Founding engineer & Alluxio PMC @ Alluxio
○ Email: binfan@alluxio.com
• Jiacheng Liu
○ Core maintainer @ Alluxio
○ Email: jiacheng@alluxio.com
For Q&A, join https://alluxio.io/slack
3. Agenda
• Alluxio Overview
• Spark + Alluxio Data Locality without K8s
• K8s Overview
• Spark + Alluxio Data Locality with K8s
• Limitations and Future Work
5. The Alluxio Story
2013: Originated as the Tachyon project at UC Berkeley AMPLab by Ph.D. student Haoyuan (H.Y.) Li, now Alluxio CTO
2015: Open source project established & company founded to commercialize Alluxio
2019: Top 10 Big Data
2019: Top 10 Cloud Software
Goal: Orchestrate Data at Memory Speed for the Cloud for data-driven apps such as Big Data Analytics, ML and AI.
6. Fast-growing Open Source Community
1000+ Contributors, 4000+ GitHub Stars
Apache 2.0 Licensed
Join the community on Slack (FAQ for this office hour): alluxio.io/slack
Contribute to source code: github.com/alluxio/alluxio
7. Alluxio is Open-Source Data Orchestration
Data Orchestration for the Cloud
Interfaces: Java File API, HDFS Interface, S3 Interface, REST API, POSIX Interface
Storage drivers: HDFS Driver, GCS Driver, S3 Driver, Azure Driver
8. Why Alluxio in K8s?
Elastic Data for Elastic Compute with Tight Locality
● Improve data locality
○ Big data analytics or ML
○ Cache data close to compute
● Enable high-speed data sharing across jobs
○ A closer staging storage layer for compute jobs
● Provide unification of persistent storage
○ Same data abstraction across different storage systems
[Diagram: two co-location layouts of Alluxio and Spark on each host]
Read more at https://www.alluxio.io/blog/kubernetes-alluxio-and-the-disaggregated-analytics-stack/
9. Data Challenges
● Data Copy
○ Multiple Copies of Data
● A Stateful Application on K8s can be hard
○ Data Migration and Rebalancing on elasticity
● Changing applications for a new storage system can be hard
○ Lack of familiar API
○ Tuning can be challenging
11. Spark Workflow (Without Alluxio)
1) Application runs the Spark job via the Spark Context
2) Cluster Manager (e.g. YARN/Mesos) allocates executors on Worker Nodes
3) Launch executors and launch tasks
4) Spark Executors access data (s3://data/) and compute
Takeaway: Remote data, no locality
12. Step 2: Schedule compute to data cache location
● Setup: HostA (196.0.0.7) and HostB (196.0.0.8) each run an Alluxio Worker; block1 and block2 are cached on HostA
● 2.1) The Spark Context asks the Alluxio Masters via the Alluxio Client: where is block1? Answer: block1 is served at [HostA]
● 2.2) The Cluster Manager (e.g. YARN/Mesos) allocates the executor on [HostA]
● The Alluxio client implements an HDFS-compatible API with block location info
● The Alluxio masters keep track of and serve the list of worker nodes that currently have the cache.
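The lookup-then-schedule flow above can be sketched in a few lines of Python. All names here are hypothetical; the real logic lives in Spark's scheduler and the Alluxio client's HDFS-compatible block-location API:

```python
# Sketch of locality-aware executor placement (hypothetical names; the real
# logic lives in Spark's scheduler and the Alluxio HDFS-compatible client).

# 2.1) The Alluxio masters track which workers currently cache each block.
block_locations = {
    "block1": ["HostA"],   # cached on the worker at 196.0.0.7
    "block2": ["HostA"],
}

def preferred_hosts(blocks, locations):
    """Return the hosts that cache any of the given input blocks."""
    hosts = []
    for block in blocks:
        for host in locations.get(block, []):
            if host not in hosts:
                hosts.append(host)
    return hosts

def allocate_executor(candidate_nodes, preferred):
    """2.2) Prefer a node that already caches the data; else fall back."""
    for node in candidate_nodes:
        if node in preferred:
            return node
    return candidate_nodes[0] if candidate_nodes else None
```

For example, with block1 cached on HostA, `allocate_executor(["HostB", "HostA"], preferred_hosts(["block1"], block_locations))` picks HostA, while a cluster without HostA falls back to whatever node is available.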
13. Step 4: Find local Alluxio Worker and Efficient Data Exchange
● Setup: the Spark Executor runs on HostA (196.0.0.7) next to the Alluxio Worker caching block1 (backed by s3://data/); HostB (196.0.0.8) runs another Alluxio Worker
● The Alluxio Masters answer block1? with [HostA]
● The Spark Executor finds the local Alluxio Worker by hostname comparison
● The Spark Executor talks to the local Alluxio Worker using either short-circuit access via the local fs (e.g., /mnt/ramdisk/) or a local domain socket (/opt/domain)
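A minimal sketch of this worker discovery and I/O-path choice, with hypothetical names (the Alluxio client performs this selection internally):

```python
# Sketch of executor-side local-worker discovery (hypothetical names and
# dict layout; the Alluxio client performs this selection internally).

def find_local_worker(workers, executor_hostname):
    """Identify a host-local worker by hostname comparison."""
    for worker in workers:
        if worker["hostname"] == executor_hostname:
            return worker
    return None

def choose_io_path(worker):
    """Prefer short-circuit reads through the local FS (e.g. a ramdisk),
    then a local domain socket, and fall back to network I/O otherwise."""
    if worker is None:
        return "network"
    if worker.get("ramdisk_path"):        # e.g. /mnt/ramdisk/
        return "short-circuit"
    if worker.get("domain_socket_path"):  # e.g. /opt/domain
        return "domain-socket"
    return "network"

workers = [
    {"hostname": "HostA", "ramdisk_path": "/mnt/ramdisk/"},
    {"hostname": "HostB", "domain_socket_path": "/opt/domain"},
]
```

An executor on HostA resolves the first worker and reads via short-circuit; an executor on a host with no co-located worker falls back to network I/O.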
14. Recap: Spark Architecture with Alluxio
1) Application runs the Spark job via the Spark Context
2.1) The Spark Context talks to Alluxio (Alluxio Client -> Alluxio Masters) for where the data cache is
2.2) The Cluster Manager (e.g. YARN/Mesos) allocates executors on the Worker Nodes holding the cache
3) Launch executors and launch tasks
4) Executors access Alluxio (Alluxio Client -> local Alluxio Worker, backed by s3://data/) for data and compute
Step 2: Help Spark schedule compute to the data cache
Step 4: Enable Spark to access the local Alluxio cache
16. Kubernetes (k8s) is...
“an open-source container-orchestration system for automating
application deployment, scaling, and management.”
17. Container Orchestration Platform
● Platform agnostic cluster management
● Service discovery and load balancing
● Storage orchestration
● Horizontal scaling
● Self-monitoring and self-healing
● Automated rollouts and rollbacks
18. K8s Terms
● Node
○ A VM or physical machine
● Container
○ Docker Image = Lightweight OS and application execution environment
○ Container = Image once running on Docker Engine
● Pod
○ Schedulable unit of one or more containers running together
○ Containers share storage, network, and other resources
● Controller
○ Controls the desired state such as copies of a Pod
● DaemonSet
○ A Controller that ensures each Node runs exactly one copy of a given Pod
● Volume
○ A directory accessible to the Containers in a Pod
● Persistent Volume
○ A storage resource with lifecycle independent of Pods
20. Spark on K8s architecture
● Spark 2.3 added native K8s support
● Spark application runs in K8s as a custom K8s Controller
● spark-submit talks to K8s API Server to create Driver
● Spark Driver creates Executor Pods
● When the application completes, the Executor Pods
terminate and are cleaned up
● The Driver Pod persists logs and remains in
“COMPLETED” state
21. Deployment model: Co-location
● Co-locate Spark Executor Pod with Alluxio Worker Pod
● Lifecycle
○ Spark Executors are ephemeral
○ Alluxio Worker serves cache across all Spark jobs
● Deployment order:
○ Deploy Alluxio cluster first (masters+workers)
○ 1 Alluxio Worker on each Node, by DaemonSet
○ Spark-submit to launch Spark Driver + Executors
[Diagram: co-location without K8s (Spark Executor + Alluxio Worker/Client on the same Worker Node) vs. co-location on K8s (Spark Executor Pod + Alluxio Worker Pod with Alluxio Client on the same Worker Node)]
22. Challenge 1: Spark Driver fails to allocate Executors correctly
● Setup: AlluxioWorkerPodA (10.0.1.1) runs on HostA (196.0.0.7) and caches block1 and block2; AlluxioWorkerPodB (10.0.1.2) runs on HostB (196.0.0.8)
● The Alluxio Master Pods answer block1? with [AlluxioWorkerPodA]
● The Spark Driver Pod then asks to allocate on [AlluxioWorkerPodA], which matches no Node
● Problem: Alluxio Worker Pods have different addresses from the Nodes they run on
23. Current Solution: Launch Alluxio Workers with hostNetwork
● With hostNetwork: true, each Alluxio Worker Pod uses its Node's network adapter and reports the Node's address: HostA (196.0.0.7) and HostB (196.0.0.8)
● The Alluxio Master Pods answer block1? with [HostA]
● The Spark Driver Pod allocates on [HostA], which now matches a Node
24. Challenge 2: Spark Executor fails to identify if the Alluxio Pod is host-local
Problem:
● The Spark Executor Pod (SparkExecPod1, 10.0.1.1) has a different hostname from the Node
○ Hostname HostA != SparkExecPod1
○ The local Alluxio Worker on HostA (196.0.0.7), which caches block1, is not identified even though the Alluxio Master answered block1? with [HostA]
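The mismatch can be illustrated with a toy version of the naive check (hypothetical sketch; the real comparison happens inside the Alluxio client):

```python
# Toy illustration of Challenge 2: inside a pod, the executor's hostname is
# the pod name rather than the node name, so the naive comparison misses
# the host-local worker.

def is_worker_local(executor_hostname, worker_hostname):
    """Naive pre-K8s check: locality means identical hostnames."""
    return executor_hostname == worker_hostname

# Without K8s, executor and worker both report the node hostname.
# On K8s pod networking, the executor reports its pod name instead,
# so the host-local worker is missed.
```

For example, `is_worker_local("HostA", "HostA")` is True, but `is_worker_local("SparkExecPod1", "HostA")` is False even though both run on the same machine.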
25. Challenge 3: Spark Executor fails to find the domain socket
Problem:
● Pods don't share the file system
● The domain socket /opt/domain/ is inside the Alluxio Worker Pod on HostA (196.0.0.7), so the Spark Executor Pod (SparkExecPod1, 10.0.1.1) cannot open it
26. Current Solution: Use UUID to identify the Alluxio Worker and share a hostPath Volume between Pods
● Each Alluxio Worker has a UUID; the Alluxio Master answers block1? with [HostA: UUIDABC]
● Share the domain socket through a hostPath Volume (/opt/domain/UUIDABC on HostA, 196.0.0.7)
● The Alluxio Client finds the local worker's domain socket by finding the file matching the worker UUID
● Worker domain socket path: /opt/domain/d -> /opt/domain/UUIDABC
Enabled by alluxio.worker.data.server.domain.socket.address
● Mount the hostPath Volume to the Spark Executor
Enabled by Spark 2.4 (PR)
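The UUID-matching step can be sketched as follows. The helper name and directory layout are illustrative; the actual matching is done by the Alluxio client when alluxio.worker.data.server.domain.socket.address points at the shared hostPath directory:

```python
import os

# Sketch of UUID-based domain-socket discovery (illustrative helper; the
# actual matching is performed by the Alluxio client against the hostPath
# directory shared by the worker and executor Pods).

def find_local_domain_socket(domain_dir, local_worker_uuids):
    """Scan the shared hostPath directory for a socket file whose name
    matches the UUID of a worker reported for this host."""
    for name in sorted(os.listdir(domain_dir)):
        if name in local_worker_uuids:
            return os.path.join(domain_dir, name)
    return None  # no host-local worker found: fall back to network I/O
```

If the Master reports block1 at [HostA: UUIDABC] and /opt/domain/UUIDABC exists in the shared volume, the client can switch to domain-socket I/O; otherwise it reads over the network.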
27. Recap: Spark + Alluxio Architecture on K8s
1) Application runs the Spark job via the Spark Context
2.1) The Spark Context (via the Alluxio Client) talks to the Alluxio Master Pods for where the data is (backed by s3://data/)
2.2) Create the Spark Driver Pod
3) Launch executors and launch tasks: each Spark Executor Pod (with an Alluxio Client) is co-located with an Alluxio Worker Pod on its Worker Node
4) Executors access Alluxio for data and compute
Recap:
Step 2: Help Spark schedule compute to the data location
Step 4: Enable Spark to access data locally
28. Limitations and future work
● Many enterprise environments have strict control on hostNetwork and
hostPath.
● Alluxio workers need hostNetwork
○ Plan: workarounds like supporting container network translation
● The domain socket file requires a hostPath volume
○ Plan: workarounds like using Local Persistent Volumes
● Feedback/collaboration are welcome!
29. Additional Resources
Spark Native K8s Support
https://spark.apache.org/docs/latest/running-on-kubernetes.html
https://kubernetes.io/blog/2018/03/apache-spark-23-with-native-kubernetes/
Spark + Alluxio Advanced Data Locality
https://www.alluxio.io/blog/top-10-tips-for-making-the-spark-alluxio-stack-blazing-fast/
Running Alluxio in K8s
https://docs.alluxio.io/os/user/stable/en/deploy/Running-Alluxio-On-Kubernetes.html
Running Spark with Alluxio in K8s
https://docs.alluxio.io/os/user/stable/en/compute/Spark-On-Kubernetes.html
How Alluxio Helps Analytics Stack in Kubernetes
https://www.alluxio.io/blog/kubernetes-alluxio-and-the-disaggregated-analytics-stack/