Slides presented at the Strata SF 2019 conference, explaining how Lyft is building a multi-cluster solution for running Apache Spark on Kubernetes at scale to support diverse workloads and overcome operational challenges.
2. Introduction
Li Gao
Works in the Data Platform team at Lyft, currently leading the Compute Infra
initiatives including Spark on Kubernetes.
Previously at: Salesforce, Fitbit, Groupon, and other startups.
Bill Graham
Engineer/Architect on the Data Platform team at Lyft, currently developing data
ingestion systems.
Previously at Twitter, CBS Interactive, CNET Networks
3. ● Introduction to the Data Landscape at Lyft
● The challenges we face
● How Apache Spark on Kubernetes can help
● Remaining work
Agenda
4. Data Landscape
● Batch data Ingestion and ETL
● Data Streaming
● ML platforms
● Notebooks and BI tools
● Query and Visualization
● Operational Analytics
● Data Discovery & Lineage
● Workflow orchestration
● Cloud Platforms
6. The Evolving Batch Compute Architecture
● 2016-2017: Vendor-based Hadoop
● Early 2018: Hive on MR, Vendor Presto
● Mid 2018: Hive on Tez + Spark adhoc
● Late 2018: Spark on Vendor GA
● Early 2019: Spark on K8s Alpha
● Future: Spark on K8s Beta
7. What batch compute is used for
Events, Ext Data, RDB/KV, and Sys Events flow through Ingest Pipelines into AWS S3; Batch Compute Clusters (backed by the Hive Metastore, HMS) read from and write back to S3; Presto, Hive, and BI tools then serve results to Analysts, Engineers, Scientists, and Services.
9. Batch Compute Challenges
● 3rd Party vendor dependency issues
● Data ETL expressed solely in SQL
● Complex logic expressed in Python that is hard to express in SQL
● Different dependencies and versions
● Resource load balancing for heterogeneous workloads
12. What about Python functions?
“I want to express my processing logic in Python functions
with external geo libraries (e.g., GeoMesa) and interact with
Hive tables.” --- Lyft data engineer
13. How does Apache Spark help?
Spark provides a single engine that spans Applications, APIs, Environments, and Data Sources and Data Sinks (including RDB/KV stores).
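The request above maps naturally onto PySpark. A minimal sketch, assuming Hive support is enabled, with hypothetical table names and a placeholder geo function standing in for an external library such as GeoMesa:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Hive support lets the job read and write the same tables the SQL users see.
spark = (SparkSession.builder
         .appName("geo-enrichment")
         .enableHiveSupport()
         .getOrCreate())

def to_geohash(lat, lng):
    # Placeholder for logic backed by an external geo library (e.g. GeoMesa);
    # shown as plain Python to keep the sketch self-contained.
    return f"{round(lat, 3)}:{round(lng, 3)}"

geohash_udf = udf(to_geohash, StringType())

# 'default.rides' and 'default.rides_enriched' are hypothetical table names.
rides = spark.table("default.rides")
enriched = rides.withColumn("geohash", geohash_udf("pickup_lat", "pickup_lng"))
enriched.write.mode("overwrite").saveAsTable("default.rides_enriched")
```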
14. What challenges remain?
● Per job custom dependencies
● Handling version requirements (Py3 vs. Py2)
● Still need to run on shared clusters for cost efficiency
16. How about different Spark or Hive versions?
● Legacy jobs that require Spark 2.2
● Newer jobs require Spark 2.3 or Spark 2.4
● Hive 2.1 SQL and Hive 2.3
17. How can Kubernetes help?
● CRD Operators & Controllers
● Pods
● Namespaces
● Ingress & CNI Services
● Declarative Resources
● Deployment & Replicas
● Community
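Containerization is what makes the per-job dependency and version isolation possible: each job can ship its own image. A hedged sketch of a submission using the standard Spark-on-Kubernetes properties (the image name, namespace, and API server endpoint below are hypothetical):

```python
import subprocess

# Each job ships its own image, so a legacy Spark 2.2 job and a newer Spark 2.4
# job can run side by side on the same cluster.
spark_submit = [
    "spark-submit",
    "--master", "k8s://https://k8s-api.example.internal:443",
    "--deploy-mode", "cluster",
    "--conf", "spark.kubernetes.namespace=batch-etl",
    "--conf", "spark.kubernetes.container.image=123456789.dkr.ecr.us-east-1.amazonaws.com/spark-py:2.4-geo",
    "--conf", "spark.kubernetes.authenticate.driver.serviceAccountName=spark",
    "local:///opt/app/geo_enrichment.py",
]
subprocess.run(spark_submit, check=True)
```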
18. What are the challenges running Spark on k8s?
● Spark on k8s is still in its infancy
● Single cluster scaling limit
● CRD and control plane update challenges
● Pod churn and IP address allocations
● ECR container image reliability
19. Current scale of batch jobs
● Petabyte-scale data lake
● On the order of thousands of batch jobs running daily
● ~1000s of EC2 nodes spanning multiple clusters
● ~1000s of workflows running daily
20. How Lyft scales Spark on K8s
● # of Clusters
● # of Namespaces
● # of Pods
● Pod Churn Rate
● # of Nodes
● Pod Size
● Job:Pod Ratio
● IP Alloc Rate Limit
● ECR Rate Limit
23. Cluster Pool HA Support
Cluster Pool A: Cluster 1, Cluster 2, Cluster 3, Cluster 4
● Cluster rotation within a cluster pool (a routing sketch follows below)
● Automated provisioning of a new cluster, which is (manually) added into the rotation
● Throttling at a lower bound while a rotation is in progress
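A hypothetical sketch of the routing idea, not Lyft's actual implementation: new jobs go to the least-loaded cluster still in rotation, and clusters being rotated in or out are skipped.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    in_rotation: bool   # eligible to accept new jobs
    active_pods: int
    max_pods: int       # lower bound used for throttling during rotation

def pick_cluster(pool: list[Cluster]) -> Cluster:
    """Route a new job to the least-loaded cluster that is still in rotation."""
    candidates = [c for c in pool if c.in_rotation and c.active_pods < c.max_pods]
    if not candidates:
        raise RuntimeError("no cluster in the pool can accept new jobs")
    return min(candidates, key=lambda c: c.active_pods / c.max_pods)

pool_a = [
    Cluster("cluster-1", True, 2400, 3000),
    Cluster("cluster-2", True, 900, 3000),
    Cluster("cluster-3", False, 0, 3000),  # being rotated in, not yet accepting jobs
]
print(pick_cluster(pool_a).name)  # -> cluster-2
```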
24. One vs Many Kubernetes Namespaces
Namespace 1, Namespace 2, and Namespace 3 (each with its own pods) map onto Nodes A-D, with per-namespace IAM roles (Role1, Role2) and maximum pod sizes (Max Pod Size 1, Max Pod Size 2).
● ~3-5K active pods per namespace observed in practice
● Less preemption required when namespaces are isolated by quota
● Different namespaces can map to different IAM roles and sidecar configurations
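A sketch of the quota isolation described above, assuming the official kubernetes Python client; the namespace name and limits are illustrative.

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CoreV1Api()

# Cap the pods and compute a single Spark namespace can consume, so one team's
# burst does not preempt another team's jobs. Numbers are illustrative.
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="spark-quota", namespace="spark-team-a"),
    spec=client.V1ResourceQuotaSpec(hard={
        "pods": "4000",
        "requests.cpu": "8000",
        "requests.memory": "32Ti",
    }),
)
api.create_namespaced_resource_quota(namespace="spark-team-a", body=quota)
```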
25. Shared vs Dedicated Kubernetes Pods
Shared pods: a single job controller and Spark driver pod with shared executor pods and shared dependencies serve Job 1 through Job 4, reading and writing AWS S3.
Dedicated & isolated pods: each job (e.g. Job 2, Job 3) gets its own driver pod, executor pods, and dependencies.
26. What about Pod Churn?
Separating DDL from DML to reduce churn
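A hedged PySpark illustration of that split: metadata-only DDL can run on a small, long-lived session, while the heavy DML is submitted as its own job with dedicated pods. Table and partition names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ddl-only").enableHiveSupport().getOrCreate()

# DDL: cheap, metadata-only statements. Running these from a small shared
# session avoids spinning up (and churning) a full set of executor pods per statement.
spark.sql("""
    CREATE TABLE IF NOT EXISTS default.rides_enriched (
        ride_id STRING, geohash STRING
    ) PARTITIONED BY (ds STRING) STORED AS PARQUET
""")
spark.sql("ALTER TABLE default.rides_enriched ADD IF NOT EXISTS PARTITION (ds='2019-03-01')")

# DML: the actual data movement is submitted separately as its own Spark job,
# with its own driver and executor pods sized for the workload, e.g.:
#   spark-submit ... insert_rides_enriched.py --ds 2019-03-01
```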
28. Pod Priority and Preemptions (WIP)
● Priority-based preemption
● Driver pods have higher priority than executor pods
Before: D1, D2, E1, E2, E3, E4 are running when a new pod request (E5) reaches the scheduler.
After: D1, D2, E5, E2, E3, E4 are running; executor E1 has been evicted.
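Under the hood this relies on Kubernetes PriorityClasses. A sketch, assuming the official kubernetes Python client, with illustrative class names and priority values:

```python
from kubernetes import client, config

config.load_kube_config()
scheduling = client.SchedulingV1Api()

def make_priority_class(name: str, value: int, description: str) -> client.V1PriorityClass:
    return client.V1PriorityClass(
        metadata=client.V1ObjectMeta(name=name),
        value=value,
        global_default=False,
        description=description,
    )

# Drivers outrank executors: losing a driver kills the whole job, while a
# preempted executor can usually be re-launched and its tasks retried.
scheduling.create_priority_class(make_priority_class("spark-driver", 100000, "Spark driver pods"))
scheduling.create_priority_class(make_priority_class("spark-executor", 1000, "Spark executor pods"))
```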
29. What about ECR reliability?
Each node (Node 1, Node 2, Node 3) runs its pods against a DaemonSet running Docker-in-Docker that caches container images locally, reducing direct dependence on ECR.
30. Spark Job Config Overlays
Overlay order: Cluster Pool Defaults → Cluster Defaults → Spark Job User Specified Config → Cluster and Namespace Overrides → Final Spark Job Config.
The Job Controller and Event Watcher apply the overlays and pass the final config to the Spark Operator.
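A hypothetical sketch of that overlay order, with later layers overriding earlier ones on conflicting keys:

```python
def overlay(*layers: dict) -> dict:
    """Merge Spark config layers; later layers win on conflicting keys."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

cluster_pool_defaults = {"spark.executor.memory": "4g", "spark.dynamicAllocation.enabled": "true"}
cluster_defaults = {"spark.kubernetes.namespace": "batch-default"}
user_config = {"spark.executor.memory": "8g"}                        # user asks for bigger executors
cluster_overrides = {"spark.kubernetes.namespace": "batch-high-mem"} # router picks the final namespace

final_config = overlay(cluster_pool_defaults, cluster_defaults, user_config, cluster_overrides)
# {'spark.executor.memory': '8g', 'spark.dynamicAllocation.enabled': 'true',
#  'spark.kubernetes.namespace': 'batch-high-mem'}
```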
36. Remaining Work
● More intelligent job routing and parameter setting
● Granular cost attribution
● Improved docker image distribution
● Spark 3.0!
37. Key Takeaways
● Apache Spark can help unify different batch data compute use cases
● Kubernetes can help solve the dependency and multi-version requirements
using its containerized approach
● Spark on Kubernetes can scale significantly by using a multi-cluster approach
with proper resource isolation and scheduling techniques
● Challenges remain when running Spark on Kubernetes at scale
39. Thank you
Strata SF 2019
Li Gao, in/ligao101 @ligao
Bill Graham, @billgraham
Please rate this session!
Questions?
40. We’re Hiring! Apply at www.lyft.com/careers
or email data-recruiting@lyft.com
Data Engineering: Engineering Manager (San Francisco); Software Engineer (San Francisco, Seattle, & New York City)
Data Infrastructure: Engineering Manager (San Francisco); Software Engineer (San Francisco & Seattle)
Experimentation: Software Engineer (San Francisco)
Streaming: Software Engineer (San Francisco)
Observability: Software Engineer (San Francisco)
41. Strata SF 2019
Rate this session on the session page on the conference website or in the O’Reilly Events App.