When a machine learning model needs to be served for interactive use cases, it is typically either wrapped inside a Flask server or deployed through an external service like SageMaker. Both approaches have flaws. In this talk, you will learn how Ray Serve uses Ray to address the limitations of these approaches and enable scalable model serving.
2. Ray Ecosystem
Ray: a system for building scalable Python (and Java) applications.
[Diagram: the Ray ecosystem — reinforcement learning, hyperparameter tuning and distributed training, serving, data analytics, and general distributed applications, all built on Ray.]
4. Big Picture: Machine Learning Lifecycle
[Diagram: the ML lifecycle — model development (data collection, cleaning & visualization, feature engineering & model design, training & validation), offline training pipelines that turn live data into trained models, and a prediction service that answers end-user queries with inference, logic, and a feedback loop.]
9. The web server approach
+ Simplicity
+ End-to-end control over how the model is served
x One query at a time
x Model loaded once, as a global variable
x No isolation
x No fine-grained replication
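To make these limitations concrete, here is a minimal sketch of the pattern (assuming Flask and a pickled scikit-learn-style model; the file name and route are hypothetical):

import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# The model is loaded once, at import time, as a process-wide global;
# every request served by this process shares this single copy.
with open("model.pkl", "rb") as f:  # hypothetical model artifact
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Flask's built-in server handles one request at a time, so a slow
    # model call blocks every other caller.
    features = request.json["features"]
    return jsonify(prediction=model.predict([features]).tolist())

if __name__ == "__main__":
    app.run()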
10. The web server approach (continued)
x Process-pool based deployment
[Diagram: an initial process forks a pool of worker processes; incoming requests are load-balanced across the workers.]
12. The web server approach (continued)
x Process-pool based deployment -> memory issue: every forked worker ends up holding its own full copy of the model.
[Diagram: the same fork-based worker pool, now with each worker process carrying its own copy of the model in memory.]
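The memory issue follows directly from the pool model. A rough sketch (the 2 GB model size and four workers are hypothetical numbers):

import multiprocessing as mp
import pickle

def serve_forever(conn):
    # Each worker deserializes its own copy of the model; fork-time
    # copy-on-write does not help once Python touches the objects.
    # Four workers x a 2 GB model ~= 8 GB of resident memory.
    with open("model.pkl", "rb") as f:  # hypothetical 2 GB artifact
        model = pickle.load(f)
    while True:
        conn.send(model.predict([conn.recv()]))

if __name__ == "__main__":
    pipes = []
    for _ in range(4):  # one full model copy per worker
        parent_end, child_end = mp.Pipe()
        mp.Process(target=serve_forever, args=(child_end,)).start()
        pipes.append(parent_end)
    # A load balancer would now round-robin requests over `pipes`.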
20. Offload inference to external service
[Diagram: the request path inside the web server — HTTP in, API validation, business logic, input transformation, inference, output transformation, more business logic, HTTP out.]
21. Offload inference to external service
[Diagram: the same pipeline split across two boxes — the web server keeps API validation and business logic, while input transformation, inference, and output transformation move into the external service.]
22. Offload inference to external service
[Diagram: in practice the external service runs only the inference step; input and output transformation are left stranded in the web server.]
23. External services are mostly "tensor-in, tensor-out"
[Diagram: the full pipeline again, annotated to show the complexity that piles up at the tensor-in, tensor-out boundary between the web server and the external service.]
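On the web server side, "tensor-in, tensor-out" means the transformations cannot move; a hedged sketch of what remains there (the URL, payload shape, and helper logic are all hypothetical):

import requests

# Hypothetical tensor-in, tensor-out endpoint on the external service.
INFERENCE_URL = "http://model-service.internal/v1/predict"

def featurize(payload: dict) -> list:
    # Input transformation stays in the web server: raw request fields
    # become the flat feature vector the model expects.
    return [payload["age"], payload["income"]]

def postprocess(scores: list) -> dict:
    # Output transformation stays here too: raw scores are mapped back
    # to a label before business logic sees them.
    return {"label": "approve" if scores[0] > 0.5 else "deny"}

def handle_request(payload: dict) -> dict:
    tensor = featurize(payload)
    resp = requests.post(INFERENCE_URL, json={"inputs": [tensor]})
    resp.raise_for_status()
    return postprocess(resp.json()["outputs"][0])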
24. External services approach
+ Separation of concerns
x Need to scale separately
x Model evaluation logic split from transformation logic
x Hard to learn
x Hard to debug
28. Kubernetes? Service Mesh?
● Serve provides a layer on top of Kubernetes
○ Easy to serve a simple model
○ Easy to serve a complex pipeline
○ API definition and the model live in the same place
○ Built-in service mesh for flexible routing (sketched below)
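A hedged sketch of that routing layer, using the experimental API of this era (exact call signatures shifted between early releases; the backends are hypothetical stand-ins):

from ray.experimental import serve

def model_v1(flask_request):
    return {"label": "cat", "version": "v1"}  # hypothetical stand-in

def model_v2(flask_request):
    return {"label": "cat", "version": "v2"}  # hypothetical stand-in

serve.init()  # connect to (or start) Serve on the Ray cluster
serve.create_endpoint("classifier", "/classify")
serve.create_backend(model_v1, "classifier:v1")
serve.create_backend(model_v2, "classifier:v2")
# Route 90% of traffic to v1 and canary the remaining 10% to v2.
serve.split("classifier", {"classifier:v1": 0.9, "classifier:v2": 0.1})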
29. Serve runs on top of Kubernetes
[Diagram: Ray Serve running across Kubernetes pods, with multiple model replicas spread over each pod.]
30. Comparison
Compared to TFServing, Serve adds:
+ Scale to any number of nodes
+ Support for arbitrary frameworks
Compared to Seldon, Serve adds:
+ Imperative pipelines
+ Flexible queuing policy
Compared to SageMaker, Serve adds:
+ Better batching
+ Deploy anywhere
Compared to "Flask", Serve adds:
+ Fine-grained replication
+ Isolated deployment
31. Try it out today
pip install ray[serve]
from ray.experimental import serve
- Ready for early adopters
- #serve channel in Slack
- Coming soon:
  - Performance benchmark
  - Deployment tutorial
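A minimal end-to-end sketch with the import shown above (signatures reflect the experimental module as of this talk and have changed in later Ray releases; the endpoint and backend names are hypothetical):

from ray.experimental import serve

def echo(flask_request):
    # A backend is just a Python function (or class): the API handling
    # and the model live in the same place.
    return {"echo": flask_request.args.get("text", "")}

serve.init()
serve.create_endpoint("echo_endpoint", "/echo")
serve.create_backend(echo, "echo:v1")
serve.link("echo_endpoint", "echo:v1")
# curl "http://127.0.0.1:8000/echo?text=hi"  ->  {"echo": "hi"}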