Machine learning techniques are powerful, but building and deploying models for production use requires a lot of care and expertise.
Many books, articles, and best practices cover machine learning techniques and feature engineering, yet putting those techniques to use in a production environment is often overlooked and underestimated. The aim of this talk is to shed some light on current machine learning deployment practices and to go into detail on how to deploy sustainable machine learning pipelines.
3. Who am I?
• Founder and Senior Data Scientist
@Bolddata
• Big Data Project Leader
@Schlumberger
• MSc Advanced Systems &
Machine Learning @CentraleParis
anass@bolddata.net
@anassbensrhir
@abensrhir
5. Typical ML Model
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
# Create a dataframe with the four feature variables
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
# Random 75/25 train/test split
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
train, test = df[df['is_train'] == True], df[df['is_train'] == False]
features = df.columns[:4]
# Encode the species labels as integer codes
y = pd.factorize(train['species'])[0]
model = RandomForestClassifier(n_jobs=2)
model.fit(train[features], y)
model.predict(test[features])
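To get accuracy numbers like the ones quoted on the next slide, one can score the held-out split. A minimal sketch continuing the code above; it assumes pd.factorize assigns the same codes to the test labels as it did for training (which holds here because the iris rows are ordered by class):

from sklearn.metrics import accuracy_score

# Factorize the test labels the same way the training labels were encoded.
# Caveat: this only matches if the classes appear in the same order in
# both splits, which holds for the class-ordered iris dataset.
y_test = pd.factorize(test['species'])[0]
accuracy = accuracy_score(y_test, model.predict(test[features]))
print("accuracy = %.2f" % accuracy)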
7. “Might Work well for KAGGLE!
But Kaggle isn’t real world Machine learning!”
• Model optimized for interpretability + low complexity + speed: accuracy = 0.81, prediction time = 30 ms
• Model optimized for accuracy alone: accuracy = 0.91, prediction time = 3 s
8. Life Cycle of Real World ML in Production
Deployment → Management → Evaluation → Monitoring → (and back to Deployment)
9. Pickling
import cPickle as pickle  # use the built-in pickle module on Python 3

# Serialize the trained model to disk
with open("mymodel.pkl", "wb") as mymodelfile:
    pickle.dump(model, mymodelfile)

# Deserialize and reuse it
with open("mymodel.pkl", "rb") as mymodelfile:
    thenewmodel = pickle.load(mymodelfile)

thenewmodel.predict(newvector)
10. With Sklearn’s Joblib
from sklearn.externals import joblib

# Compress the model into a single file
joblib.dump(model, "model.joblib", compress=1)

thenewmodel = joblib.load("model.joblib")
thenewmodel.predict(newvector)
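Note: the compress argument also accepts integer levels from 0 to 9; higher levels give smaller files at the cost of slower dump and load times.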
11. Pickle vs Joblib Performance
[Bar charts comparing the same model serialized both ways: time to load the model (0.72 s vs 0.23 s) and model file size (48 kb vs 4.7 kb), Joblib vs Pickle.]
12. Verdict

GOOD:
• Consistent way to save time and reuse the same model everywhere.
• Fast!

BAD:
• Might not work if the scikit-learn or Python versions differ between the saving and loading environments (a mitigation is sketched below).
• A DevOps nightmare.
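One mitigation for the version-mismatch problem, as a minimal sketch: store the library version alongside the model and check it at load time. The dict layout below is an illustrative assumption, not a standard.

import pickle
import sklearn

# Bundle the model with the scikit-learn version it was trained under
# (the dict layout is illustrative, not a standard).
with open("mymodel.pkl", "wb") as f:
    pickle.dump({"sklearn_version": sklearn.__version__, "model": model}, f)

with open("mymodel.pkl", "rb") as f:
    payload = pickle.load(f)
if payload["sklearn_version"] != sklearn.__version__:
    raise RuntimeError("model pickled under scikit-learn %s, running %s"
                       % (payload["sklearn_version"], sklearn.__version__))
thenewmodel = payload["model"]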
15. Cost vs Technological Benefit Tradeoff
[Chart plotting cost ($) against technological benefit for the deployment options discussed next: native Java/C++ models, rebuilding the whole stack in Python, an API-powered model, a hybrid approach, and PMML.]
17. Native Libraries
• Mostly used in legacy systems (old CRMs, banking, …) or for high-frequency trading strategies.
• If used correctly, they are fast.
• Full list of libraries: https://github.com/josephmisiti/awesome-machine-learning
• C++:
  • LightGBM (https://github.com/Microsoft/LightGBM)
  • MLPack (http://www.mlpack.org/)
  • Caffe/CUDA (deep learning) (http://caffe.berkeleyvision.org/)
• Java:
  • Aerosolve (https://github.com/airbnb/aerosolve)
  • H2O (https://github.com/h2oai/h2o-3)
  • Weka (http://www.cs.waikato.ac.nz/ml/weka/)
18. Native Java/C++ Verdict

GOOD:
• Used on high-frequency trading floors, where speed trumps usability and agility.
• Faster!!!

BAD:
• No use of the scikit-learn / pandas data science libraries.
• Limited set of available algorithms.
• Difficult and costly ($$).
• Does anybody know a data scientist who works exclusively in Java or C++? (They are all in New York.)
20. PMML (Predictive Model Markup Language)
PMML stands for "Predictive Model Markup Language". It is the de facto
standard to represent predictive solutions. A PMML file may contain a myriad
of data transformations (pre- and post-processing) as well as one or more
predictive models.
Because it is a standard, PMML allows for different statistical and data mining
tools to speak the same language. In this way, a predictive solution can be
easily moved among different tools and applications without the need for
custom coding. For example, it may be developed in one application and
directly deployed on another.
22. PMML Pipeline
Scikit-learn model → (export as PMML with sklearn2pmml) → model.pmml → (import the PMML file) → Knime, Weka, R, SAS, C++, Java, or a general-purpose app, which then uses the model.
sklearn2pmml: https://github.com/jpmml/sklearn2pmml
Java PMML library: https://github.com/jpmml
Apache Spark PMML: https://github.com/jpmml/jpmml-spark
23. Python Code (Simplified)

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn2pmml import PMMLPipeline, sklearn2pmml

iris = load_iris()
# Create a dataframe with the four feature variables and the target
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["Species"] = iris.target

iris_pipeline = PMMLPipeline([
    ("classifier", RandomForestClassifier())
])
iris_pipeline.fit(iris_df[iris_df.columns.difference(["Species"])], iris_df["Species"])

# Export the fitted pipeline to a PMML file
sklearn2pmml(iris_pipeline, "RandomForestClassifier_Iris.pmml")
24. Java Code (Much Simplified)

// Load the file using our simple util function.
PMML pmml = JPMMLUtils.loadModel("RandomForestClassifier_Iris.pmml");
// Now we need a prediction evaluator for the loaded model
PMMLManager mgr = new PMMLManager(pmml);
ModelEvaluator modelEvaluator = (ModelEvaluator) mgr.getModelManager(modelName,
    ModelEvaluatorFactory.getInstance());
Evaluator evaluator = modelEvaluator;
…
Map results = evaluator.evaluate(features); // prediction happens here
25. Use PMML in Spark

spark-submit --master local --class org.jpmml.spark.EvaluationExample \
    example-1.0-SNAPSHOT.jar RandomForestClassifier_Iris.pmml Iris.csv /tmp/output/

example-1.0-SNAPSHOT.jar contains the Java code that imports the CSV and the PMML model.
26. Verdict

GOOD:
• Interoperable.
• Use the Python data science stack and deploy everywhere.

BAD:
• No agility or sustainability.
• Not every ML algorithm is available.
• PMML files are BIG (gigabytes…).
• Unit tests are needed to match the Python output with the new output = slow deployment.
28. Flask - Scikit-learn Model
[Architecture: an application / webapp / mobile app sends a web request carrying the feature vector (x) to Nginx, which proxies it to Flask; Flask queries the scikit-learn model and the response carries the predicted value (y) back.]
Request: x {a = 1, b = 3.4, c = 3}
Response: status = 200, y {predicted = "setosa"}
29. JSON POST Request

curl -H "Content-Type: application/json" \
     -X POST \
     -d '{"a": 1, "b": 2, "c": 4}' \
     http://localhost:5000/api/1.0/predict

Better with security enabled:

curl -H "Content-Type: application/json" \
     -H "Authorization: Bearer <ACCESS_TOKEN>" \
     -X POST \
     -d '{"a": 1, "b": 2, "c": 4}' \
     http://localhost:5000/api/1.0/predict
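The same call from Python, as a minimal sketch using the requests library (the endpoint and payload mirror the curl example; the token is a placeholder):

import requests

headers = {
    "Authorization": "Bearer <ACCESS_TOKEN>",  # placeholder token
}
payload = {"a": 1, "b": 2, "c": 4}

# requests sets the JSON Content-Type header automatically with json=
resp = requests.post("http://localhost:5000/api/1.0/predict",
                     json=payload, headers=headers)
resp.raise_for_status()
print(resp.json())  # e.g. {"status": "ok", "prediction": "setosa"}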
30. Flask Code (Simplified)

from flask import Flask, request, jsonify
from config import VERSION
from mycustommodel import model2 as model

app = Flask(__name__)

@app.route('/api/{version}/predict'.format(version=VERSION), methods=['POST'])
def predict():
    data = request.get_json(silent=True)
    a = data.get('a')
    b = data.get('b')
    c = data.get('c')
    # scikit-learn expects a 2D array: one row per sample
    prediction = model.predict([[a, b, c]])[0]
    response = dict(status="ok", prediction=str(prediction))
    return jsonify(response)
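In production this app would run behind a WSGI server such as Gunicorn rather than the Flask development server, which is what the Nginx configuration on slide 34 assumes; e.g. gunicorn --bind 0.0.0.0:5000 app:app (the module name app is an assumption).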
31. Python Web Frameworks Benchmark
• Results for loading and returning a JSON object.
• Falcon and Flask have the best speed/usability tradeoff.
34. Nginx Config File

# Define your "upstream" servers - the
# servers requests will be sent to
upstream app_example {
    least_conn;               # Use the Least Connections strategy
    server 192.168.1.19:5000; # Flask Server 1
    server 192.168.1.19:5001; # Flask Server 1, Model 2
    server 192.168.1.20:5000; # Flask Server 2
    server 192.168.1.21:5000; # Flask Server 3
}

server {
    listen 80;
    server_name model.example.com;

    # Pass the request to the Flask/Gunicorn servers,
    # with some correct headers for proxy-awareness
    location / {
        proxy_pass http://app_example;  # forward to the upstream pool above
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;
    }
}
35. 2 Strategies
• Model 1 → Sample 1: 3000 visits, 10% conversion
• Model 2 → Sample 2: 3000 visits, 40% conversion
→ Use Model 2 everywhere.
Strategy 1: use the different models as they are, and update the models with new data afterwards.
Strategy 2: use an A/B testing strategy to serve the best-performing model to the whole sample (a routing sketch follows below).
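A minimal sketch of the routing piece of strategy 2 (the endpoints and weights are illustrative assumptions matching the Flask servers on the Nginx slide; in practice the split can also be done at the Nginx layer):

import random

# Candidate model endpoints (illustrative).
MODEL_ENDPOINTS = {
    "model_1": "http://192.168.1.19:5000/api/1.0/predict",
    "model_2": "http://192.168.1.19:5001/api/1.0/predict",
}
# 50/50 split while the test runs; set to {"model_1": 0.0, "model_2": 1.0}
# once model 2 wins (40% vs 10% conversion).
TRAFFIC_SPLIT = {"model_1": 0.5, "model_2": 0.5}

def choose_model():
    """Pick a model for the current visitor according to the split."""
    r = random.random()
    cumulative = 0.0
    for name, weight in TRAFFIC_SPLIT.items():
        cumulative += weight
        if r < cumulative:
            return name, MODEL_ENDPOINTS[name]
    return name, MODEL_ENDPOINTS[name]  # fallback for float rounding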
37. Verdict

GOOD:
• SCALABLE.
• Interoperable (can be used by both backend and frontend languages; think JavaScript!).
• Agile: models can be put into production very fast and with no code change, as simple as launching a new server instance.
• As fast as you need it to be (add new servers or Docker containers).
• Did I say agile?

BAD:
• The infrastructure can become overwhelming and costly over time.
38. Other Options?
• PredictionIO (http://prediction.io)
• Seldon (https://www.seldon.io/) Interesting!
• Oryx (http://oryx.io) Built for Hadoop