Scale machine learning deployment

An introduction to how tools such as Seldon, Clipper, MLflow, and MLeap support deploying machine learning models.


  1. Scale Machine Learning Deployment (Gang Tao)
  2. Data Science Project Life Cycle
  3. Model Persistence
  4. Python / Sklearn / Spark / Tensorflow
     ▶ Python pickle-based code serialization
     ▶ sklearn.externals.joblib (newer scikit-learn releases use the standalone joblib package instead)
     ▶ Spark provides an API to save a model or pipeline as a file
     ▶ Tensorflow provides tf.train.Saver, which persists the tensor graph
     ▶ It is pickle + metadata + checkpoint
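The pickle-based persistence in slide 4 can be sketched with the standard library alone. This is a minimal sketch, not the method of any tool in the deck: the model is reduced to a plain dict of learned parameters, and the names `save_model`, `load_model`, and `predict` are hypothetical. Pickling a real estimator object works the same way, but it also stores a reference to the estimator's class, which is where the dependency on code and Python version comes from.

```python
import os
import pickle
import sys
import tempfile


def save_model(params, path):
    """Pickle the learned parameters together with a little metadata."""
    payload = {
        # Recorded because pickles can be sensitive to the Python version.
        "python_version": tuple(sys.version_info[:3]),
        "params": params,
    }
    with open(path, "wb") as f:
        pickle.dump(payload, f)


def load_model(path):
    """Load a payload written by save_model. Only load files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(params, x):
    """Linear prediction: dot(weights, x) + bias (hypothetical toy model)."""
    return sum(w * v for w, v in zip(params["weights"], x)) + params["bias"]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.pkl")
    save_model({"weights": [0.5, -1.0], "bias": 2.0}, path)
    restored = load_model(path)
    print(restored["python_version"], predict(restored["params"], [2.0, 1.0]))
```

Storing the interpreter version next to the parameters lets a serving process refuse or warn about a payload written under a different Python version before it tries to use it.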
  5. Issues and Limitations
     ▶ Models from different tools are not compatible
     ▶ Code serialization depends on the Python version
     ▶ Code serialization raises potential security concerns
     ▶ For a TF model, the tensor names are required (need to check whether they are in the metadata)
     ▶ A TF model depends on the custom code that defines its custom operations
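The security concern in slide 5 comes from the fact that unpickling can resolve and call arbitrary globals. One common mitigation, sketched here under the assumption that the model payload contains only plain containers and numbers, is to subclass `pickle.Unpickler` and override `find_class` with an allow-list. The names `SAFE_BUILTINS`, `RestrictedUnpickler`, and `restricted_loads` are illustrative choices, not part of any tool covered in the deck.

```python
import builtins
import io
import pickle

# Globals the loader is willing to resolve; everything else is rejected.
SAFE_BUILTINS = {"dict", "list", "tuple", "set", "frozenset", "range", "complex"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import anything outside the allow-list."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but only allow-listed globals may be resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    # Plain data needs no global lookups, so it loads normally.
    print(restricted_loads(pickle.dumps({"weights": [1.0, 2.0]})))
    # A pickled reference to a function is a global lookup and is rejected.
    try:
        restricted_loads(pickle.dumps(print))
    except pickle.UnpicklingError as exc:
        print("rejected:", exc)
```

An allow-list this strict cannot load pickled estimator objects, so in practice it only fits payloads that were deliberately reduced to plain data.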
  6. A simple view of model deployment
  7. ML Deployment Challenges
     ▶ Support a wide range of ML modeling tools: Python, R, Tensorflow, Spark
     ▶ Scale up and down
     ▶ Performance and latency optimization
     ▶ Model access and API
     ▶ Audit and versioning
     ▶ CI/CD
     ▶ Metrics and monitoring
     ▶ Optimization, A/B tests
  8. Seldon
  9. Seldon
     ▶ Seldon, a London company, focuses on providing control over machine learning based on open source software
     ▶ Seldon Core is an open source platform for deploying machine learning models on Kubernetes
       • Python/Spark/H2O/R model support
       • REST and gRPC APIs
       • Deploys an inference graph of Models/Routers/Combiners/Transformers as microservices
       • Leverages K8s to provide scaling, security, monitoring, etc.
  10. Summary
      Pros
      ▶ Seamless K8s integration
      ▶ Graph definition supports A/B tests and ensembling
      Cons
      ▶ No Scala support for Spark
      ▶ Needs a custom image for PySpark
      ▶ No customization of liveness/readiness checks, due to the CRD
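The idea behind an inference graph with routers and combiners can be illustrated in plain Python. This is a conceptual sketch only: it does not use Seldon's API or its graph definition format, and `model_a`, `model_b`, `make_ab_router`, and `make_combiner` are hypothetical names. A router forwards each request to exactly one child (the basis of an A/B test), while a combiner calls every child and merges the results (the basis of ensembling).

```python
import random
from statistics import mean


def model_a(x):
    """Hypothetical model A."""
    return 2.0 * x


def model_b(x):
    """Hypothetical model B."""
    return 2.0 * x + 1.0


def make_ab_router(first, second, traffic_to_first, rng):
    """Router node: send each request to exactly one of two children."""

    def route(x):
        # rng.random() is in [0, 1), so traffic_to_first is the share sent to `first`.
        chosen = first if rng.random() < traffic_to_first else second
        return chosen(x)

    return route


def make_combiner(children):
    """Combiner node: call every child and average their predictions."""

    def combine(x):
        return mean(child(x) for child in children)

    return combine


if __name__ == "__main__":
    ab_test = make_ab_router(model_a, model_b, 0.9, random.Random(42))
    ensemble = make_combiner([model_a, model_b])
    print(ab_test(3.0), ensemble(3.0))
```

In Seldon Core each node would run as its own microservice; here the nodes are ordinary callables so that the routing and combining logic is visible in a few lines.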
  11. Clipper
  12. Clipper
      ▶ Clipper.ai is a system developed by the UC Berkeley RISE Lab.
      ▶ Clipper is a prediction serving system that sits between user-facing applications and a wide range of commonly used machine learning models and frameworks.
  13. Summary
      Pros
      ▶ Easy-to-use interactive model deployment
      ▶ Supports Docker and K8s
      ▶ Query latency objective support
      ▶ Model version management
        • Update and rollback
      Cons
      ▶ Cloudpickle version issues
      ▶ Python only
      ▶ Few examples and documents
      ▶ Not friendly to AWS
        • use_internal_ip does not work well
        • The repo for the model must be created manually
        • Failed to pull images from ECR
      ▶ Cluster creation is not stable
      ▶ Tensorflow models failed to pickle
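The update-and-rollback style of model version management listed under the pros can be sketched as a small in-memory registry. This is an illustration of the concept, not Clipper's API: the `ModelRegistry` class and its methods are hypothetical, and models are represented as plain callables.

```python
class ModelRegistry:
    """Hypothetical in-memory registry that tracks the active model version."""

    def __init__(self):
        self._versions = {}  # version label -> callable model
        self._history = []   # labels in the order they became active

    def update(self, version, model):
        """Register a new version and make it the active one."""
        if version in self._versions:
            raise ValueError(f"version {version!r} already exists")
        self._versions[version] = model
        self._history.append(version)

    def rollback(self):
        """Retire the active version and return the label that is active again."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        retired = self._history.pop()
        del self._versions[retired]
        return self._history[-1]

    def predict(self, x):
        """Serve a prediction from the currently active version."""
        if not self._history:
            raise RuntimeError("no model has been deployed")
        return self._versions[self._history[-1]](x)


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.update("1", lambda x: x + 1)
    registry.update("2", lambda x: x * 10)
    print(registry.predict(3))
    print(registry.rollback(), registry.predict(3))
```

Keeping the activation history as an ordered list makes rollback a constant-time step back to the previous entry, which matches the "deploy a new version, revert if it misbehaves" workflow.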
  14. MLflow
  15. MLflow
      ▶ MLflow is an open source platform for managing the end-to-end machine learning lifecycle.
      ▶ MLflow is developed by Databricks.
  16. Summary
      Pros
      ▶ Flexible
      ▶ Easy to use with Sklearn
      ▶ Cloud integration with SageMaker and Azure
      Cons
      ▶ No K8s integration
      ▶ Spark/Tensorflow support is based on Python
      ▶ Projects are better managed by containers
  17. MLeap
  18. MLeap
      ▶ MLeap allows data scientists and engineers to deploy machine learning pipelines from Spark and Scikit-learn to a portable format and execution engine.
        • JSON-based serialization
        • A runtime execution engine
        • Benchmarks
      ▶ http://mleap-docs.combust.ml/core-concepts/transformers/support.html
  19. MLeap Serialization
  20. Summary
      Pros
      ▶ Portable models between Spark and Sklearn
      ▶ Human-readable models
      ▶ Easy model serving
      Cons
      ▶ Support matrix is incomplete
      ▶ Extensibility
        • Code must be written for each estimator/transformer
      ▶ Tensorflow support needs a custom-built tf-java binding and is experimental
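Why a JSON-based format gives portable, human-readable models can be shown with a toy example. This is a sketch under stated assumptions: the field names (`format_version`, `op`, `features`, `weights`, `bias`) are invented for illustration and are not MLeap's bundle schema, and the functions `export_linear_model` and `load_and_predict` are hypothetical.

```python
import json


def export_linear_model(weights, bias, feature_names):
    """Describe a linear model as data only, with no code references."""
    bundle = {
        "format_version": 1,           # hypothetical schema version
        "op": "linear_regression",     # tells the runtime which operation to run
        "features": list(feature_names),
        "weights": list(weights),
        "bias": bias,
    }
    return json.dumps(bundle, indent=2, sort_keys=True)


def load_and_predict(text, row):
    """Minimal runtime: parse the bundle and evaluate it on one input row."""
    bundle = json.loads(text)
    if bundle["op"] != "linear_regression":
        raise ValueError(f"unsupported op: {bundle['op']}")
    weighted = sum(w * row[name] for w, name in zip(bundle["weights"], bundle["features"]))
    return weighted + bundle["bias"]


if __name__ == "__main__":
    text = export_linear_model([0.5, 2.0], 1.0, ["age", "income"])
    print(text)
    print(load_and_predict(text, {"age": 4, "income": 3}))
```

Because the file holds only data, any runtime that knows the operation can execute it, which is the portability benefit. The extensibility cost is also visible: every new estimator or transformer needs its own export and runtime code.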
  21. Wrap up
  22. Wrap up
      ▶ Seldon integrates tightly with K8s to support scalable model serving, and its graph function is powerful.
      ▶ Clipper provides good interaction, but the code is not yet stable enough.
      ▶ MLflow's model serving is simple but offers fewer features.
      ▶ MLeap aims to provide interoperability between different tools, which is very nice, but there is still a long way to go to support all the features.
        • PMML is not covered
      ▶ Some other tools are not covered
        • MXnet model server
        • Oracle Graphpipe
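  23. Comparison table

      | Tool        | Model persistence    | ML tools                                                            | K8s integration | Version | License | Implementation   |
      |-------------|----------------------|---------------------------------------------------------------------|-----------------|---------|---------|------------------|
      | Seldon Core | S2i + Pickle         | Tensorflow, Sklearn, Keras, R, H2O, Nodejs, PMML                    | Yes             | 0.3.2   | Apache  | Docker + K8s CRD |
      | Clipper     | Pickle               | Python, PySpark, PyTorch, Tensorflow, MXnet, custom container       | Yes             | 0.3.0   | Apache  | CPP / Python     |
      | MLflow      | Directory + Metadata | Python, H2O, Keras, MLeap, PyTorch, Sklearn, Spark, Tensorflow, R   | No              | Alpha   | Apache  | Python           |
      | MLeap       |                      | Spark, Sklearn, Tensorflow                                          | No              | 0.12.0  | Apache  | Scala/Java       |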
  24. Other findings
  25. Some other findings
      ▶ Enabling Spark is not easy
        • Version, pyspark version, and Java version must match
        • Build the Spark image with glibc support
        • Error: "Java gateway process exited before sending its port number"
        • Accessing Spark from K8s is not easy
      ▶ Some K8s pods are pending with Unknown status
        • kubectl delete pod {} --grace-period=0 --force
      ▶ Building your own ML image from a Python base is not easy; using continuumio/miniconda may save you some time
      ▶ Use batch commands to clean up Docker images
        • docker images | grep "something_to_search" | awk '{print $1 ":" $2}' | xargs docker rmi -f
        • docker system prune
  26. References
  27. References
      ▶ https://cmry.github.io/notes/serialize
      ▶ https://cmry.github.io/notes/serialize-sk
      ▶ https://github.com/hiveml/simple-ml-serving
      ▶ https://medium.com/@vikati/the-rise-of-the-model-servers-9395522b6c58
      ▶ https://qconsp.com/system/files/presentation-slides/qconsp18-deployingml-may18-npentreath.pdf
      ▶ https://www.slideshare.net/dscrankshaw/veloxampcamp5-final