<p>Last summer, Databricks launched MLflow, an open source platform to manage the machine learning lifecycle, including experiment tracking, reproducible runs, and model packaging. MLflow has grown quickly since then, with over 120 contributors from dozens of companies, including major contributions from RStudio and Microsoft. It has also gained new capabilities such as automatic logging from TensorFlow and Keras, Kubernetes integrations, and a high-level Java API. In this talk, we’ll cover some of the new features that have come to MLflow, and then focus on a major upcoming feature: model management with the MLflow Model Registry. Many organizations struggle to track which models exist in the organization and which ones are in production. The MLflow Model Registry provides a centralized database to keep track of these models, share and describe new model versions, and deploy the latest version of a model through APIs. We’ll demonstrate how these features can simplify common ML lifecycle tasks.</p>
3. Traditional Software vs. Machine Learning
Traditional Software
• Goal: Meet a functional specification
• Quality depends only on code
• Typically pick one software stack
Machine Learning
• Goal: Optimize a metric (e.g., accuracy)
• Constantly experiment to improve it
• Quality depends on input data and tuning parameters
• Compare + combine many libraries, models & algorithms for the same task
4. Production ML is Even Harder
[Diagram: Data Prep, Training, Deployment, Raw Data; roles: ML Engineer, Application Developer, Data Engineer]
ML apps must be fed new data to keep working
Design, retraining & inference done by different people
5. Solution: Machine Learning Platforms
Software to manage the ML lifecycle
Examples: Uber Michelangelo, Google TFX, Facebook FBLearner
[Diagram: Data Prep, Training, Deployment, Raw Data]
Versioning, CI/CD, QA, ops, monitoring, etc.
6. MLflow: An Open Source ML Platform
Three components:
• Tracking: experiment tracking
• Projects: reproducible runs
• Models: model packaging
140 contributors, 800K downloads/month
Works with any ML library, programming language, deployment tool
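The experiment-tracking idea can be illustrated with a minimal sketch. It is a hypothetical in-memory tracker written with the Python standard library only, not MLflow's API; the names `Run`, `ExperimentTracker`, `best_run`, and `toy_accuracy`, as well as the toy data, are assumptions made for illustration. In MLflow itself, the comparable calls are `mlflow.start_run()`, `mlflow.log_param()`, and `mlflow.log_metric()`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    """One training run: its tuning parameters and the metrics it produced."""
    run_id: int
    params: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


class ExperimentTracker:
    """Hypothetical in-memory tracker (illustration only, not MLflow's API)."""

    def __init__(self) -> None:
        self.runs: List[Run] = []

    def start_run(self) -> Run:
        # Run ids are assigned sequentially, starting at 1.
        run = Run(run_id=len(self.runs) + 1)
        self.runs.append(run)
        return run

    def best_run(self, metric: str) -> Run:
        """Return the run with the highest logged value of `metric`."""
        scored = [r for r in self.runs if metric in r.metrics]
        if not scored:
            raise ValueError(f"no run logged metric {metric!r}")
        return max(scored, key=lambda r: r.metrics[metric])


def toy_accuracy(threshold: float) -> float:
    """Stand-in for training: accuracy of a threshold classifier on toy data."""
    data = [(0.1, 0), (0.35, 0), (0.4, 1), (0.6, 1), (0.8, 1), (0.55, 0)]
    correct = sum(int(x >= threshold) == label for x, label in data)
    return correct / len(data)


tracker = ExperimentTracker()
for threshold in (0.2, 0.38, 0.7):
    run = tracker.start_run()
    run.params["threshold"] = threshold                 # tuning parameter
    run.metrics["accuracy"] = toy_accuracy(threshold)   # resulting metric

best = tracker.best_run("accuracy")
print(best.run_id, best.params, best.metrics)
```

Recording parameters next to metrics for every run is what makes it possible to answer later which settings produced the best result.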
15. The Model Management Problem
When you’re working on one ML app alone, storing your models in files is manageable
[Diagram: Model Developer and their model files:]
classifier_v1.h5
classifier_v2.h5
classifier_v3_sept_19.h5
classifier_v3_new.h5
…
16. The Model Management Problem
When you work in a large organization with many models, management becomes a major challenge:
• Where can I find the best version of this model?
• How was this model trained?
• How can I track docs for each model?
• How can I review models?
[Diagram: Model Developer, Reviewer, Model User, "???"]
17. MLflow Model Registry
Repository of named, versioned models with comments & tags
Track each model’s stage: dev, staging, production, archived
Easily load a specific version
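The registry described on this slide can be sketched as a small data structure. This is a hypothetical in-memory version built on the Python standard library, not the MLflow Model Registry API; the class and method names (`ModelRegistry`, `register`, `set_stage`, `latest`) and the model name `"classifier"` are assumptions chosen for illustration. The stage names are the four listed on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The four stages named on the slide.
STAGES = ("dev", "staging", "production", "archived")


@dataclass
class ModelVersion:
    name: str
    version: int
    artifact: str                      # where the model file lives
    stage: str = "dev"
    comment: str = ""
    tags: Dict[str, str] = field(default_factory=dict)


class ModelRegistry:
    """Hypothetical in-memory registry (illustration only, not MLflow's API)."""

    def __init__(self) -> None:
        self._models: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str, artifact: str, comment: str = "",
                 tags: Optional[Dict[str, str]] = None) -> ModelVersion:
        """Add a new version under `name`; versions count up from 1."""
        versions = self._models.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, artifact,
                          comment=comment, tags=dict(tags or {}))
        versions.append(mv)
        return mv

    def get(self, name: str, version: int) -> ModelVersion:
        """Load one specific version of a named model."""
        for mv in self._models.get(name, []):
            if mv.version == version:
                return mv
        raise KeyError(f"{name} v{version} is not registered")

    def set_stage(self, name: str, version: int, stage: str) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        self.get(name, version).stage = stage

    def latest(self, name: str, stage: str) -> ModelVersion:
        """Return the highest-numbered version currently in `stage`."""
        matches = [mv for mv in self._models.get(name, []) if mv.stage == stage]
        if not matches:
            raise LookupError(f"no version of {name} is in stage {stage!r}")
        return max(matches, key=lambda mv: mv.version)


registry = ModelRegistry()
registry.register("classifier", "classifier_v1.h5", comment="baseline")
registry.register("classifier", "classifier_v2.h5", tags={"reviewed": "yes"})
registry.set_stage("classifier", 1, "production")
registry.set_stage("classifier", 2, "staging")

# Promote version 2 and retire version 1.
registry.set_stage("classifier", 1, "archived")
registry.set_stage("classifier", 2, "production")

print(registry.latest("classifier", "production").artifact)
print(registry.get("classifier", 1).stage)
```

Because consumers ask for a name and a stage rather than a file path, a new version can be promoted without any downstream code changing.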
18. Model Registry Workflow
[Diagram: Model Registry; Model Developer; Downstream Users; Automated Jobs; REST Serving; Reviewers, CI/CD Tools]