In this final Weave Online User Group of 2019, David Aronchick asks: have you ever struggled with having different environments to build, train and serve ML models, and how to orchestrate between them? While DevOps and GitOps have gained huge traction in recent years, many customers struggle to apply these practices to ML workloads. This talk will focus on the ways MLOps has helped to effectively infuse AI into production-grade applications by establishing practices around model reproducibility, validation, versioning/tracking, and safe/compliant deployment. We will also talk about the direction of MLOps as an industry, and how we can use it to move faster, with more stability, than ever before.
The recording of this session is on our YouTube Channel here: https://youtu.be/twsxcwgB0ZQ
Speaker: David Aronchick, Head of Open Source ML Strategy, Microsoft
Bio: David leads Open Source Machine Learning Strategy at Azure. This means he spends most of his time helping humans to convince machines to be smarter. He is only moderately successful at this. Previously, David led product management for Kubernetes at Google, launched GKE, and co-founded the Kubeflow project. David has also worked at Microsoft, Amazon and Chef and co-founded three startups.
Sign up for a free Machine Learning Ops Workshop: http://bit.ly/MLOps_Workshop_List
Weaveworks will cover concepts such as GitOps (operations by pull request), Progressive Delivery (canary, A/B, blue-green), and how to apply those approaches to your machine learning operations to mitigate risk.
5. ML at scale
• 180 million monthly active Office 365 users using AI
• 18 billion questions asked of Cortana
• 6.5 trillion signals analyzed DAILY to block emerging threats
8. Building a model
Data ingestion → Data analysis → Data transformation → Data validation → Data splitting → Trainer → Model validation → Training at scale → Logging → Roll-out → Serving → Monitoring
9. Ok, but, like, I’m a data scientist. I don’t care about all that.
12. Cowboys and Ranchers Can Be Friends!
Data Scientists want:
• Quick iteration
• Frameworks they understand
• Best-of-breed tools
• No management headaches
• Unlimited scale
SRE/ML Engineers want:
• Reuse of tooling and platforms
• Corporate compliance
• Observability
• Uptime
17. MLOps = ML + DEV + OPS
ML (Experiment):
• Business Understanding
• Data Acquisition
• Initial Modeling
DEV (Develop):
• Modeling + Testing
• Continuous Integration
• Continuous Deployment
OPS (Operate):
• Continuous Delivery
• Data Feedback Loop
• System + Model Monitoring
18. MLOps Benefits
Reproducibility / Auditability:
• Code drives generation and deployments
• Pipelines are reproducible and verifiable
• All artifacts can be tagged and audited
Validation:
• SWE best practices for quality control
• Offline comparisons of model quality
• Minimize bias and enable explainability
Automation / Observability:
• Controlled rollout capabilities
• Live comparison of predicted vs. expected performance
• Results fed back to watch for drift and improve model
== VELOCITY and SECURITY (For ML)
23. Real World Multi-Cloud CI/CD Pipeline
[Diagram: Data → Process → Train → Stage → Serve, spanning two environments (ENV #1, ENV #2) on a distributed cloud, shared by the Data Scientist and SRE/ML Engineers.]
24. Azure DevOps Pipelines
Cloud-hosted pipelines for Linux, Windows and macOS.
Any language, any platform, any cloud: build, test, and deploy Node.js, Python, Java, PHP, Ruby, C/C++, .NET, Android, and iOS apps. Run in parallel on Linux, macOS, and Windows. Deploy to Azure, AWS, GCP or on-premises.
Extensible: explore and implement a wide range of community-built build, test, and deployment tasks, along with hundreds of extensions from Slack to SonarCloud. Support for YAML, reporting and more.
Containers and Kubernetes: easily build and push images to container registries like Docker Hub and Azure Container Registry. Deploy containers to individual hosts or Kubernetes.
26. First Class Model Training Tasks
CI pipeline captures:
1. Create sandbox
2. Run unit tests and code quality checks
3. Attach to compute
4. Run training pipeline
5. Evaluate model
6. Register model
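The six CI steps above can be sketched as plain Python. Everything here (the stub functions, the registry dict, the accuracy gate) is a hypothetical illustration, not the actual Azure pipeline; in practice each step would invoke your CI system and ML platform.

```python
# A minimal sketch of the CI stages above. All names are assumptions.

def run_unit_tests(code_dir: str) -> bool:
    """Step 2: unit tests and code-quality checks (stubbed)."""
    return True  # e.g. invoke pytest / linters here

def train(data_path: str) -> dict:
    """Step 4: run the training pipeline, returning a model artifact."""
    return {"weights": [0.1, 0.2], "data": data_path}

def evaluate(model: dict) -> float:
    """Step 5: score the model on a held-out set (stubbed metric)."""
    return 0.93

def ci_pipeline(code_dir: str, data_path: str, registry: dict,
                min_accuracy: float = 0.9) -> str:
    """Steps 1-6: gate model registration on tests and evaluation."""
    if not run_unit_tests(code_dir):          # step 2
        raise RuntimeError("unit tests failed")
    model = train(data_path)                  # steps 3-4: attach + train
    accuracy = evaluate(model)                # step 5
    if accuracy < min_accuracy:
        raise RuntimeError(f"accuracy {accuracy} below gate {min_accuracy}")
    model_id = f"model-{len(registry) + 1}"   # step 6: register
    registry[model_id] = {"model": model, "accuracy": accuracy}
    return model_id

registry = {}
print(ci_pipeline("src/", "data/train.csv", registry))  # → model-1
```

The key property is that registration (step 6) is unreachable unless every earlier gate passed, so anything in the registry is known-tested.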
27. Automated Deployment
CD pipeline captures:
1. Package model into container image
2. Validate and profile model
3. Deploy model to DevTest (ACI)
4. If all is well, proceed to rollout to AKS
Everything is done via the CLI
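The "if all is well" gate in the CD steps above can be sketched the same way: deploy to a dev/test endpoint first, and only promote to production if a smoke test passes. Endpoint names and the smoke test are assumptions for illustration.

```python
# A hedged sketch of the CD gates above, not the actual Azure CD pipeline.

def deploy(environment: str, model_id: str, endpoints: dict) -> None:
    """Point an environment's endpoint at a model (e.g. ACI for DevTest, AKS for prod)."""
    endpoints[environment] = model_id

def smoke_test(environment: str, endpoints: dict) -> bool:
    """Stub: in reality, ping the endpoint and validate a sample prediction."""
    return endpoints.get(environment) is not None

def cd_pipeline(model_id: str, endpoints: dict) -> str:
    deploy("devtest", model_id, endpoints)     # step 3: DevTest first
    if not smoke_test("devtest", endpoints):   # step 4's "if all is well"
        raise RuntimeError("devtest validation failed; halting rollout")
    deploy("prod", model_id, endpoints)        # rollout to production
    return endpoints["prod"]

endpoints = {}
print(cd_pipeline("loan-model-v7", endpoints))  # → loan-model-v7
```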
28. Model Versioning & Storage
• Which data was it trained on?
• Which experiment / previous model(s) does it come from?
• Where’s the code / notebook?
• Was it converted / quantized?
• Does it involve private / compliant data?
29. Model Validation
• Data (changes to shape / profile)
• Model in isolation (offline A/B)
• Model + app (functional testing)
• Only deploy after initial validation passes
• Ramp up traffic to the new model using A/B experiments
• Functional behavior
• Performance characteristics
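The first gate above (detecting changes to data shape or profile) can be sketched as a simple check of a candidate dataset against a reference. The threshold and the mean-based profile are assumptions; real validation would compare richer statistics.

```python
# A sketch of a data-validation gate: reject a training set whose
# shape or basic profile drifts too far from a reference dataset.

def profile(rows):
    """Per-column mean of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def validate_data(reference, candidate, max_mean_shift=0.25):
    if not candidate or len(candidate[0]) != len(reference[0]):
        return False                      # shape changed
    for ref_mean, cand_mean in zip(profile(reference), profile(candidate)):
        if abs(cand_mean - ref_mean) > max_mean_shift:
            return False                  # profile drifted
    return True

reference = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
ok        = [[1.1, 10.5], [2.2, 12.5], [2.9, 13.5]]
drifted   = [[5.0, 10.0], [6.0, 12.0], [7.0, 14.0]]
print(validate_data(reference, ok))       # → True (passes the gate)
print(validate_data(reference, drifted))  # → False (fails the gate)
```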
31. Model Deployment
• Focus on ML, not DevOps
• Get telemetry for service health and model behavior
• Code generation
• API specifications / interfaces
• Cloud Services
• Mobile / Embedded Applications
• Edge Devices
• Quantize / optimize models for target platform
• Compliant + Safe
34. MLOps Gets You to Production
• End-to-end ownership by data science teams
using SWE best practices
• Continuous delivery of value to end users.
• Enables lineage, auditability and regulatory
compliance through consistency
46. A Small Example of Issues You Can Have…
• Inappropriate HW/SW stack
• Mismatched driver versions
• Crash-looping deployment
• Data/model versioning [Nick Walsh]
• Non-standard images/OS version
• Pre-processing code doesn’t match production pre-processing
• Production data doesn’t match training/test data
• Output of the model doesn’t match application expectations
• Hand-coded heuristics better than model [Adam Laiacano]
• Model freshness (trained on out-of-date data / input shape changed)
• Test/production statistics/population shape skew
• Overfitting on training/test data
• Bias introduced (or not tested for)
• Over/under HW provisioning
• Latency issues
• Permissions/certs
• Failure to obey health checks
• Killed production model before rollout of new one / in wrong order
• Thundering herd for new model
• Logging to the wrong location
• Storage for model not allocated properly / not accessible by deployment tooling
• Route to artifacts not available for download
• API signature changes not propagated/expected
• Cross-data-center latency
• Expected benefit doesn’t materialize (e.g. multiple components in the app change simultaneously)
• Get wrong/no traffic because A/B config didn’t roll out
• Get too much traffic too soon (expected a canary/exponential roll out)
• Lack of visibility into real-time model behavior (detecting data drift, live data distribution vs. train data, etc.) [Nick Walsh]
• Outliers not predicted [MikeBSilverman]
• Change was a good change, but didn’t communicate with the rest of the team (so you must roll back)
• No dates! (date to measure impact/improvement against a pre-agreed measure; date scheduled to assess data changes) [Mary Branscombe]
• No CI/CD; manual changes untracked [Jon Peck]
• LACK OF DOCUMENTATION!! (the problem, the testing, the solution, lots more) [Terry Christiani]
• Successful model causes pain elsewhere in the organization (e.g. detecting faults previously missed) [Mark Round]
Or It Just Doesn’t Work! At All!
47. Does My Model Actually Work?
[Diagram: the Data Scientist’s laptop feeds Source Control, which drives Automated Validation & Profiling, Package For Rollout, Explain Model & Look for Bias, Clean/Minimize Code, and Sane Deployment to The Cloud, run by SRE/ML Engineers. Both sides: “Nice.”]
50. MLOps is a Platform and a Philosophy
Even if:
o Every data scientist trained...
o And you had all the tools necessary...
o And they all worked together...
o And your SREs understood ML modeling...
o And and and and ...
You’d still need a permanent, repeatable record of what you did
55. What Did My Customers See?
[Diagram: Customer → Front End → Model Server, in The Cloud with Source Control, run by SRE/ML Engineers. Customer: “I’d like a loan, please.”]
56. What Did My Customers See?
[Same diagram. Answer to the customer: “No.”]
57. What Did My Customers See?
[Same diagram. Customer: “Ok, but why?”]
58. What Did My Customers See?
[Same diagram. SRE/ML Engineers: “Uh oh.” A wall of lawyers appears.]
59. It’s Not Just About Explainability!
• Yes, models are complicated
• But, that’s not enough:
o What data did you train on?
o How did you transform/exclude outliers?
o What are the data statistics?
o Did anything change between code and production?
o What model did you actually serve (to this person)?
• MLOps can help!
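The questions above are answerable only if every artifact is recorded immutably. One common pattern, sketched below with an invented store layout, is content addressing: each record is keyed by the hash of its own content, so it cannot be silently changed and a served prediction can be traced back to exactly what produced it.

```python
# A sketch of an "immutable metadata store" via content addressing.
# The record shapes and the 13-character ids are assumptions
# (chosen to resemble the hashes shown on the later slides).
import hashlib
import json

def put(store: dict, payload: dict) -> str:
    """Store a metadata record under the hash of its content."""
    blob = json.dumps(payload, sort_keys=True).encode()
    key = hashlib.sha256(blob).hexdigest()[:13]
    store[key] = payload
    return key

store = {}
data_id  = put(store, {"kind": "dataset", "rows": 1000, "date": "2019-11-01"})
model_id = put(store, {"kind": "model", "trained_on": data_id,
                       "outlier_rule": "drop > 3 sigma"})

# "What model did you actually serve, and on what data?"
served = store[model_id]
print(served["trained_on"] == data_id)  # → True: lineage is recoverable
```

Because the key is derived from the content, re-storing the same record always yields the same id, and any tampering changes the id: the record is effectively immutable.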
60. What Did My Customers See?
[Diagram: the full pipeline again (Source Control, Automated Validation & Profiling, Package For Rollout, Explain Model & Look for Bias, Clean/Minimize Code, Sane Deployment) feeding the Front End and Model Server that the Customer talks to.]
61. What Did My Customers See?
[Diagram: the same pipeline, now writing to an Immutable Metadata Store. Each artifact records a content hash: b151f8e65b32a, c7f4e7607b4b7, 0ef1d58921d89, e2e1e994c4251, 786c8e57a6d51, 9ce88802f0759; the serving model is 32c04681d7573.]
62. What Did My Customers See?
[Same diagram. Customer: “Why didn’t I get a loan?”]
63. What Did My Customers See?
[Same diagram. The answer is traced through the Immutable Metadata Store: the serving model’s hash (32c04681d7573) links back to the recorded artifacts that produced it.]
64. What Does All This Stuff Solve For?
1. Does My Model Actually Work?
2. What Did My Customers See?
3. Is My Model Still Good?
69. Is My Model Still Good?
[Diagram: the Front End asks the Model Server (f7c5f9fe7b762): “There is a blue or orange DUCK inside this barn. What color is the duck?” Model: “It’s a duck! BLUE.”]
71. Is My Model Still Good?
[Same diagram. The real population: 5 Blue Ducks, 995 Yellow Ducks. Accuracy = 99%, False Positive = 1%. ???]
75. Is My Model Still Good?
[Same diagram. 995 Yellow Ducks, 5 Blue Ducks. Accuracy = 99%, False Positive = 1%, yet the model is WRONG 2/3rds of the time!]
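The duck arithmetic is worth working through, because it shows how aggregate accuracy hides per-class failure on an imbalanced population. (The assumption that all 5 blue ducks are correctly caught is implied by the slide's numbers.)

```python
# With 995 yellow ducks and 5 blue, 99% accuracy and a 1% false-positive
# rate still make the "blue" prediction wrong two thirds of the time.

yellow, blue = 995, 5
false_positive_rate = 0.01

true_blue  = blue                                 # assume all 5 blues caught
false_blue = round(yellow * false_positive_rate)  # ~10 yellows flagged blue

blue_calls = true_blue + false_blue               # 15 "blue" predictions total
accuracy   = (yellow - false_blue + true_blue) / (yellow + blue)
wrong_rate = false_blue / blue_calls

print(f"accuracy = {accuracy:.0%}")            # → accuracy = 99%
print(f"blue calls wrong = {wrong_rate:.0%}")  # → blue calls wrong = 67%
```

So of the 15 times the model says “blue,” 10 are yellow ducks: wrong 2/3rds of the time, exactly as the slide says, despite the impressive-looking 99% accuracy.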
78. Is My Model Still Good?
[Same diagram. A new Model Server (d4093cc84b267) is rolled out alongside the old one (f7c5f9fe7b762).]
80. Is My Model Still Good?
[Diagram: the new model (d4093cc84b267) now serves the population of 995 Yellow Ducks, 5 Blue Ducks.]
81. Is My Model Still Good?
[Same diagram, but the population has shifted: 500 Yellow Ducks, 500 Blue Ducks.]
82. Is My Model Still Good?
• Models != Code – they can go stale... QUICKLY.
• IMPORTANT:
o Watch your model & data for drift from training
o Regularly (if not continuously) retrain, even before performance begins to fail
o Rollbacks across multiple versions are not uncommon!
• Without an e2e MLOps pipeline, many of the above are O(really really hard)!
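The drift watch described above can be sketched as a comparison between the label mix the model trained on and the mix production now sees, using the duck populations from the earlier slides. The threshold and the choice of total variation distance are assumptions; production systems use richer statistics over inputs as well as labels.

```python
# A sketch of a drift check: flag retraining when the live label
# distribution diverges from the training distribution.

def distribution(labels):
    total = len(labels)
    return {k: labels.count(k) / total for k in set(labels)}

def drifted(train_labels, live_labels, threshold=0.2):
    train_d, live_d = distribution(train_labels), distribution(live_labels)
    keys = set(train_d) | set(live_d)
    # total variation distance between the two distributions
    tvd = 0.5 * sum(abs(train_d.get(k, 0) - live_d.get(k, 0)) for k in keys)
    return tvd > threshold

train = ["yellow"] * 995 + ["blue"] * 5    # what the model trained on
live  = ["yellow"] * 500 + ["blue"] * 500  # what production now sees

print(drifted(train, train[:100]))  # → False: same mix, no drift
print(drifted(train, live))         # → True: 50/50 mix, retrain!
```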
83. What Does All This Stuff Solve For?
1. Does My Model Actually Work?
2. What Did My Customers See?
3. Is My Model Still Good?
85. MLOps Gives* You…
• Software best practices for building machine
learning solutions
• Repeatable workflow for training a model and
rolling it out to production
• An immutable record of what’s actually running
• Lineage of model creation including data sources
• Acceleration from code to customer benefits
* Requires some human and software work
86. What’s Next for MLOps
• Simplify monitoring and retraining
• Extend MLOps to data, including prep and profiling
• Enterprise features
o Test cases
o Auditing
o Security
o Resource management (bin packing / resource optimization)
o Network isolation
• Metadata and API standards
Or, better yet, you tell us!
87. It’s a whole new world
• Data science will touch EVERY industry.
• We can’t ask everyone to get a PhD in statistics, though.
• How do WE help everyone take advantage of this transformation?