Recent Gartner and Capgemini studies suggest that only around 25% of data science projects are successful, and only around 15% make it to full-scale production. Of those that do, many degrade in performance and produce disappointing results within months of implementation. How can focusing on the desired business outcomes and use cases throughout a data science project help overcome the odds?
2. Tash Bickley
Principal Consultant, FTS Data & AI
https://www.linkedin.com/in/tashbickley/
About Me
Background:
- Database administration
- Data engineering (ETL)
- Business intelligence
- Statistics and machine learning
- Data analytics architecture advisory
3. Productionising Machine Learning Models
• Planning for production
• Selecting the optimal architecture for your solution
• Development and Deployment
• Maintaining the quality of the solution and prediction outcomes in production
4. Productionising Machine Learning Models
• Around half of all businesses are developing machine learning solutions
• More than 70% of industry leaders believe AI is important [1]
• Gartner predicts only 25% of machine learning projects are successful
• Capgemini study found 15% of models made it to full-scale production
• Many models in production quickly become liabilities – degrade within days or months
1. Dresner Report 2019 https://www.forbes.com/sites/louiscolumbus/2019/09/08/state-of-ai-and-machine-learning-in-2019/
5. Microsoft Chatbot Tay.ai
• Tay posted 96,000 tweets
• < 24 hours to become offensive
• ‘Repeat after me’
• No filter – jokes, irony, malice
• No coherent personality – hybrid
• Who was the audience?
• Tweeted "i love feminism" and "I f*#!ng hate feminists",
and about Bruce Jenner (learned, not repeated):
"caitlyn jenner is a hero & is a stunning, beautiful woman" and
"caitlyn jenner isn't a real woman yet she won woman of the year?"
6. How can we avoid a poorly performing AI solution?
• Focus on desired outcomes
• Build a collaborative team with skills for end-to-end requirements
• Data inputs – volume, relevance and quality
• Modelling – feature engineering/training/testing/evaluating/model selection
• Testing (software type not ML training)
• Model designed and trained to suit platform for inference
• Monitoring platform, data inputs and model targets/outcomes
• DevOps
• Decision Engine – what actions do we want to take?
• Feedback from outcomes to improve model
• End-to-end platform implementation that refreshes the model
7. How can we avoid a poorly performing AI solution?
• Where are we in the hype cycle? – possibly near the peak?
8. Outcomes Focus
• What business problem will this solve?
• What outcomes does the business want?
• How will we measure success?
• Is the solution dependable and cost-effective?
• What actions should the model trigger?
• Define non-functional requirements e.g.
infrastructure requirements in production,
up-time, scalability
Ford Motor Company
Amazon.com
9. Outcomes Focus
➢ Define a set of metrics for testing the machine learning model
➢ Include those metrics in the machine learning model monitoring process
➢ Actions triggered by model predictions also need to be coded, tested, deployed, maintained and monitored – most likely by a team with a different technical skillset
➢ How can we manage the risks?
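The agreed business metrics can be encoded once as explicit pass/fail checks and reused in both pre-deployment testing and the monitoring process. A minimal sketch (metric names and thresholds are illustrative, not from any specific project):

```python
# Illustrative metric gate: compare a model's evaluation metrics against
# thresholds agreed with the business up front, and reuse the same checks
# in both pre-deployment testing and production monitoring.

METRIC_THRESHOLDS = {
    "precision": 0.80,   # agreed minimum precision
    "recall": 0.70,      # agreed minimum recall
    "latency_ms": 200,   # maximum acceptable scoring latency
}

def check_metrics(observed: dict) -> list:
    """Return a list of (metric, observed, threshold) failures."""
    failures = []
    for metric, threshold in METRIC_THRESHOLDS.items():
        value = observed[metric]
        # latency is "lower is better"; the other metrics are "higher is better"
        ok = value <= threshold if metric == "latency_ms" else value >= threshold
        if not ok:
            failures.append((metric, value, threshold))
    return failures

failures = check_metrics({"precision": 0.85, "recall": 0.65, "latency_ms": 120})
print(failures)  # recall is below its agreed threshold
```

Keeping the thresholds in one place means the testing team and the monitoring team (often different people) apply identical definitions of success.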
10. Outcomes Focus – Ensuring Success
• Business sponsorship
• Realistic budget, timeframe
• Business actively involved in defining requirements and testing
solution
• Digitization journey – most success where more data is digitized
• Manage change – how will solutions impact the way people do
their jobs and serve customers? How will outcomes be trusted?
• Infrastructure to suit the solution
• The right technical and business teams
• On-going monitoring and refresh
• Sufficient volume of quality Data!
• Does the solution make sense long-term?
11. Data Science Architecture
• Analytics pipeline – data flows from source, transformation, through model,
outputs prediction, which sends results to consumer or triggers action(s)
• Machine learning prediction models for different purposes:
• regression and classification models
e.g. score, scale, rank, clusters, anomalies
• Purpose influences:
• choice of model/algorithm
• data and features selected
• delivery mechanism(s) for outcomes from model
• monitoring and refresh requirements
➢ Decisions about how solution is architected
12. Architectures are Solution Specific
• Type of deployment
• Batch?
• Real-time?
• Time-series?
• Delivery mechanism
• Email? (batch)
• Report? (batch)
• Notification/Alert? (real-time, time-series)
• Web Service? (real-time or batched results)
• Embedded in App? (real-time)
• Online or offline?
• Action to take?
• ML model refresh frequency?
14. Data Science Architecture – Possible approaches
Four potential ML system architecture approaches:
How to Deploy Machine Learning Models: A Guide – March 2019 Christopher Samiullah
https://christophergs.github.io/machine%20learning/2019/03/17/how-to-deploy-machine-learning-models/
15. Data Ingestion Layer
• What data do we need?
• What data is available?
• Where does the data reside? – on-premise, external (web, sensors, IoT)
• Is the solution batch or near real-time?
• Data quality
• Data pre-processing
• Data cleansing, missing values, integration, schema drift, filtering noise
• Privacy and regulatory requirements
• Security layer
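Data quality checks can run at ingestion time, before bad records reach the model. A minimal sketch with hypothetical field names and rules:

```python
# Illustrative ingestion-time data quality checks: verify the expected
# schema, count missing values, and flag implausible values.
# Field names and validation rules are hypothetical examples.

EXPECTED_FIELDS = {"customer_id", "age", "purchase_amount"}

def validate_batch(rows: list) -> dict:
    issues = {"schema_drift": 0, "null_values": 0, "out_of_range": 0}
    for row in rows:
        if set(row) != EXPECTED_FIELDS:           # schema drift
            issues["schema_drift"] += 1
            continue
        if any(v is None for v in row.values()):  # missing values
            issues["null_values"] += 1
        age = row.get("age")
        if age is not None and not (0 <= age <= 120):  # filter noise
            issues["out_of_range"] += 1
    return issues

batch = [
    {"customer_id": 1, "age": 34, "purchase_amount": 59.90},
    {"customer_id": 2, "age": None, "purchase_amount": 12.50},
    {"customer_id": 3, "age": 250, "purchase_amount": 8.00},
]
print(validate_batch(batch))  # {'schema_drift': 0, 'null_values': 1, 'out_of_range': 1}
```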
16. Data Ingestion Layer
• Ideally make data available for all data scientists and modelling – the business can also use the data for reporting and business intelligence.
• In reality:
• Different data for machine learning modelling than reporting
• Trying to integrate everything is a large task – focus initially only on the data needed if data is not already managed centrally
• Don’t wait for perfect data strategy
• Have an enterprise-wide approach to data ingestion and integration
➢ Over time, data ingestion and transformation will take less time as more data is already cleansed and integrated, leading to better-quality data inputs
17. Data Storage Layer
• What data from model will be stored?
• Input data – training set data (for version control)
• Model configuration at run-time (training and prediction)
• Model outcomes/targets
• Log files – status, audit, errors, exceptions
• Actions taken
• Where will ML modelling data be stored?
E.g. training sets, configurations
• In-memory, near real-time, batch? – different storage pipelines
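One lightweight way to store the run-time configuration and outcomes listed above is a structured run record per training or scoring run. A sketch using only the standard library (field names are hypothetical):

```python
import json
from datetime import datetime, timezone

# Illustrative run record: capture the model configuration, training-set
# version and outcome metrics for each run, so results are reproducible
# and auditable. Field names are hypothetical examples.

def build_run_record(model_version, training_set_version, params, metrics):
    return {
        "model_version": model_version,
        "training_set_version": training_set_version,  # input data version
        "parameters": params,                          # run-time configuration
        "metrics": metrics,                            # model outcomes
        "run_at": datetime.now(timezone.utc).isoformat(),
    }

record = build_run_record("churn-1.3", "2019-09-01", {"max_depth": 6}, {"auc": 0.87})
print(json.dumps(record, indent=2))  # in practice, write to the storage layer
```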
18. Feature Engineering Layer
• Usually generated during model training
• Share high-quality features
• Many have value across the organisation – not just for machine learning
• In some cases can be generated in advance as part of ETL layer
• data cleansing, missing values, indicator variables, interaction features,
transformations e.g. extracting hour from datetime
• Functions can be defined to generate features on retrieval e.g. via web service
• Store features for sharing and provide access paths
• Monitor data ETL and feature engineering processes
• Batch vs real-time requirements? – different feature generation pipeline and
storage timing
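Shared feature functions help keep training and prediction pipelines consistent. A minimal sketch of the slide's "extract hour from datetime" example plus a simple normalisation (function names are illustrative):

```python
from datetime import datetime

# Illustrative feature functions, shared between training and prediction
# so both pipelines produce identical features from the same raw inputs.

def hour_of_day(ts: str) -> int:
    """Extract the hour from an ISO-format timestamp string."""
    return datetime.fromisoformat(ts).hour

def min_max_normalise(value: float, lo: float, hi: float) -> float:
    """Scale a value into [0, 1] using bounds fixed at training time."""
    return (value - lo) / (hi - lo)

print(hour_of_day("2019-09-17 18:30:00"))   # 18
print(min_max_normalise(75.0, 0.0, 100.0))  # 0.75
```

Defining these once (e.g. in a feature store, as on the next slide) avoids the common bug where training-time and serving-time feature logic silently diverge.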
19. Feature Store – Uber
• Features are defined by data scientists and loaded by ETL
• Features and custom algorithms (models) are shared
between teams
• Features automatically calculated, stored and updated
• Common functions defined e.g. normalizing, datetime
• Easy to consume features
• Used for both training and prediction
• Accelerates machine learning projects & outcomes
• Different processes for real-time and batch
• Additional metadata added to feature
– owner, description, SLA
• 10,000 features in Uber feature store
20. Modelling Layer
• Model training and testing – exploratory, experimentation
• Evaluation and model selection
• Is the best performing model always the best choice for production?
- compress model for deployment – trade-off performance
- trade-off speed of scoring with accuracy (latency threshold is a business metric)
• Identify bias – can be inherent in underlying input data or collection mechanisms
• Explainability – regulatory requirements, reduces risk, improves testing
• With an ML model, who becomes accountable for outcomes?
• Different hardware and infrastructure needed for model training and inference
21. Testing the model
• Model should be tested using traditional software approaches
– system testing, integration testing, user acceptance testing
• Evaluate against business metrics and outcomes initially scoped
• Test extreme values/possible outliers
• Perturb the inputs
• Test for bias, test explainability
• Verify data quality
• Validate decision engine actions and reports based on data inputs
and predictions
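"Perturb the inputs" can be made concrete as an automated stability test: tiny random noise on the inputs should not move the prediction much. A sketch with a stand-in linear scorer (in practice you would call your trained model; tolerances are illustrative):

```python
import random

# Illustrative perturbation test: small random noise added to the inputs
# should not change the model's score dramatically. The "model" here is
# a stand-in linear scorer, not a real trained model.

def model_score(features):
    weights = [0.4, 0.3, 0.3]
    return sum(w * f for w, f in zip(weights, features))

def perturbation_test(features, noise=0.01, trials=100, tolerance=0.05):
    base = model_score(features)
    rng = random.Random(42)  # fixed seed so the test is repeatable
    for _ in range(trials):
        noisy = [f + rng.uniform(-noise, noise) for f in features]
        if abs(model_score(noisy) - base) > tolerance:
            return False  # model is unstable under tiny input changes
    return True

print(perturbation_test([0.5, 0.2, 0.9]))  # True for this stable scorer
```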
22. DevOps and Deployment
• ML model version control – different requirements
• Reproducibility of experiments
• Configuration, parameters, hyperparameters, algorithms
• Data versioning - include data input for training model
• Data Version Control (DVC)
• Where will model be deployed?
• Cloud
• On-premise
• Web service
• Embedded in app or physical equipment
• Within data ingestion pipeline – streaming, near real-time
• How will model be packaged for deployment e.g. PMML, ONNX, PFA, Pickle, Flask, etc
• Solution deployment – containerize e.g. Docker, Kubernetes/Kubeflow
• Continuous Delivery / Continuous Integration / Continuous Deployment (CI/CD)
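As a small sketch of the Pickle packaging option mentioned above, the fitted model can be bundled with its version and feature list so the serving side can validate what it loads (the "model" here is a stand-in dict, and the metadata fields are hypothetical):

```python
import pickle

# Illustrative packaging with pickle: bundle the fitted model together
# with its version and expected feature names, so serving code can check
# what it has loaded before scoring with it.

artifact = {
    "model": {"weights": [0.4, 0.3, 0.3]},  # stand-in for a fitted model
    "model_version": "1.3.0",
    "feature_names": ["recency", "frequency", "monetary"],
}

blob = pickle.dumps(artifact)   # in practice, write to a file or model registry
restored = pickle.loads(blob)

# Serving code can refuse to run if the version or feature list doesn't match
assert restored["model_version"] == "1.3.0"
print(restored["feature_names"])
```

Note that pickle is Python-specific and not safe to load from untrusted sources; formats like PMML or ONNX trade some flexibility for portability across languages and runtimes.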
23. Infrastructure and Security Layers
• Cloud, on-premise or hybrid?
• Batch or real-time – auto-scaling requirements and limitations
• Online or offline
• Requirements for data storage, ingestion, model training, inference, serving
outcomes/actions, scalability, model performance
• What hardware will the solution be deployed to? The model needs to function
effectively on production hardware – e.g. are there GPU resources available in
production for an image classification solution? Can the model be compressed?
• How will data be input to model in production?
Example: We definitely want an offline model that is secure and
performs with millisecond speed and precision for a self-driving car
24. Prediction Layer
• Goal: Deliver right data and insights to consumers at right time
• How quickly does model scoring need to occur?
• What are latency requirements for actions and outcomes?
• Decision engine – what action(s) will be taken
based on predictions
• How will predictions be actioned?
• Web service call
• Report
• Batch process or workflow triggered
• Alert or notification
• Action on a piece of equipment (functional AI)
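The decision engine itself can be as simple as a thresholded mapping from prediction to action. A sketch for a credit-card-fraud style example (thresholds and action names are hypothetical):

```python
# Illustrative decision engine: map a model's fraud-probability prediction
# to one of the delivery mechanisms listed above. Thresholds and action
# names are hypothetical and would come from the business requirements.

def decide_action(fraud_probability: float) -> str:
    if fraud_probability >= 0.9:
        return "block_transaction"  # act directly on the system
    if fraud_probability >= 0.6:
        return "send_alert"         # real-time alert/notification
    if fraud_probability >= 0.3:
        return "flag_for_report"    # include in a batch report
    return "no_action"

print(decide_action(0.95))  # block_transaction
print(decide_action(0.45))  # flag_for_report
```

Keeping this mapping explicit (rather than buried in the model code) makes the actions independently testable and lets the business adjust thresholds without retraining.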
25. Feedback Layer
• Capture feedback from model outputs and actions through monitoring
• Include in data input to retrain and improve model
• Optimally include a human in the feedback process – correct tags, identify
anomalies, tuning corrections based on experience
• Feedback can come from consumers responses to actions:
• Click-through rate from email received for advertising campaign
• Uplift in sales
• Customer service improvement measures
• Equipment downtime reductions
• Use model to predict errors and correct e.g. Uber Eats ETA time
• Compare predictions with actual behavior e.g. recommended items
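Comparing predictions with actual behaviour can be done by joining the prediction log with observed outcomes. A sketch for the recommended-items example (data and field names are illustrative):

```python
# Illustrative feedback capture: join predictions with observed outcomes
# (e.g. whether a recommended item was actually clicked) so the labelled
# pairs can feed back into the next training set.

predictions = {"u1": "itemA", "u2": "itemB", "u3": "itemC"}
observed_clicks = {"u1": "itemA", "u2": "itemD"}  # u3 clicked nothing

feedback = []
for user, predicted in predictions.items():
    actual = observed_clicks.get(user)
    feedback.append({"user": user, "predicted": predicted,
                     "actual": actual, "hit": predicted == actual})

# A simple outcome measure: how often the recommendation was followed
click_through = sum(f["hit"] for f in feedback) / len(feedback)
print(round(click_through, 2))
```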
26. Monitoring Layer
• Model performance degrades over time (sometimes in < 24 hours)
• Trends and tastes change over time, competitors change
• Compare model performance with pre-defined business metrics
• Compare with baselines for specific data slices to identify bias
• Evaluate model outcomes in production (AUC-ROC, PR, distribution skews in input data and
features, mean reciprocal rank, etc)
• Monitor infrastructure, ETL, decision engine, model and other components
• Capture and monitor logs – Is 24/7 support required? What are the SLAs?
• Model predictions can lead to changes in user behaviour and model performance
e.g. credit card fraud solutions, changes in pricing
• Optimal hyperparameters may change over time – automate processes to test for improved choices,
and even model refresh process
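One of the simplest monitors for distribution skew in input features is to compare recent production values against a training-time baseline. A sketch (the z-score rule and threshold are illustrative; production systems often use tests such as PSI or Kolmogorov-Smirnov instead):

```python
import statistics

# Illustrative input-drift check: flag a feature whose recent production
# mean has moved more than a few baseline standard deviations away from
# the training-time mean. The threshold is a hypothetical choice.

def drift_detected(baseline_mean, baseline_std, recent_values, z_threshold=3.0):
    recent_mean = statistics.fmean(recent_values)
    z = abs(recent_mean - baseline_mean) / baseline_std
    return z > z_threshold

baseline_mean, baseline_std = 50.0, 5.0
print(drift_detected(baseline_mean, baseline_std, [49, 52, 48, 51]))  # False
print(drift_detected(baseline_mean, baseline_std, [80, 85, 78, 82]))  # True
```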
27. Model Degradation Examples
• 30-day hospital readmissions prediction:
• Changes impacting model outside the control of the business or IT
• Fields in electronic health record were changed to make documentation easier – made some fields blank
• Lab tests were switched to a different lab who used different codes
• An additional type of insurance was accepted by the hospital changing the distribution of people who went
to the ER
• A new server was provisioned for some source data to improve performance – timestamps
mismatched causing data integration issues and the model failed
• An automated camera sensor for detecting defects in a production line failed due to dirt on the lens,
and could not scan a product that wasn't perfectly centred
• Credit card fraud alerts have been hijacked by fake SMS requesting responses
• Other examples: https://www.oreilly.com/radar/lessons-learned-turning-machine-learning-models-
into-real-products-and-services/
28. Model Refresh Layer
• How often should model be refreshed? – on-going tuning and redeployment
• Depends on:
• The application of the model
• Changes in data inputs or new data available e.g. latest call centre data, current retail sales and prices
• Likely speed of model degradation and changing trends in market
• Importance of up-to-date data for predictions and use case
• How long it takes to retrain and evaluate new model
• Changed outcomes and behaviour can produce an inferior new model
• Fraud solutions reduce fraudulent transactions in production – fewer anomalies in new production data
=> need a way of saving fraudulent transactions and accuracy indicators to use in the training dataset
• Can steps be taken to accelerate model training and deployment?
• What deployment strategy will be implemented for updates?
• Shadow mode
• Canary deployment
• Update operational database from predictions
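Shadow mode, mentioned above, can be sketched in a few lines: the candidate model scores the same live traffic as the current model, but only the current model's output is acted on, while the candidate's predictions are logged for comparison (both models here are stand-in functions):

```python
# Illustrative shadow-mode deployment: the candidate model sees the same
# live inputs as the current model, but only the current model's output
# drives any action; the candidate's output is only recorded.

def current_model(x):
    return 1 if x > 0.5 else 0

def candidate_model(x):
    return 1 if x > 0.4 else 0

shadow_log = []
for x in [0.2, 0.45, 0.6, 0.9]:
    served = current_model(x)    # this prediction drives the action
    shadow = candidate_model(x)  # this one is only logged
    shadow_log.append({"input": x, "served": served, "shadow": shadow})

# Agreement rate between the two models over the logged traffic
agreement = sum(e["served"] == e["shadow"] for e in shadow_log) / len(shadow_log)
print(agreement)
```

A canary deployment differs in that the candidate actually serves a small slice of traffic; shadow mode is the safer first step because the candidate can never affect a real outcome.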
29. Solution Specific Architectures
Back to our machine learning solutions – what is the ideal architecture for each example?
• Price matching promises – online
• Bulk customer marketing email
• Voice-assistant contact search
• Customer risk of loan default report
• Patient health monitoring – prognosis, preventive treatments
• Video stream intruder detection
• Smart meter monitoring; IoT device sensors
• Credit card fraud alert
30. And one last word from Tay's sibling Zo
Correcting machine learning model degradation can also have adverse effects…
31. Up-coming Meetups and Slideshare
Introduction to Kubeflow for MLOps:
• https://www.meetup.com/MLOps-Melbourne/events/hskjjryznbfb/
Machine Learning ASAP: The shortest paths to production
• https://www.meetup.com/Enterprise-Data-Science-Architecture/events/264185824/
ML Governance slides (Aug 7)
• https://www.slideshare.net/TerenceSiganakis/enterprise-machine-learning-governance