Recent Gartner and Capgemini studies suggest that only around 25% of data science projects are successful, and only around 15% make it to full-scale production. Of those that do, many degrade in performance and produce disappointing results within months of implementation. How can focusing on the desired business outcomes and use cases throughout a data science project help overcome the odds?
2. Tash Bickley
Principal Consultant, FTS Data & AI
https://www.linkedin.com/in/tashbickley/
About Me
Background:
- Database administration
- Data engineering (ETL)
- Business intelligence
- Statistics and machine learning
- Data analytics architecture advisory
3. Productionising Machine Learning Models
• Planning for production
• Selecting the optimal architecture for your solution
• Development and Deployment
• Maintaining the quality of the solution and prediction outcomes in production
4. Productionising Machine Learning Models
• Around half of all businesses are developing machine learning solutions
• More than 70% of industry leaders believe AI is important [1]
• Gartner predicts only 25% of machine learning projects are successful
• Capgemini study found 15% of models made it to full-scale production
• Many models in production quickly become liabilities – degrade within days or months
1. Dresner Report 2019 https://www.forbes.com/sites/louiscolumbus/2019/09/08/state-of-ai-and-machine-learning-in-2019/
5. Microsoft Chatbot Tay.ai
• Tay posted 96,000 tweets
• < 24 hours to become offensive
• ‘Repeat after me’
• No filter – jokes, irony, malice
• No coherent personality – hybrid
• Who was the audience?
• Tweeted "i love feminism" and "I f*#!ng hate feminists",
and about Bruce Jenner (learned, not repeated):
"caitlyn jenner is a hero & is a stunning, beautiful woman" and
"caitlyn jenner isn't a real woman yet she won woman of the year?"
6. How can we avoid a poorly performing AI solution?
• Focus on desired outcomes
• Build a collaborative team with skills for end-to-end requirements
• Data inputs – volume, relevance and quality
• Modelling – feature engineering/training/testing/evaluating/model selection
• Testing (software type not ML training)
• Model designed and trained to suit platform for inference
• Monitoring platform, data inputs and model targets/outcomes
• DevOps
• Decision Engine – what actions do we want to take?
• Feedback from outcomes to improve model
• End-to-end platform implementation that refreshes the model
7. How can we avoid a poorly performing AI solution?
• Where are we in the hype cycle? – possibly near the peak?
8. Outcomes Focus
• What business problem will this solve?
• What outcomes does the business want?
• How will we measure success?
• Is the solution dependable and cost-effective?
• What actions should the model trigger?
• Define non-functional requirements e.g.
infrastructure requirements in production,
up-time, scalability
Ford Motor Company
Amazon.com
9. Outcomes Focus
➢ Define a set of metrics for testing the machine learning model
➢ Include those metrics in the machine learning model monitoring process
➢ Actions triggered by model predictions also need to be coded, tested, deployed, maintained and monitored – most likely by a team with a different technical skillset
➢ How can we manage the risks?
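The agreed business metrics can be encoded once as explicit pass/fail checks and reused in both pre-deployment testing and the monitoring process. A minimal sketch (metric names and thresholds are illustrative, not from any specific project):

```python
# Illustrative metric gate: compare a model's evaluation metrics against
# thresholds agreed with the business up front, and reuse the same checks
# in both pre-deployment testing and production monitoring.

METRIC_THRESHOLDS = {
    "precision": 0.80,   # agreed minimum precision
    "recall": 0.70,      # agreed minimum recall
    "latency_ms": 200,   # maximum acceptable scoring latency
}

def check_metrics(observed: dict) -> list:
    """Return a list of (metric, observed, threshold) failures."""
    failures = []
    for metric, threshold in METRIC_THRESHOLDS.items():
        value = observed[metric]
        # latency is "lower is better"; the other metrics are "higher is better"
        ok = value <= threshold if metric == "latency_ms" else value >= threshold
        if not ok:
            failures.append((metric, value, threshold))
    return failures

failures = check_metrics({"precision": 0.85, "recall": 0.65, "latency_ms": 120})
print(failures)  # recall is below its agreed threshold
```

Keeping the thresholds in one place means the testing team and the monitoring team (often different people) apply identical definitions of success.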
10. Outcomes Focus – Ensuring Success
• Business sponsorship
• Realistic budget, timeframe
• Business actively involved in defining requirements and testing
solution
• Digitization journey – most success where more data is digitized
• Manage change – how will solutions impact the way people do
their jobs and serve customers? How will outcomes be trusted?
• Infrastructure to suit the solution
• The right technical and business teams
• On-going monitoring and refresh
• Sufficient volume of quality Data!
• Does the solution make sense long-term?
11. Data Science Architecture
• Analytics pipeline – data flows from source, transformation, through model,
outputs prediction, which sends results to consumer or triggers action(s)
• Machine learning prediction models for different purposes:
• regression and classification models
e.g. score, scale, rank, clusters, anomalies
• Purpose influences:
• choice of model/algorithm
• data and features selected
• delivery mechanism(s) for outcomes from model
• monitoring and refresh requirements
➢ Decisions about how solution is architected
12. Architectures are Solution Specific
• Type of deployment
• Batch?
• Real-time?
• Time-series?
• Delivery mechanism
• Email? (batch)
• Report? (batch)
• Notification/Alert? (real-time, time-series)
• Web Service? (real-time or batched results)
• Embedded in App? (real-time)
• Online or offline?
• Action to take?
• ML model refresh frequency?
14. Data Science Architecture – Possible approaches
Four potential ML system architecture approaches:
How to Deploy Machine Learning Models: A Guide – March 2019 Christopher Samiullah
https://christophergs.github.io/machine%20learning/2019/03/17/how-to-deploy-machine-learning-models/
15. Data Ingestion Layer
• What data do we need?
• What data is available?
• Where does the data reside? – on-premise, external (web, sensors, IoT)
• Is the solution batch or near real-time?
• Data quality
• Data pre-processing
• Data cleansing, missing values, integration, schema drift, filtering noise
• Privacy and regulatory requirements
• Security layer
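Data quality checks can run at ingestion time, before bad records reach the model. A minimal sketch with hypothetical field names and rules:

```python
# Illustrative ingestion-time data quality checks: verify the expected
# schema, count missing values, and flag implausible values.
# Field names and validation rules are hypothetical examples.

EXPECTED_FIELDS = {"customer_id", "age", "purchase_amount"}

def validate_batch(rows: list) -> dict:
    issues = {"schema_drift": 0, "null_values": 0, "out_of_range": 0}
    for row in rows:
        if set(row) != EXPECTED_FIELDS:           # schema drift
            issues["schema_drift"] += 1
            continue
        if any(v is None for v in row.values()):  # missing values
            issues["null_values"] += 1
        age = row.get("age")
        if age is not None and not (0 <= age <= 120):  # filter noise
            issues["out_of_range"] += 1
    return issues

batch = [
    {"customer_id": 1, "age": 34, "purchase_amount": 59.90},
    {"customer_id": 2, "age": None, "purchase_amount": 12.50},
    {"customer_id": 3, "age": 250, "purchase_amount": 8.00},
]
print(validate_batch(batch))  # {'schema_drift': 0, 'null_values': 1, 'out_of_range': 1}
```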
16. Data Ingestion Layer
• Ideally make data available for all data scientists and modelling – the business can also use the data for reporting and business intelligence.
• In reality:
• Different data for machine learning modelling than reporting
• Trying to integrate everything is a large task – focus initially only on the data needed if data is not already managed centrally
• Don’t wait for perfect data strategy
• Have an enterprise-wide approach to data ingestion and integration
➢ Over time, data ingestion and transformation will take less time as more data is already cleansed and integrated, leading to better-quality data inputs
17. Data Storage Layer
• What data from model will be stored?
• Input data – training set data (for version control)
• Model configuration at run-time (training and prediction)
• Model outcomes/targets
• Log files – status, audit, errors, exceptions
• Actions taken
• Where will ML modelling data be stored?
E.g. training sets, configurations
• In-memory, near real-time, batch? – different storage pipelines
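One lightweight way to store the run-time configuration and outcomes listed above is a structured run record per training or scoring run. A sketch using only the standard library (field names are hypothetical):

```python
import json
from datetime import datetime, timezone

# Illustrative run record: capture the model configuration, training-set
# version and outcome metrics for each run, so results are reproducible
# and auditable. Field names are hypothetical examples.

def build_run_record(model_version, training_set_version, params, metrics):
    return {
        "model_version": model_version,
        "training_set_version": training_set_version,  # input data version
        "parameters": params,                          # run-time configuration
        "metrics": metrics,                            # model outcomes
        "run_at": datetime.now(timezone.utc).isoformat(),
    }

record = build_run_record("churn-1.3", "2019-09-01", {"max_depth": 6}, {"auc": 0.87})
print(json.dumps(record, indent=2))  # in practice, write to the storage layer
```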
18. Feature Engineering Layer
• Usually generated during model training
• Share high-quality features
• Many have value across the organisation – not just for machine learning
• In some cases can be generated in advance as part of ETL layer
• data cleansing, missing values, indicator variables, interaction features,
transformations e.g. extracting hour from datetime
• Functions can be defined to generate features on retrieval e.g. via web service
• Store features for sharing and provide access paths
• Monitor data ETL and feature engineering processes
• Batch vs real-time requirements? – different feature generation pipeline and
storage timing
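Shared feature functions help keep training and prediction pipelines consistent. A minimal sketch of the slide's "extract hour from datetime" example plus a simple normalisation (function names are illustrative):

```python
from datetime import datetime

# Illustrative feature functions, shared between training and prediction
# so both pipelines produce identical features from the same raw inputs.

def hour_of_day(ts: str) -> int:
    """Extract the hour from an ISO-format timestamp string."""
    return datetime.fromisoformat(ts).hour

def min_max_normalise(value: float, lo: float, hi: float) -> float:
    """Scale a value into [0, 1] using bounds fixed at training time."""
    return (value - lo) / (hi - lo)

print(hour_of_day("2019-09-17 18:30:00"))   # 18
print(min_max_normalise(75.0, 0.0, 100.0))  # 0.75
```

Defining these once (e.g. in a feature store, as on the next slide) avoids the common bug where training-time and serving-time feature logic silently diverge.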
19. Feature Store – Uber
• Features are defined by data scientists and loaded by ETL
• Features and custom algorithms (models) are shared
between teams
• Features automatically calculated, stored and updated
• Common functions defined e.g. normalizing, datetime
• Easy to consume features
• Used for both training and prediction
• Accelerates machine learning projects & outcomes
• Different processes for real-time and batch
• Additional metadata added to feature
– owner, description, SLA
• 10,000 features in Uber feature store
20. Modelling Layer
• Model training and testing – exploratory, experimentation
• Evaluation and model selection
• Is the best performing model always the best choice for production?
- compress model for deployment – trade-off performance
- trade-off speed of scoring with accuracy (latency threshold is a business metric)
• Identify bias – can be inherent in underlying input data or collection mechanisms
• Explainability – regulatory requirements, reduces risk, improves testing
• With an ML model, who becomes accountable for outcomes?
• Different hardware and infrastructure needed for model training and inference
21. Testing the model
• Model should be tested using traditional software approaches
– system testing, integration testing, user acceptance testing
• Evaluate against business metrics and outcomes initially scoped
• Test extreme values/possible outliers
• Perturb the inputs
• Test for bias, test explainability
• Verify data quality
• Validate decision engine actions and reports based on data inputs
and predictions
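"Perturb the inputs" can be made concrete as an automated stability test: tiny random noise on the inputs should not move the prediction much. A sketch with a stand-in linear scorer (in practice you would call your trained model; tolerances are illustrative):

```python
import random

# Illustrative perturbation test: small random noise added to the inputs
# should not change the model's score dramatically. The "model" here is
# a stand-in linear scorer, not a real trained model.

def model_score(features):
    weights = [0.4, 0.3, 0.3]
    return sum(w * f for w, f in zip(weights, features))

def perturbation_test(features, noise=0.01, trials=100, tolerance=0.05):
    base = model_score(features)
    rng = random.Random(42)  # fixed seed so the test is repeatable
    for _ in range(trials):
        noisy = [f + rng.uniform(-noise, noise) for f in features]
        if abs(model_score(noisy) - base) > tolerance:
            return False  # model is unstable under tiny input changes
    return True

print(perturbation_test([0.5, 0.2, 0.9]))  # True for this stable scorer
```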
22. DevOps and Deployment
• ML model version control – different requirements
• Reproducibility of experiments
• Configuration, parameters, hyperparameters, algorithms
• Data versioning - include data input for training model
• Data Version Control (DVC)
• Where will model be deployed?
• Cloud
• On-premise
• Web service
• Embedded in app or physical equipment
• Within data ingestion pipeline – streaming, near real-time
• How will model be packaged for deployment e.g. PMML, ONNX, PFA, Pickle, Flask, etc
• Solution deployment – containerize e.g. Docker, Kubernetes/Kubeflow
• Continuous Delivery / Continuous Integration / Continuous Deployment (CI/CD)
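As a small sketch of the Pickle packaging option mentioned above, the fitted model can be bundled with its version and feature list so the serving side can validate what it loads (the "model" here is a stand-in dict, and the metadata fields are hypothetical):

```python
import pickle

# Illustrative packaging with pickle: bundle the fitted model together
# with its version and expected feature names, so serving code can check
# what it has loaded before scoring with it.

artifact = {
    "model": {"weights": [0.4, 0.3, 0.3]},  # stand-in for a fitted model
    "model_version": "1.3.0",
    "feature_names": ["recency", "frequency", "monetary"],
}

blob = pickle.dumps(artifact)   # in practice, write to a file or model registry
restored = pickle.loads(blob)

# Serving code can refuse to run if the version or feature list doesn't match
assert restored["model_version"] == "1.3.0"
print(restored["feature_names"])
```

Note that pickle is Python-specific and not safe to load from untrusted sources; formats like PMML or ONNX trade some flexibility for portability across languages and runtimes.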
23. Infrastructure and Security Layers
• Cloud, on-premise or hybrid?
• Batch or real-time – auto-scaling requirements and limitations
• Online or offline
• Requirements for data storage, ingestion, model training, inference, serving
outcomes/actions, scalability, model performance
• What hardware will the solution be deployed to? The model needs to function
effectively on production hardware – e.g. are there GPU resources available in
production for an image classification solution? Can the model be compressed?
• How will data be input to model in production?
Example: We definitely want an offline model that is secure and
performs with millisecond speed and precision for a self-driving car
24. Prediction Layer
• Goal: Deliver right data and insights to consumers at right time
• How quickly does model scoring need to occur?
• What are latency requirements for actions and outcomes?
• Decision engine – what action(s) will be taken
based on predictions
• How will predictions be actioned?
• Web service call
• Report
• Batch process or workflow triggered
• Alert or notification
• Action on a piece of equipment (functional AI)
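The decision engine itself can be as simple as a thresholded mapping from prediction to action. A sketch for a credit-card-fraud style example (thresholds and action names are hypothetical):

```python
# Illustrative decision engine: map a model's fraud-probability prediction
# to one of the delivery mechanisms listed above. Thresholds and action
# names are hypothetical and would come from the business requirements.

def decide_action(fraud_probability: float) -> str:
    if fraud_probability >= 0.9:
        return "block_transaction"  # act directly on the system
    if fraud_probability >= 0.6:
        return "send_alert"         # real-time alert/notification
    if fraud_probability >= 0.3:
        return "flag_for_report"    # include in a batch report
    return "no_action"

print(decide_action(0.95))  # block_transaction
print(decide_action(0.45))  # flag_for_report
```

Keeping this mapping explicit (rather than buried in the model code) makes the actions independently testable and lets the business adjust thresholds without retraining.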
25. Feedback Layer
• Capture feedback from model outputs and actions through monitoring
• Include in data input to retrain and improve model
• Optimally include a human in the feedback process – correct tags, identify
anomalies, tuning corrections based on experience
• Feedback can come from consumers responses to actions:
• Click-through rate from email received for advertising campaign
• Uplift in sales
• Customer service improvement measures
• Equipment downtime reductions
• Use model to predict errors and correct e.g. Uber Eats ETA time
• Compare predictions with actual behavior e.g. recommended items
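Comparing predictions with actual behaviour can be done by joining the prediction log with observed outcomes. A sketch for the recommended-items example (data and field names are illustrative):

```python
# Illustrative feedback capture: join predictions with observed outcomes
# (e.g. whether a recommended item was actually clicked) so the labelled
# pairs can feed back into the next training set.

predictions = {"u1": "itemA", "u2": "itemB", "u3": "itemC"}
observed_clicks = {"u1": "itemA", "u2": "itemD"}  # u3 clicked nothing

feedback = []
for user, predicted in predictions.items():
    actual = observed_clicks.get(user)
    feedback.append({"user": user, "predicted": predicted,
                     "actual": actual, "hit": predicted == actual})

# A simple outcome measure: how often the recommendation was followed
click_through = sum(f["hit"] for f in feedback) / len(feedback)
print(round(click_through, 2))
```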
26. Monitoring Layer
• Model performance degrades over time (sometimes in < 24 hours)
• Trends and tastes change over time, competitors change
• Compare model performance with pre-defined business metrics
• Compare with baselines for specific data slices to identify bias
• Evaluate model outcomes in production (AUC-ROC, PR, distribution skews in input data and
features, mean reciprocal rank, etc)
• Monitor infrastructure, ETL, decision engine, model and other components
• Capture and monitor logs – Is 24/7 support required? What are the SLAs?
• Model predictions can lead to changes in user behaviour and model performance
e.g. credit card fraud solutions, changes in pricing
• Optimal hyperparameters may change over time – automate processes to test for improved choices,
and even model refresh process
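One of the simplest monitors for distribution skew in input features is to compare recent production values against a training-time baseline. A sketch (the z-score rule and threshold are illustrative; production systems often use tests such as PSI or Kolmogorov-Smirnov instead):

```python
import statistics

# Illustrative input-drift check: flag a feature whose recent production
# mean has moved more than a few baseline standard deviations away from
# the training-time mean. The threshold is a hypothetical choice.

def drift_detected(baseline_mean, baseline_std, recent_values, z_threshold=3.0):
    recent_mean = statistics.fmean(recent_values)
    z = abs(recent_mean - baseline_mean) / baseline_std
    return z > z_threshold

baseline_mean, baseline_std = 50.0, 5.0
print(drift_detected(baseline_mean, baseline_std, [49, 52, 48, 51]))  # False
print(drift_detected(baseline_mean, baseline_std, [80, 85, 78, 82]))  # True
```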
27. Model Degradation Examples
• 30-day hospital readmissions prediction:
• Changes impacting model outside the control of the business or IT
• Fields in electronic health record were changed to make documentation easier – made some fields blank
• Lab tests were switched to a different lab who used different codes
• An additional type of insurance was accepted by the hospital changing the distribution of people who went
to the ER
• A new server was provisioned for some source data to improve performance – timestamps
mismatched causing data integration issues and the model failed
• An automated camera sensor for detecting defects in a production line failed due to dirt on the lens,
and could not scan a product that wasn't perfectly centred
• Credit card fraud alerts have been hijacked by fake SMS requesting responses
• Other examples: https://www.oreilly.com/radar/lessons-learned-turning-machine-learning-models-
into-real-products-and-services/
28. Model Refresh Layer
• How often should model be refreshed? – on-going tuning and redeployment
• Depends on:
• The application of the model
• Changes in data inputs or new data available e.g. latest call centre data, current retail sales and prices
• Likely speed of model degradation and changing trends in market
• Importance of up-to-date data for predictions and use case
• How long it takes to retrain and evaluate new model
• Changed outcomes and behaviour can produce an inferior new model
• Fraud solutions reduce fraudulent transactions in production – fewer anomalies in new production data
=> need a way of saving fraudulent transactions and accuracy indicators to use in the training dataset
• Can steps be taken to accelerate model training and deployment?
• What deployment strategy will be implemented for updates?
• Shadow mode
• Canary deployment
• Update operational database from predictions
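Shadow mode, mentioned above, can be sketched in a few lines: the candidate model scores the same live traffic as the current model, but only the current model's output is acted on, while the candidate's predictions are logged for comparison (both models here are stand-in functions):

```python
# Illustrative shadow-mode deployment: the candidate model sees the same
# live inputs as the current model, but only the current model's output
# drives any action; the candidate's output is only recorded.

def current_model(x):
    return 1 if x > 0.5 else 0

def candidate_model(x):
    return 1 if x > 0.4 else 0

shadow_log = []
for x in [0.2, 0.45, 0.6, 0.9]:
    served = current_model(x)    # this prediction drives the action
    shadow = candidate_model(x)  # this one is only logged
    shadow_log.append({"input": x, "served": served, "shadow": shadow})

# Agreement rate between the two models over the logged traffic
agreement = sum(e["served"] == e["shadow"] for e in shadow_log) / len(shadow_log)
print(agreement)
```

A canary deployment differs in that the candidate actually serves a small slice of traffic; shadow mode is the safer first step because the candidate can never affect a real outcome.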
29. Solution Specific Architectures
Back to our machine learning solutions – what is the ideal architecture for each example?
• Price matching promises – online
• Bulk customer marketing email
• Voice-assistant contact search
• Customer risk of loan default report
• Patient health monitoring – prognosis, preventive treatments
• Video stream intruder detection
• Smart meter monitoring; IoT device sensors
• Credit card fraud alert
30. And one last word from Tay's sibling Zo
Correcting machine learning model degradation can also have adverse effects…
31. Up-coming Meetups and Slideshare
Introduction to Kubeflow for MLOps:
• https://www.meetup.com/MLOps-Melbourne/events/hskjjryznbfb/
Machine Learning ASAP: The shortest paths to production
• https://www.meetup.com/Enterprise-Data-Science-Architecture/events/264185824/
ML Governance slides (Aug 7)
• https://www.slideshare.net/TerenceSiganakis/enterprise-machine-learning-governance