StoryStream.ai
From POC to Production in Minimal Time – Avoiding Pain in ML Projects
Dr Janet Bastiman
@yssybyl
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/poc-ml/
Presented at QCon San Francisco
www.qconsf.com
Project timings
About StoryStream
The world’s leading automotive content platform
StoryStream is a dedicated automotive content platform, trusted by some of the world’s leading car brands. It was created specifically to help automotive brands provide a more relevant, engaging customer experience fuelled by authentic content, and it is designed to scale content operations efficiently across global teams.
The Core StoryStream Benefits
● Grow customer engagement and conversions by up to 25%
● Reduce content creation and management costs by up to 60%
● Provide a more authentic customer experience
● Understand your customer in a deeper way
“[Client] needs this to go live at the end of the month, I promised them we could deliver...”
Every salesperson ever
Project timings
● 35 models = 1050 days (one person, working linearly)
● ~5 years for one person working Monday to Friday, allowing for holidays :)
● 250 days with parallelisation of tasks and data available upfront
● 150 days on the worksheet, balanced by an increase in the ongoing licence
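The timing figures can be sanity-checked with a little arithmetic. This is a minimal sketch: the 30 person-days per model is simply 1050 ÷ 35, while the 220 working days per year (weekdays minus leave and public holidays) is our own assumption rather than a figure from the talk.

```python
# Back-of-envelope check of the project-timing figures on the slide.
# ASSUMPTION: 220 working days per year (Mon-Fri minus leave and public holidays).

def timing_summary(models=35, linear_days=1050, working_days_per_year=220,
                   parallel_days=250, worksheet_days=150):
    """Return per-model effort, calendar years for one person, and compression factors."""
    return {
        # 1050 days / 35 models = 30 person-days per model
        "days_per_model": linear_days / models,
        # One person doing one model after another
        "years_one_person": linear_days / working_days_per_year,
        # How far the parallelised plan compresses the linear estimate
        "parallel_compression": linear_days / parallel_days,
        # How far the worksheet figure compresses it
        "worksheet_compression": linear_days / worksheet_days,
    }

if __name__ == "__main__":
    for name, value in timing_summary().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions one person needs roughly 4.8 years, and the worksheet figure asks for a 7× compression of the linear estimate, which is why the data and parallelism assumptions matter so much.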
Can you guess what happened next?
What would it take to get it done in that time?
Image: The Core (2003), Paramount Pictures
“They don’t have any data to give us”
If you are dealing with any critical inferencing, do not take shortcuts. Do it properly and rigorously, stand up to the company and say no, and make sure it’s clear that the timelines will be longer to get it right.
Without Data, ML is just a Random Result
● Legal public sources
  - https://github.com/awesomedata/awesome-public-datasets
  - https://www.kaggle.com/datasets
● Take your own pictures/videos
  - Access/permission?
  - Slow and inconsistent
● Scrape the client site with permission
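Permission to scrape a client site is agreed with the client, not in code; as a complementary technical courtesy, a scraper can also honour the site's robots.txt. The sketch below uses Python's standard `urllib.robotparser`; the user-agent string and the example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def allowed_urls(robots_txt, urls, user_agent="example-image-scraper"):
    """Return only the URLs that robots.txt permits for this user agent.

    `user_agent` is a hypothetical name; use whatever identifies your scraper.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines; in practice you would fetch the site's
    # robots.txt first (or call set_url() followed by read()).
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]

robots = """\
User-agent: *
Disallow: /private/
"""
candidates = [
    "https://client.example/gallery/car-001.jpg",
    "https://client.example/private/internal.jpg",
]
print(allowed_urls(robots, candidates))
```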
How much data?
• Vision: 1000 images per output class, but this depends on the complexity of the problem
• Time series: at least double the time period over which you are predicting, but be cautious of data becoming irrelevant
• Text: very variable depending on the problem
• These figures also change if you already have pre-trained networks that you’re updating
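These rules of thumb are easy to turn into an automated check before training starts. A minimal sketch, assuming labels arrive as a flat list and using the slide's 1000-per-class figure and "twice the prediction horizon" guideline as defaults; both thresholds should be tuned to the problem (and lowered if you are fine-tuning a pre-trained network). The class names are hypothetical.

```python
from collections import Counter

def classes_below_threshold(labels, min_per_class=1000):
    """Vision rule of thumb: flag classes with fewer than `min_per_class` examples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < min_per_class}

def enough_history(history_days, horizon_days):
    """Time-series rule of thumb: history should cover at least twice the horizon."""
    return history_days >= 2 * horizon_days

labels = ["hatchback"] * 1200 + ["saloon"] * 850 + ["estate"] * 1000
print(classes_below_threshold(labels))  # only "saloon" falls short
print(enough_history(history_days=365, horizon_days=90))
```

The history check says nothing about whether old data is still relevant; that judgement remains manual.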
What do you do with the Data?
● Selection bias
● Random sampling error
● Over-coverage
● Under-coverage
● Measurement (response) error
● Processing errors
● Participation bias
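Over- and under-coverage can be spotted by comparing how often each group appears in your sample with how often it is expected in the population you will run inference on. A minimal sketch: the group names, the expected population shares and the ±50% tolerance are all hypothetical and would be replaced with real figures.

```python
def coverage_report(sample_counts, population_share, tolerance=0.5):
    """Label each group as 'over', 'under' or 'ok' relative to its population share.

    sample_counts:    {group: number of examples in the training sample}
    population_share: {group: expected fraction in production, summing to ~1}
    tolerance:        allowed relative deviation before a group is flagged
    """
    total = sum(sample_counts.values())
    report = {}
    for group, expected in population_share.items():
        observed = sample_counts.get(group, 0) / total
        ratio = observed / expected
        if ratio < 1 - tolerance:
            report[group] = "under"
        elif ratio > 1 + tolerance:
            report[group] = "over"
        else:
            report[group] = "ok"
    return report

# Hypothetical example: studio shots dominate a scraped set, user photos are scarce
sample = {"studio": 700, "street": 250, "user_upload": 50}
population = {"studio": 0.3, "street": 0.4, "user_upload": 0.3}
print(coverage_report(sample, population))
```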
What do you do with the Data?
Photos, Scrape → S3 bucket
● Unique filename
● Source
● Set UUID (if multiple images of the same car)
● Date taken
● S3 bucket per vehicle variant
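The metadata stored with each image can be captured as a small record type. A minimal sketch: the field names, the `.jpg` extension, the vehicle names and the bucket-naming scheme are hypothetical, and the upload itself (for example with boto3) is left out so the example runs without AWS credentials.

```python
import re
import uuid
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class ImageRecord:
    source: str                     # where the image came from (URL, photographer, ...)
    date_taken: date
    vehicle_variant: str            # one S3 bucket per variant, as on the slide
    set_uuid: Optional[str] = None  # shared by multiple images of the same car
    # Unique filename so images from different sources never collide
    filename: str = field(default_factory=lambda: f"{uuid.uuid4()}.jpg")

    def bucket_name(self, prefix="cars"):
        """Hypothetical naming scheme. S3 bucket names must be lowercase, so anything
        other than lowercase letters, digits and hyphens becomes a hyphen."""
        slug = re.sub(r"[^a-z0-9-]+", "-", self.vehicle_variant.lower()).strip("-")
        return f"{prefix}-{slug}"

shoot = str(uuid.uuid4())  # one set UUID for several shots of the same car
front = ImageRecord("https://client.example/gallery/1", date(2019, 6, 1), "Roadster GT 2.0", shoot)
rear = ImageRecord("https://client.example/gallery/2", date(2019, 6, 1), "Roadster GT 2.0", shoot)
print(front.bucket_name(), front.filename != rear.filename)
```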
What do you do with the Data?
Photos, Scrape → Car Detector → S3 Bucket → Manual verification
● Extra field for label
● S3 bucket name became mostly irrelevant
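Putting a car detector in front of storage means only plausible images reach manual verification, and the label lives in its own field rather than being implied by the bucket name. A minimal sketch: `detect_car` is a hypothetical stand-in for whatever detector you use (assumed to return a confidence between 0 and 1), and the 0.5 threshold is an arbitrary default.

```python
from typing import Callable, Iterable, List

def ingest(images: Iterable[str], detect_car: Callable[[str], float],
           threshold: float = 0.5) -> List[dict]:
    """Keep images the detector scores at or above `threshold`; queue them for manual checks."""
    queue = []
    for image in images:
        score = detect_car(image)
        if score >= threshold:
            queue.append({
                "image": image,
                "car_score": score,
                "label": None,      # extra field, filled in during manual verification
                "verified": False,
            })
    return queue

# Stub detector for illustration only: a lookup table of made-up scores
fake_scores = {"car_front.jpg": 0.97, "car_side.jpg": 0.81, "dealer_logo.png": 0.04}
pending = ingest(fake_scores, fake_scores.get)
print([item["image"] for item in pending])
```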
Crowdsource labelling
https://xkcd.com/1897/
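Crowdsourced labels are noisy, so each image is usually shown to several workers and the answers are aggregated. A minimal sketch of one common approach, majority voting with an agreement threshold (our choice, not necessarily what the project used); images without enough agreement are set aside for an expert to clean. The 0.6 threshold and the labels are hypothetical.

```python
from collections import Counter, defaultdict

def aggregate_votes(votes, min_agreement=0.6):
    """Majority-vote crowd labels.

    votes: iterable of (image_id, label) pairs, one per worker answer.
    Returns (accepted, needs_expert): accepted maps image_id -> label when the
    top label's share of the votes is at least `min_agreement`.
    """
    by_image = defaultdict(list)
    for image_id, label in votes:
        by_image[image_id].append(label)

    accepted, needs_expert = {}, []
    for image_id, labels in by_image.items():
        label, count = Counter(labels).most_common(1)[0]
        if count / len(labels) >= min_agreement:
            accepted[image_id] = label
        else:
            needs_expert.append(image_id)
    return accepted, needs_expert

votes = [
    ("img1", "saloon"), ("img1", "saloon"), ("img1", "estate"),
    ("img2", "coupe"), ("img2", "convertible"), ("img2", "saloon"),
]
print(aggregate_votes(votes))
```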
Data Pipeline
Data In → Object detector → Images saved → Auxiliary info saved → Temp public access → Extract for Turk → Import of results → Dashboard → Expert clean → Data Ready
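The "temp public access" and "extract for Turk" steps amount to producing a batch file of image URLs that crowd workers can open, then reading their answers back. A minimal sketch: `make_temp_url` is a hypothetical callable (in an AWS setup it might wrap a time-limited pre-signed S3 URL), the single `image_url` column assumes a task template that refers to that column name, and the results column names are parameters because they depend on your task setup.

```python
import csv
import io

def export_batch(image_keys, make_temp_url):
    """Write a one-column CSV (header 'image_url') for a crowdsourcing batch upload."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["image_url"])
    for key in image_keys:
        writer.writerow([make_temp_url(key)])
    return buffer.getvalue()

def import_results(results_csv, url_column, label_column):
    """Read worker answers back as (image_url, label) pairs."""
    reader = csv.DictReader(io.StringIO(results_csv))
    return [(row[url_column], row[label_column]) for row in reader]

# Hypothetical URL maker standing in for a time-limited signed URL
batch = export_batch(["a.jpg", "b.jpg"], lambda key: f"https://temp.example/{key}?expires=3600")
print(batch.splitlines())
```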
Transfer Learning
● Use transfer learning: fix most of the weights of a good pre-trained network and adapt only the last few layers
● Retraining is fast and easy, and it works with smaller data sets in a variety of fields
  - (image) https://arxiv.org/abs/1903.02196
  - (series) https://arxiv.org/abs/1907.01332
  - (audio) https://arxiv.org/abs/1909.07526
Deep Learning for Vision Systems, Mohamed Elgendy
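The core idea (a frozen feature extractor plus a small trainable head) can be shown without any deep-learning framework. This is a toy sketch, not a recipe: the "pre-trained" backbone is a single ReLU layer with fixed random weights standing in for a real network, the toy task is hypothetical, and only the logistic-regression head is trained. In a framework you would get the same effect by marking the backbone as non-trainable (for example `requires_grad = False` on its parameters in PyTorch, or `trainable = False` on its layers in Keras) and optimising only the new final layer.

```python
import math
import random

def make_frozen_backbone(in_dim, feat_dim, seed=0):
    """Stand-in for a pre-trained network: one ReLU layer whose weights never change."""
    rng = random.Random(seed)
    weights = tuple(tuple(rng.gauss(0.0, 1.0) for _ in range(in_dim)) for _ in range(feat_dim))

    def features(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return features, weights

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(features, head, data):
    w, b = head
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(wi * fi for wi, fi in zip(w, features(x))) + b)
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train_head(features, data, epochs=50, lr=0.05):
    """SGD on the final layer only; the backbone is called but never modified."""
    w = [0.0] * len(features(data[0][0]))
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            grad = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y  # d(log-loss)/dz
            w = [wi - lr * grad * fi for wi, fi in zip(w, f)]
            b -= lr * grad
    return w, b

# Toy task (hypothetical): label is 1 when the two inputs sum to a positive number
rng = random.Random(42)
data = []
for _ in range(40):
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

features, frozen = make_frozen_backbone(in_dim=2, feat_dim=16)
before = log_loss(features, ([0.0] * 16, 0.0), data)
head = train_head(features, data)
after = log_loss(features, head, data)
print(f"log-loss before {before:.3f}, after {after:.3f}")
```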
Unbalanced Data
https://www.designhacks.co/products/cognitive-bias-codex-poster
Stand on the shoulders of giants…
● For some problems, CNNs are robust to noisy labels: even with up to 20 noisy labels for every real one, they can still give business-level accuracy
https://arxiv.org/pdf/1705.10694.pdf
● Find the right architecture
http://www.asimovinstitute.org/neural-network-zoo/
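Before trusting noisy crowd or scraped labels, you can stress-test your own model by deliberately mixing randomly labelled examples in with the clean ones, in the spirit of the noisy-label experiments cited above. A minimal sketch: `alpha` (noisy examples per clean example), reusing clean inputs for the noisy copies, and uniform random labels are our assumptions; the paper's exact protocol may differ.

```python
import random

def add_uniform_label_noise(clean, alpha, num_classes, seed=0):
    """Return the clean (x, y) pairs plus `alpha` noisy copies per clean example.

    Each noisy copy reuses an input drawn at random from the clean set but gets a label
    drawn uniformly from range(num_classes), so it may occasionally be right by chance.
    """
    rng = random.Random(seed)
    noisy = []
    for _ in range(alpha * len(clean)):
        x, _ = rng.choice(clean)
        noisy.append((x, rng.randrange(num_classes)))
    mixed = list(clean) + noisy
    rng.shuffle(mixed)
    return mixed

clean = [((i,), i % 3) for i in range(30)]
mixed = add_uniform_label_noise(clean, alpha=20, num_classes=3)
print(len(mixed))  # 30 clean + 600 noisy
```

Training on `mixed` and evaluating on a clean held-out set shows how much label noise your particular model and data can tolerate.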
Go old school
Reduce the dimensionality of the problem and use a Bayesian approach, KNN or SVM
https://xkcd.com/2059/
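For small data sets a classical pipeline can be a strong baseline. A minimal, dependency-free sketch: dimensionality is reduced by keeping the highest-variance features (a crude stand-in for PCA, which you would normally reach for in a library such as scikit-learn), followed by a k-nearest-neighbour vote. The toy feature values and k are illustrative only.

```python
import math
from collections import Counter
from statistics import pvariance

def top_variance_features(rows, keep):
    """Indices of the `keep` columns with the highest variance (simple dimensionality reduction)."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)), key=lambda i: pvariance(columns[i]), reverse=True)
    return sorted(ranked[:keep])

def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training points closest (Euclidean) to `query`."""
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy data: columns 0 and 1 separate the classes, column 2 barely varies
train_x = [(0.1, 0.2, 5.0), (0.2, 0.1, 5.0), (0.9, 0.8, 5.1), (0.8, 0.9, 5.0)]
train_y = ["a", "a", "b", "b"]
cols = top_variance_features(train_x, keep=2)

def project(row):
    """Keep only the selected columns."""
    return tuple(row[i] for i in cols)

prediction = knn_predict([project(r) for r in train_x], train_y, project((0.85, 0.75, 5.0)), k=3)
print(cols, prediction)
```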
Choose wisely
Simplify the problem
Removal of camera artefacts in eye images to make detection easier - Jeffrey De Fauw
http://blog.kaggle.com/2015/08/10/detecting-diabetic-retinopathy-in-eye-images/
Image → Specific Vehicle
Image → Car? → Make? → Specific Vehicle
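Breaking one hard question ("which exact vehicle is this?") into a chain of easier ones lets each model be simpler and need less data. A minimal sketch of such a cascade: the three classifier callables, the make and variant names are hypothetical placeholders for separately trained models, and returning `None` at the first stage is one way to reject non-car images early.

```python
from typing import Callable, Dict, Optional

def identify_vehicle(
    image,
    is_car: Callable[[object], bool],
    predict_make: Callable[[object], str],
    variant_models: Dict[str, Callable[[object], str]],
) -> Optional[str]:
    """Image -> Car? -> Make? -> specific vehicle, one simpler model per stage."""
    if not is_car(image):
        return None                          # reject early: nothing else needs to run
    make = predict_make(image)
    variant_model = variant_models.get(make)
    if variant_model is None:
        return f"{make} (unknown variant)"   # no per-make model trained yet
    return variant_model(image)

# Stub models for illustration; real ones would be trained classifiers
result = identify_vehicle(
    "photo.jpg",
    is_car=lambda img: True,
    predict_make=lambda img: "MakeA",
    variant_models={"MakeA": lambda img: "MakeA Hatchback 1.4"},
)
print(result)
```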
Removal of Doppler effect on moving source using fractional octave band shifting, F Mobley
https://asa.scitation.org/doi/pdf/10.1121/2.0000578?class=pdf
$\Delta n = -r\left[\log_2\left(1 - M\cos\theta\sin\varphi\right)\right]$
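The band-shift formula can be evaluated directly. A minimal sketch under our own reading of the symbols, which the slide does not define: r as the number of bands per octave (3 for third-octave bands), M as the source's Mach number, and θ, φ as the angles describing the source's direction relative to the receiver. Check the paper before relying on these interpretations.

```python
import math

def band_shift(r, mach, theta, phi):
    """Delta n = -r * log2(1 - M cos(theta) sin(phi)), in fractional-octave bands."""
    return -r * math.log2(1.0 - mach * math.cos(theta) * math.sin(phi))

# Example: third-octave bands (r = 3), M = 0.1, theta = 0, phi = 90 degrees
print(round(band_shift(3, 0.1, 0.0, math.pi / 2), 3))
```

For M = 0.1 the shift is a little under half a third-octave band, so shifting whole bands would over-correct; that is why the shift is fractional.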
Get every last drop from what you have
Statistical anatomical modelling for efficient and personalised spine biomechanical models - I Castro Mateos PhD thesis
Have a toolkit of augmentation approaches, but choose what’s relevant to your needs...
Augmentation - detail
● Flip left/right and up/down
● Rotations
● Reduce or enlarge bounding box coordinates by N%
● Add occlusions
https://www.umbc.edu/rssipl/people/aplaza/Papers/Journals/2019.GRSL.Occlusion.pdf
● Change the hue, saturation and value of colours in the image
https://arxiv.org/pdf/1902.06543.pdf
● Copypairing - https://arxiv.org/abs/1909.00390
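Geometric augmentations have to be applied to the labels as well as the pixels: flipping an image left/right moves its bounding box, and the "reduce or enlarge by N%" trick changes the box itself. A minimal sketch for boxes stored as (x_min, y_min, x_max, y_max) in pixels (an assumed convention), plus a hue/saturation/value tweak for a single RGB pixel using the standard `colorsys` module.

```python
import colorsys

def flip_box_lr(box, image_width):
    """Mirror a box horizontally to match a left/right flip of the image."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)

def scale_box(box, percent, image_width, image_height):
    """Enlarge (percent > 0) or shrink (percent < 0) a box about its centre, clipped to the image."""
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
    factor = 1 + percent / 100
    half_w = (x_max - x_min) * factor / 2
    half_h = (y_max - y_min) * factor / 2
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(image_width), cx + half_w),
        min(float(image_height), cy + half_h),
    )

def shift_hsv(rgb, dh=0.0, ds=0.0, dv=0.0):
    """Shift hue (wrapping) and saturation/value (clamped) of one RGB pixel with channels in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + dh) % 1.0
    s = min(max(s + ds, 0.0), 1.0)
    v = min(max(v + dv, 0.0), 1.0)
    return colorsys.hsv_to_rgb(h, s, v)

print(flip_box_lr((10, 20, 110, 70), image_width=200))
print(scale_box((50, 50, 150, 100), percent=10, image_width=200, image_height=120))
```

Whichever augmentations you choose, keep them plausible for the domain: an upside-down car is rarely a useful training example for a vehicle classifier.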
Infrastructure
Data In, Data Store, Taxonomy, Classifier Definition, Test Set, DockerHub Setup, Codeship Project, GitHub Setup, Notification (Slack, Email), Template, AWS Image, Scripts, Dashboard
CloudFormation
Automation
Delete local data, Build container, Get model and key, Run test harness, Validate container, Run container, Report results, Dashboard, Commit, Build new Container
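The "run test harness" and "validate container" steps boil down to scoring the candidate model on a fixed test set and refusing to proceed if it falls below an agreed bar. A minimal sketch: `predict` is a hypothetical callable wrapping the model inside the container, the test data is made up, and the 0.9 overall and 0.8 per-class accuracy floors are placeholder thresholds.

```python
from collections import defaultdict

def run_test_harness(predict, test_set):
    """Return overall and per-class accuracy of `predict` on (input, expected_label) pairs."""
    correct = 0
    per_class = defaultdict(lambda: [0, 0])  # label -> [correct, total]
    for x, expected in test_set:
        hit = predict(x) == expected
        correct += hit
        per_class[expected][0] += hit
        per_class[expected][1] += 1
    return {
        "accuracy": correct / len(test_set),
        "per_class": {label: c / t for label, (c, t) in per_class.items()},
    }

def validate(metrics, min_accuracy=0.9, min_class_accuracy=0.8):
    """Gate applied before the container is run or reported as ready."""
    failures = []
    if metrics["accuracy"] < min_accuracy:
        failures.append(f"overall accuracy {metrics['accuracy']:.2f} < {min_accuracy}")
    for label, acc in metrics["per_class"].items():
        if acc < min_class_accuracy:
            failures.append(f"class {label!r} accuracy {acc:.2f} < {min_class_accuracy}")
    return failures  # an empty list means the container passes

test_set = [("img1", "saloon"), ("img2", "saloon"), ("img3", "estate"), ("img4", "estate")]
answers = {"img1": "saloon", "img2": "saloon", "img3": "estate", "img4": "saloon"}
metrics = run_test_harness(answers.get, test_set)
print(metrics, validate(metrics))
```

Checking per-class accuracy as well as the overall figure catches a model that looks fine on average but fails on a minority class.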
Stack Automation
Add new container, Start stack, Run stack test harness, Compare results, Create docs
Better? Yes: Update CF, Live. No: Human investigation
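The "Better?" decision is where a human is deliberately brought back into the loop. A minimal sketch of that gate: the metric names, the "higher is better" assumption and the minimum improvement margin are placeholders, and the returned action strings stand in for whatever actually updates the CF (CloudFormation) stack or raises an investigation.

```python
def promotion_decision(live_metrics, candidate_metrics, margin=0.01):
    """Decide what to do with a new stack after its test-harness run.

    Assumes every metric is 'higher is better'. The candidate must beat the live stack by
    at least `margin` on every metric the live stack reports; otherwise a human investigates.
    """
    for name, live_value in live_metrics.items():
        if candidate_metrics.get(name, float("-inf")) < live_value + margin:
            return "human_investigation"
    return "update_cf_and_go_live"

live = {"accuracy": 0.91, "top3_accuracy": 0.97}
print(promotion_decision(live, {"accuracy": 0.94, "top3_accuracy": 0.985}))
print(promotion_decision(live, {"accuracy": 0.95, "top3_accuracy": 0.96}))
```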
Automatic Documentation
LaTeX templates + Pweave → .tex files and images → Run LaTeX → Convert to PDF → Save with model files → If live, save in live docs → Email to team
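Pweave runs Python embedded in a document and writes the results into the output; the underlying idea of filling a LaTeX template with a model's results can be sketched with the standard library alone. The template text, field names, model name and escaping helper below are hypothetical, this is not Pweave's own API, and compiling the resulting .tex file (for example with pdflatex) is left to the pipeline.

```python
from string import Template

# Hypothetical report template. string.Template uses $name placeholders,
# so a literal dollar sign for LaTeX maths would need to be written as $$.
REPORT = Template(r"""\documentclass{article}
\begin{document}
\section*{Model report: $model_name}
Test-set accuracy: $accuracy\\
Images in test set: $n_images
\end{document}
""")

LATEX_SPECIAL = {"_": r"\_", "%": r"\%", "&": r"\&", "#": r"\#"}

def latex_escape(text):
    """Escape a few characters that commonly appear in model names."""
    return "".join(LATEX_SPECIAL.get(ch, ch) for ch in text)

def render_report(model_name, accuracy, n_images):
    """Fill the template; the result would be written to a .tex file next to the model files."""
    return REPORT.substitute(
        model_name=latex_escape(model_name),
        accuracy=f"{accuracy:.3f}",
        n_images=n_images,
    )

tex = render_report("car_make_v2", 0.934, 1200)
print(tex)
```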
Did we make it?
● Some really difficult images
● Only expected images were given
● Where it was wrong, it was (mostly) sensibly wrong
● Client happy
● Cool automated system
The Playbook
ai-playbook.com
Thank You
https://xkcd.com/2191/