1. This deck discusses how to evaluate and manage machine learning models before deploying them to production. It emphasizes offline and online evaluation to identify any gaps between the two.
2. A key step described is a "sanity check" on a new model: comparing its predictions with the current model's on a sample of real data. This reveals whether the new model improves precision and recall or worsens them.
3. After deploying a new model, ongoing monitoring is recommended to check that the new and old models still make consistent predictions on the same data, and to analyze any differences or errors. This continuous evaluation helps ensure the quality of models in production.
3. Machine Learning Workflow (CRISP-DM)
● Most Important Steps
○ Business Understanding
○ Evaluation
● Missing pieces for Production
Ref: Kenneth Jensen
5. Content Moderation
1. Item is listed
2. If the prob score is greater than the threshold value, the item is hidden and Customer Support is alerted
3. Customer Support check (see the sketch below):
violation items → Delete
normal items → Unhide
E.g. Content Moderation targets: Fake Brand, Game Account
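A minimal sketch of this flow, assuming the threshold from the later slides; model_score() here is a stand-in for the real ML service, and the item fields are hypothetical:

THRESHOLD = 0.95

def model_score(item) -> float:
    # Stand-in for the real ML service call.
    return item.get("prob", 0.0)

def on_item_listed(item, support_queue):
    # Step 2: above-threshold items are hidden and queued for review.
    if model_score(item) > THRESHOLD:
        item["hidden"] = True
        support_queue.append(item)

def on_support_decision(item, is_violation):
    # Step 3: Customer Support confirms or clears the item.
    if is_violation:
        item["deleted"] = True      # violation item -> Delete
    else:
        item["hidden"] = False      # normal item -> Unhide

queue = []
on_item_listed({"id": 1613431, "prob": 0.98}, queue)
print(queue)  # [{'id': 1613431, 'prob': 0.98, 'hidden': True}]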
6. Assumptions
● ML Service runs on all listed items
● Binary classification
● Precision is more important than recall (see the Fβ sketch below)
● We can simulate online results offline, thanks to a fast Customer Support check system
Ref: Rendezvous Architecture for Data Science in Production
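One way to encode "precision is more important than recall" in offline evaluation is a precision-weighted Fβ score with β < 1. A sketch; scikit-learn is not named in the deck and is an assumption here, and the labels are illustrative:

from sklearn.metrics import fbeta_score, precision_score, recall_score

y_true = [1, 1, 0, 1, 1]          # is_delete labels, as in the tables below
y_pred = [1, 0, 0, 1, 1]          # model decisions at the 0.95 threshold

# beta < 1 weights precision higher than recall.
print(precision_score(y_true, y_pred))        # 1.0
print(recall_score(y_true, y_pred))           # 0.75
print(fbeta_score(y_true, y_pred, beta=0.5))  # 0.9375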
7. Before Deploying a New Model to Production
[Diagram: all listing items (2019/04/11) flow through Cloud Pub/Sub into the Current Model, which emits (1) a prob score and (2) a true/false decision; sketched below]
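A minimal sketch of feeding listing items through Cloud Pub/Sub to the scoring service, assuming the google-cloud-pubsub client library; the project and topic names are hypothetical:

import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "listing-items")  # hypothetical names

def publish_listing(item_id, payload):
    # Serialize the listing item and hand it to Pub/Sub for scoring.
    data = json.dumps({"id": item_id, **payload}).encode("utf-8")
    future = publisher.publish(topic_path, data=data)
    return future.result()  # blocks until the message is accepted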
8. Sad Stories of Machine Learning Systems in Production
● Gap between Offline & Online evaluation
→ "OK! We can't know the online result, let's just deploy!"
● Data Imbalance problem
9. High-Speed Continuous Improvement
1. Easy A/B System (see the sketch below)
2. Online/Offline Sanity Check
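An "easy A/B system" can be as small as deterministic hashing of item IDs into buckets. A sketch; the 50/50 split, experiment name, and variant names are illustrative assumptions, not from the deck:

import hashlib

def ab_variant(item_id: str, experiment: str = "model-beta-rollout") -> str:
    # Hash the (experiment, item) pair so assignment is stable per experiment.
    digest = hashlib.sha256(f"{experiment}:{item_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "new_model" if bucket < 50 else "old_model"

print(ab_variant("1613431"))  # the same ID always lands in the same variant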
10. Sanity Check Before Deploy to Production
Threshold: 0.95
ID       is_delete  Model α  Model β
1613431  True       0.98     0.999
5263832  True       0.97     0.43
7213438  False      0.95     0.45
3213492  True       0.70     0.98
9201420  True       0.01     0.97
11-15. Sanity Check Before Deploy to Production
Threshold: 0.95
ID       is_delete  Model α  Model β  Verdict
1613431  True       0.98     0.999    Success! Cost sensitive (both delete, β more confident)
5263832  True       0.97     0.43     Fail! Worsens recall (α deletes, β misses)
7213438  False      0.95     0.45     Success! Improves precision (α's false positive, β correct)
3213492  True       0.70     0.98     Success! Cost sensitive (near-miss becomes a confident hit)
9201420  True       0.01     0.97     Success! Improves recall (α's total miss, β deletes)
(decision logic sketched below)
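A minimal sketch of this offline sanity check: re-score a labeled sample with both models and compare decisions at the shared threshold. The verdict strings are a simplification of the slide annotations; rows where the decision does not change (like ID 1613431) are where the deck's cost-sensitive "more confident on violations" reading applies:

THRESHOLD = 0.95

def row_verdict(is_delete, prob_alpha, prob_beta):
    alpha_hit = prob_alpha >= THRESHOLD   # would the current model hide it?
    beta_hit = prob_beta >= THRESHOLD     # would the candidate model hide it?
    if alpha_hit == beta_hit:
        return "No decision change"
    if is_delete:                         # a true violation
        return "Improves recall" if beta_hit else "Fail! Worsens recall"
    return "Improves precision" if alpha_hit else "Fail! Worsens precision"

rows = [
    (1613431, True, 0.98, 0.999),
    (5263832, True, 0.97, 0.43),
    (7213438, False, 0.95, 0.45),
    (3213492, True, 0.70, 0.98),
    (9201420, True, 0.01, 0.97),
]
for item_id, is_delete, a, b in rows:
    print(item_id, row_verdict(is_delete, a, b))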
16. Sanity Check Before Deploy to Production
                                  Confidence/Prob: High ↑   Confidence/Prob: Low ↓
Deleted items (violations)        Improve 👍!!               Bad Model ☠
Undeleted items (normal items)    Bad Model ☠                Improve 👍!!
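The same matrix as a tiny function, reading "high confidence" as the new model's score relative to the threshold; the helper name is mine:

def judge(was_deleted: bool, new_prob: float, threshold: float = 0.95) -> str:
    # For items Customer Support already labeled, does the new model's
    # confidence point the right way?
    high_confidence = new_prob >= threshold
    if was_deleted:                     # known violation
        return "Improve!" if high_confidence else "Bad Model"
    return "Bad Model" if high_confidence else "Improve!"

print(judge(True, 0.999))   # Improve!
print(judge(False, 0.999))  # Bad Model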
17. Traditional Server-Side Design
Ref: Rendezvous Architecture for Data Science in Production
[Diagram: client calls a single MODEL, which returns {"name": "Dog", "prob": "92.5"}]
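A minimal single-model endpoint in this traditional design. The deck names no framework; Flask, the route, and the constant response are illustrative assumptions:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # A real service would run the model on the request payload.
    return jsonify({"name": "Dog", "prob": 92.5})

if __name__ == "__main__":
    app.run(port=8080)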
18. With Load Balancer
Ref: Rendezvous Architecture for Data Science in Production
[Diagram: client → Load Balancer → Model 1 / Model 2 / Model 3, returning {"name": "Dog", "prob": "92.5"}]
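In the rendezvous style, the balancer can fan a request out to every model, serve the primary's answer, and log the challengers for offline comparison. A sketch with hypothetical model callables:

import logging

def rendezvous_predict(request, predict_fns, primary="model_1"):
    # Every model scores the request; only the primary's answer is served.
    results = {name: fn(request) for name, fn in predict_fns.items()}
    for name, result in results.items():
        if name != primary:
            logging.info("shadow %s -> %r", name, result)  # challengers are logged only
    return results[primary]

models = {
    "model_1": lambda req: {"name": "Dog", "prob": 92.5},
    "model_2": lambda req: {"name": "Dog", "prob": 88.0},
}
print(rendezvous_predict({"image": "..."}, models))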
19. Sanity Check After Deploy to Production
● New Model vs. Old Model (sketched below)
○ How many overlapping items get the same prob score?
○ 👀 grep the top-100 and bottom-100 scored items
○ Error Analysis (False Positive samples)
○ Feed False Positives back as Hard Negative Sampling
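A sketch of these post-deploy checks with pandas, assuming one row per item scored by both models; the column names (item_id, is_delete, prob_old, prob_new) and the score-match tolerance are assumptions:

import pandas as pd

def post_deploy_checks(df: pd.DataFrame, threshold: float = 0.95):
    # 1. How many items do the two models score (nearly) the same?
    overlap = (df.prob_old.sub(df.prob_new).abs() < 0.01).sum()
    print(f"items with matching scores: {overlap}/{len(df)}")

    # 2. Eyeball the extremes of the new model's ranking.
    print(df.nlargest(100, "prob_new"))
    print(df.nsmallest(100, "prob_new"))

    # 3. New model's false positives -> error analysis and a
    #    hard-negative pool for the next training run.
    false_positives = df[(df.prob_new >= threshold) & (~df.is_delete)]
    return false_positives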