How to Evaluate & Manage
Machine Learning Model? #DAFT
Shunya UETA @hurutoriya
2019-04-12
$ whoami
● Shunya UETA :: @hurutoriya
● Machine Learning Engineer at Mercari, Inc.
● Machine Learning Casual Talks Co-Organizer
● https://shunyaueta.com/
Machine Learning Workflow (CRISP-DM)
● Most Important Steps
○ Business Understanding
○ Evaluation
● Missing pieces for production
Ref: Kenneth Jensen
CRISP-DM for Production
Ref: Jan Teichmann
Content Moderation
1. Item listing
2. If the prob score is greater than the threshold value, the item is hidden and Customer Support is alerted
3. Customer Support check
   violation items → Delete
   normal items → Unhide
E.g. content moderation targets: fake brands, game accounts
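The three-step flow above can be sketched as a small state machine. This is a minimal illustration and not Mercari's implementation: the `Status` enum and the function names `on_listing` and `on_cs_review` are hypothetical, and the strict "greater than" comparison follows the wording of step 2.

```python
from enum import Enum


class Status(Enum):
    VISIBLE = "visible"
    HIDDEN = "hidden"    # hidden and waiting for a Customer Support check
    DELETED = "deleted"


def on_listing(prob: float, threshold: float) -> Status:
    """Step 2: hide the item (and alert Customer Support) when prob > threshold."""
    return Status.HIDDEN if prob > threshold else Status.VISIBLE


def on_cs_review(status: Status, is_violation: bool) -> Status:
    """Step 3: Customer Support deletes violations and unhides normal items."""
    if status is not Status.HIDDEN:
        return status  # only hidden items are reviewed in this sketch
    return Status.DELETED if is_violation else Status.VISIBLE


hidden = on_listing(prob=0.98, threshold=0.95)
after_review = on_cs_review(hidden, is_violation=False)
```

In this sketch a normal item that was hidden by mistake becomes visible again after the Customer Support check, which is the "Unhide" branch of the flow.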
Assumptions
● The ML service runs on all listed items
● Binary classification
● Precision is more important than recall
● We can simulate online results offline by using a faster Customer Support check system
Ref: Rendezvous Architecture for Data Science in Production
Before Deploying a New Model to Production
2019/04/11 all listing items
Current Model
1. prob
2. true or false
Cloud Pub/Sub
Sad Stories of Machine Learning Systems in Production
● Gap between offline and online evaluation
→ "OK! We can't know the online result, so let's deploy!"
● Data imbalance problem
High-Speed Continuous Improvement
1. Easy A/B system
2. Online/offline sanity check
What We Do Before Putting a New Model Online
New Model
Compute Engine
2019/04/11 all listing items
Current Model
1. prob
2. true or false
Cloud Pub/Sub
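The idea on this slide, scoring one day of listings with both the current and the new model and recording each model's prob and true/false decision, can be sketched in plain Python. This is only an in-memory stand-in: it does not use Cloud Pub/Sub or Compute Engine, and `replay`, `current_model`, `new_model` and the sample listings are hypothetical names and data invented for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical model interface: listing text in, violation probability out.
Model = Callable[[str], float]


def replay(
    listings: Iterable[Tuple[int, str]],
    models: Dict[str, Model],
    threshold: float,
) -> List[dict]:
    """Score every listing with every model and keep prob plus the decision."""
    rows = []
    for item_id, text in listings:
        row = {"id": item_id}
        for name, model in models.items():
            prob = model(text)
            row[name] = {"prob": prob, "flagged": prob > threshold}
        rows.append(row)
    return rows


# Toy stand-ins for the two models (not real classifiers).
def current_model(text: str) -> float:
    return 0.97 if "brand" in text else 0.10


def new_model(text: str) -> float:
    return 0.99 if ("brand" in text or "account" in text) else 0.05


listings = [(1, "luxury brand bag"), (2, "game account for sale"), (3, "used bicycle")]
result = replay(listings, {"current": current_model, "new": new_model}, threshold=0.95)
```

Each output row holds both models' scores for the same item, which is the shape of data the sanity-check tables in this deck compare.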
Sanity Check Before Deploy to Production
Threshold: 0.95

ID       is_delete  Model α  Model β
1613431  True       0.98     0.999
5263832  True       0.97     0.43
7213438  False      0.95     0.45
3213492  True       0.70     0.98
9201420  True       0.01     0.97

Annotations:
● Success! Cost Sensitive
● Success! Cost Sensitive
● Success! Improve Precision
● Success! Improve Recall
● Fail! Worsens recall
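As a rough sketch of how the table could be checked offline, the snippet below computes precision and recall for Model α and Model β at the 0.95 threshold, using `is_delete` as the ground truth. The function name `precision_recall` and the `inclusive` flag are assumptions for illustration. By default a score must be strictly greater than the threshold to count as flagged, following the moderation flow; item 7213438 scores exactly 0.95 under Model α, so the flag lets you see how an inclusive comparison changes the result.

```python
from typing import List, Sequence, Tuple

# (ID, is_delete, Model α score, Model β score), copied from the slide table.
ROWS = [
    (1613431, True, 0.98, 0.999),
    (5263832, True, 0.97, 0.43),
    (7213438, False, 0.95, 0.45),
    (3213492, True, 0.70, 0.98),
    (9201420, True, 0.01, 0.97),
]
THRESHOLD = 0.95


def precision_recall(
    labels: Sequence[bool],
    scores: Sequence[float],
    threshold: float,
    inclusive: bool = False,
) -> Tuple[float, float]:
    """Return (precision, recall) when items above the threshold are flagged."""
    flagged: List[bool] = [
        (s >= threshold) if inclusive else (s > threshold) for s in scores
    ]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


labels = [row[1] for row in ROWS]
alpha_scores = [row[2] for row in ROWS]
beta_scores = [row[3] for row in ROWS]

alpha = precision_recall(labels, alpha_scores, THRESHOLD)
beta = precision_recall(labels, beta_scores, THRESHOLD)
alpha_inclusive = precision_recall(labels, alpha_scores, THRESHOLD, inclusive=True)
```

With the strict comparison both models have no false positives on these five rows and Model β catches one more violation; with the inclusive comparison Model α also flags the normal item 7213438, which lowers its precision.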
Sanity Check Before Deploy to Production
                                  Confidence or Probability: High ↑   Confidence or Probability: Low ↓
Deleted items (terms violation)   Improve 👍!!                         Bad Model ☠
Undeleted items (normal items)    Bad Model ☠                         Improve 👍!!
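One possible reading of this matrix as code: a model is doing well on an item when it gives a high score to a deleted item or a low score to a normal item. The sketch below assumes "high" means strictly above the 0.95 threshold, and the function name `verdict` is hypothetical. The sample rows reuse the Model β scores from the earlier table.

```python
def verdict(is_delete: bool, score: float, threshold: float) -> str:
    """Apply the 2x2 matrix: high score on deleted items or low score on normal items is good."""
    is_high = score > threshold
    return "improve" if is_high == is_delete else "bad model"


THRESHOLD = 0.95

# (ID, is_delete, Model β score) taken from the sanity-check table.
BETA_ROWS = [
    (1613431, True, 0.999),
    (5263832, True, 0.43),
    (7213438, False, 0.45),
    (3213492, True, 0.98),
    (9201420, True, 0.97),
]

verdicts = {
    item_id: verdict(is_delete, score, THRESHOLD)
    for item_id, is_delete, score in BETA_ROWS
}
```

Under these assumptions only item 5263832 lands in a "bad model" cell, because it was deleted by Customer Support but scored low.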
Traditional Server-Side Design
Ref: Rendezvous Architecture for Data Science in Production
MODEL
{"name": "Dog", "prob": "92.5"}
With Load Balancer
Ref: Rendezvous Architecture for Data Science in Production
Load Balancer
Model 1
Model 2
Model 3
{"name": "Dog", "prob": "92.5"}
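A load balancer in front of several models can be sketched as below. The slide does not say which routing policy is used, so round-robin is an assumption here, and `RoundRobinBalancer` and the three toy models are hypothetical. The response mirrors the JSON shape shown on the slide.

```python
import itertools
import json
from typing import Callable, Dict, Tuple

# Hypothetical model interface: request payload in, response dict out.
Model = Callable[[str], dict]


class RoundRobinBalancer:
    """Send each request to the next model in turn (assumed policy)."""

    def __init__(self, models: Dict[str, Model]) -> None:
        self._models = models
        self._order = itertools.cycle(list(models))

    def predict(self, payload: str) -> Tuple[str, str]:
        name = next(self._order)
        response = self._models[name](payload)
        return name, json.dumps(response)


def make_model(prob: str) -> Model:
    # Toy model that always answers "Dog" with a fixed probability string.
    return lambda payload: {"name": "Dog", "prob": prob}


balancer = RoundRobinBalancer(
    {
        "model_1": make_model("92.5"),
        "model_2": make_model("91.0"),
        "model_3": make_model("93.2"),
    }
)
served_by = [balancer.predict("image-bytes")[0] for _ in range(4)]
```

Each request is answered by exactly one model here, so comparing models on the same item requires logging which model served it.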
Sanity Check After Deploy to Production
● New model vs. old model
○ Number of overlapping items with the same prob score
○ Eyeball (grep) the top 100 and bottom 100 items 👀
○ Error analysis (false positive samples)
○ Use false positives for hard negative sampling
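The checks in this list can be sketched with a few helpers. This is an illustration under assumptions: "overlap" is read here as the set of items that both models flag at the same threshold, and the function names `flagged`, `overlap`, `top_and_bottom` and `hard_negative_candidates` are hypothetical. The sample scores reuse the earlier sanity-check table.

```python
from typing import Dict, List, Set, Tuple

# Scores per item ID for the old (α) and new (β) models, from the earlier table.
OLD = {1613431: 0.98, 5263832: 0.97, 7213438: 0.95, 3213492: 0.70, 9201420: 0.01}
NEW = {1613431: 0.999, 5263832: 0.43, 7213438: 0.45, 3213492: 0.98, 9201420: 0.97}
# Customer Support decision: True means the item was deleted as a violation.
LABELS = {1613431: True, 5263832: True, 7213438: False, 3213492: True, 9201420: True}


def flagged(scores: Dict[int, float], threshold: float) -> Set[int]:
    """IDs whose score is strictly above the threshold."""
    return {item_id for item_id, s in scores.items() if s > threshold}


def overlap(old: Dict[int, float], new: Dict[int, float], threshold: float) -> Set[int]:
    """Items flagged by both models (assumed meaning of 'overlap items')."""
    return flagged(old, threshold) & flagged(new, threshold)


def top_and_bottom(scores: Dict[int, float], k: int = 100) -> Tuple[List[int], List[int]]:
    """Highest-k and lowest-k scored IDs, for manual inspection."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], ranked[-k:]


def hard_negative_candidates(
    scores: Dict[int, float], labels: Dict[int, bool], threshold: float
) -> List[int]:
    """False positives: flagged items that Customer Support judged normal."""
    return sorted(i for i in flagged(scores, threshold) if not labels[i])


both = overlap(OLD, NEW, threshold=0.95)
top, bottom = top_and_bottom(NEW, k=2)
candidates = hard_negative_candidates(OLD, LABELS, threshold=0.9)
```

The false positives returned by the last helper are the samples that could be fed back into training as hard negatives.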
Strongly Recommended Articles
