Aki Ariga | Field Data Scientist
2018.05.17
2 © Cloudera, Inc. All rights reserved.
● Field Data Scientist at Cloudera
● Previously research engineer at Toshiba, Rails developer at Cookpad
● Co-author of “仕事ではじめる機械学習” (Machine Learning at Work)
● Founder of kawasaki.rb & MLCT
● Twitter: @chezou
● GitHub: https://github.com/chezou/

Hidden technical debt in machine learning systems [2] + Project procedure + Culture
Building a Data-driven product ≠ Research
A journey for Data-driven product
1.
2.
3. A/B
4. A/B
5.
6.
7.
http://tjo.hatenablog.com/entry/2016/01/18/080000 ( )
Culture
BI
Statistics
ML
1.
2.
3.
4.
5.
6.
7.
8.
Procedure in a Machine Learning project
Step.4 7
•
•
•
• / Web
•
Typical project member recommendation for ML project
What’s the difference between academia and industry for ML?
Production by Nick Youngson CC BY-SA 3.0 Alpha Stock Images

Sample data science/machine learning workflow: from data to exploration to action
Phases: Data Engineering, Data Science (Exploratory), Production (Operational)
Activities: Acquisition, Curation, Data Wrangling, Data Exploration, Model Training & Testing, Production Data Pipelines, Batch Scoring, Online Scoring, Serving
Also shown: Data Governance
Outputs: Reports, Dashboards, Data, Models, Predictions, Business value
1.
2.
3.
Production
MLOps
1. Train by batch, predict on the fly, serve via REST API
2. Train by batch, predict by batch, serve through the shared DB
3. Train, predict, serve by streaming
4. Train by batch, predict on a mobile app

Pattern 1: Train by batch, predict on the fly, serve via REST API
Components: Web Application, DB, ML System (Batch System, API Server), Trained Model
Diagram labels: Activity log/Contents data, Extract feature, Feature, Execute training, Training result, REST API, User ID/Item ID, Prediction result
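
As a rough illustration of Pattern 1, the sketch below trains a tiny logistic regression in a batch step and scores requests on the fly behind an HTTP handler. Everything in it is an assumption made for illustration rather than the architecture from the slides: the `FEATURES` dictionary stands in for the DB, and `train_batch`, `handle_request`, `make_handler`, and `serve` are hypothetical names. Only the Python standard library is used.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer


def train_batch(rows, epochs=200, lr=0.5):
    """Batch step: fit a tiny logistic regression on (features, label) rows."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in rows:
            z = bias + sum(w * x for w, x in zip(weights, features))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            weights = [w - lr * err * x for w, x in zip(weights, features)]
            bias -= lr * err
    return {"weights": weights, "bias": bias}


def predict(model, features):
    """Score one feature vector with the trained model."""
    z = model["bias"] + sum(w * x for w, x in zip(model["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature lookup keyed by (user ID, item ID); stands in for the DB.
FEATURES = {("u1", "i1"): [1.0, 0.0], ("u1", "i2"): [0.0, 1.0]}


def handle_request(model, body):
    """API-server step: look up features for the IDs and predict on the fly."""
    request = json.loads(body)
    features = FEATURES[(request["user_id"], request["item_id"])]
    return json.dumps({"score": predict(model, features)})


def make_handler(model):
    """Build an HTTP handler class that serves predictions from `model`."""
    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            payload = handle_request(model, self.rfile.read(length)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(payload)
    return PredictHandler


def serve(model, host="127.0.0.1", port=8000):
    """Start the API server; blocks until interrupted."""
    HTTPServer((host, port), make_handler(model)).serve_forever()


# The batch job runs ahead of time; the API server only loads its result.
TRAINING_ROWS = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 5
MODEL = train_batch(TRAINING_ROWS)
```

Calling `serve(MODEL)` would start the server, after which the web application could POST a JSON body with `user_id` and `item_id` and receive a JSON score. Keeping training and serving in separate functions mirrors the split between the batch system and the API server.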

Example architecture: PMML + OpenScoring
Layers: Model building layer, Predicting & serving layer
Components: CDSW, HDFS
Diagram labels: Activity log, Extract feature & Train/update model, Trained Model, Export model as PMML, Updated model, Load model, Request to predict, Extract feature & Predict, Prediction results

Example architecture: Docker-based API server
Layers: Model building layer, Predicting & serving layer
Components: CDSW, HDFS, Object storage
Diagram labels: Activity log, Extract feature & Train/update model, Trained Model, Save model on object storage, Pack the runtime env with Docker, Updated model, Load model, Request to predict, Extract feature & Predict, Prediction results

Pattern 2: Train by batch, predict by batch, serve through the shared DB
Components: Web Application, DB, Batch System (Training Batch, Prediction Batch), Trained Model
Diagram labels: Activity log/Contents data, Extract feature, Feature, Execute training, Training result, Prediction result, Serve prediction
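
A minimal sketch of Pattern 2, assuming an in-memory SQLite database as the shared DB. The table names (`activity_log`, `predictions`), the function names, and the deliberately trivial per-item click-rate "model" are all hypothetical; the point is only that the web application reads precomputed rows and never runs model code.

```python
import sqlite3


def run_training_batch(conn):
    """Training batch: learn a per-item click rate from the activity log."""
    rows = conn.execute(
        "SELECT item_id, AVG(clicked) FROM activity_log GROUP BY item_id"
    ).fetchall()
    return dict(rows)


def run_prediction_batch(conn, model):
    """Prediction batch: score every user/item pair and refresh the shared table."""
    users = [row[0] for row in conn.execute(
        "SELECT DISTINCT user_id FROM activity_log ORDER BY user_id"
    )]
    with conn:  # one transaction, so readers never see a half-written table
        conn.execute("DELETE FROM predictions")
        conn.executemany(
            "INSERT INTO predictions (user_id, item_id, score) VALUES (?, ?, ?)",
            [(user, item, score) for user in users for item, score in model.items()],
        )


def serve_top_item(conn, user_id):
    """Web application side: a plain lookup against the shared DB."""
    row = conn.execute(
        "SELECT item_id FROM predictions WHERE user_id = ? "
        "ORDER BY score DESC, item_id LIMIT 1",
        (user_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity_log (user_id TEXT, item_id TEXT, clicked INTEGER)")
conn.execute("CREATE TABLE predictions (user_id TEXT, item_id TEXT, score REAL)")
conn.executemany(
    "INSERT INTO activity_log VALUES (?, ?, ?)",
    [("u1", "a", 1), ("u1", "b", 0), ("u2", "a", 1), ("u2", "b", 1), ("u2", "a", 0)],
)

model = run_training_batch(conn)
run_prediction_batch(conn, model)
```

After the two batches have run, `serve_top_item(conn, "u1")` is an ordinary query. A user with no stored predictions gets `None`, which is the cost of batch prediction noted in the comparison of the four patterns: new data is not reflected until the next batch.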

Example architecture: Serving by HBase/Kudu
Layers: Model building & predicting layer, Serving layer
Components: CDSW, HDFS, Kudu/HBase
Diagram labels: Activity log, Historical data, Extract feature & Train/update model, Trained Model, Updated model, Load trained model, Extract feature & Predict, Prediction results

Pattern 3: Train, predict, serve by streaming
Components: Web Application, Message queue (e.g. Kafka), Stream-based ML System (e.g. Spark Streaming), Trained Model
Diagram labels: Log data, Recent log data, Extract feature, Feature, Train & Predict, Model updates, Model, Prediction results
Notes:
- Querying for prediction
- Showing or sending alerts
- This component may work with a message queue like Kafka
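
The streaming pattern can be sketched with standard-library queues standing in for the message queue. This is an illustrative assumption, not Spark Streaming or Kafka code: `OnlineAnomalyModel` and `run_stream` are hypothetical names, and the "model" is just a running mean and variance (Welford's method) that is updated with every event and flags values far from the mean.

```python
import queue


class OnlineAnomalyModel:
    """Running mean/variance, updated one event at a time."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # alert when |x - mean| > threshold * std
        self.warmup = warmup        # minimum number of events before alerting
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # sum of squared deviations from the mean

    def predict(self, value):
        """Return True if the value looks anomalous given the data seen so far."""
        if self.n < self.warmup:
            return False
        std = (self.m2 / (self.n - 1)) ** 0.5
        return std > 0 and abs(value - self.mean) > self.threshold * std

    def update(self, value):
        """Train step: fold one new observation into the running statistics."""
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)


def run_stream(log_queue, alert_queue, model):
    """Consume log events, predict on each one, then update the model with it."""
    while True:
        try:
            value = log_queue.get_nowait()
        except queue.Empty:
            break
        if model.predict(value):
            alert_queue.put(value)
        model.update(value)


# Stand-ins for the message queue topics.
log_queue = queue.Queue()
alert_queue = queue.Queue()
for value in [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 50.0, 10.0]:
    log_queue.put(value)

run_stream(log_queue, alert_queue, OnlineAnomalyModel())
alerts = list(alert_queue.queue)
```

Predicting before updating means each event is scored by a model that has not yet seen it. The operational difficulty of this pattern comes from running such a loop continuously and recovering its state after failures, which the sketch does not attempt.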

Pattern 4: Train by batch, predict on a mobile app
Components: Mobile Application, Batch System, DB (x2), Trained Model (x2)
Diagram labels: Activity log/Contents data, Activity logs/Contents data, Extract feature (x2), Feature, Execute training, Training result, Convert model, Request for prediction, Prediction result

Example architecture: Serving on a mobile app
Layers: Model building layer, Predicting & serving layer
Components: CDSW, HDFS, Storage in a smartphone
Diagram labels: Activity log, Extract feature & Train/update model, Trained Model, Convert model to TFLite/CoreML, Updated model, Load model, Request to predict, Extract feature & Predict, Prediction results

Pattern 4’: Federated learning
https://research.googleblog.com/2017/04/federated-learning-collaborative.html
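
The slide only links to Google's blog post, so the following is a toy sketch of the general federated averaging idea, not Google's implementation. It assumes a one-variable linear model `y = w * x + b`; `local_update`, `federated_average`, `federated_round`, and the `DEVICES` data are hypothetical. Each device trains on its own data, and only the resulting weights are averaged centrally, weighted by local dataset size.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """On-device step: a few epochs of SGD using only this device's data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_average(updates, sizes):
    """Server step: average device weights, weighted by local dataset size."""
    total = sum(sizes)
    w = sum(update[0] * n for update, n in zip(updates, sizes)) / total
    b = sum(update[1] * n for update, n in zip(updates, sizes)) / total
    return (w, b)


def federated_round(global_weights, device_datasets):
    """One round: broadcast weights, train locally, aggregate the results."""
    updates = [local_update(global_weights, data) for data in device_datasets]
    return federated_average(updates, [len(data) for data in device_datasets])


# Two hypothetical devices whose private data both follow y = 2x + 1.
DEVICES = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (0.5, 2.0), (1.5, 4.0)],
]

weights = (0.0, 0.0)
for _ in range(50):
    weights = federated_round(weights, DEVICES)
```

The raw `(x, y)` pairs never leave `local_update`; the server sees only weight tuples. Real systems add secure aggregation, client sampling, and compression, none of which are modeled here.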

4 patterns comparison
Item | Pattern 1 (REST API) | Pattern 2 (Shared DB) | Pattern 3 (Streaming) | Pattern 4 (Mobile app)
Training | by batch | by batch | NRT (by streaming) | by batch
Prediction | NRT (on the fly) | by batch | NRT (by streaming) | NRT (on the fly)
Prediction result delivery | NRT (via REST API) | NRT (through the shared DB) | NRT (by streaming via MQ) | NRT (via in-process API on mobile)
Latency from new data to prediction | Moderate | Moderate to long | Very low | Low
Required time to predict | Short | Long | Short | Short
Tight/loose coupling with app | Loose | Loose | Loose | Tight
Language dependency | Independent | Independent | Independent | Depends on frameworks
System management difficulty | Moderate | Easy | Very hard | Moderate
NRT: Near real time

CI, CD, and blue-green deployment
https://www.slideshare.net/hiroakikudo77/ss-84593653/14
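
One way to picture blue-green deployment applied to a model is a router that keeps a live slot and an idle slot and switches between them in a single step. This is a sketch under assumptions, not something taken from the linked slides: `BlueGreenRouter`, `deploy`, `rollback`, and the three toy models are hypothetical, and a model is simply any callable.

```python
import threading


class BlueGreenRouter:
    """Routes predictions to the live model; the previous one is kept idle."""

    def __init__(self, live_model):
        self._lock = threading.Lock()
        self._live = live_model
        self._idle = None

    def predict(self, x):
        with self._lock:
            model = self._live  # take a reference, then predict outside the lock
        return model(x)

    def deploy(self, candidate, smoke_inputs):
        """Smoke-test the candidate, then switch traffic to it."""
        for x in smoke_inputs:
            candidate(x)  # any exception aborts the deploy before the switch
        with self._lock:
            self._idle, self._live = self._live, candidate

    def rollback(self):
        """Switch traffic back to the previously live model."""
        with self._lock:
            if self._idle is None:
                raise RuntimeError("no previous model to roll back to")
            self._live, self._idle = self._idle, self._live


def model_v1(x):
    return x * 2


def model_v2(x):
    return x * 3


def broken_model(x):
    raise ValueError("model failed to load its weights")


router = BlueGreenRouter(model_v1)
router.deploy(model_v2, smoke_inputs=[1, 2])
```

Because the old model stays loaded in the idle slot, `router.rollback()` is immediate. A candidate such as `broken_model` fails during the smoke test and never receives traffic.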
• /Feedback loop
•
•
•
• ) MeCab
•
• )
•
•
•
/Feedback loop
https://twitter.com/hagino3000/status/986257856730034177
•
• “safe to serve” & “desired prediction quality” [4]
• (offline) (online)
• “Silent failures” [3]
• ) Join
• )
•
•
•
• serving
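
The quoted criteria, “safe to serve” and “desired prediction quality” [4], can be read as a gate that a candidate model must pass before it replaces the production model. The sketch below is one possible interpretation under assumptions, not the TFX implementation: `is_safe_to_serve`, `accuracy`, `should_push`, the 0.5 decision threshold, and the toy models are all hypothetical.

```python
import math


def is_safe_to_serve(model, sample_inputs):
    """The model must not raise and must return finite numbers on sample inputs."""
    for x in sample_inputs:
        try:
            y = model(x)
        except Exception:
            return False
        if not isinstance(y, (int, float)) or math.isnan(y) or math.isinf(y):
            return False
    return True


def accuracy(model, labeled):
    """Fraction of holdout examples classified correctly at a 0.5 threshold."""
    correct = sum(1 for x, label in labeled if (model(x) >= 0.5) == bool(label))
    return correct / len(labeled)


def should_push(candidate, production, holdout, min_gain=0.0):
    """Push only if the candidate is safe and at least as good as production."""
    if not is_safe_to_serve(candidate, [x for x, _ in holdout]):
        return False
    return accuracy(candidate, holdout) >= accuracy(production, holdout) + min_gain


HOLDOUT = [(-1.0, 0), (1.0, 1), (2.0, 1), (-2.0, 0)]


def production_model(x):
    return 0.6  # always predicts the positive class


def candidate_model(x):
    return 1.0 if x > 0 else 0.0


def nan_model(x):
    return float("nan")
```

Checking safety first matters because a model that returns NaN would otherwise be scored as if every prediction were simply negative. This is one way a failure can stay silent.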
• •
• [1]
• ) DVC, Bitemporal Modeling
• [4]
• )
•
• [2,4]
• [4]
•
• [7]
• Google, Facebook [4, 9]
• /
• /
•
•
Researcher, Dev, Ops:
https://www.slideshare.net/syou6162/ss-88255142
• IoT
[8]
•
•
(GDPR)
• Data-driven product
•
•
•
• ML systems Production
•
•
•
•

• [1] “My model has higher BLEU, can I ship it? The Joel Test for machine learning systems”, L. Park, ACML-AIMLP Workshop, 2017
• [2] “Hidden Technical Debt in Machine Learning Systems”, D. Sculley et al., NIPS 2015
• [3] “Rules of Machine Learning: Best Practices for ML Engineering”, M. Zinkevich
• [4] “TFX: A TensorFlow-Based Production-Scale Machine Learning Platform”, D. Baylor et al., KDD 2017
• [5] “What’s your ML test score? A rubric for ML production systems”, E. Breck et al., Reliable Machine Learning in the Wild, NIPS 2016 Workshop
• [6] , 2017, ML Ops Study #1
• [7] , , 2018, HACKER TACKLE 2018
• [8] “DevOps for models: How to manage millions of models in production—and at the edge”, T. Tung et al., Strata Data Singapore, 2017
• [9] “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective”, K. Hazelwood et al., IEEE HPCA, 2018
THANK YOU

仕事ではじめる機械学習 (Machine Learning at Work)

  • 1. Aki Ariga | Field Data Scientist 2018.05.17
  • 2. © Cloudera, Inc. All rights reserved. ● Field Data Scientist at Cloudera ● Previously research engineer at Toshiba, Rails developer at Cookpad ● Co-author of “仕事ではじめる機械学習” ● Founder of kawasaki.rb & MLCT ● Twitter: @chezou ● GitHub: https://github.com/chezou/
  • 3. Hidden technical debt in machine learning systems [2] + Project procedure + Culture
  • 4. Building a data-driven product ≠ Research
  • 5. A journey toward a data-driven product: 1. 2. 3. A/B 4. A/B 5. 6. 7. Stages: Culture, BI, Statistics, ML. Source: http://tjo.hatenablog.com/entry/2016/01/18/080000
  • 6. Procedure in a machine learning project: 1. 2. 3. 4. 5. 6. 7. 8. Step.4 7
  • 7. Recommended project members for a typical ML project: • • • • / Web •
  • 8. What’s the difference between academia and industry for ML?
  • 9. Production (image: “Production” by Nick Youngson, CC BY-SA 3.0, Alpha Stock Images)
  • 10. Sample data science/machine learning workflow: from data to exploration to action. [Diagram. Phases: Data Engineering; Data Science (Exploratory); Production (Operational). Labels: Acquisition; Curation; Data Governance; Data Wrangling; Data Exploration; Model Training & Testing; Production Data Pipelines; Batch Scoring; Online Scoring; Serving; Reports, Dashboards; Data; Models; Predictions; Business value.]
  • 11. 1. 2. 3. Production MLOps
  • 12. 1. 2. 3. Production MLOps
  • 13. Four patterns: 1. Train by batch, predict on the fly, serve via REST API 2. Train by batch, predict by batch, serve through the shared DB 3. Train, predict, serve by streaming 4. Train by batch, predict on mobile app
  • 14. Pattern 1: Train by batch, predict on the fly, serve via REST API. [Diagram labels: Web Application; DB; Batch System; ML System; API Server; REST API; User ID/Item ID; Activity log/Contents data; Extract feature; Feature; Execute training; Training result; Trained Model; Prediction result.]
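Pattern 1 can be sketched with the Python standard library alone. This is a minimal illustration, not the architecture from the slide: `TRAINED_MODEL`, `FEATURES`, the `/predict` path, and the linear scoring rule are all hypothetical stand-ins for whatever the batch system actually produces. The point it shows is the split of responsibilities: training happened earlier in a batch job, and the API server only looks up features and scores a user ID/item ID pair at request time.

```python
import json
from urllib.parse import parse_qs

# Hypothetical output of the training batch (would normally be loaded from storage).
TRAINED_MODEL = {"bias": 0.1, "user_weight": 0.5, "item_weight": 0.25}

# Hypothetical features prepared by the batch feature-extraction job.
FEATURES = {
    "users": {"u1": 1.0, "u2": 0.0},
    "items": {"i1": 2.0, "i2": 4.0},
}


def predict(user_id, item_id):
    """Score one (user, item) pair on the fly with the pre-trained model."""
    user_feature = FEATURES["users"].get(user_id, 0.0)
    item_feature = FEATURES["items"].get(item_id, 0.0)
    return (
        TRAINED_MODEL["bias"]
        + TRAINED_MODEL["user_weight"] * user_feature
        + TRAINED_MODEL["item_weight"] * item_feature
    )


def app(environ, start_response):
    """WSGI app answering GET /predict?user_id=...&item_id=... with JSON."""
    if environ.get("PATH_INFO") != "/predict":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    query = parse_qs(environ.get("QUERY_STRING", ""))
    user_id = query.get("user_id", [""])[0]
    item_id = query.get("item_id", [""])[0]
    body = json.dumps(
        {"user_id": user_id, "item_id": item_id, "score": predict(user_id, item_id)}
    )
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body.encode("utf-8")]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`. Because the web application only depends on the HTTP contract, the model behind it can be retrained and swapped without touching the caller.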
  • 15. Example architecture: PMML + OpenScoring. [Diagram labels: Model building layer; Predicting & serving layer; CDSW; HDFS; Activity log; Extract feature & Train/update model; Export model as PMML; Updated model; Trained Model; Request to predict; Load model; Extract feature & Predict; Prediction results.]
  • 16. Example architecture: Docker based API Server. [Diagram labels: Model building layer; Predicting & serving layer; CDSW; HDFS; Object storage; Activity log; Extract feature & Train/update model; Save model on object storage; Pack the runtime env with Docker; Updated model; Trained Model; Request to predict; Load model; Extract feature & Predict; Prediction results.]
  • 17. Pattern 2: Train by batch, predict by batch, serve through the shared DB. [Diagram labels: Web Application; DB; Batch System; Training Batch; Prediction Batch; Activity log/Contents data; Extract feature; Feature; Execute training; Training result; Trained Model; Prediction result; Serve prediction.]
  • 18. Example architecture: Serving by HBase/Kudu. [Diagram labels: Model building & predicting layer; Serving layer; CDSW; HDFS; Kudu/HBase; Activity log; Historical data; Extract feature & Train/update model; Updated model; Trained Model; Load trained model; Extract feature & Predict; Prediction results.]
  • 19. Pattern 3: Train, predict, serve by streaming. [Diagram labels: Web Application; Message queue (e.g. Kafka); Stream-based ML System (e.g. Spark Streaming); Log data; Recent log data; Extract feature; Feature; Train & Predict; Model updates; Model; Trained Model; Prediction results.] Notes on the diagram: querying for prediction; showing or sending alerts; this component may work with a message queue like Kafka.
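The streaming pattern can be illustrated without Kafka or Spark Streaming. In the sketch below, `queue.Queue` stands in for the message queue and a small SGD-updated linear model stands in for the stream-based ML system; the event fields `value` and `label` and the `None` end-of-stream sentinel are hypothetical. It shows the defining loop of the pattern: each incoming event is scored with the current model, then immediately used to update it.

```python
import queue


class OnlineLinearModel:
    """Linear regressor updated one event at a time with SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, target):
        # Gradient step on squared error for a single example.
        error = self.predict(features) - target
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]


def extract_features(event):
    """Hypothetical feature extraction: a bias term plus one numeric field."""
    return [1.0, float(event["value"])]


def run_stream(log_queue, result_queue, model):
    """Consume events until a None sentinel: predict first, then train."""
    while True:
        event = log_queue.get()
        if event is None:
            break
        features = extract_features(event)
        result_queue.put(model.predict(features))
        model.update(features, event["label"])
```

Predicting before updating keeps the published score honest, since the model has not yet seen the label for that event. In a real deployment the consumer would also need checkpointing and replay handling, which is part of why the comparison slide rates this pattern as the hardest to manage.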
  • 20. Pattern 4: Train by batch, predict on a mobile app. [Diagram labels: Mobile Application; DB; Batch System; Activity log/Contents data; Extract feature; Feature; Execute training; Training result; Trained Model; Convert model; Request for prediction; Prediction result.]
  • 21. Example architecture: Serving on a mobile app. [Diagram labels: Model building layer; Predicting & serving layer; CDSW; HDFS; Storage in a smart phone; Activity log; Extract feature & Train/update model; Convert model to TFLite/CoreML; Updated model; Trained Model; Request to predict; Load model; Extract feature & Predict; Prediction results.]
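The "convert model, ship it, load it on the device" flow can be sketched in plain Python. This deliberately does not use the TFLite or CoreML converters named on the slide: a small versioned JSON file stands in for the converted model, and `train_on_server`, `convert_model`, and `OnDeviceModel` are hypothetical names. The sketch shows why this pattern couples tightly to the app: the app embeds the loader and must understand the exported format.

```python
import json


def train_on_server(samples):
    """Batch training stand-in: fit y = slope * x + intercept by least squares."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def convert_model(model, path):
    """Stand-in for model conversion: write a small, versioned, portable file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"format_version": 1, "params": model}, f)


class OnDeviceModel:
    """What the app would embed: load the file once, predict with no network call."""

    def __init__(self, path):
        with open(path, encoding="utf-8") as f:
            payload = json.load(f)
        if payload["format_version"] != 1:
            raise ValueError("unsupported model format")
        self.params = payload["params"]

    def predict(self, x):
        return self.params["slope"] * x + self.params["intercept"]
```

The explicit `format_version` check reflects a practical concern with this pattern: old app versions stay installed for a long time, so the exported format has to be versioned and the loader has to reject files it cannot interpret.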
  • 22. Pattern 4’: Federated learning (https://research.googleblog.com/2017/04/federated-learning-collaborative.html)
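The slide only links to the federated learning announcement, so the following is a toy sketch of the general idea rather than anything taken from the deck: each device improves the shared model on its own data, and the server averages the resulting weights in proportion to how much data each device holds (the scheme usually called federated averaging). The linear model, learning rate, and epoch count are assumptions chosen to keep the example small.

```python
def local_update(weights, data, learning_rate=0.1, epochs=5):
    """Run a few SGD epochs on one device's private data; return new weights."""
    w = list(weights)
    for _ in range(epochs):
        for features, target in data:
            error = sum(wi * xi for wi, xi in zip(w, features)) - target
            w = [wi - learning_rate * error * xi for wi, xi in zip(w, features)]
    return w


def federated_average(global_weights, client_datasets):
    """One round: clients train locally, the server averages weighted by data size."""
    total = sum(len(data) for data in client_datasets)
    new_weights = [0.0] * len(global_weights)
    for data in client_datasets:
        local = local_update(global_weights, data)
        for i, value in enumerate(local):
            new_weights[i] += value * len(data) / total
    return new_weights
```

Only weights cross the network here; the raw `(features, target)` pairs never leave `local_update`, which is the property that makes the approach attractive for on-device data.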
  • 23. Comparison of the 4 patterns (NRT: near real time)
                                           | Pattern 1 (REST API) | Pattern 2 (Shared DB)       | Pattern 3 (Streaming)     | Pattern 4 (Mobile app)
      Training                             | by batch             | by batch                    | NRT (by streaming)        | by batch
      Prediction                           | NRT (on the fly)     | by batch                    | NRT (by streaming)        | NRT (on the fly)
      Prediction result delivery           | NRT (via REST API)   | NRT (through the shared DB) | NRT (by streaming via MQ) | NRT (via in-process API on mobile)
      Latency from new data to prediction  | Moderate             | Moderate to long            | Very low                  | Low
      Required time to predict             | Short                | Long                        | Short                     | Short
      Coupling with app                    | Loose                | Loose                       | Loose                     | Tight
      Language dependency                  | Independent          | Independent                 | Independent               | Depends on frameworks
      System management difficulty         | Moderate             | Easy                        | Very hard                 | Moderate
  • 24. CI, CD and Blue Green deployment (https://www.slideshare.net/hiroakikudo77/ss-84593653/14)
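Applied to model serving, blue-green deployment means keeping two model slots and switching traffic between them in one step. The sketch below is an assumption about how that could look in-process; `BlueGreenRouter` and its methods are hypothetical, and models are represented as plain callables. It shows staging a candidate next to the live model, smoke-testing it before the switch, and keeping the previous model around for rollback.

```python
import threading


class BlueGreenRouter:
    """Routes predictions to a live model; a staged model can be swapped in."""

    def __init__(self, live_model):
        self._lock = threading.Lock()
        self._live = live_model
        self._idle = None

    def predict(self, features):
        with self._lock:
            model = self._live
        return model(features)

    def stage(self, candidate):
        """Place a new model in the idle slot without sending it traffic."""
        with self._lock:
            self._idle = candidate

    def switch(self, smoke_inputs):
        """Smoke-test the staged model, then swap it with the live one."""
        with self._lock:
            candidate = self._idle
        if candidate is None:
            raise RuntimeError("no staged model")
        for features in smoke_inputs:
            candidate(features)  # any exception aborts the switch
        with self._lock:
            self._live, self._idle = candidate, self._live

    def rollback(self):
        """Swap back to the previously live model."""
        with self._lock:
            if self._idle is None:
                raise RuntimeError("no previous model")
            self._live, self._idle = self._idle, self._live
```

Because the old model stays in the idle slot after a switch, rollback is as cheap as the deployment itself, which matters when a quality regression only shows up under real traffic.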
  • 25. 1. 2. 3. Production MLOps
  • 26. • /Feedback loop • •
  • 27. • • ) MeCab • • ) • • • /Feedback loop https://twitter.com/hagino3000/status/986257856730034177
  • 28. • • “safe to serve” & “desired prediction quality” [4] • (offline) (online) • “Silent failures” [3] • ) Join • ) • • • • serving
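The two quoted criteria, “safe to serve” and “desired prediction quality”, can be turned into a small offline gate that runs before a model is promoted. This is a sketch under assumptions: the slide's own bullet text did not survive, so the concrete checks here (finite numeric outputs, accuracy no worse than the current model within a tolerance) and the names `is_safe_to_serve` and `should_deploy` are illustrative choices, not the deck's or TFX's definitions.

```python
import math


def is_safe_to_serve(model, sample_inputs):
    """True if the model returns a finite number for every sample input."""
    for features in sample_inputs:
        try:
            score = model(features)
        except Exception:
            return False  # a crash at serving time is the worst outcome
        if not isinstance(score, (int, float)) or not math.isfinite(score):
            return False
    return True


def accuracy(model, labeled_examples, threshold=0.5):
    """Share of examples where the thresholded score matches the label."""
    correct = sum(
        (model(x) >= threshold) == bool(y) for x, y in labeled_examples
    )
    return correct / len(labeled_examples)


def should_deploy(candidate, baseline, labeled_examples, tolerance=0.01):
    """Gate: the candidate must be safe and not clearly worse than the baseline."""
    inputs = [x for x, _ in labeled_examples]
    if not is_safe_to_serve(candidate, inputs):
        return False
    return accuracy(candidate, labeled_examples) >= (
        accuracy(baseline, labeled_examples) - tolerance
    )
```

A gate like this catches loud failures. Silent failures, such as a stale joined table quietly degrading a feature, need online monitoring of inputs and predictions as well, since the offline evaluation set will not reflect them.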
  • 29. • • • [1] • ) DVC, Bitemporal Modeling • [4] • ) • • [2,4] • [4]
  • 30. 1. 2. 3. Production MLOps
  • 31. • • [7] • Google, Facebook [4, 9] • / • / • • Researcher, Dev, Ops: https://www.slideshare.net/syou6162/ss-88255142
  • 32. • IoT [8] • • (GDPR)
  • 33. • Data-driven product • • • • ML systems Production • • • •
  • 34. References
      [1] “My model has higher BLEU, can I ship it? The Joel Test for machine learning systems”, L. Park, ACML-AIMLP Workshop, 2017
      [2] “Hidden Technical Debt in Machine Learning Systems”, D. Sculley et al., NIPS 2015
      [3] “Rules of Machine Learning: Best Practices for ML Engineering”, M. Zinkevich
      [4] “TFX: A TensorFlow-Based Production-Scale Machine Learning Platform”, D. Baylor et al., KDD 2017
      [5] “What’s your ML test score? A rubric for ML production systems”, E. Breck et al., Reliable Machine Learning in the Wild, NIPS 2016 Workshop
      [6] , ML Ops Study #1, 2017
      [7] , , HACKER TACKLE 2018, 2018
      [8] “DevOps for models: How to manage millions of models in production—and at the edge”, T. Tung et al., Strata Data Singapore, 2017
      [9] “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective”, K. Hazelwood et al., IEEE HPCA, 2018