SlideShare a Scribd company logo
1 of 33
Download to read offline
Machine Learning in
Production with scikit-learn
Jeff Klukas - Data Engineer at Simple
1
2
3
• What’s the problem we’re solving?
• Why machine learning?
• Walkthrough of developing the model
• ✨ Live demo ✨
• Complications of moving this workflow to
production
• Other potential approaches
Overview
4
5
Categorizing chats
# SELECT subject, body, category FROM chats;
subject | body | category
--------------+---------------------------+----------------
Check deposit | Hi how are you? I was… | education
Lost Card | Can you send me a new… | urgent
my transfer | My transfer of $10 isn’t… | education
Mail deposits | I have a large check… | education
urgent, customer education, new product, incidents, other
6
7
8
✨
✨
✨
✨ ✨
✨
✨
✨
💖
💖
💖
Machine
Learning
✨
💖
✨
9
10
11
sklearn.pipeline
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import (
CountVectorizer, TfidfTransformer)
from xgboost import XGBClassifier
stopwords, lemmatizer = …
pipeline = Pipeline([
('preprocess', MessagePreprocessor(subject_weight=2)),
('text', TextProcessor(stopwords, lemmatizer)),
('vect', CountVectorizer()),
('tfidf', TfidfTransformer()),
('clf', XGBClassifier(objective='multi:softmax')),
])
12
Training the model
import pandas as pd
data_frame = pd.read_sql(redshift_connection,
"SELECT category, subject, body FROM chats;")
X = data_frame[['subject', 'body']]
y = data_frame['category']
X_train, X_test, y_train, y_test = 
train_test_split(X, y, test_size=0.33, random_state=0)
pipeline.fit(X_train, y_train)
13
Overfitting
https://en.wikipedia.org/wiki/Overfitting
14
Testing the model
from sklearn.metrics import classification_report
y_predicted = pipeline.predict(X_test)
print(classification_report(y_test, y_predicted))
precision recall f1-score support
class 0 0.67 1.00 0.80 2
class 1 0.00 0.00 0.00 1
class 2 1.00 0.50 0.67 2
avg / total 0.67 0.60 0.59 5
15
Serving the model in Flask
from flask import route, jsonify, request
@route('/chat-classification-api/messages',
methods=['POST'])
def classify_messages():
"""Classify given chat messages"""
messages = request.get_json()
y = pipeline.predict(messages)
# join class labels back with identifiers
predictions = [{"chat_id": message["chat_id"],
"class_label": label}
for message, label in zip(messages, y)]
return jsonify(predictions)
16
Live Demo
17
How do we take this to production?
18
How do we take this to production?
Step 1
Separate training and
serving
19
20
Model Persistence
import pickle
import boto3
def write_to_s3(pipeline, key, bucket):
s3_client = boto3.client("s3")
kms_client = boto3.client("kms")
pkl = pickle.dumps(pipeline)
enc_pkl = my_encrypt_function(pkl, kms_client)
s3_client.put_object(Bucket=s3_bucket,
Key=key,
Body=enc_pkl,
ServerSideEncryption="AES256")
21
Model Persistence
import pickle
import boto3
from flask import current_app
def load_message_classifier(app):
conf = app.config["MESSAGE_CLASSIFIER"]
s3_client = boto3.client("s3")
kms_client = boto3.client("kms")
resp = s3_client.get_object(Bucket=conf[“bucket"],
Key=conf["path"])
untrusted_bytes = resp["Body"].read()
pkl = decrypt(untrusted_bytes, kms_client)
with app.app_context():
current_app._message_classifier = pickle.loads(pkl)
Step 2
Provide an environment for
batch training and evaluation
22
23
Optimizing Parameter Values
from sklearn.model_selection import GridSearchCV
params = {
'preprocess__subject_weight': (1, 2, 3, 4, 5),
'text__stopwords': ([], IGNORE, PUNCTUATION),
'vect__max_df': (0.5, 0.75, 1.0),
'vect__ngram_range': ((1, 1), (1, 2)),
'tfidf__use_idf': (True, False),
'tfidf__norm': ('l1', 'l2'),
}
search = GridSearchCV(pipeline, params)
search.fit(X_train, y_train)
Step 3
Monitor performance,
adapt to production load,
degrade gracefully
24
Other Approaches
25
26
• How big is your team?
• How large of a problem space do you need to
cover?
• What is your existing stack?
Considerations
27
Off-the-Shelf
28
Off-the-Shelf
29
Off-the-Shelf
30
• Train and test in a batch environment
• Output serialized model and classification report
• sklearn.pipeline is convenient for storing
code+params
• Serve on-demand predictions separately
• Treat this like any production service
Recap
Thank You
31
32
Questions ✨ 💖
Machine Learning in
Production with scikit-learn
Jeff Klukas - Data Engineer at Simple
33

More Related Content

What's hot

Introduction To Generative Adversarial Networks GANs
Introduction To Generative Adversarial Networks GANsIntroduction To Generative Adversarial Networks GANs
Introduction To Generative Adversarial Networks GANsHichem Felouat
 
Tensorflow - Intro (2017)
Tensorflow - Intro (2017)Tensorflow - Intro (2017)
Tensorflow - Intro (2017)Alessio Tonioni
 
PyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersPyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersBayu Aldi Yansyah
 
Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0Databricks
 
Deep learning with TensorFlow
Deep learning with TensorFlowDeep learning with TensorFlow
Deep learning with TensorFlowBarbara Fusinska
 
16. Java stacks and queues
16. Java stacks and queues16. Java stacks and queues
16. Java stacks and queuesIntro C# Book
 
Gradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboostGradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboostJaroslaw Szymczak
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance SegmentationHichem Felouat
 
Property-based Testing and Generators (Lua)
Property-based Testing and Generators (Lua)Property-based Testing and Generators (Lua)
Property-based Testing and Generators (Lua)Sumant Tambe
 
Tensor flow (1)
Tensor flow (1)Tensor flow (1)
Tensor flow (1)景逸 王
 
Java programming lab manual
Java programming lab manualJava programming lab manual
Java programming lab manualsameer farooq
 
Functional Programming Patterns (BuildStuff '14)
Functional Programming Patterns (BuildStuff '14)Functional Programming Patterns (BuildStuff '14)
Functional Programming Patterns (BuildStuff '14)Scott Wlaschin
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Hichem Felouat
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfPolytechnique Montréal
 
Introducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlowIntroducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlowEtsuji Nakai
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data ScienceAlbert Bifet
 
NUS-ISS Learning Day 2019-Deploying AI apps using tensor flow lite in mobile ...
NUS-ISS Learning Day 2019-Deploying AI apps using tensor flow lite in mobile ...NUS-ISS Learning Day 2019-Deploying AI apps using tensor flow lite in mobile ...
NUS-ISS Learning Day 2019-Deploying AI apps using tensor flow lite in mobile ...NUS-ISS
 

What's hot (20)

Introduction To Generative Adversarial Networks GANs
Introduction To Generative Adversarial Networks GANsIntroduction To Generative Adversarial Networks GANs
Introduction To Generative Adversarial Networks GANs
 
Tensorflow - Intro (2017)
Tensorflow - Intro (2017)Tensorflow - Intro (2017)
Tensorflow - Intro (2017)
 
PyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersPyTorch for Deep Learning Practitioners
PyTorch for Deep Learning Practitioners
 
Profiling in Python
Profiling in PythonProfiling in Python
Profiling in Python
 
Task and Data Parallelism
Task and Data ParallelismTask and Data Parallelism
Task and Data Parallelism
 
Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0Introduction to TensorFlow 2.0
Introduction to TensorFlow 2.0
 
Deep learning with TensorFlow
Deep learning with TensorFlowDeep learning with TensorFlow
Deep learning with TensorFlow
 
16. Java stacks and queues
16. Java stacks and queues16. Java stacks and queues
16. Java stacks and queues
 
Gradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboostGradient boosting in practice: a deep dive into xgboost
Gradient boosting in practice: a deep dive into xgboost
 
Deep Learning in theano
Deep Learning in theanoDeep Learning in theano
Deep Learning in theano
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance Segmentation
 
Property-based Testing and Generators (Lua)
Property-based Testing and Generators (Lua)Property-based Testing and Generators (Lua)
Property-based Testing and Generators (Lua)
 
Tensor flow (1)
Tensor flow (1)Tensor flow (1)
Tensor flow (1)
 
Java programming lab manual
Java programming lab manualJava programming lab manual
Java programming lab manual
 
Functional Programming Patterns (BuildStuff '14)
Functional Programming Patterns (BuildStuff '14)Functional Programming Patterns (BuildStuff '14)
Functional Programming Patterns (BuildStuff '14)
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdf
 
Introducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlowIntroducton to Convolutional Nerural Network with TensorFlow
Introducton to Convolutional Nerural Network with TensorFlow
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
NUS-ISS Learning Day 2019-Deploying AI apps using tensor flow lite in mobile ...
NUS-ISS Learning Day 2019-Deploying AI apps using tensor flow lite in mobile ...NUS-ISS Learning Day 2019-Deploying AI apps using tensor flow lite in mobile ...
NUS-ISS Learning Day 2019-Deploying AI apps using tensor flow lite in mobile ...
 

Viewers also liked

Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataRobert Dempsey
 
Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructurejoshwills
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsTuri, Inc.
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningStepan Pushkarev
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonSimon Frid
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operationsStepan Pushkarev
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanHakka Labs
 
Building A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineBuilding A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineRobert Dempsey
 
Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...PyData
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CaptureJeff Klukas
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In ProductionSamir Bessalah
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelinesjeykottalam
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architectureStepan Pushkarev
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017Carol Smith
 

Viewers also liked (16)

Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of Data
 
Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructure
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learning
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in Python
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong Yan
 
Building A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineBuilding A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning Pipeline
 
Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...Python as part of a production machine learning stack by Michael Manapat PyDa...
Python as part of a production machine learning stack by Michael Manapat PyDa...
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
 

Similar to Machine learning in production with scikit-learn

Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonMicrosoft
 
關於測試,我說的其實是......
關於測試,我說的其實是......關於測試,我說的其實是......
關於測試,我說的其實是......hugo lu
 
Breaking Dependencies Legacy Code - Cork Software Crafters - September 2019
Breaking Dependencies Legacy Code -  Cork Software Crafters - September 2019Breaking Dependencies Legacy Code -  Cork Software Crafters - September 2019
Breaking Dependencies Legacy Code - Cork Software Crafters - September 2019Paulo Clavijo
 
pytest로 파이썬 코드 테스트하기
pytest로 파이썬 코드 테스트하기pytest로 파이썬 코드 테스트하기
pytest로 파이썬 코드 테스트하기Yeongseon Choe
 
Yufeng Guo | Coding the 7 steps of machine learning | Codemotion Madrid 2018
Yufeng Guo |  Coding the 7 steps of machine learning | Codemotion Madrid 2018 Yufeng Guo |  Coding the 7 steps of machine learning | Codemotion Madrid 2018
Yufeng Guo | Coding the 7 steps of machine learning | Codemotion Madrid 2018 Codemotion
 
Test-Driven Design Insights@DevoxxBE 2023.pptx
Test-Driven Design Insights@DevoxxBE 2023.pptxTest-Driven Design Insights@DevoxxBE 2023.pptx
Test-Driven Design Insights@DevoxxBE 2023.pptxVictor Rentea
 
Python Summit 2022: Never Write Scripts Again
Python Summit 2022: Never Write Scripts AgainPython Summit 2022: Never Write Scripts Again
Python Summit 2022: Never Write Scripts AgainPeter Bittner
 
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for Education
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for EducationPythia Reloaded: An Intelligent Unit Testing-Based Code Grader for Education
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for EducationECAM Brussels Engineering School
 
Expanding the idea of static analysis from code check to other development pr...
Expanding the idea of static analysis from code check to other development pr...Expanding the idea of static analysis from code check to other development pr...
Expanding the idea of static analysis from code check to other development pr...Andrey Karpov
 
Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]AAKANKSHA JAIN
 
Simplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseSimplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseFeatureByte
 
Tips for Building your First XPages Java Application
Tips for Building your First XPages Java ApplicationTips for Building your First XPages Java Application
Tips for Building your First XPages Java ApplicationTeamstudio
 
Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019Edwin Jung
 
Kotlin / Android Update
Kotlin / Android UpdateKotlin / Android Update
Kotlin / Android UpdateGarth Gilmour
 
Kotlin: Why Do You Care?
Kotlin: Why Do You Care?Kotlin: Why Do You Care?
Kotlin: Why Do You Care?intelliyole
 
Static code analysis: what? how? why?
Static code analysis: what? how? why?Static code analysis: what? how? why?
Static code analysis: what? how? why?Andrey Karpov
 

Similar to Machine learning in production with scikit-learn (20)

Scikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in PythonScikit-Learn: Machine Learning in Python
Scikit-Learn: Machine Learning in Python
 
關於測試,我說的其實是......
關於測試,我說的其實是......關於測試,我說的其實是......
關於測試,我說的其實是......
 
Breaking Dependencies Legacy Code - Cork Software Crafters - September 2019
Breaking Dependencies Legacy Code -  Cork Software Crafters - September 2019Breaking Dependencies Legacy Code -  Cork Software Crafters - September 2019
Breaking Dependencies Legacy Code - Cork Software Crafters - September 2019
 
Database connectivity in python
Database connectivity in pythonDatabase connectivity in python
Database connectivity in python
 
pytest로 파이썬 코드 테스트하기
pytest로 파이썬 코드 테스트하기pytest로 파이썬 코드 테스트하기
pytest로 파이썬 코드 테스트하기
 
Yufeng Guo | Coding the 7 steps of machine learning | Codemotion Madrid 2018
Yufeng Guo |  Coding the 7 steps of machine learning | Codemotion Madrid 2018 Yufeng Guo |  Coding the 7 steps of machine learning | Codemotion Madrid 2018
Yufeng Guo | Coding the 7 steps of machine learning | Codemotion Madrid 2018
 
Test-Driven Design Insights@DevoxxBE 2023.pptx
Test-Driven Design Insights@DevoxxBE 2023.pptxTest-Driven Design Insights@DevoxxBE 2023.pptx
Test-Driven Design Insights@DevoxxBE 2023.pptx
 
Python Summit 2022: Never Write Scripts Again
Python Summit 2022: Never Write Scripts AgainPython Summit 2022: Never Write Scripts Again
Python Summit 2022: Never Write Scripts Again
 
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for Education
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for EducationPythia Reloaded: An Intelligent Unit Testing-Based Code Grader for Education
Pythia Reloaded: An Intelligent Unit Testing-Based Code Grader for Education
 
Expanding the idea of static analysis from code check to other development pr...
Expanding the idea of static analysis from code check to other development pr...Expanding the idea of static analysis from code check to other development pr...
Expanding the idea of static analysis from code check to other development pr...
 
Data herding
Data herdingData herding
Data herding
 
Data herding
Data herdingData herding
Data herding
 
Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]
 
Simplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseSimplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data Warehouse
 
Tips for Building your First XPages Java Application
Tips for Building your First XPages Java ApplicationTips for Building your First XPages Java Application
Tips for Building your First XPages Java Application
 
Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019Mock Hell PyCon DE and PyData Berlin 2019
Mock Hell PyCon DE and PyData Berlin 2019
 
Kotlin / Android Update
Kotlin / Android UpdateKotlin / Android Update
Kotlin / Android Update
 
Kotlin: Why Do You Care?
Kotlin: Why Do You Care?Kotlin: Why Do You Care?
Kotlin: Why Do You Care?
 
Static code analysis: what? how? why?
Static code analysis: what? how? why?Static code analysis: what? how? why?
Static code analysis: what? how? why?
 
Machine Learning in SQL Server 2019
Machine Learning in SQL Server 2019Machine Learning in SQL Server 2019
Machine Learning in SQL Server 2019
 

Recently uploaded

How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 

Recently uploaded (20)

How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 

Machine learning in production with scikit-learn

  • 1. Machine Learning in Production with scikit-learn Jeff Klukas - Data Engineer at Simple 1
  • 2. 2
  • 3. 3 • What’s the problem we’re solving? • Why machine learning? • Walkthrough of developing the model • ✨ Live demo ✨ • Complications of moving this workflow to production • Other potential approaches Overview
  • 4. 4
  • 5. 5 Categorizing chats # SELECT subject, body, category FROM chats; subject | body | category --------------+---------------------------+---------------- Check deposit | Hi how are you? I was… | education Lost Card | Can you send me a new… | urgent my transfer | My transfer of $10 isn’t… | education Mail deposits | I have a large check… | education urgent, customer education, new product, incidents, other
  • 6. 6
  • 7. 7
  • 9. 9
  • 10. 10
  • 11. 11 sklearn.pipeline from sklearn.pipeline import Pipeline from sklearn.feature_extraction.text import ( CountVectorizer, TfidfTransformer) from xgboost import XGBClassifier stopwords, lemmatizer = … pipeline = Pipeline([ ('preprocess', MessagePreprocessor(subject_weight=2)), ('text', TextProcessor(stopwords, lemmatizer)), ('vect', CountVectorizer()), ('tfidf', TfidfTransformer()), ('clf', XGBClassifier(objective='multi:softmax')), ])
  • 12. 12 Training the model import pandas as pd data_frame = pd.read_sql(redshift_connection, "SELECT category, subject, body FROM chats;") X = data_frame[['subject', 'body']] y = data_frame['category'] X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0) pipeline.fit(X_train, y_train)
  • 14. 14 Testing the model from sklearn.metrics import classification_report y_predicted = pipeline.predict(X_test) print(classification_report(y_test, y_predicted)) precision recall f1-score support class 0 0.67 1.00 0.80 2 class 1 0.00 0.00 0.00 1 class 2 1.00 0.50 0.67 2 avg / total 0.67 0.60 0.59 5
  • 15. 15 Serving the model in Flask from flask import route, jsonify, request @route('/chat-classification-api/messages', methods=['POST']) def classify_messages(): """Classify given chat messages""" messages = request.get_json() y = pipeline.predict(messages) # join class labels back with identifiers predictions = [{"chat_id": message["chat_id"], "class_label": label} for message, label in zip(messages, y)] return jsonify(predictions)
  • 17. 17 How do we take this to production?
  • 18. 18 How do we take this to production?
  • 19. Step 1 Separate training and serving 19
  • 20. 20 Model Persistence import pickle import boto3 def write_to_s3(pipeline, key, bucket): s3_client = boto3.client("s3") kms_client = boto3.client("kms") pkl = pickle.dumps(pipeline) enc_pkl = my_encrypt_function(pkl, kms_client) s3_client.put_object(Bucket=s3_bucket, Key=key, Body=enc_pkl, ServerSideEncryption="AES256")
  • 21. 21 Model Persistence import pickle import boto3 from flask import current_app def load_message_classifier(app): conf = app.config["MESSAGE_CLASSIFIER"] s3_client = boto3.client("s3") kms_client = boto3.client("kms") resp = s3_client.get_object(Bucket=conf[“bucket"], Key=conf["path"]) untrusted_bytes = resp["Body"].read() pkl = decrypt(untrusted_bytes, kms_client) with app.app_context(): current_app._message_classifier = pickle.loads(pkl)
  • 22. Step 2 Provide an environment for batch training and evaluation 22
  • 23. 23 Optimizing Parameter Values from sklearn.model_selection import GridSearchCV params = { 'preprocess__subject_weight': (1, 2, 3, 4, 5), 'text__stopwords': ([], IGNORE, PUNCTUATION), 'vect__max_df': (0.5, 0.75, 1.0), 'vect__ngram_range': ((1, 1), (1, 2)), 'tfidf__use_idf': (True, False), 'tfidf__norm': ('l1', 'l2'), } search = GridSearchCV(pipeline, params) search.fit(X_train, y_train)
  • 24. Step 3 Monitor performance, adapt to production load, degrade gracefully 24
  • 26. 26 • How big is your team? • How large of a problem space do you need to cover? • What is your existing stack? Considerations
  • 30. 30 • Train and test in a batch environment • Output serialized model and classification report • sklearn.pipeline is convenient for storing code+params • Serve on-demand predictions separately • Treat this like any production service Recap
  • 33. Machine Learning in Production with scikit-learn Jeff Klukas - Data Engineer at Simple 33