SlideShare a Scribd company logo
1 of 23
Download to read offline
Python	
  as	
  part	
  of	
  a	
  produc0on	
  
machine	
  learning	
  stack	
  
	
  
	
  
	
  
Michael	
  Manapat	
  
@mlmanapat	
  
Stripe	
  
	
  
Outline	
  
	
  
-­‐Why	
  we	
  need	
  ML	
  at	
  Stripe	
  
-­‐Simple	
  models	
  with	
  sklearn	
  
-­‐Pipelines	
  with	
  Luigi	
  
-­‐Scoring	
  as	
  a	
  service	
  
	
  
Stripe	
  is	
  a	
  technology	
  company	
  
focusing	
  on	
  making	
  payments	
  easy	
  
	
  
-­‐Short	
  applica>on	
  
	
  
Tokeniza0on	
  
	
  
	
   Customer	
  
browser	
  
Stripe	
  
Stripe.js	
  
Token	
  
Merchant	
  
server	
  
Stripe	
  
API	
  call	
  
Result	
  
API	
  Call	
  
	
  
import stripe

stripe.Charge.create(

amount=400,

currency="usd",

card="tok_103xnl2gR5VxTSB”

email=customer@example.com"

)"
Fraud	
  /	
  business	
  viola0ons	
  
	
  
-­‐Terms	
  of	
  service	
  viola>ons	
  (weapons)	
  
-­‐Merchant	
  fraud	
  (card	
  “cashers”)	
  	
  	
  
-­‐Transac>on	
  fraud	
  
	
  
-­‐No	
  machine	
  learning	
  a	
  year	
  ago	
  
Fraud	
  /	
  business	
  viola0ons	
  
	
  
-­‐Terms	
  of	
  service	
  viola>ons	
  
	
  
E-­‐cigareMes,	
  drugs,	
  weapons,	
  etc.	
  
	
  
How	
  do	
  we	
  find	
  these	
  automa>cally?	
  
Merchant	
  sign	
  up	
  flow	
  
	
  
	
  
	
  
	
  
Applica>on	
  
submission	
  
Website	
  
scraped	
  
Text	
  scored	
  
Applica>on	
  
reviewed	
  
Merchant	
  sign	
  up	
  flow	
  
	
  
	
  
	
  
	
  
Applica>on	
  
submission	
  
Website	
  
scraped	
  
Text	
  scored	
  
Applica>on	
  
reviewed	
  
Machine	
  
learning	
  
pipeline	
  and	
  
service	
  
Building	
  a	
  classifier:	
  e-­‐cigareIes	
  
	
  
data = pandas.from_pickle(‘ecigs’)

data.head()



text violator

0 " please verify your age i am 21 years or older ... True

1 coming soon toggle me drag me with your mouse ... False

2 drink moscow mules cart 0 log in or create an ... False

3 vapors electronic cigarette buy now insuper st... True

4 t-shirts shorts hawaii about us silver coll... False



[5 rows x 2 columns]	
  
Features	
  for	
  text	
  classifica0on	
  
	
  
cv = CountVectorizer



features = 

cv.fit_transform(data['text'])



Sparse	
  matrix	
  of	
  word	
  counts	
  from	
  
input	
  text	
  (omiSng	
  feature	
  selec>on)	
  
Features	
  for	
  text	
  classifica0on	
  


X_train, X_test, y_train, y_test = 

train_test_split(

features, data['violator'], 

test_size=0.2)



-­‐Avoid	
  leakage	
  
-Other	
  cross-­‐valida>on	
  methods	
  
Training	
  


model = LogisticRegression()

model.fit(X_train, y_train)



Serializer	
  reads	
  from	
  


model.intercept_

model.coef_

	
  
Valida0on	
  


probs = model.predict_proba(X_test)



fpr, tpr, thresholds =

roc_curve(y_test, probs[:, 1])



matplotlib.pyplot(fpr, tpr)	
  
ROC:	
  Receiver	
  opera0ng	
  characteris0c	
  




	
  
Pipeline	
  
	
  
-­‐Fetch	
  website	
  snapshots	
  from	
  S3	
  
-­‐Fetch	
  classifica>ons	
  from	
  SQL/Impala	
  
-­‐Sani>ze	
  text	
  (strip	
  HTML)	
  
-­‐Run	
  feature	
  genera>on	
  and	
  selec>on	
  
-­‐Train	
  and	
  serialize	
  model	
  
-­‐Export	
  valida>on	
  sta>s>cs	
  
Luigi	
  
	
  
class GetSnapshots(luigi.Task):

def run(self):

" "...



class GenFeatures(luigi.Task):

def requires(self):

return GetSnapshots()"
Luigi	
  runs	
  tasks	
  on	
  Hadoop	
  cluster	
  
"
Scoring	
  as	
  a	
  service	
  
	
  
"Applica>on	
  
submission	
  
Website	
  
scraped	
  
Text	
  scored	
  
Applica>on	
  
reviewed	
  
ThriO	
  
RPC	
  
Scoring	
  
Service	
  
Scoring	
  as	
  a	
  service	
  
	
  
struct ScoringRequest {

1: string text

2: optional string model_name

}



struct ScoringResponse {

1: double score" " "// Experiments?

2: double request_duration

}"
Why	
  a	
  service?	
  
	
  
-­‐Same	
  code	
  base	
  for	
  training/scoring	
  
	
  
-­‐Reduced	
  duplica>on/easier	
  deploys	
  
	
  
-­‐Experimenta>on	
  
	
  
-­‐Log	
  requests	
  
	
  and	
  responses	
  
	
  (Parquet/Impala)	
  
	
  
-­‐Centralized	
  
	
  monitoring	
  
	
  (Graphite)	
  
Summary	
  
	
  
-­‐Simple	
  models	
  with	
  sklearn	
  
-­‐Pipelines	
  with	
  Luigi	
  
-­‐Scoring	
  as	
  a	
  service	
  
	
  
Thanks!	
  
@mlmanapat	
  
	
  

More Related Content

What's hot

How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseMo Patel
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkDatabricks
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Introducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itIntroducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itBruno Faria
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceDatabricks
 
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementBurasakorn Sabyeying
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Amy W. Tang
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDatabricks
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesTaro L. Saito
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Vinoth Chandar
 

What's hot (20)

Apache airflow
Apache airflowApache airflow
Apache airflow
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Zipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering FrameworkZipline—Airbnb’s Declarative Feature Engineering Framework
Zipline—Airbnb’s Declarative Feature Engineering Framework
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Airflow Intro-1.pdf
Airflow Intro-1.pdfAirflow Intro-1.pdf
Airflow Intro-1.pdf
 
Introducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using itIntroducing Apache Airflow and how we are using it
Introducing Apache Airflow and how we are using it
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle Service
 
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow managementIntro to Airflow: Goodbye Cron, Welcome scheduled workflow management
Intro to Airflow: Goodbye Cron, Welcome scheduled workflow management
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
 
Airflow presentation
Airflow presentationAirflow presentation
Airflow presentation
 
Airflow introduction
Airflow introductionAirflow introduction
Airflow introduction
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Presto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 UpdatesPresto At Arm Treasure Data - 2019 Updates
Presto At Arm Treasure Data - 2019 Updates
 
Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017Hoodie - DataEngConf 2017
Hoodie - DataEngConf 2017
 

Viewers also liked

Machine learning in production with scikit-learn
Machine learning in production with scikit-learnMachine learning in production with scikit-learn
Machine learning in production with scikit-learnJeff Klukas
 
Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructurejoshwills
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataRobert Dempsey
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsTuri, Inc.
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningStepan Pushkarev
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operationsStepan Pushkarev
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonSimon Frid
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanHakka Labs
 
Building A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineBuilding A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineRobert Dempsey
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CaptureJeff Klukas
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In ProductionSamir Bessalah
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelinesjeykottalam
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architectureStepan Pushkarev
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017Carol Smith
 

Viewers also liked (16)

Machine learning in production with scikit-learn
Machine learning in production with scikit-learnMachine learning in production with scikit-learn
Machine learning in production with scikit-learn
 
Production machine learning_infrastructure
Production machine learning_infrastructureProduction machine learning_infrastructure
Production machine learning_infrastructure
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of Data
 
Production and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning ModelsProduction and Beyond: Deploying and Managing Machine Learning Models
Production and Beyond: Deploying and Managing Machine Learning Models
 
Multi runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learningMulti runtime serving pipelines for machine learning
Multi runtime serving pipelines for machine learning
 
Serverless machine learning operations
Serverless machine learning operationsServerless machine learning operations
Serverless machine learning operations
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in Python
 
Square's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong YanSquare's Machine Learning Infrastructure and Applications - Rong Yan
Square's Machine Learning Infrastructure and Applications - Rong Yan
 
Building A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning PipelineBuilding A Production-Level Machine Learning Pipeline
Building A Production-Level Machine Learning Pipeline
 
PostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data CapturePostgreSQL + Kafka: The Delight of Change Data Capture
PostgreSQL + Kafka: The Delight of Change Data Capture
 
Machine Learning In Production
Machine Learning In ProductionMachine Learning In Production
Machine Learning In Production
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
Spark and machine learning in microservices architecture
Spark and machine learning in microservices architectureSpark and machine learning in microservices architecture
Spark and machine learning in microservices architecture
 
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
AI and Machine Learning Demystified by Carol Smith at Midwest UX 2017
 

Similar to Python as part of a production machine learning stack by Michael Manapat PyData SV 2014

미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가JaeCheolKim10
 
MLflow at Company Scale
MLflow at Company ScaleMLflow at Company Scale
MLflow at Company ScaleDatabricks
 
Vertical Recommendation Using Collaborative Filtering
Vertical Recommendation Using Collaborative FilteringVertical Recommendation Using Collaborative Filtering
Vertical Recommendation Using Collaborative Filteringgorass
 
Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2zhang hua
 
Iswim for testing
Iswim for testingIswim for testing
Iswim for testingClarkTony
 
Iswim for testing
Iswim for testingIswim for testing
Iswim for testingClarkTony
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaChetan Khatri
 
Machine Learning with Microsoft Azure
Machine Learning with Microsoft AzureMachine Learning with Microsoft Azure
Machine Learning with Microsoft AzureDmitry Petukhov
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...Chetan Khatri
 
Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Amazon Web Services
 
Profiling Mondrian MDX Requests in a Production Environment
Profiling Mondrian MDX Requests in a Production EnvironmentProfiling Mondrian MDX Requests in a Production Environment
Profiling Mondrian MDX Requests in a Production EnvironmentRaimonds Simanovskis
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...HostedbyConfluent
 
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...NETWAYS
 
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby Meetup
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby MeetupBecoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby Meetup
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby MeetupAndy Maleh
 
Building Machine Learning Pipelines
Building Machine Learning PipelinesBuilding Machine Learning Pipelines
Building Machine Learning PipelinesInMobi Technology
 
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...IRJET Journal
 
Machine learning techniques in fraud prevention
Machine learning techniques in fraud preventionMachine learning techniques in fraud prevention
Machine learning techniques in fraud preventionVolodymyr Syzonenko
 
Meet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
Meet TransmogrifAI, Open Source AutoML That Powers Einstein PredictionsMeet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
Meet TransmogrifAI, Open Source AutoML That Powers Einstein PredictionsMatthew Tovbin
 

Similar to Python as part of a production machine learning stack by Michael Manapat PyData SV 2014 (20)

미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
 
MLflow at Company Scale
MLflow at Company ScaleMLflow at Company Scale
MLflow at Company Scale
 
Vertical Recommendation Using Collaborative Filtering
Vertical Recommendation Using Collaborative FilteringVertical Recommendation Using Collaborative Filtering
Vertical Recommendation Using Collaborative Filtering
 
Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2Swift distributed tracing method and tools v2
Swift distributed tracing method and tools v2
 
Iswim for testing
Iswim for testingIswim for testing
Iswim for testing
 
Iswim for testing
Iswim for testingIswim for testing
Iswim for testing
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
Machine Learning with Microsoft Azure
Machine Learning with Microsoft AzureMachine Learning with Microsoft Azure
Machine Learning with Microsoft Azure
 
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
TransmogrifAI - Automate Machine Learning Workflow with the power of Scala an...
 
Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Introduction to AWS Step Functions:
Introduction to AWS Step Functions:
 
Profiling Mondrian MDX Requests in a Production Environment
Profiling Mondrian MDX Requests in a Production EnvironmentProfiling Mondrian MDX Requests in a Production Environment
Profiling Mondrian MDX Requests in a Production Environment
 
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...A Practical Deep Dive into Observability of Streaming Applications with Kosta...
A Practical Deep Dive into Observability of Streaming Applications with Kosta...
 
Payments On Rails
Payments On RailsPayments On Rails
Payments On Rails
 
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...
stackconf 2023 | How to reduce expenses on monitoring with VictoriaMetrics by...
 
MaheshCV_Yepme
MaheshCV_YepmeMaheshCV_Yepme
MaheshCV_Yepme
 
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby Meetup
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby MeetupBecoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby Meetup
Becoming a SOC2 Ruby Shop - Montreal.rb November, 5, 2022 Ruby Meetup
 
Building Machine Learning Pipelines
Building Machine Learning PipelinesBuilding Machine Learning Pipelines
Building Machine Learning Pipelines
 
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...
IRJET- Credit Card Fraud Detection : A Comparison using Random Forest, SVM an...
 
Machine learning techniques in fraud prevention
Machine learning techniques in fraud preventionMachine learning techniques in fraud prevention
Machine learning techniques in fraud prevention
 
Meet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
Meet TransmogrifAI, Open Source AutoML That Powers Einstein PredictionsMeet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
Meet TransmogrifAI, Open Source AutoML That Powers Einstein Predictions
 

More from PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPyData
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydPyData
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverPyData
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldPyData
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 

More from PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Python as part of a production machine learning stack by Michael Manapat PyData SV 2014

  • 1. Python  as  part  of  a  produc0on   machine  learning  stack         Michael  Manapat   @mlmanapat   Stripe    
  • 2. Outline     -­‐Why  we  need  ML  at  Stripe   -­‐Simple  models  with  sklearn   -­‐Pipelines  with  Luigi   -­‐Scoring  as  a  service    
  • 3. Stripe  is  a  technology  company   focusing  on  making  payments  easy     -­‐Short  applica>on    
  • 4. Tokeniza0on       Customer   browser   Stripe   Stripe.js   Token   Merchant   server   Stripe   API  call   Result  
  • 5. API  Call     import stripe
 stripe.Charge.create(
 amount=400,
 currency="usd",
 card="tok_103xnl2gR5VxTSB”
 email=customer@example.com"
 )"
  • 6. Fraud  /  business  viola0ons     -­‐Terms  of  service  viola>ons  (weapons)   -­‐Merchant  fraud  (card  “cashers”)       -­‐Transac>on  fraud     -­‐No  machine  learning  a  year  ago  
  • 7. Fraud  /  business  viola0ons     -­‐Terms  of  service  viola>ons     E-­‐cigareMes,  drugs,  weapons,  etc.     How  do  we  find  these  automa>cally?  
  • 8. Merchant  sign  up  flow           Applica>on   submission   Website   scraped   Text  scored   Applica>on   reviewed  
  • 9. Merchant  sign  up  flow           Applica>on   submission   Website   scraped   Text  scored   Applica>on   reviewed   Machine   learning   pipeline  and   service  
  • 10. Building  a  classifier:  e-­‐cigareIes     data = pandas.from_pickle(‘ecigs’)
 data.head()
 
 text violator
 0 " please verify your age i am 21 years or older ... True
 1 coming soon toggle me drag me with your mouse ... False
 2 drink moscow mules cart 0 log in or create an ... False
 3 vapors electronic cigarette buy now insuper st... True
 4 t-shirts shorts hawaii about us silver coll... False
 
 [5 rows x 2 columns]  
  • 11. Features  for  text  classifica0on     cv = CountVectorizer
 
 features = 
 cv.fit_transform(data['text'])
 
 Sparse  matrix  of  word  counts  from   input  text  (omiSng  feature  selec>on)  
  • 12. Features  for  text  classifica0on   
 X_train, X_test, y_train, y_test = 
 train_test_split(
 features, data['violator'], 
 test_size=0.2)
 
 -­‐Avoid  leakage   -Other  cross-­‐valida>on  methods  
  • 13. Training   
 model = LogisticRegression()
 model.fit(X_train, y_train)
 
 Serializer  reads  from   
 model.intercept_
 model.coef_
  
  • 14. Valida0on   
 probs = model.predict_proba(X_test)
 
 fpr, tpr, thresholds =
 roc_curve(y_test, probs[:, 1])
 
 matplotlib.pyplot(fpr, tpr)  
  • 15. ROC:  Receiver  opera0ng  characteris0c   
 
  
  • 16. Pipeline     -­‐Fetch  website  snapshots  from  S3   -­‐Fetch  classifica>ons  from  SQL/Impala   -­‐Sani>ze  text  (strip  HTML)   -­‐Run  feature  genera>on  and  selec>on   -­‐Train  and  serialize  model   -­‐Export  valida>on  sta>s>cs  
  • 17. Luigi     class GetSnapshots(luigi.Task):
 def run(self):
 " "...
 
 class GenFeatures(luigi.Task):
 def requires(self):
 return GetSnapshots()"
  • 18. Luigi  runs  tasks  on  Hadoop  cluster   "
  • 19. Scoring  as  a  service     "Applica>on   submission   Website   scraped   Text  scored   Applica>on   reviewed   ThriO   RPC   Scoring   Service  
  • 20. Scoring  as  a  service     struct ScoringRequest {
 1: string text
 2: optional string model_name
 }
 
 struct ScoringResponse {
 1: double score" " "// Experiments?
 2: double request_duration
 }"
  • 21. Why  a  service?     -­‐Same  code  base  for  training/scoring     -­‐Reduced  duplica>on/easier  deploys     -­‐Experimenta>on    
  • 22. -­‐Log  requests    and  responses    (Parquet/Impala)     -­‐Centralized    monitoring    (Graphite)  
  • 23. Summary     -­‐Simple  models  with  sklearn   -­‐Pipelines  with  Luigi   -­‐Scoring  as  a  service     Thanks!   @mlmanapat