SlideShare a Scribd company logo
1 of 41
Download to read offline
Richard Garris (Principal Solutions Architect)
Apache Spark™ MLlib 2.x:
How to Productionize your Machine
Learning Models
Empower anyone to innovate faster with big data.
Founded by the creators of Apache Spark.
Contributes 75% of the open source code,
10x more than any other company.
VISION
WHO WE ARE
A fully-managed data processing platform
for the enterprise, powered by Apache Spark.
PRODUCT
CLUSTER TUNING &
MANAGEMENT
INTERACTIVE
WORKSPACE
PRODUCTION
PIPELINE
AUTOMATION
OPTIMIZED DATA
ACCESS
DATABRICKSENTERPRISE SECURITY
YOUR	TEAMS
Data Science
Data Engineering
Manyothers…
BIAnalysts
YOUR	DATA
Cloud Storage
Data Warehouses
Data Lake
VIRTUAL ANALYTICSPLATFORM
About Me
• Richard L Garris
• rlgarris@databricks.com
• Twitter @rlgarris
• Principal Data Solutions Architect @ Databricks
• 12+ years designing Enterprise Data Solutions for everyone from
startups to Global 2000
• Prior Work ExperiencePwC, Google and Skytree – the Machine
Learning Company
• Ohio State Buckeye and Masters from CMU
Outline
• Spark Mllib2.X
• Model Serialization
• Model Scoring SystemRequirements
• Model Scoring Architectures
• Databricks Model Scoring
About Apache Spark™ MLlib
• Started with Spark 0.8 in the
AMPLab in 2014
• Migration to Spark
DataFrames started with
Spark 1.3 with feature parity
within 2.X
• Contributions by 75+ orgs,
~250 individuals
• Distributed algorithms that
scale linearly with the data
MLlib’s Goals
• General purpose machine learning library optimizedfor big data
• Linearly scalable = 2x more machines , runtime theoretically cut in half
• Fault tolerant = resilient to the failure of nodes
• Covers the most common algorithms with distributed implementations
• Built around the concept of a Data Science Pipeline (scikit-learn)
• Written entirelyusing Apache Spark™
• Integrateswell withthe Agile Modeling Process
A Model is a MathematicalFunction
• A model is a function: 𝑓 𝑥
• Linear regression 𝑦	 = 	𝑏0	 + 	𝑏1 𝑥1	 + 	𝑏2 𝑥2
ML Pipelines
Train	model
Evaluate
Load	data
Extract	features
A	very	simple	pipeline
ML Pipelines
Train	model	1
Evaluate
Datasource 1
Datasource 2
Datasource	3
Extract	featuresExtract	features
Feature	transform	1
Feature	transform	2
Feature	transform	3
Train	model	2
Ensemble
A	real	pipeline!
ProductionizingModels Today
Data Science Data Engineering
Develop Prototype
Model using Python/R Re-implement model for
production (Java)
Problems with ProductionizingModels
Develop Prototype
Model using Python/R
Re-implement model for
production (Java)
- Extra work
- Different code paths
- Data science does not translate to production
- Slow to update models
Data Science Data Engineering
MLlib 2.X Model Serialization
Data Science Data Engineering
Develop Prototype
Model using Python/R
Persist model or Pipeline:
model.save(“s3n://...”)
Load Pipeline (Scala/Java)
Model.load(“s3n://…”)
Deploy in production
Scala
val lrModel = lrPipeline.fit(dataset)
// Save the Model
lrModel.write.save("/models/lr")
•
MLlib 2.X Model Serialization Snippet
Python
lrModel = lrPipeline.fit(dataset)
# Save the Model
lrModel.write.save("/models/lr")
•
Model Serialization Output
Code
// List Contents of the Model Dir
dbutils.fs.ls("/models/lr")
•
Output
Remember	this	is	a	pipeline	
model	and	these	are	the	stages!
TransformerStage (StringIndexer)
Code
// Cat the contents of the Metadata dir
dbutils.fs.head(”/models/lr/stages/00_strI
dx_bb9728f85745/metadata/part-00000")
// Display the Parquet File in the Data dir
display(spark.read.parquet(”/models/lr/sta
ges/00_strIdx_bb9728f85745/data/"))
Output
{
"class":"org.apache.spark.ml.feature.StringIndexerModel",
"timestamp":1488120411719,
"sparkVersion":"2.1.0",
"uid":"strIdx_bb9728f85745",
"paramMap":{
"outputCol":"workclassIdx",
"inputCol":"workclass",
"handleInvalid":"error"
}
}
Metadata	and	params
Data	(Hashmap)
Estimator Stage (LogisticRegression)
Code
// Cat the contents of the Metadata dir
dbutils.fs.head(”/models/lr/stages/18_logr
eg_325fa760f925/metadata/part-00000")
// Display the Parquet File in the Data dir
display(spark.read.parquet("/models/lr/sta
ges/18_logreg_325fa760f925/data/"))
Output
Model	params
Intercept	+	Coefficients
{"class":"org.apache.spark.ml.classification.LogisticRegressionModel",
"timestamp":1488120446324,
"sparkVersion":"2.1.0",
"uid":"logreg_325fa760f925",
"paramMap":{
"predictionCol":"prediction",
"standardization":true,
"probabilityCol":"probability",
"maxIter":100,
"elasticNetParam":0.0,
"family":"auto",
"regParam":0.0,
"threshold":0.5,
"fitIntercept":true,
"labelCol":"label” }}
Output
Decision	Tree	Splits
Estimator Stage (DecisionTree)
Code
// Display the Parquet File in the Data dir
display(spark.read.parquet(”/models/dt/stages/18_dtc_3d614bcb3ff825/data/"))
// Re-save as JSON
spark.read.parquet("/models/dt/stages/18_dtc_3d614bcb3ff825/data/").json((”/models/json/dt").
Visualize Stage (DecisionTree)
Visualization	of	the	Tree	
In	Databricks
What are the Requirementsfor
a Robust Model Deployment
System?
Model ScoringEnvironment Examples
• In Web Applications / EcommercePortals
• Mainframe / Batch ProcessingSystems
• Real-TimeProcessingSystems/ Middleware
• Via API / Microservice
• Embeddedin Devices (Mobile Phones, Medical Devices,Autos)
Hidden Technical Debt in ML Systems
“Hidden Technical Debt in Machine
Learning Systems “, Google NIPS 2015
“Hidden Technical Debt in Machine Learning Systems “, Google NIPS 2015
Agile Modeling Process
Set	Business	Goals
Understand	Your	
Data
Create	Hypothesis
Devise	Experiment
Prepare	Data
Train-Tune-Test	
Model
Deploy	Model
Measure/Evaluate	
Results
Agile Modeling Process
Set	Business	Goals
Understand	Your	
Data
Create	Hypothesis
Devise	Experiment
Prepare	Data
Train-Tune-Test	
Model
Deploy	Model
Measure/Evaluate	
Results
Focus	of	this	
talk
Set	Business	Goals
Understand	Your	
Data
Create	Hypothesis
Devise	Experiment
Prepare	Data
Train-Tune-Test	
Model
Deploy	Model
Measure/Evaluate	
Results
Deployment Should be Agile
• Deploymentneeds to
support A/B testing and
experiments
• Deploymentshould
support measuring and
evaluating model
performance
• Deploymentshould be
fast and adaptive to
business needs
Model A/B Testing, Monitoring, Updates
• A/B testing – comparing two versions to see what performs better
• Monitoring is the process of observing the model’s performance, logging it’s
behavior and alerting when the model degrades
• Logging should log exactly the data feed into the model at the time of scoring
• Model update process
• Benchmark (or Shadow Models)
• Phase-In (20% traffic)
• Avoid Big Bang
Consider the Scoring Environment
Customer SLAs
•Response time
•Throughput
(predictions per
second)
•Uptime / Reliability
Tech Stack
–C / C++
–Legacy (mainframe)
–Java
Batch Real-Time
Scoringin Batch vs Real-Time
• Synchronous
• Could be Seconds:
– Customer is waiting
(human real-time)
• Subsecond:
– High Frequency Trading
– Fraud Detection on the Swipe
• Asynchronous
• Internal Use
• Triggers can be event based on time
based
• Used for Email Campaigns,
Notifications
Open Loop – human being involved
Closed Loop – no human involved
• Model Scoring – almost always
closed loop, some open loop e.g.
alert agents or customer service
• Model Training – usually open loop
with a data scientist in the loop to
update the model
Online Learning and Open / Closed Loop
• Online is closed loop, entirely
machine driven but modeling is
risky
• need to have proper model
monitoring and safeguards to
prevent abuse / sensitivity to noise
• MLlib supports online through
streaming models (k-means, logistic
regression support online)
• Alternative – use a more complex
model to better fit new data rather
than using online learning
Open / ClosedLoop Online Learning
Model Scoring– Bot Detection
Not All Models Return Boolean – e.g. a Yes / No
Example: Login Bot Detector
Different behavior depending on probability that use is a bot
0.0-0.4 ☞ Allow login
0.4-0.6 ☞ Send Challenge Question
0.6 to 0.75 ☞ Send SMS Code
0.75 to 0.9 ☞ Refer to Agent
0.9 - 1.0 ☞ Block
Model Scoring– Recommendations
Output is a ranking of the top n items
API – send user ID + number of items
Returnsorted set of items to recommend
Optional –
pass contextsensitive informationto tailor results
Model Scoring Architectures
Architecture Option A
PrecomputePredictions using Spark and Serve fromDatabase
Train	ALS	Model
Send	Email	Offers	
to	Customers
Save	Offers	to	
NoSQL
Ranked	Offers
Display	Ranked	
Offers	in	Web	/	
Mobile
Recurring	
Batch
Architecture Option B
Spark Streamand Score using an API with CachedPredictions
Web	Activity	Logs
Kill	User’s	Login	
SessionCompute	Features Run	Prediction
Streaming
Cache	Predictions API	Check
Architecture Option C
Train with Spark and Score Outside of Spark
Train	Model	in	
Spark
Save	Model	to	S3	
/	HDFS
New	Data
Copy	
Model	to	
Production
Predictions
Load	coefficients	and	
intercept	from	file
Databricks Model Scoring
Databricks Model Scoring
• Based on ArchitectureOption C
• Goal: DeployMLlib model outside of Apache Spark and
Databricks.
• Easy to Embed in Existing Environments
• Low Latency and Complexity
• Low Overhead
• Train Model in Databricks
– Call Fit on Pipeline
– Save Model as JSON
• Deploy model in external system
– Add dependency on “dbml-local”
package (without Spark)
– Load model from JSON at startup
– Make predictions in real time
Databricks Model Scoring
Code
// Fit and Export the Model in Databricks
val lrModel = lrPipeline.fit(dataset)
ModelExporter.export(lrModel, " /models/db ")
// In Your Application (Scala)
import com.databricks.ml.local.ModelImport
val lrModel= ModelImport.import("s3a:/...")
val jsonInput = ...
val jsonOutput= lrModel.transform(jsonInput)
Databricks Model ScoringPrivate Beta
• Private BetaAvailable for Databricks Customers
• Available on Databricks using Apache Spark 2.1
• Only logistic regressionavailable now
• Additional Estimatorsand Transformersin Progress
Demo Model Scoring
https://community.cloud.databricks.com/?o=1526931011080774
#notebook/1904316851197504
Thank You.
Questions?
Happy Sparking
richard@databricks.com

More Related Content

What's hot

Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Kaxil Naik
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
クラウドネイティブトランスフォーメーションのススメ
クラウドネイティブトランスフォーメーションのススメクラウドネイティブトランスフォーメーションのススメ
クラウドネイティブトランスフォーメーションのススメHiromasa Oka
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Edureka!
 
Microsoft Azureのビッグデータ基盤とAIテクノロジーを活用しよう
Microsoft Azureのビッグデータ基盤とAIテクノロジーを活用しようMicrosoft Azureのビッグデータ基盤とAIテクノロジーを活用しよう
Microsoft Azureのビッグデータ基盤とAIテクノロジーを活用しようHideo Takagi
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Yaroslav Tkachenko
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...Databricks
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Spark Summit
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...Databricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseDatabricks
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...Databricks
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesDatabricks
 
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Yoshiyasu SAEKI
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodDatabricks
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
SQL大量発行処理をいかにして高速化するか
SQL大量発行処理をいかにして高速化するかSQL大量発行処理をいかにして高速化するか
SQL大量発行処理をいかにして高速化するかShogo Wakayama
 

What's hot (20)

Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
クラウドネイティブトランスフォーメーションのススメ
クラウドネイティブトランスフォーメーションのススメクラウドネイティブトランスフォーメーションのススメ
クラウドネイティブトランスフォーメーションのススメ
 
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
Spark Hadoop Tutorial | Spark Hadoop Example on NBA | Apache Spark Training |...
 
Microsoft Azureのビッグデータ基盤とAIテクノロジーを活用しよう
Microsoft Azureのビッグデータ基盤とAIテクノロジーを活用しようMicrosoft Azureのビッグデータ基盤とAIテクノロジーを活用しよう
Microsoft Azureのビッグデータ基盤とAIテクノロジーを活用しよう
 
Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?Streaming SQL for Data Engineers: The Next Big Thing?
Streaming SQL for Data Engineers: The Next Big Thing?
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...Building a Streaming Microservice Architecture: with Apache Spark Structured ...
Building a Streaming Microservice Architecture: with Apache Spark Structured ...
 
Reliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on KubernetesReliable Performance at Scale with Apache Spark on Kubernetes
Reliable Performance at Scale with Apache Spark on Kubernetes
 
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
Apache Sparkにおけるメモリ - アプリケーションを落とさないメモリ設計手法 -
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
SQL大量発行処理をいかにして高速化するか
SQL大量発行処理をいかにして高速化するかSQL大量発行処理をいかにして高速化するか
SQL大量発行処理をいかにして高速化するか
 

Viewers also liked

Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines
Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines
Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines Databricks
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...PAPIs.io
 
Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrDatabricks
 
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...sparktc
 
Building Custom ML PipelineStages for Feature Selection with Marc Kaminski
Building Custom ML PipelineStages for Feature Selection with Marc KaminskiBuilding Custom ML PipelineStages for Feature Selection with Marc Kaminski
Building Custom ML PipelineStages for Feature Selection with Marc KaminskiSpark Summit
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 

Viewers also liked (8)

Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines
Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines
Deep Learning on Apache Spark: TensorFrames & Deep Learning Pipelines
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...[Tutorial] building machine learning models for predictive maintenance applic...
[Tutorial] building machine learning models for predictive maintenance applic...
 
Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkr
 
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
 
Building Custom ML PipelineStages for Feature Selection with Marc Kaminski
Building Custom ML PipelineStages for Feature Selection with Marc KaminskiBuilding Custom ML PipelineStages for Feature Selection with Marc Kaminski
Building Custom ML PipelineStages for Feature Selection with Marc Kaminski
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 

Similar to How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.x with Richard Garris

Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to ProductionMostafa Majidpour
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureDatabricks
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018Adam Gibson
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_VeriticalsPeyman Mohajerian
 
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...SQUADEX
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Provectus
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...All Things Open
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)dtz001
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)Arnab Biswas
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexApache Apex
 
Real time data processing and model inferncing platform with Kafka streams (N...
Real time data processing and model inferncing platform with Kafka streams (N...Real time data processing and model inferncing platform with Kafka streams (N...
Real time data processing and model inferncing platform with Kafka streams (N...KafkaZone
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowDatabricks
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to productionGeorg Heiler
 

Similar to How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.x with Richard Garris (20)

Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
Productionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices ArchitectureProductionizing Machine Learning with a Microservices Architecture
Productionizing Machine Learning with a Microservices Architecture
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_Veriticals
 
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
Tooling for Machine Learning: AWS Products, Open Source Tools, and DevOps Pra...
 
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
Data Summer Conf 2018, “Monitoring AI with AI (RUS)” — Stepan Pushkarev, CTO ...
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 
Real time data processing and model inferncing platform with Kafka streams (N...
Real time data processing and model inferncing platform with Kafka streams (N...Real time data processing and model inferncing platform with Kafka streams (N...
Real time data processing and model inferncing platform with Kafka streams (N...
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflow
 
Machine learning model to production
Machine learning model to productionMachine learning model to production
Machine learning model to production
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 

Recently uploaded (20)

Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 

How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2.x with Richard Garris

  • 1. Richard Garris (Principal Solutions Architect) Apache Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
  • 2. Empower anyone to innovate faster with big data. Founded by the creators of Apache Spark. Contributes 75% of the open source code, 10x more than any other company. VISION WHO WE ARE A fully-managed data processing platform for the enterprise, powered by Apache Spark. PRODUCT
  • 3. CLUSTER TUNING & MANAGEMENT INTERACTIVE WORKSPACE PRODUCTION PIPELINE AUTOMATION OPTIMIZED DATA ACCESS DATABRICKSENTERPRISE SECURITY YOUR TEAMS Data Science Data Engineering Manyothers… BIAnalysts YOUR DATA Cloud Storage Data Warehouses Data Lake VIRTUAL ANALYTICSPLATFORM
  • 4. About Me • Richard L Garris • rlgarris@databricks.com • Twitter @rlgarris • Principal Data Solutions Architect @ Databricks • 12+ years designing Enterprise Data Solutions for everyone from startups to Global 2000 • Prior Work ExperiencePwC, Google and Skytree – the Machine Learning Company • Ohio State Buckeye and Masters from CMU
  • 5. Outline • Spark Mllib2.X • Model Serialization • Model Scoring SystemRequirements • Model Scoring Architectures • Databricks Model Scoring
  • 6. About Apache Spark™ MLlib • Started with Spark 0.8 in the AMPLab in 2014 • Migration to Spark DataFrames started with Spark 1.3 with feature parity within 2.X • Contributions by 75+ orgs, ~250 individuals • Distributed algorithms that scale linearly with the data
  • 7. MLlib’s Goals • General purpose machine learning library optimizedfor big data • Linearly scalable = 2x more machines , runtime theoretically cut in half • Fault tolerant = resilient to the failure of nodes • Covers the most common algorithms with distributed implementations • Built around the concept of a Data Science Pipeline (scikit-learn) • Written entirelyusing Apache Spark™ • Integrateswell withthe Agile Modeling Process
  • 8. A Model is a MathematicalFunction • A model is a function: 𝑓 𝑥 • Linear regression 𝑦 = 𝑏0 + 𝑏1 𝑥1 + 𝑏2 𝑥2
  • 10. ML Pipelines Train model 1 Evaluate Datasource 1 Datasource 2 Datasource 3 Extract featuresExtract features Feature transform 1 Feature transform 2 Feature transform 3 Train model 2 Ensemble A real pipeline!
  • 11. ProductionizingModels Today Data Science Data Engineering Develop Prototype Model using Python/R Re-implement model for production (Java)
  • 12. Problems with ProductionizingModels Develop Prototype Model using Python/R Re-implement model for production (Java) - Extra work - Different code paths - Data science does not translate to production - Slow to update models Data Science Data Engineering
  • 13. MLlib 2.X Model Serialization Data Science Data Engineering Develop Prototype Model using Python/R Persist model or Pipeline: model.save(“s3n://...”) Load Pipeline (Scala/Java) Model.load(“s3n://…”) Deploy in production
  • 14. Scala val lrModel = lrPipeline.fit(dataset) // Save the Model lrModel.write.save("/models/lr") • MLlib 2.X Model Serialization Snippet Python lrModel = lrPipeline.fit(dataset) # Save the Model lrModel.write.save("/models/lr") •
  • 15. Model Serialization Output Code // List Contents of the Model Dir dbutils.fs.ls("/models/lr") • Output Remember this is a pipeline model and these are the stages!
  • 16. TransformerStage (StringIndexer) Code // Cat the contents of the Metadata dir dbutils.fs.head(”/models/lr/stages/00_strI dx_bb9728f85745/metadata/part-00000") // Display the Parquet File in the Data dir display(spark.read.parquet(”/models/lr/sta ges/00_strIdx_bb9728f85745/data/")) Output { "class":"org.apache.spark.ml.feature.StringIndexerModel", "timestamp":1488120411719, "sparkVersion":"2.1.0", "uid":"strIdx_bb9728f85745", "paramMap":{ "outputCol":"workclassIdx", "inputCol":"workclass", "handleInvalid":"error" } } Metadata and params Data (Hashmap)
  • 17. Estimator Stage (LogisticRegression) Code // Cat the contents of the Metadata dir dbutils.fs.head(”/models/lr/stages/18_logr eg_325fa760f925/metadata/part-00000") // Display the Parquet File in the Data dir display(spark.read.parquet("/models/lr/sta ges/18_logreg_325fa760f925/data/")) Output Model params Intercept + Coefficients {"class":"org.apache.spark.ml.classification.LogisticRegressionModel", "timestamp":1488120446324, "sparkVersion":"2.1.0", "uid":"logreg_325fa760f925", "paramMap":{ "predictionCol":"prediction", "standardization":true, "probabilityCol":"probability", "maxIter":100, "elasticNetParam":0.0, "family":"auto", "regParam":0.0, "threshold":0.5, "fitIntercept":true, "labelCol":"label” }}
  • 18. Output Decision Tree Splits Estimator Stage (DecisionTree) Code // Display the Parquet File in the Data dir display(spark.read.parquet(”/models/dt/stages/18_dtc_3d614bcb3ff825/data/")) // Re-save as JSON spark.read.parquet("/models/dt/stages/18_dtc_3d614bcb3ff825/data/").json((”/models/json/dt").
  • 20. What are the Requirementsfor a Robust Model Deployment System?
  • 21. Model ScoringEnvironment Examples • In Web Applications / EcommercePortals • Mainframe / Batch ProcessingSystems • Real-TimeProcessingSystems/ Middleware • Via API / Microservice • Embeddedin Devices (Mobile Phones, Medical Devices,Autos)
  • 22. Hidden Technical Debt in ML Systems “Hidden Technical Debt in Machine Learning Systems “, Google NIPS 2015 “Hidden Technical Debt in Machine Learning Systems “, Google NIPS 2015
  • 25. Set Business Goals Understand Your Data Create Hypothesis Devise Experiment Prepare Data Train-Tune-Test Model Deploy Model Measure/Evaluate Results Deployment Should be Agile • Deploymentneeds to support A/B testing and experiments • Deploymentshould support measuring and evaluating model performance • Deploymentshould be fast and adaptive to business needs
  • 26. Model A/B Testing, Monitoring, Updates • A/B testing – comparing two versions to see what performs better • Monitoring is the process of observing the model’s performance, logging it’s behavior and alerting when the model degrades • Logging should log exactly the data feed into the model at the time of scoring • Model update process • Benchmark (or Shadow Models) • Phase-In (20% traffic) • Avoid Big Bang
  • 27. Consider the Scoring Environment Customer SLAs •Response time •Throughput (predictions per second) •Uptime / Reliability Tech Stack –C / C++ –Legacy (mainframe) –Java
  • 28. Batch Real-Time Scoringin Batch vs Real-Time • Synchronous • Could be Seconds: – Customer is waiting (human real-time) • Subsecond: – High Frequency Trading – Fraud Detection on the Swipe • Asynchronous • Internal Use • Triggers can be event based on time based • Used for Email Campaigns, Notifications
  • 29. Open Loop – human being involved Closed Loop – no human involved • Model Scoring – almost always closed loop, some open loop e.g. alert agents or customer service • Model Training – usually open loop with a data scientist in the loop to update the model Online Learning and Open / Closed Loop • Online is closed loop, entirely machine driven but modeling is risky • need to have proper model monitoring and safeguards to prevent abuse / sensitivity to noise • MLlib supports online through streaming models (k-means, logistic regression support online) • Alternative – use a more complex model to better fit new data rather than using online learning Open / ClosedLoop Online Learning
  • 30. Model Scoring– Bot Detection Not All Models Return Boolean – e.g. a Yes / No Example: Login Bot Detector Different behavior depending on probability that use is a bot 0.0-0.4 ☞ Allow login 0.4-0.6 ☞ Send Challenge Question 0.6 to 0.75 ☞ Send SMS Code 0.75 to 0.9 ☞ Refer to Agent 0.9 - 1.0 ☞ Block
  • 31. Model Scoring– Recommendations Output is a ranking of the top n items API – send user ID + number of items Returnsorted set of items to recommend Optional – pass contextsensitive informationto tailor results
  • 33. Architecture Option A PrecomputePredictions using Spark and Serve fromDatabase Train ALS Model Send Email Offers to Customers Save Offers to NoSQL Ranked Offers Display Ranked Offers in Web / Mobile Recurring Batch
  • 34. Architecture Option B Spark Streamand Score using an API with CachedPredictions Web Activity Logs Kill User’s Login SessionCompute Features Run Prediction Streaming Cache Predictions API Check
  • 35. Architecture Option C Train with Spark and Score Outside of Spark Train Model in Spark Save Model to S3 / HDFS New Data Copy Model to Production Predictions Load coefficients and intercept from file
  • 37. Databricks Model Scoring • Based on ArchitectureOption C • Goal: DeployMLlib model outside of Apache Spark and Databricks. • Easy to Embed in Existing Environments • Low Latency and Complexity • Low Overhead
  • 38. • Train Model in Databricks – Call Fit on Pipeline – Save Model as JSON • Deploy model in external system – Add dependency on “dbml-local” package (without Spark) – Load model from JSON at startup – Make predictions in real time Databricks Model Scoring Code // Fit and Export the Model in Databricks val lrModel = lrPipeline.fit(dataset) ModelExporter.export(lrModel, " /models/db ") // In Your Application (Scala) import com.databricks.ml.local.ModelImport val lrModel= ModelImport.import("s3a:/...") val jsonInput = ... val jsonOutput= lrModel.transform(jsonInput)
  • 39. Databricks Model ScoringPrivate Beta • Private BetaAvailable for Databricks Customers • Available on Databricks using Apache Spark 2.1 • Only logistic regressionavailable now • Additional Estimatorsand Transformersin Progress