SlideShare a Scribd company logo
1 of 34
Apache Spark(™)
Model Deployment
Bay Area Spark Meetup – June 30, 2016
Richard Garris – Big Data Solution Architect Focused on Advanced Analytics
About Me
Richard L Garris
• rlgarris@databricks.com
• @rlgarris [Twitter]
Big Data Solutions Architect @ Databricks
12+ years designing Enterprise Data Solutions for everyone from
startups to Global 2000
Prior Work Experience PwC, Google, Skytree
Ohio State Buckeye and CMU Alumni
2
About Apache Spark MLlib
Started at Berkeley AMPLab
(Apache Spark 0.8)
Now (Apache Spark 2.0)
• Contributions from 75+ orgs, ~250 individuals
• Development driven by Databricks: roadmap + 50% of
PRs
• Growing coverage of distributed algorithms
Spark
SparkSQL Streaming MLlib GraphFrames
3
MLlib Goals
General Machine Learning library for big data
• Scalable & robust
• Coverage of common algorithms
• Leverages Apache Spark
Tools for practical workflows
Integration with existing data science tools
4
Apache Spark MLlib
• spark.mllib
• Pre Mllib < Spark 1.4
• Spark Mllib was a lower
level library that used
Spark RDDs
• Uses LabeledPoint,
Vectors and Tuples
• Maintenance Mode only
after Spark 2.X
// Load and parse the data
val data = sc.textFile("data/mllib/ridge-data/lpsa.data")
val parsedData = data.map { line =>
val parts = line.split(',')
LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split('
').map(_.toDouble)))
}.cache()
// Building the model
val numIterations = 100
val stepSize = 0.00000001
val model = LinearRegressionWithSGD.train(parsedData, numIterations, st
epSize)
// Evaluate model on training examples and compute training error
val valuesAndPreds = parsedData.map { point =>
val prediction = model.predict(point.features)
(point.label, prediction)
}
Apache Spark – ML Pipelines
• spark.ml
• Spark > 1.4
• Spark.ML pipelines –
able to create more
complex models
• Integrated with
DataFrames
// Let's initialize our linear regression learner
val lr = new LinearRegression()
// Now we set the parameters for the method
lr.setPredictionCol("Predicted_PE")
.setLabelCol("PE").setMaxIter(100).setRegParam(0.1)
// We will use the new spark.ml pipeline API. If you
have worked with scikit-learn this will be very
familiar.
val lrPipeline = new Pipeline()
lrPipeline.setStages(Array(vectorizer, lr))
// Let's first train on the entire dataset to see what
we get
val lrModel = lrPipeline.fit(trainingSet)
The Agile Modeling Process
Set Business
Goals
Understand
Your Data
Create
Hypothesis
Devise
Experiment
Prepare Data
Train-Tune-Test
Model
Deploy Model
Measure /
Evaluate Results
The Agile Modeling Process
Set Business
Goals
Understand
Your Data
Create
Hypothesis
Devise
Experiment
Prepare Data
Train-Tune-Test
Model
Deploy Model
Measure /
Evaluate Results
Focus of this
talk
What is a Model?
•
But What Really is a Model?
A model is a complex pipeline of components
• Data Sources
• Joins
• Featurization Logic
• Algorithm(s)
• Transformers
• Estimators
• Tuning Parameters
ML Pipelines
11
Train model
Evaluate
Load data
Extract features
A very simple pipeline
ML Pipelines
12
Train model 1
Evaluate
Datasource 1
Datasource 2
Datasource 3
Extract features
Extract features
Feature transform 1
Feature transform 2
Feature transform 3
Train model 2
Ensemble
A real pipeline
Why ML persistence?
13
Data
Science
Software
Engineering
Prototype (Python/R)
Create model
Re-implement model for
production (Java)
Deploy model
Why ML persistence?
14
Data
Science
Software
Engineering
Prototype (Python/R)
Create Pipeline
• Extract raw features
• Transform features
• Select key features
• Fit multiple models
• Combine results to
make prediction
• Extra implementation work
• Different code paths
• Synchronization overhead
Re-implement Pipeline for
production (Java)
Deploy Pipeline
With ML persistence...
15
Data
Science
Software
Engineering
Prototype (Python/R)
Create Pipeline
Persist model or Pipeline:
model.save(“s3n://...”)
Load Pipeline (Scala/Java)
Model.load(“s3n://…”)
Deploy in production
Demo
Model Serialization in Apache Spark 2.0 using Parquet
What are the Requirements
for a Robust Model
Deployment System?
Customer SLAs
• Response time
• Throughput (predictions per second)
• Uptime / Reliability
Tech Stack
• C / C++
• Legacy (mainframe)
• Java
• Docker
Your Model Scoring Environment
Offline
• Internal Use (batch)
• Emails, Notifications
(batch)
• Offline – schedule based or
event trigger based
Model Scoring Offline vs Online
Online
• Customer Waiting on the
Response (human real-time)
• Super low-latency with fixed
response window
(transactional fraud, ad
bidding)
Not All Models Return a Yes / No
Model Scoring Considerations
Example: Login Bot Detector
Different behavior depending on
probability score
0.0-0.4 ☞ Allow login
0.4-0.6 ☞ Challenge Question
0.6 to 0.75 ☞ Send SMS
0.75 to 0.9 ☞ Refer to Agent
0.9 - 1.0 ☞ Block
Example: Item Recommendations
Output is a ranking of the top n items
API – send user ID + number of items
Return sorted set of items to recommend
Optional – pass context sensitive information
to tailor results
Model Updates and Versioning
• Model Update Frequency
(nightly, weekly, monthly, quarterly)
• Model Version Tracking
• Model Release Process
• Dev ‣ Test ‣ Staging ‣ Production
• Model update process
• Benchmark (or Shadow Models)
• Phase-In (20% traffic)
• Big Bang
• Models can have both reward and risk to the business
– Well designed models prevent fraud, reduce churn, increase sales
– Poorly designed models increase fraud, could impact the company’s brand,
cause compliance violations or other risks
• Models should be governed by the company's policies and procedures,
laws and regulations and the organization's management goals
Model Governance
Considerations
• Models have to be transparent, explainable, traceable and interpretable for
auditors / regulators
• Models may need reason codes for rejections (e.g. if I decline someone credit why?)
• Models should have an approval and release process
• Models also cannot violate any discrimination laws or use features that could be
traced to religion, gender, ethnicity,
Model A/B Testing
Set Business
Goals
Understand
Your Data
Create
Hypothesis
Devise
Experiment
Prepare Data
Train-Tune-Test
Model
Deploy Model
Measure /
Evaluate
Results
• A/B testing – comparing two
versions to see what performs
better
• Historical data works for
evaluating models in testing, but
production experiments required
to validate model hypothesis
• Model update process
• Benchmark (or Shadow Models)
• Phase-In (20% traffic)
• Big Bang
A/B Framework should support these steps
• Monitoring is the process of
observing the model’s
performance, logging it’s
behavior and alerting when the
model degrades
• Logging should log exactly the
data feed into the model at the
time of scoring
• Model alerting is critical to
detect unusual or unexpected
behaviors
Model Monitoring
Open Loop vs Closed Loop
• Open Loop – human being involved
• Closed Loop – no human involved
Model Scoring – almost always closed loop, some models alert
agents or customer service
Model Training – usually open loop with a data scientist in the
loop to update the model
Online Learning
• closed loop, entirely machine driven modeling is
risky
• need to have proper model monitoring and
safeguards to prevent abuse / sensitivity to noise
• Mllib supports online through streaming models (k-
means, logistic regression support online)
• Alternative – use a more complex model to better fit
new data rather than using online learning
Model Deployment
Architectures
Architecture #1
Offline Recommendations
Train ALS Model Send Offers to Customers
Save Offers to NoSQL
Ranked Offers
Display Ranked Offers in
Web / Mobile
Nightly Batch
Architecture #2
Precomputed Features with Streaming
Web Logs
Kill User’s Login SessionPre-compute Features Features
Spark Streaming
Architecture #3
Local Apache Spark(™)
Train Model in Spark Save Model to S3 / HDFS
New Data
Copy
Model to
Production
Predictions
Run Spark Local
Demo
• Example of Offline Recommendations using ALS and
Redis as a NoSQL Cache
Try Databricks Community Edition
2016 Apache Spark Survey
33
Spark Summit EU
Brussels
October 25-27
The CFP closes at 11:59pm on July 1st
For more information and to submit:
https://spark-summit.org/eu-2016/
34

More Related Content

What's hot

Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...Databricks
 
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...Spark Summit
 
VictoriaMetrics 2023 Roadmap
VictoriaMetrics 2023 RoadmapVictoriaMetrics 2023 Roadmap
VictoriaMetrics 2023 RoadmapVictoriaMetrics
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...HostedbyConfluent
 
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...Roberto Hashioka
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxCalvinSim10
 
Visual Api Training
Visual Api TrainingVisual Api Training
Visual Api TrainingSpark Summit
 
Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaTimothy Spann
 
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...IDERA Software
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debeziumKasun Don
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaScyllaDB
 
How to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with PantherHow to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with PantherPanther Labs
 
Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...
Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...
Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...confluent
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar ZecevicDataScienceConferenc1
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for PrometheusMitsuhiro Tanda
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphP. Taylor Goetz
 

What's hot (20)

Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
 
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
Clickstream Analysis with Spark—Understanding Visitors in Realtime by Josef A...
 
VictoriaMetrics 2023 Roadmap
VictoriaMetrics 2023 RoadmapVictoriaMetrics 2023 Roadmap
VictoriaMetrics 2023 Roadmap
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
 
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka ...
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
Visual Api Training
Visual Api TrainingVisual Api Training
Visual Api Training
 
Real time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafkaReal time stock processing with apache nifi, apache flink and apache kafka
Real time stock processing with apache nifi, apache flink and apache kafka
 
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...Idera live 2021:  Keynote Presentation The Future of Data is The Data Cloud b...
Idera live 2021: Keynote Presentation The Future of Data is The Data Cloud b...
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debezium
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
How to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with PantherHow to Implement Snowflake Security Best Practices with Panther
How to Implement Snowflake Security Best Practices with Panther
 
Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...
Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...
Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
 
Grafana optimization for Prometheus
Grafana optimization for PrometheusGrafana optimization for Prometheus
Grafana optimization for Prometheus
 
Large Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraphLarge Scale Graph Analytics with JanusGraph
Large Scale Graph Analytics with JanusGraph
 

Viewers also liked

Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrDatabricks
 
Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkRDataWorks Summit
 
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...Spark Summit
 
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...sparktc
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....Databricks
 
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterSpark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterDataWorks Summit
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...gethue
 
Spark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanSpark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanDatabricks
 
Dynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark ApplicationDynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark ApplicationDataWorks Summit
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageSandeep Patil
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failingSandy Ryza
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideIBM
 

Viewers also liked (20)

Use r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkrUse r tutorial part1, introduction to sparkr
Use r tutorial part1, introduction to sparkr
 
Scalable Data Science with SparkR
Scalable Data Science with SparkRScalable Data Science with SparkR
Scalable Data Science with SparkR
 
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
SparkR: The Past, the Present and the Future-(Shivaram Venkataraman and Rui S...
 
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch...
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop ClusterSpark-on-YARN: Empower Spark Applications on Hadoop Cluster
Spark-on-YARN: Empower Spark Applications on Hadoop Cluster
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
 
Spark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu KasinathanSpark Compute as a Service at Paypal with Prabhu Kasinathan
Spark Compute as a Service at Paypal with Prabhu Kasinathan
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
 
Dynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark ApplicationDynamically Allocate Cluster Resources to your Spark Application
Dynamically Allocate Cluster Resources to your Spark Application
 
Producing Spark on YARN for ETL
Producing Spark on YARN for ETLProducing Spark on YARN for ETL
Producing Spark on YARN for ETL
 
Hadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better StorageHadoop and Spark Analytics over Better Storage
Hadoop and Spark Analytics over Better Storage
 
SocSciBot(01 Mar2010) - Korean Manual
SocSciBot(01 Mar2010) - Korean ManualSocSciBot(01 Mar2010) - Korean Manual
SocSciBot(01 Mar2010) - Korean Manual
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
 
Proxy Servers
Proxy ServersProxy Servers
Proxy Servers
 
Proxy Server
Proxy ServerProxy Server
Proxy Server
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting Guide
 

Similar to Apache Spark Model Deployment

Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro sessionAvinash Patil
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableJustin Basilico
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Sonya Liberman
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in ProductionDataWorks Summit
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...Robert Grossman
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemPierre Gutierrez
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya
 
DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-useltonrodriguez11
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
Driving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle AnalyticsDriving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle AnalyticsPerficient, Inc.
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning ModelsTash Bickley
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)Arnab Biswas
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsDatabricks
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...DataWorks Summit
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 

Similar to Apache Spark Model Deployment (20)

Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro session
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
Taking the Pain out of Data Science - RecSys Machine Learning Framework Over ...
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
 
DevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-usDevOps for Machine Learning overview en-us
DevOps for Machine Learning overview en-us
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Driving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle AnalyticsDriving Digital Transformation with Machine Learning in Oracle Analytics
Driving Digital Transformation with Machine Learning in Oracle Analytics
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
Machine learning
Machine learningMachine learning
Machine learning
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
 
Consolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest AirportsConsolidating MLOps at One of Europe’s Biggest Airports
Consolidating MLOps at One of Europe’s Biggest Airports
 
Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...Software engineering practices for the data science and machine learning life...
Software engineering practices for the data science and machine learning life...
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Apache Spark Model Deployment

  • 1. Apache Spark(™) Model Deployment Bay Area Spark Meetup – June 30, 2016 Richard Garris – Big Data Solution Architect Focused on Advanced Analytics
  • 2. About Me Richard L Garris • rlgarris@databricks.com • @rlgarris [Twitter] Big Data Solutions Architect @ Databricks 12+ years designing Enterprise Data Solutions for everyone from startups to Global 2000 Prior Work Experience PwC, Google, Skytree Ohio State Buckeye and CMU Alumni 2
  • 3. About Apache Spark MLlib Started at Berkeley AMPLab (Apache Spark 0.8) Now (Apache Spark 2.0) • Contributions from 75+ orgs, ~250 individuals • Development driven by Databricks: roadmap + 50% of PRs • Growing coverage of distributed algorithms Spark SparkSQL Streaming MLlib GraphFrames 3
  • 4. MLlib Goals General Machine Learning library for big data • Scalable & robust • Coverage of common algorithms • Leverages Apache Spark Tools for practical workflows Integration with existing data science tools 4
  • 5. Apache Spark MLlib • spark.mllib • Pre Mllib < Spark 1.4 • Spark Mllib was a lower level library that used Spark RDDs • Uses LabeledPoint, Vectors and Tuples • Maintenance Mode only after Spark 2.X // Load and parse the data val data = sc.textFile("data/mllib/ridge-data/lpsa.data") val parsedData = data.map { line => val parts = line.split(',') LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble))) }.cache() // Building the model val numIterations = 100 val stepSize = 0.00000001 val model = LinearRegressionWithSGD.train(parsedData, numIterations, st epSize) // Evaluate model on training examples and compute training error val valuesAndPreds = parsedData.map { point => val prediction = model.predict(point.features) (point.label, prediction) }
  • 6. Apache Spark – ML Pipelines • spark.ml • Spark > 1.4 • Spark.ML pipelines – able to create more complex models • Integrated with DataFrames // Let's initialize our linear regression learner val lr = new LinearRegression() // Now we set the parameters for the method lr.setPredictionCol("Predicted_PE") .setLabelCol("PE").setMaxIter(100).setRegParam(0.1) // We will use the new spark.ml pipeline API. If you have worked with scikit-learn this will be very familiar. val lrPipeline = new Pipeline() lrPipeline.setStages(Array(vectorizer, lr)) // Let's first train on the entire dataset to see what we get val lrModel = lrPipeline.fit(trainingSet)
  • 7. The Agile Modeling Process Set Business Goals Understand Your Data Create Hypothesis Devise Experiment Prepare Data Train-Tune-Test Model Deploy Model Measure / Evaluate Results
  • 8. The Agile Modeling Process Set Business Goals Understand Your Data Create Hypothesis Devise Experiment Prepare Data Train-Tune-Test Model Deploy Model Measure / Evaluate Results Focus of this talk
  • 9. What is a Model? •
  • 10. But What Really is a Model? A model is a complex pipeline of components • Data Sources • Joins • Featurization Logic • Algorithm(s) • Transformers • Estimators • Tuning Parameters
  • 11. ML Pipelines 11 Train model Evaluate Load data Extract features A very simple pipeline
  • 12. ML Pipelines 12 Train model 1 Evaluate Datasource 1 Datasource 2 Datasource 3 Extract features Extract features Feature transform 1 Feature transform 2 Feature transform 3 Train model 2 Ensemble A real pipeline
  • 13. Why ML persistence? 13 Data Science Software Engineering Prototype (Python/R) Create model Re-implement model for production (Java) Deploy model
  • 14. Why ML persistence? 14 Data Science Software Engineering Prototype (Python/R) Create Pipeline • Extract raw features • Transform features • Select key features • Fit multiple models • Combine results to make prediction • Extra implementation work • Different code paths • Synchronization overhead Re-implement Pipeline for production (Java) Deploy Pipeline
  • 15. With ML persistence... 15 Data Science Software Engineering Prototype (Python/R) Create Pipeline Persist model or Pipeline: model.save(“s3n://...”) Load Pipeline (Scala/Java) Model.load(“s3n://…”) Deploy in production
  • 16. Demo Model Serialization in Apache Spark 2.0 using Parquet
  • 17. What are the Requirements for a Robust Model Deployment System?
  • 18. Customer SLAs • Response time • Throughput (predictions per second) • Uptime / Reliability Tech Stack • C / C++ • Legacy (mainframe) • Java • Docker Your Model Scoring Environment
  • 19. Offline • Internal Use (batch) • Emails, Notifications (batch) • Offline – schedule based or event trigger based Model Scoring Offline vs Online Online • Customer Waiting on the Response (human real-time) • Super low-latency with fixed response window (transactional fraud, ad bidding)
  • 20. Not All Models Return a Yes / No Model Scoring Considerations Example: Login Bot Detector Different behavior depending on probability score 0.0-0.4 ☞ Allow login 0.4-0.6 ☞ Challenge Question 0.6 to 0.75 ☞ Send SMS 0.75 to 0.9 ☞ Refer to Agent 0.9 - 1.0 ☞ Block Example: Item Recommendations Output is a ranking of the top n items API – send user ID + number of items Return sorted set of items to recommend Optional – pass context sensitive information to tailor results
  • 21. Model Updates and Versioning • Model Update Frequency (nightly, weekly, monthly, quarterly) • Model Version Tracking • Model Release Process • Dev ‣ Test ‣ Staging ‣ Production • Model update process • Benchmark (or Shadow Models) • Phase-In (20% traffic) • Big Bang
  • 22. • Models can have both reward and risk to the business – Well designed models prevent fraud, reduce churn, increase sales – Poorly designed models increase fraud, could impact the company’s brand, cause compliance violations or other risks • Models should be governed by the company's policies and procedures, laws and regulations and the organization's management goals Model Governance Considerations • Models have to be transparent, explainable, traceable and interpretable for auditors / regulators • Models may need reason codes for rejections (e.g. if I decline someone credit why?) • Models should have an approval and release process • Models also cannot violate any discrimination laws or use features that could be traced to religion, gender, ethnicity,
  • 23. Model A/B Testing Set Business Goals Understand Your Data Create Hypothesis Devise Experiment Prepare Data Train-Tune-Test Model Deploy Model Measure / Evaluate Results • A/B testing – comparing two versions to see what performs better • Historical data works for evaluating models in testing, but production experiments required to validate model hypothesis • Model update process • Benchmark (or Shadow Models) • Phase-In (20% traffic) • Big Bang A/B Framework should support these steps
  • 24. • Monitoring is the process of observing the model’s performance, logging it’s behavior and alerting when the model degrades • Logging should log exactly the data feed into the model at the time of scoring • Model alerting is critical to detect unusual or unexpected behaviors Model Monitoring
  • 25. Open Loop vs Closed Loop • Open Loop – human being involved • Closed Loop – no human involved Model Scoring – almost always closed loop, some models alert agents or customer service Model Training – usually open loop with a data scientist in the loop to update the model
  • 26. Online Learning • closed loop, entirely machine driven modeling is risky • need to have proper model monitoring and safeguards to prevent abuse / sensitivity to noise • Mllib supports online through streaming models (k- means, logistic regression support online) • Alternative – use a more complex model to better fit new data rather than using online learning
  • 28. Architecture #1 Offline Recommendations Train ALS Model Send Offers to Customers Save Offers to NoSQL Ranked Offers Display Ranked Offers in Web / Mobile Nightly Batch
  • 29. Architecture #2 Precomputed Features with Streaming Web Logs Kill User’s Login SessionPre-compute Features Features Spark Streaming
  • 30. Architecture #3 Local Apache Spark(™) Train Model in Spark Save Model to S3 / HDFS New Data Copy Model to Production Predictions Run Spark Local
  • 31. Demo • Example of Offline Recommendations using ALS and Redis as a NoSQL Cache
  • 33. 2016 Apache Spark Survey 33
  • 34. Spark Summit EU Brussels October 25-27 The CFP closes at 11:59pm on July 1st For more information and to submit: https://spark-summit.org/eu-2016/ 34