© 2017 MapR Technologies 1
Why Stream?
and
Machine Learning Logistics
© 2017 MapR Technologies 2
Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Committer, PMC member, board member, ASF
O’Reilly author
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning
© 2017 MapR Technologies 3
Traditional Solution – Use a Profile Database
POS
1..n
Fraud
detector
Last card
use
© 2017 MapR Technologies 4
What Happens as You Scale Up?
POS
1..n
Fraud
detector
Last card
use
POS
1..n
Fraud
detector
POS
1..n
Fraud
detector
© 2017 MapR Technologies 5
Shared Database Can Be A Problem
POS
1..n
Fraud
detector
Last card
use
POS
1..n
Fraud
detector
POS
1..n
Fraud
detector
Shared database
causes problems
Big problem is
disagreement about
schema and indexing
© 2017 MapR Technologies 6
Alternative: Use a Stream to Isolate Services
POS
1..n
Fraud
detector
Last card
use
Updater
card activity
© 2017 MapR Technologies 7
Add New Services via the Stream
POS
1..n
Fraud
detector
Last card
use
Updater
Card
location
history
Other
card activity
© 2017 MapR Technologies 8
Changing Implementation Through Isolation
POS
1..n
Last card
use
Updater
POS
1..n
Last card
use
Updater
card activity
Fraud
detector
Fraud
detector
© 2017 MapR Technologies 12
With MapR, Geo-distributed Data Appears Local
stream
stream
Data
source
Consumer
Global Data Center
Regional Data Center
© 2017 MapR Technologies 13
Use Case: Telecommunications
Callers
Towers
cdr data
© 2017 MapR Technologies 14
Streaming in Telecom
• Data collection & handling happens at different levels
– Tower, local data center, central data center
• Batch: Can take 30 minutes per level
• Streaming: Latency drops to seconds or sub-seconds per level
• Ability to respond as events occur
• MapR Streams enables stream replication with offsets across data
centers
© 2017 MapR Technologies 15
Unique to MapR: Manage Topics at Stream Level
• Many more topics on a MapR cluster
• Topics are grouped together in a Stream (different from Kafka)
• Policies such as time-to-live and ACEs are set at the Stream level
(access control at this level is different from Kafka)
• Geo-distributed stream replication (different from Kafka)
Stream
Topic 1
Topic 3
Topic 2
Image © 2016 Ted Dunning & Ellen Friedman from Chap 5 of O’Reilly book Streaming Architecture used with permission
© 2017 MapR Technologies 16
Use Case: Each pump has many sensors
pump
data
Dashboard
C2
topic = p1
p2
p3
p4
p5
p1
p1
p5
© 2017 MapR Technologies 17
Use topics as an organizing principle
© 2017 MapR Technologies 18
Example
Files
Table
Streams
Directories
Cluster
Volume mount point
© 2017 MapR Technologies 19
Cluster
Volume mount point
© 2017 MapR Technologies 20
Streams should be integrated tightly into
normal persistence
© 2017 MapR Technologies 21
Stream vs Database
• Can be better for flexibility and multi-tenancy
• Streams can be 50–100x faster than a database (no mutation)
• Faster means fewer arguments about performance optimization
• Operations are simpler, so sharing data works better
• Don’t have to commit to one type of db: push updates through
stream and let each group use the db they want
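A minimal sketch of the last point, assuming card-activity events are plain dicts with hypothetical fields `card`, `time`, and `location` (none of these names come from the slides). Each service reads the same stream and builds the view it wants, so no two groups have to agree on a schema or an index:

```python
from collections import defaultdict

# Hypothetical card-activity events, in stream order.
EVENTS = [
    {"card": "c1", "time": 1, "location": "Oslo"},
    {"card": "c2", "time": 2, "location": "Lima"},
    {"card": "c1", "time": 3, "location": "Rome"},
]

def last_card_use(events):
    """View for the fraud detector: most recent event per card."""
    table = {}
    for event in events:
        table[event["card"]] = event  # later events overwrite earlier ones
    return table

def location_history(events):
    """View for a second service: every location seen per card, in order."""
    history = defaultdict(list)
    for event in events:
        history[event["card"]].append(event["location"])
    return dict(history)

print(last_card_use(EVENTS)["c1"]["location"])
print(location_history(EVENTS))
```

Adding a third service means adding a third reader of the stream; the two views above do not change.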
© 2017 MapR Technologies 22
Collect Data
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center
© 2017 MapR Technologies 23
And Transport to Global Analytics
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
© 2017 MapR Technologies 26
With Many Sources
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
log consolidator
web server
Web-
server
Log
web server
Web-
server
Log
log_events
log-stash
log-stash
data center
log consolidator
web server
Web-
server
Log
web server
Web-
server
Log
log_events
log-stash
log-stash
data center
© 2017 MapR Technologies 27
Analytics Doesn’t Care About Location
GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
© 2017 MapR Technologies 28
Analytics Doesn’t Care About Location
Topic:
data-center . machine . sensor
© 2017 MapR Technologies 29
Analytics Doesn’t Care About Location
Topic:
data-center . * . sensor
© 2017 MapR Technologies 30
Analytics Doesn’t Care About Location
Topic:
data-center . machine . *
© 2017 MapR Technologies 31
Analytics Doesn’t Care About Location
Topic:
* . * . sensor
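The wildcard topic patterns on these slides can be sketched with ordinary pattern matching. This is only an illustration, not a MapR or Kafka API: the helper names and the sample topic names are hypothetical, and `*` is assumed to match exactly one dotted component.

```python
import re

def topic_pattern_to_regex(pattern):
    """Compile a dotted topic pattern; '*' matches exactly one component."""
    parts = [p.strip() for p in pattern.split(".")]
    regex_parts = ["[^.]+" if p == "*" else re.escape(p) for p in parts]
    return re.compile(r"\.".join(regex_parts))

def matching_topics(pattern, topics):
    """Return the topics (in input order) that match the pattern."""
    rx = topic_pattern_to_regex(pattern)
    return [t for t in topics if rx.fullmatch(t)]

# Hypothetical topics named data-center.machine.sensor
TOPICS = ["dc1.m1.temp", "dc1.m1.pressure", "dc1.m2.temp", "dc2.m1.temp"]

print(matching_topics("dc1.*.temp", TOPICS))   # one sensor, all machines in dc1
print(matching_topics("dc1.m1.*", TOPICS))     # all sensors on one machine
print(matching_topics("*.*.temp", TOPICS))     # one sensor, everywhere
```

With names organized this way, the analytics code selects data by pattern and never needs to know where a data center is.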
© 2017 MapR Technologies 32
Act locally, learn globally
© 2017 MapR Technologies 33
Machine Learning Logistics
© 2017 MapR Technologies 34
Traditional View
© 2017 MapR Technologies 35
Traditional View: This isn’t the whole story
© 2017 MapR Technologies 36
90% of the effort in successful machine
learning isn’t in the training or model dev…
It’s the logistics
© 2017 MapR Technologies 37
Why?
• Just getting the training data is hard
– Which data? How to make it accessible? Multiple sources!
– New kinds of observations force restarts
– Requires a ton of domain knowledge
• The myth of the unitary model
– You can’t train just one
– You will have dozens of models, likely hundreds or more
– Handoff to new versions is tricky
– You need run-time experience to be sure which one is better

© 2017 MapR Technologies 38
What Machine Learning Tool is Best?
• Most successful groups keep several “favorite” machine
learning tools at hand
– No single tool is best in every situation
• The most important tool is a platform that supports logistics well
– Don’t have to do everything at the application level
– Lots of what matters can be handled at the platform level
• A good design for the logistics can make a big difference
© 2017 MapR Technologies 39
Some Gotchas
• Ops-oriented people will not “get it” regarding modeling
subtleties
• Data scientists will not “get it” regarding operational realities
• Therefore, modelers have to deliver self-contained models
• And, ops has to provide pre-wired structure
© 2017 MapR Technologies 40
Rendezvous Architecture
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results
© 2017 MapR Technologies 41
Rendezvous to the Rescue: Better ML Logistics
• Stream-first architecture is a powerful approach with surprisingly
widespread advantages
– Innovative technologies are emerging for streaming data
• Microservices approach provides flexibility
– Streaming supports microservices (if done right)
• Containers remove surprises
– Predictable environment for running models
© 2017 MapR Technologies 42
Rendezvous: Mainly for Decisioning Engines
• Decisioning models
– Looking for a “right answer”
– Simpler than reinforcement learning
• Examples include:
– Fraud detection
– Predictive analytics / market prediction
– Churn prediction (as in telecommunications)
– Yield optimization
– Deep learning in the form of speech or image recognition, in some cases
© 2017 MapR Technologies 43
What We Ultimately Want
request
response
Model
© 2017 MapR Technologies 44
But This Isn’t The Answer
Model 1
request
response
Load
balancer
Model 2
Model 3
© 2017 MapR Technologies 45
First Try with Streams
Input
Model 1
Model 2
Model 3
request
response
?
© 2017 MapR Technologies 46
First Rendezvous
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results
© 2017 MapR Technologies 47
Some Key Points
• Note that all models see identical inputs
• All models run in production setting
• All models send scores to same stream
• The rendezvous server decides which scores to ignore
• Roll forward, roll back, correlated comparison are all now trivial
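The key points above can be sketched in a few lines. This is an in-memory toy under stated assumptions: plain Python lists stand in for the input and scores streams, and the model names and the `preference` list are hypothetical. Every model scores the identical input, every score lands in one place, and the rendezvous only decides which scores to ignore.

```python
def score_all(request_id, features, models):
    """Every model sees the same input; all scores go to one 'stream' (a list)."""
    return [(request_id, name, fn(features)) for name, fn in models.items()]

def rendezvous(scores, preference):
    """Return (model, score) for the most preferred model that answered."""
    by_model = {model: score for _, model, score in scores}
    for name in preference:
        if name in by_model:
            return name, by_model[name]
    raise LookupError("no preferred model produced a score")

# Hypothetical models that return fixed scores.
MODELS = {"champion": lambda f: 0.9, "challenger": lambda f: 0.8}
scores = score_all("r1", {"amount": 12}, MODELS)

print(rendezvous(scores, ["champion", "challenger"]))
# Rolling forward (or back) is only a change to the preference list:
print(rendezvous(scores, ["challenger", "champion"]))
```

Because both models already scored the same request, the two results can also be compared directly, which is the correlated comparison mentioned above.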
© 2017 MapR Technologies 48
Reality Check, Injecting External State
Model 1
Model 2
Model 3
request
Raw
Add
external
data
Input
Database
The world
© 2017 MapR Technologies 49
Recording Raw Data (as it really was)
Input
Scores
Decoy
Model 2
Model 3
Archive
© 2017 MapR Technologies 50
Quality & Reproducibility of Input Data is Important!
• Recording raw-ish data is really a big deal
– Data as seen by a model is worth gold
– Data reconstructed later often has time-machine leaks
– Databases were made for updates; streams are safer
• Raw data is useful for non-ML cases as well (think flexibility)
• Decoy model records training data as seen by models under
development & evaluation
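A minimal sketch of a decoy, assuming input records are JSON-serializable dicts and the archive is any writable text file-like object. The class and field names are hypothetical. The decoy sits on the input stream like any other model, but it produces no score and only records exactly what it was shown.

```python
import io
import json

class DecoyModel:
    """Looks like a model to the input stream but only archives what it sees."""

    def __init__(self, archive):
        self.archive = archive  # writable text file-like object

    def score(self, record):
        # Append the record as one JSON line, exactly as it arrived.
        self.archive.write(json.dumps(record, sort_keys=True) + "\n")
        return None  # a decoy never contributes a score

# Usage with an in-memory archive standing in for a file or stream.
archive = io.StringIO()
decoy = DecoyModel(archive)
decoy.score({"card": "c1", "amount": 12})
decoy.score({"card": "c2", "amount": 7})
archived = [json.loads(line) for line in archive.getvalue().splitlines()]
print(archived)
```

Training later from this archive, rather than from a database queried after the fact, avoids the time-machine leaks mentioned above.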
© 2017 MapR Technologies 51
Canary for Comparison
Real
model
∆
Result
Canary
Decoy
Archive
Input
© 2017 MapR Technologies 52
What Does the Canary Do?
• The canary is a real model, but is very rarely updated
• The canary results are almost never used for decisioning
• The virtue of the canary is stability
• Comparing to the canary results gives insight into new models
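One way to picture the comparison, assuming scores are kept per request id in plain dicts (the function names and sample values are hypothetical). The canary changes very rarely, so a shift in these deltas points at the new model or at the incoming data.

```python
from statistics import mean

def canary_deltas(model_scores, canary_scores):
    """Per-request score difference for requests that both models scored."""
    common = model_scores.keys() & canary_scores.keys()
    return {rid: model_scores[rid] - canary_scores[rid] for rid in common}

def summarize(deltas):
    """Summary of the deltas; expects at least one delta."""
    values = list(deltas.values())
    return {
        "n": len(values),
        "mean_delta": mean(values),
        "max_abs_delta": max(abs(v) for v in values),
    }

# Hypothetical scores keyed by request id.
new_model = {"r1": 0.75, "r2": 0.5, "r3": 0.25}
canary = {"r1": 0.5, "r2": 0.5}

deltas = canary_deltas(new_model, canary)
print(summarize(deltas))
```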
© 2017 MapR Technologies 53
Isolated Development With Stream Replication
Model 1
Model 2
Model 3
request
Raw
Add
external
data
Input
Internal 1
Internal 2
Internal 3
The world
Model 4
Raw
New
external
data
Input
Internal 4
Production
Development
© 2017 MapR Technologies 54
A Quick Review
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results
Proxy
© 2017 MapR Technologies 55
The Proxy Talks to the Outside World
© 2017 MapR Technologies 56
The Input Stream Feeds All Models Identically
© 2017 MapR Technologies 57
The Scores Stream Contains All Results
© 2017 MapR Technologies 58
The Rendezvous Picks A Result
© 2017 MapR Technologies 59
Results Return Via A Stream and Return Address
© 2017 MapR Technologies 60
Models in production live in the real
world:
Conditions may (will) change
© 2017 MapR Technologies 61
Rendezvous Schedules
• The key idea of rendezvous schedules is to define the trade-off
of latency versus model priority
– At short delays, we want the best
– At moderate delays we will compromise a bit
– Near the deadline, we will take any answer at all
• Normally the same rendezvous schedules apply to all
transactions
– Overriding default schedule has bona fide uses
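A rendezvous schedule can be sketched as a list of steps, each saying which models are acceptable once a given amount of time has passed. The step times, model names, and the `pick` helper below are assumptions for illustration, not values from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScheduleStep:
    after_ms: float     # step applies once this much time has elapsed
    acceptable: tuple   # model names acceptable from then on, best first

# Hypothetical default schedule, sorted by after_ms.
DEFAULT_SCHEDULE = (
    ScheduleStep(0, ("champion",)),                           # short delay: only the best
    ScheduleStep(20, ("champion", "challenger")),             # moderate delay: compromise
    ScheduleStep(40, ("champion", "challenger", "fallback")), # near deadline: anything
)

def pick(schedule, elapsed_ms, available):
    """Best acceptable model that has answered, or None to keep waiting."""
    acceptable = ()
    for step in schedule:
        if elapsed_ms >= step.after_ms:
            acceptable = step.acceptable
    for name in acceptable:
        if name in available:
            return name
    return None

print(pick(DEFAULT_SCHEDULE, 5, {"fallback": 0.4}))                      # still waiting
print(pick(DEFAULT_SCHEDULE, 25, {"challenger": 0.7, "fallback": 0.4}))
print(pick(DEFAULT_SCHEDULE, 45, {"fallback": 0.4}))
```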
© 2017 MapR Technologies 62
Rendezvous Overrides
• Incoming transaction can carry an overriding schedule
– This is great for QA, to see output from a specific model
– Overriding the default schedule is also good for systemic A/B tests
• Overrides should be unusual
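A small sketch of how an override might travel with a transaction, assuming transactions are dicts and using a hypothetical `schedule_override` field. The second helper measures how often overrides occur, since they are meant to be unusual.

```python
def effective_schedule(transaction, default_schedule):
    """Use the schedule carried by the transaction, else the default."""
    override = transaction.get("schedule_override")
    return override if override else default_schedule

def override_rate(transactions):
    """Fraction of transactions that carry their own schedule."""
    if not transactions:
        return 0.0
    with_override = sum(1 for t in transactions if t.get("schedule_override"))
    return with_override / len(transactions)

DEFAULT = ["champion", "challenger", "fallback"]   # hypothetical default order
normal = {"id": "t1"}
qa_probe = {"id": "t2", "schedule_override": ["challenger"]}  # QA wants one model

print(effective_schedule(normal, DEFAULT))
print(effective_schedule(qa_probe, DEFAULT))
print(override_rate([normal, normal, normal, qa_probe]))
```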
© 2017 MapR Technologies 63
Scaling Up
• More kinds of models
– Multiple rendezvous frameworks for different tasks
• More throughput
– Fast default models
– Partition input stream to allow parallel model evaluation
– Input batching
• Extreme volumes require extreme measures
– Cannibalize fancy models to run more fast/simple models
– Speed before beauty
© 2017 MapR Technologies 64
Faster Throughput Through Failure
• Suppose we have one model that can handle 10,000 t/s @ 2ms
– But this isn’t the most accurate model. Not bad, but not best
• And our champion model can handle 1000 t/s @ 10ms
• Then imagine a burst of 2000 t/s for several minutes
• Champion can only evaluate half of all requests
– Should skip to keep up
– Fast model will cover for champion
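The arithmetic behind this slide, as a rough sketch. The throughput figures come from the bullets above; the 180-second burst length is an assumption standing in for "several minutes", and the function names are hypothetical.

```python
def split_load(arrivals_per_s, champion_per_s, fast_per_s):
    """Fractions of requests answered by the champion, by the fast model
    (covering for requests the champion skipped), and by nobody."""
    by_champion = min(arrivals_per_s, champion_per_s)
    skipped = arrivals_per_s - by_champion
    covered = min(skipped, fast_per_s)
    return {
        "champion": by_champion / arrivals_per_s,
        "fast": covered / arrivals_per_s,
        "unanswered": (skipped - covered) / arrivals_per_s,
    }

def queue_delay_s(arrivals_per_s, capacity_per_s, burst_s):
    """Extra delay at the end of a burst if a model queues instead of skipping."""
    backlog = max(0, arrivals_per_s - capacity_per_s) * burst_s
    return backlog / capacity_per_s

# Burst of 2000 t/s; champion handles 1000 t/s, fast model 10,000 t/s.
print(split_load(2000, 1000, 10_000))
# If the champion queued everything during an assumed 180 s burst:
print(queue_delay_s(2000, 1000, 180))
```

Under these assumptions a champion that refuses to skip ends the burst about three minutes behind, far past any useful deadline, while skipping keeps it current on half the traffic and the fast model answers the rest.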
© 2017 MapR Technologies 65
Input Scores
Model 1
Model 2
Model 3
© 2017 MapR Technologies 68
Always have a default or
fallback model
Models that fall behind should
discard requests to catch up
© 2017 MapR Technologies 69
Limitations of Rendezvous
• 100% speculative execution can be expensive
– Can be mitigated by partial speculation
– Or it may just be too expensive
• Minimum Viable Products should be minimal
– You may not require zero downtime … be realistic
• Context may be too large
• Latency limits may be too stringent
© 2017 MapR Technologies 70
Ad Targeting Example
Detailed
scoring
Proxy Pre-select
1
2
Sharded Ad Scoring
3
User
Profile
Ads
User profile and context used
for rough-cut selection of ads
Roughly 1000 ads are scored in
detail for p(click)
© 2017 MapR Technologies 71
Why Not Full Rendezvous?
• 1000s of ads / second × 1000 candidates = 1M+ scores / second
– AKA “a lot”
• Scoring a single model is expensive
• Sharding and replication provide a form of failure tolerance
• Full speculative execution across several options is prohibitive
• Latency guarantees can be very short (10 ms)
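The scoring load can be worked out directly. The sketch below takes 1000 requests per second as the low end of "1000s" and treats the number of models run speculatively as a hypothetical parameter.

```python
def scores_per_second(requests_per_s, candidates_per_request, speculative_models=1):
    """Detailed scores needed per second when every candidate ad is scored
    by every speculatively executed model."""
    return requests_per_s * candidates_per_request * speculative_models

print(scores_per_second(1000, 1000))      # one model
print(scores_per_second(1000, 1000, 3))   # full rendezvous with three models
```

Each additional model under full speculation multiplies an already large number, which is why the full rendezvous pattern is a poor fit here.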
© 2017 MapR Technologies 72
Rendezvous-lite Options
• We have some options
• We can allow selective speculation on marked requests
– If only 1% of ads run speculative execution, we can pack 10x more
shards per node and use 10x fewer nodes
– Selective speculation doesn’t give redundancy
• We can release results if >80% of shards reply
• Temporary speculation during hand-offs is useful
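The "release if more than 80% of shards reply" option might look like the following sketch. The function names, the ad ids, and the reply format (lists of `(ad_id, p_click)` pairs per shard) are assumptions for illustration.

```python
def ready_to_release(replies, shards, quorum=0.8):
    """True once strictly more than the quorum fraction of shards has replied."""
    return replies > quorum * shards

def merge_top_k(shard_replies, k):
    """Merge per-shard (ad_id, p_click) lists and keep the k best ads."""
    merged = [pair for reply in shard_replies for pair in reply]
    return sorted(merged, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical replies from two of the shards.
replies = [[("a", 0.1), ("b", 0.5)], [("c", 0.3)]]

print(ready_to_release(8, 10))    # exactly 80% is not enough
print(ready_to_release(9, 10))
print(merge_top_k(replies, 2))
```

The trade-off is that ads held only by a shard that has not replied cannot win that request, in exchange for a bounded wait.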
© 2017 MapR Technologies 73
Let’s Review
© 2017 MapR Technologies 74
A Quick Review
© 2017 MapR Technologies 75
The Proxy Talks to the Outside World
© 2017 MapR Technologies 76
The Input Stream Feeds All Models Identically
© 2017 MapR Technologies 77
The Scores Stream Contains All Results
© 2017 MapR Technologies 78
The Rendezvous Picks A Result
© 2017 MapR Technologies 79
Results Return Via A Stream and Return Address
© 2017 MapR Technologies 80
Not Such Bad Ideas
• Keep models running “in the wings”
– Don’t wait until conditions change to start building the next model
– Keep new short-history models ready to roll, some graybeards as well
• Hot hand-off
– With rendezvous: just stop ignoring the new best model
• Deploy a canary server
– Keep an old model active as a reference
– If it was 90% correct, the difference from any better model should be small
– Score distribution should be roughly constant
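Why the difference from the canary should be small can be made concrete, assuming every request has a single right answer. Two models can only disagree when at least one of them is wrong, and they must disagree at least as often as their accuracies differ. The function name and the 95% figure for the newer model are hypothetical.

```python
def disagreement_bounds(canary_pct_correct, model_pct_correct):
    """Lower and upper bounds, in percent of requests, on how often the two
    models can disagree when each request has a single right answer."""
    lower = abs(model_pct_correct - canary_pct_correct)
    upper = min(100, (100 - canary_pct_correct) + (100 - model_pct_correct))
    return lower, upper

# Canary 90% correct, hypothetical newer model 95% correct.
print(disagreement_bounds(90, 95))
```

A disagreement rate well outside these bounds suggests that something other than model quality has changed, such as the input data.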
© 2017 MapR Technologies 81
New book: how to manage machine learning models
Download free pdf or read free online via @MapR:
https://mapr.com/ebook/machine-learning-logistics/
and
“Rendezvous Architecture” by Ted Dunning & Ellen Friedman, in
Encyclopedia of Big Data Technologies. Sherif Sakr and Albert
Zomaya, editors. Springer International Publishing, in press 2018.
© 2017 MapR Technologies 83
Q&A
@mapr
tdunning@mapr.com
ENGAGE WITH US
@Ted_Dunning

How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache MahoutTed Dunning
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Ted Dunning
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveTed Dunning
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionTed Dunning
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesTed Dunning
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationTed Dunning
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7Ted Dunning
 

More from Ted Dunning (14)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache Mahout
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the Hive
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly Detection
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search engines
 
Using Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for RecommendationUsing Mahout and a Search Engine for Recommendation
Using Mahout and a Search Engine for Recommendation
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 

Recently uploaded

unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Recently uploaded (20)

unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Streaming Architecture including Rendezvous for Machine Learning

  • 1. © 2017 MapR Technologies 1 Why Stream? and Machine Learning Logistics
  • 2. © 2017 MapR Technologies 2 Contact Information Ted Dunning, PhD Chief Application Architect, MapR Technologies Committer, PMC member, board member, ASF O’Reilly author Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning
  • 3. © 2017 MapR Technologies 3 Traditional Solution – Use a Profile Database POS 1..n Fraud detector Last card use
  • 4. © 2017 MapR Technologies 4 What Happens as You Scale Up? POS 1..n Fraud detector Last card use POS 1..n Fraud detector POS 1..n Fraud detector
  • 5. © 2017 MapR Technologies 5 Shared Database Can Be A Problem POS 1..n Fraud detector Last card use POS 1..n Fraud detector POS 1..n Fraud detector Shared database causes problems Big problem is disagreement about schema and indexing
  • 6. © 2017 MapR Technologies 6 Alternative: Use a Stream to Isolate Services POS 1..n Fraud detector Last card use Updater card activity
  • 7. © 2017 MapR Technologies 7 Add New Services via the Stream POS 1..n Fraud detector Last card use Updater Card location history Other card activity
  • 8. © 2017 MapR Technologies 8 Changing Implementation Through Isolation POS 1..n Last card use Updater POS 1..n Last card use Updater card activity Fraud detector Fraud detector
  • 9. © 2017 MapR Technologies 9 Changing Implementation Through Isolation POS 1..n Last card use Updater POS 1..n Last card use Updater card activity Fraud detector Fraud detector
  • 10. © 2017 MapR Technologies 10 With MapR, Geo-Distributed Data Appears Local stream Data source Consumer
  • 11. © 2017 MapR Technologies 11 With MapR, Geo-Distributed Data Appears Local stream stream Data source Consumer
  • 12. © 2017 MapR Technologies 12 With MapR, Geo-distributed Data Appears Local stream stream Data source Consumer Global Data Center Regional Data Center
  • 13. © 2017 MapR Technologies 13 Use Case: Telecommunications Callers Towers cdr data
  • 14. © 2017 MapR Technologies 14 Streaming in Telecom • Data collection & handling happens at different levels (tower, local data center, central data center) • Batch: Can take 30 minutes per level • Streaming: Latency drops to seconds or sub-seconds per level • Ability to respond as events occur • MapR Streams enables stream replication with offsets across data centers
  • 15. © 2017 MapR Technologies 15 Unique to MapR: Manage Topics at Stream Level • Many more topics on MapR cluster • Topics are grouped together in Stream (different from Kafka) • Policies set at the Stream level such as time-to-live, ACEs (controlled access at this level is different than Kafka) • Geo-distributed stream replication (different from Kafka) Stream Topic 1 Topic 3 Topic 2 Image © 2016 Ted Dunning & Ellen Friedman from Chap 5 of O’Reilly book Streaming Architecture used with permission
  • 16. © 2017 MapR Technologies 16 Use Case: Each pump has many sensors pump data Dashboard C2 topic = p1 p2 p3 p4 p5 p1 p1 p5
  • 17. © 2017 MapR Technologies 17 Use topics as an organizing principle
  • 18. © 2017 MapR Technologies 18 Example Files Table Streams Directories Cluster Volume mount point
  • 19. © 2017 MapR Technologies 19 Cluster Volume mount point
  • 20. © 2017 MapR Technologies 20 Streams should be integrated tightly into normal persistence
  • 21. © 2017 MapR Technologies 21 Stream vs Database • Can be better for flexibility and multi-tenancy • Streams can be 50 – 100x faster than db (no mutation) • Faster means fewer arguments about performance optimization • Operations are simpler, so sharing data works better • Don’t have to commit to one type of db: push updates through stream and let each group use the db they want
  • 22. © 2017 MapR Technologies 22 Collect Data log consolidator web server web server Web- server Log Web- server Log log_events log-stash log-stash data center
  • 23. © 2017 MapR Technologies 23 And Transport to Global Analytics log consolidator web server web server Web- server Log Web- server Log log_events log-stash log-stash data center GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection
  • 24. © 2017 MapR Technologies 24 With Many Sources log consolidator web server web server Web- server Log Web- server Log log_events log-stash log-stash data center GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection
  • 25. © 2017 MapR Technologies 25 With Many Sources log consolidator web server web server Web- server Log Web- server Log log_events log-stash log-stash data center GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection log consolidator web server Web- server Log web server Web- server Log log_events log-stash log-stash data center
  • 26. © 2017 MapR Technologies 26 With Many Sources log consolidator web server web server Web- server Log Web- server Log log_events log-stash log-stash data center GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection log consolidator web server Web- server Log web server Web- server Log log_events log-stash log-stash data center log consolidator web server Web- server Log web server Web- server Log log_events log-stash log-stash data center
  • 27. © 2017 MapR Technologies 27 Analytics Doesn’t Care About Location GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection
  • 28. © 2017 MapR Technologies 28 Analytics Doesn’t Care About Location GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection Topic: data-center . machine . sensor
  • 29. © 2017 MapR Technologies 29 Analytics Doesn’t Care About Location GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection Topic: data-center . *. sensor
  • 30. © 2017 MapR Technologies 30 Analytics Doesn’t Care About Location GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection Topic: data-center . machine. *
  • 31. © 2017 MapR Technologies 31 Analytics Doesn’t Care About Location GHQ log_events events Elaborate events (log-stash) Aggregate Signal detection Topic: * . *. sensor
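
A minimal sketch of the wildcard topic selection shown on the "Analytics Doesn’t Care About Location" slides, assuming dot-separated topic names of the form data-center.machine.sensor. The helper names and sample topics are hypothetical and only illustrate the matching idea; this is not the MapR Streams API.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a dot-separated pattern where * matches exactly one segment."""
    parts = [p.strip() for p in pattern.split(".")]
    regex_parts = ["[^.]+" if p == "*" else re.escape(p) for p in parts]
    return re.compile(r"\.".join(regex_parts))

def select_topics(pattern, topics):
    """Return the topics whose full name matches the pattern."""
    rx = pattern_to_regex(pattern)
    return [t for t in topics if rx.fullmatch(t)]

# Hypothetical topic names: data-center . machine . sensor
TOPICS = ["dc1.m1.temp", "dc1.m2.temp", "dc2.m1.temp", "dc1.m1.pressure"]

one_sensor_in_dc1 = select_topics("dc1 . *. temp", TOPICS)
one_machine_all_sensors = select_topics("dc1 . m1. *", TOPICS)
one_sensor_everywhere = select_topics("* . *. temp", TOPICS)
```

The analytics code only changes the pattern, not the data flow, which is the sense in which it does not care where the data was produced.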
  • 32. © 2017 MapR Technologies 32 Act locally, learn globally
  • 33. © 2017 MapR Technologies 33 Machine Learning Logistics
  • 34. © 2017 MapR Technologies 34 Traditional View
  • 35. © 2017 MapR Technologies 35 Traditional View: This isn’t the whole story
  • 36. © 2017 MapR Technologies 36 90% of the effort in successful machine learning isn’t in the training or model dev… It’s the logistics
  • 37. © 2017 MapR Technologies 37 Why? • Just getting the training data is hard – Which data? How to make it accessible? Multiple sources! – New kinds of observations force restarts – Requires a ton of domain knowledge • The myth of the unitary model – You can’t train just one – You will have dozens of models, likely hundreds or more – Handoff to new versions is tricky – You have to get run-time to be sure about which is better 
  • 38. © 2017 MapR Technologies 38 What Machine Learning Tool is Best? • Most successful groups keep several “favorite” machine learning tools at hand – No single tool is best in every situation • The most important tool is a platform that supports logistics well – Don’t have to do everything at the application level – Lots of what matters can be handled at the platform level • A good design for the logistics can make a big difference
  • 39. © 2017 MapR Technologies 39 Some Gotchas • Ops-oriented people will not “get it” regarding modeling subtleties • Data scientists will not “get it” regarding operational realities • Therefore, modelers have to deliver self-contained models • And, ops has to provide pre-wired structure
  • 40. © 2017 MapR Technologies 40 Rendezvous Architecture Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results
  • 41. © 2017 MapR Technologies 41 Rendezvous to the Rescue: Better ML Logistics • Stream-first architecture is a powerful approach with surprisingly widespread advantages – Innovative technologies emerging for streaming data • Microservices approach provides flexibility – Streaming supports microservices (if done right) • Containers remove surprises – Predictable environment for running models
  • 42. © 2017 MapR Technologies 42 Rendezvous: Mainly for Decisioning Engines • Decisioning models – Looking for a “right answer” – Simpler than reinforcement learning • Examples include: – Fraud detection – Predictive analytics / market prediction – Churn prediction (as in telecommunications) – Yield optimization – Deep learning in form of speech or image recognition, in some cases
  • 43. © 2017 MapR Technologies 43 What We Ultimately Want request response Model
  • 44. © 2017 MapR Technologies 44 But This Isn’t The Answer Model 1 request response Load balancer Model 2 Model 3
  • 45. © 2017 MapR Technologies 45 First Try with Streams Input Model 1 Model 2 Model 3 request response ?
  • 46. © 2017 MapR Technologies 46 First Rendezvous Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results
  • 47. © 2017 MapR Technologies 47 Some Key Points • Note that all models see identical inputs • All models run in production setting • All models send scores to same stream • The rendezvous server decides which scores to ignore • Roll forward, roll back, correlated comparison are all now trivial
  • 48. © 2017 MapR Technologies 48 Reality Check, Injecting External State Model 1 Model 2 Model 3 request Raw Add external data Input Database The world
  • 49. © 2017 MapR Technologies 49 Recording Raw Data (as it really was) Input Scores Decoy Model 2 Model 3 Archive
  • 50. © 2017 MapR Technologies 50 Quality & Reproducibility of Input Data is Important! • Recording raw-ish data is really a big deal – Data as seen by a model is worth gold – Data reconstructed later often has time-machine leaks – Databases were made for updates, streams are safer • Raw data is useful for non-ML cases as well (think flexibility) • Decoy model records training data as seen by models under development & evaluation
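
The decoy idea above can be sketched as a "model" that scores nothing and only archives each input exactly as it arrived. This is an illustration under stated assumptions: the record fields are hypothetical, and an in-memory text buffer stands in for the archive that would normally be a file or stream.

```python
import io
import json

def decoy_model(input_records, archive):
    """Write each input record to the archive unchanged, one JSON line each.

    Returns the number of records archived. No score is produced:
    the decoy exists only to capture data as the real models saw it.
    """
    count = 0
    for record in input_records:
        archive.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count

# Hypothetical input records, as they would appear on the input stream.
records = [
    {"request_id": "r1", "card": "1234", "amount": 12.5},
    {"request_id": "r2", "card": "9876", "amount": 250.0},
]
archive = io.StringIO()
archived = decoy_model(records, archive)
replayed = [json.loads(line) for line in archive.getvalue().splitlines()]
```

Because the archive is append-only, replaying it later gives training data without the time-machine leaks that come from reconstructing inputs out of an updated database.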
  • 51. © 2017 MapR Technologies 51 Canary for Comparison Real model ∆ Result Canary Decoy Archive Input
  • 52. © 2017 MapR Technologies 52 What Does the Canary Do? • The canary is a real model, but is very rarely updated • The canary results are almost never used for decisioning • The virtue of the canary is stability • Comparing to the canary results gives insight into new models
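
A rough sketch of the canary comparison: because the canary and a new model see identical inputs, their scores can be paired by request and the difference summarized. The request ids and the mean absolute difference metric are assumptions chosen for illustration; the slides do not prescribe a specific statistic.

```python
from statistics import mean

def canary_delta(canary_scores, model_scores):
    """Mean absolute score difference over requests both models answered.

    Both arguments map request id -> score. Returns None when the two
    models have no requests in common.
    """
    diffs = [
        abs(canary_scores[req] - model_scores[req])
        for req in canary_scores
        if req in model_scores
    ]
    return mean(diffs) if diffs else None

# Hypothetical scores keyed by request id.
canary = {"r1": 0.9, "r2": 0.1}
challenger = {"r1": 0.8, "r2": 0.3, "r3": 0.5}
delta = canary_delta(canary, challenger)
```

A delta that drifts upward while the canary itself is unchanged points at the new model or at changing input data, which is the insight the stable reference is meant to provide.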
  • 53. © 2017 MapR Technologies 53 Isolated Development With Stream Replication Model 1 Model 2 Model 3 request Raw Add external data Input Internal 1 Internal 2 Internal 3 The world Model 4 Raw New external data Input Internal 4 Production Development
  • 54. © 2017 MapR Technologies 54 A Quick Review Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 55. © 2017 MapR Technologies 55 The Proxy Talks to the Outside World Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 56. © 2017 MapR Technologies 56 The Input Stream Feeds All Models Identically Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 57. © 2017 MapR Technologies 57 The Scores Stream Contains All Results Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 58. © 2017 MapR Technologies 58 The Rendezvous Picks A Result Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 59. © 2017 MapR Technologies 59 Results Return Via A Stream and Return Address Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 60. © 2017 MapR Technologies 60 Models in production live in the real world: Conditions may (will) change
  • 61. © 2017 MapR Technologies 61 Rendezvous Schedules • The key idea of rendezvous schedules is to define the trade-off of latency versus model priority – At short delays, we want the best – At moderate delays we will compromise a bit – Near the deadline, we will take any answer at all • Normally the same rendezvous schedules apply to all transactions – Overriding default schedule has bona fide uses
  • 62. © 2017 MapR Technologies 62 Rendezvous Overrides • Incoming transaction can carry an overriding schedule – This is great for QA, to see output from a specific model – Overriding the default schedule is also good for systemic A/B tests • Overrides should be unusual
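
One way to picture a rendezvous schedule and an override is as a list of (deadline, acceptable models in priority order) steps. This is a sketch under assumptions: the model names, the millisecond deadlines, and the `pick_result` function are hypothetical and not taken from the slides.

```python
# Each step: (elapsed-time limit in ms, models acceptable up to that limit,
# listed in priority order). Later steps accept more models.
DEFAULT_SCHEDULE = [
    (10, ["champion"]),
    (30, ["champion", "challenger"]),
    (50, ["champion", "challenger", "fast_default"]),
]

def pick_result(scores, elapsed_ms, schedule=DEFAULT_SCHEDULE):
    """Return (model, score) to release now, or None to keep waiting.

    scores maps model name -> score for the results received so far.
    """
    for limit_ms, acceptable in schedule:
        if elapsed_ms <= limit_ms:
            for model in acceptable:
                if model in scores:
                    return model, scores[model]
            return None  # nothing acceptable yet at this delay
    # Past the last limit: take any answer at all.
    for model, score in scores.items():
        return model, score
    return None

# A transaction can carry its own schedule, for example for QA on one model.
QA_OVERRIDE = [(50, ["challenger"])]

early = pick_result({"fast_default": 0.4}, elapsed_ms=5)
moderate = pick_result({"fast_default": 0.4, "challenger": 0.6}, elapsed_ms=20)
late = pick_result({"fast_default": 0.4}, elapsed_ms=45)
forced = pick_result({"champion": 0.7, "challenger": 0.6}, 5, schedule=QA_OVERRIDE)
```

Rolling a new model forward or back then amounts to editing the priority lists, since every model is already running and scoring the same inputs.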
  • 63. © 2017 MapR Technologies 63 Scaling Up • More kinds of model – multiple rendezvous frameworks for different tasks • More throughput – Fast default models – Partition input stream to allow parallel model evaluation – Input batching • Extreme volumes require extreme measures – Cannibalize fancy models to run more fast/simple models – Speed before beauty
  • 64. © 2017 MapR Technologies 64 Faster Throughput Through Failure • Suppose we have one model that can handle 10,000 t/s @ 2ms – But this isn’t the most accurate model. Not bad, but not best • And our champion model can handle 1000 t/s @ 10ms • Then imagine a burst of 2000 t/s for several minutes • Champion can only evaluate half of all requests – Should skip to keep up – Fast model will cover for champion
  • 65. © 2017 MapR Technologies 65 Input Scores Model 1 Model 2 Model 3
  • 66. © 2017 MapR Technologies 66 Input Scores Model 1 Model 2 Model 3
  • 67. © 2017 MapR Technologies 67 Input Scores Model 1 Model 2 Model 3
  • 68. © 2017 MapR Technologies 68 Always have a default or fallback model Models that fall behind should discard requests to catch up
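
The "discard requests to catch up" rule can be simulated in a few lines. The sketch below makes simplifying assumptions: a single model instance, a fixed service time per request standing in for its throughput limit, and a hypothetical `max_lag_ms` threshold beyond which a request is skipped and left to the fallback model.

```python
def process_with_skipping(requests, service_ms, max_lag_ms):
    """Simulate one model that skips requests it is already too late for.

    requests is a list of (request_id, arrival_ms) sorted by arrival time.
    Returns (scored_ids, skipped_ids).
    """
    clock = 0.0
    scored, skipped = [], []
    for req_id, arrival_ms in requests:
        clock = max(clock, arrival_ms)      # idle until the request arrives
        if clock - arrival_ms > max_lag_ms:
            skipped.append(req_id)          # too stale: let the fallback cover it
            continue
        clock += service_ms                 # time spent scoring this request
        scored.append(req_id)
    return scored, skipped

# Burst of 2000 requests/s (one every 0.5 ms) against a model that can
# sustain about 1000 requests/s (1 ms of service time per request).
burst = [(i, i * 0.5) for i in range(1000)]
scored, skipped = process_with_skipping(burst, service_ms=1.0, max_lag_ms=0.5)
```

In this simulation the model scores roughly half of the burst and never falls further behind, while every skipped request still gets an answer from the fast default model through the rendezvous.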
  • 69. © 2017 MapR Technologies 69 Limitations of Rendezvous • 100% speculative execution can be expensive – Can be mitigated by partial speculation – Or it may just be too expensive • Minimum Viable Products should be minimal – You may not require zero downtime … be realistic • Context may be too large • Latency limits may be too stringent
  • 70. © 2017 MapR Technologies 70 Ad Targeting Example Detailed scoring Proxy Pre-select 1 2 Sharded Ad Scoring 3 User Profile Ads User profile and context used for rough-cut selection of ads Roughly 1000 ads are scored in detail for p(click)
  • 71. © 2017 MapR Technologies 71 Why Not Full Rendezvous? • 1000’s of ads / second x 1000 candidates = 1M scores / second – AKA “a lot” • Scoring a single model is expensive • Sharding and replication provides a form of failure tolerance • Full speculative execution across several options is prohibitive • Latency guarantees can be very short (10 ms)
  • 72. © 2017 MapR Technologies 72 Rendezvous-lite Options • We have some options • We can allow selective speculation on marked requests – If only 1% of ads run speculative execution, we can pack 10x more shards per node and use 10x fewer nodes – Selective speculation doesn’t give redundancy • We can release results if >80% of shards reply • Temporary speculation during hand-offs is useful
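
The load arithmetic from the ad targeting slides and the ">80% of shards reply" release rule can be written out as a short sketch. The function names are hypothetical, and the figure of 1000 requests per second is the low end of the "1000’s of ads / second" stated on the slide.

```python
def scores_per_second(requests_per_second, candidates_per_request):
    """Detailed scores needed per second when every candidate ad is scored."""
    return requests_per_second * candidates_per_request

def ready_to_release(replies_received, total_shards, threshold=0.8):
    """True once strictly more than the threshold fraction of shards replied."""
    return replies_received / total_shards > threshold

load = scores_per_second(1000, 1000)       # 1000 requests/s x 1000 candidates
release_at_9_of_10 = ready_to_release(9, 10)
release_at_8_of_10 = ready_to_release(8, 10)
```

Releasing on a partial quorum trades a small loss in ad coverage for a bounded wait, which matters when the latency guarantee is on the order of 10 ms.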
  • 73. © 2017 MapR Technologies 73 Let’s Review
  • 74. © 2017 MapR Technologies 74 A Quick Review Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 75. © 2017 MapR Technologies 75 The Proxy Talks to the Outside World Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 76. © 2017 MapR Technologies 76 The Input Stream Feeds All Models Identically Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 77. © 2017 MapR Technologies 77 The Scores Stream Contains All Results Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 78. © 2017 MapR Technologies 78 The Rendezvous Picks A Result Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 79. © 2017 MapR Technologies 79 Results Return Via A Stream and Return Address Input Scores Rendezvous Model 1 Model 2 Model 3 request response Results Proxy
  • 80. © 2017 MapR Technologies 80 Not Such Bad Ideas • Keep models running “in the wings” – Don’t wait until conditions change to start building the next model – Keep new short-history models ready to roll, some graybeards as well • Hot hand-off – With rendezvous: just stop ignoring the new best model • Deploy a canary server – Keep an old model active as a reference – If it was 90% correct, difference with any better model should be small – Score distribution should be roughly constant
  • 81. © 2017 MapR Technologies 81 New book: how to manage machine learning models. Download free pdf or read free online via @MapR: https://mapr.com/ebook/machine-learning-logistics/ and “Rendezvous Architecture” by Ted Dunning & Ellen Friedman, in Encyclopedia of Big Data Technologies. Sherif Sakr and Albert Zomaya, editors. Springer International Publishing, in press 2018.
  • 82. © 2017 MapR Technologies 82 Contact Information Ted Dunning, PhD Chief Application Architect, MapR Technologies Committer, PMC member, board member, ASF O’Reilly author Email tdunning@mapr.com tdunning@apache.org Twitter @Ted_Dunning
  • 83. © 2017 MapR Technologies 83 Q&A @mapr tdunning@mapr.com ENGAGE WITH US @Ted_Dunning