Streaming Architecture including Rendezvous for Machine Learning

© 2017 MapR Technologies
Why Stream? and Machine Learning Logistics

Contact Information
Ted Dunning, PhD
Chief Application Architect, MapR Technologies
Committer, PMC member, board member, ASF
O’Reilly author
Email tdunning@mapr.com tdunning@apache.org
Twitter @Ted_Dunning

Traditional Solution – Use a Profile Database
POS
1..n
Fraud
detector
Last card
use

What Happens as You Scale Up?
POS
1..n
Fraud
detector
Last card
use
POS
1..n
Fraud
detector
POS
1..n
Fraud
detector

Shared Database Can Be A Problem
POS
1..n
Fraud
detector
Last card
use
POS
1..n
Fraud
detector
POS
1..n
Fraud
detector
Shared database
causes problems
Big problem is
disagreement about
schema and indexing

Alternative: Use a Stream to Isolate Services
POS
1..n
Fraud
detector
Last card
use
Updater
card activity

Add New Services via the Stream
POS
1..n
Fraud
detector
Last card
use
Updater
Card
location
history
Other
card activity

Changing Implementation Through Isolation
POS
1..n
Last card
use
Updater
POS
1..n
Last card
use
Updater
card activity
Fraud
detector
Fraud
detector

With MapR, Geo-distributed Data Appears Local
stream
stream
Data
source
Consumer
Global Data Center
Regional Data Center

Use Case: Telecommunications
Callers
Towers
cdr data

Streaming in Telecom
• Data collection & handling happens at different levels
– tower, local data center, central data center
• Batch: Can take 30 minutes per level
• Streaming: Latency drops to seconds or sub-seconds per level
• Ability to respond as events occur
• MapR Streams enables stream replication with offsets across data
centers

Unique to MapR: Manage Topics at Stream Level
• Many more topics on MapR cluster
• Topics are grouped together in Stream (different from Kafka)
• Policies set at the Stream level such as time-to-live, ACEs (controlled
access at this level is different than Kafka)
• Geo-distributed stream replication (different from Kafka)
Stream
Topic 1
Topic 3
Topic 2
Image © 2016 Ted Dunning & Ellen Friedman from Chap 5 of O’Reilly book Streaming Architecture used with permission

Use Case: Each pump has many sensors
pump
data
Dashboard
C2
topic = p1
p2
p3
p4
p5
p1
p1
p5

Use topics as an organizing principle

Example
Files
Table
Streams
Directories
Cluster
Volume mount point

Cluster
Volume mount point

Streams should be integrated tightly into
normal persistence

Stream vs Database
• Can be better for flexibility and multi-tenancy
• Streams can be 50 – 100x faster than db (no mutation)
• Faster means fewer arguments about performance optimization
• Operations are simpler, so sharing data works better
• Don’t have to commit to one type of db: push updates through
stream and let each group use the db they want
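
A minimal sketch of the last point, assuming an in-memory list stands in for the stream and plain dictionaries stand in for each group's database. All names here (`card_activity`, `last_use_view`, `location_history_view`) are hypothetical and not part of any MapR or Kafka API.

```python
from collections import defaultdict

# An append-only list stands in for the stream; events are never mutated.
card_activity = []

def publish(event):
    card_activity.append(event)

def last_use_view(events):
    """One group's view: the most recent event per card (key-value style)."""
    view = {}
    for e in events:
        view[e["card"]] = e
    return view

def location_history_view(events):
    """Another group's view: every location seen per card, in order."""
    view = defaultdict(list)
    for e in events:
        view[e["card"]].append(e["location"])
    return dict(view)

publish({"card": "c1", "location": "Austin", "t": 1})
publish({"card": "c2", "location": "Boston", "t": 2})
publish({"card": "c1", "location": "Chicago", "t": 3})

# Each consumer builds its own store from the same events, independently.
last_use = last_use_view(card_activity)
history = location_history_view(card_activity)
```

Neither view has to agree with the other on schema or indexing; both only have to agree on the event format.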

Collect Data
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center

And Transport to Global Analytics
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection

With Many Sources
log consolidator
web server
web server
Web-
server
Log
Web-
server
Log
log_events
log-stash
log-stash
data center GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
log consolidator
web server
Web-
server
Log
web server
Web-
server
Log
log_events
log-stash
log-stash
data center
log consolidator
web server
Web-
server
Log
web server
Web-
server
Log
log_events
log-stash
log-stash
data center

Analytics Doesn’t Care About Location
GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection

GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
Topic:
data-center . machine . sensor

GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
Topic:
data-center . *. sensor

GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
Topic:
data-center . machine. *

GHQ
log_events
events
Elaborate
events
(log-stash)
Aggregate
Signal
detection
Topic:
* . *. sensor
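
The wildcard topic selections above can be sketched with ordinary regular expressions. This is an illustration only: it assumes topic names are dot-separated `data-center.machine.sensor` strings and that `*` matches exactly one field. The helper names `topic_pattern` and `select`, and the sample topic names, are hypothetical.

```python
import re

def topic_pattern(spec):
    """Compile a spec such as 'dc1.*.temp' into a regular expression.
    Assumption: '*' stands for exactly one dot-separated field."""
    fields = [f.strip() for f in spec.split(".")]
    parts = ["[^.]+" if f == "*" else re.escape(f) for f in fields]
    return re.compile(r"\.".join(parts))

def select(spec, topics):
    """Return the topics whose full name matches the spec."""
    pattern = topic_pattern(spec)
    return [t for t in topics if pattern.fullmatch(t)]

topics = ["dc1.m1.temp", "dc1.m2.temp", "dc1.m1.pressure", "dc2.m1.temp"]

one_sensor_all_machines = select("dc1.*.temp", topics)
one_machine_all_sensors = select("dc1.m1.*", topics)
one_sensor_everywhere = select("*.*.temp", topics)
```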

Act locally, learn globally

Machine Learning Logistics

Traditional View

Traditional View: This isn’t the whole story

90% of the effort in successful machine
learning isn’t in the training or model dev…
It’s the logistics

Why?
• Just getting the training data is hard
– Which data? How to make it accessible? Multiple sources!
– New kinds of observations force restarts
– Requires a ton of domain knowledge
• The myth of the unitary model
– You can’t train just one
– You will have dozens of models, likely hundreds or more
– Handoff to new versions is tricky
– You need run-time experience to be sure which model is better


What Machine Learning Tool is Best?
• Most successful groups keep several “favorite” machine
learning tools at hand
– No single tool is best in every situation
• The most important tool is a platform that supports logistics well
– Don’t have to do everything at the application level
– Lots of what matters can be handled at the platform level
• A good design for the logistics can make a big difference

Some Gotchas
• Ops-oriented people will not “get it” regarding modeling
subtleties
• Data scientists will not “get it” regarding operational realities
• Therefore, modelers have to deliver self-contained models
• And, ops has to provide pre-wired structure

Rendezvous Architecture
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results

Rendezvous to the Rescue: Better ML Logistics
• Stream-1st architecture is a powerful approach with surprisingly
widespread advantages
– Innovative technologies are emerging for streaming data
• Microservices approach provides flexibility
– Streaming supports microservices (if done right)
• Containers remove surprises
– Predictable environment for running models

Rendezvous: Mainly for Decisioning Engines
• Decisioning models
– Looking for a “right answer”
– Simpler than reinforcement learning
• Examples include:
– Fraud detection
– Predictive analytics / market prediction
– Churn prediction (as in telecommunications)
– Yield optimization
– Deep learning in the form of speech or image recognition, in some cases

What We Ultimately Want
request
response
Model

But This Isn’t The Answer
Model 1
request
response
Load
balancer
Model 2
Model 3

First Try with Streams
Input
Model 1
Model 2
Model 3
request
response
?

First Rendezvous
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results

Some Key Points
• Note that all models see identical inputs
• All models run in production setting
• All models send scores to same stream
• The rendezvous server decides which scores to ignore
• Roll forward, roll back, correlated comparison are all now trivial
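
A rough sketch of these points, assuming a Python list stands in for the scores stream and simple functions stand in for models. The names `Score`, `run_models` and `rendezvous`, and the toy scoring functions, are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Score:
    request_id: str
    model: str
    value: float

def run_models(request_id, features, models):
    """Every model scores the identical input; all scores land in one list
    that stands in for the shared scores stream."""
    return [Score(request_id, name, fn(features)) for name, fn in models.items()]

def rendezvous(scores, preferred):
    """Return the score from the first preferred model that answered.
    Other scores are ignored here but stay available for comparison."""
    by_model = {s.model: s for s in scores}
    for name in preferred:
        if name in by_model:
            return by_model[name]
    return None

models = {
    "model_1": lambda x: 0.2 * x["amount"],
    "model_2": lambda x: 0.1 * x["amount"] + 0.5,
    "model_3": lambda x: 0.5,
}
scores = run_models("req-1", {"amount": 10.0}, models)

live = rendezvous(scores, preferred=["model_1", "model_2"])
# Roll forward: stop ignoring model_2 by changing only the preference order.
rolled_forward = rendezvous(scores, preferred=["model_2", "model_1"])
```

Because every model has already scored the same request, rolling back is the same one-line change in the other direction.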

Reality Check, Injecting External State
Model 1
Model 2
Model 3
request
Raw
Add
external
data
Input
Database
The world

Recording Raw Data (as it really was)
Input
Scores
Decoy
Model 2
Model 3
Archive

Quality & Reproducibility of Input Data is Important!
• Recording raw-ish data is really a big deal
– Data as seen by a model is worth gold
– Data reconstructed later often has time-machine leaks
– Databases were made for updates, streams are safer
• Raw data is useful for non-ML cases as well (think flexibility)
• Decoy model records training data as seen by models under
development & evaluation
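
A minimal sketch of a decoy, assuming requests are JSON-serializable dictionaries and a list stands in for the archive. The class name `Decoy` and the field names are hypothetical.

```python
import json

class Decoy:
    """Looks like a model to the input stream but only archives what it sees."""

    def __init__(self):
        self.archive = []  # stands in for an archive file or stream

    def score(self, request):
        # Serialize immediately so later changes cannot alter the record.
        self.archive.append(json.dumps(request, sort_keys=True))
        return None  # a decoy never produces a score

decoy = Decoy()
request = {"id": "req-1", "amount": 10.0, "profile": {"last_city": "Austin"}}
decoy.score(request)

# Later the profile is updated, as a database row would be.
request["profile"]["last_city"] = "Boston"

# The archive still holds the input exactly as the models saw it.
recorded = json.loads(decoy.archive[0])
```

Rebuilding the training input from the updated profile would give "Boston", which is the kind of time-machine leak the archive avoids.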

Canary for Comparison
Real
model
∆
Result
Canary
Decoy
Archive
Input

What Does the Canary Do?
• The canary is a real model, but is very rarely updated
• The canary results are almost never used for decisioning
• The virtue of the canary is stability
• Comparing to the canary results gives insight into new models
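
One possible way to compare a new model against the canary, sketched under the assumption that both produce one score per request and that scores are matched by position. The function name, the sample scores and the idea of using a mean absolute difference are illustrative choices, not the method prescribed by the slides.

```python
from statistics import mean

def mean_abs_difference(canary_scores, model_scores):
    """Average absolute difference between canary and candidate scores
    on the same requests."""
    return mean(abs(c - m) for c, m in zip(canary_scores, model_scores))

canary = [0.10, 0.80, 0.30, 0.95]
candidate = [0.12, 0.78, 0.35, 0.90]   # close to the canary
suspect = [0.90, 0.10, 0.85, 0.05]     # behaves very differently

small = mean_abs_difference(canary, candidate)
large = mean_abs_difference(canary, suspect)
```

A large difference does not prove the new model is wrong, but it is a cheap signal that something deserves a closer look.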

Isolated Development With Stream Replication
Model 1
Model 2
Model 3
request
Raw
Add
external
data
Input
Internal 1
Internal 2
Internal 3
The world
Model 4
Raw
New
external
data
Input
Internal 4
Production
Development

A Quick Review
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results
Proxy

The Proxy Talks to the Outside World
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results
Proxy

The Input Stream Feeds All Models Identically
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results
Proxy

The Scores Stream Contains All Results
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results
Proxy

The Rendezvous Picks A Result
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results
Proxy

Results Return Via A Stream and Return Address
Input Scores
Rendezvous
Model 1
Model 2
Model 3
request
response
Results
Proxy

Models in production live in the real
world:
Conditions may (will) change

Rendezvous Schedules
• The key idea of rendezvous schedules is to define the trade-off
of latency versus model priority
– At short delays, we want the best
– At moderate delays we will compromise a bit
– Near the deadline, we will take any answer at all
• Normally the same rendezvous schedules apply to all
transactions
– Overriding default schedule has bona fide uses
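
A rendezvous schedule could be represented as a list of time thresholds, each widening the set of acceptable models. This is a sketch under assumptions: the schedule format, the millisecond thresholds, and the model names `champion`, `challenger` and `fast` are all hypothetical.

```python
def pick(schedule, arrived, elapsed_ms):
    """schedule: list of (start_ms, models in priority order), sorted by start_ms.
    arrived: dict of model name -> score received so far.
    Returns (model, score), or None if the rendezvous should keep waiting."""
    acceptable = []
    for start_ms, names in schedule:
        if elapsed_ms >= start_ms:
            acceptable = names
    for name in acceptable:
        if name in arrived:
            return name, arrived[name]
    return None

schedule = [
    (0, ["champion"]),                         # short delays: only the best
    (20, ["champion", "challenger"]),          # moderate delays: compromise a bit
    (40, ["champion", "challenger", "fast"]),  # near the deadline: any answer
]

early = pick(schedule, {"fast": 0.4}, elapsed_ms=5)
middle = pick(schedule, {"fast": 0.4, "challenger": 0.6}, elapsed_ms=25)
late = pick(schedule, {"fast": 0.4}, elapsed_ms=45)
```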

Rendezvous Overrides
• Incoming transaction can carry an overriding schedule
– This is great for QA, to see output from a specific model
– Overriding the default schedule is also good for systemic A/B tests
• Overrides should be unusual

Scaling Up
• More kinds of model
– multiple rendezvous frameworks for different tasks
• More throughput
– Fast default models
– Partition input stream to allow parallel model evaluation
– Input batching
• Extreme volumes require extreme measures
– Cannibalize fancy models to run more fast/simple models
– Speed before beauty

Faster Throughput Through Failure
• Suppose we have one model that can handle 10,000 t/s @ 2ms
– But this isn’t the most accurate model. Not bad, but not best
• And our champion model can handle 1000 t/s @ 10ms
• Then imagine a burst of 2000 t/s for several minutes
• Champion can only evaluate half of all requests
– Should skip to keep up
– Fast model will cover for champion
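
The arithmetic behind this example can be written out as a small sketch. The function name is hypothetical, and the fallback share is an optimistic estimate that assumes the fast model answers exactly the requests the champion skipped.

```python
def burst_coverage(burst_tps, champion_tps, fast_tps):
    """Fraction of requests answered by each model during a burst,
    assuming the champion skips whatever it cannot keep up with."""
    champion_share = min(1.0, champion_tps / burst_tps)
    fast_share = min(1.0, fast_tps / burst_tps)
    # The fast model covers what the champion skipped, up to its own capacity.
    fallback_share = min(fast_share, 1.0 - champion_share)
    unanswered = 1.0 - champion_share - fallback_share
    return champion_share, fallback_share, unanswered

# Figures from the slide: 2000 t/s burst, 1000 t/s champion, 10,000 t/s fast model.
champion, fallback, unanswered = burst_coverage(2000, 1000, 10000)
```

With these figures the champion answers half the requests, the fast model covers the other half, and nothing goes unanswered.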

Input Scores
Model 1
Model 2
Model 3

Always have a default or
fallback model
Models that fall behind should
discard requests to catch up
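
One simple way for a model to discard requests when it falls behind, sketched with an in-memory queue. The `arrived_ms` field, the `max_age_ms` limit and the function name `drain` are assumptions made for illustration.

```python
from collections import deque

def drain(queue, now_ms, max_age_ms):
    """Drop queued requests that are already too old to answer in time.
    Returns the number of requests skipped."""
    skipped = 0
    while queue and now_ms - queue[0]["arrived_ms"] > max_age_ms:
        queue.popleft()
        skipped += 1
    return skipped

# Ten requests arriving 10 ms apart, examined at t = 100 ms.
queue = deque({"id": i, "arrived_ms": 10 * i} for i in range(10))
skipped = drain(queue, now_ms=100, max_age_ms=30)
```

The skipped requests are still answered, because the default or fallback model scored them too.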

Limitations of Rendezvous
• 100% speculative execution can be expensive
– Can be mitigated by partial speculation
– Or it may just be too expensive
• Minimum Viable Products should be minimal
– You may not require zero downtime … be realistic
• Context may be too large
• Latency limits may be too stringent

Ad Targeting Example
Detailed
scoring
Proxy Pre-select
1
2
Sharded Ad Scoring
3
User
Profile
Ads
User profile and context used
for rough-cut selection of ads
Roughly 1000 ads are scored in
detail for p(click)

Why Not Full Rendezvous?
• 1000s of ads / second × 1000 candidates = 1M scores / second
– AKA “a lot”
• Scoring a single model is expensive
• Sharding and replication provides a form of failure tolerance
• Full speculative execution across several options is prohibitive
• Latency guarantees can be very short (10 ms)
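
The scoring load can be worked through in a few lines. The request rate and candidate count come from the slide; the number of speculative models (3) is an assumption chosen only to show how the cost multiplies.

```python
requests_per_second = 1_000      # ad requests per second (from the slide)
candidates_per_request = 1_000   # ads scored in detail per request (from the slide)

scores_per_second = requests_per_second * candidates_per_request

def scores_with_speculation(n_models):
    """Total scores per second if every request runs on n_models models."""
    return scores_per_second * n_models

full_rendezvous_load = scores_with_speculation(3)  # 3 models is an assumption
```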

Rendezvous-lite Options
• We have some options
• We can allow selective speculation on marked requests
– If only 1% of ads run speculative execution, we can pack 10x more
shards per node and use 10x fewer nodes
– Selective speculation doesn’t give redundancy
• We can release results if >80% of shards reply
• Temporary speculation during hand-offs is useful

Let’s Review

A Quick Review
• The proxy talks to the outside world
• The input stream feeds all models identically
• The scores stream contains all results
• The rendezvous picks a result
• Results return via a stream and return address

Not Such Bad Ideas
• Keep models running “in the wings”
– Don’t wait until conditions change to start building the next model
– Keep new short-history models ready to roll, some graybeards as well
• Hot hand-off
– With rendezvous: just stop ignoring the new best model
• Deploy a canary server
– Keep an old model active as a reference
– If it was 90% correct, difference with any better model should be small
– Score distribution should be roughly constant

New book: how to manage machine learning models
Download free pdf or read free online via @MapR:
https://mapr.com/ebook/machine-learning-logistics/
and
“Rendezvous Architecture” by Ted Dunning & Ellen Friedman, in
Encyclopedia of Big Data Technologies. Sherif Sakr and Albert
Zomaya, editors. Springer International Publishing, in press 2018.

Q&A
@mapr
tdunning@mapr.com
ENGAGE WITH US
@Ted_Dunning
