6. A system that creates as many models as we want, when
we want them, and dynamically adapts in real time
to changing conditions
○ Automatically creates, validates, ships, and
monitors models, with a capacity that scales
to tens of thousands of models
The Opportunity
What we really need:
7. Online models evolve &
adapt over time, in
reaction to a changing
environment with each
and every event
Given a complete
data set, a batch
model is created in its
entirety, all at once
Introducing Online Learning
Batch: Creation
Online Learning: Evolution
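To make the contrast concrete, here is a minimal Python sketch that uses the simplest possible "model", an average. The names `batch_mean` and `OnlineMean` are illustrative and do not come from the deck. The batch version needs the complete data set up front; the online version adjusts a small piece of state with each event.

```python
def batch_mean(values):
    # Batch: requires the complete data set and computes the result all at once.
    return sum(values) / len(values)


class OnlineMean:
    # Online: keeps a small state and adjusts it with every incoming event.
    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Move the current estimate toward the new value by 1/count.
        self.mean += (value - self.mean) / self.count
        return self.mean


events = [1.0, 2.0, 3.0, 4.0]
online = OnlineMean()
for event in events:
    online.update(event)
```

Both approaches arrive at the same value here, but only the online version has a usable estimate after every single event.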
8. Batch Models (Offline)
● Large-scale data storage
● Large-scale data schlepping
● Painful data aggregation
● Lots of manual everything
● Harder to build models, but easier to evaluate
Online Learning
● Limited data storage, mostly for monitoring
● Event-level data streams
● Light data aggregation
● Lots of automatic everything
● Easier to build, but harder to evaluate (& support)
Batch Models (Offline) vs. Online Learning
9. ● Outperformed both L2 and Elastic Net
● Leverages small (‘micro’) batches
● Validates and monitors models in real time
● Alerts team when models are not behaving
Some Techno Mumbo Jumbo
Stochastic gradient descent with L1 regularization
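As a rough illustration of the technique named on this slide, the sketch below applies one micro-batch update of stochastic gradient descent to a logistic regression model with an L1 penalty. The deck does not say which model or which L1 variant was used; logistic regression, the soft-thresholding (proximal) step, and the names `sgd_l1_update`, `lr`, and `l1` are all assumptions made for the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal step for the L1 penalty: shrink toward zero, clip at zero.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def sgd_l1_update(weights, batch, lr=0.1, l1=0.01):
    """One micro-batch update; batch is a list of (features, label) pairs."""
    grad = [0.0] * len(weights)
    for features, label in batch:
        p = sigmoid(sum(w * x for w, x in zip(weights, features)))
        for i, x in enumerate(features):
            grad[i] += (p - label) * x  # gradient of the log loss
    n = len(batch)
    # Gradient step on the averaged loss, then L1 shrinkage.
    return [soft_threshold(w - lr * g / n, lr * l1) for w, g in zip(weights, grad)]


# Hypothetical micro-batch: one converted event with a single active feature.
weights = sgd_l1_update([0.0, 0.0], [([1.0, 0.0], 1)], lr=1.0, l1=0.25)
```

The shrinkage step is what drives uninformative weights to exactly zero, which keeps each model sparse.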
11. eXpresso Serving Cluster
● 10B events/day
● 260 nodes across 4 data centers
eXtream Modeling Cluster
● 160B models/day
● 85 nodes across 4 data centers
JGroups Distributed Messaging
Serving Layer
12. Batch Models (Offline)
● Batch
● Predefined ratio
● Predefined feature selection
● One-time validation
Online Learning
● Streaming
● Downsampling
● Automated feature selection
● Ongoing data cleaning
● Ongoing validation
The Online Learning Challenge
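One way to picture streaming downsampling is a filter that keeps every converted event and only as many not-converted events as a target ratio allows. The sketch below is an assumption-heavy illustration, not the deck's implementation: the class name `RatioDownsampler` is hypothetical, and the 3:2 default simply mirrors the 60/40 not-converted/converted ratio the deck mentions for model validation.

```python
class RatioDownsampler:
    """Keeps all converted events and caps not-converted events at a
    target negatives:positives ratio (3:2 is a 60/40 split)."""

    def __init__(self, negatives=3, positives=2):
        self.neg_target = negatives
        self.pos_target = positives
        self.neg_kept = 0
        self.pos_kept = 0

    def keep(self, converted):
        if converted:
            self.pos_kept += 1
            return True
        # Keep a negative only if doing so stays within the target ratio.
        # Integer cross-multiplication avoids floating-point comparisons.
        if (self.neg_kept + 1) * self.pos_target <= self.pos_kept * self.neg_target:
            self.neg_kept += 1
            return True
        return False


sampler = RatioDownsampler()
stream = [1, 0, 0, 0, 1, 0, 0, 0]  # 1 = converted, 0 = not converted
kept = [label for label in stream if sampler.keep(label == 1)]
```

This deterministic rule favors the negatives that arrive right after a positive; a production sampler would more likely keep negatives at random with a tuned probability.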
13. ● All necessary data already exists in eXtream
● The cluster’s processing resources can be better utilized
● eXtream addresses most performance / scalability requirements
● Scoring mechanism already exists
eXtream as a Framework for Online Learning
Why it works...
18. ● Sliding window of recent events
● 60/40 not-converted/converted ratio
● Various accuracy metrics (lift, precision, recall, confusion matrix)
● Decide if the model is ready for making predictions
Model Validation
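The validation steps above can be sketched as a small class that scores predictions over a sliding window of recent events and decides whether a model is ready. The class name `SlidingWindowValidator`, the window size, and the precision and recall thresholds are hypothetical placeholders. The 60/40 class ratio is not reproduced here.

```python
from collections import deque


class SlidingWindowValidator:
    def __init__(self, window_size=1000, min_precision=0.5, min_recall=0.5):
        # Window size and thresholds are placeholder values, not from the deck.
        self.window = deque(maxlen=window_size)  # old events fall off automatically
        self.min_precision = min_precision
        self.min_recall = min_recall

    def observe(self, predicted, actual):
        self.window.append((bool(predicted), bool(actual)))

    def confusion_matrix(self):
        tp = sum(1 for p, a in self.window if p and a)
        fp = sum(1 for p, a in self.window if p and not a)
        fn = sum(1 for p, a in self.window if not p and a)
        tn = sum(1 for p, a in self.window if not p and not a)
        return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

    def metrics(self):
        m = self.confusion_matrix()
        total = len(self.window)
        predicted_pos = m["tp"] + m["fp"]
        actual_pos = m["tp"] + m["fn"]
        precision = m["tp"] / predicted_pos if predicted_pos else 0.0
        recall = m["tp"] / actual_pos if actual_pos else 0.0
        base_rate = actual_pos / total if total else 0.0
        # Lift: how much better than the base conversion rate the model does.
        lift = precision / base_rate if base_rate else 0.0
        return {"precision": precision, "recall": recall, "lift": lift}

    def is_ready(self):
        # Require a full window before trusting the metrics.
        if len(self.window) < self.window.maxlen:
            return False
        m = self.metrics()
        return m["precision"] >= self.min_precision and m["recall"] >= self.min_recall
```

Because the window is bounded, the readiness decision always reflects recent traffic, so a model that degrades can lose its "ready" status again.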
19. ● Two phases (Scoring, Re-code)
● Scale vs Accuracy tradeoff
Predictions Mechanism
21. Why do we need it?
● Store the models in one common place
● Persistence
● Built-in replication
● Aerospike has a built-in object size limit of 1 MB
○ Developed a sharding mechanism for storing models on Aerospike
Scalability / Performance
Why do we need it?
Large object issue on Aerospike
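The deck does not describe how the sharding mechanism works. As a hedged sketch of the general idea, a serialized model can be split into chunks that each fit under the size limit, with a small manifest record that says how many chunks to read back. A plain Python dict stands in for the key-value store here; no real Aerospike client calls are used, and `put_sharded`, `get_sharded`, and the key layout are hypothetical.

```python
CHUNK_SIZE = 1024 * 1024  # assumed per-record budget, matching the 1 MB limit


def put_sharded(store, key, blob, chunk_size=CHUNK_SIZE):
    """Split blob into chunks of at most chunk_size bytes and store them."""
    chunks = [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)]
    for index, chunk in enumerate(chunks):
        store[f"{key}:{index}"] = chunk
    # Write the manifest last so a reader never sees it before the chunks exist.
    store[f"{key}:manifest"] = len(chunks)
    return len(chunks)


def get_sharded(store, key):
    """Reassemble the blob from its chunks using the manifest count."""
    count = store[f"{key}:manifest"]
    return b"".join(store[f"{key}:{i}"] for i in range(count))


store = {}  # stand-in for the real key-value store
model_bytes = bytes(range(256)) * 10  # 2,560 bytes of fake model data
shard_count = put_sharded(store, "model-42", model_bytes, chunk_size=1000)
restored = get_sharded(store, "model-42")
```

A real implementation would also need to handle partially written or stale chunks, for example by versioning the keys.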
22. The solution is Aerospike’s fast built-in replication
Cross Data Center Learning
● Low Volume Models
● Traffic Redirection
23. Monitoring: Why do we need it?
thousands of models
automatically created by users
some models won’t converge
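With thousands of automatically created models, a monitor has to find the ones that are not converging without human inspection. The sketch below shows one simple heuristic based on each model's recent loss history; the status names, the `window` size, and the `max_loss` ceiling are assumptions made for illustration and are not taken from the deck.

```python
def model_status(losses, window=2, max_loss=0.6):
    """Classify a model from its loss history (oldest first)."""
    if len(losses) < 2 * window:
        return "warming_up"  # not enough history to judge yet
    earlier = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if recent > earlier:
        return "diverging"  # loss is getting worse
    if recent > max_loss:
        return "not_converged"  # not worsening, but loss is still too high
    return "healthy"


def models_to_alert(loss_history, **kwargs):
    """Return the names of models that should trigger an alert."""
    return sorted(
        name
        for name, losses in loss_history.items()
        if model_status(losses, **kwargs) in ("diverging", "not_converged")
    )


# Hypothetical loss histories keyed by model name.
history = {
    "campaign-a": [1.0, 1.0, 0.5, 0.5],
    "campaign-b": [0.5, 0.5, 1.0, 1.0],
    "campaign-c": [0.9, 0.9, 0.8, 0.8],
    "campaign-d": [0.5],
}
alerts = models_to_alert(history)
```

Young models are deliberately excluded from alerts so that a model still warming up does not page the team.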
28. Market: Downgrading a country’s credit ratings
● The ideal candidate for digital media expands and even subtly shifts in real time
● Real-time modeling tracks and reacts to these changes as they happen, with 2x CPA
improvement over a batch model
Audiences: Cyber Monday frenzies
● Holiday shopping is very different from the rest of the year, particularly Cyber Monday
● AM changes in Eastern US are applied to the Pacific coast before the madness begins
Product: A product offering is revised
● … after the campaign starts, affecting the ideal audience
● No need to panic; modeled audience automagically adjusts
The Times, They Are A-Changin’
29. Scores of self-maintaining models that constantly adapt to our
ever-changing conditions
Happiness Renewed...