BDX 2016 - Kevin Lyons & Yakir Buskilla @ eXelate

Online Learning
The Future of Audience Segmentation is Here
Kevin Lyons + Yakir Buskilla

Models that build profitable marketing audiences at scale...
Finding more of your best customers:
High-income business professional

The Modeling Process, simplified

2012:
● 30 - 40 models
● leveraging billions of events
● creating 100 million+ scores
2015:
● over 1,000 models
● ‘leveraging’ trillions of events
● creating 150 billion+ scores / day
The Challenge

In other words, we simply need ….

A system that creates as many models as we want, when
we want them, and that dynamically adapts in real time
to changing conditions
○ Automatically creates, validates, ships, and
monitors models, with a capacity that scales
to tens of thousands of models
The Opportunity
What we really need:

Batch (creation): given a complete data set, a batch model is created in its entirety, all at once
Online learning (evolution): online models evolve and adapt over time, reacting to a changing environment with each and every event
Introducing Online Learning

Batch Models (Offline):
● large-scale data storage
● large-scale data schlepping
● painful data aggregation
● lots of manual everything
● harder to build models, but easier to evaluate
Online Learning:
● limited data storage, mostly for monitoring
● event-level data streams
● light data aggregation
● lots of automatic everything
● easier to build, but harder to evaluate (& support)
Batch Models (Offline) vs. Online Learning

● Outperformed both L2 and Elastic Net
● Leverages small (‘micro’) batches
● Validates and monitors models in real time
● Alerts team when models are not behaving
Some Techno Mumbo Jumbo
Stochastic gradient descent with L1 regularization
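To make the technique named above concrete, here is a minimal sketch of stochastic gradient descent with L1 regularization trained on micro-batches. It is not eXelate's implementation: the logistic loss, the soft-thresholding (proximal) form of the L1 step, the class name OnlineL1Logistic, the helper synthetic_batch, and all hyperparameter values are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(w, t):
    # L1 proximal step: shrink w toward zero by t, clipping at zero.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


class OnlineL1Logistic:
    """Hypothetical online logistic model updated one micro-batch at a time."""

    def __init__(self, n_features, lr=0.1, l1=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0  # the bias is left unregularized
        self.lr = lr
        self.l1 = l1

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def partial_fit(self, batch):
        # batch: list of (features, label) pairs with label in {0, 1}.
        n = len(batch)
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in batch:
            err = self.predict_proba(x) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        self.b -= self.lr * grad_b / n
        for j in range(len(self.w)):
            stepped = self.w[j] - self.lr * grad_w[j] / n
            self.w[j] = soft_threshold(stepped, self.lr * self.l1)


def synthetic_batch(rng, size=32):
    # Made-up data: feature 0 drives conversion, feature 1 is pure noise.
    batch = []
    for _ in range(size):
        x0 = float(rng.random() < 0.5)
        x1 = float(rng.random() < 0.5)
        p = 0.9 if x0 else 0.1
        y = 1 if rng.random() < p else 0
        batch.append(([x0, x1], y))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    model = OnlineL1Logistic(n_features=2, lr=0.5, l1=0.01)
    for _ in range(200):
        model.partial_fit(synthetic_batch(rng))
    print("weights:", model.w, "bias:", model.b)
```

The L1 step pulls weights of uninformative features toward zero, which is one reason it pairs naturally with automated feature selection; the model never needs the full data set in memory, only the current micro-batch.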

eXelate.com @eXelate
Technical Solutions
How do we do it?

eXpresso Serving Cluster: 10B events/day, 260 nodes across 4 data centers
eXtream Modeling Cluster: 160B models/day, 85 nodes across 4 data centers
JGroups distributed messaging
Serving Layer

Batch Models (Offline):
● Batch
● Predefined ratio
● Predefined feature selection
● One-time validation
Online Learning:
● Streaming
● Downsampling
● Automated feature selection
● Ongoing data cleaning
● Ongoing validation
The Online Learning Challenge

● All necessary data already exists in eXtream
● The cluster’s processing resources can be better utilized
● eXtream addresses most performance / scalability requirements
● Scoring mechanism already exists
eXtream as a Framework for Online Learning
Why it works...

● Labeling mechanism: customer-defined target audience
Events Classification

● Downsampling mechanism
● Burst tolerance
● Duplicate entries
Dataset Preparation
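The slide lists the three concerns without detail, so the following is only one possible reading, sketched under stated assumptions: duplicates are detected by an event ID kept in a bounded recent-ID cache, burst tolerance is modeled as a per-user cap within a window, and downsampling keeps non-converted events with a fixed probability. The class StreamPreparer and every parameter name are hypothetical.

```python
import random
from collections import OrderedDict, defaultdict


class StreamPreparer:
    """Hypothetical filter that decides which events reach the trainer."""

    def __init__(self, keep_negative_prob=0.1, max_per_user=3,
                 dedup_capacity=10000, seed=0):
        self.keep_negative_prob = keep_negative_prob
        self.max_per_user = max_per_user
        self.dedup_capacity = dedup_capacity
        self._seen = OrderedDict()          # recently seen event IDs
        self._per_user = defaultdict(int)   # accepted events per user
        self._rng = random.Random(seed)

    def accept(self, event_id, user_id, label):
        # 1. Duplicate entries: drop any event ID still in the cache.
        if event_id in self._seen:
            return False
        self._seen[event_id] = True
        if len(self._seen) > self.dedup_capacity:
            self._seen.popitem(last=False)  # evict the oldest ID

        # 2. Burst tolerance (assumed meaning): cap events per user
        #    so one noisy source cannot dominate the training window.
        if self._per_user[user_id] >= self.max_per_user:
            return False

        # 3. Downsampling: keep only a fraction of non-converted events.
        if label == 0 and self._rng.random() >= self.keep_negative_prob:
            return False

        self._per_user[user_id] += 1
        return True

    def reset_window(self):
        # Call at the start of each window to reset the per-user caps.
        self._per_user.clear()
```

A caller would invoke accept for every incoming event and forward only the accepted ones to training, calling reset_window on whatever schedule defines a window.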

● Blacklist
● Whitelist
● Automatic Tuning
Features Selection

● Sliding window of recent events
● 60/40 not-converted/converted ratio
● Various accuracy metrics (lift, precision, recall, confusion matrix)
● Decide if the model is ready for making predictions
Model Validation
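The validation bullets above can be sketched as follows. This is an illustration, not the production logic: the 60/40 ratio is enforced here by keeping two fixed-size sliding windows (one per class), lift is taken as precision divided by the converted share of the window, and the readiness thresholds and the name SlidingValidator are assumptions.

```python
from collections import deque


class SlidingValidator:
    """Hypothetical validator over a sliding window of recent labeled events."""

    def __init__(self, window=100, positive_share=0.4):
        n_pos = int(round(window * positive_share))
        # Separate windows keep the not-converted/converted ratio fixed
        # (60/40 with the defaults) once both are full.
        self.pos = deque(maxlen=n_pos)           # predictions for converted
        self.neg = deque(maxlen=window - n_pos)  # predictions for not-converted

    def add(self, predicted, actual):
        # predicted and actual are 0 or 1; old entries fall out automatically.
        (self.pos if actual == 1 else self.neg).append(predicted)

    def confusion(self):
        tp = sum(self.pos)
        fn = len(self.pos) - tp
        fp = sum(self.neg)
        tn = len(self.neg) - fp
        return tp, fp, fn, tn

    def metrics(self):
        tp, fp, fn, tn = self.confusion()
        total = tp + fp + fn + tn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        base_rate = (tp + fn) / total if total else 0.0
        lift = precision / base_rate if base_rate else 0.0
        return {"precision": precision, "recall": recall, "lift": lift}

    def is_ready(self, min_precision=0.6, min_recall=0.5):
        # Ready only when both windows are full and the thresholds are met.
        full = (len(self.pos) == self.pos.maxlen
                and len(self.neg) == self.neg.maxlen)
        m = self.metrics()
        return (full and m["precision"] >= min_precision
                and m["recall"] >= min_recall)
```

Because the windows slide, a model that was ready can become not ready again when recent events degrade its metrics, which matches the idea of ongoing rather than one-time validation.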

● Two phases (Scoring, Re-code)
● Scale vs Accuracy tradeoff
Predictions Mechanism

Scalability / Performance
Thousands of concurrent models: training, validation, scoring
High throughput: billions of training events per day

Why do we need it?
● Store the models in one common place
● Persistence
● Built-in replication
● Aerospike has a built-in object size limit of 1 MB
○ Developed a sharding mechanism for storing models on Aerospike
Scalability / Performance
Why do we need it?
Large object issue on Aerospike
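The talk does not describe how the sharding mechanism works, so the sketch below shows only the general idea of splitting a serialized model into records that each fit under a size limit. A plain Python dict stands in for the key-value store; no Aerospike client calls are used. The key scheme, the manifest record, the headroom in CHUNK_BYTES, and the function names put_model and get_model are all assumptions.

```python
import hashlib
import pickle

MAX_RECORD_BYTES = 1024 * 1024  # the 1 MB object limit mentioned in the talk
CHUNK_BYTES = 900 * 1024        # assumed headroom for record overhead


def put_model(store, model_id, model, chunk_bytes=CHUNK_BYTES):
    # Serialize once, then split the bytes into fixed-size shards.
    blob = pickle.dumps(model)
    chunks = [blob[i:i + chunk_bytes]
              for i in range(0, len(blob), chunk_bytes)]
    for i, chunk in enumerate(chunks):
        store[f"{model_id}:shard:{i}"] = chunk
    # Write the manifest last so readers never see a partial shard set.
    store[f"{model_id}:manifest"] = {
        "shards": len(chunks),
        "sha256": hashlib.sha256(blob).hexdigest(),
    }
    return len(chunks)


def get_model(store, model_id):
    # Read the manifest, fetch every shard in order, and verify integrity.
    manifest = store[f"{model_id}:manifest"]
    blob = b"".join(store[f"{model_id}:shard:{i}"]
                    for i in range(manifest["shards"]))
    if hashlib.sha256(blob).hexdigest() != manifest["sha256"]:
        raise ValueError(f"shards for {model_id} do not match the manifest")
    return pickle.loads(blob)
```

The checksum guards against reading a mix of shards from two different versions of the same model, which matters when models are rewritten continuously; pickle is used only for brevity and should not be applied to untrusted data.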

The solution is Aerospike's fast built-in replication
Cross Data Center Learning
● Low Volume Models
● Traffic Redirection

Monitoring - Why do we need it?
● Thousands of models, automatically created by users
● Some models won’t converge

Case study
Working in action

● The ideal candidate for digital media expands and even subtly shifts in real time
● Real-time modeling tracks and reacts to these changes as they happen, with a 2x CPA improvement over a batch model
The Times, They Are A-Changin’
Market: Downgrading a country’s credit ratings
● Holiday shopping is very different from the rest of the year, particularly Cyber Monday
● AM changes in Eastern US are applied to the Pacific coast before the madness begins
Audiences: Cyber Monday frenzies
● … after the campaign starts, affecting the ideal audience
● No need to panic; modeled audiences automagically adjust
Product: A product offering is revised

Scores of self-maintaining models that constantly adapt to our ever-changing conditions
Happiness Renewed...
