Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Prediction Machine Learning team, Criteo at MLconf NYC - 4/15/16

Copyright © 2016 Criteo
ML for Display Advertising @ Scale
Damien Lefortier
MLconf NYC
2016-04-15

Outline
• Introduction to AdTech / Criteo
• Deep dive into our ML algorithms
• Offline and online evaluation
• Future areas of research

AdTech / Criteo
[Figure labels: Advertiser, Publisher]

Our engine is trying to answer 3 questions
Common objective: maximize the client’s value
1. How much should we bid for a given ad space?
2. What products should we recommend / show?
3. What is the best look & feel of the banner?

Physical infrastructure
7 in-house data centers on 3 continents
~ 15000 servers; largest Hadoop cluster in Europe
More than 35 PB of data storage
Traffic
800k HTTP requests / sec (peak activity)
29000 impressions / sec (peak activity)
< 10 ms to process a bidding request
< 100 ms to render the ad (if we win)

Bidding
• Should we bid?
• At which price?
Recommendation
• Which products should we display?
Look & Feel
• Big image vs small image
• Background color, ...
Prediction
• Generic prediction engine
• Specific models trained on TBs of data

Bidding strategy (1)
• As we sell performance, Criteo’s and our clients’ interests are aligned.
• In 2nd-price auctions, the cost of a display is independent of our bid and never higher than it (we pay the 2nd price or the floor), so we should bid the maximum value the client is willing to pay.
• We use adjustments for 1st price auctions.
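To illustrate why bidding the full value is safe under a 2nd-price rule, here is a minimal sketch. The function names, the tie-breaking rule (ties count as losses) and the example numbers are assumptions made for illustration; they are not taken from Criteo's bidder.

```python
def second_price_cost(our_bid, other_bids, floor=0.0):
    """Return what we pay in a 2nd-price auction, or None if we lose.

    Assumption: we win only if our bid is strictly above both the
    highest competing bid and the floor price.
    """
    threshold = max(max(other_bids, default=0.0), floor)
    if our_bid <= threshold:
        return None
    return threshold  # the price does not depend on our own bid


def profit(value, our_bid, other_bids, floor=0.0):
    """Value of the display to us minus what we pay (0 if we lose)."""
    cost = second_price_cost(our_bid, other_bids, floor)
    return 0.0 if cost is None else value - cost


# Hypothetical numbers: the display is worth 1.0 to the client.
competitors = [0.6, 0.3]
for bid in (0.5, 0.8, 1.0):
    print(bid, second_price_cost(bid, competitors, floor=0.2))
```

With these numbers, bidding 0.8 or 1.0 costs the same 0.6, while bidding 0.5 loses an auction that was worth winning. Bidding above the value only adds wins that cost more than they are worth.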

Bidding strategy (2)
• This value depends on the predicted performance and the client’s objective.
• Some examples:
• Click optimized campaign: bid = maxCPC × pClick
• CR optimized campaign: bid = maxCPO × pCR
• …
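The two examples above amount to an expected-value computation per display. A small sketch follows; the function names and the numbers are hypothetical, and pCR is assumed here to be the predicted probability that the display leads to an order.

```python
def bid_click_campaign(max_cpc, p_click):
    """bid = maxCPC x pClick: expected value of one display when the
    client pays at most max_cpc per click."""
    return max_cpc * p_click


def bid_conversion_campaign(max_cpo, p_cr):
    """bid = maxCPO x pCR: expected value of one display when the
    client pays at most max_cpo per order."""
    return max_cpo * p_cr


# Hypothetical campaign settings and model predictions.
print(bid_click_campaign(max_cpc=0.50, p_click=0.005))
print(bid_conversion_campaign(max_cpo=20.0, p_cr=0.0001))
```

Both bids are expressed per display, which is the unit the auction is run on.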

We train our prediction models on our historical displays
Variables
• Level of engagement of the user
• Quality of inventory
• User fatigue
• For travel: time to check-in and number of nights
Our ability to predict relies greatly on the relevance of the variables we consider
[Figure: "Historical displays" and "Machine Learning Algorithms"; legend: clicked displays, converted displays (size = order value)]

Recommend products for a user
• What we want: reco(user) = products
• 1B users x 3B products!
• But we need to scale and keep it fresh
• What we can do:
• Pre-select products offline
• Refine scoring online to get final candidates
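The offline / online split can be sketched as follows. This is only an illustration: co-view counts stand in for the offline pre-selection, and `score` is a hypothetical placeholder for whatever prediction model ranks the candidates online. None of these names come from the talk.

```python
from collections import defaultdict


def preselect_offline(view_log, k=3):
    """Offline step: for each product, keep the k products most often
    viewed by the same users (a simple stand-in for candidate sources)."""
    co_views = defaultdict(lambda: defaultdict(int))
    for products in view_log:  # one list of viewed products per user
        unique = set(products)
        for a in unique:
            for b in unique:
                if a != b:
                    co_views[a][b] += 1
    table = {}
    for a, counts in co_views.items():
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        table[a] = [b for b, _ in ranked[:k]]
    return table


def recommend_online(user_history, candidates_by_product, score, n=2):
    """Online step: gather the pre-selected candidates for the products
    the user saw, score only those, and return the top n."""
    candidates = set()
    for product in user_history:
        candidates.update(candidates_by_product.get(product, []))
    candidates -= set(user_history)
    return sorted(candidates, key=lambda c: (-score(c), c))[:n]


# Toy data, invented for the example.
view_log = [
    ["orange_shoes", "red_shoes"],
    ["orange_shoes", "red_shoes", "socks"],
    ["orange_shoes", "laces"],
]
table = preselect_offline(view_log, k=2)
predicted = {"red_shoes": 0.01, "laces": 0.03}
print(recommend_online(["orange_shoes"], table, lambda c: predicted.get(c, 0.0)))
```

The point of the split is that the online step only scores a handful of candidates per request instead of the full catalog.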

Bob saw orange shoes
Some candidate products:
• Historical
• Similar
• Complementary
• Most viewed

Products delivering the best performance are displayed
Variables
• Products seen by the user
• Time since product event
• Level of similarity
• Product features
Products are selected based on their CTR, CR or OV
[Figure: "Historical displays" and "Machine Learning Algorithms"; legend: clicked products, converted products (size = order value)]

We train our prediction models on our historical displays
Variables
Some of which we control:
• How user interacts with banner
• Organization of information
• Colorset
Some of which we don’t:
• Zone format
• Publisher
Look and feel will be selected based on its CTR, CR or OV
[Figure: "Historical displays (color = look & feel)" and "Machine Learning Algorithms"; legend: clicked displays, converted displays (size = order value)]

Many models to learn
• We have different ML models for bidding, recommendation, …, and for each campaign objective. We use logistic regression in many places.
• Each model is trained independently & refreshed as often as possible.
• Three main sources of features: user, ad, page (mostly categorical).

Learn on huge volumes of data
10 000 displays lead to 50 clicks, which lead to 1 sale
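The rates implied by these numbers explain the need for volume. The short computation below uses only the figures from the slide; the target of 1 000 sales is an arbitrary example.

```python
displays, clicks, sales = 10_000, 50, 1

ctr = clicks / displays               # click-through rate: 0.5 %
cr = sales / clicks                   # conversion rate after a click: 2 %
sales_per_display = sales / displays  # 1 sale per 10 000 displays

# Displays needed to observe a given number of sales at these rates.
target_sales = 1_000
displays_needed = target_sales * displays // sales

print(ctr, cr, sales_per_display, displays_needed)
```

At these rates, a model that predicts sales sees about ten million displays for every thousand positive examples.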

Quadratic features
• Outer product between 2 features (similar to a polynomial kernel of degree 2).
• Example between site and advertiser:
[Figure: publisher-side features (publisher network, publisher, site, URL) and advertiser-side features (advertiser network, ad, campaign, advertiser)]
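A quadratic feature between site and advertiser can be sketched as the flattened outer product of their one-hot vectors. The vocabularies below are invented for the example.

```python
from itertools import product


def one_hot(value, vocabulary):
    """One-hot encode a categorical value over a fixed vocabulary."""
    return [1 if value == v else 0 for v in vocabulary]


def outer_product(u, v):
    """Flattened outer product: entry (i, j) is u[i] * v[j], row by row."""
    return [a * b for a, b in product(u, v)]


# Hypothetical vocabularies.
sites = ["news.example", "blog.example"]
advertisers = ["shoes_inc", "travel_co", "books_ltd"]

site_vec = one_hot("blog.example", sites)
adv_vec = one_hot("travel_co", advertisers)
cross = outer_product(site_vec, adv_vec)
print(cross)  # a single 1, at the position of the (site, advertiser) pair
```

The crossed vector has one dimension per (site, advertiser) pair, so a linear model gets one weight per pair. This is also why the dimensionality grows quickly.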

Hashing trick
• Standard representation of categorical features: “one-hot” encoding
• Dimensionality equal to the number of different values…
• Hashing to reduce dimensionality (made popular by John Langford in VW)
• Dimensionality now independent of number of values
• Using:
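The idea behind the hashing trick can be sketched in a few lines. This is a generic illustration, not the formula from the slide: `zlib.crc32` is used only because it is in the Python standard library, and the feature names and bucket count are made up.

```python
import zlib


def hashed_index(feature_name, value, num_buckets):
    """Map a (feature, value) pair to a bucket in [0, num_buckets)."""
    key = f"{feature_name}={value}".encode("utf-8")
    return zlib.crc32(key) % num_buckets


def hash_features(raw_features, num_buckets):
    """Turn {name: value} into a sparse {index: count} vector whose
    dimensionality is num_buckets, whatever the number of distinct values."""
    vector = {}
    for name, value in raw_features.items():
        idx = hashed_index(name, value, num_buckets)
        vector[idx] = vector.get(idx, 0) + 1  # collisions simply add up
    return vector


# Hypothetical display with user, ad and page features.
display = {"user_segment": "returning", "advertiser": "shoes_inc", "site": "news.example"}
print(hash_features(display, num_buckets=2 ** 20))
```

Unseen values need no dictionary update: they hash to some bucket like any other value, at the price of occasional collisions.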

In-house Machine Learning library -- IRMA
• We have our own large-scale distributed machine learning library on top of
Hadoop used for all our models.
• From an ML perspective we rely, in most cases, on an L-BFGS solver initialized with SGD (see, e.g., A. Agarwal et al., A Reliable Effective Terascale Linear Learning System).

Distribution of L-BFGS & SGD
• L-BFGS, being a batch algorithm, is easy to distribute.
• SGD is a bit trickier: we use parameter averaging across machines, and Hogwild! to multi-thread on each machine.
• We use Hadoop AllReduce:
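Parameter averaging can be illustrated with a single-process simulation. This is a sketch under stated assumptions: each "worker" is just a Python list of examples, `all_reduce_average` imitates what an AllReduce step would leave on every node, and Hogwild! multi-threading is left out. It does not use Hadoop or any IRMA API.

```python
import math
import random


def sgd_logistic(examples, weights, lr=0.1, epochs=5, seed=0):
    """Plain SGD for logistic regression on one worker's shard."""
    w = list(weights)
    data = list(examples)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi  # gradient of the log loss
    return w


def all_reduce_average(local_weights):
    """Simulated AllReduce: every worker ends up with the element-wise mean."""
    n = len(local_weights)
    mean = [sum(column) / n for column in zip(*local_weights)]
    return [list(mean) for _ in local_weights]


# Two toy shards; each example is ([bias, feature], label).
shard_1 = [([1.0, 1.0], 1), ([1.0, -1.0], 0)]
shard_2 = [([1.0, 2.0], 1), ([1.0, -2.0], 0)]

w1 = sgd_logistic(shard_1, [0.0, 0.0])
w2 = sgd_logistic(shard_2, [0.0, 0.0])
synced = all_reduce_average([w1, w2])
print(synced[0])
```

After the averaging step every worker holds the same weights, which can then serve as the starting point for a batch solver.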

A word on more advanced techniques
• IRMA is not only about vanilla logistic regression with L2 regularization…
• It contains more advanced techniques such as, e.g., transfer learning, factorization machines, learning to rank, cost-sensitive learning, …
• For example, we use cost-sensitive learning for bidding.

Offline & online evaluation
Usual two-step process:
• Offline testing is fast, cheap, and efficient for wide exploration.
• Online testing is expensive but has the ultimate word.

Offline metrics (bidding case)
• We use classical metrics: LLH, RMSE, … (which focus on the prediction and
ignore the bidding system where we use these models).
• Utility, as defined in “Offline Evaluation of Response Prediction in Online Advertising Auctions” by O. Chapelle (WWW’15).
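The two classical metrics named above, LLH (log-likelihood) and RMSE, can be computed as in the sketch below. Averaging per example and clipping predictions with `eps` are choices made for this illustration; the talk does not specify the exact variants used.

```python
import math


def log_likelihood(labels, predictions, eps=1e-12):
    """Mean Bernoulli log-likelihood of binary labels (higher is better)."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def rmse(labels, predictions):
    """Root mean squared error between labels and predicted probabilities."""
    squared = sum((y - p) ** 2 for y, p in zip(labels, predictions))
    return math.sqrt(squared / len(labels))


labels = [1, 0]
uninformed = [0.5, 0.5]
sharper = [0.9, 0.1]
print(log_likelihood(labels, uninformed), rmse(labels, uninformed))
print(log_likelihood(labels, sharper), rmse(labels, sharper))
```

Both metrics only compare predictions with outcomes. Neither knows what the prediction is worth in an auction.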

Online metrics (bidding case)
• RevExTac = Revenue Excluding Traffic Acquisition Costs
• Cost, Revenue, …

Some statistics on evaluation
• 100K+ offline tests per year
• 1K+ A/B tests per year
• Many people
• We developed a platform and processes that enable very fast testing and improvement

Some examples of future areas of research
• Counterfactual evaluation (offline A/B tests)
• Embeddings for recommendation
• Policy learning

Counterfactual evaluation
• Estimate the business metric directly (clicks, sales, …).
• Using the production model + randomization.
• Good results on clicks already.
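One standard way to estimate a business metric from randomized production logs is inverse propensity scoring (IPS). The talk does not say which estimator is used, so the sketch below is only an example of the general idea; the log format and the function names are assumptions.

```python
def ips_estimate(logs, new_policy_prob):
    """Inverse propensity scoring estimate of the average reward a new
    policy would have obtained.

    logs: (context, action, reward, logging_prob) tuples, where
    logging_prob is the probability with which the production system
    chose the logged action (known thanks to randomization).
    new_policy_prob(context, action): probability that the new policy
    would choose that action in that context.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = new_policy_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


# Toy logs: production picked "a" or "b" uniformly at random; reward = click.
logs = [
    ("u1", "a", 1, 0.5),
    ("u2", "b", 0, 0.5),
    ("u3", "a", 0, 0.5),
    ("u4", "b", 1, 0.5),
]


def always_a(context, action):
    return 1.0 if action == "a" else 0.0


print(ips_estimate(logs, always_a))
```

Here the estimate for "always show a" matches the click rate observed on the displays where "a" was shown, without running a new online test.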

Embeddings for recommendation
• Can embeddings (for example à la word2vec) help us compute similarities between, e.g., different products or users?
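Once products have embedding vectors, similarity is typically a cosine between them. In the sketch below the vectors are hand-written toy values, not the output of word2vec or of any trained model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings, n=1):
    """Return the n products whose embeddings are closest to the query's."""
    others = [name for name in embeddings if name != query]
    others.sort(key=lambda name: -cosine_similarity(embeddings[query], embeddings[name]))
    return others[:n]


# Invented 2-dimensional embeddings.
embeddings = {
    "orange_shoes": [0.9, 0.1],
    "red_shoes": [0.8, 0.2],
    "suitcase": [0.1, 0.9],
}
print(most_similar("orange_shoes", embeddings))
```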

Policy learning – example on Look & Feel optimization
• Classical supervised machine learning approach: learn a pClick model and sort by predicted values for each possible value (e.g., each color).
• This is a hard problem and may be overkill!
• Really, we only want to know which color is the best according to some business metric (e.g., sales).
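In its simplest, context-free form, "which color is best" can be answered directly from randomized logs, with no pClick model at all. The sketch below assumes colors were shown uniformly at random and that the reward is the business metric of interest; real policy learning would also condition on context, which is omitted here.

```python
from collections import defaultdict


def best_action(logs):
    """Pick the action with the highest average reward.

    logs: (action, reward) pairs, assumed to be collected while actions
    were chosen uniformly at random.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for action, reward in logs:
        totals[action] += reward
        counts[action] += 1
    # sorted() makes ties resolve to the alphabetically first action.
    return max(sorted(counts), key=lambda a: totals[a] / counts[a])


# Toy logs: (banner color, sale or not).
logs = [
    ("blue", 0), ("blue", 1),
    ("red", 0), ("red", 0),
    ("green", 1), ("green", 1),
]
print(best_action(logs))
```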

Academic research @ Criteo
• Our 1st public dataset is online: http://bit.ly/1vgw2XC
• New 1TB dataset released last year.
• Some recent publications:
• Offline Evaluation of Response Prediction in Online Advertising Auctions. O. Chapelle, WWW’15.
• Sources of Variability in Large-scale Machine Learning Systems. D. Lefortier, A. Truchet, and M. de Rijke, NIPS 2015 workshop on ML systems.
• Cost-sensitive Learning for Bidding in Online Advertising Auctions. F. Vasile and D. Lefortier, NIPS 2015 workshop on ML for e-Commerce.

Questions
d.lefortier@criteo.com
