RESPONSE PREDICTION FOR
DISPLAY ADVERTISING
WSDM ’14

Olivier Chapelle
OUTLINE
1. Criteo & Display advertising
2. Modeling
3. Large scale learning
4. Explore / exploit

DISPLAY ADVERTISING
• Rapidly growing multi-billion dollar business (30% of internet
advertising revenue in 2013).
• Marketplace between:
– Publishers: sell display opportunities
– Advertisers: pay for showing their ad

• Real Time Bidding:
– Auction amongst advertisers is held at the moment when a user generates a display opportunity by visiting a publisher's web page.

BRANDING VS PERFORMANCE
PRICING TYPE
• CPM (Cost Per Mille): advertiser pays per thousand impressions
• CPC (Cost Per Click): advertiser pays only when the user clicks
• CPA (Cost Per Action): advertiser pays only when the user performs a
predefined action such as a purchase.

CAMPAIGN TYPE
• Branding: CPM
• Performance based advertising (retargeting): CPC, CPA

CONVERSIONS
eCPM = CPC * predicted clickthrough rate
eCPM = CPA * predicted conversion rate
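
The two conversions above can be sketched as a small helper for comparing bids across pricing types. The function names and the example bid values are illustrative assumptions. The slide's formulas give the expected value of a single impression; the optional factor of 1000 applies the per-thousand convention from the CPM definition.

```python
def expected_value_per_impression(bid: float, predicted_rate: float) -> float:
    """Bid (CPC or CPA) times the predicted click or conversion rate."""
    if not 0.0 <= predicted_rate <= 1.0:
        raise ValueError("predicted_rate must be a probability in [0, 1]")
    return bid * predicted_rate


def per_mille(value_per_impression: float) -> float:
    """Scale a per-impression value to a thousand impressions (CPM convention)."""
    return 1000.0 * value_per_impression


# Hypothetical bids: a CPC bid of 0.50 with a 0.2% predicted CTR, and a
# CPA bid of 20.00 with a 0.01% predicted conversion rate.
cpc_value = expected_value_per_impression(0.50, 0.002)
cpa_value = expected_value_per_impression(20.00, 0.0001)

# The bid with the larger expected value per impression ranks first.
best = "CPC" if cpc_value > cpa_value else "CPA"
```

Both bids end up on the same per-impression scale, which is what lets a CPC campaign and a CPA campaign compete in the same auction.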
CRITEO
• Advertisers: billed on CPC
• Publishers: paid on CPM

Criteo’s success depends highly on how precisely we can predict
Click Through Rates (CTR) and Conversion Rates (CR)

RECOMMENDATION

Collaborative filtering
Propose related items to a user based on
historical interactions from other users

Over 50% of Criteo driven sales come
from recommended products the user had
never viewed on advertiser websites

CRITEO NUMBERS

• 750 employees
• 42 countries
• $600 million 2013 revenue
• 2.1 billion banners per day
• 18 billion RT bids per day
• 250K ad calls per second
• 7500 cores cluster
• 10 PB

FEATURES
• Three sources of features: user, ad, page
• In this talk: categorical features on ad and page.
Publisher hierarchy: Publisher network → Publisher → Site → Url
Advertiser hierarchy: Advertiser network → Advertiser → Campaign → Ad

HASHING TRICK
• Standard representation of categorical features: “one-hot” encoding
For instance, the site feature (one position per site, such as cnn.com or news.yahoo.com):

0 0 1 0 0 0 0

• Dimensionality equal to the number of different values
– can be very large

• Hashing to reduce dimensionality (made popular by John Langford in VW)

• Dimensionality now independent of number of values
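
The hashing step above can be sketched as follows. This is a minimal illustration, not the implementation used in VW: the function names, the `num_bits` parameter, and the choice of MD5 (picked only because it is deterministic across runs) are assumptions.

```python
import hashlib


def hashed_index(feature: str, value: str, num_bits: int = 18) -> int:
    """Map a (feature, value) pair to one of 2**num_bits buckets."""
    key = f"{feature}={value}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "little") % (1 << num_bits)


def hash_features(example: dict, num_bits: int = 18) -> dict:
    """Sparse vector {bucket: count}; no dictionary of known values is needed."""
    x = {}
    for name, value in example.items():
        idx = hashed_index(name, value, num_bits)
        x[idx] = x.get(idx, 0.0) + 1.0  # colliding values share a bucket
    return x


# Hypothetical example with two categorical features.
x = hash_features({"site": "cnn.com", "advertiser": "acme"}, num_bits=10)
```

The vector length is fixed by `num_bits` alone, so a value never seen before still maps to a valid bucket.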

HASHING VS FEATURE SELECTION
• “Small” problem with 35M different values.
• Methods that require a dictionary have a larger model.

QUADRATIC FEATURES
• Outer product between two features.
• Example: between site and advertiser,
Feature is 1 ⇔ site=finance.yahoo.com & advertiser=bank of america

[Figure: publisher hierarchy (Publisher network, Publisher, Site, Url) and advertiser hierarchy (Advertiser network, Advertiser, Campaign, Ad)]

• Similar to a polynomial kernel of degree 2
• Large number of values → hashing trick
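
A minimal sketch of quadratic features combined with hashing: every pair of categorical values becomes one conjunction feature, and the conjunction is hashed like any other value. The helper names and the use of MD5 are assumptions made for the illustration.

```python
import hashlib
from itertools import combinations


def bucket(key: str, num_bits: int) -> int:
    """Deterministic hash of a string key into 2**num_bits buckets."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % (1 << num_bits)


def hashed_with_conjunctions(example: dict, num_bits: int = 24) -> dict:
    """Hash the single features and all pairwise conjunctions of them."""
    singles = [f"{name}={value}" for name, value in sorted(example.items())]
    pairs = [f"{a}&{b}" for a, b in combinations(singles, 2)]
    x = {}
    for key in singles + pairs:
        idx = bucket(key, num_bits)
        x[idx] = x.get(idx, 0.0) + 1.0
    return x


# Hypothetical example: 3 single features produce 3 conjunctions.
x = hashed_with_conjunctions(
    {"site": "finance.yahoo.com", "advertiser": "bank of america", "campaign": "c1"}
)
```

The number of possible conjunctions grows with the product of the value counts, but the hashed vector keeps the size set by `num_bits`.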
ADVANTAGES OF HASHING
• Practical
– Straightforward to implement; no need to maintain dictionaries

• Statistical
– Regularization (infrequent values are washed away by frequent ones)

• Most powerful when combined with quadratic features
John Langford on hashing: “At first it's scary, then you love it”

LEARNING
• Regularized logistic regression
– Vowpal Wabbit open source package

• Regularization with hierarchical features → backoff smoothing
– Weight of a frequent (parent) value: well estimated
– Weight of a child value: small if the value is rare

• Negative data subsampled for computational reasons
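
Subsampling the negatives inflates the predicted probabilities, so a model trained this way is usually recalibrated. The sketch below shows the subsampling and the standard odds correction, which is equivalent to adding log(rate) to the intercept. The correction is not stated on the slide, and the function names and `rate` parameter are assumptions.

```python
import random


def subsample_negatives(rows, rate, seed=0):
    """Keep every positive example and each negative with probability `rate`."""
    rng = random.Random(seed)
    return [(x, y) for x, y in rows if y == 1 or rng.random() < rate]


def corrected_probability(p_subsampled: float, rate: float) -> float:
    """Map a probability learned on subsampled data back to the full data.

    Dropping negatives divides their count by 1/rate, so the odds seen by the
    model are too large by that factor; multiplying the odds by `rate` undoes it.
    """
    odds = p_subsampled / (1.0 - p_subsampled) * rate
    return odds / (1.0 + odds)


# Hypothetical data: 5 clicks and 100 non-clicks, keeping 10% of the non-clicks.
rows = [([1.0], 1)] * 5 + [([1.0], 0)] * 100
kept = subsample_negatives(rows, rate=0.1)
```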

EVALUATION
• Comparison with (Agarwal et al. '10)
– Probabilistic model for the same display advertising prediction problem
– Leverages the hierarchical structures on the ad and publisher sides
– Sparse prior for smoothing

• Model trained on three weeks of data, tested on the 3 following days

auROC: +3.1%
auPRC: +10.0%
Log likelihood: +7.1%

D. Agarwal et al., Estimating Rates of Rare Events with Multiple Hierarchies through Scalable Log-linear Models, KDD, 2010

BAYESIAN LOGISTIC REGRESSION
• Regularized logistic regression = MAP solution
(Gaussian prior, logistic likelihood)

• Posterior is not Gaussian
• Diagonal Laplace approximation:

with:

and:
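
For reference, the diagonal Laplace approximation can be written in its standard form as below. The notation is an assumption rather than something recovered from the slide: $L(w)$ is the regularized logistic loss with regularization weight $\lambda$, labels are $y_j \in \{-1, +1\}$, and $x_{ji}$ is feature $i$ of example $j$.

```latex
\Pr(w \mid D) \;\approx\; \prod_{i=1}^{d} \mathcal{N}\!\left(w_i;\ \mu_i,\ \sigma_i^{2}\right)

\text{with:}\quad
\mu = \arg\min_{w} L(w),
\qquad
L(w) = \frac{\lambda}{2}\,\lVert w \rVert^{2}
     + \sum_{j=1}^{n} \log\!\left(1 + e^{-y_j\, w^{\top} x_j}\right)

\text{and:}\quad
\sigma_i^{-2}
  = \left.\frac{\partial^{2} L(w)}{\partial w_i^{2}}\right|_{w=\mu}
  = \lambda + \sum_{j=1}^{n} x_{ji}^{2}\, p_j\,(1 - p_j),
\qquad
p_j = \frac{1}{1 + e^{-\mu^{\top} x_j}}
```

Only the diagonal of the Hessian is kept, so the approximate posterior costs one mean and one variance per weight.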

MODEL UPDATE
• Needed because ads / campaigns keep changing.
• The posterior distribution of a previously trained model can be used as
the prior for training a new model with a new batch of data
[Figure: timeline from Day 1 to Day 5 with successive models M0, M1, M2]

• Influence of the update frequency (auPRC):
– 1 day: +3.7%
– 6 hours: +5.1%
– 2 hours: +5.8%
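
Using the previous posterior as the next prior can be sketched with a diagonal Gaussian over the weights. This is a toy version under stated assumptions: plain gradient descent stands in for the real optimizer, labels are in {0, 1}, and the function name and toy batches are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def bayesian_update(mean, precision, batch, steps=500):
    """Update a diagonal Gaussian belief N(mean, 1/precision) with one batch.

    batch is a list of (x, y) pairs, x a list of floats and y in {0, 1}.
    Returns (new_mean, new_precision) from a diagonal Laplace approximation.
    """
    d = len(mean)
    w = list(mean)
    # Step size 1/L, with L an upper bound on the curvature of the objective.
    curvature = max(precision) + 0.25 * sum(sum(v * v for v in x) for x, _ in batch)
    for _ in range(steps):
        # Gradient of the prior term 0.5 * sum_i q_i * (w_i - m_i)^2 ...
        grad = [precision[i] * (w[i] - mean[i]) for i in range(d)]
        # ... plus the gradient of the logistic loss on the batch.
        for x, y in batch:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(d):
                grad[i] += (p - y) * x[i]
        w = [w[i] - grad[i] / curvature for i in range(d)]
    # New precision: old precision plus the diagonal of the loss Hessian.
    new_precision = list(precision)
    for x, _ in batch:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(d):
            new_precision[i] += x[i] * x[i] * p * (1.0 - p)
    return w, new_precision


# Chain the updates: the posterior after one batch is the prior for the next.
m, q = [0.0, 0.0], [1.0, 1.0]            # initial prior N(0, 1)
day_batches = [
    [([1.0, 0.0], 1), ([1.0, 1.0], 0)],  # hypothetical toy batches
    [([1.0, 0.0], 1), ([0.0, 1.0], 0)],
]
for batch in day_batches:
    m, q = bayesian_update(m, q, batch)
```

Each update only needs the new batch plus one mean and one precision per weight, which is what makes frequent updates affordable.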

PARALLEL LEARNING
• Large training set
– 2B training samples; 16M parameters
– 400GB (compressed)

• Proposed method: less than one hour with 500 machines

• Optimize:
• SGD is fast on a single machine, but difficult to parallelize.
• Batch (quasi-Newton) methods are straightforward to parallelize
– L-BFGS with distributed gradient computation.
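
Batch methods parallelize easily because the gradient of a sum over examples is the sum of per-shard gradients. The sketch below simulates that on one machine, assuming an L2-regularized logistic loss with labels in {0, 1}; the function names are hypothetical and the quasi-Newton optimizer that would consume the gradient is omitted.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_gradient(w, shard):
    """Gradient of the logistic loss over one node's shard of (x, y) pairs."""
    grad = [0.0] * len(w)
    for x, y in shard:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi
    return grad


def distributed_gradient(w, shards, l2=1.0):
    """Sum of the per-shard gradients plus the gradient of the L2 regularizer."""
    total = [l2 * wi for wi in w]
    for shard in shards:  # on a cluster, each shard would be handled by its own node
        g = local_gradient(w, shard)
        total = [t + gi for t, gi in zip(total, g)]
    return total


# Hypothetical toy data, evaluated as one shard and as two shards.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1), ([1.0, -1.0], 0)]
w = [0.2, -0.1]
g_single = distributed_gradient(w, [data])
g_split = distributed_gradient(w, [data[:2], data[2:]])
```

Splitting the data changes only where the work happens: `g_single` and `g_split` agree up to floating-point rounding.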

ALLREDUCE
• Aggregate and broadcast across nodes
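[Figure: AllReduce on a tree of nodes. Node values are summed up the tree (partial sums 13 and 15, total 37), then the total 37 is broadcast back so that every node holds 37.]
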

• Very few modifications to existing code: just insert a few AllReduce operations.
• Compatible with Hadoop / MapReduce
– Build a spanning tree on the gateway
– Single MapReduce job
– Leverage speculative execution to alleviate the slow node issue
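
The aggregate-and-broadcast pattern can be simulated on a small tree. This is a single-process sketch with hypothetical class and function names; the node values are chosen to add up to 37 as in the slide's figure, and their arrangement in the tree is a best guess.

```python
class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def reduce_up(node):
    """Sum the values of a subtree into its root (the aggregate phase)."""
    node.value += sum(reduce_up(child) for child in node.children)
    return node.value


def broadcast_down(node, total):
    """Copy the global total to every node (the broadcast phase)."""
    node.value = total
    for child in node.children:
        broadcast_down(child, total)


def allreduce(root):
    total = reduce_up(root)
    broadcast_down(root, total)
    return total


def values(node):
    """All node values in the tree, for inspection."""
    out = [node.value]
    for child in node.children:
        out.extend(values(child))
    return out


# Seven nodes: a root, two internal nodes, four leaves.
tree = Node(9, [Node(1, [Node(7), Node(5)]), Node(8, [Node(3), Node(4)])])
total = allreduce(tree)
```

After the call every node holds the same total, so each worker can continue its local computation with the globally summed quantity (for instance a gradient).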
ONLINE INITIALIZATION
• Hybrid approach:
– One pass of online learning on each node
– Average the weights from each node to get a warm start for batch
optimization

• Best of both (online / batch) worlds.
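
The hybrid scheme can be sketched as one SGD pass per shard followed by weight averaging. The learning rate, the function names, and the toy shards are assumptions, and the batch optimizer that would start from the averaged weights is left out.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def online_pass(shard, dim, lr=0.5):
    """One pass of SGD on the logistic loss over a node's local shard."""
    w = [0.0] * dim
    for x, y in shard:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
    return w


def warm_start(shards, dim):
    """Average the per-node weights to initialize the batch optimizer."""
    local = [online_pass(shard, dim) for shard in shards]
    return [sum(ws[i] for ws in local) / len(local) for i in range(dim)]


def logistic_loss(w, data):
    """Negative log-likelihood of (x, y) pairs with y in {0, 1}."""
    loss = 0.0
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        loss -= math.log(p) if y == 1 else math.log(1.0 - p)
    return loss


# Hypothetical shards held by two nodes.
shards = [[([1.0], 1)] * 3, [([1.0], 1)] * 2]
w0 = warm_start(shards, dim=1)
```

The averaged weights already fit the data better than a zero initialization, so the batch phase starts closer to the optimum.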

[Figures: results on two tasks, splice site prediction (Sonnenburg et al. '10) and display advertising]

S. Sonnenburg and V. Franc, COFFIN: A Computational Framework for Linear SVMs, ICML 2010

THOMPSON SAMPLING
• Heuristic to address the Explore / Exploit problem, dating back to
Thompson (1933)
• Simple to implement
• Good performance in practice (Graepel et al. '10, Chapelle and Li '11)
• Rarely used, maybe because of a lack of theoretical guarantees.

Loop, repeated at each round t:
1. Draw model parameter θ_t according to P(θ | D)
2. Select best action according to θ_t
3. Observe reward and update model

T. Graepel et al., Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft’s Bing search engine, ICML 2010
O. Chapelle and L. Li, An Empirical Evaluation of Thompson Sampling, NIPS 2011
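
The draw / select / update loop of Thompson sampling can be sketched for the simplest case, a Bernoulli bandit with a Beta posterior per arm. The uniform Beta(1, 1) prior, the function name, and the example arm means are assumptions.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Thompson sampling and return the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    wins = [0] * k
    losses = [0] * k
    for _ in range(horizon):
        # 1. Draw one parameter per arm from its Beta posterior.
        theta = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(k)]
        # 2. Select the action that is best under the drawn parameters.
        arm = max(range(k), key=lambda a: theta[a])
        # 3. Observe a simulated reward and update the posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
    return [wins[a] + losses[a] for a in range(k)]


# Hypothetical two-armed problem with a clear best arm.
pulls = thompson_sampling([0.9, 0.1], horizon=2000)
```

Exploration comes from the randomness of the posterior draw: an arm with few observations has a wide posterior and is still selected from time to time.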

E/E SIMULATIONS
MAB with K arms. Best arm has mean reward = 0.5, others have 0.5 − ε.
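
In this setup every pull of a non-best arm costs ε in expectation, so the expected regret follows directly from the pull counts. A small sketch, with hypothetical function names and with the best arm placed at index 0 by convention:

```python
def make_arms(k: int, epsilon: float):
    """Mean rewards: the best arm at 0.5, the other k - 1 arms at 0.5 - epsilon."""
    return [0.5] + [0.5 - epsilon] * (k - 1)


def expected_regret(pulls, epsilon: float, best_arm: int = 0) -> float:
    """Expected cumulative regret given how often each arm was pulled."""
    wasted = sum(n for arm, n in enumerate(pulls) if arm != best_arm)
    return epsilon * wasted


arms = make_arms(k=3, epsilon=0.1)
regret = expected_regret([10, 5, 5], epsilon=0.1)
```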

EVALUATION
• Semi-simulated environment: real input features, but labels generated.
• Set of eligible ads varies from 1 to 5,910. Total ads = 66,373
• Comparison of E/E algorithms:
– 4 days of data
– Cold start

• Algorithms:
– UCB: mean + std. dev.
– ε-greedy
– Thompson sampling

[Figure: simulation loop. X candidates → X selected (using the learned w) → Y generated from a random w (ground truth) → model update of the learned w]
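
The three selection rules can be compared side by side on a linear model whose weights have a diagonal Gaussian posterior. This is a sketch under assumptions: scores are the linear scores w·x, the `alpha` and `epsilon` parameters and all names are hypothetical, and "exploit" denotes the greedy baseline.

```python
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def select(candidates, mean, var, method, rng, epsilon=0.1, alpha=1.0):
    """Return the index of the chosen candidate feature vector."""
    if method == "egreedy" and rng.random() < epsilon:
        return rng.randrange(len(candidates))  # explore uniformly at random
    if method == "ucb":
        # Posterior mean of the score plus its posterior standard deviation.
        scores = [
            dot(mean, x) + alpha * math.sqrt(sum(v * xi * xi for v, xi in zip(var, x)))
            for x in candidates
        ]
    elif method == "thompson":
        # One weight vector is drawn and used to score all candidates.
        w = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
        scores = [dot(w, x) for x in candidates]
    else:
        # "exploit", and the greedy branch of "egreedy": posterior mean only.
        scores = [dot(mean, x) for x in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])


# Hypothetical candidates: the second one has a lower mean but high uncertainty.
rng = random.Random(0)
candidates = [[1.0, 0.0], [0.0, 1.0]]
mean, var = [0.2, 0.1], [0.01, 1.0]
greedy_choice = select(candidates, mean, var, "exploit", rng)
ucb_choice = select(candidates, mean, var, "ucb", rng)
```

The greedy rule picks the first candidate, while UCB prefers the uncertain second one; that difference is exactly the exploration bonus.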

RESULTS
• CTR regret (in percentage):
– Thompson: 3.72
– UCB: 4.14
– ε-greedy: 4.98
– Exploit-only: 5.00
– Random: 31.95

• Regret over time: [Figure: plot of regret over time]

OPEN QUESTIONS
• Hashing
– Theoretical performance guarantees

• Low rank matrix factorization
– Better prediction on unseen pairs (publisher, advertiser)

• Sample selection bias
– System is trained only on selected ads, but all ads are scored.
– Possible solution: inverse propensity scoring
– But we still need to bias the training data toward good ads.

• Explore / exploit
– Evaluation framework
– Regret analysis of Thompson sampling
– E/E with a budget; with multiple slots; with delayed feedback
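
Inverse propensity scoring, mentioned above as a possible answer to sample selection bias, reweights each logged event by how likely the logging system was to select that ad. A minimal sketch of the estimator, assuming a hypothetical log format of (context, action, reward, logging probability):

```python
def ips_estimate(logs, target_prob):
    """Estimate the average reward of a target policy from logged data.

    logs: list of (context, action, reward, logging_prob) tuples, where
    logging_prob is the probability with which the logging policy chose the
    action. target_prob(context, action) is the target policy's probability.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        total += reward * target_prob(context, action) / logging_prob
    return total / len(logs)


# Hypothetical logs: two ads shown uniformly at random (probability 0.5 each).
logs = [
    ("u1", "a", 1.0, 0.5),
    ("u2", "b", 0.0, 0.5),
    ("u3", "a", 0.0, 0.5),
    ("u4", "b", 1.0, 0.5),
]


def always_a(context, action):
    """Target policy that always shows ad 'a'."""
    return 1.0 if action == "a" else 0.0


ctr_a = ips_estimate(logs, always_a)
```

Ads that the logging policy rarely showed receive large weights, so the estimate stays unbiased but its variance grows when those probabilities are small.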

CONCLUSION
• Simple yet efficient techniques for click prediction

• Main difficulty in applied machine learning: avoid the bias (because of
academic papers) toward complex systems
– It's easy to get lured into building a complex system
– It's difficult to keep it simple

See paper for more details
Simple and scalable response prediction for display advertising
O. Chapelle, E. Manavoglu, R. Rosales, 2014

Handwritten Text Recognition for manuscripts and early printed texts
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Response prediction for display advertising - WSDM 2014

  • 1. RESPONSE PREDICTION FOR DISPLAY ADVERTISING WSDM ’14 Olivier Chapelle
  • 2. OUTLINE
      1. Criteo & Display advertising
      2. Modeling
      3. Large scale learning
      4. Explore / exploit
  • 4. DISPLAY ADVERTISING
      • Rapidly growing multi-billion dollar business (30% of internet advertising revenue in 2013).
      • Marketplace between:
          – Publishers: sell display opportunities
          – Advertisers: pay for showing their ad
      • Real Time Bidding:
          – An auction amongst advertisers is held at the moment a user generates a display opportunity by visiting a publisher’s web page.
  • 5. BRANDING VS PERFORMANCE
      PRICING TYPE
      • CPM (Cost Per Mille): advertiser pays per thousand impressions
      • CPC (Cost Per Click): advertiser pays only when the user clicks
      • CPA (Cost Per Action): advertiser pays only when the user performs a predefined action such as a purchase.
      CAMPAIGN TYPE
      • Branding: CPM
      • Performance-based advertising (retargeting): CPC, CPA
      CONVERSIONS
      • eCPM = CPC * predicted clickthrough rate
      • eCPM = CPA * predicted conversion rate
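The two conversion formulas on this slide can be sketched as follows. This is a minimal illustration, not Criteo's bidding logic: the function names, the candidate structure and the numbers are hypothetical, and the code follows the slide's formulas literally (the result is an expected value per impression; multiplying by 1000 would give a per-thousand figure).

```python
def ecpm_from_cpc(cpc, predicted_ctr):
    """Slide formula: eCPM = CPC * predicted clickthrough rate."""
    return cpc * predicted_ctr


def ecpm_from_cpa(cpa, predicted_cr):
    """Slide formula: eCPM = CPA * predicted conversion rate."""
    return cpa * predicted_cr


def pick_best(candidates):
    """Return the name of the candidate ad with the highest expected value.

    Each candidate is a (name, pricing, price, predicted_rate) tuple, where
    pricing is "CPC" or "CPA" (hypothetical structure for illustration).
    """
    def value(candidate):
        _, pricing, price, rate = candidate
        if pricing == "CPC":
            return ecpm_from_cpc(price, rate)
        return ecpm_from_cpa(price, rate)

    return max(candidates, key=value)[0]


# Made-up numbers: a CPC campaign and a CPA campaign competing for one display.
ads = [("ad_a", "CPC", 0.50, 0.010), ("ad_b", "CPA", 20.0, 0.0005)]
best_ad = pick_best(ads)
```

Because both pricing types are mapped to the same expected-value scale, the quality of the ranking depends directly on how accurate the predicted rates are.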
  • 6. CRITEO
      • Advertisers: billed on CPC
      • Publishers: paid on CPM
      • Criteo’s success depends highly on how precisely we can predict Click Through Rates (CTR) and Conversion Rates (CR)
  • 7. RECOMMENDATION
      • Collaborative filtering: propose related items to a user based on historical interactions from other users
      • Over 50% of Criteo-driven sales come from recommended products the user had never viewed on advertiser websites
  • 8. CRITEO NUMBERS
      • 750 employees
      • 42 countries
      • $600 million 2013 revenue
      • 7500 cores cluster
      • 10 PB
      • 250K ad calls per second
      • 18 billion and 2.1 billion (banners per day; RT bids per day)
  • 9. OUTLINE
      1. Criteo & Display advertising
      2. Modeling
      3. Large scale learning
      4. Explore / exploit
  • 10. FEATURES
      • Three sources of features: user, ad, page
      • In this talk: categorical features on ad and page.
      • Advertiser hierarchy: Advertiser network > Advertiser > Campaign > Ad
      • Publisher hierarchy: Publisher network > Publisher > Site > Url
  • 11. HASHING TRICK
      • Standard representation of categorical features: “one-hot” encoding. For instance, the site feature is a binary vector with one coordinate per site (cnn.com, news.yahoo.com, ...): 1 for the observed site, 0 elsewhere.
      • Dimensionality equal to the number of different values: can be very large
      • Hashing to reduce dimensionality (made popular by John Langford in VW)
      • Dimensionality now independent of the number of values
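A minimal sketch of the hashing trick, assuming a table of 2^num_bits buckets. The function names are hypothetical, and `zlib.crc32` is only a convenient deterministic stand-in for whatever hash function a production system such as VW actually uses.

```python
import zlib


def hashed_index(feature_name, value, num_bits=24):
    """Map one categorical (feature, value) pair to a bucket in [0, 2**num_bits)."""
    key = f"{feature_name}={value}".encode("utf-8")
    # crc32 is an illustrative choice of hash, not the one used in the talk.
    return zlib.crc32(key) & ((1 << num_bits) - 1)


def hash_example(features, num_bits=24):
    """Return the sorted active indices of the hashed binary feature vector.

    No dictionary of known values is needed: an unseen value simply lands
    in some bucket, possibly colliding with another value.
    """
    return sorted({hashed_index(n, v, num_bits) for n, v in features.items()})


example = {"site": "cnn.com", "advertiser": "bank of america"}
active = hash_example(example, num_bits=10)  # vector of dimension 1024
```

The vector dimension is fixed by `num_bits` alone, so adding new sites or advertisers never changes the model size; the price is that distinct values may collide.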
  • 12. HASHING VS FEATURE SELECTION
      • “Small” problem with 35M different values.
      • Methods that require a dictionary have a larger model.
  • 13. QUADRATIC FEATURES
      • Outer product between two features.
      • Example, between site and advertiser: the feature is 1 when site=finance.yahoo.com & advertiser=bank of america
      • Similar to a polynomial kernel of degree 2
      • Large number of values → hashing trick
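Quadratic features combine naturally with hashing, since the conjunction of two values is just another string to hash. The sketch below is illustrative only: the names are hypothetical, `zlib.crc32` is a stand-in hash, and it crosses every pair of features, whereas a real system would likely cross only selected pairs.

```python
import zlib
from itertools import combinations


def bucket(key, num_bits):
    """Deterministic bucket in [0, 2**num_bits) for a string key."""
    return zlib.crc32(key.encode("utf-8")) & ((1 << num_bits) - 1)


def hash_with_quadratic(features, num_bits=24):
    """Hash every single feature plus every pairwise conjunction.

    A conjunction such as "advertiser=...&site=..." is active only when
    both values occur together, which is the outer-product feature.
    """
    items = sorted(features.items())
    keys = [f"{name}={value}" for name, value in items]
    keys += [
        f"{n1}={v1}&{n2}={v2}"
        for (n1, v1), (n2, v2) in combinations(items, 2)
    ]
    return sorted({bucket(k, num_bits) for k in keys})


example = {"site": "finance.yahoo.com", "advertiser": "bank of america"}
active = hash_with_quadratic(example, num_bits=20)
```

The number of possible conjunctions grows with the product of the two cardinalities, which is why no dictionary is kept and the bucket count bounds the model size.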
  • 14. ADVANTAGES OF HASHING
      • Practical
          – Straightforward to implement; no need to maintain dictionaries
      • Statistical
          – Regularization (infrequent values are washed away by frequent ones)
      • Most powerful when combined with quadratic features
      • John Langford on hashing: “At first it’s scary, then you love it”
  • 15. LEARNING
      • Regularized logistic regression
          – Vowpal Wabbit open source package
      • Regularization with hierarchical features gives backoff smoothing: the weight of a coarse feature is well estimated, while the weight of a finer feature stays small if its value is rare
      • Negative data subsampled for computational reasons
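Subsampling negatives shifts the predicted probabilities upward, so predictions are usually mapped back to the original scale. The sketch below shows one standard correction, assuming negatives are kept independently with probability `rate`; the function names are hypothetical and the slide does not say which correction was actually applied.

```python
import random


def subsample_negatives(examples, rate, rng):
    """Keep every positive example and each negative with probability `rate`.

    `examples` is a list of (features, label) pairs with label in {0, 1}.
    """
    return [(x, y) for x, y in examples if y == 1 or rng.random() < rate]


def recalibrate(q, rate):
    """Map a probability q learned on subsampled data back to the full data.

    Dropping negatives multiplies the odds by 1/rate, so the original
    odds are rate * q / (1 - q), which gives the expression below.
    """
    return q / (q + (1.0 - q) / rate)


rng = random.Random(0)
data = [({"site": "a"}, 1)] + [({"site": "b"}, 0)] * 100
kept = subsample_negatives(data, rate=0.1, rng=rng)
p = recalibrate(0.5, rate=0.1)  # 0.5 on subsampled data is about 0.09 originally
```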
  • 16. EVALUATION
      • Comparison with (Agarwal et al. ’10)
          – Probabilistic model for the same display advertising prediction problem
          – Leverages the hierarchical structures on the ad and publisher sides
          – Sparse prior for smoothing
      • Model trained on three weeks of data, tested on the 3 following days
      • Results: auROC +3.1%, auPRC +10.0%, log likelihood +7.1%
      D. Agarwal et al., Estimating Rates of Rare Events with Multiple Hierarchies through Scalable Log-linear Models, KDD, 2010
  • 17. BAYESIAN LOGISTIC REGRESSION
      • Regularized logistic regression = MAP solution (Gaussian prior, logistic likelihood)
      • Posterior is not Gaussian
      • Diagonal Laplace approximation
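A diagonal Laplace approximation of the logistic regression posterior is commonly written as below. This is a sketch in assumed notation (mean $m_i$, precision $q_i$, regularization $\lambda$, labels $y_j \in \{-1,+1\}$), not necessarily the form used on the slide.

```latex
\begin{align*}
P(w \mid D) &\approx \prod_{i=1}^{d} \mathcal{N}\!\left(w_i;\; m_i,\; q_i^{-1}\right) \\
m &= \arg\min_{w}\; \frac{\lambda}{2}\lVert w \rVert^2
     + \sum_{j=1}^{n} \log\!\left(1 + e^{-y_j\, w^\top x_j}\right) \\
q_i &= \lambda + \sum_{j=1}^{n} x_{ij}^2\, p_j\,(1 - p_j),
\qquad p_j = \frac{1}{1 + e^{-m^\top x_j}}
\end{align*}
```

Here $m$ is the MAP solution, and $q_i$ is the $i$-th diagonal entry of the Hessian of the negative log posterior evaluated at $m$.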
  • 18. MODEL UPDATE
      • Needed because ads / campaigns keep changing.
      • The posterior distribution of a previously trained model can be used as the prior for training a new model with a new batch of data
        [Timeline: Day 1 to Day 5; models M0, M1, M2]
      • Influence of the update frequency (auPRC): 1 day +3.7%, 6 hours +5.1%, 2 hours +5.8%
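The idea of reusing the posterior as the next prior can be sketched with a diagonal Gaussian over the weights. This is an illustration under stated assumptions, not the production update: the function names are hypothetical, the optimizer is plain gradient descent rather than anything mentioned in the talk, and the precision update is the standard diagonal Laplace one.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def update_posterior(m, q, X, y, steps=500):
    """Return (new_mean, new_precision) after training on one batch.

    m, q : prior mean and precision of each weight (diagonal Gaussian).
    X, y : feature vectors and labels in {0, 1}.
    """
    d = len(m)
    # Step size 1/L, where L upper-bounds the curvature of the objective.
    lipschitz = max(q) + 0.25 * sum(sum(v * v for v in x) for x in X)
    w = list(m)
    for _ in range(steps):
        # Gradient of the Gaussian prior term plus the logistic loss.
        grad = [q[i] * (w[i] - m[i]) for i in range(d)]
        for x, label in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(d):
                grad[i] += (p - label) * x[i]
        w = [w[i] - grad[i] / lipschitz for i in range(d)]
    # Diagonal Laplace update: each example adds x_i^2 * p * (1 - p).
    new_q = list(q)
    for x in X:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(d):
            new_q[i] += x[i] * x[i] * p * (1.0 - p)
    return w, new_q


# Two successive batches: the posterior of the first is the prior of the second.
m0, q0 = [0.0, 0.0], [1.0, 1.0]
batch_x, batch_y = [[1.0, 1.0], [1.0, 0.0]], [1, 0]
m1, q1 = update_posterior(m0, q0, batch_x, batch_y)
m2, q2 = update_posterior(m1, q1, batch_x, batch_y)
```

Each update only needs the new batch plus two vectors (mean and precision), which is what makes frequent refreshes cheap.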
  • 19. OUTLINE
      1. Criteo & Display advertising
      2. Modeling
      3. Large scale learning
      4. Explore / exploit
  • 20. PARALLEL LEARNING
      • Large training set
          – 2B training samples; 16M parameters
          – 400GB (compressed)
      • Proposed method: less than one hour with 500 machines
      • Optimize the regularized logistic regression objective
      • SGD is fast on a single machine, but difficult to parallelize.
      • Batch (quasi-Newton) methods are straightforward to parallelize
          – L-BFGS with distributed gradient computation.
  • 21. ALLREDUCE
      • Aggregate and broadcast across nodes
        [Figure: node values are summed up a tree to the root (total 37), then the total is broadcast back down to every node]
      • Very few modifications to existing code: just insert several AllReduce operations.
      • Compatible with Hadoop / MapReduce
          – Build a spanning tree on the gateway
          – Single MapReduce job
          – Leverage speculative execution to alleviate the slow node issue
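The aggregate-then-broadcast pattern can be simulated in a single process. This is a toy sketch, not a distributed implementation: the `Node` class and function names are hypothetical, and the tree shape is an assumption chosen so that the node values sum to the total of 37 shown on the slide.

```python
class Node:
    """One machine in the spanning tree, holding a local value."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


def reduce_up(node):
    """Sum the values of a subtree (leaves report to parents, up to the root)."""
    return node.value + sum(reduce_up(child) for child in node.children)


def broadcast_down(node, total):
    """Overwrite every node's value with the global total."""
    node.value = total
    for child in node.children:
        broadcast_down(child, total)


def allreduce(root):
    """After this call, every node holds the sum of all original values."""
    total = reduce_up(root)
    broadcast_down(root, total)
    return total


# Assumed layout: root 9, inner nodes 1 and 8, leaves 7, 5, 3, 4.
leaves = [Node(7), Node(5), Node(3), Node(4)]
root = Node(9, [Node(1, leaves[:2]), Node(8, leaves[2:])])
total = allreduce(root)
```

In the learning setting the "value" would be a local gradient vector, so after the call every node holds the full gradient and can take the same optimization step.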
  • 22. ONLINE INITIALIZATION
      • Hybrid approach:
          – One pass of online learning on each node
          – Average the weights from each node to get a warm start for batch optimization
      • Best of both (online / batch) worlds.
        [Plots: splice site prediction (Sonnenburg et al. ’10); display advertising]
      S. Sonnenburg and V. Franc, COFFIN: A Computational Framework for Linear SVMs, ICML 2010
  • 23. OUTLINE
      1. Criteo & Display advertising
      2. Modeling
      3. Large scale learning
      4. Explore / exploit
  • 24. THOMPSON SAMPLING
      • Heuristic to address the Explore / Exploit problem, dating back to Thompson (1933)
      • Simple to implement
      • Good performance in practice (Graepel et al. ’10, Chapelle and Li ’11)
      • Rarely used, maybe because of the lack of theoretical guarantees.
      • Loop:
          – Draw model parameter θ_t according to P(θ | D)
          – Select best action according to θ_t
          – Observe reward and update model
      T. Graepel et al., Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft’s Bing search engine, ICML 2010
      O. Chapelle and L. Li, An Empirical Evaluation of Thompson Sampling, NIPS 2011
  • 25. E/E SIMULATIONS
      • MAB with K arms. Best arm has mean reward = 0.5, others have 0.5 − ε.
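A simulation of this kind can be sketched with Thompson sampling on Bernoulli arms. The code is a minimal illustration under assumptions: rewards are Bernoulli, each arm has a uniform Beta(1, 1) prior, and the function names, horizon and seed are hypothetical rather than taken from the talk.

```python
import random


def make_arms(k, epsilon):
    """K arms: the best has mean reward 0.5, the others 0.5 - epsilon."""
    return [0.5] + [0.5 - epsilon] * (k - 1)


def thompson_sampling(true_means, horizon, seed=0):
    """Run Thompson sampling and return (pulls per arm, expected regret)."""
    rng = random.Random(seed)
    k = len(true_means)
    wins = [0] * k
    losses = [0] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta posterior.
        draws = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(k)]
        # Act greedily with respect to the sampled means.
        arm = max(range(k), key=draws.__getitem__)
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        regret += best - true_means[arm]
    pulls = [wins[a] + losses[a] for a in range(k)]
    return pulls, regret


pulls, regret = thompson_sampling(make_arms(k=10, epsilon=0.1), horizon=5000)
```

Exploration comes only from the randomness of the posterior draws, so it fades automatically as the posteriors concentrate.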
  • 26. EVALUATION
      • Semi-simulated environment: real input features, but labels generated.
      • Set of eligible ads varies from 1 to 5,910. Total ads = 66,373
      • Comparison of E/E algorithms:
          – 4 days of data
          – Cold start
      • Algorithms:
          – UCB: mean + std. dev.
          – ε-greedy
          – Thompson sampling
        [Diagram: X candidates, X selected, random w (ground truth), generated Y, learned w, model update]
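The two baseline selection rules can be sketched as follows, assuming the model supplies a predicted mean and a standard deviation for each eligible ad. The function names and the `alpha` scaling factor are hypothetical; the slide only states "mean + std. dev." for UCB.

```python
import random


def select_ucb(means, stds, alpha=1.0):
    """Pick the ad with the highest mean + alpha * standard deviation."""
    scores = [m + alpha * s for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)


def select_epsilon_greedy(means, epsilon, rng):
    """With probability epsilon pick a random ad, otherwise the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    return max(range(len(means)), key=means.__getitem__)


rng = random.Random(0)
means = [0.010, 0.009]
stds = [0.000, 0.005]  # the second ad is much less well known
ucb_choice = select_ucb(means, stds)                      # favors the uncertain ad
greedy_choice = select_epsilon_greedy(means, 0.0, rng)    # pure exploitation
```

UCB explores deterministically through the uncertainty bonus, whereas ε-greedy explores uniformly at random regardless of how uncertain each ad is.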
  • 27. RESULTS
      • CTR regret (in percentage): Thompson 3.72, UCB 4.14, ε-greedy 4.98, Exploit-only 5.00, Random 31.95
      • Regret over time: [plot]
  • 28. OPEN QUESTIONS
      • Hashing
          – Theoretical performance guarantees
      • Low rank matrix factorization
          – Better prediction on unseen pairs (publisher, advertiser)
      • Sample selection bias
          – System is trained only on selected ads, but all ads are scored.
          – Possible solution: inverse propensity scoring
          – But we still need to bias the training data toward good ads.
      • Explore / exploit
          – Evaluation framework
          – Regret analysis of Thompson sampling
          – E/E with a budget; with multiple slots; with delayed feedback
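Inverse propensity scoring reweights each logged example by the inverse of the probability that its ad was selected. The sketch below applies it to a log loss; it is an illustration only, with a hypothetical function name, an assumed self-normalized form, and an assumed clipping threshold to keep the weights bounded.

```python
import math


def ips_weighted_log_loss(labels, predictions, propensities, clip=1e-3):
    """Self-normalized, propensity-weighted log loss.

    labels       : observed outcomes in {0, 1}
    predictions  : predicted probabilities, strictly between 0 and 1
    propensities : probability that each logged ad was selected
    """
    total = 0.0
    weight_sum = 0.0
    for y, p, pi in zip(labels, predictions, propensities):
        weight = 1.0 / max(pi, clip)  # rarely selected ads count more
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weight * loss
        weight_sum += weight
    return total / weight_sum


# With equal propensities the result reduces to the ordinary mean log loss.
uniform = ips_weighted_log_loss([1, 0], [0.8, 0.2], [0.5, 0.5])
skewed = ips_weighted_log_loss([1, 0], [0.8, 0.4], [0.9, 0.1])
```

The weights grow quickly for ads that are almost never selected, which connects to the slide's caveat that training data still has to be biased toward good ads.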
  • 29. CONCLUSION
      • Simple yet efficient techniques for click prediction
      • Main difficulty in applied machine learning: avoid the bias (because of academic papers) toward complex systems
          – It’s easy to get lured into building a complex system
          – It’s difficult to keep it simple
      See paper for more details: Simple and scalable response prediction for display advertising, O. Chapelle, E. Manavoglu, R. Rosales, 2014