More Related Content
Similar to Making advertising personal, 4th NL Recommenders Meetup (20)
Making advertising personal, 4th NL Recommenders Meetup
- 1. Copyright © 2014 Criteo
Making Advertising Personal
Large Scale Real-Time Product Recommendation at Criteo
Olivier Koch & Romain Lerallut
May 19th, 2015
- 2. Copyright © 2014 Criteo
Performance Advertising
We buy
• Inventory ! (ad spaces)
• Billions of times a day
• All over the Internet
• For 95% of the population
Where is the need for tech ?
We sell
• Clicks !
• (that convert)
• (that convert a lot)
We take the risk
You pay only for what you get
- 3. Copyright © 2013. Confidential
International infrastructure
Key figures
Traffic
550 k HTTP requests / sec (peak activity)
23000 impressions /sec (peak activity)
180 k requests / sec on RTB (average)
Less than 10 ms to process an RTB request
3
Figures of May2013
Physical infrastructure
6 Data centers on 3 continents operated and conceived in-
house
~ 12000 servers, largest Hadoop cluster in Europe
Availability / Uptime >99.95%
More than 20 PB of storage Big Data
- 4. Copyright © 2012. Confidential 4
Arbitrage
•Should we bid?
•At which price?
Recommendation
•Which products should
we display?
Graphical
optimization
•Big image vs small image
•Background color, ...
Prediction
•Generic prediction engine
•Specific models trained on TBs of data
Global Engine Architecture
- 5. Copyright © 2014 Criteo
Building ads with
personalized content in realtime
Recommendation for Advertising
2.5 billion products
~ 8ms response time
- 6. Copyright © 2014 Criteo
Data Sources
• Catalog data
• Feed provided by the merchants
• User behavior data
• Large scale intent data
• All visits to merchant websites
• Page views, basket, sales events
• Ad display data
• Displayed and clicked ads
6
- 7. Copyright © 2012. Confidential
7
Recommendation execution flow
Candidates
Generation
• Get candidates from all
sources using user historical
products and bestofs
Candidates
Aggregation
• Remove duplicates
• Aggregate features
Preselection
• Call degraded prediction to
decrease number of
candidates
Scoring
• Call full prediction model to
score each candidate
Winner Selector
• Select N products on score,
randomization
Glup Logging
• Log recommended products,
prediction variables
- 8. Copyright © 2014 Criteo
Issue #1: Retrieving user-specific products
• The need: storing [user → interesting products] vectors
• Difficult to store and retrieve at scale (900+ mln users)
• Hard to keep up to date
• Using seen products as a proxy
• Store the [user → viewed products] vectors
• Easier to maintain
• Store [viewed product → interesting products] vectors
• Based on aggregated user behavior data
• Computable offline
• Final ranking by a ML model
8
- 9. Copyright © 2014 Criteo
Issue #2: Scoring products
• Need to fuse data from several sources
• Product-specific
• User-specific
• User-product interactions
• Display-specific
• 1st solution: regression model
• Predict P(product click then sale)
• Easy to evaluate
• 2nd solution: ranking model
• More appropriate for our needs
• Still maximizing post-click sales
• Can be evaluated only on multi-product banners
9
- 10. Copyright © 2014 Criteo
Issue #3: Picking the right products
• Several questions:
• User-product fatigue
• Independent product choice assumption
• Explore / exploit
• Solution 1: Randomization in the banners
• Keep independent products assumption
• Separate optimization process to shuffle displayed items
• Solution 2: A better scoring model
• Score a full banner, not independent products
• Store all product display counts
10
- 11. Copyright © 2014 Criteo
Issue #4: 8ms response time ! Tweaking for performance
• CTR / Sales prediction takes ~40 µs per candidate using in-house library
• 2-step prediction:
• A fast pass to remove most of the candidates
• A slow pass to score accurately the remaining candidates
And the technical fine print:
• All real-time code in C#
• Async I/O for better efficiency
• HAProxy to scale the front-end
• Memcached to store all required data in memory
11
- 12. Copyright © 2014 Criteo
Upcoming challenges
• Long(er)-term user profiles
• More and better product information (images, semantic, NLP)
• Vertical-optimized engine
• Classifieds (catalog-free recommendation ?)
• Travel
• Instant-update of similarities
• (because batch computation is soooo last year)
12
- 13. Copyright © 2014 Criteo
Fancy a try ?
13
On your own:
With us !
http://www.criteo.com/careers/
• Our 1st public dataset is online: http://bit.ly/1vgw2XC
• 4GB display and click data, Kaggle challenge in 2014
• NEW : 1TB dataset released a few weeks ago
• Hosted on Microsoft Azure, just waiting for you