SlideShare a Scribd company logo
1 of 15
Copyright © 2014 Criteo
Making Advertising Personal
Large Scale Real-Time Product Recommendation at Criteo
Olivier Koch & Romain Lerallut
May 19th, 2015
Copyright © 2014 Criteo
Performance Advertising
We buy
• Inventory ! (ad spaces)
• Billions of times a day
• All over the Internet
• For 95% of the population
Where is the need for tech ?
We sell
• Clicks !
• (that convert)
• (that convert a lot)
We take the risk
You pay only for what you get
Copyright © 2013. Confidential
International infrastructure
Key figures
Traffic
550 k HTTP requests / sec (peak activity)
23000 impressions /sec (peak activity)
180 k requests / sec on RTB (average)
Less than 10 ms to process an RTB request
3
Figures of May2013
Physical infrastructure
6 Data centers on 3 continents operated and conceived in-
house
~ 12000 servers, largest Hadoop cluster in Europe
Availability / Uptime >99.95%
More than 20 PB of storage Big Data
Copyright © 2012. Confidential 4
Arbitrage
•Should we bid?
•At which price?
Recommendation
•Which products should
we display?
Graphical
optimization
•Big image vs small image
•Background color, ...
Prediction
•Generic prediction engine
•Specific models trained on TBs of data
Global Engine Architecture
Copyright © 2014 Criteo
Building ads with
personalized content in realtime
Recommendation for Advertising
2.5 billion products
~ 8ms response time
Copyright © 2014 Criteo
Data Sources
• Catalog data
• Feed provided by the merchants
• User behavior data
• Large scale intent data
• All visits to merchant websites
• Page views, basket, sales events
• Ad display data
• Displayed and clicked ads
6
Copyright © 2012. Confidential
7
Recommendation execution flow
Candidates
Generation
• Get candidates from all
sources using user historical
products and bestofs
Candidates
Aggregation
• Remove duplicates
• Aggregate features
Preselection
• Call degraded prediction to
decrease number of
candidates
Scoring
• Call full prediction model to
score each candidate
Winner Selector
• Select N products on score,
randomization
Glup Logging
• Log recommended products,
prediction variables
Copyright © 2014 Criteo
Issue #1: Retrieving user-specific products
• The need: storing [user → interesting products] vectors
• Difficult to store and retrieve at scale (900+ mln users)
• Hard to keep up to date
• Using seen products as a proxy
• Store the [user → viewed products] vectors
• Easier to maintain
• Store [viewed product → interesting products] vectors
• Based on aggregated user behavior data
• Computable offline
• Final ranking by a ML model
8
Copyright © 2014 Criteo
Issue #2: Scoring products
• Need to fuse data from several sources
• Product-specific
• User-specific
• User-product interactions
• Display-specific
• 1st solution: regression model
• Predict P(product click then sale)
• Easy to evaluate
• 2nd solution: ranking model
• More appropriate for our needs
• Still maximizing post-click sales
• Can be evaluated only on multi-product banners
9
Copyright © 2014 Criteo
Issue #3: Picking the right products
• Several questions:
• User-product fatigue
• Independent product choice assumption
• Explore / exploit
• Solution 1: Randomization in the banners
• Keep independent products assumption
• Separate optimization process to shuffle displayed items
• Solution 2: A better scoring model
• Score a full banner, not independent products
• Store all product display counts
10
Copyright © 2014 Criteo
Issue #4: 8ms response time ! Tweaking for performance
• CTR / Sales prediction takes ~40 µs per candidate using in-house library
• 2-step prediction:
• A fast pass to remove most of the candidates
• A slow pass to score accurately the remaining candidates
And the technical fine print:
• All real-time code in C#
• Async I/O for better efficiency
• HAProxy to scale the front-end
• Memcached to store all required data in memory
11
Copyright © 2014 Criteo
Upcoming challenges
• Long(er)-term user profiles
• More and better product information (images, semantic, NLP)
• Vertical-optimized engine
• Classifieds (catalog-free recommendation ?)
• Travel
• Instant-update of similarities
• (because batch computation is soooo last year)
12
Copyright © 2014 Criteo
Fancy a try ?
13
On your own:
With us !
http://www.criteo.com/careers/
• Our 1st public dataset is online: http://bit.ly/1vgw2XC
• 4GB display and click data, Kaggle challenge in 2014
• NEW : 1TB dataset released a few weeks ago
• Hosted on Microsoft Azure, just waiting for you
Copyright © 2014 Criteo
Questions?
Copyright © 2014 Criteo
Thank you !
o.koch@criteo.com
r.lerallut@criteo.com

More Related Content

What's hot

criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015
Carolyn Bednarz
 
Introduction Criteo - 2.0
Introduction Criteo - 2.0Introduction Criteo - 2.0
Introduction Criteo - 2.0
Scott Turecek
 
Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo
Dataconomy Media
 
3 Minute Introduction
3 Minute Introduction3 Minute Introduction
3 Minute Introduction
Julian Tol
 
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of ClickersCriteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo
 
CrowdCast Pitch
CrowdCast PitchCrowdCast Pitch
CrowdCast Pitch
Shahrukh Ghazali
 

What's hot (16)

criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015criteo-performance-advertising-playbook-2015
criteo-performance-advertising-playbook-2015
 
Introduction Criteo - 2.0
Introduction Criteo - 2.0Introduction Criteo - 2.0
Introduction Criteo - 2.0
 
Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo Simon Dollé_Large-scale Real-time recommendation at Criteo
Simon Dollé_Large-scale Real-time recommendation at Criteo
 
Criteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) MeetupCriteo Infrastructure (Platform) Meetup
Criteo Infrastructure (Platform) Meetup
 
Criteo Couchbase live 2015
Criteo Couchbase live 2015Criteo Couchbase live 2015
Criteo Couchbase live 2015
 
Your Future With Content Manager OnDemand
Your Future With Content Manager OnDemandYour Future With Content Manager OnDemand
Your Future With Content Manager OnDemand
 
Aws community day keynote
Aws community day keynoteAws community day keynote
Aws community day keynote
 
Ad Server Solutions - ad server ad exchange
Ad Server Solutions - ad server ad exchangeAd Server Solutions - ad server ad exchange
Ad Server Solutions - ad server ad exchange
 
3 Minute Introduction
3 Minute Introduction3 Minute Introduction
3 Minute Introduction
 
Optimize your Cloud Purchase Strategy
Optimize your Cloud Purchase Strategy Optimize your Cloud Purchase Strategy
Optimize your Cloud Purchase Strategy
 
Google’s strategy to gain an edge on the Cloud
Google’s strategy to gain an edge on the CloudGoogle’s strategy to gain an edge on the Cloud
Google’s strategy to gain an edge on the Cloud
 
Successful IoT projects - a few lessons
Successful IoT projects - a few lessonsSuccessful IoT projects - a few lessons
Successful IoT projects - a few lessons
 
Online Ad Serving
Online Ad ServingOnline Ad Serving
Online Ad Serving
 
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of ClickersCriteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
Criteo's Ad Week 2012 presentation - Big Data and the Value of Clickers
 
Axonite Campaign Automation Infrastructure for HasOffers
Axonite Campaign Automation Infrastructure for HasOffersAxonite Campaign Automation Infrastructure for HasOffers
Axonite Campaign Automation Infrastructure for HasOffers
 
CrowdCast Pitch
CrowdCast PitchCrowdCast Pitch
CrowdCast Pitch
 

Viewers also liked

Etude Criteo - la valeur réelle des internautes qui cliquent - 26 Juin 2012
Etude Criteo - la valeur réelle des internautes qui cliquent  - 26 Juin 2012Etude Criteo - la valeur réelle des internautes qui cliquent  - 26 Juin 2012
Etude Criteo - la valeur réelle des internautes qui cliquent - 26 Juin 2012
Romain Fonnier
 

Viewers also liked (18)

C# development workflow @ criteo
C# development workflow @ criteoC# development workflow @ criteo
C# development workflow @ criteo
 
RecsysFR: Criteo presentation
RecsysFR: Criteo presentationRecsysFR: Criteo presentation
RecsysFR: Criteo presentation
 
Criteo. Reach people, not devices!
Criteo. Reach people, not devices!Criteo. Reach people, not devices!
Criteo. Reach people, not devices!
 
Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo? Why reinvent the wheel at Criteo?
Why reinvent the wheel at Criteo?
 
Couchbase live 2016
Couchbase live 2016Couchbase live 2016
Couchbase live 2016
 
Response prediction for display advertising - WSDM 2014
Response prediction for display advertising - WSDM 2014Response prediction for display advertising - WSDM 2014
Response prediction for display advertising - WSDM 2014
 
Infographic: How Professional Services Automation can transform your Company!
Infographic: How Professional Services Automation can transform your Company!Infographic: How Professional Services Automation can transform your Company!
Infographic: How Professional Services Automation can transform your Company!
 
Saintjo Two AV4
Saintjo Two AV4Saintjo Two AV4
Saintjo Two AV4
 
Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2
Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2
Bizarre... vous avez dit bizarre - Paris Monitoring meetup #2
 
Etude Criteo - la valeur réelle des internautes qui cliquent - 26 Juin 2012
Etude Criteo - la valeur réelle des internautes qui cliquent  - 26 Juin 2012Etude Criteo - la valeur réelle des internautes qui cliquent  - 26 Juin 2012
Etude Criteo - la valeur réelle des internautes qui cliquent - 26 Juin 2012
 
Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...
Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...
Morning with MongoDB Paris 2012 - Cas d'usages courant en entreprise. Présent...
 
AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...
AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...
AWS Summit Paris - Track 1 - Session 2 - Designez vos architectures pour plus...
 
Hadoop summit-ams-2014-04-03
Hadoop summit-ams-2014-04-03Hadoop summit-ams-2014-04-03
Hadoop summit-ams-2014-04-03
 
TechFeedというテクノロジーキュレーションサービスを作った話
TechFeedというテクノロジーキュレーションサービスを作った話TechFeedというテクノロジーキュレーションサービスを作った話
TechFeedというテクノロジーキュレーションサービスを作った話
 
Using Neural Networks to predict user ratings
Using Neural Networks to predict user ratingsUsing Neural Networks to predict user ratings
Using Neural Networks to predict user ratings
 
Dictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix FactorizationDictionary Learning for Massive Matrix Factorization
Dictionary Learning for Massive Matrix Factorization
 
Meta-Prod2Vec: Simple Product Embeddings with Side-Information
Meta-Prod2Vec: Simple Product Embeddings with Side-InformationMeta-Prod2Vec: Simple Product Embeddings with Side-Information
Meta-Prod2Vec: Simple Product Embeddings with Side-Information
 
CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...
CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...
CONTENT2VEC: a Joint Architecture to use Product Image and Text for the task ...
 

Similar to Making advertising personal, 4th NL Recommenders Meetup

Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
ConnectaDigital
 

Similar to Making advertising personal, 4th NL Recommenders Meetup (20)

Recommendation at scale
Recommendation at scaleRecommendation at scale
Recommendation at scale
 
Open Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeOpen Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache Geode
 
An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)
 
GraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in GraphdatenbankenGraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in Graphdatenbanken
 
Webinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each DayWebinar: 2 Billion Data Points Each Day
Webinar: 2 Billion Data Points Each Day
 
The New Model
The New ModelThe New Model
The New Model
 
10 Lessons Learned from Meeting with 150 Banks Across the Globe
10 Lessons Learned from Meeting with 150 Banks Across the Globe10 Lessons Learned from Meeting with 150 Banks Across the Globe
10 Lessons Learned from Meeting with 150 Banks Across the Globe
 
OC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBMOC Big Data Monthly Meetup #6 - Session 1 - IBM
OC Big Data Monthly Meetup #6 - Session 1 - IBM
 
SD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBMSD Big Data Monthly Meetup #4 - Session 1 - IBM
SD Big Data Monthly Meetup #4 - Session 1 - IBM
 
RTBkit Introduction & Best Practices
RTBkit Introduction & Best PracticesRTBkit Introduction & Best Practices
RTBkit Introduction & Best Practices
 
Hybris @ Neev
Hybris @ NeevHybris @ Neev
Hybris @ Neev
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
The Business Justification for APM
The Business Justification for APMThe Business Justification for APM
The Business Justification for APM
 
Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019
Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019
Conclusion Connect state of IoT 2019 Review io t solutions world congress 2019
 
Dashlane Mission Teams
Dashlane Mission TeamsDashlane Mission Teams
Dashlane Mission Teams
 
Lean Startup: Reduce 40% go-to-market time & cost on your next product launch
Lean Startup: Reduce 40% go-to-market time & cost on your next product launchLean Startup: Reduce 40% go-to-market time & cost on your next product launch
Lean Startup: Reduce 40% go-to-market time & cost on your next product launch
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
 
Lean startup 2019
Lean startup 2019Lean startup 2019
Lean startup 2019
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
Containers and VMs and Clouds: Oh My. by Mike Coleman
Containers and VMs and Clouds: Oh My. by Mike ColemanContainers and VMs and Clouds: Oh My. by Mike Coleman
Containers and VMs and Clouds: Oh My. by Mike Coleman
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Making advertising personal, 4th NL Recommenders Meetup

  • 1. Copyright © 2014 Criteo Making Advertising Personal Large Scale Real-Time Product Recommendation at Criteo Olivier Koch & Romain Lerallut May 19th, 2015
  • 2. Copyright © 2014 Criteo Performance Advertising We buy • Inventory ! (ad spaces) • Billions of times a day • All over the Internet • For 95% of the population Where is the need for tech ? We sell • Clicks ! • (that convert) • (that convert a lot) We take the risk You pay only for what you get
  • 3. Copyright © 2013. Confidential International infrastructure Key figures Traffic 550 k HTTP requests / sec (peak activity) 23000 impressions /sec (peak activity) 180 k requests / sec on RTB (average) Less than 10 ms to process an RTB request 3 Figures of May2013 Physical infrastructure 6 Data centers on 3 continents operated and conceived in- house ~ 12000 servers, largest Hadoop cluster in Europe Availability / Uptime >99.95% More than 20 PB of storage Big Data
  • 4. Copyright © 2012. Confidential 4 Arbitrage •Should we bid? •At which price? Recommendation •Which products should we display? Graphical optimization •Big image vs small image •Background color, ... Prediction •Generic prediction engine •Specific models trained on TBs of data Global Engine Architecture
  • 5. Copyright © 2014 Criteo Building ads with personalized content in realtime Recommendation for Advertising 2.5 billion products ~ 8ms response time
  • 6. Copyright © 2014 Criteo Data Sources • Catalog data • Feed provided by the merchants • User behavior data • Large scale intent data • All visits to merchant websites • Page views, basket, sales events • Ad display data • Displayed and clicked ads 6
  • 7. Copyright © 2012. Confidential 7 Recommendation execution flow Candidates Generation • Get candidates from all sources using user historical products and bestofs Candidates Aggregation • Remove duplicates • Aggregate features Preselection • Call degraded prediction to decrease number of candidates Scoring • Call full prediction model to score each candidate Winner Selector • Select N products on score, randomization Glup Logging • Log recommended products, prediction variables
  • 8. Copyright © 2014 Criteo Issue #1: Retrieving user-specific products • The need: storing [user → interesting products] vectors • Difficult to store and retrieve at scale (900+ mln users) • Hard to keep up to date • Using seen products as a proxy • Store the [user → viewed products] vectors • Easier to maintain • Store [viewed product → interesting products] vectors • Based on aggregated user behavior data • Computable offline • Final ranking by a ML model 8
  • 9. Copyright © 2014 Criteo Issue #2: Scoring products • Need to fuse data from several sources • Product-specific • User-specific • User-product interactions • Display-specific • 1st solution: regression model • Predict P(product click then sale) • Easy to evaluate • 2nd solution: ranking model • More appropriate for our needs • Still maximizing post-click sales • Can be evaluated only on multi-product banners 9
  • 10. Copyright © 2014 Criteo Issue #3: Picking the right products • Several questions: • User-product fatigue • Independent product choice assumption • Explore / exploit • Solution 1: Randomization in the banners • Keep independent products assumption • Separate optimization process to shuffle displayed items • Solution 2: A better scoring model • Score a full banner, not independent products • Store all product display counts 10
  • 11. Copyright © 2014 Criteo Issue #4: 8ms response time ! Tweaking for performance • CTR / Sales prediction takes ~40 µs per candidate using in-house library • 2-step prediction: • A fast pass to remove most of the candidates • A slow pass to score accurately the remaining candidates And the technical fine print: • All real-time code in C# • Async I/O for better efficiency • HAProxy to scale the front-end • Memcached to store all required data in memory 11
  • 12. Copyright © 2014 Criteo Upcoming challenges • Long(er)-term user profiles • More and better product information (images, semantic, NLP) • Vertical-optimized engine • Classifieds (catalog-free recommendation ?) • Travel • Instant-update of similarities • (because batch computation is soooo last year) 12
  • 13. Copyright © 2014 Criteo Fancy a try ? 13 On your own: With us ! http://www.criteo.com/careers/ • Our 1st public dataset is online: http://bit.ly/1vgw2XC • 4GB display and click data, Kaggle challenge in 2014 • NEW : 1TB dataset released a few weeks ago • Hosted on Microsoft Azure, just waiting for you
  • 14. Copyright © 2014 Criteo Questions?
  • 15. Copyright © 2014 Criteo Thank you ! o.koch@criteo.com r.lerallut@criteo.com