REAL TIME MACHINE LEARNING DECISIONS AS A SERVICE 
RTB Optimizer: 
Behind the scenes with 
a Predictive API 
Nicolas Kruchten 
PAPIs.io – November 18, 2014
About Datacratic 
• Software company specializing in 
high performance systems and 
machine learning 
• 30 employees, founded in 2009, 
based in Montréal, Québec, Canada with an office in New York 
• 3 Predictive APIs in market today 
• Building a Machine Learning Database to help others 
build Predictive APIs and Apps
Real-Time Bidding for online advertising 
[Diagram: a Web Browser sends "GET ad" to a Real-Time Exchange; the Exchange sends bid requests to several Bidders; the Bidders return bids; the Exchange runs an auction and returns an ad to the Browser] 
This happens millions of times per second 
Bidders must respond within 100 milliseconds 
RTB Optimizer enables bidders to achieve campaign goals
Campaign goals 
• Advertising campaigns are typically outcome-oriented 
– Clicks 
– Video views 
– Conversions: app installs, purchases, sign-ups 
• e.g. Ad network has sold someone 1,000 outcomes for $1,000 
• e.g. Advertiser has $1,000 to get as many outcomes as 
possible 
• Essentially maximize profit or minimize cost-per-outcome
Datacratic’s RTB Optimizer 
• Client bidder relays bid-requests to API, API tells it how to bid 
• Handles 100,000 queries per second, for 100s of campaigns 
• API says which campaign should bid and how much 
• API also needs outcomes in real-time and campaign goals
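As a rough illustration of the decision described above, the sketch below picks, for one bid request, the campaign with the highest expected value and bids that amount. The `Campaign` class, the `predict` callables, the request fields, and the "highest expected value wins" rule are all assumptions made for illustration; the slides do not show the API's actual request or response format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

BidRequest = Dict[str, str]  # hypothetical: a flat map of bid-request features


@dataclass
class Campaign:
    name: str
    payout: float                           # value of one outcome, in dollars
    predict: Callable[[BidRequest], float]  # hypothetical P(outcome | bid-request)


def decide(request: BidRequest, campaigns: List[Campaign]) -> Optional[Tuple[str, float]]:
    """Return (campaign name, bid in dollars) for the best campaign, or None."""
    best = None
    for c in campaigns:
        value = c.payout * c.predict(request)  # E[value] = payout * P(outcome)
        if value > 0 and (best is None or value > best[1]):
            best = (c.name, value)
    return best


# Two made-up campaigns with hard-coded probabilities standing in for real models.
campaigns = [
    Campaign("installs", 10.0, lambda r: 0.002 if r.get("browser") == "Chrome" else 0.001),
    Campaign("signups", 4.0, lambda r: 0.003),
]
decision = decide({"browser": "Chrome", "website": "abc.com"}, campaigns)
```

With these made-up numbers the Chrome request goes to the "installs" campaign, while a Firefox request would go to "signups".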
RTB Optimizer 
[Diagram of the RTB Optimizer: Bids API, Outcomes API]
A Predictive API that learns 
• Datacratic has no proprietary data set 
• API can learn from scratch from the bid-request stream 
what works for each campaign: 
– Contextual features: website, time of day, banner size and placement 
– User features: geo-location, browser, language, # of impressions shown 
– Customer-provided data: about the user, about the website 
• Provides insights into what features are driving performance 
• Can re-use learnings from previous campaigns
Second price auctions 
• First Price Auctions 
– You bid $1, I bid $2: I win, and I pay $2 
• RTB uses Second Price Auctions 
– You bid $1, I bid $2: I win, and I pay $1 
• Optimal bid = E[ value ] 
– Say it’s worth $2 to me 
– I will never bid more than $2 
– If I bid $1.50 and you bid $1.75: I’ve lost an opportunity for $0.25 surplus! 
– I should always bid $2
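The argument above can be checked with a few lines of code. This is a minimal sketch of a two-bidder second-price auction; the function names are made up for illustration, ties go to the first bidder listed, and there is no reserve price (both are simplifying assumptions).

```python
def second_price_auction(bids):
    """bids: dict of bidder -> bid. Returns (winner, price paid)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    # The winner pays the second-highest bid (0 if there is no other bidder).
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price


def my_surplus(my_bid, your_bid, my_value=2.0):
    """Surplus for 'me' when the impression is worth my_value to me."""
    winner, price = second_price_auction({"me": my_bid, "you": your_bid})
    return my_value - price if winner == "me" else 0.0


shaded = my_surplus(1.50, 1.75)    # I lose the auction: surplus 0
truthful = my_surplus(2.00, 1.75)  # I win and pay 1.75: surplus 0.25
overbid = my_surplus(3.00, 2.50)   # I win but pay 2.50: surplus -0.50
```

Bidding below the value forfeits the $0.25 surplus, and bidding above it risks paying more than the impression is worth, which is why bidding E[ value ] is the best choice here.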
Don’t buy lottery tickets! 
E[ value ] = payout * P( getting the payout )
What’s it to you? 
• If client gets paid $10,000 for 1,000 conversions, then payout = $10 per conversion 
E[ value | bid-request ] = $10 * P( conversion | bid-request ) 
• What was an economics problem is now a prediction 
problem 
• We need to calibrate to predict true probabilities
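The arithmetic on this slide can be sketched as follows. The function names and the example probability of 0.05% are assumptions chosen for illustration; the conversion to CPM is included only because exchanges commonly quote prices per thousand impressions.

```python
def payout_per_outcome(total_payment, outcomes_sold):
    """Dollars earned per outcome, e.g. $10,000 / 1,000 conversions = $10."""
    return total_payment / outcomes_sold


def expected_value_bid(payout, p_conversion):
    """E[value | bid-request] = payout * P(conversion | bid-request)."""
    if not 0.0 <= p_conversion <= 1.0:
        raise ValueError("p_conversion must be a probability")
    return payout * p_conversion


payout = payout_per_outcome(10_000, 1_000)  # $10 per conversion
bid = expected_value_bid(payout, 0.0005)    # dollars for this one impression
cpm_bid = bid * 1000                        # the same bid per thousand impressions
```

Because the bid is proportional to the predicted probability, a model that reports twice the true probability bids twice the true value, which is why calibration matters and ranking alone is not enough.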
RTB Optimizer 
[Diagram of the RTB Optimizer: Bids API, E[ value ], Outcomes API, P( outcome )]
Collecting the data 
• To estimate P( X | Y ) we need examples of Y labeled with whether X occurred 
• RTB Optimizer uses a mix of strategies to meet campaign goals 
• Probe strategy bids randomly to collect data 
• Optimized strategy bids with E[ value ] 
• Automatic training/retraining when the API sees enough examples
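A minimal sketch of mixing the two strategies follows. The uniform distribution for probe bids, the `max_probe_bid` cap, and the fixed `probe_rate` are assumptions; the slides only say that the probe strategy bids randomly.

```python
import random


def choose_bid(p_outcome, payout, probe_rate, max_probe_bid, rng):
    """Return (strategy, bid in dollars) for one bid request."""
    if rng.random() < probe_rate:
        # Probe: bid at random, ignoring the model, to gather unbiased examples.
        return "probe", rng.uniform(0.0, max_probe_bid)
    # Optimized: bid the expected value of the impression.
    return "optimized", payout * p_outcome


rng = random.Random(42)  # seeded so the example is repeatable
decisions = [
    choose_bid(0.001, 10.0, probe_rate=0.2, max_probe_bid=0.05, rng=rng)
    for _ in range(1000)
]
probe_share = sum(1 for strategy, _ in decisions if strategy == "probe") / len(decisions)
```

Only the outcomes of probe bids are free of the model's own selection effects, so those examples are the ones best suited to training and evaluation.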
RTB Optimizer 
[Diagram of the RTB Optimizer: Bids API, Probe, E[ value ], Outcomes API, Training, P( outcome )]
Bias control 
• Never stop the probe strategy 
• Always need control group for evaluation, retraining 
• Risk of filter bubbles: future models trained on previous output 
• Bid requests are randomly routed to probe, less often over time 
• Models automatically back-tested before deployment
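Two of the points above lend themselves to a short sketch: a probe rate that shrinks over time but never reaches zero, and a back-test gate before deployment. The half-life schedule, the floor value, and the use of log loss on held-out probe traffic are all assumptions; the slides do not say which schedule or metric is used.

```python
import math


def probe_rate(examples_seen, start=0.5, floor=0.05, half_life=100_000):
    """Share of bid requests routed to the probe strategy."""
    decayed = start * 0.5 ** (examples_seen / half_life)
    return max(floor, decayed)  # never stop probing entirely


def log_loss(probs, labels):
    """Average negative log-likelihood of binary labels under predicted probabilities."""
    eps = 1e-12  # guards against log(0)
    total = sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for p, y in zip(probs, labels)
    )
    return -total / len(labels)


def should_deploy(candidate_probs, current_probs, labels):
    """Back-test on held-out probe traffic: deploy only if the candidate scores better."""
    return log_loss(candidate_probs, labels) < log_loss(current_probs, labels)
```

Evaluating on probe traffic rather than on optimized traffic keeps the comparison from being skewed by what the previous model chose to buy.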
How to learn in real-time 
• Classify using bagged generalized linear models 
• Generate non-linear features with statistics tables 
• Periodically retrain classifier 
• Continuously update stats tables
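To make "bagged generalized linear models" concrete, here is a small pure-Python sketch that uses logistic regression as the GLM and averages the predictions of models trained on bootstrap resamples. It is not the production classifier: the SGD settings, the number of models, and the single made-up feature are assumptions, and in the real system the features would come from the statistics tables.

```python
import math
import random


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, epochs=50, lr=0.1):
    """Logistic regression by plain SGD; returns weights with the bias last."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            xb = list(x) + [1.0]
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, xb))) - y
            for i, xi in enumerate(xb):
                w[i] -= lr * err * xi
    return w


def fit_bagged(rows, labels, n_models=10, seed=0):
    """Train each model on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_logistic([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def predict(models, x):
    """Average the predicted probabilities of all bagged models."""
    xb = list(x) + [1.0]
    return sum(sigmoid(sum(wi * xi for wi, xi in zip(w, xb))) for w in models) / len(models)


# Toy data: the feature is 0 or 1; outcomes are rare at 0 and common at 1.
rows = [[0.0]] * 50 + [[1.0]] * 50
labels = [1] * 5 + [0] * 45 + [1] * 40 + [0] * 10
models = fit_bagged(rows, labels)
p_low, p_high = predict(models, [0.0]), predict(models, [1.0])
```

Averaging over resampled models smooths out the variance of any single fit, while the statistics tables carry the fast-moving information between retrains.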
Statistics Table by example 
Table    Bucket   Impressions  Outcomes  Outcomes/Impressions  95% Confidence Lower Bound on Outcomes/Impressions 
Browser  Chrome   5M           3k        0.060%                0.058% 
Browser  Firefox  3M           1k        0.033%                0.031% 
Website  abc.com  4M           2k        0.050%                0.048% 
Website  xyz.com  1k           10        1.000%                0.481%
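The slide does not say how the lower bound is computed. One method that gives values close to the table is the exact (Clopper-Pearson) binomial interval, sketched below with only the standard library; treat the choice of method as an assumption. The lower bound is the rate p at which seeing at least the observed number of outcomes would have probability 2.5%.

```python
import math


def binom_tail_ge(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p), with terms computed in log space."""
    if x <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    cdf = 0.0
    for k in range(x):  # sum P(X = k) for k = 0 .. x-1
        log_term = (log_n_fact - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        cdf += math.exp(log_term)
    return max(0.0, 1.0 - cdf)


def lower_bound(outcomes, impressions, confidence=0.95):
    """Exact two-sided lower confidence bound on outcomes/impressions."""
    if outcomes == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, outcomes / impressions
    for _ in range(50):  # bisection: the tail probability grows with p
        mid = (lo + hi) / 2
        if binom_tail_ge(outcomes, impressions, mid) < alpha / 2:
            lo = mid
        else:
            hi = mid
    return lo


xyz_bound = lower_bound(10, 1000)  # the xyz.com row: 10 outcomes in 1k impressions
```

The bound shows why the table keeps both columns: xyz.com has the highest raw rate, but with only 1k impressions its lower bound is less than half of that rate, whereas the large buckets barely move.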
RTB Optimizer 
[Diagram of the RTB Optimizer: Bids API, Probe, Real-Time Stats Tables, E[ value ], Outcomes API, Batch Training, GLZ Classifier]
Implementation details (are everything) 
• 100k requests per second, 10 millisecond latency, running 24/7, 1 trillion predictions to date 
• Distributed system, written in C++11 
• AWS: data in S3, training runs on Amazon EC2 spot market 
• http://opensource.datacratic.com/ 
– RTBkit 
– JML 
– StarCluster
Does it work? 
Classification success? ROC or calibration curves… 
Optimization success? 80% reductions in cost-per-outcome… 
Customer success! 25% monthly growth
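For readers unfamiliar with calibration curves, the sketch below shows one common way to compute the points on such a curve: group predictions into bins and compare the mean predicted probability with the observed outcome rate in each bin. The equal-width binning and the toy data are assumptions; with outcome rates as low as those in RTB, quantile-based bins would likely be a better fit.

```python
def calibration_bins(probs, labels, n_bins=10):
    """Return (mean predicted, observed rate, count) for each non-empty bin."""
    sums = [[0.0, 0, 0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[i][0] += p
        sums[i][1] += y
        sums[i][2] += 1
    return [(s / n, pos / n, n) for s, pos, n in sums if n > 0]


# Toy example: the low bin is under-predicted (0.10 predicted vs 0.25 observed).
probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 0, 1, 1, 1]
curve = calibration_bins(probs, labels, n_bins=2)
```

A well-calibrated model has points near the diagonal, meaning the predicted probability matches the observed rate, which is what the expected-value bid relies on.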
REAL TIME MACHINE LEARNING DECISIONS AS A SERVICE 
Thanks! 
nicolas@datacratic.com
