Machine Learning at Orbitz
Robert Lancaster and Jonathan Seidman
                          Strata 2011
                    February 02 | 2011
Launched: 2001, Chicago, IL




                              page 2
Why Start the Machine Learning Team at Orbitz?


•  Team was created in 2009 with the goal to apply machine
   learning techniques to improve the customer experience.
•  For example:
   –  Hotel sort optimization: How can we improve the ranking of
      hotel search results in order to show consumers hotels that
      more closely match their preferences?
   –  Cache optimization: can we intelligently cache hotel rates in
      order to optimize the performance of hotel searches?
   –  Personalization/segmentation: can we show targeted search
      results to specific consumer segments?




                                                                      page 3
Data Challenges


•  The team immediately faced challenges getting access to data:
   –  The required analysis needs access to large amounts of
      data on user interaction with the site.
   –  This data is available in web analytics logs, but required
      fields were not available in our data warehouse because of
      size considerations.
   –  Even worse, we had no archive of the data beyond several
      days.
   –  Size constraints aside, it takes considerable time and effort
      to get new data added to the data warehouse.




                                                                     page 4
New Data Infrastructure to Address These Challenges

•  Hadoop provides a solution to these challenges by:
   –  Providing long-term storage of entire raw dataset without
      placing constraints on how that data is processed.
   –  Allowing us to immediately take advantage of new web
      analytics data added to the site.
   –  Providing a platform for efficient analysis of data, as well as
      preparation of data for input to external processes for further
      analysis.
•  Hive was added to the infrastructure to provide structure over
   the prepared data, facilitating ad-hoc queries and selection of
   specific data sets for analysis.
•  Data stored in Hive not only supports machine learning efforts,
   but also provides metrics to analysts not available through
   other sources.

                                                                        page 5
New Data Infrastructure – Cont’d

•  Hadoop and Hive are now being used by the machine learning
   team to:
  –  Extract data from logs for hotel sort and cache optimization
     analyses.
  –  Distribute complex cross-validation and performance
     evaluation operations.
  –  Extract data for clustering.
•  Hadoop and Hive have also gained rapid adoption in the
   organization beyond the machine learning team: evaluating
   page download performance, searching production logs,
   keyword analysis, etc.




                                                                    page 6
Use Case – Hotel Cache Optimization

Overview:
  Search methodology:
     •  Subset of total properties in a location (1 page at a time).
     •  Get “just enough” information to present to consumers.
  Caching:
     •  Reduces impact to suppliers (maintain “look-to-book” ratio).
     •  Reduces latency.
     •  Increases “coverage.”
Optimization Goal:
  Improve the customer experience (reduce latency, increase
    coverage) when searching for hotel rates while controlling impact
    on suppliers (maintain look-to-book).




                                                                        page 7
Hotel Cache Optimization – Early Attempts


Early approaches were well-intentioned, but were not driven by analysis of
  the available data. For example:

Theory: High amount of thrashing leads to eviction of more useful cache entries.
Attempted Solution: Increase cache size.
Result: No increase in measured coverage.
Problem: No actual analysis on required cache size.


Theory: Locally managed inventory represents “free” information and can be
  requested without limit to improve coverage.
Attempted Solution: Don’t cache locally managed inventory. Increase the amount
   of local inventory requested with each user search.
Result: No increase in measured coverage.
Problem: Locally managed inventory doesn’t represent a large percentage of total
  inventory and is already highly preferenced.


                                                                                   page 8
Hotel Cache Optimization – Data Driven Approaches


Data Driven Approaches:


  Traffic Partitioning: Identify the subset of traffic that is most
    efficient and optimize that subset through prefetching and
    increased bursting.


  TTL Optimization: Use historic logs of availability and rate
   change information to predict volatility of hotel rates and
   optimize cache TTL.
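
One way the TTL Optimization idea might look in code is sketched below. The slides do not describe the actual model, so everything here is an assumption: the log format (`hotel_id`, timestamp pairs, one per observed rate or availability change), the use of the median gap between changes as the volatility measure, and the `safety_factor`, `min_ttl`, and `max_ttl` parameters are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def suggest_ttls(change_events, safety_factor=0.5, min_ttl=60.0, max_ttl=86400.0):
    """Suggest a cache TTL in seconds for each hotel.

    change_events: iterable of (hotel_id, unix_timestamp) pairs, one per
    observed rate or availability change (hypothetical log format).
    """
    # Group the change timestamps by hotel.
    times_by_hotel = defaultdict(list)
    for hotel_id, timestamp in change_events:
        times_by_hotel[hotel_id].append(timestamp)

    ttls = {}
    for hotel_id, times in times_by_hotel.items():
        times.sort()
        # Gaps between consecutive changes: short gaps mean a volatile rate.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if not gaps:
            # A single observation gives no volatility estimate, so stay conservative.
            ttls[hotel_id] = min_ttl
            continue
        # Expire entries well before the typical next change, within fixed bounds.
        ttl = safety_factor * median(gaps)
        ttls[hotel_id] = max(min_ttl, min(max_ttl, ttl))
    return ttls
```

Volatile hotels get short TTLs and stable hotels get long ones, which is the trade-off between stale rates and supplier load that the slide points at.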




                                                                      page 9
Hotel Cache Optimization – Traffic Distribution

[Chart: percentage (y-axis, 0% to 100%) plotted against x-axis values 1 to 20.
 Series: Queries, Searches, Reverse Running Total (Searches),
 Reverse Running Total (Queries).
 Labeled data points: 71.67%, 31.87%, 34.30%, 2.78%.]

•  72% of queries are singletons and make up nearly a third of total
   search volume.
•  A small number of queries (3%) make up more than a third of search
   volume.
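
A distribution like this one could be computed from a search log roughly as follows. This is a minimal sketch, not the authors' method: it reads "query" as a distinct search request and "search" as one execution of it, which is how the slides appear to use the terms, and the input format (one hashable key per search) is an assumption.

```python
from collections import Counter


def traffic_distribution(searches):
    """Break traffic down by how often each distinct query was repeated.

    searches: iterable of hashable query keys, one entry per search performed
    (hypothetical input format).
    """
    counts = Counter(searches)               # query -> number of searches
    total_queries = len(counts)
    total_searches = sum(counts.values())
    by_frequency = Counter(counts.values())  # frequency -> number of distinct queries

    rows = []
    for frequency in sorted(by_frequency):
        n_queries = by_frequency[frequency]
        rows.append({
            "frequency": frequency,
            "pct_queries": 100.0 * n_queries / total_queries,
            "pct_searches": 100.0 * n_queries * frequency / total_searches,
        })
    return rows
```

The row with `frequency` equal to 1 is the singleton share, and summing rows from the highest frequency downward gives a reverse running total.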




                                                                                                                           page 10
Optimize Hotel Cache – Traffic Partitioning


       Evaluate possible mechanisms for determining most
        frequent queries.
       Favor mechanisms that give a high search/query ratio for
         the greatest percentage of search volume.
       Test for stability of mechanism across multiple time periods.

Partition Strategy  Description                                            Pct Queries  Pct Searches  Searches/Query
Baseline            All traffic                                                100.00%       100.00%            2.19
Top 50              Top 50 searched markets                                     14.88%        26.76%            3.94
Heuristic           Top 50 searched markets, weekend stay within 1 month.        0.87%         8.52%           21.4
Enumeration         Queries repeated 5 or more times.                            3.45%        28.80%           18.29
Prediction          TBD                                                            TBD           TBD             TBD
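
The Searches/Query column can be cross-checked from the two percentage columns. If a partition holds a fraction q of all queries and a fraction s of all searches, its ratio is the baseline ratio times s/q. The sketch below applies that arithmetic to the published figures; the function name is illustrative, and small differences from the slide are expected because the inputs are already rounded.

```python
def searches_per_query(pct_queries, pct_searches, baseline_ratio=2.19):
    """Searches per query for a partition, given its share of queries and searches.

    partition searches / partition queries
        = (pct_searches * total_searches) / (pct_queries * total_queries)
        = baseline_ratio * pct_searches / pct_queries
    """
    return baseline_ratio * pct_searches / pct_queries


# (Pct Queries, Pct Searches, Searches/Query as published on the slide)
published = {
    "Top 50": (14.88, 26.76, 3.94),
    "Heuristic": (0.87, 8.52, 21.4),
    "Enumeration": (3.45, 28.80, 18.29),
}

for name, (pct_q, pct_s, slide_value) in published.items():
    computed = searches_per_query(pct_q, pct_s)
    print(f"{name}: computed {computed:.2f}, slide {slide_value}")
```

The computed values agree with the slide to within rounding. They also show why the Enumeration partition is attractive: its ratio is close to the Heuristic partition's while covering more than three times the search volume.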
  


                                                                                                                                                 page 11
Conclusions and Lessons Learned


•  Start with a manageable problem (ease of measuring success,
   availability of data, etc.)
•  Avoid thinking of machine learning team as an R&D
   organization.
•  Instead, foster machine learning approaches throughout the
   organization:
   –  Embed resources on actual feature teams.
   –  Machine learning study groups, etc.




                                                                 page 12
