Confidential - do not distribute
Hotels.com’s journey to becoming an Algorithmic Business
Matthew Fryer
VP, Chief Data Science Officer
mfryer@hotels.com
Part of Expedia, Inc. family
>385,000 properties
89 countries
39 languages
>30m Hotels.com Rewards Members
Home of Captain Obvious
Billions of recommendations per day, based on real-time data
Hotels.com
Data Science, Engineering, Front End Development
“Artificial Intelligence Will Be
Travel’s Next Big Thing”
Barry Diller
Chairman & Senior Executive,
Expedia, Inc.
The 3 Ms of disruptive technology:
• Mobile
• Messaging / NLP
• Machine Learning
Core Elements of our Data Science Cloud Platform
• Databricks Unified Platform
• Maestro: our internally developed platform on AWS (EMR, Spark, RStudio, IntelliJ, SBT, Jupyter, Zeppelin, Unit / QA, Metastore, Apache Airflow, Keras, TensorFlow)
• Proof of concept on Google Cloud, Beam, Spark & TensorFlow
Databricks Unified Platform
[Chart: 1-hour blocks; y-axis = number of 32-core instances]
• Key asset to the success of data science at Hotels.com
• Key in driving up data scientist productivity / efficiency / flexibility
• Makes our data science lifecycle much easier and faster to operate, driving speed to market
• Reliable / secure, and facilitates ‘highly elastic’ workflows that exploit cost-effective spot instances on AWS
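The cost argument for spot instances can be made concrete with simple arithmetic. The editor's notes put spot pricing at roughly 10-20% of the On-Demand cost; the hourly rate and instance-hours below are hypothetical placeholders, not Hotels.com figures, and `spot_cost_range` is an illustrative helper name.

```python
def spot_cost_range(on_demand_hourly, instance_hours, spot_fraction=(0.10, 0.20)):
    """Return (on_demand_cost, lowest_spot_cost, highest_spot_cost).

    spot_fraction is the assumed spot price as a share of the On-Demand price.
    """
    on_demand_cost = on_demand_hourly * instance_hours
    low, high = spot_fraction
    return on_demand_cost, on_demand_cost * low, on_demand_cost * high


# Hypothetical workload: 50 instances for 8 hours at $1.50 per instance-hour.
on_demand, spot_low, spot_high = spot_cost_range(1.50, 50 * 8)
print(f"On-Demand: ${on_demand:.2f}, spot: ${spot_low:.2f} to ${spot_high:.2f}")
```

The same function can be rerun with real rates to see how much extra compute a fixed budget buys when a workload tolerates spot interruptions.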
The hidden secret of data science and AI
Typically, data scientists invest large amounts of time in feature / data engineering, an area that is ripe for a technology solution
ALPs – Algorithm Lifecycle Pipeline Service
The end-to-end ML platform
[Diagram labels: Site Data; Training; Scoring & serving; Hotels.com; Real-time scoring / bandit; Ingestion; Cache; Service; Data pipelines; Data set generation, feature extraction; Reporting; Train & deploy model; Update feedback loop with CTR, GP etc.; Clickstream; Experiment; Store & serve scores; Assign variant; Calculate scores]
• Data capture: accessible data
• Data pipelines: develop and maintain ML / AI pipelines
• Frameworks & Platforms: methods to research & exploit ML & AI innovation
• Lifecycle / Deploy: implement ML / AI in production
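The "assign variant", "real-time scoring / bandit" and "update feedback loop with CTR" elements of the diagram can be illustrated with a small multi-armed bandit. The slides do not say which bandit algorithm is used, so the epsilon-greedy strategy, the class name, the variant names and the simulated click-through rates below are all assumptions for illustration.

```python
import random


class EpsilonGreedyBandit:
    """Assign variants and learn from click feedback (illustrative only)."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.impressions = {v: 0 for v in variants}
        self.clicks = {v: 0 for v in variants}

    def ctr(self, variant):
        shown = self.impressions[variant]
        return self.clicks[variant] / shown if shown else 0.0

    def assign_variant(self):
        # Explore with probability epsilon, otherwise exploit the best CTR so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.impressions))
        return max(self.impressions, key=self.ctr)

    def update(self, variant, clicked):
        # Feedback loop: record the impression and whether it was clicked.
        self.impressions[variant] += 1
        self.clicks[variant] += int(clicked)


# Simulated traffic with hypothetical true click-through rates.
true_ctr = {"ranking_a": 0.05, "ranking_b": 0.10}
bandit = EpsilonGreedyBandit(true_ctr, epsilon=0.1, seed=42)
traffic = random.Random(7)
for _ in range(5000):
    variant = bandit.assign_variant()
    bandit.update(variant, traffic.random() < true_ctr[variant])
print({v: round(bandit.ctr(v), 3) for v in true_ctr})
```

In a production setting the click counts would come from the clickstream rather than a simulator, and the metric could be gross profit (GP) instead of CTR.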
Images are an important factor while choosing a hotel
[Bar chart, 0% to 80%: factors other than price/location rated Important / Very Important: Loyalty Program, Reviews, Hotel Brand, Star Rating, Destination Info, Images, Hotel Info]
Reference: The Influence of Visuals in Online Hotel Research and Booking Behaviour
Computer Vision problems we try to tackle
• Near Duplicate Detection
• Scene Classification
• Image Ranking
[Example image: tagged as Bathroom]
GPUs quickly became key; it took a large effort to optimize using Keras + TensorFlow (Inception v3 + ResNet)
[Chart: days (log scale, 1 to 1000) by hardware configuration: 12-CPU, 1-GPU, 1-GPU + limited cache, 16-GPU + limited cache, 16-GPU + full cache; data labels 493, 67, 20, 7, 4; series: CIFAR2, Expedia Small]
[Chart: days, 16-GPU + full cache vs. Optimized; data labels 15 and 2.5]
Near Duplicate Detection: Real world examples
Non-Duplicates – probability 100%
Non-Duplicates – probability 95.91%
Duplicates – probability 97.98%
Duplicates – probability 98.43%
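The editor's notes describe training the near-duplicate detector on a synthetic dataset built by applying transformations to hotel photos. A minimal sketch of how such labelled pairs might be generated follows. The specific transformations (brightness shift, border crop), the function names and the tiny list-based images are assumptions for illustration, not the pipeline used at Hotels.com. Pairs like these would then be fed to a Siamese network, which is not shown here.

```python
import random


def adjust_brightness(image, delta):
    """Shift every pixel by delta, clamped to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def crop_border(image, margin):
    """Remove `margin` pixels from each edge."""
    return [row[margin:len(row) - margin] for row in image[margin:len(image) - margin]]


def make_pairs(images, rng):
    """Return a list of (image_a, image_b, label); 1 = near-duplicate, 0 = different."""
    pairs = []
    for i, image in enumerate(images):
        # Positive pair: the photo and a transformed copy of itself.
        if rng.choice(["brightness", "crop"]) == "brightness":
            duplicate = adjust_brightness(image, rng.randint(-30, 30))
        else:
            duplicate = crop_border(image, 1)
        pairs.append((image, duplicate, 1))
        # Negative pair: the photo and a different source photo.
        j = rng.choice([k for k in range(len(images)) if k != i])
        pairs.append((image, images[j], 0))
    return pairs


rng = random.Random(0)
# Tiny stand-in "photos": 6x6 grayscale grids of random pixel values.
photos = [[[rng.randint(0, 255) for _ in range(6)] for _ in range(6)] for _ in range(4)]
pairs = make_pairs(photos, rng)
print(len(pairs), "pairs,", sum(label for _, _, label in pairs), "near-duplicates")
```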
Using the model: Real-world examples
• ROOM/BATHROOM
• EXTERIOR/HOTEL
• INTERIOR/SEATING_LOBBY
• ROOM/LIVING_ROOM
• ROOM/GUESTROOM
• FACILITIES/DINING
• INTERIOR/SEATING_LOBBY
• FACILITIES/POOL
Accuracy & Confusion Matrix
• After many long-winded manual iterations of regularization and hyperparameter tuning
• We achieved good accuracy and little confusion between classes
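For readers unfamiliar with the two measures, the sketch below computes accuracy and a confusion matrix for a scene classifier in plain Python. The label names are borrowed from the examples slide, but the actual and predicted values are made up for illustration and are not results from the talk.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical evaluation data.
labels = ["ROOM/BATHROOM", "ROOM/GUESTROOM", "FACILITIES/POOL"]
actual = ["ROOM/BATHROOM", "ROOM/BATHROOM", "ROOM/GUESTROOM",
          "ROOM/GUESTROOM", "FACILITIES/POOL", "FACILITIES/POOL"]
predicted = ["ROOM/BATHROOM", "ROOM/GUESTROOM", "ROOM/GUESTROOM",
             "ROOM/GUESTROOM", "FACILITIES/POOL", "FACILITIES/POOL"]

for label, row in zip(labels, confusion_matrix(actual, predicted, labels)):
    print(f"{label:16} {row}")
print(f"accuracy = {accuracy(actual, predicted):.3f}")
```

Off-diagonal entries show which scene types are mistaken for one another; here one bathroom photo is misread as a guestroom.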
Optimizing the photo order for improved customer experiences
[Image comparison: Original vs. Model]
Reference: Radisson Blu Edwardian Berkshire Hotel, London
Finding the right hotel in our marketplace is core to our customers’ needs.
As an example, different user segments like to stay in different locations
[Map labels: Kensington, Bloomsbury, Heathrow, Canary Wharf, Paddington, Westminster, London City Airport, Chelsea, Battersea, Wimbledon, Wembley, City of London]
[Diagram labels: Utility (three panels); Intent (click), from “just browsing!” to “BOOK!”]
Thank you
mfryer@hotels.com
https://uk.linkedin.com/in/matthewfryer
@mattfryer

Hotels.com's Journey to Becoming an Algorithmic Business...Exponential Growth in Data Science Whilst Migrating to Apache Spark+Cloud All at the Same Time with Matt Fryer


Editor's Notes

  1. Comments: I checked the IR website for the latest data that we have made public (including the Annual Report). Was planning to linger only briefly on this slide; will call out a few data points, especially recommendation volume and loyalty members.
  2. Comments: (This slide has a build on it; you can see it in slideshow view.) General thank-you to Spark Summit and Databricks for inviting me. Share the goal of the presentation, e.g. highlight the focus on transforming customer experiences with algorithms at Hotels.com, our move to Spark / cloud in the last year, and some of the interesting things we are doing. Link to the slide: highlight that it used to feel like there was data everywhere; the size of the torch is growing every day.
  3. Comments: Creation / build of a new data science function, move to public cloud (mainly AWS + some Azure / GCP) from on-prem, and move to Spark from SAS / core Hadoop, all in the last year. As per the title, comment that we are entering a golden age of data science where we can now use data to find patterns and build algorithms to help customer experiences. Imagine the world when we enter adulthood, aka maturity. Given the potential, I think we are all toddlers with so much more to learn and figure out. Better to be fast first (example of testing and freedom to innovate); often being correct is a bonus!
  4. Comments: It has taken complete teamwork from across the business to deliver success and well-aligned pipelines. i) Built a data science function in the last 2 years; it is a team effort, and data science / algorithms sit on the back of the workhorse of engineers. ii) This allows algorithms to make choices and understand patterns to optimize for customer experiences rather than limited optimization. iii) Part of the secret has been matching data scientists with dedicated data, network, DevOps and software engineers on the platform. iv) Create a community (big group hug) to share approaches and work together for success. Overall we have >20 amazing data scientists and >15 dedicated data science engineers, growing fast, plus 100s of analysts and engineers.
  5. Comments: Machine learning and artificial intelligence will combine to manage companies’ big data troves, and there will be layers of innovation “tacked onto distribution systems.” Key has been support from the very top of the company. Call out that support from the very top has been vital to move forward at pace with wider organization alignment. Dara’s comment from the last earnings call on the 3 Ms and organic intelligence: “I think AI is a good deal down the road. I think right now, we are more dependent on OI, organic intelligence, here; of folks here at the company. I think as far as disruptive technology, I do like to talk about the 3 Ms, and it's not disruptive. It's just happening. One is mobile for us. And right now with most brands, over 1/3 of our transactions are mobile. Over half of our traffic is mobile. And the cool thing about mobile is it's always on and it gives you location context. The second M for us that's emerging especially in the APAC markets, are messaging. And what messaging does for us is it allows two-way communication at any time, but it also combines identity with that communication. And once you have identity, you can start communicating with someone on a one-to-one basis. Most of our systems right now are built to serve the average. This is a consumer where you come to Expedia, most of our systems are built to serve the average consumer. Now more and more we can optimize to the specific customer, and you combine that with a third M, which is machine learning, it is only possible to optimize to the individual based on very significant amounts of data, very significant amounts of interaction so that you can start treating every single customer in a different way. You can go back to the olden days when your travel agent knew exactly what you wanted. This is going to be disruptive, but it's going to be a slow disruption as we learn more.”
  6. Comments: i) In the first travel revolution, Hotels.com / Expedia empowered consumers globally with transparency of price, variety, choice and content. ii) Machine learning / algorithms are creating the next travel revolution, transforming consumer experiences and effectively powering the turnaround of the travel agent. iii) The future is having the conversation with the travel agent with a modern twist: messaging. (20 years ago we re-invented and democratised travel and information, when green screens were still around; now, with data science and Spark, we can power personalised experiences and give customers access to the best of both.) iv) Machine learning is now at the strategic core of the growth and future of Hotels.com.
  7. Databricks: Optimised for data science. Easy-to-use UI (notebooks). Advanced job scheduling. Spot pool capability. Great for algorithm development & feature engineering (aka ETL). Awesome support from Spark engineers. Maestro (AWS platform): Integrated platform for large model development and deployment. Advanced cluster support including Maven / Artifactory. Maestro Framework (internal extension to Spark ML). Individual environment per data scientist. Fast to R&D / fully ephemeral. Google: Launched PoC on Google Cloud. Evaluate Google's approach to AI / machine learning, including TensorFlow, GPU, NLP / Vision APIs, ML Engine, Datalab notebooks, Apache Beam / Dataflow. It is the code base responsible for building machine learning models for HCOM. It is developed in-house using Scala 2.11 & Spark 2.0 (migrating to 2.1). It is an ML framework which standardizes and speeds up the way we build models, and provides all the necessary tools for training, testing & validating models.
  8. Facilitated use of ‘extreme elasticity’, including spot instances. Saving on cost whilst using huge compute power. Speed to market dramatically increases. Spot instances cost ~10-20% of the On-Demand cost. Databricks is making things easy, and we are doing image classification across the Expedia portfolio. Highlight the value proposition that you get with Databricks over open-source Spark: works out of the box, elasticity, ease of use, notebooks, etc.
  9. Across millions of hotel and user-submitted images. Critical use case on mobile.
  10. Comments: Highlight that images have not historically been algorithmically optimized, and we now have hundreds of thousands of user photos to categorise and sort. Built in Spark and TensorFlow using a convolutional neural network approach, with some surprisingly good accuracy; would recommend everyone try their hand at deep learning.
  11. Target: detect near-duplicate images on the PDP. Dataset: a synthetic dataset produced by applying transformations to hotel photos; size ~6 million images. Network: a custom Siamese network on top of the Scene Detection classifier. Results: 99.97% accuracy on the synthetic dataset; validated on real-world images. Important to use your own data: we obtained 82% from off-the-shelf deep learning APIs (such as Google Vision API etc.).
  12. Linking in customer feedback loops with the neural nets to begin optimizing the most relevant sort of images for different customers.
  13. Comments: Personalisation, especially micro-segmentation, is crucial, taking max signals; Spark has enabled us to cope with the scale of data. Popularity is balanced with diversity / quality and niche customer needs, all in the context of linking the searches of users doing 4-9 different searches.
  14. Size increase of 20x the data; covers attribution of all customer clicks (typically 4-9 searches per user). 10x the data columns. Facilitates personalization / micro-segments.