Data science challenges
in flight search
Konstantin Halachev
Plamen Aleksandrov
29th July 2015
#SkyscannerSofia
Agenda
• Introduction
• Why is it hard to do meta-search for flights?
• A few applications of flights meta-search data
Image from mastersindatascience.org
Who are we?
Konstantin Halachev
• Data science for bioinformatics (PhD with
focus on epigenetic data)
• Joined the new Skyscanner office in Sofia
nine months ago
Plamen Aleksandrov
• Worked on flights search engine
• Principal software engineer and squad lead
at Skyscanner
What is Skyscanner?
Skyscanner is a leading travel search site offering:
• unbiased
• comprehensive
• free
search services
Skyscanner in numbers
- 9 global offices
- Sofia is the latest: started with 7 people, now at 16 and growing fast
- 700+ employees
- 40+ million app downloads
- 40+ million unique monthly visitors
- 13+ million searches per day
Why is it hard to do meta-search for flights?
How do you plan your travel?
by destination and dates
by destination, choose dates
by dates, choose destination
Online Travel Search - Flights?
Airline industry
4000+ airports served by commercial airlines
700+ airlines in the world; 25,000+ aircraft
40 million scheduled commercial flights in 2014
100,000 flights per day - i.e. >1 per second
40% of flights within the US and Canada
79% average airplane fill rate
3 billion passengers in 2014
Flight Frequency
Source: http://www.iata.org/publications/economics/Pages/Air-Passenger-Monthly-Analysis.aspx
Profitability
Source: http://www.iata.org/publications/economics/Pages/Air-Passenger-Monthly-Analysis.aspx
At $8.27 per passenger, distribution is where the money is
Profitability is growing due to oil prices
Dimensionality of flights
Routes
Source: https://www.itasoftware.com
One Ways and Round Trips
Multi-leg: Open Jaws, Circle Trips
Fares, Fare components, Pricing Units, Tickets
Itinerary Structure
[Diagram: example itinerary structures between airports A, B and C]
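The trip types above can be told apart from the sequence of journeys alone. A minimal sketch, assuming each leg is an (origin, destination) journey rather than an individual flight; the function name is hypothetical and the labels are simplified, since real fare rules define open jaws and circle trips more strictly (for example, a two-leg A -> B, B -> C trip is simply treated as an open jaw here).

```python
def classify_itinerary(legs):
    """Classify a trip from its ordered (origin, destination) journeys.

    Simplified labels; real fare rules define these trip types more strictly.
    """
    if not legs:
        raise ValueError("an itinerary needs at least one leg")
    if len(legs) == 1:
        return "one-way"
    if len(legs) == 2:
        outbound, inbound = legs
        # A -> B followed by B -> A is a plain round trip.
        if inbound == (outbound[1], outbound[0]):
            return "round trip"
        # Any other two-leg trip leaves one "jaw" open, e.g. A -> B, C -> A.
        return "open jaw"
    # Three or more journeys: returning to the start makes it a circle trip.
    start, end = legs[0][0], legs[-1][1]
    return "circle trip" if end == start else "multi-city"


for trip in ([("A", "B")],
             [("A", "B"), ("B", "A")],
             [("A", "B"), ("C", "A")],
             [("A", "B"), ("B", "C"), ("C", "A")]):
    print(trip, "->", classify_itinerary(trip))
```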
Take AA flights/fares on a SFO-BOS route
A total of 25,401,415 valid AA solutions
Only this particular airline and route
Example Route
SFO ORD
DFW BOS
5 * 36 = 85 fc
19 * 32 = 109 fc
41 * 32 = 162 fc
9 * 32 = 87 fc
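To see why one airline on one route already yields millions of valid solutions, a toy count helps. Under the simplifying assumption that every flight on a leg can be paired with every fare on that leg, options multiply along a path and add up across alternative connections. The per-leg counts below are hypothetical, not the slide's figures, and real fare construction (fare components spanning several flights, combinability rules) changes these numbers considerably.

```python
from math import prod

# Hypothetical (flights, fares) per leg, for illustration only;
# these are NOT the figures shown on the slide.
LEG_OPTIONS = {
    ("SFO", "ORD"): (5, 10),
    ("ORD", "BOS"): (8, 12),
    ("SFO", "DFW"): (4, 9),
    ("DFW", "BOS"): (6, 11),
}


def leg_choices(origin, destination):
    """Number of (flight, fare) pairs available on one leg."""
    flights, fares = LEG_OPTIONS[(origin, destination)]
    return flights * fares


def path_choices(path):
    """Combinations along a path of airports, e.g. ["SFO", "ORD", "BOS"]."""
    return prod(leg_choices(a, b) for a, b in zip(path, path[1:]))


def total_choices(paths):
    """Sum the combinations over all alternative connecting paths."""
    return sum(path_choices(p) for p in paths)


paths = [["SFO", "ORD", "BOS"], ["SFO", "DFW", "BOS"]]
print(total_choices(paths))  # 50 * 96 + 36 * 66 = 7176
```

Even with these small toy counts, two one-stop paths give thousands of combinations; adding more hubs, dates and fare components grows the space multiplicatively.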
Even exact dates are complicated
time to travel changes price
weekend stay and seasonality
advance purchase
Dates give interesting features and patterns
day of week
stay duration
age of quote/price
seasons: Christmas, Easter, holidays
Dates
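Several of the date effects above can be turned into model features. A sketch using only the standard library; the function and feature names are hypothetical, and the Saturday-night rule is one common definition of a weekend stay, assumed here rather than taken from the talk.

```python
from datetime import date, timedelta

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def date_features(quote_date, depart_date, return_date=None, today=None):
    """Derive simple date features for one flight quote.

    quote_date: when the price was observed.
    today: reference date for the quote's age (defaults to quote_date).
    """
    if today is None:
        today = quote_date
    features = {
        "departure_weekday": WEEKDAYS[depart_date.weekday()],
        "departure_month": depart_date.month,  # proxy for seasonality
        "advance_purchase_days": (depart_date - quote_date).days,
        "quote_age_days": (today - quote_date).days,
    }
    if return_date is not None:
        stay = (return_date - depart_date).days
        features["stay_duration_days"] = stay
        # Saturday-night stay: some night d -> d+1 where d is a Saturday (5).
        features["saturday_night_stay"] = any(
            (depart_date + timedelta(days=i)).weekday() == 5 for i in range(stay)
        )
    return features


features = date_features(date(2015, 7, 29), date(2015, 8, 5),
                         date(2015, 8, 10), today=date(2015, 7, 31))
print(features)
```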
Prices
Airlines use seat availability to adjust prices
prices are volatile – 26 booking classes
Airlines do Variable Pricing for fare portfolios
your flight neighbour paid a different price
15,000,000 availability questions per sec
no lock-down between search and book
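The booking-class mechanism behind "your flight neighbour paid a different price" can be illustrated with a simplified, non-nested inventory: each class has its own fare and seat allocation, and a buyer always gets the cheapest class that is still open. Class codes, fares and seat counts are hypothetical; real airline inventory uses nested availability and many more classes.

```python
from dataclasses import dataclass


@dataclass
class BookingClass:
    code: str        # hypothetical booking-class letter
    price: float     # fare for this class
    seats_left: int  # seats still released in this class


def lowest_open_fare(classes):
    """Cheapest booking class that still has seats, or None if sold out."""
    open_classes = [c for c in classes if c.seats_left > 0]
    return min(open_classes, key=lambda c: c.price, default=None)


def sell_seat(classes):
    """Sell one seat in the cheapest open class and return the price paid."""
    cls = lowest_open_fare(classes)
    if cls is None:
        raise RuntimeError("flight is sold out")
    cls.seats_left -= 1
    return cls.price


flight = [BookingClass("V", 40.0, 2), BookingClass("M", 70.0, 2),
          BookingClass("Y", 150.0, 1)]
print([sell_seat(flight) for _ in range(5)])  # [40.0, 40.0, 70.0, 70.0, 150.0]
```

Five passengers on the same flight pay three different prices purely because of when they booked.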
Prices for the same seat can still be different
who sells your ticket? – codeshare, agency, OTA
All tickets are booked via a website or a GDS
Distribution Providers
Data and Scale at Skyscanner
40m unique monthly visitors
120m visits on web and mobile per month
13m searches per day
results are up-to-date user experiences on the web
Searches on Month view and Browse view
Exits by redirects
we don’t take ownership of the booking
we stay true to our users, providers and our own values
Searches and Exits
2bn quotes per day => 700bn quotes per year
quotes contain entire itinerary and price
data can be easily processed and/or extracted
prices are up to date, but we also keep historical data
200GB gzipped data per day => 80TB per year
95% airlines and OTAs world coverage
Data
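A back-of-envelope check of the volumes above, assuming 365 days per year and decimal units (1 GB = 10^9 bytes). The per-quote size and the per-second rate are derived here, not stated on the slide; the slide's yearly figures are approximations of these products.

```python
QUOTES_PER_DAY = 2_000_000_000
GZIPPED_GB_PER_DAY = 200          # decimal gigabytes (10**9 bytes)
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60

quotes_per_year = QUOTES_PER_DAY * DAYS_PER_YEAR
tb_per_year = GZIPPED_GB_PER_DAY * DAYS_PER_YEAR / 1000
bytes_per_quote = GZIPPED_GB_PER_DAY * 10**9 / QUOTES_PER_DAY
quotes_per_second = QUOTES_PER_DAY / SECONDS_PER_DAY

print(f"{quotes_per_year / 1e9:.0f}bn quotes per year")   # 730bn
print(f"{tb_per_year:.0f} TB gzipped per year")           # 73 TB
print(f"{bytes_per_quote:.0f} bytes per gzipped quote")   # 100 bytes
print(f"{quotes_per_second:,.0f} quotes per second")      # 23,148
```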
How much data is that?
A small list of technologies used:
• Thrift / RabbitMQ / Ruby / Fluentd
• Scala / Spark / Hive
• AWS (S3, Glacier, EC2, Elastic MapReduce, DynamoDB)
• Elasticsearch / Kibana
• Python / Flask
Image from vicchi.org
2,000,000,000 quotes per day
What can we do with these data?
Search
What can we do with these data?
1. Dynamics of flight prices
2. Travel Insights for airlines and airports
3. Inspiration – finding good deals
4. A small analysis
Dynamics of flight prices
Route: LON - MAD; direct only; one way
1. Too many routes -> let's select a popular route (London - Madrid)
2. Let's focus on direct connections only
3. Let's focus on one-way only
Further filters:
• Carrier: Ryanair
• Travelling on: Wednesday
• Month of travel: May
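The narrowing steps above amount to a filter followed by a group-by. A minimal sketch, assuming a hypothetical flat quote record (the actual data model is not described in the talk) and summarising the slice by the median price per number of days before departure; the sample quotes are toy values, not Skyscanner data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass(frozen=True)
class Quote:
    """Hypothetical flat quote record (not Skyscanner's actual schema)."""
    origin: str
    destination: str
    carrier: str
    direct: bool
    one_way: bool
    depart_date: date
    quote_date: date
    price: float


def price_curve(quotes, origin, destination, carrier, weekday, month):
    """Median price by days before departure for one narrow slice.

    weekday follows date.weekday(): Monday is 0, Wednesday is 2.
    """
    prices_by_days = defaultdict(list)
    for q in quotes:
        if (q.origin, q.destination, q.carrier) != (origin, destination, carrier):
            continue
        if not (q.direct and q.one_way):
            continue
        if q.depart_date.weekday() != weekday or q.depart_date.month != month:
            continue
        days_before = (q.depart_date - q.quote_date).days
        prices_by_days[days_before].append(q.price)
    return {days: median(p) for days, p in sorted(prices_by_days.items())}


quotes = [  # toy values for illustration only
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 5, 6), date(2015, 4, 6), 30.0),
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 5, 13), date(2015, 4, 13), 40.0),
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 5, 6), date(2015, 5, 4), 90.0),
    Quote("LON", "MAD", "Iberia", True, True, date(2015, 5, 6), date(2015, 4, 6), 60.0),
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 5, 7), date(2015, 4, 7), 25.0),
]
print(price_curve(quotes, "LON", "MAD", "Ryanair", weekday=2, month=5))
# {2: 90.0, 30: 35.0}
```

The Iberia quote and the Thursday departure are dropped by the filters; at scale the same logic would more likely run as a Spark or Hive query than as a Python loop.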
Travel Insights – for airlines and airports
Travel Insights – for airlines
Another small list of technologies used:
• Python, .Net
• AWS (S3, Redshift, EC2), MS SQL
• Tableau
What can we do with these data?
1. Dynamics of flight prices
2. Travel Insights for airlines and airports
3. Inspiration – finding good deals
• Where?
• When?
• Which deal is good?
4. A small analysis
Travel Inspiration - When and Where
Travel Inspiration - a hack day project
Travel Inspiration – is it a good deal?
Travel Inspiration - Skyscanner API
Technologies used:
Google maps, Python, Flask, AWS Redshift, Skyscanner API
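One way to answer "is it a good deal?" is to compare a price with historical quotes for the same route and dates. A sketch under a hypothetical rule, not necessarily the one used in the hack day project: a price counts as good if fewer than 25% of past quotes were cheaper. The function names and the threshold are assumptions.

```python
from bisect import bisect_left


def price_percentile(price, historical_prices):
    """Fraction of historical quotes strictly cheaper than `price`."""
    if not historical_prices:
        raise ValueError("need at least one historical price")
    ordered = sorted(historical_prices)
    return bisect_left(ordered, price) / len(ordered)


def is_good_deal(price, historical_prices, threshold=0.25):
    """Hypothetical rule: good if fewer than `threshold` of past quotes were cheaper."""
    return price_percentile(price, historical_prices) < threshold


history = [50, 60, 70, 80, 90, 100, 110, 120]  # toy historical prices
print(is_good_deal(55, history), is_good_deal(95, history))
```

A percentile keeps the rule comparable across routes with very different price levels, which a fixed price cut-off would not.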
You want to do better?
http://business.skyscanner.net/
You can get a trial API key by filling in the feedback form at
the end of the event:
http://goo.gl/forms/i4C2VcSGyW
What can we do with this data?
1. Dynamics of flight prices
2. Travel Insights for airlines and airports
3. Inspiration – finding good deals
4. A small analysis, or: how did demand for trips to Greece
change in the heat of the crisis, and what do the Danes
know about it?
Analysis - Greece
Data for 2015. Red represents a week-on-week decrease; green is an increase.
Analysis - Greece
Data for 2014. Red represents a week-on-week decrease; green is an increase.
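The colouring on the Greece charts is a week-on-week comparison. A minimal sketch, assuming a list of weekly demand counts (for example searches for Greek destinations from one market such as Denmark; the slides do not say which metric was used). The "neutral" and "n/a" labels are assumptions for cases the slides do not cover.

```python
def week_on_week(counts):
    """Percentage change versus the previous week, with a chart colour.

    Red marks a decrease and green an increase, as on the slides;
    'neutral' (no change) and 'n/a' (previous week was zero) are assumptions.
    """
    changes = []
    for prev, cur in zip(counts, counts[1:]):
        if prev == 0:
            changes.append((None, "n/a"))
            continue
        pct = (cur - prev) * 100 / prev
        colour = "green" if pct > 0 else "red" if pct < 0 else "neutral"
        changes.append((round(pct, 1), colour))
    return changes


weekly_searches = [100, 80, 120, 120]  # hypothetical weekly counts
print(week_on_week(weekly_searches))
# [(-20.0, 'red'), (50.0, 'green'), (0.0, 'neutral')]
```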
Things we know we did not talk about
• What is the best way to get the cheapest deals?
• Recommendations
• Personalization
• A/B testing
• Sorting of flight results
• Infrastructure
• Ahum, “Travel”…
Image credit: jangosteve.com
Thank you!
Please give us feedback or apply for API keys
here: http://goo.gl/forms/i4C2VcSGyW
• Konstantin Halachev
konstantin.halachev@skyscanner.net
• Plamen Aleksandrov
plamen.aleksandrov@skyscanner.net
We are hiring!!!

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 

Data science challenges in flight search

  • 1. Data science challenges in flight search. Konstantin Halachev, Plamen Aleksandrov, 29th July 2015
  • 2. Agenda: • Introduction • Why is it hard to do meta-search for flights? • A few applications of flight meta-search data (Image from mastersindatascience.org)
  • 3. Who are we? Konstantin Halachev: • Data science for bioinformatics (PhD with a focus on epigenetic data) • Joined the new Skyscanner office in Sofia nine months ago. Plamen Aleksandrov: • Worked on the flight search engine • Principal software engineer and squad lead at Skyscanner
  • 4. What is Skyscanner? Skyscanner is a leading travel search site offering unbiased, comprehensive and free search services.
  • 5. Skyscanner in numbers: • 9 global offices; Sofia is the latest, which started with 7 people, is now at 16 and is growing fast • 700+ employees • 40+ million app downloads • 40+ million unique monthly visitors • 13+ million searches per day
  • 6. Why is it hard to do meta-search for flights?
  • 7. How do you plan your travel? • By destination and dates • By destination, then choosing dates • By dates, then choosing a destination. Online travel search: flights?
  • 9. 4000+ airports served by commercial airlines; 700+ airlines in the world; 25,000+ aircraft; 40 million scheduled commercial flights in 2014; 100,000 flights per day, i.e. more than one per second; 40% of flights within the US and Canada; 79% average airplane fill rate (load factor); 3 billion passengers in 2014. Flights frequency. Source: http://www.iata.org/publications/economics/Pages/Air-Passenger-Monthly-Analysis.aspx
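A quick back-of-the-envelope check of the rates on this slide, using only its own (rounded) figures. The passengers-per-flight number is a crude average that assumes the passenger and flight counts are defined on the same basis.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

flights_per_year = 40_000_000        # scheduled commercial flights in 2014 (slide)
flights_per_day = 100_000            # rounded daily figure (slide)
passengers_per_year = 3_000_000_000  # passengers in 2014 (slide)

# Average departures per second implied by the daily figure.
flights_per_second = flights_per_day / SECONDS_PER_DAY

# Daily average implied by the yearly total (2014 had 365 days).
implied_flights_per_day = flights_per_year / 365

# Crude average number of passengers per scheduled flight.
passengers_per_flight = passengers_per_year / flights_per_year

print(f"{flights_per_second:.2f} flights per second")
print(f"{implied_flights_per_day:,.0f} flights per day implied by the yearly total")
print(f"{passengers_per_flight:.0f} passengers per flight on average")
```

The yearly total works out to roughly 110,000 flights a day, so the slide's 100,000 is a conservative rounding; either way the rate is above one departure per second.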
  • 10. Profitability: at $8.27 per passenger, distribution is where the money is. Profitability is growing due to falling oil prices. Source: http://www.iata.org/publications/economics/Pages/Air-Passenger-Monthly-Analysis.aspx
  • 13. Itinerary structure: one-ways and round trips; multi-leg: open jaws and circle trips; fares, fare components, pricing units and tickets.
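To make the itinerary taxonomy concrete, here is a minimal sketch of a classifier over (origin, destination) legs. The rules are simplified assumptions for illustration: real fare-construction definitions of open jaws and circle trips are considerably more involved and depend on fare rules.

```python
from typing import List, Tuple

Leg = Tuple[str, str]  # (origin airport, destination airport)

def classify_itinerary(legs: List[Leg]) -> str:
    """Label an itinerary using simplified, illustrative rules."""
    if not legs:
        raise ValueError("an itinerary needs at least one leg")
    if len(legs) == 1:
        return "one-way"
    if len(legs) == 2:
        (a, b), (c, d) = legs
        if b == c and d == a:
            return "round trip"   # A->B, B->A
        if b == c or d == a:
            return "open jaw"     # A->B, B->C  or  A->B, C->A
        return "multi-city"       # A->B, C->D: no shared end point
    # Three or more legs: a continuous chain that returns home is a circle trip.
    continuous = all(legs[i][1] == legs[i + 1][0] for i in range(len(legs) - 1))
    if continuous and legs[-1][1] == legs[0][0]:
        return "circle trip"      # A->B, B->C, C->A
    return "multi-city"

print(classify_itinerary([("A", "B"), ("B", "C"), ("C", "A")]))
```

Working on plain airport codes keeps the sketch independent of any schedule or fare data; a production system would classify priced fare components rather than bare legs.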
  • 14. Example route: take AA flights/fares on an SFO-BOS route. A total of 25,401,415 valid AA solutions, for only this particular airline and route. Routing via SFO, ORD, DFW, BOS: 5 * 36 = 85 fc; 19 * 32 = 109 fc; 41 * 32 = 162 fc; 9 * 32 = 87 fc
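Much of this growth is combinatorial: choosing one flight/fare option per leg independently gives a Cartesian product, so counts multiply rather than add. The sketch below uses hypothetical per-leg option counts (not the slide's figures) and ignores the validity rules that would prune real combinations.

```python
from math import prod

# Hypothetical counts of (flight, fare) options per leg; NOT the slide's figures.
outbound = {("SFO", "ORD"): 40, ("ORD", "BOS"): 60}
inbound = {("BOS", "DFW"): 30, ("DFW", "SFO"): 50}

def count_combinations(options_per_leg: dict) -> int:
    """Choosing one option per leg independently gives a Cartesian product."""
    return prod(options_per_leg.values())

outbound_count = count_combinations(outbound)  # 40 * 60
inbound_count = count_combinations(inbound)    # 30 * 50
# Outbound and inbound are also chosen independently, so they multiply again.
round_trip_count = outbound_count * inbound_count
print(outbound_count, inbound_count, round_trip_count)
```

Even with these modest per-leg counts a single two-connection round trip reaches millions of candidates, which is the scale the slide's 25,401,415 figure points at.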
  • 15. Dates. Even exact dates are complicated: when you travel changes the price (weekend stay and seasonality, advance purchase). Dates also give interesting features and patterns: day of week, stay duration, age of the quote/price, and seasons such as Christmas, Easter and holidays.
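Several of these date features can be derived directly from a quote. A minimal sketch, assuming each quote carries the date it was observed plus outbound and return dates; the Christmas window (20 Dec to 6 Jan) is a hypothetical choice, since real season definitions vary.

```python
from dataclasses import dataclass
from datetime import date, timedelta

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

@dataclass
class DateFeatures:
    departure_weekday: str     # day of week of the outbound flight
    stay_nights: int           # stay duration in nights
    advance_days: int          # advance purchase: days from quote to departure
    saturday_night_stay: bool  # does the stay include a Saturday night?
    christmas_period: bool     # departure inside the hypothetical holiday window

def in_christmas_window(d: date) -> bool:
    # Hypothetical window (20 Dec to 6 Jan); real season definitions vary.
    return (d.month == 12 and d.day >= 20) or (d.month == 1 and d.day <= 6)

def date_features(quoted_on: date, depart: date, ret: date) -> DateFeatures:
    if ret < depart:
        raise ValueError("return must not be before departure")
    stay = (ret - depart).days
    # Night i starts on depart + i days; weekday() == 5 means Saturday.
    saturday_night = any((depart + timedelta(days=i)).weekday() == 5 for i in range(stay))
    return DateFeatures(
        departure_weekday=WEEKDAYS[depart.weekday()],
        stay_nights=stay,
        advance_days=(depart - quoted_on).days,
        saturday_night_stay=saturday_night,
        christmas_period=in_christmas_window(depart),
    )

print(date_features(date(2015, 4, 1), date(2015, 5, 6), date(2015, 5, 10)))
```

Weekday names come from a fixed tuple rather than `strftime("%A")` so the output does not depend on the system locale.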
  • 16. Prices. Airlines use seat availability to adjust prices, so prices are volatile (26 booking classes). Airlines also do variable pricing across fare portfolios: your flight neighbour paid a different price. There are 15,000,000 availability questions per second, and no lock-down between search and booking. Prices for the same seat can still differ depending on who sells your ticket (codeshare, agency, OTA).
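A toy model of availability-driven pricing, assuming a flight's booking classes are checked cheapest first and the quoted price is the cheapest class with enough open seats. The class letters, fares and seat counts are hypothetical; real airline inventory and revenue management are far richer.

```python
from typing import List, Optional, Tuple

# (booking class letter, fare in EUR, seats still open), cheapest class first.
BookingClass = Tuple[str, float, int]

def quote(classes: List[BookingClass], seats: int = 1) -> Optional[Tuple[str, float]]:
    """Cheapest class that can still sell `seats` seats, or None if none can."""
    for letter, fare, available in classes:
        if available >= seats:
            return letter, fare
    return None

def sell(classes: List[BookingClass], letter: str, seats: int) -> List[BookingClass]:
    """New inventory with `seats` taken out of class `letter`."""
    return [(c, f, a - seats if c == letter else a) for c, f, a in classes]

# Hypothetical inventory for one flight.
inventory = [("O", 49.0, 2), ("Q", 79.0, 5), ("Y", 199.0, 20)]
print(quote(inventory))
inventory = sell(inventory, "O", 2)  # the last two seats in class O are sold
print(quote(inventory))              # the same kind of seat is now quoted higher
```

Because nothing is locked between search and booking, a quote computed from one snapshot of this inventory can already be stale by the time the user clicks through.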
  • 17. Distribution providers: all tickets are booked on a website or through a GDS (global distribution system).
  • 19. Searches and exits: 40m unique monthly visitors; 120m visits on web and mobile per month; 13m searches per day. Results are up to date. User experiences on the web: searches in Month view and Browse view. Exits happen by redirects: we don't take ownership of the booking, and we stay true to our users, our providers and our own values.
  • 20. Data: 2bn quotes per day => 700bn quotes per year. Quotes contain the entire itinerary and price; the data can easily be processed and/or extracted; prices are up to date, but we also keep historical data. 200GB of gzipped data per day => 80TB per year. 95% world coverage of airlines and OTAs.
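The volume figures can be cross-checked with simple arithmetic, assuming decimal units (1 GB = 10^9 bytes) and a 365-day year.

```python
QUOTES_PER_DAY = 2_000_000_000
GZIPPED_BYTES_PER_DAY = 200 * 10**9  # 200 GB, decimal units assumed

quotes_per_year = QUOTES_PER_DAY * 365
bytes_per_year = GZIPPED_BYTES_PER_DAY * 365
bytes_per_quote = GZIPPED_BYTES_PER_DAY / QUOTES_PER_DAY  # average, compressed

print(f"{quotes_per_year / 1e9:.0f}bn quotes per year")
print(f"{bytes_per_year / 1e12:.0f} TB gzipped per year")
print(f"{bytes_per_quote:.0f} bytes per gzipped quote on average")
```

Straight multiplication gives about 730bn quotes and 73TB per year, so the slide's 700bn and 80TB are rounded figures; it also implies roughly 100 compressed bytes per quote.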
  • 21. How much data is that? 2,000,000,000 quotes per day. A small list of technologies used: • Thrift/RabbitMQ/Ruby/Fluentd • Scala/Spark/Hive • AWS (S3, Glacier, EC2, Elastic MapReduce, DynamoDB) • Elasticsearch/Kibana • Python/Flask (Image from vicchi.org)
  • 22. What can we do with these data?
  • 28. What can we do with these data? 1. Dynamics of flight prices 2. Travel insights for airlines and airports 3. Inspiration: finding good deals 4. A small analysis
  • 29. Dynamics of flight prices (route LON-MAD, direct only, one way): 1. There are too many routes, so let's select a popular one (London-Madrid). 2. Let's focus on direct connections only. 3. Let's focus on one-way trips only.
  • 30. Dynamics of flight prices: route LON-MAD, direct only, one way
  • 31. Dynamics of flight prices: route LON-MAD, direct only, one way; carrier
  • 32. Dynamics of flight prices: route LON-MAD, direct only, one way; carrier: Ryanair
  • 33. Dynamics of flight prices: route LON-MAD, direct only, one way; carrier: Ryanair; travelling on
  • 34. Dynamics of flight prices: route LON-MAD, direct only, one way; carrier: Ryanair; travelling on
  • 35. Dynamics of flight prices: route LON-MAD, direct only, one way; carrier: Ryanair; travelling on Wednesday
  • 36. Dynamics of flight prices: route LON-MAD, direct only, one way; carrier: Ryanair; travelling on Wednesday; month of travel
  • 37. Dynamics of flight prices: route LON-MAD, direct only, one way; carrier: Ryanair; travelling on Wednesday; month of travel
  • 38. Dynamics of flight prices: route LON-MAD, direct only, one way; carrier: Ryanair; travelling on Wednesday; month of travel: May
  • 39. Dynamics of flight prices: route LON-MAD, direct only, one way; carrier: Ryanair; travelling on Wednesday; month of travel: May
  • 40. Dynamics of flight prices: route LON-MAD, direct only, one way; carrier: Ryanair; travelling on Wednesday; month of travel: May
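The drill-down above (route, direct, one way, carrier, weekday, month) is a chain of filters followed by an aggregation. A minimal sketch, assuming a hypothetical `Quote` record with the fields shown and summarising price by days before departure with the median; the sample quotes are made up purely to exercise the filters.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median

@dataclass
class Quote:
    origin: str       # city code, e.g. "LON"
    destination: str  # city code, e.g. "MAD"
    carrier: str
    direct: bool
    one_way: bool
    quoted_on: date   # when the price was observed
    depart: date      # date of travel
    price: float

WEDNESDAY = 2  # date.weekday(): Monday == 0
MAY = 5

def price_curve(quotes, origin, destination, carrier, weekday, month):
    """Median price per days-before-departure for one narrow slice of quotes."""
    by_days_ahead = defaultdict(list)
    for q in quotes:
        if (q.origin, q.destination) != (origin, destination):
            continue  # route filter
        if not (q.direct and q.one_way and q.carrier == carrier):
            continue  # direct, one-way, single-carrier filter
        if q.depart.weekday() != weekday or q.depart.month != month:
            continue  # day-of-week and month-of-travel filter
        by_days_ahead[(q.depart - q.quoted_on).days].append(q.price)
    # Furthest from departure first, so the curve reads forward in time.
    return {d: median(p) for d, p in sorted(by_days_ahead.items(), reverse=True)}

# Made-up sample quotes, purely to exercise the filters.
sample = [
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 4, 6), date(2015, 5, 6), 40.0),
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 4, 13), date(2015, 5, 13), 50.0),
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 5, 5), date(2015, 5, 6), 120.0),
    Quote("LON", "MAD", "OtherAir", True, True, date(2015, 5, 5), date(2015, 5, 6), 90.0),
    Quote("LON", "MAD", "Ryanair", True, True, date(2015, 5, 6), date(2015, 5, 7), 80.0),
]
print(price_curve(sample, "LON", "MAD", "Ryanair", WEDNESDAY, MAY))
```

The median is used here as an assumption because it is robust to occasional outlier quotes; on the real data set the same filtering would more likely run in Spark or Hive than in a Python loop.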
  • 41. What can we do with these data? 1. Dynamics of flight prices 2. Travel insights for airlines and airports 3. Inspiration: finding good deals 4. A small analysis
  • 42. Travel Insights – for airlines and airports
  • 43. Travel Insights – for airlines and airports
  • 44. Travel Insights – for airlines and airports
  • 45. Travel insights for airlines. Another small list of technologies used: • Python, .NET • AWS (S3, Redshift, EC2), MS SQL • Tableau
  • 46. What can we do with these data? 1. Dynamics of flight prices 2. Travel insights for airlines and airports 3. Inspiration: finding good deals (• Where? • When? • Which deal is good?) 4. A small analysis
  • 48. Travel Inspiration - a hack day project
  • 49. Travel Inspiration – is it a good deal?
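One simple, assumed way to answer "is it a good deal?" is to compare a price with the route's historical quotes, which the data slide says are kept. The 20th-percentile threshold and the sample history are hypothetical choices for illustration, not a description of Skyscanner's method.

```python
from bisect import bisect_left
from typing import Sequence

def price_percentile(price: float, history: Sequence[float]) -> float:
    """Percentage of historical quotes strictly cheaper than `price`."""
    if not history:
        raise ValueError("need historical prices for the route")
    ordered = sorted(history)
    return 100.0 * bisect_left(ordered, price) / len(ordered)

def is_good_deal(price: float, history: Sequence[float], threshold: float = 20.0) -> bool:
    """Hypothetical rule: good if fewer than `threshold`% of past quotes were cheaper."""
    return price_percentile(price, history) < threshold

# Made-up history of prices for one route.
history = [120, 95, 150, 80, 110, 99, 130, 105, 140, 90]
print(price_percentile(85, history), is_good_deal(85, history))
```

A percentile rule adapts to each route's own price level, so the same threshold works for a cheap short-haul route and an expensive long-haul one; in practice the history would need to be restricted to comparable dates and trip types.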
  • 50. Travel inspiration: Skyscanner API. Technologies used: Google Maps, Python, Flask, AWS Redshift, Skyscanner API. Want to do better? See http://business.skyscanner.net/ You can get a trial API key by filling in the feedback form at the end of the event: http://goo.gl/forms/i4C2VcSGyW
  • 51. What can we do with these data? 1. Dynamics of flight prices 2. Travel insights for airlines and airports 3. Inspiration: finding good deals 4. A small analysis, or: how did demand for trips to Greece change in the heat of the crisis, and what do the Danish know about it?
  • 53. Analysis: Greece. Red represents a week-on-week decrease; green represents an increase. Data for 2015.
  • 54. Analysis: Greece. Red represents a week-on-week decrease; green represents an increase. Data for 2014.
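These maps colour each week by its change from the previous week. A minimal sketch of that computation, assuming weekly search (or demand) counts per destination as input; the counts are made up, and the "neutral" label for zero change is an added assumption.

```python
from typing import List, Tuple

def week_on_week(counts: List[int]) -> List[Tuple[float, str]]:
    """Percentage change vs. the previous week, labelled like the slide's colours."""
    changes = []
    for prev, cur in zip(counts, counts[1:]):
        if prev == 0:
            raise ValueError("cannot compute a change from a zero baseline")
        pct = 100.0 * (cur - prev) / prev
        colour = "green" if pct > 0 else "red" if pct < 0 else "neutral"
        changes.append((round(pct, 1), colour))
    return changes

# Made-up weekly search counts for one origin market and destination.
print(week_on_week([1000, 1200, 900, 900]))
```

Running the same function per origin market (for example, per country of the searcher) is what would let a comparison like the Danish one on slide 51 fall out of the data.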
  • 55. What we know we did not talk about: • What is the best way to get the cheapest deals? • Recommendations • Personalization • A/B testing • Sorting of flight results • Infrastructure • Ahem, “Travel”… (Image credit: jangosteve.com)
  • 56. Thank you! Please give us feedback or apply for API keys here: http://goo.gl/forms/i4C2VcSGyW • Konstantin Halachev: konstantin.halachev@skyscanner.net • Plamen Aleksandrov: plamen.aleksandrov@skyscanner.net • We are hiring!