Joint work with Amadeus presenting a recommender system for your next destination using knowledge graphs and deep learning, presented at the LocWeb 2019 Workshop co-located with TheWebConf 2019 (San Francisco, USA)
1. Location Embeddings for Next Trip Recommendation
Amine Dadoun, Raphael Troncy,
Riccardo Petitti, Olivier Ratier
LocWeb19,
13 May 2019
2. LocWeb 2019 … Why?
At the intersection of: Location, User, Web, Social Media, Recommendation, Travel
3. Travel … A great source of inspiration
John Doe: "I do not know where to go" … "Try this"
4. Use Case Description
Given a traveler, their demographics, their historical bookings, and the contextual data related to these bookings, we recommend a ranked list of destinations they would like to visit.
Traveler's Demographic Data: 43 years old, Malaysian, Male, Nature, Museums
Contextual Data per booking, over time:
• 14/09/2016, Wednesday, 2 Days, Alone, etc.
• 21/12/2016, Friday, 14 Days, 4 persons in party, etc.
• 07/06/2017, Saturday, 10 Days, 2 persons in party, etc.
• 15/01/2017, Sunday, 5 Days, Alone, etc.
• 09/09/2018, Sunday, 4 Days, Alone, etc.
• ? (next destination to predict)
5. Scientific Problems
Given historical purchases made by a user (or user-item past interactions), plus the
context where the interaction was made, how can we accurately predict what will
be the next item the user is going to interact with?
Research Questions
1. What item to recommend to the user?
2. Can we integrate external data to improve the accuracy of a predictive model?
3. How can we evaluate the recommendation made to this user?
6. DKFM (our approach):
DKFM combines Factorization Machines, which represent the contextual information, with the Wide & Deep Learning (WDL) recommender system, which captures the user-item interactions and the content information. The combination of these two models is represented in a DNN.
State of the Art
Recommender Systems (SotA & baselines, plus our model):
• Collaborative Filtering [1, 2, 3]: Implicit MF, Bayesian Personalized MF, Neural Collaborative Filtering
• Content-based Filtering [4]: Item KNN
• Hybrid Method [5]: Wide & Deep Learning
• Context-aware Recommender System [6, 7]: Factorization Machines, Neural Factorization Machines
• Knowledge-aware Recommender System [8]: Deep Knowledge Factorization Machines (our model)

Collaborative Filtering:
These are matrix factorization methods based only on the user-item interactions. They vary either in the loss used for training or in the interaction function that computes the recommendation probability.

Content-based Filtering:
Item KNN is a neighborhood-based collaborative filtering method; it computes the k nearest neighbors for each item.

Hybrid Method:
WDL is a DNN model that computes the probability of a user-item pair based on both the user-item interactions and the content of the item.

Context-aware Recommender System:
These two methods are based on the factorization machines algorithm, which takes into account the context of the recommendation in addition to the user-item interaction.
7. Data integration to enrich the representation of destinations
• User-Item Interactions: user u1 interacted with items i1, i2, i3, …
• User's Demographics: age, nationality, gender, etc.
• Interaction Information: date, session behavior, etc.
• Content Information: item description (text, knowledge graph, etc.)
8. Contribution: Deep Knowledge Factorization Machines (DKFM)
Deep Neural Network combining:
• Collaborative information
• Content information
• Contextual information
Inputs:
• User-Item Interactions: user u1 with items i1, i2, i3, …
• Content Information: item description (text, knowledge graph, etc.)
• User's Demographics: age, nationality, gender, etc.
• Interaction Information: date, session behavior, etc.
9. Back to our problem … Next Trip Destination
Traveler's Demographic Data: 43 years old, Malaysian, Male, Nature, Museums
Historical bookings with contextual information:
• 14/09/2016, Wednesday, 2 Days, Alone
• 21/12/2016, Friday, 14 Days, 4 persons in party
• 07/06/2017, Saturday, 10 Days, 2 persons in party
• 09/09/2018, Sunday, 4 Days, Alone
Next Trip Recommendation: ?
10. Traveller's Profiles Data
• Real traveler data
• Number of profiles: ~20M
• Number of trips: ~15M
• Trip type: one-way, round-trip, multiple-journey trips
• Time range: February 2013 - October 2019
• Number of destinations: 1146

Trip attributes: Booking Creation Date, Stay Duration, Origin Airport, Origin City, Origin Country, Origin Region, Destination Airport, Destination City, Destination Country, Destination Region, Departure Date, Departure Day of the Week, Arrival Date, Advanced Purchase, Advanced Check-in, Trip Number in Party
Traveller attributes: Age, Customer Value, Days to Next Bday, Days to Next Flight, Nationality, Gender, Last Booking Date, Last Flown Date
Services attributes: Type of Services, Service Code
11. Data Pre-processing Pipeline
Input: trips and traveler demographics.
Filtering steps:
• Remove travelers with fewer than 5 trips (only 32% of the trips left)
• Segment travelers into Business and Leisure trips (only 4% of the trips left)
• Remove travelers with fewer than 5 different trips; remove destinations visited fewer than 20 times (only 2% of the trips left)

After filtering:
• Number of travelers: 26K/20M (0.13%)
• Number of trips: 300K/15M (2.1%)
• Number of destinations: 119/1146 (10%)
12. Data Pre-processing: Data Filtering for Recommendation
• Remove travelers with fewer than 5 trips (different destinations)
• Remove destinations that are visited fewer than 20 times

Example interaction matrix R:
             Kuala Lumpur  Sydney  London  New York  Paris
Traveler 1        8           2       1        0        0
Traveler 2        4           0       1        0        1
Traveler 3        2           2       2        1        0
Traveler 4        4           0       0        0        2
Traveler 5        1           0       2        0        3

• Number of trips: ~4.8M bookings
• Number of travelers: 814,919
• Number of destinations: 763
• Interaction matrix: R ∈ ℕ^(#Travelers × #Destinations), where r_ui = number of travels to destination i by traveler u
• Sparsity is defined as follows: ρ(R) = 1 − #Interactions / (#Users × #Items)

#Feedbacks: 610,515 | #Interactions: 361,412 | #Cities: 135 | #Travelers: 31,205 | Sparsity: 92%

• ρ(Leisure_Trips) = 99.8%: too sparse to build a recommender system
• More than 65% of travelers have traveled only 2 times
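The sparsity definition above is easy to verify on the small example matrix from this slide (a toy check, not the full dataset):

```python
import numpy as np

# Interaction matrix from the slide: rows = travelers, columns =
# Kuala Lumpur, Sydney, London, New York, Paris
R = np.array([
    [8, 2, 1, 0, 0],
    [4, 0, 1, 0, 1],
    [2, 2, 2, 1, 0],
    [4, 0, 0, 0, 2],
    [1, 0, 2, 0, 3],
])

def sparsity(R):
    """rho(R) = 1 - #Interactions / (#Users * #Items),
    counting any non-zero cell as one interaction."""
    return 1 - np.count_nonzero(R) / R.size

print(f"sparsity = {sparsity(R):.0%}")  # → sparsity = 40%
```

On the real business-trip data the same formula yields 92%, and 99.8% on the leisure trips, which is why only business trips are kept.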
13. Data Pre-processing: Customer Segmentation
CEM trips are split into Business and Leisure: historical trips already labeled B/L are used to train a B/L classifier, which then predicts the label of the remaining trips.

Trips data: 122,242 trips
Features used: number of passengers, stay duration, Saturday stay, purchase anticipation, age, gender
Time range: Feb 2014 - Feb 2017
Distribution: 40-60% B/L

Training: Random Forest classifier, grid search on the training data, 5-fold cross-validation for evaluation with a 75-25% training and test split
Accuracy = 0.87, Precision = 0.87, Recall = 0.91
[Figure: feature importances]

#Feedbacks: 304,019 | #Interactions: 152,547 | #Cities: 119 | #Travelers: 26,019 | Sparsity: 95%
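The training setup above can be sketched with scikit-learn. The data here is synthetic with a toy labeling rule (long stays including a Saturday look like leisure); the real labeled CEM trips and the exact parameter grid are not public, so treat everything below the imports as placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(42)
n = 1000
# Columns mirror the slide's features: n_passengers, stay_duration,
# saturday_stay, purchase_anticipation, age, gender
X = np.column_stack([
    rng.integers(1, 5, n), rng.integers(1, 21, n), rng.integers(0, 2, n),
    rng.integers(0, 90, n), rng.integers(18, 70, n), rng.integers(0, 2, n),
])
# Toy rule standing in for the real B/L labels
y = ((X[:, 1] > 5) & (X[:, 2] == 1)).astype(int)

# 75-25% split, grid search with 5-fold CV on the training portion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, 10]},
    cv=5, scoring="accuracy",
)
grid.fit(X_train, y_train)
print("test accuracy:", grid.score(X_test, y_test))
print("feature importances:", grid.best_estimator_.feature_importances_)
```

`feature_importances_` is what produces the feature-importance plot referenced on the slide.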
14. Data Enrichment using Word Embeddings
Cities (Phuket, Adelaide, London, etc.) are represented from their Wikipedia content:
1. Compute the TF-IDF of each word (down-weighting frequent words such as "the", "a", etc.)
2. Build the city's textual embedding (e.g., for London) as a weighted sum of pre-trained word vectors [8], where the weight of each word vector is the term frequency-inverse document frequency (TF-IDF) of the word
15. Data Enrichment using Knowledge Graph Embeddings
Each city (Phuket, Adelaide, London, etc.) gets a knowledge graph embedding (KGE_Phuket, KGE_Adelaide, KGE_London, etc.).

TransE model [9]: given a triple (h, r, t) in the graph, the idea is to minimize the distance between the embedding of h translated by r (i.e., h + r) and the embedding of t.

Semantic Trails Knowledge Graph: the knowledge graph represents user-venue interactions through the property 'visiting', as well as the relations between the venue and the other entities, namely: category, schema, and city.
https://arxiv.org/abs/1812.04367
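The TransE [9] intuition can be shown in a few lines: a plausible triple (h, r, t) has a small distance ||h + r − t||, and training pushes true triples below corrupted ones by a margin. The entities, the relation, and the random vectors below are illustrative placeholders, not the trained Semantic Trails embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
# Placeholder embeddings; in practice these are learned from the graph
entities = {"Phuket": rng.standard_normal(dim),
            "Thailand": rng.standard_normal(dim)}
relations = {"locatedIn": rng.standard_normal(dim)}

def transe_distance(h, r, t):
    """L2 distance ||h + r - t||; lower means more plausible."""
    return np.linalg.norm(entities[h] + relations[r] - entities[t])

def margin_loss(d_pos, d_neg, margin=1.0):
    """Margin-based ranking loss over a true and a corrupted triple."""
    return max(0.0, margin + d_pos - d_neg)

d_pos = transe_distance("Phuket", "locatedIn", "Thailand")
# A corrupted triple worse by more than the margin costs nothing:
print(margin_loss(d_pos, d_pos + 2.0))  # → 0.0
```

Minimizing this loss over all triples is what makes h + r land close to t, so the resulting city vectors encode the graph's relational structure.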
16. Deep Knowledge Factorization Machines
Deep Neural Network combining:
• Collaborative information
• Content information
• Contextual information
Inputs: travelers' profiles & trips, plus external data.

Semantic Trails Knowledge Graph
• What characterizes a city the most?
• An embedding of each city is constructed with the TransE model
• TransE model: given a triple (h, r, t) in the graph, the idea is to minimize the distance between h + r and the embedding of t

Wikipedia
• Cities are represented based on their textual description in Wikipedia
• Each Wikipedia document is encoded as a weighted sum of word vectors
• We used pre-trained word vectors from fastText (n-gram model)
• The n-gram model is similar to the skip-gram model, but instead of learning a single vector per word, it learns a representation for each character n-gram
• The weights of the word vectors are their TF-IDF scores
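The combination above can be sketched as a toy forward pass: the collaborative, content (textual + knowledge graph), and contextual features are concatenated and fed through a small MLP ending in a sigmoid. Dimensions and the random weights are illustrative, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
user_emb = rng.standard_normal(8)   # collaborative: learned user factor
item_emb = rng.standard_normal(8)   # collaborative: learned destination factor
text_emb = rng.standard_normal(8)   # content: Wikipedia TF-IDF embedding
kg_emb = rng.standard_normal(8)     # content: TransE embedding
context = rng.standard_normal(4)    # contextual: date, stay duration, party size, ...

x = np.concatenate([user_emb, item_emb, text_emb, kg_emb, context])  # (36,)

def mlp(x, sizes=(36, 16, 1)):
    """Forward pass: ReLU hidden layers, sigmoid output."""
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = rng.standard_normal((n_out, n_in)) * 0.1  # untrained toy weights
        x = W @ x
        if i < len(sizes) - 2:
            x = np.maximum(x, 0.0)
    return 1.0 / (1.0 + np.exp(-x))

p = mlp(x)  # probability-like score that this traveler books this destination
print(p.shape)  # → (1,)
```

Ranking all candidate destinations by this score per traveler yields the recommendation list evaluated in the next slide.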
17. Training Procedure and Evaluation Protocol
Leave-one-out protocol: for each user, we remove the last destination they went to and use it as the test set.
The data is split over time into training data and test data. The recommender system is trained on the training data and, for each traveler, predicts a ranked list of destinations (e.g., Adelaide, Osaka, Phuket, Brunei) over the traveler-destination pairs not yet observed.
Metrics: HitRate@K [3], MRR@K [7]
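The two metrics can be computed as follows: for each test traveler, the held-out destination (leave-one-out) is checked against the top-K of the predicted ranking. The tiny example lists are illustrative.

```python
def hit_rate_at_k(ranked_lists, held_out, k=10):
    """Fraction of travelers whose held-out destination is in the top-K."""
    hits = sum(1 for r, t in zip(ranked_lists, held_out) if t in r[:k])
    return hits / len(held_out)

def mrr_at_k(ranked_lists, held_out, k=10):
    """Mean reciprocal rank of the held-out destination within the top-K."""
    total = 0.0
    for r, t in zip(ranked_lists, held_out):
        if t in r[:k]:
            total += 1.0 / (r.index(t) + 1)
    return total / len(held_out)

ranked = [["Adelaide", "Osaka", "Phuket"], ["Osaka", "Brunei", "Adelaide"]]
truth = ["Osaka", "London"]
print(hit_rate_at_k(ranked, truth, k=3))  # → 0.5 (Osaka found, London missed)
print(mrr_at_k(ranked, truth, k=3))       # → 0.25 (reciprocal rank 1/2 for Osaka)
```

HitRate@K only asks whether the true destination appears in the top-K; MRR@K additionally rewards ranking it higher.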
19. DKFM: what is the contribution of each input data?
Ablation over combinations of the three inputs (Demographics Data, Textual Embedding, Knowledge Graph Embedding); higher is better:

HR@10  MRR@10
0.72   0.34
0.79   0.37
0.80   0.38
0.82   0.38
0.84   0.41
0.85   0.42
0.88   0.44

Deep Neural Network + Data Enrichment => best results
20. Conclusion and Future Work
Conclusions
• Combining different types of input remarkably improves recommendation results
• The DKFM model outperforms state-of-the-art collaborative filtering methods
Future Work
• Enrich cities' characteristics using visual embeddings
• Explore other loss functions, such as pairwise losses
• Explore the use of a similarity measure, such as cosine similarity, inside the DNN
Open Science
• DKFM implementation available at https://gitlab.eurecom.fr/amadeus/DKFM-recommendation
21. References
[1] Badrul Sarwar, George Karypis, Joseph A. Konstan, and John Riedl. 2001. Item-based Collaborative Filtering Recommendation Algorithms.
[2] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative Filtering for Implicit Feedback Datasets.
[3] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian Personalized Ranking from Implicit Feedback.
[4] Steffen Rendle. 2010. Factorization Machines.
[5] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah. 2016. Wide & Deep Learning for Recommender Systems.
[6] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering.
[7] Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, and Xiuqiang He. 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction.
[8] Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. 2018. Advances in Pre-Training Distributed Word Representations.
[9] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Durán, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Data.