Measuring the volume of information that users, deliberately or not, leave online is an impossible mission. The vast majority of the actions users perform carry more information than the users themselves think they are producing.
To shed light on this, the talk will start with different examples of implicit information, namely traces that are often hidden inside explicit feedback or directly detectable from the users' actions. The focus will then move to browsing behavior analysis, including approaches to get a deeper understanding of the users, in particular in cold-start situations. The talk will conclude by showing how to follow these users' traces to obtain reliable knowledge about the content consumed by the end-users.
10. Understanding the Users
explicit feedback (structured): optimal data, but very rare (limited in applications/items/attributes)
implicit data (un-structured): tough data, but very common (it is often hard to understand)
user generated content (semi-structured): good data, common and with a lot of knowledge, but often difficult to use
11. explicit, clear and structured
understanding the users from explicit feedback
19. implicit, noisy and unstructured
understanding the users from implicit feedback
navigational patterns
user behavior
item importance
content recommendation
browsing graph
referrer graph
22. user generated content
"the users are always leaving information behind them"
understanding the users from their content
semi-structured
reviews / opinions
comments
media (images / visual content)
meta-data (gps, tags, ..)
interests + social
tweets / vine videos
25. "the users are always leaving information behind them"
semi-structured
Understanding the users from their content beyond the scope of their action
"Loud and Trendy: Crowdsourcing Impressions of Social Ambiance in Popular Indoor Urban Places", CH'15
28. 5 stars rating: explicit information, clear and easy to understand
unstructured and noisy
contains extremely meaningful information
"connecting people with great local businesses"
32. identify the "food words" inside the review
understand the user's opinion
Understand User's Taste
build a user taste profile
build a restaurant "kitchen quality" profile
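The steps on this slide can be sketched roughly as follows. This is only an illustration: the food lexicon, the sentiment word lists and the sentence-level scoring are all hypothetical choices, since the slides do not say how food words or opinions are detected.

```python
import re
from collections import defaultdict

# Hypothetical lexicons, for illustration only (not from the talk).
FOOD_WORDS = {"pizza", "pasta", "tiramisu", "burger"}
POSITIVE = {"great", "delicious", "amazing", "good"}
NEGATIVE = {"bad", "soggy", "bland", "cold"}


def sentence_sentiment(tokens):
    """Return +1, -1 or 0 from counts of positive and negative words."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)


def taste_profile(reviews):
    """Map each food word to the mean sentiment of the sentences mentioning it."""
    totals = defaultdict(lambda: [0, 0])  # food word -> [sentiment sum, mentions]
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review.lower()):
            tokens = re.findall(r"[a-z]+", sentence)
            sentiment = sentence_sentiment(tokens)
            for food in FOOD_WORDS.intersection(tokens):
                totals[food][0] += sentiment
                totals[food][1] += 1
    return {food: s / n for food, (s, n) in totals.items()}


profile = taste_profile([
    "The pizza was delicious. The pasta was bland and cold.",
    "Great pizza again!",
])
```

Running the same function over all the reviews written by one user would give a user taste profile, and over all the reviews of one restaurant something like a "kitchen quality" profile.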
38. user taste profile
restaurant kitchen quality profile
user visits a new place
what the user likes
the "specialities" of the restaurant: serendipity?
food or menu recommendation
"Buon Appetito - Recommending Personalized Menus", HT'14
44. Recommendation Experiment.
Food/Menu Recommender
[avg-sent] most frequent positive food items among the profiles (> threshold)
[user-words] user-based CF with weighted items by positive sentiments
[menu-words] frequent and good menu/item sets (Fuzzy Apriori)
[zero-sent] most frequent food items among the profiles (no sentiments)
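The two frequency baselines can be read as follows. This is one possible interpretation of the one-line descriptions on the slide, not the implementation evaluated in the cited work; the profile format (a dictionary from food item to sentiment score), the threshold value and the tie-breaking rule are assumptions.

```python
from collections import Counter


def _top_items(counts, k):
    """Most frequent items first; ties broken alphabetically for determinism."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ordered[:k]]


def zero_sent(profiles, k=3):
    """[zero-sent]: the k items appearing in the most profiles, ignoring sentiment."""
    counts = Counter(item for profile in profiles for item in profile)
    return _top_items(counts, k)


def avg_sent(profiles, threshold=0.0, k=3):
    """[avg-sent]: count an item only in profiles where its sentiment exceeds threshold."""
    counts = Counter(
        item
        for profile in profiles
        for item, sentiment in profile.items()
        if sentiment > threshold
    )
    return _top_items(counts, k)


profiles = [
    {"pizza": 0.9, "pasta": -0.5},
    {"pizza": 0.4, "pasta": -0.2, "tiramisu": 0.8},
    {"pasta": 0.1},
]
without_sentiment = zero_sent(profiles, k=2)
with_sentiment = avg_sent(profiles, threshold=0.3, k=2)
```

In this toy data pasta is mentioned most often but rarely positively, so it leads the sentiment-free list and drops out of the sentiment-aware one.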
56. User Browsing Graph
collect all browsing sessions
BrowseGraph (weighted graph)
"BrowseRank: letting web users vote for page importance", SIGIR'08
"Image Ranking Based on Users Browsing Behavior", SIGIR'12
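A minimal sketch of how a weighted browse graph could be assembled from sessions. It assumes each session is an ordered list of page identifiers and that an edge weight is simply the number of observed transitions; the cited BrowseRank work also models the time spent on pages, which is left out here.

```python
from collections import Counter


def build_browse_graph(sessions):
    """Count page-to-page transitions over all sessions.

    Returns a Counter mapping (source, destination) to the edge weight.
    """
    edges = Counter()
    for pages in sessions:
        # Consecutive page views within a session form one directed edge.
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges


sessions = [["a", "b", "c"], ["a", "b"], ["b", "c", "a"]]
graph = build_browse_graph(sessions)
```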
59. User Browsing Graph
identifying from where users are entering the website
capture users' interest (collecting user's browsing patterns)
"Discovering Social Photo Navigation Patterns", ICME'12
61. User Browsing Graph
identifying from where users are entering the website
(external) referrer URL
Does the referrer URL tell us something about the user's session?
"Discovering Social Photo Navigation Patterns", ICME'12
65. User Browsing Graph
Does the referrer URL tell us something about the user's session?
labeling referrer URLs (top domains): mail, search engine, blogs, social network
classify Flickr web pages (photos, groups, profile, …)
"Discovering Social Photo Navigation Patterns", ICME'12
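Labeling a referrer URL by its domain might look like the sketch below. The domain-to-class table is hypothetical (the slides name the classes but not the domains behind them), and the `direct` and `other` fallback labels are additions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical domain-to-class table; the talk does not list the domains used.
REFERRER_CLASSES = {
    "mail.example.com": "mail",
    "google.com": "search engine",
    "bing.com": "search engine",
    "facebook.com": "social network",
    "twitter.com": "social network",
    "wordpress.com": "blogs",
}


def referrer_class(referrer_url):
    """Label a referrer URL by matching its host, or a parent domain, in the table."""
    host = (urlparse(referrer_url).hostname or "").lower()
    if not host:
        return "direct"  # empty or malformed referrer (assumed label)
    parts = host.split(".")
    # Try the full host first, then progressively shorter parent domains.
    for i in range(len(parts) - 1):
        label = REFERRER_CLASSES.get(".".join(parts[i:]))
        if label:
            return label
    return "other"  # assumed catch-all label
```

For instance, `referrer_class("https://www.google.com/search?q=x")` falls back from `www.google.com` to `google.com` and is labeled as a search engine.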
66. The Predictive Power of the Referrer Domain
Flickr Data
sample of 2 months of Flickr logs
Apache Web Logs
<user_id, timestamp, referrer_url, current_url, user_agent>
~300M page views
~40M user sessions
~10M unique users
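To go from page views to user sessions, the log tuples have to be grouped somehow. The slides do not say how sessions were delimited, so the sketch below assumes numeric timestamps in seconds and a 30-minute inactivity timeout, both of which are illustrative choices.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout, not from the talk


def split_sessions(records):
    """Group (user_id, timestamp, referrer_url, current_url, user_agent) tuples
    into per-user sessions, starting a new session after a long gap."""
    by_user = defaultdict(list)
    for user_id, timestamp, referrer_url, current_url, _agent in records:
        by_user[user_id].append((timestamp, referrer_url, current_url))

    sessions = []
    for user_id, views in by_user.items():
        views.sort(key=lambda view: view[0])  # chronological order
        current, last_ts = None, None
        for timestamp, referrer_url, current_url in views:
            if last_ts is None or timestamp - last_ts > SESSION_GAP_SECONDS:
                # The referrer of the first page view is the session's entry point.
                current = {"user": user_id, "referrer": referrer_url, "pages": []}
                sessions.append(current)
            current["pages"].append(current_url)
            last_ts = timestamp
    return sessions


records = [
    ("u1", 0, "https://google.com/", "/photos/1", "ua"),
    ("u1", 60, "/photos/1", "/photos/2", "ua"),
    ("u1", 4000, "https://twitter.com/", "/photos/9", "ua"),
    ("u2", 10, "", "/groups/5", "ua"),
]
sessions = split_sessions(records)
```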
73. The Predictive Power of the Referrer Domain
Visitors behave differently depending on where they come from
Users tend to perform similar sessions when coming from the same referrer class (domain)
Note: referrer URL comes for free!
What kind of knowledge does the referrer URL add within the BrowseGraph?
"Discovering Social Photo Navigation Patterns", ICME'12
77. User Browsing Graph
Flickr Data
2 months of logs
~300M page views
~40M user sessions
~10M unique users
considering only photo web pages
BrowseGraph (weighted graph)
ranking of photos based on browsing behavior
"Image Ranking Based on Users Browsing Behavior", SIGIR'12
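One way to turn a weighted browse graph into a ranking is plain PageRank by power iteration, sketched below. This is not BrowseRank and not necessarily the ranking used in the cited work; the damping factor of 0.85 and the uniform handling of pages without outgoing transitions are conventional assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted directed graph.

    edges maps (source, destination) to a positive transition count.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing transitions is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        new_rank = {n: (1 - damping + damping * dangling) / len(nodes) for n in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


edges = {("a", "b"): 2, ("b", "c"): 2, ("c", "a"): 1, ("d", "a"): 1}
scores = pagerank(edges)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In the toy graph, photo `a` receives traffic from two pages and ends up first, while `d`, which nobody navigates to, ends up last.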
89. Photo Ranking Through the Browse Graph
Evaluation
internal popularity: how popular is the photo within Flickr?
collective attention: implicit visibility of the photo
external popularity: how popular is the photo outside Flickr?
diversity: how diverse is the ranking?
92. Photo Ranking Through the Browse Graph
Internal Popularity
x-axis: top N results ([1,1000] images)
y-axis: cumulative value of the features
Favorites: ranks images with highest internal engagement
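The curves described by these axes can be computed along the following lines. This is a guess at the construction based only on the axis labels: for each ranking, walk down the top N photos and accumulate a feature such as the number of favorites. The function name and the toy data are made up for the example.

```python
from itertools import accumulate


def cumulative_feature(ranking, feature, top_n=1000):
    """Cumulative sum of a per-photo feature over the first top_n ranked photos."""
    return list(accumulate(feature.get(photo, 0) for photo in ranking[:top_n]))


favorites = {"p1": 10, "p2": 0, "p3": 25}
curve = cumulative_feature(["p3", "p1", "p2"], favorites)
```

A ranking whose curve rises faster places photos with high feature values nearer the top.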
95. Photo Ranking Through the Browse Graph
Collective Attention
x-axis: top N results ([1,1000] images)
y-axis: cumulative value of the features
Favorites and Views are not very correlated
100. Photo Ranking Through the Browse Graph
External Popularity
x-axis: top N results ([1,1000] images)
y-axis: cumulative value of the features
Favorites: low correlation with external visibility
Page/BrowseRank: very high correlation, thanks to the referrer?
103. Photo Ranking Through the Browse Graph
Diversity
View and Time: rank better tagged photos
BR and PG: rank better photos that have more tags
111. Recap
Analysis of the Browsing Logs
About the Referrer URL:
information about the session the user is going to do
understanding how the webpages are linked from the external world
About the BrowseGraph:
discovering content "voted" by the users
extending the informativeness with the Referrer URL
Can we predict the content the user is going to consume?
116. Can we predict the content the user is going to consume?
un-structured
implicit information (navigational patterns)
browsing graph (referrer graph)
prediction / recommendation
128. Browse Graph on News
Predicting News Articles Consumption
Yahoo News BrowseGraph
~500M pageviews
Social Network, Search Engine
Domain-Dependent BrowseGraph ..or just referrerGraph.
"Cold-start News Recommendation with Domain-dependent Browse Graph", RecSys'14
130. Browse Graph on News
Predicting News Articles Consumption
Yahoo News BrowseGraph
hypothesis: news articles consumed are differentiable by the referrer domains
implement and evaluate a recommender system based on the referrerGraphs
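A rough sketch of what a recommender built on referrerGraphs could look like: one weighted browse graph per referrer class, and a suggestion drawn from the graph matching the current session's referrer. The data layout, the function names and the "most frequent next article" rule are all assumptions made for illustration; the method evaluated in the cited work may differ.

```python
from collections import Counter, defaultdict


def build_referrer_graphs(sessions):
    """One weighted browse graph per referrer class.

    Each session is a (referrer_class, [article ids in visit order]) pair.
    """
    graphs = defaultdict(lambda: defaultdict(Counter))
    for ref_class, articles in sessions:
        for src, dst in zip(articles, articles[1:]):
            graphs[ref_class][src][dst] += 1
    return graphs


def recommend(graphs, ref_class, current_article, k=3):
    """Suggest the k articles most often read next by users from the same referrer class."""
    successors = graphs.get(ref_class, {}).get(current_article, Counter())
    return [article for article, _ in successors.most_common(k)]


sessions = [
    ("twitter", ["n1", "n2"]),
    ("twitter", ["n1", "n2", "n3"]),
    ("twitter", ["n1", "n4"]),
    ("google", ["n1", "n5"]),
]
graphs = build_referrer_graphs(sessions)
from_twitter = recommend(graphs, "twitter", "n1", k=2)
from_google = recommend(graphs, "google", "n1", k=2)
```

The sketch needs only the referrer class and the current article, with no per-user history, so it can produce a suggestion for a visitor seen for the first time.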
135. Browse Graph on News
Predicting News Articles Consumption
Yahoo News BrowseGraph
sessions are very short
average number of hops during browsing sessions
very different size
well connected
139. Browse Graph on News
Predicting News Articles Consumption
Yahoo News BrowseGraph
Nodes Overlap and Importance
[figure: Jaccard Similarity of Node Sets, pairwise over homepage, google, yahoo, bing, facebook, twitter, reddit; scale 0 to 1]
[figure: Kendall τ Between News PageRanks, pairwise over homepage, google, yahoo, bing, facebook, twitter, reddit; scale 0 to 1]
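The two comparisons named on this slide can be illustrated with a short sketch. It assumes each referrerGraph is summarized as a dictionary from article id to PageRank score, restricts the rank correlation to articles present in both graphs, and uses the tau-a variant that ignores ties; how the talk handles these details is not stated.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two node sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    # Two empty sets are treated as identical (a convention, not from the talk).
    return len(a & b) / len(a | b) if a | b else 1.0


def kendall_tau(scores_x, scores_y):
    """Kendall tau-a over the articles present in both score dictionaries."""
    common = sorted(set(scores_x) & set(scores_y))
    concordant = discordant = 0
    for u, v in combinations(common, 2):
        sign = (scores_x[u] - scores_x[v]) * (scores_y[u] - scores_y[v])
        concordant += sign > 0
        discordant += sign < 0
    pairs = len(common) * (len(common) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


google = {"n1": 0.5, "n2": 0.3, "n3": 0.2}
twitter = {"n1": 0.1, "n2": 0.6, "n3": 0.3, "n4": 0.0}
overlap = jaccard(google, twitter)
agreement = kendall_tau(google, twitter)
```

Here the two graphs share three of four articles, yet order them quite differently, which is the kind of contrast the two heatmaps are meant to expose.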
140. Browse Graph on News
Predicting News Articles Consumption
Yahoo News BrowseGraph
Most Common Categories
146. Browse Graph on News
Predicting News Articles Consumption
Yahoo News BrowseGraph
hypothesis: news articles consumed are differentiable by the referrer domains
different graph structure
different interest of the users:
individual articles (node)
news articles topics
importance (PageRank ranking)
168. Browse Graph on News
Predicting News Articles Consumption
Yahoo News BrowseGraph
About the ReferrerGraph:
predictive information of the referrer URL + collective behaviors of the users
able to capture the interest of users, even in the cold-start problem
extremely powerful in the news context
177. User interactions
Future Work
Extending Implicit Signals
location data (IP Address, Mobile GPS)
device type (tablet vs. mobile vs. desktop)
custom webpage data (Social Media, …)
Integrating User Profile
long term user information
user's profile changes over time (with respect to the referrer?)
Experimenting with Different Graphs
graph of actions instead of pageviews? (share actions, explicit activity, ads, …)