The document summarizes a presentation given by Rocío Cañamares and Pablo Castells from the Universidad Autónoma de Madrid. The presentation explored how social network effects can influence popularity biases in recommender systems. It described a stochastic model of social communication and rating behavior to simulate how items become popular. The model was used to run simulation-based experiments that analyzed the relationship between popularity, relevance, and recommendation precision under different scenarios. The experiments aimed to determine when popularity is an effective recommendation strategy compared to random recommendations.
1. Exploring social network effects on popularity biases in recommender systems
Rocío Cañamares and Pablo Castells
IR Group (IRG) @ UAM, Universidad Autónoma de Madrid
http://ir.ii.uam.es
6th ACM RecSys Workshop on Recommender Systems and the Social Web – RSWeb 2014
Foster City, CA, 6 October 2014
2. Outline of my talk
Why is popularity effective?
When is popularity effective?
– How does an item become popular?
– A stochastic model of social communication and rating behavior
Simulation-based experiments for “what if” scenarios
Conclusions
3. The effectiveness of popularity in top-k recommendation
Popularity tests well for top-k precision in offline experiments
(Cremonesi et al., RecSys 2010, etc.)
But… does this reflect true precision?
…or might there be an artificial bias that rewards popular items
in the offline experimental procedure?
There is of course the issue of lack of novelty, but we shall focus
here on accuracy
4. Why is popularity effective?
5. Why is popularity rank an effective recommendation strategy?
The good old rating matrix…
(figure: a users × items matrix whose cells are either observed user-item interactions or unobserved preferences)
6. Why is popularity rank an effective recommendation strategy?
Rating matrix in practice
(figure: observed interactions concentrate on a few popular items, the short head; the rest of the items, the long tail, are mostly unobserved preference)
7. Why is popularity rank an effective recommendation strategy?
In a random split, popular items have more test hits than average
Thus recommending them is effective (at least better than random)
But how about true precision? What's in the unobserved cells?
(figure: the matrix split into training data, test data – the relevant items – and unobserved preference; short-head items hold most of the test hits)
avg P@k ∼ (number of test hits in the top k) / k
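The split-and-count argument above can be checked in a few lines. A minimal sketch on synthetic data (all distributions and parameters are hypothetical, not the talk's setup): item rating counts follow a skewed distribution, a random split sends some ratings to test, and ranking by popularity collects more test hits per user than a random ranking.

```python
import random

random.seed(7)

n_users, n_items, k = 200, 50, 10

# Hypothetical skewed rating behavior: item i is rated with probability 1/(i+1)
ratings = {(u, i) for u in range(n_users) for i in range(n_items)
           if random.random() < 1.0 / (i + 1)}

# Random split: each rating goes to test with probability 0.2
test = {r for r in ratings if random.random() < 0.2}
train = ratings - test

def observed_p_at_k(ranking):
    # Observed P@k: fraction of the top-k recommended (non-training) items
    # appearing in the user's test ratings, averaged over users
    total = 0.0
    for u in range(n_users):
        recs = [i for i in ranking if (u, i) not in train][:k]
        total += sum((u, i) in test for i in recs) / k
    return total / n_users

# Popularity rank by training rating count vs. a random rank
pop_rank = sorted(range(n_items),
                  key=lambda i: -sum(1 for (_, j) in train if j == i))
rnd_rank = random.sample(range(n_items), n_items)

print(observed_p_at_k(pop_rank), observed_p_at_k(rnd_rank))
```

Since popular items carry proportionally more test ratings, the popularity ranking wins on observed precision by construction, which is exactly the bias the slide describes.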
8. Or is it? A simplified toy example
(figure: a two-item toy example, items A and B, comparing the observed and true P@1 of popularity and random recommendation given the items' ratings)
9. When is popularity effective?
10. When is popularity effective?
Why do popular items get more ratings?
And how does that relate with item relevance?
(“relevance” meaning target users like the items)
11. Rating generation
In order for a rating to be produced…
1. Discovery: the user needs to discover the item
– And then find out whether or not she likes it
2. Rating decision: the user needs to tell the system about it
– I.e. rate the item
So the biases in discovery and rating decisions should result in
(may explain?) biases in rating distribution (i.e. popularity)
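The two-step process above can be mimicked directly. In this sketch the probabilities are hypothetical placeholders, not values from the talk: a rating only exists if discovery happened first, and the rating decision may depend on whether the user liked the item.

```python
import random

random.seed(0)

# Hypothetical rating-decision probabilities: p(rate | seen, liked) and
# p(rate | seen, ¬liked); discovery (seen) is a precondition for both
P_RATE = {True: 0.8, False: 0.2}

def maybe_rate(seen, liked):
    # Step 1: no discovery, no rating; step 2: decide whether to rate
    return seen and random.random() < P_RATE[liked]

trials = 10_000
liked_rated = sum(maybe_rate(True, True) for _ in range(trials)) / trials
disliked_rated = sum(maybe_rate(True, False) for _ in range(trials)) / trials
print(liked_rated, disliked_rated)
```

With any asymmetry between the two decision probabilities, liked-and-seen items end up rated far more often, which is how the decision biases surface as biases in the rating distribution.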
12. Discovery sources
How do people find items?
We search/browse for them
We randomly run into them
They are advertised to us
They are brought to us by a recommender system
···
We find them through our friends
We define a stochastic model
– Social communication and rating
– User decisions dependent on item relevance
We analyze the effect on popularity precision
– Simulation
13. A model of social discovery and rating propagation
Rating decision: p(rate | seen, liked), p(rate | seen, ¬liked)
Communication decision: p(tell | seen, liked), p(tell | seen, ¬liked)
• Known item sampling
• Friend sampling
• Bootstrapping discovery from exogenous source
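A minimal sketch of such a propagation loop, with a hypothetical random friendship graph and made-up parameter values (none of this reproduces the talk's actual configuration): each cycle a user samples a known item and a friend, decides whether to tell, and the receiving friend, upon discovery, decides whether to rate.

```python
import random

random.seed(1)

n_users, n_items = 30, 20
# Hypothetical friendship graph: 4 random friends per user
friends = {u: random.sample([v for v in range(n_users) if v != u], 4)
           for u in range(n_users)}
likes = {(u, i): random.random() < 0.3
         for u in range(n_users) for i in range(n_items)}

P_TELL = {True: 0.9, False: 0.3}  # p(tell | seen, liked / ¬liked), made up
P_RATE = {True: 0.9, False: 0.4}  # p(rate | seen, liked / ¬liked), made up

seen = {u: set() for u in range(n_users)}
ratings = set()

def discover(u, i):
    # On first discovery, the user decides whether to rate the item
    if i not in seen[u]:
        seen[u].add(i)
        if random.random() < P_RATE[likes[u, i]]:
            ratings.add((u, i))

for cycle in range(2000):
    if cycle % 100 == 0:  # bootstrapping: exogenous random discovery
        discover(random.randrange(n_users), random.randrange(n_items))
    u = random.randrange(n_users)
    if seen[u]:
        i = random.choice(sorted(seen[u]))  # known-item sampling
        v = random.choice(friends[u])       # friend sampling
        if random.random() < P_TELL[likes[u, i]]:
            discover(v, i)                  # the friend now knows the item

print(len(ratings), "ratings produced")
```

Items that users like (and therefore tell about) spread faster through the network, so the rating distribution emerges from the interplay of the four decision probabilities and the graph topology.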
14. From user behavior model to macro social effect
User behavior model parameters:
– Communication-relevance bias: p(tell | seen, liked), p(tell | seen, ¬liked)
– Rating-relevance decision bias: p(rate | seen, liked), p(rate | seen, ¬liked)
These induce global biases:
– Global discovery-relevance bias: p(seen | liked), p(seen | ¬liked)
– Global rating-relevance bias: p(liked | rated), p(liked | ¬rated)
…which determine the expected precision of popularity-rank recommendation
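Under an independence assumption, the per-decision parameters compose into the global rating-relevance bias by simple probability algebra. The numbers below are hypothetical, purely illustrative:

```python
# Hypothetical illustration of how micro parameters aggregate into the
# global bias p(liked | rated); none of these values come from the talk
p_liked = 0.3                      # prior probability that an item is liked
p_seen = {True: 0.5, False: 0.2}   # global discovery bias p(seen | liked/¬liked)
p_rate = {True: 0.8, False: 0.3}   # rating decision p(rate | seen, liked/¬liked)

# p(rated, liked=L) = p(liked=L) * p(seen | L) * p(rate | seen, L)
joint = {L: (p_liked if L else 1 - p_liked) * p_seen[L] * p_rate[L]
         for L in (True, False)}
p_liked_given_rated = joint[True] / (joint[True] + joint[False])
print(round(p_liked_given_rated, 3))  # 0.741: rated items skew toward liked
```

With these particular values the rated items are much more likely to be liked than the prior suggests; flipping the biases toward disliked items would push p(liked | rated) below the prior instead.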
15. Two approaches to analyze the model effects
Theoretical analysis – challenging! Work in progress…
Simulate and see what happens…
16. Experiments
17. Experiments – Simulation setup
Social network: ~4,000 users, ~88,000 arcs
– Facebook network data from Jure Leskovec
– Random graphs: Barabási–Albert, Erdős–Rényi
3,700 items
We simulate a relevance distribution with a long-tail shape,
randomly assigned to user-item pairs
Bootstrapping: exogenous random discovery every ~1,000 time cycles
Stop simulation when 500,000 ratings are produced (roughly MovieLens 1M scale)
(figure: the long-tail relevance probability assigned to the ~3,700 items)
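A long-tail relevance assignment of this kind can be sketched as follows; the decay shape and constants are hypothetical, chosen only to give a few highly relevant items and a long tail of barely relevant ones:

```python
import random

n_items, n_users = 3700, 4000  # roughly the scale used in the slides

# Hypothetical long-tail shape: relevance probability decays with item index
rel_prob = [1.0 / (1 + 0.01 * i) ** 2 for i in range(n_items)]

def relevant(u, i):
    # Relevance randomly assigned per user-item pair, deterministically seeded
    return random.Random(u * n_items + i).random() < rel_prob[i]

head = sum(p > 0.5 for p in rel_prob)
print(head, "items with relevance probability above 0.5")
```

Seeding per pair keeps the assignment reproducible across simulation runs without storing the full users × items relevance matrix.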
18. Experiments – Simulation setup
At any point in the simulation we are able to:
– Split the rating data and run a recommender system (e.g. popularity)
– Measure the precision of the recommendations – observed and true
By running different configurations we can observe the
results in different scenarios
– We generally test one bias at a time: discovery or rating
– We show single-shot runs, not averages
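Because the simulation knows the ground-truth preferences, observed and true precision can be computed side by side at any point. A self-contained sketch on synthetic data (all distributions are hypothetical): observed precision counts test hits, true precision checks the underlying likes.

```python
import random

random.seed(5)

n_users, n_items, k = 100, 40, 5

# Hypothetical ground truth, and ratings whose probability depends on liking
likes = {(u, i): random.random() < 0.4
         for u in range(n_users) for i in range(n_items)}
ratings = {(u, i) for (u, i), l in likes.items()
           if random.random() < (0.3 if l else 0.1)}

test = {r for r in ratings if random.random() < 0.2}
train = ratings - test

def precision_at_k(ranking, hit):
    # Average over users of hits among the top-k non-training recommendations
    total = 0.0
    for u in range(n_users):
        recs = [i for i in ranking if (u, i) not in train][:k]
        total += sum(hit(u, i) for i in recs) / k
    return total / n_users

pop = sorted(range(n_items),
             key=lambda i: -sum((u, i) in train for u in range(n_users)))

observed_p = precision_at_k(pop, lambda u, i: (u, i) in test)  # test hits
true_p = precision_at_k(pop, lambda u, i: likes[u, i])         # ground truth
print(observed_p, true_p)
```

The two metrics use the same recommendation lists and differ only in the hit criterion, which is what makes their disagreement in later slides meaningful.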
19. Research questions for experiments
How does popularity compare with random recommendation precision, depending on the four user behavior parameters?
Does it make a difference to consider all ratings or only positive
ratings in popularity rank?
Do the social network topology and network phenomena make a difference?
Can observed and true precision disagree?
23. Social network topology effect
Setting: p(tell | seen, liked) = 1, p(tell | seen, ¬liked) = 1, p(rate | seen, liked) = 1, p(rate | seen, ¬liked) = 0
(figures: observed and true P@10 on the Facebook and Barabási–Albert networks, comparing relevant popularity and plain popularity against random recommendation)
24. Contradicting observed and true precision
Setting: p(tell | seen, liked) = 0, p(tell | seen, ¬liked) = 1, p(rate | seen, liked) = 1, p(rate | seen, ¬liked) = 1
(figure: observed vs. true P@10 of simple popularity, positive popularity and random recommendation; observed and true precision point in opposite directions)
25. Conclusions
Observed precision of popularity is always better than random
True precision of popularity is worse than random when:
– Users talk about items they dislike more often than ones they like
– Users rate items they dislike more often than ones they like
Positive popularity is considerably more robust than simple popularity
– Fairly immune to user rating behavior on disliked items
Viral effects in temporal split
– Determined by a) user communication frequency, and b) social network topology
– Early popular items are recommendable to fewer users than in a random split
– Popularity may then become less useful for recommendation
True and observed precision can thus be mutually inconsistent
26. Future work
Analytic work (in progress)
The model is easy to generalize; to mention just a few possibilities…
– Arbitrarily biased exogenous sources, including recommender systems
– Dynamic social network, dynamic item lifecycles
– User behavior dependence on discovery source
– Social influence propagation, dynamic user preferences
So far a first step
– Understanding how social behavior patterns impact true popularity effectiveness
Next questions
– User studies
– Tracking and detecting the collective behavior patterns in real settings
– What to do about it
a) In the evaluation procedure & metrics and/or interpretation of results
b) In the algorithms which may potentially take popularity as a signal