IR Group @ UAM
A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases
40th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017)
Tokyo, Japan, 8 August 2017
Rocío Cañamares and Pablo Castells
Autónoma University of Madrid
http://ir.ii.uam.es
The recommender systems task
[Figure: a target user u, another user v and an item i; the items are illustrated with the artists Clara Sanabras, The Beatles and Vanessa Da Mata]
A recommender system
1. Observes users as they carry out activities in the system
2. Detects behavior patterns, identifies evidence of interests
3. Predicts and suggests choices of potential interest
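The k nearest neighbors approach (user-based)
$$\hat{r}(u,i) = \sum_{v} w_v \, r(v,i)$$
where u is the target user, i is the target item, and the sum runs over the neighbor users v of u, each with weight w_v.
[Figure: target user u, neighbor users v, target item i]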
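A minimal Python sketch of the weighted sum above, assuming ratings stored as nested dicts (user → {item: r(user, item)}), missing ratings treated as 0, and neighbor weights w_v supplied by the caller. The names `knn_user_based` and `Ratings` are illustrative, not taken from the slides.

```python
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: r(user, item)}


def knn_user_based(weights: Dict[str, float], ratings: Ratings,
                   item: str, k: int) -> float:
    """r_hat(u, i) = sum of w_v * r(v, i) over the k highest-weighted neighbors v.

    `weights` maps candidate neighbors v of the target user to w_v;
    how w_v is computed is left to the caller. Missing ratings count as 0.
    """
    neighbors = sorted(weights, key=weights.get, reverse=True)[:k]
    return sum(weights[v] * ratings.get(v, {}).get(item, 0.0) for v in neighbors)


weights = {"v1": 0.9, "v2": 0.5, "v3": 0.1}
ratings = {"v1": {"i": 4}, "v2": {"i": 2}, "v3": {"i": 5}}
print(knn_user_based(weights, ratings, "i", k=2))  # 0.9*4 + 0.5*2
```

Only the choice of w_v distinguishes the kNN variants discussed next; the aggregation step stays the same.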
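kNN user-based with cosine similarity
$$\hat{r}(u,i) = C \sum_{v} \mathrm{sim}(u,v)\, r(v,i)$$
$$\mathrm{sim}(u,v) = \cos(\vec{u},\vec{v}) = \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|\,\|\vec{v}\|} = \frac{\sum_{j\in\mathcal{I}} r(u,j)\, r(v,j)}{\sqrt{\sum_{j\in\mathcal{I}} r(u,j)^2}\,\sqrt{\sum_{j\in\mathcal{I}} r(v,j)^2}}$$
$$C = 1 \quad\text{or}\quad C = 1 \Big/ \sum_{v} \mathrm{sim}(u,v)$$
[Figure: target user u, neighbor users v, target item i]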
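The cosine-weighted variant can be sketched as follows. This assumes the sum runs over the k most similar users other than u, and that the normalized form divides by the sum of their similarities (C = 1/Σ_v sim(u,v)); `heuristic_user_knn` and `cosine` are hypothetical names.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: r(user, item)}


def cosine(ru: Dict[str, float], rv: Dict[str, float]) -> float:
    """cos(u, v) = (u . v) / (||u||_2 ||v||_2) over the item dimension."""
    dot = sum(x * rv.get(j, 0.0) for j, x in ru.items())
    nu = math.sqrt(sum(x * x for x in ru.values()))
    nv = math.sqrt(sum(x * x for x in rv.values()))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def heuristic_user_knn(ratings: Ratings, u: str, i: str, k: int,
                       normalized: bool = False) -> float:
    """r_hat(u, i) = C * sum_v sim(u, v) r(v, i) over the k most similar users v != u."""
    sims = {v: cosine(ratings[u], rv) for v, rv in ratings.items() if v != u}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    score = sum(sims[v] * ratings[v].get(i, 0.0) for v in neighbors)
    if not normalized:
        return score  # C = 1
    total = sum(sims[v] for v in neighbors)  # C = 1 / sum_v sim(u, v)
    return score / total if total > 0 else 0.0
```

With C = 1 the score grows with the number and similarity of neighbors who rated i; with the normalized C it becomes a similarity-weighted average of their ratings.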
The kNN scheme
• Has been around since the early 90s
• Is easy to understand, implement and explain
• Is competitive and broadly used in industry today
• Is heuristic
– Many variants; it is not clear which one is better
• Why a probabilistic reformulation?
– For the sake of it ☺
– It may help better understand, explain and configure kNN
[Figure: from heuristic to probabilistic]
Probability space: item choice
[Figure: a target user choosing among items]
What item would the user choose?
Probability space: item choice
Given a target user u, rank items by decreasing value of
$$p(I = i \mid U = u)$$
[Figure: the future user choices “urn”, from which user U draws item I]
What item would the user choose?
The main idea: marginalization
Given a target user u, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; p(I = i \mid U = u, V = v, J = I)$$
[Figure: the past user choices “urn” (user V, item J) and the future user choices “urn” (user U, item I), linked by the event J = I]
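The main idea: marginalization
Given a target user u, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; p(J = i \mid V = v)$$
Since J = I, the second factor becomes the probability that neighbor v chose i among their past choices.
[Figure: the past user choices “urn” (user V, item J) and the future user choices “urn” (user U, item I), linked by J = I]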
Probability estimation
Given a target user u, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; p(J = i \mid V = v)$$
Use past choices as a sample of the distribution of future choices.
[Figure: past user choices “urn” (V, J) and future user choices “urn” (U, I), linked by J = I]
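Probability estimation
Given a target user u, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; p(J = i \mid V = v)$$
Let r(v, i) ≡ number of times v has interacted with i. Then
$$p(J = i \mid V = v) = \frac{r(v,i)}{\sum_{j\in\mathcal{I}} r(v,j)}$$
[Figure: past user choices “urn” (V, J) and future user choices “urn” (U, I), linked by J = I]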
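The relative-frequency estimate of p(J = i | V = v) is a one-liner. This sketch assumes the same nested-dict layout as the kNN sketches (user → {item: r(user, item)}) and, as an assumption not covered on the slide, returns 0 for a user with no interactions; `p_item_given_user` is an illustrative name.

```python
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: r(user, item)}


def p_item_given_user(ratings: Ratings, v: str, i: str) -> float:
    """Relative-frequency estimate p(J = i | V = v) = r(v, i) / sum_j r(v, j)."""
    total = sum(ratings[v].values())
    return ratings[v].get(i, 0.0) / total if total > 0 else 0.0
```

Items v never interacted with get probability 0, which is why smoothing becomes relevant in the experiments.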
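Probability estimation
Given a target user u, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; \frac{r(v,i)}{\sum_{j\in\mathcal{I}} r(v,j)}$$
where r(v, i) ≡ number of times v has interacted with i, so that
$$p(J = i \mid V = v) = \frac{r(v,i)}{\sum_{j\in\mathcal{I}} r(v,j)}$$
[Figure: past user choices “urn” (V, J) and future user choices “urn” (U, I), linked by J = I]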
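Probability estimation
Given a target user u, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; \frac{r(v,i)}{\sum_{j\in\mathcal{I}} r(v,j)}$$
where r(v, i) ≡ number of times v has interacted with i, and
$$p(V = v \mid U = u, J = I) = \frac{\sum_{j\in\mathcal{I}} r(u,j)\, r(v,j)}{\sum_{w\in\mathcal{U}} \sum_{j\in\mathcal{I}} r(u,j)\, r(w,j)}$$
[Figure: past user choices “urn” (V, J) and future user choices “urn” (U, I), linked by J = I]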
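A sketch of the neighbor probability p(V = v | U = u, J = I), with w ranging over all users in 𝒰 (including u itself, as the formula is written). The nested-dict layout and the names `dot` and `p_neighbor` are assumptions for illustration.

```python
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: r(user, item)}


def dot(ru: Dict[str, float], rv: Dict[str, float]) -> float:
    """u . v = sum_j r(u, j) r(v, j)."""
    return sum(x * rv.get(j, 0.0) for j, x in ru.items())


def p_neighbor(ratings: Ratings, u: str, v: str) -> float:
    """p(V = v | U = u, J = I) = (u . v) / sum_{w in U} (u . w)."""
    z = sum(dot(ratings[u], rw) for rw in ratings.values())
    return dot(ratings[u], ratings[v]) / z if z > 0 else 0.0
```

The result is a proper distribution over users: it sums to 1 over 𝒰, and users who share no items with u get probability 0.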
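Putting it all together…
Given a target user u, rank items by decreasing value of
$$p(I = i \mid U = u) \propto \sum_{v\in\mathcal{U}} \frac{\sum_{j\in\mathcal{I}} r(u,j)\, r(v,j)}{\sum_{j\in\mathcal{I}} r(u,j)\, \sum_{j\in\mathcal{I}} r(v,j)}\, r(v,i) = \sum_{v\in\mathcal{U}} \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|_1\, \|\vec{v}\|_1}\, r(v,i)$$
The proportionality only drops or adds factors that depend on u alone (the normalizer Σ_w Σ_j r(u,j) r(w,j) and ‖u‖_1), so the item ranking is unaffected.
Nearly the same as the heuristic user-based kNN scheme, with L1 instead of L2 norms in the similarity!
[Figure: past user choices “urn” (V, J) and future user choices “urn” (U, I), linked by J = I]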
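Combining the two estimates gives the ranking function below, sketched under the same nested-dict assumption. It returns a score proportional to p(I = i | U = u), not a normalized probability, and recommending only items u has not yet interacted with is an assumption of this sketch; `prob_user_knn_score` and `recommend` are hypothetical names.

```python
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: r(user, item)}


def dot(ru: Dict[str, float], rv: Dict[str, float]) -> float:
    return sum(x * rv.get(j, 0.0) for j, x in ru.items())


def l1(r: Dict[str, float]) -> float:
    return sum(r.values())


def prob_user_knn_score(ratings: Ratings, u: str, i: str) -> float:
    """Score proportional to p(I = i | U = u):
    sum over v in U of (u . v) / (||u||_1 ||v||_1) * r(v, i)."""
    ru = ratings[u]
    if l1(ru) == 0:
        return 0.0
    return sum(dot(ru, rv) / (l1(ru) * l1(rv)) * rv.get(i, 0.0)
               for rv in ratings.values() if l1(rv) > 0)


def recommend(ratings: Ratings, u: str, n: int) -> List[str]:
    """Top-n items u has not interacted with, by decreasing score."""
    candidates = {j for rv in ratings.values() for j in rv} - set(ratings[u])
    return sorted(candidates, key=lambda j: prob_user_knn_score(ratings, u, j),
                  reverse=True)[:n]
```

Swapping `l1` for the Euclidean norm would turn the weight into the cosine similarity of the heuristic scheme, which is the sense in which the two formulations are nearly the same.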
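Other variants
• Item-based
$$p(I = i \mid U = u) \propto C \sum_{j\in\mathcal{I}} \frac{\vec{i}\cdot\vec{j}}{\|\vec{j}\|_1}\, r(u,j), \qquad C = \frac{\sum_{v\in\mathcal{U}} r(v,i)}{\sum_{v\in\mathcal{U}} r(v,i)\, \|\vec{v}\|_1}$$
• Normalized variants
– User-based
$$p(I = i \mid U = u) \propto C \sum_{v\in\mathcal{U}} \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|_1\, \|\vec{v}\|_1}\, r(v,i), \qquad C = 1 \Big/ \sum_{v\in\mathcal{U},\; r(v,i)>0} \vec{u}\cdot\vec{v}$$
– Item-based
$$p(I = i \mid U = u) \propto C \sum_{j\in\mathcal{I}} \frac{\vec{i}\cdot\vec{j}}{\|\vec{j}\|_1}\, r(u,j), \qquad C = \|\vec{i}\| \Big/ \sum_{j\in\mathcal{I},\; r(u,j)>0} \vec{i}\cdot\vec{j}$$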
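A sketch of the item-based (not normalized) formula, reading i⃗ as the vector of r(v, i) over users and using the factor C as written on the slide. Item vectors are rebuilt on every call for brevity, which a real implementation would cache; `item_vectors` and `prob_item_knn_score` are illustrative names.

```python
from collections import defaultdict
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: r(user, item)}


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """Transpose ratings: item -> {user: r(user, item)}."""
    cols: Dict[str, Dict[str, float]] = defaultdict(dict)
    for v, rv in ratings.items():
        for j, x in rv.items():
            cols[j][v] = x
    return cols


def prob_item_knn_score(ratings: Ratings, u: str, i: str) -> float:
    """C * sum over j of (i . j) / ||j||_1 * r(u, j), with
    C = sum_v r(v, i) / sum_v r(v, i) ||v||_1 (as written on the slide)."""
    cols = item_vectors(ratings)
    vi = cols.get(i, {})
    user_l1 = {v: sum(rv.values()) for v, rv in ratings.items()}
    den = sum(x * user_l1[v] for v, x in vi.items())
    c = sum(vi.values()) / den if den > 0 else 0.0
    total = 0.0
    for j, r_uj in ratings[u].items():
        vj = cols[j]
        j_l1 = sum(vj.values())
        if j_l1 > 0:
            dot_ij = sum(x * vj.get(v, 0.0) for v, x in vi.items())
            total += dot_ij / j_l1 * r_uj
    return c * total
```

Here the neighborhood is over the items j that u has interacted with, weighted by how much their user vectors overlap with that of the candidate item i.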
Popularity bias
• If users are pairwise independent, i.e. p(V = v | U = u, J = I) = p(V = v), then
$$p(I = i \mid U = u) = \sum_{v\in\mathcal{U}} p(J = i \mid V = v)\, p(V = v) = p(J = i), \qquad p(J = i) \propto \sum_{v\in\mathcal{U}} r(v,i)$$
where p(J = i) is the popularity of item i.
• Therefore kNN:
– Is biased towards popular items
– Needs pairwise user-user dependence to work properly
• Other kNN variants
– Normalized user-based kNN is biased towards the average rating
– Item-based kNN (normalized or not) is also biased towards popularity
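The independence argument can be checked numerically. This sketch assumes p(V = v) ∝ Σ_j r(v, j), i.e. users are drawn in proportion to their number of past choices (an assumption, not stated on the slide), and compares the resulting score with item popularity p(J = i). The names `popularity` and `independent_users_score` are hypothetical.

```python
from collections import defaultdict
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: r(user, item)}


def popularity(ratings: Ratings) -> Dict[str, float]:
    """p(J = i) = sum_v r(v, i) / sum_{v, j} r(v, j)."""
    total = sum(sum(rv.values()) for rv in ratings.values())
    pop: Dict[str, float] = defaultdict(float)
    for rv in ratings.values():
        for j, x in rv.items():
            pop[j] += x / total
    return dict(pop)


def independent_users_score(ratings: Ratings, i: str) -> float:
    """sum_v p(J = i | V = v) p(V = v), assuming p(V = v) is proportional to ||v||_1.

    The target user u does not appear at all: every user gets the same ranking.
    """
    total = sum(sum(rv.values()) for rv in ratings.values())
    score = 0.0
    for rv in ratings.values():
        v_l1 = sum(rv.values())
        if v_l1 > 0:
            score += (rv.get(i, 0.0) / v_l1) * (v_l1 / total)
    return score
```

Because the target user drops out, the only signal left is how often each item was chosen, which is the popularity bias described above.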
Experiments
• Random rating split: 80% training, 20% test
• Parameter tuning by grid search
• Dirichlet smoothing for probabilistic kNN
• Test probabilistic against heuristic variants
• Check popularity biases
• Datasets

Dataset        Domain   # users   # items     # ratings   Notes
MovieLens 1M   Movies     6,040     3,706     1,000,209   Public
Netflix        Movies   480,189    17,770   100,480,507   Public
Last.fm        Music        992   174,091       898,073   Public
Crowd random   Music      1,054     1,084       103,584   Flat rating distribution over items
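The slides mention Dirichlet smoothing without giving its form. A common choice, assumed here, smooths p(J = i | V = v) towards the item popularity p(J = i): p_μ(J = i | V = v) = (r(v, i) + μ·p(J = i)) / (Σ_j r(v, j) + μ), where μ is a hypothetical smoothing parameter that would be tuned, for example, by the grid search mentioned above.

```python
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: r(user, item)}


def dirichlet_p_item_given_user(ratings: Ratings, v: str, i: str, mu: float) -> float:
    """p_mu(J = i | V = v) = (r(v, i) + mu * p(J = i)) / (sum_j r(v, j) + mu).

    Background model: item popularity p(J = i). `mu` >= 0 is a hypothetical
    smoothing parameter; mu = 0 gives back the unsmoothed relative frequency.
    """
    total = sum(sum(rw.values()) for rw in ratings.values())
    background = sum(rw.get(i, 0.0) for rw in ratings.values()) / total
    rv = ratings[v]
    return (rv.get(i, 0.0) + mu * background) / (sum(rv.values()) + mu)
```

Larger μ pulls every user's distribution towards popularity, so in this sketch the smoothing parameter also interacts with the popularity bias discussed earlier.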
Public datasets – Results
[Figure: nDCG@10 of heuristic vs. probabilistic kNN (user-based and item-based, not normalized and normalized) on MovieLens 1M, Netflix and Last.fm]
Similar accuracy overall
Some improvements on item-based
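nDCG@10 is the reported metric. The sketch below uses one common definition (linear gains, a log2(position + 1) discount, and an ideal ordering built from the test relevance values); the exact gain function and relevance threshold used in the experiments are not given on the slides, and `ndcg_at_k` is an illustrative name.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranking: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """nDCG@k with linear gains and a log2(position + 1) discount (positions from 1).

    `relevance` maps test items to graded (or binary) relevance; unlisted items get 0.
    """
    dcg = sum(relevance.get(item, 0.0) / math.log2(pos + 2)
              for pos, item in enumerate(ranking[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, a ranking that places the only relevant item second scores 1/log2(3) ≈ 0.63, while placing it first scores 1.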
Crowdsourced dataset – Results
[Figure: nDCG@10 of heuristic vs. probabilistic kNN (user-based and item-based, not normalized and normalized) on the crowdsourced dataset; the heuristic item-based result is highlighted]
With a flat rating distribution… normalized kNN is as good as not normalized!
Public datasets – Popularity biases
User-based kNN (MovieLens 1M)
[Figure: per-item scatter plots against item popularity and average rating, for probabilistic and heuristic user-based kNN, not normalized and normalized]
Much the same trends
Public datasets – Popularity biases
Item-based kNN (MovieLens 1M)
[Figure: per-item scatter plots against item popularity and average rating, for probabilistic and heuristic item-based kNN, not normalized and normalized]
Not quite the same trends
Conclusion
• Full probabilistic reformulation of the kNN scheme
– Including its classic variants
• The probabilistic formulation…
– Provides a precise explanation of why kNN works, and under what condition
– Explains why kNN tends to recommend popular items
– Has the general advantages of a probabilistic formulation
• Accuracy and behavior equivalent to the heuristic formulations
– More so for user-based variants
– Probabilistic item-based kNN is more consistent than its heuristic counterpart
– The accuracy of normalized kNN may be misrepresented on common datasets
• Future work: further empirical optimization, analysis of inter-user dependencies, other collaborative filtering methods…
More Related Content

Similar to SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases

Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Lippo Group Digital
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3
Dave King
 

Similar to SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases (20)

MOVIE RECOMMENDATION SYSTEM USING COLLABORATIVE FILTERING
MOVIE RECOMMENDATION SYSTEM USING COLLABORATIVE FILTERINGMOVIE RECOMMENDATION SYSTEM USING COLLABORATIVE FILTERING
MOVIE RECOMMENDATION SYSTEM USING COLLABORATIVE FILTERING
 
Mining User Interests from Social Media
Mining User Interests from Social MediaMining User Interests from Social Media
Mining User Interests from Social Media
 
Shaping our AI (Strategy)?
Shaping our AI (Strategy)?Shaping our AI (Strategy)?
Shaping our AI (Strategy)?
 
Data Mining based on Hashing Technique
Data Mining based on Hashing TechniqueData Mining based on Hashing Technique
Data Mining based on Hashing Technique
 
Social Aspects of Interactive Recommender Systems
Social Aspects of Interactive Recommender SystemsSocial Aspects of Interactive Recommender Systems
Social Aspects of Interactive Recommender Systems
 
A Content Boosted Hybrid Recommendation System
A Content Boosted Hybrid Recommendation SystemA Content Boosted Hybrid Recommendation System
A Content Boosted Hybrid Recommendation System
 
Gaber.pdf
Gaber.pdfGaber.pdf
Gaber.pdf
 
Homepage Personalization at Spotify
Homepage Personalization at SpotifyHomepage Personalization at Spotify
Homepage Personalization at Spotify
 
Resume
ResumeResume
Resume
 
IRJET- Searching an Optimal Algorithm for Movie Recommendation System
IRJET- Searching an Optimal Algorithm for Movie Recommendation SystemIRJET- Searching an Optimal Algorithm for Movie Recommendation System
IRJET- Searching an Optimal Algorithm for Movie Recommendation System
 
Data Mining For Supermarket Sale Analysis Using Association Rule
Data Mining For Supermarket Sale Analysis Using Association RuleData Mining For Supermarket Sale Analysis Using Association Rule
Data Mining For Supermarket Sale Analysis Using Association Rule
 
Analysis on Recommended System for Web Information Retrieval Using HMM
Analysis on Recommended System for Web Information Retrieval Using HMMAnalysis on Recommended System for Web Information Retrieval Using HMM
Analysis on Recommended System for Web Information Retrieval Using HMM
 
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomera...
 
Rules Reduction using Evolutionary Meta-Heuristics
Rules Reduction using  Evolutionary Meta-HeuristicsRules Reduction using  Evolutionary Meta-Heuristics
Rules Reduction using Evolutionary Meta-Heuristics
 
powerpoint presentation on movie recommender system.
powerpoint presentation on movie recommender system.powerpoint presentation on movie recommender system.
powerpoint presentation on movie recommender system.
 
Digital Trails Dave King 1 5 10 Part 2 D3
Digital Trails   Dave King   1 5 10   Part 2   D3Digital Trails   Dave King   1 5 10   Part 2   D3
Digital Trails Dave King 1 5 10 Part 2 D3
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
A recommender system-using novel deep network collaborative filtering
A recommender system-using novel deep network collaborative filteringA recommender system-using novel deep network collaborative filtering
A recommender system-using novel deep network collaborative filtering
 
Bulldozer price prediction using regression model (Research Ethics).pptx
Bulldozer price prediction using regression model (Research Ethics).pptxBulldozer price prediction using regression model (Research Ethics).pptx
Bulldozer price prediction using regression model (Research Ethics).pptx
 
Gunjan insight student conference v2
Gunjan insight student conference v2Gunjan insight student conference v2
Gunjan insight student conference v2
 

More from Pablo Castells

SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...
Pablo Castells
 
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
Pablo Castells
 

More from Pablo Castells (8)

Rational and irrational bias in recommendation
Rational and irrational bias in recommendationRational and irrational bias in recommendation
Rational and irrational bias in recommendation
 
Bias in recommendation: avoid it or embrace it?
Bias in recommendation: avoid it or embrace it?Bias in recommendation: avoid it or embrace it?
Bias in recommendation: avoid it or embrace it?
 
RecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
RecSys 2020 - On Target Item Sampling in Offline Recommender System EvaluationRecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
RecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
 
REVEAL @ RecSys 2018 - Characterization of Fair Experiments for Recommender S...
REVEAL @ RecSys 2018 - Characterization of Fair Experiments for Recommender S...REVEAL @ RecSys 2018 - Characterization of Fair Experiments for Recommender S...
REVEAL @ RecSys 2018 - Characterization of Fair Experiments for Recommender S...
 
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
 
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender SystemsSIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
 
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...
 
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
 

Recently uploaded

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Recently uploaded (20)

Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 

SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases

  • 1. IRGIRGroup @UAM A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Tokyo, Japan, 8 August 2017 A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Rocío Cañamares and Pablo Castells Autónoma University of Madrid http://ir.ii.uam.es Tokyo, Japan, 8 August 2017
  • 2. IRGIRGroup @UAM A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Tokyo, Japan, 8 August 2017 The recommender systems task 𝑢 𝑖 𝑣 Clara Sanabras The Beatles Vanessa Da Mata A recommender system 1. Observes users as they carry out activities in the system 2. Detects behavior patterns, identifies evidence of interests 3. Predicts and suggests choices of potential interest
  • 3. IRGIRGroup @UAM A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Tokyo, Japan, 8 August 2017 kNN user-based Ƹ𝑟 𝑢, 𝑖 = ෍ 𝑣 𝑤𝑣 𝑟 𝑣, 𝑖 𝑣 𝑖 Target item 𝑢 Target user Neighbor users The 𝒌 nearest neighbors approach (user-based)
  • 4. IRGIRGroup @UAM A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Tokyo, Japan, 8 August 2017 𝑣 kNN user-based with cosine similarity Ƹ𝑟 𝑢, 𝑖 = 𝐶 ෍ 𝑣 𝑠𝑖𝑚 𝑢, 𝑣 𝑟 𝑣, 𝑖 𝑠𝑖𝑚 𝑢, 𝑣 = cos 𝑢 · Ԧ𝑣 = 𝑢 · Ԧ𝑣 𝑢 Ԧ𝑣 ൘1 ෍ 𝑣 𝑠𝑖𝑚 𝑢, 𝑣 1 𝐶 = = σ 𝑗∈ℐ 𝑟 𝑢, 𝑗 𝑟 𝑣, 𝑗 σ 𝑗∈ℐ 𝑟 𝑢, 𝑗 2 σ 𝑗∈ℐ 𝑟 𝑣, 𝑗 2 𝑖 Target item 𝑢 Target user
  • 5. IRGIRGroup @UAM A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Tokyo, Japan, 8 August 2017 The kNN scheme  Has been around since the early 90’s  Is easy to understand, implement, explain  Is competitive and broadly used in industry today  Is heuristic – Many variants, not clear which one is better  Why a probabilistic reformulation? – For the sake of it  – May help better understand, explain and configure kNN HeuristicProbabilistic
  • 6. IRGIRGroup @UAM A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Tokyo, Japan, 8 August 2017 Probability space: item choice Target user Items User choice What item would the user choose?
  • 7. IRGIRGroup @UAM A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Tokyo, Japan, 8 August 2017 Probability space: item choice Given a target user 𝑢, rank items by decreasing value of 𝑝 𝐼 = 𝑖 𝑈 = 𝑢 𝐼𝑈 Future user choices “urn” What item would the user choose?
  • 8. IRGIRGroup @UAM A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Tokyo, Japan, 8 August 2017 The main idea: marginalization Past user choices “urn” 𝐽 = 𝐼𝑉 Given a target user 𝑢, rank items by decreasing value of 𝑝 𝐼 = 𝑖 𝑈 = 𝑢 𝐼𝑈 Future user choices “urn” = ෍ 𝑣 𝑝 𝑉 = 𝑣 𝑈 = 𝑢, 𝐽 = 𝐼 𝑝 𝐼 = 𝑖 𝑈 = 𝑢, 𝑉 = 𝑣, 𝐽 = 𝐼
  • 9. IRGIRGroup @UAM A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Tokyo, Japan, 8 August 2017 Given a target user 𝑢, rank items by decreasing value of 𝑝 𝐼 = 𝑖 𝑈 = 𝑢 = ෍ 𝑣 𝑝 𝑉 = 𝑣 𝑈 = 𝑢, 𝐽 = 𝐼 𝑝 𝐽 = 𝑖 𝑉 = 𝑣 The main idea: marginalization 𝐽 = 𝐼𝑉 𝐼𝑈 Future user choices “urn”Past user choices “urn”
  • 10. IRGIRGroup @UAM A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases 40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017) Tokyo, Japan, 8 August 2017 Given a target user 𝑢, rank items by decreasing value of 𝑝 𝐼 = 𝑖 𝑈 = 𝑢 = ෍ 𝑣 𝑝 𝑉 = 𝑣 𝑈 = 𝑢, 𝐽 = 𝐼 𝑝 𝐽 = 𝑖 𝑉 = 𝑣 Probability estimation Use past choices as a sample of future choices distribution 𝐽 = 𝐼𝑉 𝐼𝑈 Past user choices “urn”
  • 11. Probability estimation
    Given a target user $u$, rank items by decreasing value of
    $$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\; p(J = i \mid V = v)$$
    With $r(v, i) \equiv$ number of times $v$ has interacted with $i$:
    $$p(J = i \mid V = v) = \frac{r(v, i)}{\sum_{j \in \mathcal{I}} r(v, j)}$$
    [Figure: "past user choices" urn (V, J) and "future user choices" urn (U, I)]
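The estimate of $p(J = i \mid V = v)$ on slide 11 is just each neighbor's share of interactions with item $i$. A minimal Python sketch, assuming interaction counts are stored as a dict mapping each user to a dict of item counts (this layout, the function name and the toy data are hypothetical, chosen for illustration):

```python
# Minimal sketch: maximum-likelihood estimate of p(J = i | V = v) from past
# interaction counts. The dict-of-dicts layout is an assumption.
def item_choice_prob(ratings, v, i):
    """Return r(v, i) / sum_j r(v, j), or 0.0 if v has no interactions."""
    user_counts = ratings.get(v, {})
    total = sum(user_counts.values())
    if total == 0:
        return 0.0  # no past choices, so no evidence for any item
    return user_counts.get(i, 0) / total


# Hypothetical toy data: user -> {item: number of interactions}
ratings = {
    "u": {"a": 1, "b": 1},
    "v1": {"a": 3, "b": 1},
    "v2": {"b": 2, "c": 2},
}
print(item_choice_prob(ratings, "v1", "a"))  # 0.75
```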
  • 12. Probability estimation
    Given a target user $u$, rank items by decreasing value of
    $$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\, \frac{r(v, i)}{\sum_{j \in \mathcal{I}} r(v, j)}$$
    with $r(v, i) \equiv$ number of times $v$ has interacted with $i$, since
    $$p(J = i \mid V = v) = \frac{r(v, i)}{\sum_{j \in \mathcal{I}} r(v, j)}$$
    [Figure: "past user choices" urn (V, J) and "future user choices" urn (U, I)]
  • 13. Probability estimation
    Given a target user $u$, rank items by decreasing value of
    $$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I)\, \frac{r(v, i)}{\sum_{j \in \mathcal{I}} r(v, j)}$$
    with $r(v, i) \equiv$ number of times $v$ has interacted with $i$, and
    $$p(V = v \mid U = u, J = I) = \frac{\sum_{j \in \mathcal{I}} r(u, j)\, r(v, j)}{\sum_{w \in \mathcal{U}} \sum_{j \in \mathcal{I}} r(u, j)\, r(w, j)}$$
    [Figure: "past user choices" urn (V, J) and "future user choices" urn (U, I)]
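The neighbor estimate $p(V = v \mid U = u, J = I)$ on slide 13 weights each user by how much their past choices overlap with the target user's. A minimal sketch under the same hypothetical dict-of-dicts layout as before; following the formula literally, the sum over $w$ runs over every user in $\mathcal{U}$, including $u$ itself:

```python
def neighbor_prob(ratings, u, v):
    """Estimate p(V = v | U = u, J = I) as on slide 13:
    sum_j r(u, j) r(v, j) / sum_{w in U} sum_j r(u, j) r(w, j).
    The sum over w covers every user in `ratings`, including u."""
    def overlap(a, b):
        # Dot product of the two users' interaction-count vectors
        ra, rb = ratings.get(a, {}), ratings.get(b, {})
        return sum(count * rb.get(j, 0) for j, count in ra.items())

    denominator = sum(overlap(u, w) for w in ratings)
    if denominator == 0:
        return 0.0  # u shares no items with anyone
    return overlap(u, v) / denominator


# Hypothetical toy data: user -> {item: number of interactions}
ratings = {
    "u": {"a": 1, "b": 1},
    "v1": {"a": 3, "b": 1},
    "v2": {"b": 2, "c": 2},
}
for v in ratings:
    print(v, neighbor_prob(ratings, "u", v))
# u 0.25
# v1 0.5
# v2 0.25
```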
  • 14. Putting it all together…
    Given a target user $u$, rank items by decreasing value of
    $$p(I = i \mid U = u) \propto \sum_{v \in \mathcal{U}} \frac{\sum_{j \in \mathcal{I}} r(u, j)\, r(v, j)}{\sum_{j \in \mathcal{I}} r(u, j) \sum_{j \in \mathcal{I}} r(v, j)}\, r(v, i) = \sum_{v \in \mathcal{U}} \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\|_1\, \|\vec{v}\|_1}\, r(v, i)$$
    Almost the same as the heuristic user-based kNN scheme, with L1 norms in place of the L2 norms of the cosine similarity!
    [Figure: "past user choices" urn (V, J) and "future user choices" urn (U, I)]
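The closed form on slide 14 can be computed directly, without the normalizing constants. A minimal sketch (hypothetical function name and toy data; the sum over $v$ includes $u$ because the slide sums over all of $\mathcal{U}$):

```python
def prob_user_knn_scores(ratings, u):
    """Score items by sum_v (u . v) / (|u|_1 |v|_1) * r(v, i), which is
    proportional to p(I = i | U = u) under the estimates of slides 11 and 13."""
    ru = ratings.get(u, {})
    norm_u = sum(ru.values())  # |u|_1 (counts are non-negative)
    scores = {}
    if norm_u == 0:
        return scores
    for v, rv in ratings.items():  # v ranges over all users, as in the sum over U
        norm_v = sum(rv.values())  # |v|_1
        if norm_v == 0:
            continue
        dot = sum(count * rv.get(j, 0) for j, count in ru.items())  # u . v
        if dot == 0:
            continue  # v shares no items with u and contributes nothing
        weight = dot / (norm_u * norm_v)
        for i, count in rv.items():
            scores[i] = scores.get(i, 0.0) + weight * count
    return scores


# Hypothetical toy data: user -> {item: number of interactions}
ratings = {
    "u": {"a": 1, "b": 1},
    "v1": {"a": 3, "b": 1},
    "v2": {"b": 2, "c": 2},
}
scores = prob_user_knn_scores(ratings, "u")
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b', 'c']
```

On this toy data the scores are a: 2.0, b: 1.5, c: 0.5. Multiplying the estimates of slides 11 and 13 and summing over $v$ gives the exact probabilities a: 0.5, b: 0.375, c: 0.125, which differ only by the factor $\sum_{w}\sum_{j} r(u,j)\,r(w,j) \,/\, \|\vec{u}\|_1 = 8/2 = 4$. That factor does not depend on $i$, so the ranking is the same.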
  • 15. Other variants
    Item-based:
    $$p(I = i \mid U = u) \propto C \sum_{j \in \mathcal{I}} \frac{\vec{i} \cdot \vec{j}}{\|\vec{j}\|_1}\, r(u, j), \qquad C = \frac{\sum_{v \in \mathcal{U}} r(v, i)}{\sum_{v \in \mathcal{U}} r(v, i)\, \|\vec{v}\|_1}$$
    Normalized variants:
      – User-based:
        $$p(I = i \mid U = u) \propto C \sum_{v \in \mathcal{U}} \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\|_1\, \|\vec{v}\|_1}\, r(v, i), \qquad C = 1 \Big/ \sum_{v \in \mathcal{U},\; r(v, i) > 0} \vec{u} \cdot \vec{v}$$
      – Item-based:
        $$p(I = i \mid U = u) \propto C \sum_{j \in \mathcal{I}} \frac{\vec{i} \cdot \vec{j}}{\|\vec{j}\|_1}\, r(u, j), \qquad C = \|\vec{i}\|_1 \Big/ \sum_{j \in \mathcal{I},\; r(u, j) > 0} \vec{i} \cdot \vec{j}$$
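The non-normalized item-based variant on slide 15 can be sketched the same way, with item vectors $\vec{i} = (r(v, i))_{v \in \mathcal{U}}$. This is a literal transcription of the slide's formula under the hypothetical dict-of-dicts layout used above, not a tuned implementation; scores are only proportional to probabilities and items $u$ has already interacted with are not filtered out:

```python
def prob_item_knn_scores(ratings, u):
    """Non-normalized item-based variant (slide 15):
    C_i * sum_j (i . j) / |j|_1 * r(u, j),
    with C_i = sum_v r(v, i) / sum_v r(v, i) |v|_1."""
    # Item vectors: item -> {user: count}
    items = {}
    for v, rv in ratings.items():
        for i, count in rv.items():
            items.setdefault(i, {})[v] = count
    user_norm = {v: sum(rv.values()) for v, rv in ratings.items()}  # |v|_1

    ru = ratings.get(u, {})
    scores = {}
    for i, ri in items.items():
        numerator = sum(ri.values())  # sum_v r(v, i), i.e. |i|_1
        denominator = sum(count * user_norm[v] for v, count in ri.items())
        if denominator == 0:
            continue  # item without positive interactions
        c_i = numerator / denominator
        total = 0.0
        for j, r_uj in ru.items():
            rj = items.get(j, {})
            norm_j = sum(rj.values())  # |j|_1
            if norm_j == 0:
                continue
            dot = sum(count * rj.get(v, 0) for v, count in ri.items())  # i . j
            total += dot / norm_j * r_uj
        scores[i] = c_i * total
    return scores


# Hypothetical toy data: user -> {item: number of interactions}
ratings = {
    "u": {"a": 1, "b": 1},
    "v1": {"a": 3, "b": 1},
    "v2": {"b": 2, "c": 2},
}
scores = prob_item_knn_scores(ratings, "u")
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b', 'c']
```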
  • 16. Popularity bias
    If users are pairwise independent, $p(V = v \mid U = u, J = I) = p(V = v)$, and then
    $$p(I = i \mid U = u) \sim \sum_{v \in \mathcal{U}} p(J = i \mid V = v)\, p(V = v) = p(J = i)$$
    which is the popularity of item $i$:
    $$p(J = i) \propto \sum_{v \in \mathcal{U}} r(v, i)$$
    Therefore kNN:
      – is biased towards popular items;
      – needs pairwise user-user dependence to work properly.
    Other kNN variants:
      – normalized user-based kNN is biased towards the average rating;
      – item-based kNN (normalized or not) is also biased towards popularity.
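The popularity argument on slide 16 can be checked numerically. The sketch below assumes $p(V = v)$ is each user's share of all interactions (an assumption, chosen so that $p(J = i) \propto \sum_v r(v, i)$ as on the slide); with the neighbor weight replaced by $p(V = v)$, the target user no longer appears, and the score collapses to item popularity:

```python
def independent_user_scores(ratings):
    """sum_v p(J = i | V = v) p(V = v): the kNN score once
    p(V = v | U = u, J = I) is replaced by p(V = v). The target user
    does not appear, so every user gets the same ranking."""
    total = sum(sum(rv.values()) for rv in ratings.values())
    scores = {}
    for rv in ratings.values():
        norm_v = sum(rv.values())
        if norm_v == 0:
            continue
        p_v = norm_v / total  # assumed p(V = v): v's share of all interactions
        for i, count in rv.items():
            scores[i] = scores.get(i, 0.0) + (count / norm_v) * p_v
    return scores


def popularity(ratings):
    """p(J = i), proportional to sum_v r(v, i) and normalized to sum to 1."""
    total = sum(sum(rv.values()) for rv in ratings.values())
    if total == 0:
        return {}
    counts = {}
    for rv in ratings.values():
        for i, count in rv.items():
            counts[i] = counts.get(i, 0) + count
    return {i: c / total for i, c in counts.items()}


# Hypothetical toy data: user -> {item: number of interactions}
ratings = {
    "u": {"a": 1, "b": 1},
    "v1": {"a": 3, "b": 1},
    "v2": {"b": 2, "c": 2},
}
indep, pop = independent_user_scores(ratings), popularity(ratings)
print(all(abs(indep[i] - pop[i]) < 1e-12 for i in pop))  # True
```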
  • 17. Experiments
    Goals: test probabilistic against heuristic variants; check popularity biases.
    Setup: random rating split (80% training, 20% test); parameter tuning by grid search; Dirichlet smoothing for probabilistic kNN.
    Datasets:
      Dataset        Domain   # users    # items    # ratings     Notes
      MovieLens 1M   Movies     6,040      3,706    1,000,209     public
      Netflix        Movies   480,189     17,770  100,480,507     public
      Last.fm        Music        992    174,091      898,073     public
      Crowd random   Music      1,054      1,084      103,584     crowdsourced; flat rating distribution over items
  • 18. Public datasets – Results
    [Figure: nDCG@10 of heuristic vs. probabilistic kNN on MovieLens 1M, Netflix and Last.fm; user-based and item-based, not normalized and normalized]
    Similar accuracy overall; some improvements on item-based.
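The results on slides 18 and 19 are reported as nDCG@10. The slides do not define the relevance or gain used, so the sketch below assumes binary relevance (an item is relevant if it appears in the user's test set) and the common $1/\log_2(\text{rank} + 1)$ discount; the function name is hypothetical:

```python
import math


def ndcg_at_k(ranking, relevant, k=10):
    """nDCG@k with binary relevance: DCG of the top-k list divided by the
    DCG of an ideal list that puts all relevant items first."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank = pos + 1
        for pos, item in enumerate(ranking[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: one relevant item, ranked second
print(ndcg_at_k(["x", "a", "b"], {"a"}))
```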
  • 19. Crowdsourced dataset – Results
    [Figure: nDCG@10 of heuristic vs. probabilistic kNN; user-based and item-based, not normalized and normalized]
    With a flat rating distribution… normalized heuristic item-based kNN is as good as not normalized!
  • 20. Public datasets – Popularity biases
    [Figure: user-based kNN on MovieLens 1M; heuristic vs. probabilistic, not normalized vs. normalized; panels plotted against item popularity and average rating]
    Quite the same trends.
  • 21. Public datasets – Popularity biases
    [Figure: item-based kNN on MovieLens 1M; heuristic vs. probabilistic, not normalized vs. normalized; panels plotted against item popularity and average rating]
    Not quite the same trends.
  • 22. Conclusion
    Full probabilistic reformulation of the kNN scheme
      – including its classic variants
    The probabilistic formulation…
      – provides a precise explanation of why kNN works, and under what condition
      – explains why kNN tends to recommend popular items
      – has the advantages of a probabilistic formulation
    Accuracy and behavior equivalent to the heuristic formulations
      – more so for user-based variants
      – probabilistic item-based is more consistent than heuristic
      – accuracy of normalized kNN might be misrepresented on common datasets
    Future work: further empirical optimization, inter-user dependency analysis, other collaborative filtering methods…