SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases
1. IRGIRGroup @UAM
A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases
40th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017)
Tokyo, Japan, 8 August 2017
A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases
40th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2017)
Rocío Cañamares and Pablo Castells
Autónoma University of Madrid
http://ir.ii.uam.es
2. IRGIRGroup @UAM
The recommender systems task
𝑢
𝑖
𝑣
Clara
Sanabras
The Beatles
Vanessa
Da Mata
A recommender system
1. Observes users as they carry out activities in the system
2. Detects behavior patterns, identifies evidence of interests
3. Predicts and suggests choices of potential interest
3. IRGIRGroup @UAM
The 𝒌 nearest neighbors approach (user-based)
$$\hat{r}(u, i) = \sum_{v} w_v \, r(v, i)$$
[Diagram: target user $u$, target item $i$, neighbor users $v$]
4. IRGIRGroup @UAM
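kNN user-based with cosine similarity
$$\hat{r}(u, i) = C \sum_{v} sim(u, v) \, r(v, i)$$
$$sim(u, v) = \cos(\vec{u}, \vec{v}) = \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \, \|\vec{v}\|} = \frac{\sum_{j \in \mathcal{I}} r(u, j) \, r(v, j)}{\sqrt{\sum_{j \in \mathcal{I}} r(u, j)^2} \, \sqrt{\sum_{j \in \mathcal{I}} r(v, j)^2}}$$
$$C = \frac{1}{\sum_{v} sim(u, v)} \quad \text{or} \quad C = 1$$
[Diagram: target user $u$, target item $i$, neighbor users $v$]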
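The heuristic scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary layout, the function names `cosine` and `knn_score`, the neighborhood size `k`, and the toy ratings are all assumptions made for the example. Setting `normalize=True` applies $C = 1 / \sum_v sim(u, v)$ over the selected neighbors; `normalize=False` uses $C = 1$.

```python
import math

def cosine(ru, rv):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    dot = sum(r * rv[j] for j, r in ru.items() if j in rv)
    norm_u = math.sqrt(sum(r * r for r in ru.values()))
    norm_v = math.sqrt(sum(r * r for r in rv.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_score(ratings, u, i, k=2, normalize=False):
    """Score item i for user u from the k users most similar to u."""
    sims = [(cosine(ratings[u], ratings[v]), v) for v in ratings if v != u]
    neighbors = sorted(sims, reverse=True)[:k]
    # Missing ratings count as 0 (an assumption of this sketch).
    weighted = sum(s * ratings[v].get(i, 0.0) for s, v in neighbors)
    if not normalize:
        return weighted  # C = 1
    total = sum(s for s, _ in neighbors)
    return weighted / total if total > 0 else 0.0  # C = 1 / sum of sims

# Hypothetical toy data: {user: {item: rating}}
ratings = {
    "u": {"a": 5, "b": 3},
    "v1": {"a": 4, "b": 2, "c": 5},
    "v2": {"a": 1, "c": 2},
    "v3": {"b": 4, "d": 3},
}
print(knn_score(ratings, "u", "c", k=2))
print(knn_score(ratings, "u", "c", k=2, normalize=True))
```

With $C = 1$ the score grows with the number of neighbors who rated the item, whereas the normalized variant behaves like a similarity-weighted average of ratings.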
5. IRGIRGroup @UAM
The kNN scheme
Has been around since the early 90’s
Is easy to understand, implement, explain
Is competitive and broadly used in industry today
Is heuristic
– Many variants, not clear which one is better
Why a probabilistic reformulation?
– For the sake of it
– May help better understand,
explain and configure kNN
[Diagram: Heuristic, Probabilistic]
6. IRGIRGroup @UAM
Probability space: item choice
Target
user
Items
User
choice
What item would
the user choose?
7. IRGIRGroup @UAM
Probability space: item choice
Given a target user $u$, rank items by decreasing value of
$$p(I = i \mid U = u)$$
[Diagram: future user choices "urn", with variables $I$ and $U$]
What item would the user choose?
8. IRGIRGroup @UAM
The main idea: marginalization
Given a target user $u$, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I) \, p(I = i \mid U = u, V = v, J = I)$$
[Diagram: past user choices "urn" ($J$, $V$) and future user choices "urn" ($I$, $U$), with $J = I$]
9. IRGIRGroup @UAM
The main idea: marginalization
Given a target user $u$, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I) \, p(J = i \mid V = v)$$
[Diagram: past user choices "urn" ($J$, $V$) and future user choices "urn" ($I$, $U$), with $J = I$]
10. IRGIRGroup @UAM
Probability estimation
Given a target user $u$, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I) \, p(J = i \mid V = v)$$
Use past choices as a sample of the future choices distribution
[Diagram: past user choices "urn" ($J$, $V$) and future user choices ($I$, $U$), with $J = I$]
11. IRGIRGroup @UAM
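Probability estimation
Given a target user $u$, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I) \, p(J = i \mid V = v)$$
$r(v, i) \equiv$ number of times $v$ has interacted with $i$
$$p(J = i \mid V = v) = \frac{r(v, i)}{\sum_{j \in \mathcal{I}} r(v, j)}$$
[Diagram: past user choices "urn" ($J$, $V$) and future user choices ($I$, $U$), with $J = I$]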
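As a small illustration of this estimate, the sketch below computes $p(J = i \mid V = v)$ as the fraction of $v$'s past interactions that involve $i$. The function name `item_given_user`, the dictionary layout and the toy counts are assumptions made for the example.

```python
def item_given_user(r, v, i):
    """Estimate p(J = i | V = v) as r(v, i) divided by v's total interaction count."""
    total = sum(r[v].values())
    return r[v].get(i, 0) / total if total else 0.0

# Hypothetical interaction counts: {user: {item: count}}
r = {"v1": {"a": 3, "b": 1}, "v2": {"b": 2}}
print(item_given_user(r, "v1", "a"))
```

For a fixed user the estimates sum to one over all items, so each user's past choices define a proper distribution.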
12. IRGIRGroup @UAM
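Probability estimation
Given a target user $u$, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I) \, \frac{r(v, i)}{\sum_{j \in \mathcal{I}} r(v, j)}$$
$r(v, i) \equiv$ number of times $v$ has interacted with $i$
$$p(J = i \mid V = v) = \frac{r(v, i)}{\sum_{j \in \mathcal{I}} r(v, j)}$$
[Diagram: past user choices "urn" ($J$, $V$) and future user choices ($I$, $U$), with $J = I$]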
13. IRGIRGroup @UAM
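Probability estimation
Given a target user $u$, rank items by decreasing value of
$$p(I = i \mid U = u) = \sum_{v} p(V = v \mid U = u, J = I) \, \frac{r(v, i)}{\sum_{j \in \mathcal{I}} r(v, j)}$$
$r(v, i) \equiv$ number of times $v$ has interacted with $i$
$$p(V = v \mid U = u, J = I) = \frac{\sum_{j \in \mathcal{I}} r(u, j) \, r(v, j)}{\sum_{w \in \mathcal{U}} \sum_{j \in \mathcal{I}} r(u, j) \, r(w, j)}$$
[Diagram: past user choices "urn" ($J$, $V$) and future user choices ($I$, $U$), with $J = I$]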
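The neighbor probability on this slide can be computed directly from interaction counts. The sketch below follows the formula literally, so the sum over $w \in \mathcal{U}$ includes the target user; the function names `overlap` and `neighbor_given_user` and the toy counts are assumptions made for illustration.

```python
def overlap(r, u, v):
    """Sum over items j of r(u, j) * r(v, j)."""
    return sum(c * r[v].get(j, 0) for j, c in r[u].items())

def neighbor_given_user(r, u, v):
    """Estimate p(V = v | U = u, J = I) as u's overlap with v over u's overlap with all users."""
    denom = sum(overlap(r, u, w) for w in r)  # w ranges over every user, u included
    return overlap(r, u, v) / denom if denom else 0.0

# Hypothetical interaction counts: {user: {item: count}}
r = {
    "u": {"a": 2, "b": 1},
    "v1": {"a": 1, "c": 3},
    "v2": {"b": 1, "d": 1},
    "v3": {"e": 5},
}
print({v: neighbor_given_user(r, "u", v) for v in r})
```

Users with no item in common with $u$ get probability zero, which is what restricts the sum to an effective neighborhood.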
14. IRGIRGroup @UAM
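Putting all together…
Given a target user $u$, rank items by decreasing value of
$$p(I = i \mid U = u) \propto \sum_{v \in \mathcal{U}} \frac{\sum_{j \in \mathcal{I}} r(u, j) \, r(v, j)}{\sum_{j \in \mathcal{I}} r(u, j) \, \sum_{j \in \mathcal{I}} r(v, j)} \, r(v, i) = \sum_{v \in \mathcal{U}} \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\|_1 \, \|\vec{v}\|_1} \, r(v, i)$$
Quite the same as the heuristic user-based kNN scheme!
[Diagram: past user choices "urn" ($J$, $V$) and future user choices ($I$, $U$), with $J = I$]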
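The combined ranking function can be sketched as follows. This is an illustrative reading of the formula on this slide, without the Dirichlet smoothing or parameter tuning mentioned in the experiments; the names `dot`, `l1`, `prob_knn_score`, `recommend` and the toy counts are assumptions. Restricting candidates to items the target user has not interacted with is also a choice of this sketch.

```python
def dot(x, y):
    """Dot product of two sparse count dicts."""
    return sum(c * y.get(j, 0) for j, c in x.items())

def l1(x):
    """L1 norm of a sparse vector of non-negative counts."""
    return sum(x.values())

def prob_knn_score(r, u, i):
    """Sum over all users v of (u . v) / (|u|_1 |v|_1) * r(v, i)."""
    score = 0.0
    for rv in r.values():
        denom = l1(r[u]) * l1(rv)
        if denom > 0:
            score += dot(r[u], rv) / denom * rv.get(i, 0)
    return score

def recommend(r, u):
    """Rank items unseen by u by decreasing score (candidate filter is an assumption)."""
    seen = set(r[u])
    candidates = sorted({j for rv in r.values() for j in rv} - seen)
    return sorted(candidates, key=lambda i: prob_knn_score(r, u, i), reverse=True)

# Hypothetical interaction counts: {user: {item: count}}
r = {
    "u": {"a": 2, "b": 1},
    "v1": {"a": 1, "c": 3},
    "v2": {"b": 1, "d": 1},
    "v3": {"e": 5},
}
print(recommend(r, "u"))
```

The only structural difference from the cosine-based heuristic is the use of L1 norms in place of L2 norms in the similarity.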
16. IRGIRGroup @UAM
Popularity bias
If pairwise user independence: $p(V = v \mid U = u, J = I) = p(V = v)$
$$p(I = i \mid U = u) \sim \sum_{v \in \mathcal{U}} p(J = i \mid V = v) \, p(V = v) = p(J = i)$$
$p(J = i) \propto \sum_{v \in \mathcal{U}} r(v, i)$ is the popularity of item $i$
Therefore kNN:
– Is biased towards popular items
– Needs pairwise user-user dependence to work properly
Other kNN variants
– Normalized user-based kNN is biased to the average rating
– Item-based kNN (normalized or not) is also biased to popularity
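The collapse to popularity can be checked numerically. The sketch below assumes that $p(V = v)$ is estimated as $v$'s share of all interactions, which is not stated on the slide but is consistent with sampling from the past choices "urn". Under that assumption the marginalized score equals the item's share of all interactions. Function names and toy counts are hypothetical.

```python
def popularity_share(r, i):
    """Share of all interactions that involve item i: sum_v r(v, i) / total."""
    total = sum(sum(items.values()) for items in r.values())
    return sum(items.get(i, 0) for items in r.values()) / total

def independent_score(r, i):
    """sum_v p(J = i | V = v) p(V = v), with both factors estimated from counts."""
    total = sum(sum(items.values()) for items in r.values())
    score = 0.0
    for items in r.values():
        n_v = sum(items.values())
        if n_v == 0:
            continue
        p_v = n_v / total                    # assumed estimate of p(V = v)
        p_i_given_v = items.get(i, 0) / n_v  # estimate of p(J = i | V = v)
        score += p_i_given_v * p_v
    return score

# Hypothetical interaction counts: {user: {item: count}}
r = {
    "v1": {"a": 3, "b": 1},
    "v2": {"a": 1, "c": 2},
    "v3": {"b": 4},
}
for item in ("a", "b", "c"):
    print(item, independent_score(r, item), popularity_share(r, item))
```

The target user does not appear anywhere in `independent_score`, so every user receives the same popularity ranking when neighbors carry no user-specific information.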
17. IRGIRGroup @UAM
Experiments
Test probabilistic against heuristic variants
Check popularity biases
Random rating split: 80% training, 20% test
Parameter tuning by grid search
Dirichlet smoothing for probabilistic kNN
Datasets
Dataset        Type                                  Domain   # users   # items   # ratings
MovieLens 1M   Public                                Movies   6,040     3,706     1,000,209
Netflix        Public                                Movies   480,189   17,770    100,480,507
Last.fm        Public                                Music    992       174,091   898,073
Crowd random   Flat rating distribution over items   Music    1,054     1,084     103,584
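A random 80/20 rating split of the kind described above could look like the following sketch. The tuple layout `(user, item, rating)`, the function name `split_ratings` and the fixed seed are assumptions; the slides do not describe how the split was actually produced.

```python
import random

def split_ratings(ratings, test_fraction=0.2, seed=0):
    """Randomly split a list of (user, item, rating) tuples into (train, test)."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data: 10 users x 10 items, all with rating 1
ratings = [(u, i, 1) for u in range(10) for i in range(10)]
train, test = split_ratings(ratings)
print(len(train), len(test))
```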
18. IRGIRGroup @UAM
Public datasets – Results
[Figure: nDCG@10 bar charts for MovieLens 1M, Netflix and Last.fm, comparing heuristic and probabilistic kNN; bars grouped as user-based and item-based, not normalized and normalized]
Similar accuracy overall
Some improvements on item-based
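For reference, nDCG@10 can be computed as in the sketch below. It uses the linear-gain form of DCG with a base-2 logarithmic discount; other variants exist, and the slides do not say which one was used, so this choice is an assumption, as are the function names and toy relevance values.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k positions (linear gain, log2 discount)."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_items, relevance, k=10):
    """nDCG@k of a ranking given a dict {item: relevance} of test relevance values."""
    gains = [relevance.get(item, 0.0) for item in ranked_items]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical test relevance for one user
relevance = {"a": 3, "b": 2, "c": 1}
print(ndcg_at_k(["a", "b", "c"], relevance))
print(ndcg_at_k(["c", "x", "a"], relevance))
```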
19. IRGIRGroup @UAM
Crowdsourced dataset – Results
[Figure: nDCG@10 bar chart comparing heuristic and probabilistic kNN; bars grouped as user-based and item-based, not normalized and normalized; callout on heuristic item-based]
With flat ratings distribution… As good as not normalized!
20. IRGIRGroup @UAM
Public datasets – Popularity biases
User-based kNN (MovieLens 1M)
[Figure: panels for probabilistic and heuristic kNN, each not normalized and normalized; horizontal axes: popularity and average rating]
Quite the same trends
21. IRGIRGroup @UAM
Public datasets – Popularity biases
Item-based kNN (MovieLens 1M)
[Figure: panels for probabilistic and heuristic kNN, each not normalized and normalized; horizontal axes: popularity and average rating]
Not quite the same trends
22. IRGIRGroup @UAM
Conclusion
Full probabilistic reformulation of kNN scheme
– With classic variants
The probabilistic formulation…
– Provides a precise explanation of why kNN works, and under what conditions
– Explains why kNN tends to recommend popular items
– Has the advantages of a probabilistic formulation
Equivalent accuracy and behavior to heuristic formulations
– More so for user-based variants
– Probabilistic item-based is more consistent than heuristic
– Accuracy of normalized kNN might be misrepresented on common datasets
Future work: explore further empirical optimization, inter-user
dependency analysis, other collaborative filtering methods…