1. IRGIRGroup @UAM
Bias in recommendation: avoid it or embrace it?
Amazon, Barcelona, February 17, 2020
Pablo Castells
Universidad Autónoma de Madrid
http://ir.ii.uam.es/castells
2. IRGIRGroup @UAM
Outline
1. Bias and fairness
2. Removing the bias
3. Understanding the bias
4. Conclusion
4. IRGIRGroup @UAM
Bias in search
“Search engine manipulation is a serious threat to the democratic system of government”
Robert Epstein
Google could manipulate 2.6 – 10M votes
[Figure: search ranking bias by date; labels: Pro-Clinton, Pro-Trump, Before, Election day, After]
R. Epstein, R. E. Robertson. A Method for Detecting Bias in Search Rankings, with Evidence of Systematic Bias Related to the 2016 Presidential Election. White paper, American Institute for Behavioral Research and Technology, June 2017.
R. Epstein, R. E. Robertson. The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. PNAS 112(33), August 2015.
5. IRGIRGroup @UAM
Recidivism prediction
COMPAS system: ground truth evaluation proves bias
                  Black   White
False positives   45%     23%
False negatives   28%     48%
$P(\mathrm{FP} \mid \mathrm{Black}) \gg P(\mathrm{FP} \mid \mathrm{White})$
$P(\mathrm{FN} \mid \mathrm{Black}) \ll P(\mathrm{FN} \mid \mathrm{White})$
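The COMPAS comparison rests on false positive and false negative rates computed separately for each group. A minimal sketch of that computation follows; the record layout, the function name `group_error_rates`, and the toy records are illustrative assumptions, not COMPAS data.

```python
from collections import defaultdict


def group_error_rates(records):
    """Return false positive and false negative rates per group.

    records: iterable of (group, predicted_positive, actually_positive).
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, predicted, actual in records:
        if predicted:
            counts[group]["tp" if actual else "fp"] += 1
        else:
            counts[group]["fn" if actual else "tn"] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # cases whose true outcome is negative
        positives = c["tp"] + c["fn"]  # cases whose true outcome is positive
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else float("nan"),
            "fnr": c["fn"] / positives if positives else float("nan"),
        }
    return rates


# Hypothetical toy records (group, predicted, actual); not COMPAS data.
toy_records = (
    [("A", True, False)] * 2 + [("A", False, False)] * 2
    + [("A", True, True)] * 3 + [("A", False, True)] * 1
    + [("B", True, False)] * 1 + [("B", False, False)] * 3
    + [("B", True, True)] * 2 + [("B", False, True)] * 2
)
toy_rates = group_error_rates(toy_records)
print(toy_rates)
```

In the toy data the two groups have mirrored error profiles: one group gets more false positives, the other more false negatives, which is the pattern the slide reports.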
6. IRGIRGroup @UAM
Bias = under / over-represented features
                        Female   Male
PhDs (USA)              53%      47%
Full professors (USA)   32%      68%
$P(\mathrm{Female} \mid \mathrm{Professor}) \ll P(\mathrm{Female} \mid \mathrm{PhD})$
$P(\mathrm{Female} \mid \mathrm{Professor}) \ll P(\mathrm{Female})$
7. IRGIRGroup @UAM
Bias = under / over-represented features
                          Female   Male
PhDs (Spain)              51%      49%
Full professors (Spain)   16%      84%
$P(\mathrm{Female} \mid \mathrm{Professor}) \ll P(\mathrm{Female} \mid \mathrm{PhD})$
$P(\mathrm{Female} \mid \mathrm{Professor}) \ll P(\mathrm{Female})$
8. IRGIRGroup @UAM
Bias = under / over-represented features
                        White male   Other
Fortune 500 employees   62%          38%
Fortune 500 CEOs        91%          9%
$P(\text{White male} \mid \mathrm{CEO}) \gg P(\text{White male} \mid \mathrm{Employee})$
9. IRGIRGroup @UAM
Bias in algorithms
Bias in many application domains
Face recognition / surveillance
Recruiting
Loans
News, social media
Search
···
Typically the bias is in the data, in the history – the algorithm
learns and reproduces / amplifies the human bias
Baeza-Yates, R. Bias on the Web. Communications of the ACM 61(6), May 2018.
10. IRGIRGroup @UAM
Bias in search engines’ results?
Promote own services
Gender and ethnic stereotypes
– In autocomplete
– In spelling correction
Impact on people’s perceptions (e.g. shift voting)
Relevance bias
11. IRGIRGroup @UAM
Bias is… bad?
12. IRGIRGroup @UAM
The popularity bias in information retrieval
Popularity: “unidimensional” bias
– The overrepresented “feature” is the item itself
– Analysis can be generalized to any feature
Issues related to user satisfaction
– Does the bias hurt system effectiveness?
– Does the bias distort evaluation?
13. IRGIRGroup @UAM
Bias in search
In input
– “Popularity” bias in queries
– “Popularity” bias in click logs
– “Popularity” bias in sales
– “Popularity” bias in Web links
– Position bias in clicks
In output
– Sellers in search results
– Expose the catalog
Bias in offline evaluation
14. IRGIRGroup @UAM
Bias in recommendation
In the (input) data
[Figure: user × item interaction matrix, split into popular items and the rest of items (long tail)]
[Figure: number of interactions per item, with popular items followed by long-tail items]
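The popularity bias in the input data can be checked by counting interactions per item and measuring how much of the total falls on the most popular items. A minimal sketch follows, assuming interactions are given as `(user, item)` pairs; the function names, the 20% head fraction, and the toy log are illustrative choices.

```python
from collections import Counter


def interaction_counts(interactions):
    """Return (item, count) pairs sorted by decreasing interaction count."""
    counts = Counter(item for _user, item in interactions)
    return counts.most_common()


def head_share(interactions, head_fraction=0.2):
    """Fraction of all interactions that fall on the most popular items."""
    ranked = interaction_counts(interactions)
    n_head = max(1, round(len(ranked) * head_fraction))
    total = sum(count for _item, count in ranked)
    return sum(count for _item, count in ranked[:n_head]) / total


# Hypothetical toy log: item "a" collects most of the interactions.
toy_log = [(f"u{n}", "a") for n in range(6)] + [
    ("u0", "b"), ("u1", "c"), ("u2", "d"), ("u3", "e"),
]
toy_share = head_share(toy_log)
print(interaction_counts(toy_log)[0], toy_share)
```

Here one item out of five (the top 20%) accounts for 60% of the interactions, a small-scale version of the short head and long tail described on the slide.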
15. IRGIRGroup @UAM
Bias in recommendation
In algorithms (output)
[Figure: four scatter plots, one point per item, with axes "Nº positive ratings" and "Nº times in top 10"; panels: Matrix factorization, User-based kNN, Item-based kNN, Oracle ("optimal !!")]
MovieLens 1M dataset
1M ratings, 6K users, 4K items
Random rating split 80% training / 20% test
R. Cañamares, P. Castells. A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases. SIGIR 2017.
D. Jannach et al. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. UMUAI 25(5), Dec. 2015.
16. IRGIRGroup @UAM
Bias in recommendation
In offline evaluation
Netflix dataset
100M ratings, 0.5M users, 18K items
Random rating split 80% training / 20% test
[Figure: bar chart of nDCG@10 (scale 0 to 0.3) on Netflix for Random, Average rating value, Positive rating count, User-based kNN, and Matrix factorization]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
P. Cremonesi, Y. Koren, R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. RecSys 2010.
D. Jannach et al. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. UMUAI 25(5), Dec. 2015.
17. IRGIRGroup @UAM
Bias in recommendation
What to do about the bias?
18. IRGIRGroup @UAM
Outline
1. Bias and fairness
2. Removing the bias
3. Understanding the bias
4. Conclusion
20. IRGIRGroup @UAM
What to do about the bias
Answer 1 – Bias is bad
⇒ Remove the bias in your recommendations
21. IRGIRGroup @UAM
Avoiding popularity: novelty and diversity
Avoid popularity in the output recommendations
Novelty: limited value of popular recommendations
Try to move towards the long tail
Diversity / fairness: avoid filter bubble and concentration over few items
Give all items some chance to be exposed
→ Reranking, multi-armed bandits, etc.
[Figure: number of interactions per item, with items 𝑎 and 𝑏 marked]
P. Castells, N. J. Hurley, S. Vargas. Novelty and Diversity in Recommender Systems. In Recommender Systems Handbook, 2nd edition. Springer, 2015.
22. IRGIRGroup @UAM
Avoiding popularity: novelty and diversity
[Diagram: item novelty model, relating a recommended item to a context by distance or identity. Contexts: target user’s experience, everyone else’s experience, everyone else’s recommendations, other items in the same recommendation. Resulting notions: unexpectedness, long-tail novelty, sales diversity, intra-list diversity]
Problem solved?
P. Castells, N. J. Hurley, S. Vargas. Novelty and Diversity in Recommender Systems. In Recommender Systems Handbook, 2nd edition. Springer, 2015.
23. IRGIRGroup @UAM
Avoiding popularity: novelty and diversity
Novelty / diversity
Relevance
Stimulation
P. Castells, N. J. Hurley, S. Vargas. Novelty and Diversity in Recommender Systems. In Recommender Systems Handbook, 2nd edition. Springer, 2015.
24. IRGIRGroup @UAM
Avoiding popularity: novelty and diversity
[Figure: number of interactions per item, with items 𝑎, 𝑏 and 𝑐 marked]
Still a bias
25. IRGIRGroup @UAM
What to do about the bias
Answer 2 – Bias is bad
⇒ Remove the bias in offline evaluation
26. IRGIRGroup @UAM
Popularity bias in offline evaluation
Ratings are missing not at random (MNAR)
[Figure: user × item matrix with popular items (short head) and the rest of items (long tail); legend: observed user-item interaction, unobserved preference]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
27. IRGIRGroup @UAM
Popularity bias in offline evaluation
Ratings are missing not at random (MNAR)
[Figure: user × item matrix with popular items (short head) and the rest of items (long tail), ordered by positive rating count; legend: test data (relevant items), training data, unobserved preference]
[Formula, partly graphical on the slide: avg P@𝑘 ∼ … + … over 𝑘]
[Figure: bar chart of nDCG@10 (scale 0 to 0.3) on Netflix for Random, Avg. rating, Nr. ratings, User-based, and Matrix fact.]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
28. IRGIRGroup @UAM
Removing the popularity bias in offline evaluation
A. Handling the (test) data
[Figures: temporal split (data along a time axis); flat test and popularity strata (#ratings per item); legend: test data (relevant items), training data, unobserved preference]
A. Bellogín, P. Castells, I. Cantador. Statistical Biases in Information Retrieval Metrics for Recommender Systems. Information Retrieval 20(6), July 2017.
P. Cremonesi, Y. Koren, R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. RecSys 2010.
H. Steck. Training and Testing of Recommender Systems on Data Missing not at Random. KDD 2010.
H. Steck. Item popularity and recommendation accuracy. RecSys 2011.
29. IRGIRGroup @UAM
Removing the popularity bias in offline evaluation
B. Correcting for bias in the metrics
– Stratified recall
– Off-policy evaluation
– Inverse propensity scoring
– ···
C. Unbiased learning
D. Unbiased datasets
A. Bellogín, P. Castells, I. Cantador. Statistical Biases in Information Retrieval Metrics for Recommender Systems. Information Retrieval 20(6), July 2017.
P. Cremonesi, Y. Koren, R. Turrin. Performance of recommender algorithms on top-n recommendation tasks. RecSys 2010.
J. M. Hernández-Lobato, N. Houlsby, Z. Ghahramani. Probabilistic Matrix Factorization with Non-random Missing Data. ICML 2014.
H. Steck. Training and Testing of Recommender Systems on Data Missing not at Random. KDD 2010.
H. Steck. Item popularity and recommendation accuracy. RecSys 2011.
30. IRGIRGroup @UAM
Debiasing evaluation: Inverse Propensity Scoring
Propensity: $P(\mathit{Observe}(u,i))$
What you want to measure:
$P = \frac{1}{|R|} \sum_{i \in R} \mathit{Relevant}(u,i)$
What you can measure (biased estimate):
$P = \frac{1}{|R|} \sum_{i \in R} \mathit{Relevant}(u,i) \cdot \mathit{Observe}(u,i)$
Unbiased estimate:
$P = \frac{1}{|R|} \sum_{i \in R} \frac{\mathit{Relevant}(u,i) \cdot \mathit{Observe}(u,i)}{P(\mathit{Observe}(u,i))}$
Problems: 1) High variance, and 2) How to estimate propensity
T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, T. Joachims. Recommendations as Treatments: Debiasing Learning and Evaluation. ICML 2016.
A. Swaminathan, A. Krishnamurthy, A. Agarwal, M. Dudik, J. Langford, D. Jose, I. Zitouni. Off-policy Evaluation for Slate Recommendation. NIPS 2017.
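The three precision formulas on this slide can be compared numerically. The sketch below computes the expectation of the biased and the IPS estimates exactly, by enumerating every observation pattern for a four-item recommendation list. It assumes that observation is independent across items and that the propensities are known; the item names, propensity values, and function names are hypothetical.

```python
from itertools import product


def true_precision(recommended, relevant):
    """What you want to measure: fraction of recommended items that are relevant."""
    return sum(i in relevant for i in recommended) / len(recommended)


def observed_precision(recommended, relevant, observed):
    """What you can measure: only relevant items that were also observed count."""
    hits = sum(i in relevant and i in observed for i in recommended)
    return hits / len(recommended)


def ips_precision(recommended, relevant, observed, propensity):
    """IPS estimate: each observed relevant item is weighted by 1 / propensity."""
    weighted = sum(1.0 / propensity[i] for i in recommended
                   if i in relevant and i in observed)
    return weighted / len(recommended)


def expected_value(estimator, recommended, propensity):
    """Exact expectation of estimator(observed) over all observation patterns.

    Assumes each item is observed independently with its own propensity.
    """
    total = 0.0
    for pattern in product([False, True], repeat=len(recommended)):
        observed = {i for i, seen in zip(recommended, pattern) if seen}
        prob = 1.0
        for i, seen in zip(recommended, pattern):
            prob *= propensity[i] if seen else 1.0 - propensity[i]
        total += prob * estimator(observed)
    return total


# Hypothetical toy setting: relevant items "c" and "d" are rarely observed.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "d"}
propensity = {"a": 0.8, "b": 0.5, "c": 0.2, "d": 0.1}

target = true_precision(recommended, relevant)
naive = expected_value(
    lambda obs: observed_precision(recommended, relevant, obs),
    recommended, propensity)
ips = expected_value(
    lambda obs: ips_precision(recommended, relevant, obs, propensity),
    recommended, propensity)
print(target, naive, ips)
```

In expectation the naive estimate falls well below the target, while the IPS estimate matches it. The sketch says nothing about variance: a single draw in which item "d" is observed contributes a weight of 10, which is the first problem the slide lists.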
31. IRGIRGroup @UAM
Debiasing evaluation: experiments
MovieLens 1M
[Figure: Recall@10 of Matrix factorization, User-based kNN, Positive rating count, Average rating value, and Random under four evaluation settings: Random split, Temporal split, Flat test, and IPS]
P. Castells, R. Cañamares. Characterization of Fair Experiments for Recommender System Evaluation – A Formal Analysis. REVEAL@RecSys 2018.
32. IRGIRGroup @UAM
Debiasing evaluation: Inverse Propensity Scoring
Spotify evaluation experiment
Playlist recommendation
Comparison of 12 recommender systems
Metric: impression-to-stream
1. Online (multivariate) AB test
2. Offline evaluation with IPS variants
– IPS
– Capped IPS
– Normalized capped IPS
A. Gruson, P. Chandar, C. Charbuillet, J. McInerney, S. Hansen, D. Tardieu, B. Carterette. Offline Evaluation to Make Decisions about Playlist Recommendation Algorithms. WSDM 2019.
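The three IPS variants named on this slide differ in how they treat large importance weights. The sketch below uses commonly used textbook forms (weight clipping and self-normalization); the exact definitions in the cited Spotify paper may differ, and the rewards, weights, and cap value are hypothetical.

```python
def ips(rewards, weights):
    """Plain IPS: average of importance-weighted rewards."""
    return sum(w * r for r, w in zip(rewards, weights)) / len(rewards)


def capped_ips(rewards, weights, cap):
    """Capped IPS: each weight is clipped at `cap` to limit variance."""
    return sum(min(w, cap) * r for r, w in zip(rewards, weights)) / len(rewards)


def normalized_capped_ips(rewards, weights, cap):
    """Normalized capped IPS: clipped weights, divided by their own sum."""
    clipped = [min(w, cap) for w in weights]
    return sum(c * r for r, c in zip(rewards, clipped)) / sum(clipped)


# Hypothetical logged feedback: reward 1 = stream, 0 = no stream.
# Each weight is assumed to be (target policy prob.) / (logging policy prob.).
toy_rewards = [1, 0, 1, 0]
toy_weights = [0.5, 1.0, 10.0, 2.0]

print(ips(toy_rewards, toy_weights))
print(capped_ips(toy_rewards, toy_weights, cap=4.0))
print(normalized_capped_ips(toy_rewards, toy_weights, cap=4.0))
```

The single weight of 10 dominates the plain IPS value. Capping reduces its influence at the cost of some bias, and normalization keeps the estimate within the range of the observed rewards.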
33. IRGIRGroup @UAM
Debiasing evaluation: Inverse Propensity Scoring
Spotify evaluation experiment
Recommender system ranking comparison
[Figure: three scatter plots of system rank positions (0 to 12 on both axes), comparing the AB test ranking with IPS, Capped IPS, and Normalized capped IPS]
A. Gruson, P. Chandar, C. Charbuillet, J. McInerney, S. Hansen, D. Tardieu, B. Carterette. Offline Evaluation to Make Decisions about Playlist Recommendation Algorithms. WSDM 2019.
34. IRGIRGroup @UAM
Debiasing evaluation: Inverse Propensity Scoring
Spotify evaluation experiment
Recommender system ranking comparison
[Figure: three scatter plots of system rank positions (0 to 12 on both axes), comparing the offline estimators with each other; axis labels: IPS, Capped IPS, Normalized capped IPS]
A. Gruson, P. Chandar, C. Charbuillet, J. McInerney, S. Hansen, D. Tardieu, B. Carterette. Offline Evaluation to Make Decisions about Playlist Recommendation Algorithms. WSDM 2019.
35. IRGIRGroup @UAM
Debiasing evaluation: unbiased data
Yahoo! R3
5,400 Yahoo! radio users
1,000 music tracks, randomly sampled
130K ratings
Free user interaction: MNAR training data
10 random tracks per user: MAR test data
B. Marlin, R. Zemel. Collaborative prediction and ranking with non-random missing data. RecSys 2009.
36. IRGIRGroup @UAM
Unbiased test data: experiments
[Figure: Recall@10 of Matrix factorization, User-based kNN, Positive rating count, Average rating value, and Random on Yahoo! R3 and CM100K, under three evaluation settings: MNAR random split, IPS, and MAR test]
P. Castells, R. Cañamares. Characterization of Fair Experiments for Recommender System Evaluation – A Formal Analysis. REVEAL@RecSys 2018.
37. IRGIRGroup @UAM
Bias in recommendation
Is bias bad?
How bad?
38. IRGIRGroup @UAM
Outline
1. Bias and fairness
2. Removing the bias
3. Understanding the bias
4. Conclusion
40. IRGIRGroup @UAM
Can we trust our experiments?
Observed metric value: computed on available user taste observations
True metric value: computed with full knowledge of user tastes
Observed metric value ≈ True metric value ?
[Figure: two user × item matrices, one with missing ratings and one complete; legend: relevant, non relevant, missing ratings]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
41. IRGIRGroup @UAM
Understanding the bias
Observation vs. relevance
[Figure: two user × item matrices, Relevant(u,i) and Observe(u,i); legend: relevant, non relevant, observed]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
42. IRGIRGroup @UAM
Understanding the bias
Observation vs. relevance
[Figure: user × item matrix combining Observe(u,i) ∧ Relevant(u,i); legend: relevant, non relevant, observed]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
43. IRGIRGroup @UAM
Understanding the bias
Observation vs. relevance
[Figure: user × item matrix combining Relevant(u,i) ∧ Observe(u,i); legend: relevant, non relevant, missing rating]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
48. IRGIRGroup @UAM
Formal analysis
Very simple questions
1. Does popularity help or hurt recommendation effectiveness?
2. Which is better, the majority taste (positive rating count)
or the higher consensus (average rating value)?
3. Do biased metric values agree with true (unbiased) values?
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
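Question 2 contrasts two non-personalized rankings: by positive rating count (largest majority) and by average rating value (highest consensus). A minimal sketch of both follows, assuming ratings arrive as `(user, item, value)` triples on a 1 to 5 scale; the threshold of 4 for a positive rating, the function name, and the toy ratings are illustrative assumptions.

```python
from collections import defaultdict


def non_personalized_rankings(ratings, like_threshold=4):
    """Rank items by positive rating count and by average rating value.

    ratings: iterable of (user, item, value).
    Returns (majority_ranking, consensus_ranking), both best-first.
    """
    positives = defaultdict(int)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _user, item, value in ratings:
        counts[item] += 1
        totals[item] += value
        if value >= like_threshold:
            positives[item] += 1

    items = sorted(counts)  # alphabetical order; the stable sort keeps it on ties
    majority = sorted(items, key=lambda i: -positives[i])
    consensus = sorted(items, key=lambda i: -totals[i] / counts[i])
    return majority, consensus


# Hypothetical toy ratings: "hit" is widely rated, "niche" is rated by few.
toy_ratings = [
    ("u1", "hit", 5), ("u2", "hit", 4), ("u3", "hit", 5),
    ("u4", "hit", 2), ("u5", "hit", 1),
    ("u1", "niche", 5), ("u2", "niche", 5),
    ("u3", "dud", 2),
]
majority, consensus = non_personalized_rankings(toy_ratings)
print(majority)
print(consensus)
```

The two rankings disagree on the top item in this toy case: the widely rated item wins on positive rating count, while the item with few but uniformly high ratings wins on average rating.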
49. IRGIRGroup @UAM
Research questions
[Diagram: effectiveness scale with labels: optimal recommendation, optimal non-personalized recommendation, personalized recommendations, bad personalized recommendations, random recommendation; the positions of "highest consensus" and "largest majority" are marked with "?"]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
50. IRGIRGroup @UAM
Where does popularity come from?
[Figure: number of interactions per item, with items 𝑎 and 𝑏 marked]
What made 𝑎 be so much more popular than 𝑏?
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
51. IRGIRGroup @UAM
Rating generation
[Diagram: graphical model over user u and item i with variables Discover(u,i), Engage(u,i), Observe(u,i) and Relevant(u,i); Relevant(u,i) ∧ Observe(u,i) is connected to the rec algorithm]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
52. IRGIRGroup @UAM
Conditional (in)dependences between variables
[Diagram: graphical model over user u and item i with variables Discover(u,i), Engage(u,i), Observe(u,i) and Relevant(u,i); reduced model over i, Discover, Relevant, Observe]
[Figure: popularity distribution, #interactions per item, $p(\mathit{Observe} \mid i)$]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
54. IRGIRGroup @UAM
Conditional (in)dependences between variables
[Diagram: graphical model over user u and item i with variables D(u,i), E(u,i), O(u,i) and R(u,i)]
[Figure: popularity distribution, #interactions per item, $p(O \mid i)$]
[Diagrams: three dependence structures over i, D, R and O]
1. Observation depends just on relevance
2. Observation independent from relevance
3. Observation depends on both items and relevance
$p(O \mid R, i) = p(O \mid D, R, i)\, p(D \mid R, i)$
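The factorization of $p(O \mid R, i)$ on this slide can be read as a marginalization over discovery. A short derivation follows, under the assumption (not stated in the slide text) that an item cannot be observed unless it was discovered, i.e. $p(O \mid \neg D, R, i) = 0$:

```latex
\begin{align*}
p(O \mid R, i)
  &= p(O \mid D, R, i)\, p(D \mid R, i)
   + p(O \mid \neg D, R, i)\, p(\neg D \mid R, i) \\
  &= p(O \mid D, R, i)\, p(D \mid R, i)
\end{align*}
```

Under this reading, the observation probability of an item splits into the probability of discovering it and the probability of rating it once discovered.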
55. IRGIRGroup @UAM
Findings
1. Observation conditionally independent from item
[Diagram: effectiveness scale with labels: optimal recommendation, optimal non-personalized recommendation, highest consensus, largest majority, random recommendation]
Biased and unbiased precision agree
Biased 𝑷 ∝ Unbiased 𝑷
Even if $P(\mathit{Observed} \mid \neg\mathit{Relevant}) > P(\mathit{Observed} \mid \mathit{Relevant})$
56. IRGIRGroup @UAM
Findings
2. Observation conditionally independent from relevance
a) Observation correlates with relevance
[Diagram: effectiveness scale with labels: optimal recommendation, optimal non-personalized recommendation, highest consensus, largest majority, random recommendation]
Biased and unbiased precision agree
Biased 𝑷 ∝ Unbiased 𝑷
57. IRGIRGroup @UAM
Findings
2. Observation conditionally independent from relevance
b) Observation does not correlate with relevance
[Diagram: two effectiveness scales, Biased 𝑷 and Unbiased 𝑷, each with labels: optimal recommendation, optimal non-personalized recommendation, highest consensus, largest majority, random recommendation]
Biased and unbiased precision disagree
58. IRGIRGroup @UAM
Findings
2. Observation conditionally independent from relevance
c) Observation correlates negatively with relevance
[Diagram: two effectiveness scales, Biased 𝑷 and Unbiased 𝑷, each with labels: optimal recommendation, optimal non-personalized recommendation, highest consensus, largest majority, random recommendation]
Biased and unbiased precision disagree !!
59. IRGIRGroup @UAM
Findings
3. No assumption
$\mathbb{E}[P@1 \mid \theta] = \int_{\Omega^n} \mathbb{E}[P@1 \mid \theta, \omega]\, d\omega$
[Diagram: two effectiveness scales, Biased 𝑷 and Unbiased 𝑷, each with labels: optimal recommendation, optimal non-personalized recommendation, highest consensus, largest majority, random recommendation]
Biased and unbiased precision disagree
60. IRGIRGroup @UAM
Dependence between observation and relevance
1. Observation conditionally independent from item
For example…
Find items through search engines, good recommender systems, good friends
Rational herd behavior
Rate based on whether you like
[Figure: per-item chart of four categories: relevant rated, relevant unrated, non-relevant rated, non-relevant unrated]
61. IRGIRGroup @UAM
Dependence between observation and relevance
1. Observation conditionally independent from item
[Figure: per-item chart of four categories: relevant rated, relevant unrated, non-relevant rated, non-relevant unrated; hotel examples: Mellow Barcelona, Gates Diagonal; captions: "I found and chose this nice hotel", "I never saw this one"]
Relevance possibly explains the resulting observation I produced in Booking.com
62. IRGIRGroup @UAM
Dependence between observation and relevance
1. Observation conditionally independent from item
[Figure: per-item chart of four categories: relevant rated, relevant unrated, non-relevant rated, non-relevant unrated]
Other possible examples…
63. IRGIRGroup @UAM
Dependence between observation and relevance
2. Observation conditionally independent from relevance
a) Positive correlation b) No correlation c) Negative correlation
[Figure: three per-item charts, one per case, each showing four categories: relevant rated, relevant unrated, non-relevant rated, non-relevant unrated]
For example…
Heavy (and/or good) advertisement
Social conformity, fashion
Reinforcement loops
Randomness + snowball effects
65. IRGIRGroup @UAM
Typical case
Empirical results would suggest the typical case is a mix of
1. Relevance dependence
2. Item dependence with a) positive correlation
Observation bias stronger than relevance
[Diagram: effectiveness scale with labels: optimal non-personalized recommendation, highest consensus, largest majority, random recommendation]
Biased 𝑷 ∝ Unbiased 𝑷
66. IRGIRGroup @UAM
Typical case
Typical case would seem a combination of these two:
1. Observation conditionally independent from item
2. Observation conditionally independent from relevance, a) Positive correlation
[Figure: two per-item charts, one per case, each showing four categories: relevant rated, relevant unrated, non-relevant rated, non-relevant unrated]
67. IRGIRGroup @UAM
Closing the loop: relevance + novelty
CM100K (ir.ii.uam.es/cm100k)
1,000 music tracks randomly sampled from deezer.com
1,000 users
100 MAR judgments
[Figure legend: user is familiar with, user is not familiar with]
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
68. IRGIRGroup @UAM
Closing the loop: relevance + novelty
[Figure: nDCG@10 on CM100K; label: Undiscovered]
R. Cañamares, P. Castells. From the PRP to the Low Prior Discovery Recall Principle for Recommender Systems. SIGIR 2018.
69. IRGIRGroup @UAM
Implications on personalized algorithms
[Figure: nDCG@10 of non-normalized kNN (biased to popularity) and normalized kNN (biased to avg rating) on MovieLens 1M (Obs) and CM100K (Obs, True)]
Biased evaluation: Non-normalized > normalized
Unbiased evaluation: Non-normalized < normalized
R. Cañamares, P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018.
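The difference between the two kNN variants on this slide can be seen in how neighbor ratings are aggregated. The sketch below assumes one common formulation of user-based kNN, with precomputed similarities: the non-normalized score sums similarity-weighted ratings, and the normalized score divides that sum by the total similarity of the neighbors who rated the item. The function name and toy data are hypothetical, and the cited papers may define the variants differently.

```python
def knn_scores(neighbors, ratings, normalized):
    """Score items for a target user from the ratings of its neighbors.

    neighbors: {neighbor: similarity to the target user}
    ratings:   {neighbor: {item: rating}}
    """
    weighted_sum = {}
    similarity_sum = {}
    for neighbor, sim in neighbors.items():
        for item, rating in ratings.get(neighbor, {}).items():
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            similarity_sum[item] = similarity_sum.get(item, 0.0) + sim
    if not normalized:
        # The sum grows with the number of neighbors who rated the item.
        return weighted_sum
    # Similarity-weighted average rating among neighbors who rated the item.
    return {item: weighted_sum[item] / similarity_sum[item]
            for item in weighted_sum if similarity_sum[item] > 0}


# Hypothetical neighborhood: three equally similar neighbors.
toy_neighbors = {"v1": 1.0, "v2": 1.0, "v3": 1.0}
toy_neighbor_ratings = {
    "v1": {"popular": 3},
    "v2": {"popular": 3},
    "v3": {"popular": 3, "niche": 5},
}
plain = knn_scores(toy_neighbors, toy_neighbor_ratings, normalized=False)
norm = knn_scores(toy_neighbors, toy_neighbor_ratings, normalized=True)
print(max(plain, key=plain.get), max(norm, key=norm.get))
```

Without normalization the item rated by more neighbors ranks first; with normalization the item with the higher average rating does. This matches the slide's description of the two variants as biased to popularity and to average rating respectively.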
70. IRGIRGroup @UAM
Generalization to other biases
Complex observation biases
[Figure: two user × item matrices, Relevant(u,i) and Observe(u,i); legend: relevant, non relevant, observed]
R. Cañamares, P. Castells. A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases. SIGIR 2017.
71. IRGIRGroup @UAM
Outline
1. Bias and fairness
2. Removing the bias
3. Analysis of popularity in recommendation
4. Conclusion
73. IRGIRGroup @UAM
Conclusions
Popularity can be ok as long as it emerges out of relevance
MNAR offline evaluation tends to agree with MAR evaluation
– Ratings appear to depend on both relevance and items
– Item dependence may be stronger, but tends to agree with relevance
– User bias to rate relevant or non-relevant should not make a difference
Consensus seems slightly better behaved than majority
– And much better at novel relevant findings
No universal solution to deal with bias – understand the bias
– Caution with possible scenarios where item dependence is strong and is uncorrelated with, or runs against, relevance
Analysis can be generalized to other biases and features
74. IRGIRGroup @UAM
Ongoing and future directions
Inverse propensity scoring
Unbiased datasets
Popularity bias in false-positive metrics
Popularity from social network dynamics
Multi-armed bandit recommendation algorithms
– Specific algorithms (e.g. bandit kNN)
– Better understanding feedback loop effects
and how to cope with them
R. Cañamares, M. Redondo, P. Castells. Multi-Armed Recommender System Bandit Ensembles. RecSys 2019.
J. Sanz-Cruzado, E. López, P. Castells. A Simple Multi-Armed Nearest-Neighbor Bandit for Interactive Recommendation. RecSys 2019.