Characterization of Fair Experiments
for Recommender System Evaluation – A Formal Analysis
Pablo Castells and Rocío Cañamares
Universidad Autónoma de Madrid
{pablo.castells,rocio.cannamares}@uam.es
Workshop on offline evaluation for recommender systems (REVEAL 2018) at the 12th ACM Conference on Recommender Systems (RecSys 2018)
IRG (IR Group @ UAM)
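Elements of an experiment
• $\mathcal{U} \times \mathcal{I}$: all user-item pairs
• $\mathcal{J}_{train}$: training sample
• $\mathcal{J}_{test}$: test sample
• $\mathcal{T}$: target pairs
• $R$: recommendation
• $\mathcal{R}$: relevant pairs
• $R \subsetneq \mathcal{T}$
• $\mathcal{J}_{test} \subseteq \mathcal{T} \subseteq \mathcal{U} \times \mathcal{I} \setminus \mathcal{J}_{train}$
• $\mathcal{J}_{train} \cap \mathcal{J}_{test} = \emptyset$
Metric definition
$P = \frac{|R \cap \mathcal{R}|}{|R|} = p(\mathcal{R} \mid R)$
$Recall = \frac{|R \cap \mathcal{R}|}{|\mathcal{R}|} = p(R \mid \mathcal{R})$
Metric estimates
$\hat{P} = \frac{|R \cap \mathcal{R} \cap \mathcal{J}_{test}|}{|R|} = p(\mathcal{J}_{test} \mid \mathcal{R}, R) \, P$
$\widehat{Recall} = \frac{|R \cap \mathcal{R} \cap \mathcal{J}_{test}|}{|\mathcal{R} \cap \mathcal{J}_{test}|} = \frac{p(\mathcal{J}_{test} \mid \mathcal{R}, R)}{p(\mathcal{J}_{test} \mid \mathcal{R})} \, Recall$
Fair estimates
Preservation of system comparisons: $P(R_1) \le P(R_2) \iff \hat{P}(R_1) \le \hat{P}(R_2)$; we say $\hat{P} \propto P$
Metric estimate preserves system comparison $\iff$ $p(\mathcal{J}_{test} \mid \mathcal{R}, R)$ is the same for all systems
Evaluation fairness condition
$\hat{P} \propto P \iff p(\mathcal{J}_{test} \mid \mathcal{R}, R) \sim p(\mathcal{J}_{test} \mid \mathcal{R}, \mathcal{T}) \;\; \forall R \subset \mathcal{T}$ (and the same for $\widehat{Recall} \propto Recall$)
Fair experiment $\iff$ test judgments are independently and identically distributed over relevant targets
Empirical fairness test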
• Take some sample $\mathcal{J} = \mathcal{J}_{train} \cup \mathcal{J}_{test}$ of user preferences (ratings, judgments, observed interactions)
• Null hypothesis: let user preferences $\mathcal{R}$ be random, i.e. uniformly and independently distributed over items (the sample $\mathcal{J}$ need not be)
• Run an experiment over $\mathcal{J}$, $\mathcal{R}$ for a set of recommendation algorithms
• If some system scores better than random recommendation, then your experiment is unfair (because of the data sampling/subsampling, the metric, etc.)
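The empirical fairness test can be tried on synthetic data. The following is a minimal sketch, not the authors' code: the sizes, the 0.2 relevance rate, the judgment probabilities, the 80/20 split and the name `simulate_null_experiment` are all assumptions made for illustration. Relevance is drawn uniformly at random (the null hypothesis), judgments are skewed towards a few items, and the target pairs are all non-training pairs.

```python
import random


def simulate_null_experiment(n_users=1000, n_items=100, k=10, seed=0):
    """Return {system: (true P@k, estimated P@k)} under the null hypothesis."""
    rng = random.Random(seed)

    # Null hypothesis: relevance is uniform and independent over all pairs.
    relevant = {(u, i) for u in range(n_users) for i in range(n_items)
                if rng.random() < 0.2}

    # Judgments are NOT uniform: low-index items are observed far more often
    # (hypothetical popularity skew).
    judge_prob = [0.8 / (1 + i) ** 0.5 for i in range(n_items)]
    train, test = set(), set()
    for u in range(n_users):
        for i in range(n_items):
            if rng.random() < judge_prob[i]:
                (train if rng.random() < 0.8 else test).add((u, i))

    # Popularity = number of training judgments per item (values ignored).
    pop = [0] * n_items
    for _, i in train:
        pop[i] += 1
    by_pop = sorted(range(n_items), key=lambda i: -pop[i])

    def recommend_popular(u):
        return [i for i in by_pop if (u, i) not in train][:k]

    def recommend_random(u):
        candidates = [i for i in range(n_items) if (u, i) not in train]
        return rng.sample(candidates, k)

    results = {}
    for name, rec in (("random", recommend_random),
                      ("popularity", recommend_popular)):
        pairs = [(u, i) for u in range(n_users) for i in rec(u)]
        # True precision uses all relevant pairs (known only in simulation).
        true_p = sum(p in relevant for p in pairs) / len(pairs)
        # Estimated precision only counts relevant pairs seen in the test sample.
        est_p = sum(p in relevant and p in test for p in pairs) / len(pairs)
        results[name] = (true_p, est_p)
    return results


results = simulate_null_experiment()
for name, (true_p, est_p) in results.items():
    print(f"{name}: true P@10 = {true_p:.3f}, estimated P@10 = {est_p:.3f}")
```

Under these assumptions both recommenders have a true precision near 0.2, so a gap between their estimated values would come from how test judgments are sampled rather than from the systems themselves.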
Analysis of common experimental protocols
Random rating split; flat test [1]; popularity strata [1]
1. Free user feedback
2. Randomized (forced) test judgments [2,3]
Null hypothesis recreation: taking e.g. MovieLens 1M, keep the ratings (judgment set $\mathcal{J}$) but shuffle the rating values (set $\mathcal{R}$) over them
→ User preferences become random, uniform and independent between users
→ The judgment distribution retains popularity biases and inter-user (and inter-item) dependencies
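The shuffling step can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function name `shuffle_rating_values` and the toy triples are hypothetical, and the toy list merely stands in for a real rating file such as MovieLens 1M, which is not loaded here.

```python
import random


def shuffle_rating_values(ratings, seed=0):
    """Permute rating values over a fixed set of (user, item) judgments.

    ratings: list of (user, item, value) triples.
    Returns a new list with the same (user, item) pairs in the same order,
    but with the values randomly reassigned among them.
    """
    rng = random.Random(seed)
    values = [value for _, _, value in ratings]
    rng.shuffle(values)
    return [(user, item, value)
            for (user, item, _), value in zip(ratings, values)]


# Hypothetical toy data standing in for a real rating dataset.
toy_ratings = [
    ("u1", "i1", 5), ("u1", "i2", 3),
    ("u2", "i1", 4), ("u2", "i3", 1),
    ("u3", "i2", 2),
]
null_ratings = shuffle_rating_values(toy_ratings, seed=1)
print(null_ratings)
```

Which pairs were judged is left untouched, so any popularity skew in the judgments survives the shuffle, while the values no longer depend on the user or the item.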
[Figure: judgments per item in $\mathcal{J}_{train}$ and $\mathcal{J}_{test}$ (x-axis: items; y-axis: judgments). Panels: original data (MovieLens 1M); null hypothesis (randomized MovieLens 1M preferences). Labels: Fair, Not fair.]
[Protocol figure labels: Not fully fair; Not fully fair; Fair. How fair?]
Conclusions
• Only randomized test judgments or $\mathcal{T} \leftarrow \mathcal{J}_{test}$ ensure fairness
→ But $\mathcal{T} \leftarrow \mathcal{J}_{test}$ is not as realistic as $\mathcal{T} \leftarrow \mathcal{U} \times \mathcal{I} \setminus \mathcal{J}_{train}$ (plus coverage shortfalls)
→ Forced judgments need to be handled with some care to be fully fair (see the paper)
• Other protocols are biased towards non-random patterns in the observations
→ Popularity, inter-user dependencies, etc. (average rating does not seem to be affected, though)
• We also examine experimental protocols analytically
→ The empirical fairness test is consistent with the analytical fairness condition
• Temporal split can usually be expected to remain biased
• Interleaved A/B tests should be fair
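The conclusion that restricting targets to the test sample ($\mathcal{T} \leftarrow \mathcal{J}_{test}$) ensures fairness can be illustrated under the null hypothesis. The sketch below is hypothetical and not the authors' code: the sizes, probabilities, cutoff `k=2` and the name `precision_on_test_targets` are assumptions. Each system ranks only the items in a user's test sample, so every recommended pair has a judgment.

```python
import random


def precision_on_test_targets(n_users=2000, n_items=100, k=2, seed=0):
    """Estimated P@k for random and popularity ranking when T <- J_test."""
    rng = random.Random(seed)

    # Popularity-skewed observation of judgments (hypothetical skew).
    judge_prob = [0.8 / (1 + i) ** 0.5 for i in range(n_items)]
    train_pop = [0] * n_items
    test_items = {u: [] for u in range(n_users)}
    for u in range(n_users):
        for i in range(n_items):
            if rng.random() < judge_prob[i]:
                if rng.random() < 0.8:
                    train_pop[i] += 1          # goes to the training sample
                else:
                    test_items[u].append(i)    # goes to the test sample

    hits = {"random": 0, "popularity": 0}
    slots = 0
    for u, items in test_items.items():
        # Null hypothesis: each pair is relevant independently with prob. 0.2.
        # Only test pairs can be recommended, so only they need a label.
        rel = {i for i in items if rng.random() < 0.2}
        top_pop = sorted(items, key=lambda i: -train_pop[i])[:k]
        top_rand = rng.sample(items, min(k, len(items)))
        hits["popularity"] += sum(i in rel for i in top_pop)
        hits["random"] += sum(i in rel for i in top_rand)
        # Users with fewer than k test items contribute fewer slots
        # (the coverage shortfall mentioned in the conclusions).
        slots += len(top_pop)
    return {name: h / slots for name, h in hits.items()}


res = precision_on_test_targets()
print(res)
```

Because every target is a test pair, the skewed way in which judgments were sampled cannot favor one ranking over another here, so under these assumptions both values should stay close to the 0.2 relevance rate.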
[Figure: P@10 of Random, Popularity, Positive popularity, Average rating, User-based kNN and Matrix factorization across the analyzed protocols, including $\mathcal{T} \leftarrow \mathcal{J}_{test}$ and a simulated random test sample of random preferences.]
1. A. Bellogín, P. Castells and I. Cantador. Statistical Biases in Information Retrieval Metrics for Recommender Systems. Information Retrieval 20(6), July 2017, pp. 606-634.
2. R. Cañamares and P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018, Ann Arbor, MI, USA, July 2018, pp. 415-424.
3. B. Marlin and R. Zemel. Collaborative prediction and ranking with nonrandom missing data. RecSys 2009, New York, NY, USA, October 2009, pp. 5-12.
[Diagram labels: test sample ($\mathcal{J}_{test}$); relevant pairs ($\mathcal{R}$)]

More Related Content

What's hot

Mechanical system design
Mechanical system designMechanical system design
Mechanical system designSushil Kuwar
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysissunilgv06
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationDmitry Grapov
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesYoga Setiawan
 
Factor analysis
Factor analysisFactor analysis
Factor analysis緯鈞 沈
 
Some statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisSome statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisUC Davis
 
Model validation strategies ftc 2018
Model validation strategies ftc 2018Model validation strategies ftc 2018
Model validation strategies ftc 2018Philip Ramsey
 
Evaluation in Africa RISING
Evaluation in Africa RISINGEvaluation in Africa RISING
Evaluation in Africa RISINGafrica-rising
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniquesDinul
 
The Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive ItemThe Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive Itembarthriley
 
Errors in chemical analysis
Errors in chemical analysisErrors in chemical analysis
Errors in chemical analysisUMAR ALI
 
Imputation techniques for missing data in clinical trials
Imputation techniques for missing data in clinical trialsImputation techniques for missing data in clinical trials
Imputation techniques for missing data in clinical trialsNitin George
 
Analyzing Responses to Likert Items
Analyzing Responses to Likert ItemsAnalyzing Responses to Likert Items
Analyzing Responses to Likert ItemsSanjay Kairam
 
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics SoftwareHarnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Softwarejatwood3
 

What's hot (20)

Item analysis
Item analysisItem analysis
Item analysis
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysis
 
Mechanical system design
Mechanical system designMechanical system design
Mechanical system design
 
Sensitivity analysis
Sensitivity analysisSensitivity analysis
Sensitivity analysis
 
Prote-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and VisualizationProte-OMIC Data Analysis and Visualization
Prote-OMIC Data Analysis and Visualization
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
poster_Reza
poster_Rezaposter_Reza
poster_Reza
 
Poster
PosterPoster
Poster
 
Matching methods
Matching methodsMatching methods
Matching methods
 
Doe01 intro
Doe01 introDoe01 intro
Doe01 intro
 
Factor analysis
Factor analysisFactor analysis
Factor analysis
 
Some statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisSome statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysis
 
Model validation strategies ftc 2018
Model validation strategies ftc 2018Model validation strategies ftc 2018
Model validation strategies ftc 2018
 
Evaluation in Africa RISING
Evaluation in Africa RISINGEvaluation in Africa RISING
Evaluation in Africa RISING
 
Specification based or black box techniques
Specification based or black box techniquesSpecification based or black box techniques
Specification based or black box techniques
 
The Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive ItemThe Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive Item
 
Errors in chemical analysis
Errors in chemical analysisErrors in chemical analysis
Errors in chemical analysis
 
Imputation techniques for missing data in clinical trials
Imputation techniques for missing data in clinical trialsImputation techniques for missing data in clinical trials
Imputation techniques for missing data in clinical trials
 
Analyzing Responses to Likert Items
Analyzing Responses to Likert ItemsAnalyzing Responses to Likert Items
Analyzing Responses to Likert Items
 
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics SoftwareHarnessing The Proteome With Proteo Iq Quantitative Proteomics Software
Harnessing The Proteome With Proteo Iq Quantitative Proteomics Software
 

Similar to REVEAL @ RecSys 2018 - Characterization of Fair Experiments for Recommender System Evaluation – A Formal Analysis

Recommender Systems Fairness Evaluation via Generalized Cross Entropy
Recommender Systems Fairness Evaluation via Generalized Cross EntropyRecommender Systems Fairness Evaluation via Generalized Cross Entropy
Recommender Systems Fairness Evaluation via Generalized Cross EntropyVito Walter Anelli
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPAlAcademia Tsr
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...Alejandro Bellogin
 
آلية التقييم الحسي للأغذية
آلية التقييم الحسي للأغذيةآلية التقييم الحسي للأغذية
آلية التقييم الحسي للأغذيةUniv. of Tripoli
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalJulián Urbano
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationScientificRevenue
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceAmit Sharma
 
Experimental Design.pptx
Experimental Design.pptxExperimental Design.pptx
Experimental Design.pptxOnlineWorld4
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3Luis Borbon
 
Computer simulation technique the definitive introduction - harry perros
Computer simulation technique   the definitive introduction - harry perrosComputer simulation technique   the definitive introduction - harry perros
Computer simulation technique the definitive introduction - harry perrosJesmin Rahaman
 
Converting Measurement Systems From Attribute
Converting Measurement Systems From AttributeConverting Measurement Systems From Attribute
Converting Measurement Systems From Attributejdavidgreen007
 
Data Ananlysis lecture 7 Simon Fraser University
Data Ananlysis lecture 7 Simon Fraser UniversityData Ananlysis lecture 7 Simon Fraser University
Data Ananlysis lecture 7 Simon Fraser Universitysoniyamarghani
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityJulián Urbano
 
Using evolutionary testing to improve efficiency and quality
Using evolutionary testing to improve efficiency and qualityUsing evolutionary testing to improve efficiency and quality
Using evolutionary testing to improve efficiency and qualityFaysal Ahmed
 
Practical Tools for Measurement Systems Analysis
Practical Tools for Measurement Systems AnalysisPractical Tools for Measurement Systems Analysis
Practical Tools for Measurement Systems AnalysisGabor Szabo, CQE
 
Evaluating Collaborative Filtering Recommender Systems
Evaluating Collaborative Filtering Recommender SystemsEvaluating Collaborative Filtering Recommender Systems
Evaluating Collaborative Filtering Recommender SystemsMegaVjohnson
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? HackerEarth
 

Similar to REVEAL @ RecSys 2018 - Characterization of Fair Experiments for Recommender System Evaluation – A Formal Analysis (20)

Recommender Systems Fairness Evaluation via Generalized Cross Entropy
Recommender Systems Fairness Evaluation via Generalized Cross EntropyRecommender Systems Fairness Evaluation via Generalized Cross Entropy
Recommender Systems Fairness Evaluation via Generalized Cross Entropy
 
SHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLPSHE, Quality, and Ethics in Medical Laboratories - PCLP
SHE, Quality, and Ethics in Medical Laboratories - PCLP
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
 
آلية التقييم الحسي للأغذية
آلية التقييم الحسي للأغذيةآلية التقييم الحسي للأغذية
آلية التقييم الحسي للأغذية
 
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information RetrievalValidity and Reliability of Cranfield-like Evaluation in Information Retrieval
Validity and Reliability of Cranfield-like Evaluation in Information Retrieval
 
Causal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous OptimizationCausal Inference, Reinforcement Learning, and Continuous Optimization
Causal Inference, Reinforcement Learning, and Continuous Optimization
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
 
Experimental Design.pptx
Experimental Design.pptxExperimental Design.pptx
Experimental Design.pptx
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3
 
Computer simulation technique the definitive introduction - harry perros
Computer simulation technique   the definitive introduction - harry perrosComputer simulation technique   the definitive introduction - harry perros
Computer simulation technique the definitive introduction - harry perros
 
Converting Measurement Systems From Attribute
Converting Measurement Systems From AttributeConverting Measurement Systems From Attribute
Converting Measurement Systems From Attribute
 
Data Ananlysis lecture 7 Simon Fraser University
Data Ananlysis lecture 7 Simon Fraser UniversityData Ananlysis lecture 7 Simon Fraser University
Data Ananlysis lecture 7 Simon Fraser University
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection Reliability
 
Using evolutionary testing to improve efficiency and quality
Using evolutionary testing to improve efficiency and qualityUsing evolutionary testing to improve efficiency and quality
Using evolutionary testing to improve efficiency and quality
 
Practical Tools for Measurement Systems Analysis
Practical Tools for Measurement Systems AnalysisPractical Tools for Measurement Systems Analysis
Practical Tools for Measurement Systems Analysis
 
Evaluating Collaborative Filtering Recommender Systems
Evaluating Collaborative Filtering Recommender SystemsEvaluating Collaborative Filtering Recommender Systems
Evaluating Collaborative Filtering Recommender Systems
 
Dissertation Aaron Tesch
Dissertation Aaron TeschDissertation Aaron Tesch
Dissertation Aaron Tesch
 
Software Testing
Software Testing Software Testing
Software Testing
 
Diss Pres
Diss PresDiss Pres
Diss Pres
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ?
 

More from Pablo Castells

Rational and irrational bias in recommendation
Rational and irrational bias in recommendationRational and irrational bias in recommendation
Rational and irrational bias in recommendationPablo Castells
 
Bias in recommendation: avoid it or embrace it?
Bias in recommendation: avoid it or embrace it?Bias in recommendation: avoid it or embrace it?
Bias in recommendation: avoid it or embrace it?Pablo Castells
 
RecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
RecSys 2020 - On Target Item Sampling in Offline Recommender System EvaluationRecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
RecSys 2020 - On Target Item Sampling in Offline Recommender System EvaluationPablo Castells
 
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...Pablo Castells
 
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...Pablo Castells
 
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...Pablo Castells
 
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender SystemsSIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender SystemsPablo Castells
 
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...Pablo Castells
 
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...Pablo Castells
 

More from Pablo Castells (9)

Rational and irrational bias in recommendation
Rational and irrational bias in recommendationRational and irrational bias in recommendation
Rational and irrational bias in recommendation
 
Bias in recommendation: avoid it or embrace it?
Bias in recommendation: avoid it or embrace it?Bias in recommendation: avoid it or embrace it?
Bias in recommendation: avoid it or embrace it?
 
RecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
RecSys 2020 - On Target Item Sampling in Offline Recommender System EvaluationRecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
RecSys 2020 - On Target Item Sampling in Offline Recommender System Evaluation
 
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...
SIGIR 2018 - Should I Follow the Crowd? A Probabilistic Analysis of the Effec...
 
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...
SIGIR 2017 - A Probabilistic Reformulation of Memory-Based Collaborative Filt...
 
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
RSWeb @ ACM RecSys 2014 - Exploring social network effects on popularity bias...
 
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender SystemsSIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
SIGIR 2011 Poster - Intent-Oriented Diversity in Recommender Systems
 
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrie...
SIGIR 2012 - Explicit Relevance Models in Intent-Oriented Information Retrie...
 
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
ACM RecSys 2011 - Rank and Relevance in Novelty and Diversity Metrics for Rec...
 

Recently uploaded

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 

Recently uploaded (20)

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
REVEAL @ RecSys 2018 - Characterization of Fair Experiments for Recommender System Evaluation – A Formal Analysis

Pablo Castells and Rocío Cañamares
Universidad Autónoma de Madrid, IRGroup @UAM
{pablo.castells,rocio.cannamares}@uam.es
Workshop on offline evaluation for recommender systems (REVEAL 2018) at the 12th ACM Conference on Recommender Systems (RecSys 2018)

Elements of an experiment
• All user-item pairs: $\mathcal{U} \times \mathcal{I}$
• Training sample: $\mathcal{J}_{train}$
• Test sample: $\mathcal{J}_{test}$, with $\mathcal{J}_{train} \cap \mathcal{J}_{test} = \emptyset$
• Target pairs: $\mathcal{T}$, with $\mathcal{J}_{test} \subseteq \mathcal{T} \subseteq \mathcal{U} \times \mathcal{I} \setminus \mathcal{J}_{train}$
• Relevant pairs: $\mathcal{R}$
• Recommendation: $R \subsetneq \mathcal{T}$

Metric definition
$P = \frac{|R \cap \mathcal{R}|}{|R|} = p(\mathcal{R} \mid R)$
$Recall = \frac{|R \cap \mathcal{R}|}{|\mathcal{R}|} = p(R \mid \mathcal{R})$

Metric estimates
$\hat{P} = \frac{|R \cap \mathcal{R} \cap \mathcal{J}_{test}|}{|R|} = p(\mathcal{J}_{test} \mid \mathcal{R}, R) \, P$
$\widehat{Recall} = \frac{|R \cap \mathcal{R} \cap \mathcal{J}_{test}|}{|\mathcal{R} \cap \mathcal{J}_{test}|} = \frac{p(\mathcal{J}_{test} \mid \mathcal{R}, R)}{p(\mathcal{J}_{test} \mid \mathcal{R})} \, Recall$

Evaluation fairness condition
Fair estimates
• Preservation of system comparisons: $P(R_1) \le P(R_2) \iff \hat{P}(R_1) \le \hat{P}(R_2)$; we then say $\hat{P} \propto P$.
• The metric estimate preserves system comparisons $\iff$ $p(\mathcal{J}_{test} \mid \mathcal{R}, R)$ is the same for all systems:
  $\hat{P} \propto P \iff p(\mathcal{J}_{test} \mid \mathcal{R}, R) \sim p(\mathcal{J}_{test} \mid \mathcal{R}, \mathcal{T}) \;\; \forall R \subset \mathcal{T}$ (and the same for $\widehat{Recall} \propto Recall$)
• Fair experiment $\iff$ test judgments are identically and independently distributed over relevant targets.

Empirical fairness test
• Take some sample $\mathcal{J} = \mathcal{J}_{train} \cup \mathcal{J}_{test}$ of user preferences (ratings, judgments, observed interaction).
• Null hypothesis: let user preferences $\mathcal{R}$ be random, i.e. uniformly and independently distributed over items (the sample $\mathcal{J}$ need not be).
• Run an experiment over $\mathcal{J}$, $\mathcal{R}$ for a set of recommendation algorithms.
• If some system scores better than random recommendation, then your experiment is unfair (in the data sampling/subsampling, the metric, etc.).

Null hypothesis recreation: taking e.g. MovieLens 1M, keep the ratings (judgment set $\mathcal{J}$) but shuffle the rating values (the $\mathcal{R}$ set) over the ratings.
→ User preferences become random, uniform and independent between users.
→ The judgment distribution retains popularity biases and inter-user (and inter-item) dependencies.

[Figure: judgments per item, split into $\mathcal{J}_{train}$ and $\mathcal{J}_{test}$, for the original data (MovieLens 1M) and for the null hypothesis (randomized MovieLens 1M preferences). Axes: Items, Judgments.]

Analysis of common experimental protocols
1. Free user feedback: random rating split, flat test [1], popularity strata [1]
2. Randomized (forced) test judgments [2,3]

[Figure: P@10 of Random, Popularity, Positive popularity, Average rating, User-based kNN and Matrix factorization under each protocol, including $\mathcal{T} \leftarrow \mathcal{J}_{test}$ variants and a simulated random test sample of random preferences. Panel verdict labels, in the order they appear: Fair, Not fair, Not fully fair, Not fully fair, Fair, How fair?]

Conclusions
• Only randomized test judgments or $\mathcal{T} \leftarrow \mathcal{J}_{test}$ ensure fairness.
  → But $\mathcal{T} \leftarrow \mathcal{J}_{test}$ is not as realistic as $\mathcal{T} \leftarrow \mathcal{U} \times \mathcal{I} \setminus \mathcal{J}_{train}$ (plus coverage shortfalls).
  → Forced judgments must be handled with some care to be fully fair (see the paper).
• Other protocols are biased towards non-random patterns in the observations.
  → Popularity, inter-user dependencies, etc. (average rating would not seem to be affected, though).
• We also examine experimental protocols analytically.
  → The empirical fairness test is consistent with the analytical fairness condition.
• A temporal split can usually be expected to remain biased.
• Interleaved A/B tests should be fair.

References
1. A. Bellogín, P. Castells and I. Cantador. Statistical Biases in Information Retrieval Metrics for Recommender Systems. Information Retrieval 20(6), July 2017, pp. 606-634.
2. R. Cañamares and P. Castells. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018, Ann Arbor, MI, USA, July 2018, pp. 415-424.
3. B. Marlin and R. Zemel. Collaborative prediction and ranking with nonrandom missing data. RecSys 2009, New York, NY, USA, October 2009, pp. 5-12.
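The null hypothesis recreation used in the empirical fairness test (keep the judged user-item pairs, shuffle the rating values over them) can be sketched roughly as below. This is an illustrative sketch, not the authors' code: the function names, the `(user, item, value)` tuple layout, the relevance threshold of 4 and the simple popularity recommender are all assumptions made for the example.

```python
import random
from collections import Counter


def shuffle_rating_values(ratings, seed=0):
    """Keep the judged (user, item) pairs but permute the rating values among them.

    `ratings` is a list of (user, item, value) tuples (assumed layout).
    The judgment set J is unchanged; only the preferences become random.
    """
    rng = random.Random(seed)
    values = [value for _, _, value in ratings]
    rng.shuffle(values)
    return [(user, item, value) for (user, item, _), value in zip(ratings, values)]


def precision_estimate(recommended, test_ratings, threshold=4):
    """Estimated precision: |R ∩ ℛ ∩ J_test| / |R|.

    A test pair counts as relevant when its value is >= threshold
    (the threshold is an assumption of this sketch).
    """
    relevant_in_test = {(u, i) for u, i, v in test_ratings if v >= threshold}
    return len(set(recommended) & relevant_in_test) / len(recommended)


def popularity_recommendation(train_ratings, users, items, k):
    """Recommend to each user the k most-judged training items not yet judged by that user."""
    counts = Counter(i for _, i, _ in train_ratings)
    seen = {(u, i) for u, i, _ in train_ratings}
    ranked = sorted(items, key=lambda i: -counts[i])  # stable sort keeps ties in input order
    recommended = []
    for u in users:
        candidates = [i for i in ranked if (u, i) not in seen]
        recommended.extend((u, i) for i in candidates[:k])
    return recommended
```

To apply the test, one would shuffle the values of the full rating set, split it into training and test samples with the protocol under study, and compare the estimated precision of each recommender against a random recommender. Under the null hypothesis, a recommender that clearly beats random recommendation points to an unfair protocol.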