Replicable Evaluation of
Recommender Systems
Alejandro Bellogín (Universidad Autónoma de Madrid, Spain)
Alan Said (Recorded Future, Sweden)
Tutorial at ACM RecSys 2015
Stephansdom
#EVALTUT
8
Outline
• Background and Motivation [10 minutes]
• Evaluating Recommender Systems [20 minutes]
• Replicating Evaluation Results [20 minutes]
• Replication by Example [20 minutes]
• Conclusions and Wrap-up [10 minutes]
• Questions [10 minutes]
9
Background
• A recommender system aims to find and
suggest items of likely interest based on the
users’ preferences
11
Background
• A recommender system aims to find and
suggest items of likely interest based on the
users’ preferences
• Examples:
– Netflix: TV shows and movies
– Amazon: products
– LinkedIn: jobs and colleagues
– Last.fm: music artists and tracks
– Facebook: friends
13
Background
• Typically, the interactions between user and
system are recorded in the form of ratings
– But also: clicks (implicit feedback)
• This is represented as a user-item matrix:
i1 … ik … im
u1
…
uj ?
…
un
14
Motivation
• Evaluation is an integral part of any
experimental research area
• It allows us to compare methods…
• … and identify winners (in competitions)
16
Motivation
A proper evaluation culture allows the field to
advance
… or, at least, to identify when there is a problem!
17
Motivation
In RecSys, we find inconsistent evaluation results,
for the “same”
– Dataset
– Algorithm
– Evaluation metric
Movielens 1M
[Cremonesi et al, 2010]
Movielens 100k
[Gorla et al, 2013]
Movielens 1M
[Yin et al, 2012]
Movielens 100k, SVD
[Jambor & Wang, 2010]
18
Motivation
In RecSys, we find inconsistent evaluation results,
for the “same”
– Dataset
– Algorithm
– Evaluation metric
[Figure: P@50 of SVD50, IB, and UB50 under different candidate item selection strategies (TR 3, TR 4, TeI, TrI, AI, OPR)]
[Bellogín et al, 2011]
19
Motivation
[Figure: P@50 of SVD50, IB, and UB50 under different candidate item selection strategies (TR 3, TR 4, TeI, TrI, AI, OPR)]
We need to understand why this happens
20
In this tutorial
• We will present the basics of evaluation
– Accuracy metrics: error-based, ranking-based
– Also coverage, diversity, and novelty
• We will focus on replication and reproducibility
– Define the context
– Present typical problems
– Propose some guidelines
21
Replicability
• Why do we need to
replicate?
22
Reproducibility
Why do we need to
reproduce?
Because these two are not
the same
23
NOT in this tutorial
• In-depth analysis of evaluation metrics
– See chapter 9 of the handbook [Shani & Gunawardana, 2011]
• Novel evaluation dimensions
– See tutorials at WSDM ’14 and SIGIR ‘13 on
diversity and novelty
• User evaluation
– See tutorial at RecSys 2012
• Comparison of evaluation results in research
– See RepSys workshop at RecSys 2013
– See [Said & Bellogín 2014]
24
Outline
• Background and Motivation [10 minutes]
• Evaluating Recommender Systems [20 minutes]
• Replicating Evaluation Results [20 minutes]
• Replication by Example [20 minutes]
• Conclusions and Wrap-up [10 minutes]
• Questions [10 minutes]
25
Recommender Systems Evaluation
Typically: as a black box
[Diagram: a Dataset is split into Training, Validation, and Test sets; the Recommender generates a ranking (for a user) or a prediction for a given item (and user), which is evaluated with metrics such as precision, error, coverage, …]
26
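Even the splitting step hides several choices (random vs. temporal, global vs. per-user, holdout ratio). A minimal sketch of one common protocol, per-user random holdout, follows; the function name and data layout are illustrative, not the tutorial's code, and the fixed seed is what makes the split itself replicable:

```python
import random

def per_user_split(ratings, test_ratio=0.2, seed=42):
    """Random per-user holdout: hold out test_ratio of each user's ratings.

    ratings: dict mapping user -> list of (item, rating) pairs.
    Returns (train, test) as lists of (user, item, rating) triples.
    """
    rng = random.Random(seed)  # fixed seed so the split is replicable
    train, test = [], []
    for user, pairs in ratings.items():
        pairs = list(pairs)
        rng.shuffle(pairs)
        cut = int(len(pairs) * (1 - test_ratio))
        train += [(user, i, r) for i, r in pairs[:cut]]
        test += [(user, i, r) for i, r in pairs[cut:]]
    return train, test
```

Reporting the protocol, the ratio, and the seed (or publishing the split files themselves) is what lets others reproduce this step.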
Recommender Systems Evaluation
The reproducible way: as black boxes
[Diagram: the same evaluation pipeline, with each stage (splitting, recommendation, candidate generation, metric computation) treated as its own black box]
27
Recommender as a black box
What do you do when a recommender cannot
predict a score?
This has an impact on coverage
[Said & Bellogín, 2014] 28
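Coverage can be made an explicit, reportable quantity rather than an implicit side effect of unpredicted scores. A small sketch (the function names and data layout are illustrative):

```python
def prediction_coverage(predictions):
    """Fraction of requested (user, item) scores the recommender
    could actually predict.

    predictions: dict mapping (user, item) -> score, or None when the
    recommender could not produce a prediction.
    """
    produced = sum(1 for s in predictions.values() if s is not None)
    return produced / len(predictions)

def catalog_coverage(recommended_lists, all_items):
    """Fraction of the catalog that appears in at least one
    recommendation list."""
    covered = set()
    for items in recommended_lists:
        covered.update(items)
    return len(covered & set(all_items)) / len(all_items)
```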
Candidate item generation
as a black box
How do you select the candidate items to be
ranked?
[Figure: example rating matrix with predicted scores; the solid triangle represents the target user, boxed ratings denote the test set]
29
How do you select the candidate items to be
ranked?
[Said & Bellogín, 2014] 30
Candidate item generation
as a black box
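The candidate selection strategies named above (TeI, TrI, AI, OPR) can be sketched in code; the definitions below are a loose reading of Bellogín et al. (2011), not the authors' implementation:

```python
import random

def candidate_items(user, train, test, all_items, strategy, n_random=100, seed=42):
    """Candidate items to rank for a user under several common strategies.

    train/test: dicts mapping user -> set of items.
    Strategy semantics here are assumptions, loosely following
    Bellogín et al. (2011).
    """
    seen = train.get(user, set())
    if strategy == "TeI":    # items rated by anyone in the test set
        pool = set().union(*test.values())
    elif strategy == "TrI":  # items rated by anyone in the training set
        pool = set().union(*train.values())
    elif strategy == "AI":   # the whole catalog
        pool = set(all_items)
    elif strategy == "OPR":  # one relevant item plus N random unrated items
        rng = random.Random(seed)
        relevant = next(iter(test[user]))
        unrated = sorted(set(all_items) - seen - test[user])
        return {relevant} | set(rng.sample(unrated, min(n_random, len(unrated))))
    else:
        raise ValueError(strategy)
    return pool - seen  # never rank items the user already rated in training
```

The point of the slide stands: the same recommender scored over these different pools yields very different P@k values, so the chosen strategy must be reported.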
Evaluation metric computation
as a black box
What do you do when a recommender cannot
predict a score?
– This has an impact on coverage
– It can also affect error-based metrics
MAE = Mean Absolute Error
RMSE = Root Mean Squared Error 31
Evaluation metric computation
as a black box
What do you do when a recommender cannot
predict a score?
– This has an impact on coverage
– It can also affect error-based metrics
User-item pairs   Real   Rec1   Rec2   Rec3
(u1, i1)            5      4    NaN      4
(u1, i2)            3      2      4    NaN
(u1, i3)            1      1    NaN      1
(u2, i1)            3      2      4    NaN

                            Rec1       Rec2       Rec3
MAE/RMSE, ignoring NaNs  0.75/0.87  1.00/1.00  0.50/0.70
MAE/RMSE, NaNs as 0      0.75/0.87  2.00/2.65  1.75/2.18
MAE/RMSE, NaNs as 3      0.75/0.87  1.50/1.58  0.25/0.50
32
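The computation behind such a table can be reproduced in a few lines; the point is that the NaN policy must be stated explicitly, since each policy can produce a different "winner". A sketch (the policy names and function are ours, not a framework's API):

```python
import math

def mae_rmse(real, pred, policy="ignore", fill=0.0):
    """MAE and RMSE with an explicit policy for unpredicted (None) scores:
    either skip those user-item pairs or substitute a fixed value."""
    errors = []
    for r, p in zip(real, pred):
        if p is None:
            if policy == "ignore":
                continue
            p = fill  # substitute a fixed value for the missing prediction
        errors.append(abs(r - p))
    if not errors:
        return float("nan"), float("nan")
    mae = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```

For example, `mae_rmse([5, 3, 1, 3], [4, None, 1, None], "fill", fill=3.0)` recovers Rec3's last row, while the same predictions under `policy="ignore"` give a much better score.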
Using internal evaluation methods in Mahout
(AM), LensKit (LK), and MyMediaLite (MML)
[Said & Bellogín, 2014] 33
Evaluation metric computation
as a black box
Variations on metrics:
Error-based metrics can be normalized or averaged
per user:
– Normalize RMSE or MAE by the range of the
ratings (divide by rmax – rmin)
– Average RMSE or MAE to compensate for
unbalanced distributions of items or users
34
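These two variations might be sketched as follows (the helper names are hypothetical, and the per-user version illustrates why heavy raters can dominate a pooled average):

```python
def nmae(mae, r_min, r_max):
    """Normalized MAE: divide by the rating range so scores are
    comparable across datasets with different rating scales."""
    return mae / (r_max - r_min)

def per_user_mae(errors_by_user):
    """Average the MAE per user first, then across users, so users with
    many ratings do not dominate the global average.

    errors_by_user: dict mapping user -> list of absolute errors.
    """
    user_maes = [sum(errs) / len(errs) for errs in errors_by_user.values()]
    return sum(user_maes) / len(user_maes)
```

With errors {u1: [1, 1, 1, 1], u2: [3]}, the pooled MAE is 1.4 but the per-user average is 2.0: the two conventions are not interchangeable, so papers must say which one they report.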
Evaluation metric computation
as a black box
Variations on metrics:
nDCG has at least two discounting functions
(linear and exponential decay)
35
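The two variants as they are commonly implemented (the gain is either the raw relevance, often called "linear", or 2^rel − 1, often called "exponential"; the exact naming on the slide is assumed to match this convention):

```python
import math

def dcg(rels, exponential=False):
    """DCG with a log2 position discount; rels are relevance grades in
    ranking order."""
    gain = (lambda r: 2 ** r - 1) if exponential else (lambda r: r)
    return sum(gain(r) / math.log2(pos + 2) for pos, r in enumerate(rels))

def ndcg(rels, exponential=False):
    """DCG normalized by the DCG of the ideal (best possible) ordering."""
    ideal = sorted(rels, reverse=True)
    return dcg(rels, exponential) / dcg(ideal, exponential)
```

The two variants give different numbers for the same ranking (and can even change which system wins), which is one more reason to report the exact formula used.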
Evaluation metric computation
as a black box
Variations on metrics:
Ranking-based metrics are usually computed up to
a ranking position or cutoff k
P = Precision (Precision at k)
R = Recall (Recall at k)
MAP = Mean Average Precision 36
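The cutoff metrics above can be sketched as (illustrative, set-based relevance):

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items retrieved within the top k."""
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where relevant items
    appear; averaging this over users gives MAP."""
    hits, total = 0, 0.0
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / pos
    return total / len(relevant)
```

Note that even here there are variants (e.g. whether AP is also truncated at k, and what the denominator is when fewer than k relevant items exist), so the cutoff and the convention both need to be reported.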
Evaluation metric computation
as a black box
If ties are present in the ranking scores, results
may depend on the implementation
37
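A toy demonstration of the tie problem (scores and items are hypothetical): two implementations that both "sort by score" but break the tie between equally-scored items differently produce different P@1 for the very same predictions.

```python
def precision_at_k(ranked, relevant, k):
    return sum(1 for item in ranked[:k] if item in relevant) / k

scores = {"a": 0.9, "b": 0.9, "c": 0.5}  # "a" and "b" are tied
relevant = {"b"}

# Tie broken by item id ascending: "a" comes first.
ties_by_id = sorted(scores, key=lambda i: (-scores[i], i))
# Stable re-sort of a reversed id order: "b" comes first.
ties_reversed = sorted(sorted(scores, reverse=True), key=lambda i: -scores[i])

print(precision_at_k(ties_by_id, relevant, 1))     # 0.0
print(precision_at_k(ties_reversed, relevant, 1))  # 1.0
```

Deterministic, documented tie-breaking (or averaging over permutations of tied items) is needed before results from different toolkits can be compared.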
Evaluation metric computation
as a black box
[Bellogín et al, 2013]
Not clear how to measure diversity/novelty in
offline experiments (directly measured in online
experiments):
– Using a taxonomy (items about novel topics) [Weng
et al, 2007]
– New items over time [Lathia et al, 2010]
– Based on entropy, self-information, and Kullback-Leibler divergence [Bellogín et al, 2010; Zhou et al, 2010; Filippone & Sanguinetti, 2010]
38
Evaluation metric computation
as a black box
Recommender Systems Evaluation:
Summary
• Usually, evaluation seen as a black box
• The evaluation process involves everything:
splitting, recommendation, candidate item
generation, and metric computation
• We should agree on standard implementations,
parameters, instantiations, …
– Example: trec_eval in IR
39
Outline
• Background and Motivation [10 minutes]
• Evaluating Recommender Systems [20 minutes]
• Replicating Evaluation Results [20 minutes]
• Replication by Example [20 minutes]
• Conclusions and Wrap-up [10 minutes]
• Questions [10 minutes]
40
Reproducible Experimental Design
• We need to distinguish
– Replicability
– Reproducibility
• Different aspects:
– Algorithmic
– Published results
– Experimental design
• Goal: have a reproducible experimental
environment
41
Definition:
Replicability
To copy something
• The results
• The data
• The approach
Being able to evaluate
in the same setting
and obtain the same
results
42
Definition:
Reproducibility
To recreate something
• The (complete) set
of experiments
• The (complete) set
of results
• The (complete)
experimental setup
To (re) launch it in
production with the
same results
43
Comparing against the state-of-the-art
Your settings are not exactly like those in paper X, but it is a relevant paper.
[Flowchart:]
Reproduce the results of paper X. Do the results agree with the original paper?
– Yes: congrats, you’re done!
– No: replicate the results of paper X. Do the results match the original paper?
  • They agree: congrats! You have shown that paper X behaves differently in the new setting.
  • They do not agree: sorry, there is something wrong/incomplete in the experimental design.
44
What about Reviewer 3?
• “It would be interesting to see this done on a
different dataset…”
– Repeatability
– The same person doing the whole pipeline over
again
• “How does your approach compare to
[Reviewer 3 et al. 2003]?”
– Reproducibility or replicability (depending on how
similar the two papers are)
45
Repeat vs. replicate vs. reproduce vs. reuse
46
Motivation for reproducibility
In order to ensure that our experiments,
settings, and results are:
– Valid
– Generalizable
– Of use for others
– etc.
we must make sure that others can reproduce
our experiments in their setting
47
Making reproducibility easier
• Description, description,
description
• No magic numbers
• Specify values for all parameters
• Motivate!
• Keep a detailed protocol
• Describe process clearly
• Use standards
• Publish code (nobody expects
you to be an awesome
developer, you’re a researcher)
48
Replicability, reproducibility, and progress
• Can there be actual progress if no valid
comparison can be done?
• What is the point of comparing two
approaches if the comparison is flawed?
• How do replicability and reproducibility
facilitate actual progress in the field?
49
Summary
• Important issues in recommendation
– Validity of results (replicability)
– Comparability of results (reproducibility)
– Validity of experimental setup (repeatability)
• We need to incorporate reproducibility and
replication to facilitate the progress in the field
50
Outline
• Background and Motivation [10 minutes]
• Evaluating Recommender Systems [20 minutes]
• Replicating Evaluation Results [20 minutes]
• Replication by Example [20 minutes]
• Conclusions and Wrap-up [10 minutes]
• Questions [10 minutes]
51
Replication by Example
• Demo time!
• Check
– http://www.recommenders.net/tutorial
• Checkout
– https://github.com/recommenders/tutorial.git
52
The things we write
mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation"
53
The things we forget to write
mvn -o exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation"
-Dexec.args="-u false"
54
mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation"
The things we forget to write
mvn -o exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation"
-Dexec.args="-t 4.0"
55
mvn -o exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation"
-Dexec.args="-u false"
mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation"
Outline
• Background and Motivation [10 minutes]
• Evaluating Recommender Systems [20 minutes]
• Replicating Evaluation Results [20 minutes]
• Replication by Example [20 minutes]
• Conclusions and Wrap-up [10 minutes]
• Questions [10 minutes]
56
Key Takeaways
• Every decision has an impact
– We should log every step taken in the
experimental part and report that log
• There are more things besides papers
– Source code, web appendix, etc. are very useful to
provide additional details not present in the paper
• You should not fool yourself
– You have to be critical about what you measure
and not trust intermediate “black boxes”
57
We must avoid this
From http://dilbert.com/strips/comic/2010-11-07/
58
Next steps?
• Agree on standard implementations
• Replicable badges for journals / conferences
59
Next steps?
• Agree on standard implementations
• Replicable badges for journals / conferences
http://validation.scienceexchange.com
The aim of the Reproducibility Initiative is to identify and reward high
quality reproducible research via independent validation of key
experimental results
60
Next steps?
• Agree on standard implementations
• Replicable badges for journals / conferences
• Investigate how to improve reproducibility
61
Next steps?
• Agree on standard implementations
• Replicable badges for journals / conferences
• Investigate how to improve reproducibility
• Benchmark, report, and store results
62
Pointers
• Email and Twitter
– Alejandro Bellogín
• alejandro.bellogin@uam.es
• @abellogin
– Alan Said
• alansaid@acm.org
• @alansaid
• Slides:
• In Slideshare... soon!
63
RiVal
Recommender System Evaluation Toolkit
http://rival.recommenders.net
http://github.com/recommenders/rival
64
Thank you!
65
References and Additional reading
• [Armstrong et al, 2009] Improvements That Don’t Add Up: Ad-Hoc Retrieval Results Since
1998. CIKM
• [Bellogín et al, 2010] A Study of Heterogeneity in Recommendations for a Social Music
Service. HetRec
• [Bellogín et al, 2011] Precision-Oriented Evaluation of Recommender Systems: an Algorithm
Comparison. RecSys
• [Bellogín et al, 2013] An Empirical Comparison of Social, Collaborative Filtering, and Hybrid
Recommenders. ACM TIST
• [Ben-Shimon et al, 2015] RecSys Challenge 2015 and the YOOCHOOSE Dataset. RecSys
• [Cremonesi et al, 2010] Performance of Recommender Algorithms on Top-N
Recommendation Tasks. RecSys
• [Filippone & Sanguinetti, 2010] Information Theoretic Novelty Detection. Pattern
Recognition
• [Fleder & Hosanagar, 2009] Blockbuster Culture’s Next Rise or Fall: The Impact of
Recommender Systems on Sales Diversity. Management Science
• [Ge et al, 2010] Beyond accuracy: evaluating recommender systems by coverage and
serendipity. RecSys
• [Gorla et al, 2013] Probabilistic Group Recommendation via Information Matching. WWW
66
References and Additional reading
• [Herlocker et al, 2004] Evaluating Collaborative Filtering Recommender Systems. ACM
Transactions on Information Systems
• [Jambor & Wang, 2010] Goal-Driven Collaborative Filtering. ECIR
• [Knijnenburg et al, 2011] A Pragmatic Procedure to Support the User-Centric Evaluation of
Recommender Systems. RecSys
• [Koren, 2008] Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering
Model. KDD
• [Lathia et al, 2010] Temporal Diversity in Recommender Systems. SIGIR
• [Li et al, 2010] Improving One-Class Collaborative Filtering by Incorporating Rich User
Information. CIKM
• [Pu et al, 2011] A User-Centric Evaluation Framework for Recommender Systems. RecSys
• [Said & Bellogín, 2014] Comparative Recommender System Evaluation: Benchmarking
Recommendation Frameworks. RecSys
• [Schein et al, 2002] Methods and Metrics for Cold-Start Recommendations. SIGIR
• [Shani & Gunawardana, 2011] Evaluating Recommendation Systems. Recommender Systems
Handbook
• [Steck & Xin, 2010] A Generalized Probabilistic Framework and its Variants for Training Top-k
Recommender Systems. PRSAT
67
References and Additional reading
• [Tikk et al, 2014] Comparative Evaluation of Recommender Systems for Digital Media. IBC
• [Vargas & Castells, 2011] Rank and Relevance in Novelty and Diversity Metrics for
Recommender Systems. RecSys
• [Weng et al, 2007] Improving Recommendation Novelty Based on Topic Taxonomy. WI-IAT
• [Yin et al, 2012] Challenging the Long Tail Recommendation. VLDB
• [Zhang & Hurley, 2008] Avoiding Monotony: Improving the Diversity of Recommendation
Lists. RecSys
• [Zhang & Hurley, 2009] Statistical Modeling of Diversity in Top-N Recommender Systems. WI-
IAT
• [Zhou et al, 2010] Solving the Apparent Diversity-Accuracy Dilemma of Recommender
Systems. PNAS
• [Ziegler et al, 2005] Improving Recommendation Lists Through Topic Diversification. WWW
68
Rank-score (Half-Life Utility)
69
Mean Reciprocal Rank
70
Mean Percentage Ranking
[Li et al, 2010]
71
Global ROC
[Schein et al, 2002]
72
Customer ROC
[Schein et al, 2002]
73
Popularity-stratified recall
[Steck & Xin, 2010]
74
More Related Content

What's hot

Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemAnoop Deoras
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender SystemsYves Raimond
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsJustin Basilico
 
Facebook Talk at Netflix ML Platform meetup Sep 2019
Facebook Talk at Netflix ML Platform meetup Sep 2019Facebook Talk at Netflix ML Platform meetup Sep 2019
Facebook Talk at Netflix ML Platform meetup Sep 2019Faisal Siddiqi
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized HomepageJustin Basilico
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at NetflixLinas Baltrunas
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...Alejandro Bellogin
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsYves Raimond
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsJustin Basilico
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsJaya Kawale
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Alexandros Karatzoglou
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixJaya Kawale
 
Personalizing the listening experience
Personalizing the listening experiencePersonalizing the listening experience
Personalizing the listening experienceMounia Lalmas-Roelleke
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectiveXavier Amatriain
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introductionLiang Xiang
 
Supporting decisions with ML
Supporting decisions with MLSupporting decisions with ML
Supporting decisions with MLMegan Neider
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Faisal Siddiqi
 

What's hot (20)

Shallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender SystemShallow and Deep Latent Models for Recommender System
Shallow and Deep Latent Models for Recommender System
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Déjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender SystemsDéjà Vu: The Importance of Time and Causality in Recommender Systems
Déjà Vu: The Importance of Time and Causality in Recommender Systems
 
Recent Trends in Personalization at Netflix
Recent Trends in Personalization at NetflixRecent Trends in Personalization at Netflix
Recent Trends in Personalization at Netflix
 
Facebook Talk at Netflix ML Platform meetup Sep 2019
Facebook Talk at Netflix ML Platform meetup Sep 2019Facebook Talk at Netflix ML Platform meetup Sep 2019
Facebook Talk at Netflix ML Platform meetup Sep 2019
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
 
Contextualization at Netflix
Contextualization at NetflixContextualization at Netflix
Contextualization at Netflix
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
 
Time, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender SystemsTime, Context and Causality in Recommender Systems
Time, Context and Causality in Recommender Systems
 
Personalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing RecommendationsPersonalized Page Generation for Browsing Recommendations
Personalized Page Generation for Browsing Recommendations
 
Sequential Decision Making in Recommendations
Sequential Decision Making in RecommendationsSequential Decision Making in Recommendations
Sequential Decision Making in Recommendations
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial
 
A Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at NetflixA Multi-Armed Bandit Framework For Recommendations at Netflix
A Multi-Armed Bandit Framework For Recommendations at Netflix
 
Personalizing the listening experience
Personalizing the listening experiencePersonalizing the listening experience
Personalizing the listening experience
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
Learning to Personalize
Learning to PersonalizeLearning to Personalize
Learning to Personalize
 
Recommender system introduction
Recommender system   introductionRecommender system   introduction
Recommender system introduction
 
Supporting decisions with ML
Supporting decisions with MLSupporting decisions with ML
Supporting decisions with ML
 
Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019Netflix talk at ML Platform meetup Sep 2019
Netflix talk at ML Platform meetup Sep 2019
 

Viewers also liked

Setting Goals and Choosing Metrics for Recommender System Evaluations
Setting Goals and Choosing Metrics for Recommender System EvaluationsSetting Goals and Choosing Metrics for Recommender System Evaluations
Setting Goals and Choosing Metrics for Recommender System EvaluationsGunnar Fabritius (Schroeder)
 
Automatic Selection of Linked Open Data features in Graph-based Recommender S...
Automatic Selection of Linked Open Data features in Graph-based Recommender S...Automatic Selection of Linked Open Data features in Graph-based Recommender S...
Automatic Selection of Linked Open Data features in Graph-based Recommender S...Cataldo Musto
 
Case-based Recommender Systems for Personalized Finance Advisory
Case-based Recommender Systems for Personalized Finance AdvisoryCase-based Recommender Systems for Personalized Finance Advisory
Case-based Recommender Systems for Personalized Finance AdvisoryCataldo Musto
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRoelof Pieters
 
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...Anmol Bhasin
 
Machine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyMachine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyAlexandros Karatzoglou
 
Introduction to behavior based recommendation system
Introduction to behavior based recommendation systemIntroduction to behavior based recommendation system
Introduction to behavior based recommendation systemKimikazu Kato
 
Artificial Intelligence as an Interface - How Conversation Bots Are Changing ...
Artificial Intelligence as an Interface - How Conversation Bots Are Changing ...Artificial Intelligence as an Interface - How Conversation Bots Are Changing ...
Artificial Intelligence as an Interface - How Conversation Bots Are Changing ...Sage Franch
 
Design of recommender systems
Design of recommender systemsDesign of recommender systems
Design of recommender systemsRashmi Sinha
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectiveXavier Amatriain
 
Deep Learning for Recommender Systems - Budapest RecSys Meetup
Deep Learning for Recommender Systems  - Budapest RecSys MeetupDeep Learning for Recommender Systems  - Budapest RecSys Meetup
Deep Learning for Recommender Systems - Budapest RecSys MeetupAlexandros Karatzoglou
 
Recommender system
Recommender systemRecommender system
Recommender systemJie Jin
 
Kdd 2014 Tutorial - the recommender problem revisited
Kdd 2014 Tutorial -  the recommender problem revisitedKdd 2014 Tutorial -  the recommender problem revisited
Kdd 2014 Tutorial - the recommender problem revisitedXavier Amatriain
 
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems -  ACM RecSys 2013 tutorialLearning to Rank for Recommender Systems -  ACM RecSys 2013 tutorial
Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorialAlexandros Karatzoglou
 
The Universal Recommender
The Universal RecommenderThe Universal Recommender
The Universal RecommenderPat Ferrel
 
Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architectureLiang Xiang
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyChris Johnson
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at SpotifyErik Bernhardsson
 

Viewers also liked (20)

Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Setting Goals and Choosing Metrics for Recommender System Evaluations
Setting Goals and Choosing Metrics for Recommender System EvaluationsSetting Goals and Choosing Metrics for Recommender System Evaluations
Setting Goals and Choosing Metrics for Recommender System Evaluations
 
Automatic Selection of Linked Open Data features in Graph-based Recommender S...
Automatic Selection of Linked Open Data features in Graph-based Recommender S...Automatic Selection of Linked Open Data features in Graph-based Recommender S...
Automatic Selection of Linked Open Data features in Graph-based Recommender S...
 
Case-based Recommender Systems for Personalized Finance Advisory
Case-based Recommender Systems for Personalized Finance AdvisoryCase-based Recommender Systems for Personalized Finance Advisory
Case-based Recommender Systems for Personalized Finance Advisory
 
Recommender Systems, Matrices and Graphs
Recommender Systems, Matrices and GraphsRecommender Systems, Matrices and Graphs
Recommender Systems, Matrices and Graphs
 
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...
 
Machine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 SydneyMachine Learning for Recommender Systems MLSS 2015 Sydney
Machine Learning for Recommender Systems MLSS 2015 Sydney
 
Introduction to behavior based recommendation system
Introduction to behavior based recommendation systemIntroduction to behavior based recommendation system
Introduction to behavior based recommendation system
 
Artificial Intelligence as an Interface - How Conversation Bots Are Changing ...
Artificial Intelligence as an Interface - How Conversation Bots Are Changing ...

Design of recommender systems

Past present and future of Recommender Systems: an Industry Perspective

Deep Learning for Recommender Systems - Budapest RecSys Meetup

Recommender system

Kdd 2014 Tutorial - the recommender problem revisited

Interactive Recommender Systems

Learning to Rank for Recommender Systems - ACM RecSys 2013 tutorial

The Universal Recommender

Recommender system algorithm and architecture

Algorithmic Music Recommendations at Spotify

Collaborative Filtering at Spotify

Similar to Replicable Evaluation of Recommender Systems

Caveon webinar series Standard Setting for the 21st Century, Using Informa... (Caveon Test Security)

Designing Test Collections That Provide Tight Confidence Intervals (Tetsuya Sakai)

Comparative Recommender System Evaluation: Benchmarking Recommendation Frame... (Alan Said)

Derk jan de Grood - ET, Best of Both Worlds (TEST Huddle)

DMTM Lecture 06 Classification evaluation (Pier Luca Lanzi)

Lecture_4_Data_Gathering_and_Analysis.pdf (AbdullahOmar64)

Aspect Opinion Mining From User Reviews on the web (Karishma chaudhary)

Presentation of Project and Critique.pptx (BillyMoses1)

2015 EDM Leopard for Adaptive Tutoring Evaluation (Yun Huang)

A New Model for Testing (SQALab)

Psychometric Studies in the Development of an Inkjet Printer (David Lee)

DMTM 2015 - 14 Evaluation of Classification Models (Pier Luca Lanzi)

Download the presentation (butest)


More from Alejandro Bellogin

Recommender Systems and Misinformation: The Problem or the Solution?

Revisiting neighborhood-based recommenders for temporal scenarios

Evaluating decision-aware recommender systems

Implicit vs Explicit trust in Social Matrix Factorization

RiVal - A toolkit to foster reproducibility in Recommender System evaluation

CWI @ Contextual Suggestion track - TREC 2013

CWI @ Federated Web Track - TREC 2013

Probabilistic Collaborative Filtering with Negative Cross Entropy

Understanding Similarity Metrics in Neighbour-based Recommender Systems

Artist popularity: do web and social music services agree?

Improving Memory-Based Collaborative Filtering by Neighbour Selection based o...

Performance prediction and evaluation in Recommender Systems: an Information ...

Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...

Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...

Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...

Predicting performance in Recommender Systems - Slides

Predicting performance in Recommender Systems - Poster slam

Predicting performance in Recommender Systems - Poster

Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...


Replicable Evaluation of Recommender Systems

  • 1. Replicable Evaluation of Recommender Systems Alejandro Bellogín (Universidad Autónoma de Madrid, Spain) Alan Said (Recorded Future, Sweden) Tutorial at ACM RecSys 2015
  • 9. Outline • Background and Motivation [10 minutes] • Evaluating Recommender Systems [20 minutes] • Replicating Evaluation Results [20 minutes] • Replication by Example [20 minutes] • Conclusions and Wrap-up [10 minutes] • Questions [10 minutes] 9
  • 10. Outline • Background and Motivation [10 minutes] • Evaluating Recommender Systems [20 minutes] • Replicating Evaluation Results [20 minutes] • Replication by Example [20 minutes] • Conclusions and Wrap-up [10 minutes] • Questions [10 minutes] 10
  • 11. Background • A recommender system aims to find and suggest items of likely interest based on the users’ preferences 11
  • 12. Background • A recommender system aims to find and suggest items of likely interest based on the users’ preferences 12
  • 13. Background • A recommender system aims to find and suggest items of likely interest based on the users’ preferences • Examples: – Netflix: TV shows and movies – Amazon: products – LinkedIn: jobs and colleagues – Last.fm: music artists and tracks – Facebook: friends 13
  • 14. Background • Typically, the interactions between user and system are recorded in the form of ratings – But also: clicks (implicit feedback) • This is represented as a user-item matrix: (matrix: users u1 … un as rows, items i1 … im as columns; the unknown entry for user uj and item ik is marked “?”) 14
  • 15. Motivation • Evaluation is an integral part of any experimental research area • It allows us to compare methods… 15
  • 16. Motivation • Evaluation is an integral part of any experimental research area • It allows us to compare methods… • … and identify winners (in competitions) 16
  • 17. Motivation A proper evaluation culture allows us to advance the field … or at least, to identify when there is a problem! 17
  • 18. Motivation In RecSys, we find inconsistent evaluation results, for the “same” – Dataset – Algorithm – Evaluation metric (figure: conflicting published results on Movielens 1M [Cremonesi et al, 2010; Yin et al, 2012] and on Movielens 100k [Gorla et al, 2013], including SVD on Movielens 100k [Jambor & Wang, 2010]) 18
  • 19. Motivation In RecSys, we find inconsistent evaluation results, for the “same” – Dataset – Algorithm – Evaluation metric (figure: P@50 of the SVD50, IB, and UB50 recommenders under the TR3, TR4, TeI, TrI, AI, and OPR evaluation settings [Bellogín et al, 2011]) 19
  • 20. Motivation In RecSys, we find inconsistent evaluation results, for the “same” – Dataset – Algorithm – Evaluation metric (figure: the same P@50 comparison) We need to understand why this happens 20
  • 21. In this tutorial • We will present the basics of evaluation – Accuracy metrics: error-based, ranking-based – Also coverage, diversity, and novelty • We will focus on replication and reproducibility – Define the context – Present typical problems – Propose some guidelines 21
  • 22. Replicability • Why do we need to replicate? 22
  • 23. Reproducibility Why do we need to reproduce? Because these two are not the same 23
  • 24. NOT in this tutorial • In-depth analysis of evaluation metrics – See chapter 9 of the handbook [Shani & Gunawardana, 2011] • Novel evaluation dimensions – See tutorials at WSDM ’14 and SIGIR ’13 on diversity and novelty • User evaluation – See tutorial at RecSys 2012 • Comparison of evaluation results in research – See RepSys workshop at RecSys 2013 – See [Said & Bellogín, 2014] 24
  • 25. Outline • Background and Motivation [10 minutes] • Evaluating Recommender Systems [20 minutes] • Replicating Evaluation Results [20 minutes] • Replication by Example [20 minutes] • Conclusions and Wrap-up [10 minutes] • Questions [10 minutes] 25
  • 26. Recommender Systems Evaluation Typically: as a black box (diagram: a dataset is split into training, validation, and test sets; the recommender generates a ranking for a user, or a prediction for a given item and user, which is measured by precision, error, coverage, …) 26
  • 27. Recommender Systems Evaluation The reproducible way: as black boxes (diagram: the same pipeline, with each stage treated as its own black box) 27
  • 28. Recommender as a black box What do you do when a recommender cannot predict a score? This has an impact on coverage [Said & Bellogín, 2014] 28
  • 29. Candidate item generation as a black box How do you select the candidate items to be ranked? (figure: a toy rating matrix in which the solid triangle represents the target user and boxed ratings denote the test set, next to the P@50 comparison of SVD50, IB, and UB50 under the TR3, TR4, TeI, TrI, AI, and OPR settings) 29
  • 30. Candidate item generation as a black box How do you select the candidate items to be ranked? [Said & Bellogín, 2014] 30
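The candidate-selection strategies named above (TeI, TrI, AI, OPR) can be sketched roughly as follows. This is an illustrative sketch under our own naming assumptions: the function, its parameters, and the strategy labels are ours, not the API of any evaluation framework.

```python
import random

def candidate_items(user, train, test, all_items, strategy, n_random=100):
    # Build the candidate set to rank for `user`. `train` and `test` map
    # each user to an {item: rating} dict. Strategy labels loosely follow
    # the slide's TeI / TrI / AI / OPR settings.
    seen = set(train.get(user, {}))          # never re-rank training items
    relevant = set(test.get(user, {}))       # the user's test items
    if strategy == "TestItems":              # TeI: items rated by anyone in test
        pool = {i for items in test.values() for i in items}
    elif strategy == "TrainingItems":        # TrI: items rated by anyone in training
        pool = {i for items in train.values() for i in items}
    elif strategy == "AllItems":             # AI: every known item
        pool = set(all_items)
    elif strategy == "OnePlusRandom":        # OPR: relevant items plus N random others
        others = sorted(set(all_items) - seen - relevant)
        return relevant | set(random.sample(others, min(n_random, len(others))))
    else:
        raise ValueError(strategy)
    return pool - seen
```

The choice of strategy changes how many non-relevant items compete with each relevant one, which is one reason the P@50 numbers reported earlier diverge so strongly across settings.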
  • 31. Evaluation metric computation as a black box What do you do when a recommender cannot predict a score? – This has an impact on coverage – It can also affect error-based metrics MAE = Mean Absolute Error RMSE = Root Mean Squared Error 31
  • 32. Evaluation metric computation as a black box What do you do when a recommender cannot predict a score? – This has an impact on coverage – It can also affect error-based metrics
    User-item pair | Real | Rec1 | Rec2 | Rec3
    (u1, i1)       |  5   |  4   | NaN  |  4
    (u1, i2)       |  3   |  2   |  4   | NaN
    (u1, i3)       |  1   |  1   | NaN  |  1
    (u2, i1)       |  3   |  2   |  4   | NaN
    MAE/RMSE, ignoring NaNs: 0.75/0.87 | 2.00/2.00 | 0.50/0.70
    MAE/RMSE, NaNs as 0:     0.75/0.87 | 2.00/2.65 | 1.75/2.18
    MAE/RMSE, NaNs as 3:     0.75/0.87 | 1.50/1.58 | 0.25/0.50 32
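A minimal sketch of how the missing-score policy changes the error metrics, exercised on the Rec3 column of the table above. The function and its `missing`/`fill` parameters are illustrative names of our own choosing.

```python
import math

def error_metrics(real, pred, missing="ignore", fill=0.0):
    # MAE and RMSE when some predictions are missing (None): either skip
    # the unpredicted pairs or substitute a constant for them.
    pairs = []
    for r, p in zip(real, pred):
        if p is None:
            if missing == "ignore":
                continue
            p = fill
        pairs.append((r, p))
    mae = sum(abs(r - p) for r, p in pairs) / len(pairs)
    rmse = math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))
    return mae, rmse

real = [5, 3, 1, 3]           # (u1,i1), (u1,i2), (u1,i3), (u2,i1)
rec3 = [4, None, 1, None]     # the Rec3 column
```

Running the three policies on the same predictions yields three different MAE/RMSE pairs, which is exactly why the policy must be reported for results to be replicable.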
  • 33. Evaluation metric computation as a black box Using internal evaluation methods in Mahout (AM), LensKit (LK), and MyMediaLite (MML) [Said & Bellogín, 2014] 33
  • 34. Evaluation metric computation as a black box Variations on metrics: Error-based metrics can be normalized or averaged per user: – Normalize RMSE or MAE by the range of the ratings (divide by rmax – rmin) – Average RMSE or MAE per user to compensate for unbalanced distributions of items or users 34
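The two variations can be sketched as follows (helper names are ours). Note that the user-averaged RMSE generally differs from the global RMSE whenever users rate different numbers of items, so the two are not interchangeable in reported results.

```python
import math

def rmse(pairs):
    # pairs is a list of (real, predicted) tuples
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

def normalized_rmse(pairs, r_min=1.0, r_max=5.0):
    # divide by the rating range so scores are comparable across datasets
    return rmse(pairs) / (r_max - r_min)

def user_averaged_rmse(pairs_by_user):
    # average the per-user RMSE so heavy raters do not dominate the metric
    return sum(rmse(p) for p in pairs_by_user.values()) / len(pairs_by_user)
```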
  • 35. Evaluation metric computation as a black box Variations on metrics: nDCG has at least two discounting functions (linear and exponential decay) 35
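A sketch of both variants. The two are often implemented as a linear gain versus an exponential gain (2^rel - 1) over the same log2(rank + 1) discount; the `gain` parameter name is ours.

```python
import math

def dcg(rels, gain="exponential"):
    # rels[i] is the graded relevance of the item at rank i+1
    g = (lambda r: 2 ** r - 1) if gain == "exponential" else (lambda r: r)
    return sum(g(r) / math.log2(rank + 1) for rank, r in enumerate(rels, start=1))

def ndcg(rels, gain="exponential"):
    # normalize by the DCG of the ideal (relevance-sorted) ranking
    ideal = dcg(sorted(rels, reverse=True), gain)
    return dcg(rels, gain) / ideal if ideal > 0 else 0.0
```

The same ranking scores differently under the two variants, so papers comparing nDCG values must state which one they used.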
  • 36. Evaluation metric computation as a black box Variations on metrics: Ranking-based metrics are usually computed up to a ranking position or cutoff k P = Precision (Precision at k) R = Recall (Recall at k) MAP = Mean Average Precision 36
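These metrics can be sketched as follows (illustrative implementations; MAP is the average of AP over users):

```python
def precision_at_k(ranked, relevant, k):
    # fraction of the top-k recommended items that are relevant
    return sum(1 for i in ranked[:k] if i in relevant) / k

def recall_at_k(ranked, relevant, k):
    # fraction of the relevant items retrieved in the top k
    return sum(1 for i in ranked[:k] if i in relevant) / len(relevant)

def average_precision(ranked, relevant):
    # precision at each relevant position, averaged over the relevant set
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)
```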
  • 37. Evaluation metric computation as a black box If ties are present in the ranking scores, results may depend on the implementation [Bellogín et al, 2013] 37
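A small sketch of the problem: two rankers that differ only in how they break score ties report different P@2 for the very same scores (function names are ours):

```python
def rank_items(scores, tie_key):
    # sort by score descending; ties resolved by tie_key, which is
    # exactly where implementations silently differ
    return [i for i, _ in sorted(scores.items(),
                                 key=lambda kv: (-kv[1], tie_key(kv[0])))]

def precision_at_k(ranked, relevant, k):
    return sum(1 for i in ranked[:k] if i in relevant) / k

scores = {"a": 0.9, "b": 0.5, "c": 0.5}   # b and c are tied
```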
  • 38. Evaluation metric computation as a black box Not clear how to measure diversity/novelty in offline experiments (they are directly measured in online experiments): – Using a taxonomy (items about novel topics) [Weng et al, 2007] – New items over time [Lathia et al, 2010] – Based on entropy, self-information and Kullback-Leibler divergence [Bellogín et al, 2010; Zhou et al, 2010; Filippone & Sanguinetti, 2010] 38
  • 39. Recommender Systems Evaluation: Summary • Usually, evaluation seen as a black box • The evaluation process involves everything: splitting, recommendation, candidate item generation, and metric computation • We should agree on standard implementations, parameters, instantiations, … – Example: trec_eval in IR 39
  • 40. Outline • Background and Motivation [10 minutes] • Evaluating Recommender Systems [20 minutes] • Replicating Evaluation Results [20 minutes] • Replication by Example [20 minutes] • Conclusions and Wrap-up [10 minutes] • Questions [10 minutes] 40
  • 41. Reproducible Experimental Design • We need to distinguish – Replicability – Reproducibility • Different aspects: – Algorithmic – Published results – Experimental design • Goal: have a reproducible experimental environment 41
  • 42. Definition: Replicability To copy something • The results • The data • The approach Being able to evaluate in the same setting and obtain the same results 42
• 43. Definition: Reproducibility To recreate something • The (complete) set of experiments • The (complete) set of results • The (complete) experimental setup To (re)launch it in production with the same results 43
• 44. Comparing against the state-of-the-art
Your settings are not exactly like those in paper X, but it is a relevant paper
Reproduce results of paper X – do results agree with the original paper?
– Yes! Congrats, you’re done!
– No! Replicate results of paper X – do results match the original paper?
• They agree: congrats! You have shown that paper X behaves different in the new setting
• They do not agree: sorry, there is something wrong/incomplete in the experimental design
44
• 45. What about Reviewer 3? • “It would be interesting to see this done on a different dataset…” – Repeatability – The same person doing the whole pipeline over again • “How does your approach compare to [Reviewer 3 et al. 2003]?” – Reproducibility or replicability (depending on how similar the two papers are) 45
  • 46. Repeat vs. replicate vs. reproduce vs. reuse 46
  • 47. Motivation for reproducibility In order to ensure that our experiments, settings, and results are: – Valid – Generalizable – Of use for others – etc. we must make sure that others can reproduce our experiments in their setting 47
  • 48. Making reproducibility easier • Description, description, description • No magic numbers • Specify values for all parameters • Motivate! • Keep a detailed protocol • Describe process clearly • Use standards • Publish code (nobody expects you to be an awesome developer, you’re a researcher) 48
  • 49. Replicability, reproducibility, and progress • Can there be actual progress if no valid comparison can be done? • What is the point of comparing two approaches if the comparison is flawed? • How do replicability and reproducibility facilitate actual progress in the field? 49
  • 50. Summary • Important issues in recommendation – Validity of results (replicability) – Comparability of results (reproducibility) – Validity of experimental setup (repeatability) • We need to incorporate reproducibility and replication to facilitate the progress in the field 50
  • 51. Outline • Background and Motivation [10 minutes] • Evaluating Recommender Systems [20 minutes] • Replicating Evaluation Results [20 minutes] • Replication by Example [20 minutes] • Conclusions and Wrap-up [10 minutes] • Questions [10 minutes] 51
  • 52. Replication by Example • Demo time! • Check – http://www.recommenders.net/tutorial • Checkout – https://github.com/recommenders/tutorial.git 52
• 53. The things we write
mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation"
53
• 54. The things we forget to write
mvn -o exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation" -Dexec.args="-u false"
54
mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation"
• 55. The things we forget to write
mvn -o exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation" -Dexec.args="-t 4.0"
55
mvn -o exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation" -Dexec.args="-u false"
mvn exec:java -Dexec.mainClass="net.recommenders.tutorial.CrossValidation"
  • 56. Outline • Background and Motivation [10 minutes] • Evaluating Recommender Systems [20 minutes] • Replicating Evaluation Results [20 minutes] • Replication by Example [20 minutes] • Conclusions and Wrap-up [10 minutes] • Questions [10 minutes] 56
  • 57. Key Takeaways • Every decision has an impact – We should log every step taken in the experimental part and report that log • There are more things besides papers – Source code, web appendix, etc. are very useful to provide additional details not present in the paper • You should not fool yourself – You have to be critical about what you measure and not trust intermediate “black boxes” 57
  • 58. We must avoid this From http://dilbert.com/strips/comic/2010-11-07/ 58
  • 59. Next steps? • Agree on standard implementations • Replicable badges for journals / conferences 59
  • 60. Next steps? • Agree on standard implementations • Replicable badges for journals / conferences http://validation.scienceexchange.com The aim of the Reproducibility Initiative is to identify and reward high quality reproducible research via independent validation of key experimental results 60
  • 61. Next steps? • Agree on standard implementations • Replicable badges for journals / conferences • Investigate how to improve reproducibility 61
  • 62. Next steps? • Agree on standard implementations • Replicable badges for journals / conferences • Investigate how to improve reproducibility • Benchmark, report, and store results 62
  • 63. Pointers • Email and Twitter – Alejandro Bellogín • alejandro.bellogin@uam.es • @abellogin – Alan Said • alansaid@acm.org • @alansaid • Slides: • In Slideshare... soon! 63
  • 64. RiVal Recommender System Evaluation Toolkit http://rival.recommenders.net http://github.com/recommenders/rival 64
• 66. References and Additional reading
• [Armstrong et al, 2009] Improvements That Don’t Add Up: Ad-Hoc Retrieval Results Since 1998. CIKM
• [Bellogín et al, 2010] A Study of Heterogeneity in Recommendations for a Social Music Service. HetRec
• [Bellogín et al, 2011] Precision-Oriented Evaluation of Recommender Systems: an Algorithm Comparison. RecSys
• [Bellogín et al, 2013] An Empirical Comparison of Social, Collaborative Filtering, and Hybrid Recommenders. ACM TIST
• [Ben-Shimon et al, 2015] RecSys Challenge 2015 and the YOOCHOOSE Dataset. RecSys
• [Cremonesi et al, 2010] Performance of Recommender Algorithms on Top-N Recommendation Tasks. RecSys
• [Filippone & Sanguinetti, 2010] Information Theoretic Novelty Detection. Pattern Recognition
• [Fleder & Hosanagar, 2009] Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity. Management Science
• [Ge et al, 2010] Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys
• [Gorla et al, 2013] Probabilistic Group Recommendation via Information Matching. WWW
66
• 67. References and Additional reading
• [Herlocker et al, 2004] Evaluating Collaborative Filtering Recommender Systems. ACM Transactions on Information Systems
• [Jambor & Wang, 2010] Goal-Driven Collaborative Filtering. ECIR
• [Knijnenburg et al, 2011] A Pragmatic Procedure to Support the User-Centric Evaluation of Recommender Systems. RecSys
• [Koren, 2008] Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. KDD
• [Lathia et al, 2010] Temporal Diversity in Recommender Systems. SIGIR
• [Li et al, 2010] Improving One-Class Collaborative Filtering by Incorporating Rich User Information. CIKM
• [Pu et al, 2011] A User-Centric Evaluation Framework for Recommender Systems. RecSys
• [Said & Bellogín, 2014] Comparative Recommender System Evaluation: Benchmarking Recommendation Frameworks. RecSys
• [Schein et al, 2002] Methods and Metrics for Cold-Start Recommendations. SIGIR
• [Shani & Gunawardana, 2011] Evaluating Recommendation Systems. Recommender Systems Handbook
• [Steck & Xin, 2010] A Generalized Probabilistic Framework and its Variants for Training Top-k Recommender Systems. PRSAT
67
• 68. References and Additional reading
• [Tikk et al, 2014] Comparative Evaluation of Recommender Systems for Digital Media. IBC
• [Vargas & Castells, 2011] Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems. RecSys
• [Weng et al, 2007] Improving Recommendation Novelty Based on Topic Taxonomy. WI-IAT
• [Yin et al, 2012] Challenging the Long Tail Recommendation. VLDB
• [Zhang & Hurley, 2008] Avoiding Monotony: Improving the Diversity of Recommendation Lists. RecSys
• [Zhang & Hurley, 2009] Statistical Modeling of Diversity in Top-N Recommender Systems. WI-IAT
• [Zhou et al, 2010] Solving the Apparent Diversity-Accuracy Dilemma of Recommender Systems. PNAS
• [Ziegler et al, 2005] Improving Recommendation Lists Through Topic Diversification. WWW
68
  • 71. Mean Percentage Ranking [Li et al, 2010] 71
  • 72. Global ROC [Schein et al, 2002] 72
  • 73. Customer ROC [Schein et al, 2002] 73