This document discusses using ranking fairness metrics to assess viewpoint diversity in search results. It presents existing binomial and multinomial fairness metrics that can quantify the level of viewpoint diversity represented in search rankings. Through simulation studies with synthetic datasets, it evaluates how the metrics perform under different levels of ranking bias and proportions of viewpoints. The results show the metrics are effective in measuring viewpoint diversity, but their appropriate usage depends on factors like the ranking bias strength and direction. The document concludes the metrics can help assess real search results and align metric and user behavior outcomes.
Assessing Viewpoint Diversity in Search Results Using Ranking Fairness Metrics
WIS – Web Information Systems

Tim Draws, Nava Tintarev, Ujwal Gadiraju, and Alessandro Bozzon (TU Delft, The Netherlands);
Benjamin Timmermans (IBM, The Netherlands)
t.a.draws@tudelft.nl
Biases in web search
• Position bias [2-4]
• “Search Engine Manipulation Effect” [1,5]
How can we quantify (a lack of) viewpoint
diversity in search results?
[Slide illustration: a ranked list of search results on a disputed topic, where most results support one viewpoint (“Yes!”) and only the last few oppose it (“No!”)]
Metrics we consider
Binomial viewpoint fairness
– Normalized Discounted Difference (nDD) [6]
– Normalized Discounted Ratio (nDR) [6]
– Normalized Discounted Kullback-Leibler Divergence (nDKL) [6]
Multinomial viewpoint fairness
– Normalized Discounted Jensen-Shannon Divergence (nDJS)
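The slides give no formulas, but the pattern behind these metrics (following rKL in [6]) is: compare the viewpoint distribution of each top-i prefix against the overall distribution, apply a logarithmic position discount, and normalize. A minimal Python sketch of nDKL under that assumption; the function names, cutoff choice, and normalizer below are illustrative, not the paper's exact definitions:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL divergence between two discrete distributions (in bits)."""
    return sum(pi * math.log2((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def prefix_distribution(ranking, labels, i):
    """Proportion of each viewpoint label among the top-i ranked items."""
    top = ranking[:i]
    return [sum(1 for item in top if item == lab) / i for lab in labels]

def discounted_unfairness(ranking, labels, overall, cutoffs):
    """Sum of prefix-vs-overall KL divergences with a logarithmic discount."""
    return sum(kl_divergence(prefix_distribution(ranking, labels, i), overall)
               / math.log2(i + 1) for i in cutoffs)

def ndkl(ranking, labels, cutoffs=None):
    """Sketch of a normalized discounted KL divergence (after rKL in [6]):
    0 = viewpoint-fair ranking, values near 1 = approximately maximally unfair."""
    n = len(ranking)
    cutoffs = list(cutoffs or range(10, n + 1, 10))
    overall = prefix_distribution(ranking, labels, n)
    raw = discounted_unfairness(ranking, labels, overall, cutoffs)
    # Normalizer: the most unfair ordering considered here puts one whole
    # viewpoint group first; take the worst case over which group leads.
    z = max(discounted_unfairness(sorted(ranking, key=lambda x: x != lab),
                                  labels, overall, cutoffs)
            for lab in labels)
    return raw / z if z > 0 else 0.0
```

Under this sketch, an alternating pro/con ranking scores 0 and a ranking with one viewpoint group placed entirely first scores 1.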
Simulation studies
How do the metrics behave for different
levels of viewpoint diversity?
• Three synthetic data sets S1, S2, S3
• Per set, we created rankings simulating different
levels of viewpoint diversity
Weighted sampling procedure
Example ranking sampled from S1:

Rank | Viewpoint
-----|------------------
1    | Strongly opposing
2    | Strongly opposing
3    | Opposing
4    | Somewhat opposing
5    | Supporting
6    | Strongly opposing
…    | …
Per set, we created rankings with different levels of ranking bias:
• Binomial viewpoint fairness: all opposing viewpoints get weight w1, all others weight w2
• Multinomial viewpoint fairness: a randomly chosen viewpoint gets weight w1, all others weight w2
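The sampling procedure is only described informally on the slide. A minimal sketch, assuming a single bias knob in [−1, 1] that maps to the two weights (the function name, the bias-to-weight mapping, and the zero-weight fallback are assumptions):

```python
import random

def biased_ranking(items, is_advantaged, bias, rng=None):
    """Build a ranking by drawing items without replacement, where advantaged
    items are drawn with weight w1 and all others with weight w2.
    bias = 0 gives equal weights (no ranking bias); bias = +1 strongly
    advantages, bias = -1 strongly disadvantages the selected group."""
    rng = rng or random.Random()
    w1, w2 = 1.0 + bias, 1.0 - bias  # assumed mapping from bias to weights
    pool = list(items)
    ranking = []
    while pool:
        weights = [w1 if is_advantaged(x) else w2 for x in pool]
        if sum(weights) == 0:        # only zero-weight items left: draw uniformly
            weights = [1.0] * len(pool)
        pick = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        ranking.append(pick)
    return ranking
```

With bias = 1.0 all advantaged items are ranked first; with bias = 0.0 the result is a uniform random permutation, i.e. maximal viewpoint diversity under this setup.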
Results: binomial viewpoint fairness
[Figure: mean metric value (y-axis, 0.0–1.0) for nDD, nDR, and nDKL as a function of ranking bias (x-axis, −1.0 to 1.0), shown for distributions S1, S2, and S3]
• All metrics assess binomial viewpoint fairness (as expected)
• All metrics are asymmetric (proportion of protected items and “direction” of bias matter)
• Which metric to use depends on strength of ranking bias
Results: multinomial viewpoint fairness
[Figure: mean nDJS value (y-axis, 0.0–0.2) as a function of ranking bias (x-axis, −1.0 to 1.0), shown for distributions S1, S2, and S3]
• nDJS assesses multinomial viewpoint fairness
• nDJS is also asymmetric (proportion of protected items and “direction” of bias matter)
• Careful interpretation: values not directly comparable to other metrics
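nDJS is likewise not spelled out on the slide; a plausible sketch follows the same normalized-discounted pattern, swapping in the Jensen-Shannon divergence so that more than two viewpoint classes can be compared. The normalizer below (the discount mass, since JSD is bounded by 1 bit) is an assumption; it is one reason values stay well below 1 and are not directly comparable to nDD or nDKL:

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions (in bits);
    symmetric and bounded above by 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def ndjs(ranking, labels, cutoffs=None):
    """Sketch of a normalized discounted JS metric: JS divergence between the
    viewpoint distribution of each top-i prefix and the overall distribution,
    with a logarithmic discount, normalized by the total discount mass."""
    n = len(ranking)
    cutoffs = list(cutoffs or range(10, n + 1, 10))
    overall = [ranking.count(lab) / n for lab in labels]
    raw = sum(jsd([ranking[:i].count(lab) / i for lab in labels], overall)
              / math.log2(i + 1) for i in cutoffs)
    z = sum(1.0 / math.log2(i + 1) for i in cutoffs)
    return raw / z if z > 0 else 0.0
```

A perfectly interleaved multinomial ranking scores 0; biased rankings score above 0 but, unlike nDD/nDKL in the sketch above, do not approach 1.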
Discussion
• Metrics work for assessing viewpoint diversity
• Considerations:
– What is the underlying aim?
– How balanced is the data overall?
– How strong is the ranking bias?
– What is the direction of ranking bias?
Take home and future work
• Ranking fairness metrics can be used for
assessing viewpoint diversity in search results
– (when interpreted correctly)
• Future work can use these metrics to…
– …assess viewpoint diversity in real search results
– …align metric outcomes with behavioral outcomes
References
[1] R. Epstein and R. E. Robertson. The search engine manipulation effect (SEME) and its possible impact on the outcomes of elections. Proceedings of the National Academy of Sciences of the United States of America, 112(33):E4512–E4521, 2015.
[2] A. Ghose, P. G. Ipeirotis, and B. Li. Examining the impact of ranking on consumer behavior and search engine revenue. Management Science, 60(7):1632–1654, 2014.
[3] L. A. Granka, T. Joachims, and G. Gay. Eye-tracking analysis of user behavior in WWW search. Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 478–479, 2004.
[4] B. Pan, H. Hembrooke, T. Joachims, L. Lorigo, G. Gay, and L. Granka. In Google we trust: Users’ decisions on rank, position, and relevance. Journal of Computer-Mediated Communication, 12(3):801–823, 2007.
[5] F. A. Pogacar, A. Ghenai, M. D. Smucker, and C. L. Clarke. The positive and negative influence of search results on people’s decisions about the efficacy of medical treatments. Proceedings of the 2017 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR 2017), pages 209–216, 2017.
[6] K. Yang and J. Stoyanovich. Measuring fairness in ranked outputs. Proceedings of the 29th International Conference on Scientific and Statistical Database Management, pages 1–6, 2017.
Editor's Notes
Introduce myself
Second year PhD
Search results on disputed topic: various viewpoints within topic
diversity across ranking
Position bias: trust and interact with higher results more
Also voting preferences, judgment on medical treatment
Important to maintain viewpoint diversity
So far it has been unclear how to assess viewpoint diversity in search results; this paper addresses that gap
Protected vs. non-protected attribute
Example: gender bias in job candidate list
Mostly: statistical parity
Explain formula: F is function to evaluate statistical parity
Low value (0) is fair, high value (1) is unfair
How to use this for viewpoint diversity?
Assess viewpoint div. using specific class of metrics: ranking fairness
Simulation study on existing metrics
Novel metric, also simulation study
assumption: 7 classes
Also assume that ranking assessor has specific aim as to what they are concerned about
We consider two different aims
Quickly repeat formula, metrics differ in F
F evaluates statistical parity by comparing to ideal ranking
Briefly describe each metric
nDJS because others are not applicable to multinomial (details in paper)
These metrics QUANTIFY (no “fairness criterion”)
Goal: see how metrics behave in different settings of viewpoint diversity
Three data sets consisting of viewpoint labels
Created rankings with different levels of ranking bias from each set
Here: the more bias, the less viewpoint diversity
Done by weighted sampling
Draw from data set without replacement
Sampling is weighted
Two weights (whose ratio varies) to advantage / disadvantage
Summary: two simulation studies, each with three sets, per set 21 settings of ranking bias, 1000 times per setting
Explain ranking bias + mean metric outcome
All metrics seem to work
nDR is not normalized properly
Whether to use nDD or nDKL depends on strength of ranking bias
take home: use nDD / nDKL; proportion of protected + direction of bias is important to know
Works
Doesn’t go to 1 (don’t compare)
Take home: same lessons as before
Considerations are needed to decide which metric to use and how sensitive the metric is
Considerations:
Binomial or multinomial?
The more balanced, the better the sensitivity
If strong and binomial, use nDKL, otherwise nDD
If protected group is advantaged, the same ranking bias produces a different outcome
It would be good to have a simulator for interpreting metrics (I am working on that)
In general, nDD, nDKL, or nDJS
Correct interpretation: awareness of data skew and bias direction
Future work: assessment + align metric outcomes with SEME