Crowdsourcing for Information Retrieval:
From Statistics to Ethics

Matt Lease
School of Information
University of Texas at Austin

@mattlease

ml@utexas.edu
Roadmap
• Scalability Challenges in IR Evaluation (brief)
• Benchmarking Statistical Consensus Methods
• Task Routing via Matrix Factorization
• Toward Ethical Crowdsourcing

Matt Lease <ml@utexas.edu>

Roadmap
• Scalability Challenges in IR Evaluation (brief)
• Benchmarking Statistical Consensus Methods
• Task Routing via Matrix Factorization
• Toward Ethical Crowdsourcing

Why Evaluation at Scale?
• Evaluation should closely
mirror real use conditions
• The best algorithm at
small scale may not be
best at larger scales
– Banko and Brill (2001)
– Halevy et al. (2009)

• IR systems should be evaluated on the scale of
data which users will search in practice
Why is Evaluation at Scale Hard?
• Multiple ways to evaluate; consider Cranfield
– Given a document collection and set of user queries
– Label documents for relevance to each query
– Evaluate search algorithms on these queries & documents

• Labeling data is slow/expensive/difficult
• Approach 1: label less data (e.g. active learning)
– Pooling, metrics robust to sparse data (e.g., BPref)
– Measure only relative performance (e.g., statAP, MTC)

• Approach 2: label data more efficiently
– Crowdsourcing (e.g., Amazon’s Mechanical Turk)
Crowdsourcing for IR Evaluation
• Origin: Alonso et al. (SIGIR Forum 2008)
– Continuing active area of research

• Primary concern: ensuring reliable data
– Reliable data provides foundation for evaluation
– If QA is inefficient, its overhead could erase any savings
– Common strategy: ask multiple people to judge
relevance, then aggregate their answers (consensus)

Roadmap
• Scalability Challenges in Evaluating IR Systems
• Benchmarking Statistical Consensus Methods
• Task Routing via Matrix Factorization
• Toward Ethical Crowdsourcing

SQUARE: A Benchmark for Research
on Computing Crowd Consensus
Aashish Sheshadri and M. Lease, HCOMP’13
ir.ischool.utexas.edu/square (open source)

Background
• How do we resolve disagreement among multiple
people’s answers to arrive at consensus?
• Simple baseline: majority voting
• Long history pre-dating crowdsourcing
– Dawid and Skene’79, Smyth et al., ’95
– Recent focus on quality assurance with crowds

• Many more methods, active research topic
– Across many areas: ML, Vision, NLP, IR, DB, …
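The majority-voting baseline can be sketched in a few lines; the tuple format and label names below are illustrative, not from any particular dataset:

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Consensus by majority vote over redundant labels.

    labels: iterable of (example_id, worker_id, label) tuples.
    Ties break by first-seen label (Counter preserves insertion order).
    """
    votes = defaultdict(list)
    for example, _worker, label in labels:
        votes[example].append(label)
    return {ex: Counter(v).most_common(1)[0][0] for ex, v in votes.items()}

labels = [("d1", "w1", "rel"), ("d1", "w2", "rel"), ("d1", "w3", "non"),
          ("d2", "w1", "non"), ("d2", "w2", "non")]
print(majority_vote(labels))  # → {'d1': 'rel', 'd2': 'non'}
```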
Why Benchmark?
• Drive field innovation by clear challenge tasks
– e.g., David Tse’s FIST 2012 Keynote (Comp. Biology)

• Many other things we can learn
– How do methods compare?
• Qualitatively & quantitatively?

– What is the state-of-the-art today?
– What works, what doesn’t, and why?
• Where is further research most needed?

– How has field progressed over time?
Consensus Methods: Pros & Cons
(Method = Model + Training + Inference; methods ordered from simplest to most complex)

• MV (Majority Voting)
– Pros: Simple, fast, no training; task-independent
– Cons: Most limited model; cannot be supervised; no confusion matrix

• ZC (Demartini’12): worker reliability parameters
– Pros: Task-independent; can be supervised; allows priors on worker reliability & class distribution
– Cons: No confusion matrix

• GLAD (Whitehill et al.’09): worker reliability & task difficulty parameters
– Pros: Task-independent; can be supervised; prior on class distribution
– Cons: No confusion matrix; no worker priors

• Naïve Bayes (NB) (Snow et al.’08): Dawid & Skene model, fully-supervised
– Pros: Supports multi-class tasks; models worker confusion; simple maximum-likelihood
– Cons: Classification only; space proportional to number of classes; no worker priors

• DS (Dawid & Skene’79): class priors & worker confusion matrices
– Pros: Supports multi-class tasks; models worker confusion; unsupervised, semi-supervised, or fully-supervised
– Cons: Classification only; space proportional to number of classes; no worker priors

• RY (Raykar et al.’10): worker confusion, sensitivity, specificity; (optional) automatic classifier
– Pros: Classifier not required; priors on worker confusion and class distribution; multi-class support; can be supervised
– Cons: Classification only; space proportional to number of classes; automatic classifier requires feature representation

• CUBAM (Welinder et al.’10): worker reliability and confusion, annotation noise, task difficulty
– Pros: Detailed model of the annotation process; can identify worker clusters; multi-class support
– Cons: Classification only; complex with many hyper-parameters; unclear how to supervise
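The Dawid & Skene (DS) model jointly estimates class priors, per-worker confusion matrices, and item class posteriors via EM. A minimal unsupervised sketch; the smoothing, initialization, and iteration count here are illustrative choices, not taken from the original paper:

```python
import numpy as np

def dawid_skene(votes, n_classes, n_iter=25, smooth=0.01):
    """Minimal unsupervised Dawid & Skene-style EM.

    votes: int array [n_items, n_workers]; entry = class index, -1 if missing.
    Returns (posteriors [n_items, K], confusion [n_workers, K, K]).
    """
    n_items, n_workers = votes.shape
    K = n_classes
    # Initialize soft labels from per-item vote fractions (majority-vote-like)
    T = np.full((n_items, K), smooth)
    for i in range(n_items):
        for j in range(n_workers):
            if votes[i, j] >= 0:
                T[i, votes[i, j]] += 1.0
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices
        prior = T.mean(axis=0)
        conf = np.full((n_workers, K, K), smooth)
        for i in range(n_items):
            for j in range(n_workers):
                if votes[i, j] >= 0:
                    conf[j, :, votes[i, j]] += T[i]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior over each item's true class given all its votes
        logT = np.log(prior) + np.zeros((n_items, K))
        for i in range(n_items):
            for j in range(n_workers):
                if votes[i, j] >= 0:
                    logT[i] += np.log(conf[j, :, votes[i, j]])
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T, conf

# Two consistent workers and one noisy worker labeling four items
votes = np.array([[0, 0, 1],
                  [1, 1, 1],
                  [0, 0, 0],
                  [1, 1, 0]])
posteriors, _ = dawid_skene(votes, n_classes=2)
print(posteriors.argmax(axis=1))  # consensus label per item
```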
Results: Unsupervised Accuracy
[Figure: relative accuracy gain/loss vs. majority voting (y-axis: -15% to +15%)
for DS, ZC, RY, GLAD, and CUBAM on datasets BM, HCB, SpamCF, WVSCM, WB,
RTE, TEMP, WSD, AC2, HC, and ALL.]
Results: Varying Supervision

Findings
• Majority voting never best, rarely much worse
• Each method often best for some condition
– E.g., the original dataset it was designed for

• DS & RY tend to perform best (RY adds priors)
• No method performs far beyond others
– Of course, contributions aren’t just empirical…

Why Don’t We See Bigger Gains?
• Gold is too noisy to detect improvement?
– Cormack & Kolcz’09, Klebanov & Beigman’10

• Limited tasks / scenarios considered?
– e.g., we exclude hybrid methods & worker filtering

• Might we see greater differences from
– Better benchmark tests?
– Better tuning of methods?
– Additional methods?

• We invite community contributions!
Roadmap
• Scalability Challenges in Evaluating IR Systems
• Benchmarking Statistical Consensus Methods
• Task Routing via Matrix Factorization
• Toward Ethical Crowdsourcing

Crowdsourced Task Routing via
Matrix Factorization
HyunJoon Jung and M. Lease
arXiv 1310.5142, under review

Task Routing: Background
• Selection vs. recommendation vs. assignment
– Potential to improve work quality & satisfaction
– Task search time adds latency & is uncompensated
– Tradeoffs in push vs. pull, varying models

• Many matching criteria one could consider
– Preferences, Experience, Skills, Job constraints, …

• References
– Law and von Ahn, 2011 (Ch. 4)
– Chilton et al., 2010
• MTurk “free” selection constrained by search interface
Matrix Factorization Approach
• Collaborative filtering-based recommendation
• Intuition: workers achieve similar accuracy on similar tasks
– Notion is more general: e.g., preference, expertise, etc.

[Figure: per-task worker-example matrices of binary correct/incorrect labels
(workers w1..wm, examples e1..en) are accumulated into a comprehensive
worker-task matrix of accuracies (tasks T1..Tn), with many entries missing.]

Pipeline:
1. Accumulate repeated crowdsourced data
2. Tabularize a worker-task relational model
3. Apply MF to infer missing values
4. Select best-predicted workers for a target task
Matrix Factorization
• Automatically induce latent features
– Task-independent

• Popular due to robustness to sparsity
– SVD is sensitive to matrix density; PMF is much more robust

Factor the N x M task-worker matrix R (N tasks, M workers, M >> N) into
latent feature matrices W ∈ R^(D x N) (task features) and T ∈ R^(D x M)
(worker features), with D = N - 1 latent dimensions (cf. predicting the
rating of user i for movie j):

R_ij ≈ W_i^T T_j = Σ_k W_ik T_jk
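A PMF-style factorization can be trained as MAP estimation by gradient descent over the observed cells only. This is a sketch, not the paper's implementation: the function name, hyper-parameters, and toy matrix below are all illustrative:

```python
import numpy as np

def pmf_complete(R, observed, D=2, steps=5000, lr=0.02, reg=0.02, seed=0):
    """Complete a task-worker accuracy matrix R (N tasks x M workers) by
    fitting R_ij ~= W_i^T T_j, with W in R^(D x N), T in R^(D x M).
    Gradient descent on squared error over observed cells + L2 penalty.
    """
    rng = np.random.default_rng(seed)
    N, M = R.shape
    W = 0.1 * rng.standard_normal((D, N))  # latent task features
    T = 0.1 * rng.standard_normal((D, M))  # latent worker features
    for _ in range(steps):
        E = observed * (W.T @ T - R)       # residuals on observed cells only
        W -= lr * (T @ E.T + reg * W)      # gradient of loss w.r.t. W
        T -= lr * (W @ E + reg * T)        # gradient of loss w.r.t. T
    return W.T @ T

# Toy example: predict the unobserved accuracy R[1, 2], then route the
# second task to its best-predicted worker
R = np.array([[0.90, 0.60, 0.80],
              [0.85, 0.55, 0.00]])         # R[1, 2] unobserved
observed = np.array([[True, True, True],
                     [True, True, False]])
full = pmf_complete(R, observed)
best_worker = int(full[1].argmax())
```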
Datasets
• 3 MTurk text tasks
• Simulated data

Baselines
• Random assignment
– no accuracy prediction; just for task routing

• Simple average
– Average a worker’s accuracy across past tasks

• Weighted average
– Weight each past task by its similarity to the target task
• task similarity must be estimated from data
Estimating Task Similarity
• Define by Pearson correlation over per-task
accuracies of workers who perform both
– Ignore any workers doing only one of the tasks
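The weighted-average baseline needs exactly these similarity estimates. A small sketch (treating too little worker overlap as uninformative is my choice, not stated in the talk):

```python
import numpy as np

def task_similarity(acc_a, acc_b):
    """Pearson correlation between two tasks' per-worker accuracies,
    using only workers who performed both tasks (NaN = did not work)."""
    both = ~np.isnan(acc_a) & ~np.isnan(acc_b)
    if both.sum() < 2:      # too little overlap: treat as uninformative
        return 0.0
    return float(np.corrcoef(acc_a[both], acc_b[both])[0, 1])

# Per-worker accuracies on two tasks; worker 4 skipped the first task
a = np.array([0.90, 0.80, 0.50, np.nan])
b = np.array([0.85, 0.70, 0.40, 0.90])
print(round(task_similarity(a, b), 3))  # → 0.996
```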

Results: RMSE & Mean Accuracy (MTurk data)

[Figures: RMSE and mean accuracy averaged over tasks for k = 1 to 20
selected workers; per-task and average results for k = 10 workers.]
Findings
• How does MF prediction accuracy vary given
task similarity, matrix size, & matrix density?
– Feasible, PMF beats SVD, more data = better…

• MF task routing vs. baselines?
– Much better than random; simple baselines suffice in
the most sparse conditions; MF improves beyond that

Open Questions
• Other ways to infer task similarity (e.g. textual)
• Under “Big Data” conditions?
• When integrating target task observations?

• How to better model crowd & spam?
• How to address live task routing challenges?
Roadmap
• Scalability Challenges in Evaluating IR Systems
• Benchmarking Statistical Consensus Methods
• Task Routing via Matrix Factorization
• Toward Ethical Crowdsourcing

A Few Moral Dilemmas
• A “fair” price for online work in a global economy?
– Is it better to pay nothing (i.e., volunteers, gamification)
rather than pay something small for valuable work?

• Are we obligated to inform people how their
participation / work products will be used?
– If my IRB doesn’t require me to obtain informed consent,
is there some other moral obligation to do so?

• A worker finds his ID posted in a researcher’s online
source code and asks that it be removed. This can’t
be done without recreating the repo, which many
people use. What should be done?
Mechanical Turk is Not Anonymous

Matthew Lease, Jessica Hullman, Jeffrey P. Bigham, Michael S. Bernstein, Juho Kim,
Walter S. Lasecki, Saeideh Bakhshi, Tanushree Mitra, and Robert C. Miller.
Online: Social Science Research Network, March 6, 2013
ssrn.com/abstract=2190946

Amazon profile page URLs use the same IDs as used on MTurk.
How do we respond when we learn we’ve exposed people to risk?

Ethical Crowdsourcing
• Assume researchers have good intentions, and
so issues of gross negligence are rare
– Withholding promised pay after work performed
– Not obtaining or complying with IRB oversight

• Instead, the great challenge is recognizing our
impacts and appropriate actions in a complex world
– Educating ourselves takes time & effort
– Failing to educate ourselves could cause harm to others

• How can we strike a reasonable balance between
complete apathy vs. being overly alarmist?
CACM August, 2013

Paul Hyman. Communications of the ACM, Vol. 56 No. 8, Pages 19-21, August 2013.

• Contribute to society and human well-being
• Avoid harm to others
• Be honest and trustworthy
• Be fair and take action not to discriminate
• Respect the privacy of others

COMPLIANCE WITH THE CODE. As an ACM member I will
– Uphold and promote the principles of this Code
– Treat violations of this code as inconsistent with
membership in the ACM
CS2008 Curriculum Update (ACM, IEEE)
There is reasonably wide agreement that this topic of legal, social,
professional and ethical issues should feature in all computing degrees.
…financial and economic imperatives …Which approaches are less
expensive and is this sensible? With the advent of outsourcing and
off-shoring these matters become more complex and take on new
dimensions …there are often related ethical issues concerning
exploitation… Such matters ought to feature in courses on legal,
ethical and professional practice.
if ethical considerations are covered only in the standalone course and
not “in context,” it will reinforce the false notion that technical processes
are void of ethical issues. Thus it is important that several traditional
courses include modules that analyze ethical considerations in the
context of the technical subject matter … It would be explicitly against
the spirit of the recommendations to have only a standalone course.
“Contribute to society and human
well-being; avoid harm to others”
• Do we have a moral obligation to try to ascertain
conditions under which work is performed? Or the
impact we have upon those performing the work?

• Do we feel differently when work is performed by
– Political refugees? Children? Prisoners? Disabled?

• How do we know who is doing the work, or if a
decision to work (for a given price) is freely made?
– Does it matter why someone accepts offered work?
Who are
the workers?

• A. Baio. November 2008. The Faces of Mechanical Turk.
• P. Ipeirotis. March 2010. The New Demographics of Mechanical Turk.
• J. Ross et al. Who are the Crowdworkers? CHI 2010.
Some Notable Prior Research
• Silberman, Irani, and Ross (2010)
– “How should we… conceptualize the role of these people
who we ask to power our computing?”

– “abstraction hides detail”: some details may be worth
keeping conspicuously present (Jessica Hullman)

• Irani and Silberman (2013)
– “…AMT helps employers see themselves as builders of
innovative technologies, rather than employers unconcerned
with working conditions.”
– “…human computation currently relies on worker invisibility.”

• Fort, Adda, and Cohen (2011)
– “…opportunities for our community to deliberately value
ethics above cost savings.”

Power Asymmetry on MTurk

• Mistakes happen, such as wrongly rejecting work – e.g., error by
new student, software bug, poor instructions, noisy gold, etc.
• How do we balance the harm caused by our mistakes to workers
(our liability) vs. our cost/effort of preventing such mistakes?
Task Decomposition
• By minimizing context, greater task efficiency &
accuracy can often be achieved in practice
– e.g. “Can you name who is in this photo?”

• Much research on ways to streamline work
and decompose complex tasks
Context & Informed Consent

• Assume we wish to obtain informed consent
• Without context, consent cannot be informed
– Zittrain, Ubiquitous human computing (2008)

Independent Contractors vs. Employees
• Wolfson & Lease, ASIS&T’11
• Many platforms classify workers as independent
contractors (piece-work, not hourly)
– Legislators/courts must ultimately decide

• Different work classifications yield different legal
rights/protections & responsibilities
– Domestic vs. international workers
– Employment taxes
– Litigation can both cause or redress harm

• Law aside, to what extent do moral principles
underlying current laws apply to online work?
Consequences of Human Computation
as a Panacea where AI Falls Short
• The Googler who Looked at the Worst of the Internet
• Policing the Web’s Lurid Precincts
• Facebook content moderation
• The dirty job of keeping Facebook clean

• Even linguistic annotators report stress &
nightmares from reading news articles!
What about Freedom?
• Crowdsourcing vision: empowering freedom
– work whenever you want for whomever you want

• Risk: people compelled to perform work
– Chinese prisoners farming gold online
– Digital sweat shops? Digital slaves?
– We know relatively little today about work conditions
– How might we monitor and mitigate risk/growth of
crowd work inflicting harm to at-risk populations?

– Traction? Human Trafficking at MSR Summit’12
Robert Sim, MSR Summit’12

Join the conversation!
Crowdwork-ethics, by Six Silberman
http://crowdwork-ethics.wtf.tw

an informal, occasional blog for researchers
interested in ethical issues in crowd work

The Future of Crowd Work, CSCW’13

Kittur, Nickerson, Bernstein, Gerber,
Shaw, Zimmerman, Lease, and Horton
Additional References
• Irani, Lilly C. The Ideological Work of Microwork. In preparation,
draft available online.
• Adda, Gilles, et al. Crowdsourcing for language resource
development: Critical analysis of Amazon Mechanical Turk
overpowering use. Proceedings of the 5th Language and Technology
Conference (LTC). 2011.
• Adda, Gilles, and Joseph J. Mariani. Economic, Legal and Ethical
analysis of Crowdsourcing for Speech Processing. (2013).
• Harris, Christopher G., and Padmini Srinivasan. Crowdsourcing and
Ethics. Security and Privacy in Social Networks. 67-83. 2013.
• Harris, Christopher G. Dirty Deeds Done Dirt Cheap: A Darker Side
to Crowdsourcing. IEEE 3rd Conference on Social Computing
(SocialCom). 2011.

• Horton, John J. The condition of the Turking class: Are online
employers fair and honest?. Economics Letters 111.1 (2011): 10-12.
Additional References (2)
• Bederson, B. B., & Quinn, A. J. Web workers unite! addressing challenges
of online laborers. In CHI 2011 Human Computation Workshop, 97-106.
• Bederson, B. B., & Quinn, A. J. Participation in Human Computation. In
CHI 2011 Human Computation Workshop.

• Felstiner, Alek. Working the Crowd: Employment and Labor Law in the
Crowdsourcing Industry. Berkeley J. Employment & Labor Law 32.1 2011
• Felstiner, Alek. Sweatshop or Paper Route?: Child Labor Laws and In-Game Work. CrowdConf (2010).
• Larson, Martha. Toward Responsible and Sustainable Crowdsourcing.
Blog post + Slides from Dagstuhl, September 2013.
• Vili Lehdonvirta and Paul Mezier. Identity and Self-Organization in
Unstructured Work. Unpublished working paper. 16 October 2013.
• Zittrain, Jonathan. Minds for Sale. YouTube.
Thank You!
See also: SIAM’13 Tutorial

Slides: www.slideshare.net/mattlease

ir.ischool.utexas.edu


 
The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)The Rise of Crowd Computing (July 7, 2016)
The Rise of Crowd Computing (July 7, 2016)
 
The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016The Rise of Crowd Computing - 2016
The Rise of Crowd Computing - 2016
 
The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)
 
Toward Better Crowdsourcing Science
 Toward Better Crowdsourcing Science Toward Better Crowdsourcing Science
Toward Better Crowdsourcing Science
 
Toward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd WorkToward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd Work
 
Crowdsourcing Transcription Beyond Mechanical Turk
Crowdsourcing Transcription Beyond Mechanical TurkCrowdsourcing Transcription Beyond Mechanical Turk
Crowdsourcing Transcription Beyond Mechanical Turk
 

Recently uploaded

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 

Recently uploaded (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 

Crowdsourcing for Information Retrieval: From Statistics to Ethics

  • 8. Roadmap • Scalability Challenges in Evaluating IR Systems • Benchmarking Statistical Consensus Methods • Task Routing via Matrix Factorization • Toward Ethical Crowdsourcing Matt Lease <ml@utexas.edu> 8
  • 9. SQUARE: A Benchmark for Research on Computing Crowd Consensus Aashish Sheshadri and M. Lease, HCOMP’13 ir.ischool.utexas.edu/square (open source) Matt Lease <ml@utexas.edu> 9
  • 10. Background • How do we resolve disagreement of multiple peoples’ answers to arrive at consensus? • Simple baseline: majority voting • Long history pre-dating crowdsourcing – Dawid and Skene’79, Smyth et al., ’95 – Recent focus on quality assurance with crowds • Many more methods, active research topic – Across many areas: ML, Vision, NLP, IR, DB, … Matt Lease <ml@utexas.edu> 10
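The simple baseline named above can be made concrete. As a minimal sketch (function and variable names are illustrative, not from SQUARE), a majority-vote aggregator over redundant crowd labels might look like:

```python
from collections import Counter, defaultdict

def majority_vote(labels):
    """Aggregate redundant crowd labels per example by majority vote.

    labels: iterable of (worker_id, example_id, label) triples.
    Returns {example_id: consensus_label}; ties are broken arbitrarily.
    """
    votes = defaultdict(Counter)
    for _worker, example, label in labels:
        votes[example][label] += 1
    return {ex: c.most_common(1)[0][0] for ex, c in votes.items()}

# Three workers judge two documents for relevance.
labels = [
    ("w1", "doc1", "relevant"),
    ("w2", "doc1", "relevant"),
    ("w3", "doc1", "not_relevant"),
    ("w1", "doc2", "not_relevant"),
    ("w2", "doc2", "not_relevant"),
]
print(majority_vote(labels))  # {'doc1': 'relevant', 'doc2': 'not_relevant'}
```

Note that majority voting weights every worker equally; the methods that follow instead estimate per-worker reliability.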
  • 11. Why Benchmark? • Drive field innovation by clear challenge tasks – e.g., David Tse’s FIST 2012 Keynote (Comp. Biology) • Many other things we can learn – How do methods compare? • Qualitatively & quantitatively? – What is the state-of-the-art today? – What works, what doesn’t, and why? • Where is further research most needed? – How has field progressed over time? Matt Lease <ml@utexas.edu> 11
  • 12. Consensus Methods (Method = Model + Training + Inference)
MV (majority vote): pros – simple, fast, no training, task-independent; cons – most limited model, cannot be supervised, no confusion matrix.
ZC (Demartini’12): worker reliability parameters; pros – task-independent, can be supervised, allows priors on worker reliability & class distribution; cons – no confusion matrix, classification only.
GLAD (Whitehill et al.’09): worker reliability & task difficulty parameters; cons – no confusion matrix, no worker priors, classification only.
NB (Snow et al.’08): Dawid & Skene model, fully supervised; class priors & worker confusion matrices; pros – supports multi-class tasks, models worker confusion, simple maximum likelihood; cons – space proportional to number of classes.
DS (Dawid & Skene’79): class priors & worker confusion matrices; pros – supports multi-class tasks, models worker confusion, unsupervised, semi-supervised, or fully supervised.
RY (Raykar et al.’10): worker confusion, sensitivity, specificity, (optional) automatic classifier; pros – classifier not required, priors on worker confusion and class distribution, multi-class support, can be supervised; cons – automatic classifier requires feature representation.
CUBAM (Welinder et al.’10): worker reliability and confusion, annotation noise, task difficulty; detailed model of the annotation process, can identify worker clusters, multi-class support; cons – complex with many hyper-parameters, unclear how to supervise. 12
  • 14. Results: Unsupervised Accuracy [Chart: relative accuracy gain/loss vs. majority voting, roughly -15% to +15%, for DS, ZC, RY, GLAD, and CUBAM across datasets BM, HCB, SpamCF, WVSCM, WB, RTE, TEMP, WSD, AC2, HC, and ALL] 14
  • 15. Results: Varying Supervision Matt Lease <ml@utexas.edu> 15
  • 16. Findings • Majority voting never best, rarely much worse • Each method often best for some condition – e.g., on the dataset it was originally designed for • DS & RY tend to perform best (RY adds priors) • No method performs far beyond others – Of course, contributions aren’t just empirical… Matt Lease <ml@utexas.edu> 16
  • 17. Why Don’t We See Bigger Gains? • Gold is too noisy to detect improvement? – Cormack & Kolcz’09, Klebanov & Beigman’10 • Limited tasks / scenarios considered? – e.g., we exclude hybrid methods & worker filtering • Might we see greater differences from – Better benchmark tests? – Better tuning of methods? – Additional methods? • We invite community contributions! Matt Lease <ml@utexas.edu> 17
  • 18. Roadmap • Scalability Challenges in Evaluating IR Systems • Benchmarking Statistical Consensus Methods • Task Routing via Matrix Factorization • Toward Ethical Crowdsourcing Matt Lease <ml@utexas.edu> 18
  • 19. Crowdsourced Task Routing via Matrix Factorization HyunJoon Jung and M. Lease arXiv 1310.5142, under review Matt Lease <ml@utexas.edu>
  • 21. Task Routing: Background • Selection vs. recommendation vs. assignment – Potential to improve work quality & satisfaction – task search time has latency & is uncompensated – Tradeoffs in push vs. pull, varying models • Many matching criteria one could consider – Preferences, Experience, Skills, Job constraints, … • References – Law and von Ahn, 2011 (Ch. 4) – Chilton et al., 2010 • MTurk “free” selection constrained by search interface Matt Lease <ml@utexas.edu> 21
  • 22. Matrix Factorization Approach • Collaborative filtering-based recommendation • Intuition: workers achieve similar accuracy on similar tasks – Notion is more general: e.g. preference, expertise, etc. • Pipeline: (1) accumulate repeated crowdsourced data as a worker-example matrix of correct/incorrect labels for each task; (2) tabularize into a comprehensive worker-task matrix of per-task accuracies; (3) apply MF to infer the missing values; (4) select the best-predicted workers for a target task Matt Lease <ml@utexas.edu> 22
  • 23. Matrix Factorization • Automatically induce latent features – Task-independent • Popular due to robustness to sparsity – SVD sensitive to matrix density; PMF much more robust • Model: M workers (M >> N), N tasks, D latent dimensions; worker features W ∈ R^(D×M), task features T ∈ R^(D×N); R_ij ≈ W_i^T T_j = Σ_k W_ik T_jk (e.g., rating of user i for movie j) Matt Lease <ml@utexas.edu> 23
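As a rough illustration of the factorization step (a sketch of the general idea, not the paper's exact PMF implementation; names and hyper-parameters are assumptions), a gradient-descent factorizer over a partially observed worker-task accuracy matrix might look like:

```python
import numpy as np

def pmf(R, mask, D=5, lr=0.05, reg=0.02, n_iter=2000, seed=0):
    """Gradient-descent matrix factorization: a minimal sketch.

    R: tasks x workers matrix of observed accuracies; mask: 1 where observed.
    Minimizes squared error on observed cells plus L2 regularization (the MAP
    objective of PMF under Gaussian priors) and returns the dense prediction
    T.T @ W, which fills in the missing cells.
    """
    rng = np.random.default_rng(seed)
    n_tasks, n_workers = R.shape
    T = 0.1 * rng.standard_normal((D, n_tasks))    # latent task features
    W = 0.1 * rng.standard_normal((D, n_workers))  # latent worker features
    for _ in range(n_iter):
        E = mask * (R - T.T @ W)       # residual on observed entries only
        T += lr * (W @ E.T - reg * T)  # gradient step on task features
        W += lr * (T @ E - reg * W)    # gradient step on worker features
    return T.T @ W
```

Because the squared-error loss is computed only over observed cells, the model tolerates the sparsity typical of crowd data, which is the advantage claimed for PMF over plain SVD above.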
  • 24. Datasets • 3 MTurk text tasks • Simulated data 24
  • 25. Baselines • Random assignment – no accuracy prediction; just for task routing • Simple average – Average worker’s accuracies across past tasks • Weighted average – weight each task in average by similarity to target task • task similarity must be estimated from data Matt Lease <ml@utexas.edu> 25
  • 26. Estimating Task Similarity • Define by Pearson correlation over per-task accuracies of workers who perform both – Ignore any workers doing only one of the tasks Matt Lease <ml@utexas.edu> 26
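The weighted-average baseline and the similarity estimate above can be sketched together as follows (function names and the example accuracies are hypothetical, introduced only for illustration):

```python
import numpy as np

def pearson_task_similarity(acc_a, acc_b):
    """Pearson correlation of two tasks' per-worker accuracies, computed
    only over workers who performed both tasks (others are ignored)."""
    shared = sorted(set(acc_a) & set(acc_b))
    if len(shared) < 2:
        return 0.0  # too little overlap to estimate similarity
    a = np.array([acc_a[w] for w in shared])
    b = np.array([acc_b[w] for w in shared])
    if a.std() == 0.0 or b.std() == 0.0:
        return 0.0  # correlation undefined for constant accuracies
    return float(np.corrcoef(a, b)[0, 1])

def weighted_average_prediction(worker, target_task, task_accs):
    """Predict a worker's accuracy on target_task as an average of that
    worker's accuracies on past tasks, weighted by task similarity.

    task_accs: {task_id: {worker_id: accuracy}} of observed accuracies.
    """
    num = den = 0.0
    for task, accs in task_accs.items():
        if task == target_task or worker not in accs:
            continue
        sim = pearson_task_similarity(task_accs[target_task], accs)
        if sim > 0.0:  # ignore dissimilar or anti-correlated tasks
            num += sim * accs[worker]
            den += sim
    return num / den if den else None

# Hypothetical observed accuracies for three tasks.
task_accs = {
    "t1": {"w1": 0.9, "w2": 0.6, "w3": 0.7},
    "t2": {"w1": 0.85, "w2": 0.55, "w3": 0.75, "w4": 0.8},
    "t3": {"w1": 0.9, "w2": 0.65, "w4": 0.7},
}
```

Here predicting w4's accuracy on t1 draws on t2 and t3, weighted by how strongly each correlates with t1 over their shared workers.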
  • 27. Results – RMSE & Mean Accuracy (MTurk data) [Plots: average over tasks for k = 1 to 20 workers; per-task & average results for k = 10 workers] Matt Lease <ml@utexas.edu> 27
  • 28. Findings • How does MF prediction accuracy vary given task similarity, matrix size, & matrix density? – Feasible, PMF beats SVD, more data = better… • MF task routing vs. baselines? – Much better than random; baselines fine in most sparse conditions; improvement beyond that Matt Lease <ml@utexas.edu> 28
  • 29. Open Questions • Other ways to infer task similarity (e.g. textual) • Under “Big Data” conditions? • When integrating target task observations? • How to better model crowd & spam? • How to address live task routing challenges? Matt Lease <ml@utexas.edu> 29
  • 30. Roadmap • Scalability Challenges in Evaluating IR Systems • Benchmarking Statistical Consensus Methods • Task Routing via Matrix Factorization • Toward Ethical Crowdsourcing Matt Lease <ml@utexas.edu> 30
  • 31. A Few Moral Dilemmas • A “fair” price for online work in a global economy? – Is it better to pay nothing (i.e., volunteers, gamification) rather than pay something small for valuable work? • Are we obligated to inform people how their participation / work products will be used? – If my IRB doesn’t require me to obtain informed consent, is there some other moral obligation to do so? • A worker finds his ID posted in a researcher’s online source code and asks that it be removed. This can’t be done without recreating the repo, which many people use. What should be done? Matt Lease <ml@utexas.edu> 31
  • 32. Mechanical Turk is Not Anonymous Matthew Lease, Jessica Hullman, Jeffrey P. Bigham, Michael S. Bernstein, Juho Kim, Walter S. Lasecki, Saeideh Bakhshi, Tanushree Mitra, and Robert C. Miller. Online: Social Science Research Network, March 6, 2013 ssrn.com/abstract=2190946
  • 33. ` Amazon profile page URLs use the same IDs as used on MTurk How do we respond when we learn we’ve exposed people to risk? 33
  • 34. Ethical Crowdsourcing • Assume researchers have good intentions, and so issues of gross negligence are rare – Withholding promised pay after work performed – Not obtaining or complying with IRB oversight • Instead, the great challenge is how to recognize our impacts & appropriate actions in a complex world – Educating ourselves takes time & effort – Failing to educate ourselves could cause harm to others • How can we strike a reasonable balance between complete apathy vs. being overly alarmist? Matt Lease <ml@utexas.edu> 34
  • 35. CACM August, 2013 Paul Hyman. Communications of the ACM, Vol. 56 No. 8, Pages 19-21, August 2013. Matt Lease <ml@utexas.edu> 35
  • 36. • • • • • Contribute to society and human well-being Avoid harm to others Be honest and trustworthy Be fair and take action not to discriminate Respect the privacy of others COMPLIANCE WITH THE CODE. As an ACM member I will – Uphold and promote the principles of this Code – Treat violations of this code as inconsistent with membership in the ACM Matt Lease <ml@utexas.edu> 36
  • 37. CS2008 Curriculum Update (ACM, IEEE) There is reasonably wide agreement that this topic of legal, social, professional and ethical should feature in all computing degrees. …financial and economic imperatives …Which approaches are less expensive and is this sensible? With the advent of outsourcing and off-shoring these matters become more complex and take on new dimensions …there are often related ethical issues concerning exploitation… Such matters ought to feature in courses on legal, ethical and professional practice. if ethical considerations are covered only in the standalone course and not “in context,” it will reinforce the false notion that technical processes are void of ethical issues. Thus it is important that several traditional courses include modules that analyze ethical considerations in the context of the technical subject matter … It would be explicitly against the spirit of the recommendations to have only a standalone course. Matt Lease <ml@utexas.edu> 37
  • 38. “Contribute to society and human well-being; avoid harm to others” • Do we have a moral obligation to try to ascertain conditions under which work is performed? Or the impact we have upon those performing the work? • Do we feel differently when work is performed by – Political refugees? Children? Prisoners? Disabled? • How do we know who is doing the work, or if a decision to work (for a given price) is freely made? – Does it matter why someone accepts offered work? Matt Lease <ml@utexas.edu> 38
  • 40. Who are the workers? • A. Baio, November 2008. The Faces of Mechanical Turk. • P. Ipeirotis. March 2010. The New Demographics of Mechanical Turk • J. Ross, et al. Who are the Crowdworkers? CHI 2010. Matt Lease <ml@utexas.edu> 40
  • 41. Some Notable Prior Research • Silberman, Irani, and Ross (2010) – “How should we… conceptualize the role of these people who we ask to power our computing?” – “abstraction hides detail” – some details may be worth keeping conspicuously present (Jessica Hullman) • Irani and Silberman (2013) – “…AMT helps employers see themselves as builders of innovative technologies, rather than employers unconcerned with working conditions.” – “…human computation currently relies on worker invisibility.” • Fort, Adda, and Cohen (2011) – “…opportunities for our community to deliberately value ethics above cost savings.” 41
  • 42. Power Asymmetry on MTurk • Mistakes happen, such as wrongly rejecting work – e.g., error by new student, software bug, poor instructions, noisy gold, etc. • How do we balance the harm caused by our mistakes to workers (our liability) vs. our cost/effort of preventing such mistakes? Matt Lease <ml@utexas.edu> 42
  • 43. Task Decomposition By minimizing context, greater task efficiency & accuracy can often be achieved in practice – e.g. “Can you name who is in this photo?” • Much research on ways to streamline work and decompose complex tasks Matt Lease <ml@utexas.edu> 43
  • 44. Context & Informed Consent • Assume we wish to obtain informed consent • Without context, consent cannot be informed – Zittrain, Ubiquitous human computing (2008) 44
  • 45. Independent Contractors vs. Employees • Wolfson & Lease, ASIS&T’11 • Many platforms classify workers as independent contractors (piece-work, not hourly) – Legislators/courts must ultimately decide • Different work classifications yield different legal rights/protections & responsibilities – Domestic vs. international workers – Employment taxes – Litigation can both cause or redress harm • Law aside, to what extent do moral principles underlying current laws apply to online work? Matt Lease <ml@utexas.edu> 45
  • 46. Consequences of Human Computation as a Panacea where AI Falls Short • • • • The Googler who Looked at the Worst of the Internet Policing the Web’s Lurid Precincts Facebook content moderation The dirty job of keeping Facebook clean • Even linguistic annotators report stress & nightmares from reading news articles! Matt Lease <ml@utexas.edu> 46
  • 47. What about Freedom? • Crowdsourcing vision: empowering freedom – work whenever you want for whomever you want • Risk: people compelled to perform work – Chinese prisoners farming gold online – Digital sweat shops? Digital slaves? – We know relatively little today about work conditions – How might we monitor and mitigate risk/growth of crowd work inflicting harm to at-risk populations? – Traction? Human Trafficking at MSR Summit’12 Matt Lease <ml@utexas.edu> 47
  • 48. Robert Sim, MSR Summit’12 Matt Lease <ml@utexas.edu> 48
  • 49. Join the conversation! Crowdwork-ethics, by Six Silberman http://crowdwork-ethics.wtf.tw an informal, occasional blog for researchers interested in ethical issues in crowd work Matt Lease <ml@utexas.edu> 49
  • 50. The Future of Crowd Work, CSCW’13 Kittur, Nickerson, Bernstein, Gerber, Shaw, Zimmerman, Lease, and Horton Matt Lease <ml@utexas.edu> 50
  • 51. Additional References • Irani, Lilly C. The Ideological Work of Microwork. In preparation, draft available online. • Adda, Gilles, et al. Crowdsourcing for language resource development: Critical analysis of amazon mechanical turk overpowering use. Proceedings of the 5th Language and Technology Conference (LTC). 2011. • Adda, Gilles, and Joseph J. Mariani. Economic, Legal and Ethical analysis of Crowdsourcing for Speech Processing. (2013). • Harris, Christopher G., and Padmini Srinivasan. Crowdsourcing and Ethics. Security and Privacy in Social Networks. 67-83. 2013. • Harris, Christopher G. Dirty Deeds Done Dirt Cheap: A Darker Side to Crowdsourcing. IEEE 3rd conference on social computing (socialcom). 2011. • Horton, John J. The condition of the Turking class: Are online employers fair and honest?. Economics Letters 111.1 (2011): 10-12. Matt Lease <ml@utexas.edu> 51
  • 52. Additional References (2) • Bederson, B. B., & Quinn, A. J. Web workers unite! addressing challenges of online laborers. In CHI 2011 Human Computation Workshop, 97-106. • Bederson, B. B., & Quinn, A. J. Participation in Human Computation. In CHI 2011 Human Computation Workshop. • Felstiner, Alek. Working the Crowd: Employment and Labor Law in the Crowdsourcing Industry. Berkeley J. Employment & Labor Law 32.1, 2011. • Felstiner, Alek. Sweatshop or Paper Route?: Child Labor Laws and In-Game Work. CrowdConf (2010). • Larson, Martha. Toward Responsible and Sustainable Crowdsourcing. Blog post + slides from Dagstuhl, September 2013. • Vili Lehdonvirta and Paul Mezier. Identity and Self-Organization in Unstructured Work. Unpublished working paper, 16 October 2013. • Zittrain, Jonathan. Minds for Sale. YouTube. Matt Lease <ml@utexas.edu> 52
  • 53. Thank You! See also: SIAM’13 Tutorial Slides: www.slideshare.net/mattlease ir.ischool.utexas.edu Matt Lease <ml@utexas.edu> 53