Crowdsourcing for Research & Engineering
                                                                 Omar Alonso
                                                                 Microsoft

                                                                 Matthew Lease
                                                                 University of Texas at Austin

                                                                 November 1, 2011




 November 1, 2011   Crowdsourcing for Research and Engineering                         1
Tutorial Objectives
• What is crowdsourcing?
• How and when to use crowdsourcing?
• How to use Mechanical Turk
• Experimental setup and design guidelines for
  working with the crowd
• Quality control: issues, measuring, and improving
• Future trends
• Research landscape and open challenges

November 1, 2011      Crowdsourcing for Research and Engineering   2
Tutorial Outline
I. Introduction to Crowdsourcing
     I. Introduction, Examples, Terminology
     II. Primary focus on micro-tasks
II. Tools and Platforms: APIs and Examples




III. Methodology for effective crowdsourcing
     I. Methods, Examples, and Tips
     II. Quality control: monitoring & improving
IV. Future Trends
November 1, 2011     Crowdsourcing for Research and Engineering   3
I
INTRODUCTION TO CROWDSOURCING

November 1, 2011   Crowdsourcing for Research and Engineering   4
From Outsourcing to Crowdsourcing
• Take a job traditionally
  performed by a known agent
  (often an employee)
• Outsource it to an undefined,
  generally large group of
  people via an open call
• New application of principles
  from open source movement
• Evolving & broadly defined ...
 November 1, 2011   Crowdsourcing for Research and Engineering   5
Examples




Crowdsourcing 101: Putting the WSDM of Crowds to Work for You.   6
November 1, 2011   Crowdsourcing for Research and Engineering   7
Crowdsourcing models
•   Virtual work, Micro-tasks, & Aggregators
•   Open Innovation, Co-Creation, & Contests
•   Citizen Science
•   Prediction Markets
•   Crowd Funding and Charity
•   “Gamification” (not serious gaming)
•   Transparent
•   cQ&A, Social Search, and Polling
•   Human Sensing
November 1, 2011       Crowdsourcing for Research and Engineering   8
What is Crowdsourcing?
• A collection of mechanisms and associated
  methodologies for scaling and directing crowd
  activities to achieve some goal(s)
• Enabled by internet-connectivity
• Many related areas
      –   Collective intelligence
      –   Social computing
      –   People services
      –   Human computation (next slide…)
• Good work is creative, innovative, surprising, …
November 1, 2011        Crowdsourcing for Research and Engineering   9
Human Computation
• Having people do stuff instead of computers
• Investigates use of people to execute certain
  computations for which capabilities of current
  automated methods are more limited
• Explores the metaphor of computation for
  characterizing attributes, capabilities, and
  limitations of human performance in executing
  desired tasks
• Computation is required, crowd is not
• Pioneer: Luis von Ahn’s thesis (2005)
November 1, 2011      Crowdsourcing for Research and Engineering   10
What is not crowdsourcing?
• Ingredients necessary but not sufficient
      – A crowd
      – Digital communication
• Post-hoc use of undirected crowd behaviors
      – e.g. Data mining, visualization
• Conducting a traditional survey or poll
• Human Computation with one or few people
      – E.g. traditional active learning
• …

November 1, 2011       Crowdsourcing for Research and Engineering   11
Crowdsourcing Key Questions
• What are the goals?
      – Purposeful directing of human activity

• How can you incentivize participation?
      – Incentive engineering
      – Who are the target participants?

• Which model(s) are most appropriate?
      – How to adapt them to your context and goals?
November 1, 2011    Crowdsourcing for Research and Engineering   12
What do you want to accomplish?
• Perform specified task(s)
• Innovate and/or discover
• Create
• Predict
• Fund
• Learn
• Monitor
November 1, 2011   Crowdsourcing for Research and Engineering
Why Should Anyone Participate?




Don’t let this happen to you …
November 1, 2011   Crowdsourcing for Research and Engineering   14
Incentive Engineering
• Earn Money (real or virtual)
• Have fun (or pass the time)
• Socialize with others
• Obtain recognition or prestige (leaderboards, badges)
• Do Good (altruism)
• Learn something new
• Obtain something else
• Create self-serving resource

Multiple incentives can often operate in parallel (*caveat)
November 1, 2011       Crowdsourcing for Research and Engineering   15
Models: Goal(s) + Incentives
•   Virtual work, Micro-tasks, & Aggregators
•   Open Innovation, Co-Creation, & Contests
•   Citizen Science
•   Prediction Markets
•   Crowd Funding and Charity
•   “Gamification” (not serious gaming)
•   Transparent
•   cQ&A, Social Search, and Polling
•   Human Sensing
November 1, 2011   Crowdsourcing for Research and Engineering   16
Example: Wikipedia
• Earn Money (real or virtual)
• Have fun (or pass the time)
• Socialize with others
• Obtain recognition or prestige
• Do Good (altruism)
• Learn something new
• Obtain something else
• Create self-serving resource

November 1, 2011      Crowdsourcing for Research and Engineering   17
Example:
• Earn Money (real or virtual)
• Have fun (or pass the time)
• Socialize with others
• Obtain recognition or prestige
• Do Good (altruism)
• Learn something new
• Obtain something else
• Create self-serving resource

November 1, 2011    Crowdsourcing for Research and Engineering   18
Example: ESP and GWaP
L. Von Ahn and L. Dabbish (2004)




November 1, 2011        Crowdsourcing for Research and Engineering   19
Example: ESP
• Earn Money (real or virtual)
• Have fun (or pass the time)
• Socialize with others
• Obtain recognition or prestige
• Do Good (altruism)
• Learn something new
• Obtain something else
• Create self-serving resource

November 1, 2011   Crowdsourcing for Research and Engineering   20
Example: fold.it
S. Cooper et al. (2010)




Alice G. Walton. Online Gamers Help Solve Mystery of
Critical AIDS Virus Enzyme. The Atlantic, October 8, 2011.
November 1, 2011    Crowdsourcing for Research and Engineering   21
Example: fold.it
• Earn Money (real or virtual)
• Have fun (or pass the time)
• Socialize with others
• Obtain recognition or prestige
• Do Good (altruism)
• Learn something new
• Obtain something else
• Create self-serving resource

November 1, 2011    Crowdsourcing for Research and Engineering   22
Example: FreeRice




November 1, 2011     Crowdsourcing for Research and Engineering   23
Example: FreeRice
• Earn Money (real or virtual)
• Have fun (or pass the time)
• Socialize with others
• Obtain recognition or prestige
• Do Good (altruism)
• Learn something new
• Obtain something else
• Create self-serving resource

November 1, 2011     Crowdsourcing for Research and Engineering   24
Example: cQ&A, Social Search, & Polling




November 1, 2011   Crowdsourcing for Research and Engineering   25
Example: cQ&A
• Earn Money (real or virtual)
• Have fun (or pass the time)
• Socialize with others
• Obtain recognition or prestige
• Do Good (altruism)
• Learn something new
• Obtain something else
• Create self-serving resource

November 1, 2011    Crowdsourcing for Research and Engineering   26
Example: reCaptcha




November 1, 2011      Crowdsourcing for Research and Engineering   27
Example: reCaptcha
• Earn Money (real or virtual)
• Have fun (or pass the time)
• Socialize with others
• Obtain recognition or prestige
• Do Good (altruism)
• Learn something new
• Obtain something else
• Create self-serving resource

Is there an existing human activity you can harness for another purpose?

November 1, 2011      Crowdsourcing for Research and Engineering                           28
Example: Mechanical Turk




         J. Pontin. Artificial Intelligence, With Help From
         the Humans. New York Times (March 25, 2007)
 November 1, 2011      Crowdsourcing for Research and Engineering   29
Example: Mechanical Turk
• Earn Money (real or virtual)
• Have fun (or pass the time)
• Socialize with others
• Obtain recognition or prestige
• Do Good (altruism)
• Learn something new
• Obtain something else
• Create self-serving resource

November 1, 2011   Crowdsourcing for Research and Engineering   30
Look Before You Leap
• Wolfson & Lease (2011)
• Identify a few potential legal pitfalls to know about
  when considering crowdsourcing
   –   employment law
   –   patent inventorship
   –   data security and the Federal Trade Commission
   –   copyright ownership
   –   securities regulation of crowdfunding
• Take-away: don’t panic, just be mindful of the law
 November 1, 2011        Crowdsourcing for Research and Engineering   31
Example: SamaSource




 Incentive for YOU: Do Good
 Terminology: channels
November 1, 2011       Crowdsourcing for Research and Engineering   32
Who are
the workers?


• A. Baio, November 2008. The Faces of Mechanical Turk.
• P. Ipeirotis. March 2010. The New Demographics of
  Mechanical Turk
• J. Ross, et al. Who are the Crowdworkers?... CHI 2010.
 November 1, 2011   Crowdsourcing for Research and Engineering   33
MTurk Demographics
• 2008-2009 studies found
  less global and diverse
  than previously thought
      – US
      – Female
      – Educated
      – Bored
      – Money is secondary

November 1, 2011      Crowdsourcing for Research and Engineering   34
2010 shows increasing diversity
47% US, 34% India, 19% other (P. Ipeirotis, March 2010)




 November 1, 2011   Crowdsourcing for Research and Engineering   35
MICRO-TASKS++ AND EXAMPLES

November 1, 2011        Crowdsourcing for Research and Engineering   36
Chess machine unveiled in 1770 by Wolfgang von Kempelen (1734–1804)

•     “Micro-task” crowdsourcing marketplace
•     On-demand, scalable, real-time workforce
•     Online since 2005 (and still in “beta”)
•     Programmer’s API & “Dashboard” GUI
•     Sponsorship: TREC 2011 Crowdsourcing Track (pending)
    November 1, 2011            Crowdsourcing for Research and Engineering           37
Does anyone really use it? Yes!




   http://www.mturk-tracker.com (P. Ipeirotis’10)

From 1/09 – 4/10, 7M HITs from 10K requestors
worth $500,000 USD (significant under-estimate)
 November 1, 2011   Crowdsourcing for Research and Engineering   38
CrowdFlower
• Labor on-demand, channels, quality control features
• Sponsorship
   – Research Workshops: CSE’10, CSDM’11, CIR’11,
   – TREC 2011 Crowdsourcing Track




   November 1, 2011   Crowdsourcing for Research and Engineering   39
CloudFactory
•   Information below from Mark Sears (Oct. 18, 2011)
•   Cloud Labor API
      – Tools to design virtual assembly lines
      – workflows with multiple tasks chained together
•   Focus on self-serve tools that let people easily design crowd-powered assembly lines
    and integrate them into software applications
•   Interfaces: command-line, RESTful API, and Web
•   Each “task station” can have either a human or robot worker assigned
      – web software services (AlchemyAPI, SendGrid, Google APIs, Twilio, etc.) or local software can
        be combined with human computation
•   Many built-in "best practices"
      – “Tournament Stations” where multiple results are compared by other cloud workers until
        confidence in the best answer is reached
      – “Improver Stations” have workers improve and correct work by other workers
      – Badges are earned by cloud workers passing tests created by requesters
      – Training and tools to create skill tests will be flexible
      – Algorithms to detect and kick out spammers/cheaters/lazy/bad workers
•   Sponsorship: TREC 2012 Crowdsourcing Track
November 1, 2011                 Crowdsourcing for Research and Engineering                        40
More Crowd Labor Platforms
•    Clickworker
•    CloudCrowd
•    CrowdSource
•    DoMyStuff
•    Humanoid (by Matt Swason et al.)
•    Microtask
•    MobileWorks (by Anand Kulkarni )
•    myGengo
•    SmartSheet
•    vWorker
•    Industry heavy-weights
       –   Elance
       –   Liveops
       –   oDesk
       –   uTest
•    and more…

    November 1, 2011       Crowdsourcing for Research and Engineering   41
Why Micro-Tasks?
• Easy, cheap and fast
• Ready-to use infrastructure, e.g.
      – MTurk payments, workforce, interface widgets
      – CrowdFlower quality control mechanisms, etc.
      – Many others …
• Allows early, iterative, frequent trials
      – Iteratively prototype and test new ideas
      – Try new tasks, test when you want & as you go
• Many successful examples of use reported
November 1, 2011     Crowdsourcing for Research and Engineering   42
Micro-Task Issues
• Process
      – Task design, instructions, setup, iteration
• Choose crowdsourcing platform (or roll your own)
• Human factors
      – Payment / incentives, interface and interaction design,
        communication, reputation, recruitment, retention
• Quality Control / Data Quality
      – Trust, reliability, spam detection, consensus labeling


November 1, 2011      Crowdsourcing for Research and Engineering   43
Legal Disclaimer:
             Caution Tape and Silver Bullets




• Often still involves more art than science
• Not a panacea, but another alternative
   – one more data point for analysis, complements other methods
• Quality may be traded off for time/cost/effort
• Hard work & experimental design still required!
 November 1, 2011    Crowdsourcing for Research and Engineering   44
Hello World Demo
• We’ll show a simple, short demo of MTurk
• This is a teaser highlighting things we’ll discuss
      – Don’t worry about details; we’ll revisit them
• Specific task unimportant
• Big idea: easy, fast, cheap to label with MTurk!




November 1, 2011     Crowdsourcing for Research and Engineering   45
Jane saw the man with the binoculars




November 1, 2011   Crowdsourcing for Research and Engineering   46
DEMO


November 1, 2011   Crowdsourcing for Research and Engineering   47
Traditional Data Collection
• Setup data collection software / harness
• Recruit participants
• Pay a flat fee for experiment or hourly wage

• Characteristics
      –   Slow
      –   Expensive
      –   Tedious
      –   Sample Bias

November 1, 2011        Crowdsourcing for Research and Engineering   48
Research Using Micro-Tasks
• Let’s see examples of micro-task usage
      – Many areas: IR, NLP, computer vision, user studies,
        usability testing, psychological studies, surveys, …


• Check bibliography at end for more references




November 1, 2011     Crowdsourcing for Research and Engineering   49
NLP Example – Dialect Identification




November 1, 2011   Crowdsourcing for Research and Engineering   50
NLP Example – Spelling correction




November 1, 2011   Crowdsourcing for Research and Engineering   51
NLP Example – Machine Translation
• Manual evaluation of translation quality is
  slow and expensive
• High agreement between non-experts and
  experts
• $0.10 to translate a sentence


   C. Callison-Burch. “Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk”, EMNLP 2009.

   B. Bederson et al. Translation by Interactive Collaboration between Monolingual Users, GI 2010



November 1, 2011                        Crowdsourcing for Research and Engineering                                          52
NLP Example – Snow et al. (2008)

• 5 Tasks
      –   Affect recognition
      –   Word similarity
      –   Recognizing textual entailment
      –   Event temporal ordering
      –   Word sense disambiguation
• high agreement between crowd
  labels and expert “gold” labels
      – assumes training data for worker bias correction
• 22K labels for $26!
November 1, 2011       Crowdsourcing for Research and Engineering   53
Computer Vision – Painting Similarity




                                       Kovashka & Lease, CrowdConf’10

November 1, 2011   Crowdsourcing for Research and Engineering           54
User Studies
• Investigate attitudes about saving, sharing, publishing,
  and removing online photos
• Survey
      – A scenario-based probe of respondent attitudes, designed
        to yield quantitative data
      – A set of questions (closed- and open-ended)
      – Importance of recent activity
      – 41 questions
      – 7-point scale
• 250 respondents

   C. Marshall and F. Shipman. “The Ownership and Reuse of Visual Media”, JCDL 2011.


November 1, 2011                        Crowdsourcing for Research and Engineering     55
Remote Usability Testing
• Liu et al. (in preparation)
• Compares remote usability testing using MTurk and
  CrowdFlower (not uTest) vs. traditional on-site testing
• Advantages
      –   More Participants
      –   More Diverse Participants
      –   High Speed
      –   Low Cost
• Disadvantages
      –   Lower Quality Feedback
      –   Less Interaction
      –   Greater need for quality control
      –   Less Focused User Groups
November 1, 2011          Crowdsourcing for Research and Engineering   56
IR Example – Relevance and ads




November 1, 2011   Crowdsourcing for Research and Engineering   57
IR Example – Product Search




November 1, 2011   Crowdsourcing for Research and Engineering   58
IR Example – Snippet Evaluation
•    Study on summary lengths
•    Determine preferred result length
•    Asked workers to categorize web queries
•    Asked workers to evaluate snippet quality
•    Payment between $0.01 and $0.05 per HIT


    M. Kaisser, M. Hearst, and L. Lowe. “Improving Search Results Quality by Customizing Summary Lengths”, ACL/HLT, 2008.




November 1, 2011                         Crowdsourcing for Research and Engineering                                         59
IR Example – Relevance Assessment
•     Replace TREC-like relevance assessors with MTurk?
•     Selected topic “space program” (011)
•     Modified original 4-page instructions from TREC
•     Workers more accurate than original assessors!
•     40% provided justification for each answer


    O. Alonso and S. Mizzaro. “Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment”, SIGIR Workshop
    on the Future of IR Evaluation, 2009.



    November 1, 2011                          Crowdsourcing for Research and Engineering                                           60
IR Example – Timeline Annotation
• Workers annotate timeline on politics, sports, culture
• Given a timex (1970s, 1982, etc.), suggest an associated event
• Given an event (Vietnam, World Cup, etc.), suggest a timex




 K. Berberich, S. Bedathur, O. Alonso, G. Weikum “A Language Modeling Approach for Temporal Information Needs”. ECIR 2010




 November 1, 2011                       Crowdsourcing for Research and Engineering                                          61
How can I get started?

• You have an idea
• Easy, cheap, fast, and iterative sounds good


Can you test your idea via crowdsourcing?
• Is my idea crowdsourcable?
• How do I start?
• What do I need?
November 1, 2011       Crowdsourcing for Research and Engineering   62
Tip for Getting Started: do work
Try doing work before you create work for others!




November 1, 2011   Crowdsourcing for Research and Engineering   63
II
              AMAZON MECHANICAL TURK

November 1, 2011   Crowdsourcing for Research and Engineering   64
Mechanical What?




November 1, 2011     Crowdsourcing for Research and Engineering   65
MTurk: The Requester
•   Sign up with your Amazon account
•   Amazon payments
•   Purchase prepaid HITs
•   There is no minimum or up-front fee
•   MTurk collects a 10% commission
•   The minimum commission charge is $0.005 per HIT
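
To see how those fees add up, here is a minimal back-of-the-envelope cost sketch in Python. It assumes, for simplicity, that the 10% commission (with its $0.005 minimum) is charged on every assignment; check the current fee schedule before budgeting a real batch.

def batch_cost(num_hits, assignments_per_hit, reward,
               commission_rate=0.10, min_commission=0.005):
    """Rough cost estimate; assumes fees are charged per assignment."""
    fee = max(reward * commission_rate, min_commission)
    return num_hits * assignments_per_hit * (reward + fee)

# Example: 1,000 HITs, 5 redundant judgments each, $0.02 per judgment
print("Estimated cost: $%.2f" % batch_cost(1000, 5, 0.02))   # -> $125.00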




November 1, 2011       Crowdsourcing for Research and Engineering   66
MTurk Dashboard
• Three tabs
      – Design
      – Publish
      – Manage
• Design
      – HIT Template
• Publish
      – Make work available
• Manage
      – Monitor progress


November 1, 2011       Crowdsourcing for Research and Engineering   67
MTurk: Dashboard - II




November 1, 2011       Crowdsourcing for Research and Engineering   68
MTurk API
•   Amazon Web Services API
•   Rich set of services
•   Command line tools
•   More flexibility than dashboard
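
As a rough illustration (not official sample code), creating a HIT programmatically with the community Boto library (boto 2.x, listed under tools later in this tutorial) looks roughly like the sketch below. The credentials, the external form URL, and the prices/durations are all placeholders, and the keyword arguments should be double-checked against the Boto documentation.

# Minimal sketch assuming the boto 2.x MTurk module and the sandbox endpoint.
from boto.mturk.connection import MTurkConnection
from boto.mturk.question import ExternalQuestion

conn = MTurkConnection(aws_access_key_id="YOUR_KEY",          # placeholder
                       aws_secret_access_key="YOUR_SECRET",   # placeholder
                       host="mechanicalturk.sandbox.amazonaws.com")

# Point the HIT at an externally hosted judging form (hypothetical URL).
question = ExternalQuestion(external_url="https://example.org/judge.html",
                            frame_height=600)

conn.create_hit(question=question,
                title="Judge the relevance of a web page",
                description="Read a short web page and answer one question",
                keywords=["relevance", "judging", "search"],
                reward=0.02,           # dollars per assignment
                max_assignments=5,     # redundant judgments per HIT
                duration=600,          # seconds a worker may hold the HIT
                lifetime=86400)        # seconds the HIT stays available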




November 1, 2011   Crowdsourcing for Research and Engineering   69
MTurk Dashboard vs. API
• Dashboard
      – Easy to prototype
      – Setup and launch an experiment in a few minutes
• API
      – Ability to integrate AMT as part of a system
      – Ideal if you want to run experiments regularly
      – Schedule tasks


November 1, 2011        Crowdsourcing for Research and Engineering   70
Working on MTurk
• Sign up with your Amazon account
• Tabs
      – Account: work approved/rejected
      – HIT: browse and search for work
      – Qualifications: browse & search qualifications
• Start turking!



November 1, 2011     Crowdsourcing for Research and Engineering   71
Why Eytan Adar hates MTurk Research
          (at least sort of)
• Overly-narrow focus on Turk & other platforms
      – Identify general vs. platform-specific problems
      – Academic vs. Industrial problems
• Lack of appreciation of interdisciplinary nature
      – Some problems well-studied in other areas
      – Human behavior hasn’t changed much
• Turkers aren't Martians
      – How many prior user studies do we have to
        reproduce on MTurk before we can get over it?
November 1, 2011     Crowdsourcing for Research and Engineering   72
III
        RELEVANCE JUDGING & CROWDSOURCING

November 1, 2011   Crowdsourcing for Research and Engineering   73
November 1, 2011   Crowdsourcing for Research and Engineering   74
Motivating Example: Relevance Judging

• Relevance of search results is difficult to judge
      – Highly subjective
      – Expensive to measure
• Professional editors commonly used
• Potential benefits of crowdsourcing
      – Scalability (time and cost)
      – Diversity of judgments


November 1, 2011     Crowdsourcing for Research and Engineering   75
November 1, 2011   Crowdsourcing for Research and Engineering   76
Started with a joke …




November 1, 2011   Crowdsourcing for Research and Engineering   77
Results for {idiot} at WSDM 2011
February 2011: 5/7 (R), 2/7 (NR)
    –   Most of the time those TV reality stars have absolutely no talent. They do whatever
        they can to make a quick dollar. Most of the time the reality tv stars don not have
        a mind of their own.   R
    –   Most are just celebrity wannabees. Many have little or no talent, they just want
        fame. R
    –   I can see this one going both ways. A particular sort of reality star comes to
        mind, though, one who was voted off Survivor because he chose not to use his
        immunity necklace. Sometimes the label fits, but sometimes it might be unfair. R
    –   Just because someone else thinks they are an "idiot", doesn't mean that is what the
        word means. I don't like to think that any one person's photo would be used to
        describe a certain term.   NR
    –   While some reality-television stars are genuinely stupid (or cultivate an image of
        stupidity), that does not mean they can or should be classified as "idiots." Some
        simply act that way to increase their TV exposure and potential earnings. Other
        reality-television stars are really intelligent people, and may be considered as
        idiots by people who don't like them or agree with them. It is too subjective an
        issue to be a good result for a search engine. NR
    –   Have you seen the knuckledraggers on reality television? They should be required to
        change their names to idiot after appearing on the show. You could put numbers
        after the word idiot so we can tell them apart. R
    –   Although I have not followed too many of these shows, those that I have encountered
        have for a great part a very common property. That property is that most of the
        participants involved exhibit a shallow self-serving personality that borders on
        social pathological behavior. To perform or act in such an abysmal way could only
        be an act of an idiot. R
 November 1, 2011             Crowdsourcing for Research and Engineering               78
Two Simple Examples of MTurk
1. Ask workers to classify a query
2. Ask workers to judge document relevance

Steps
• Define high-level task
• Design & implement interface & backend
• Launch, monitor progress, and assess work
• Iterate design

November 1, 2011   Crowdsourcing for Research and Engineering   79
Query Classification Task
•   Ask workers to classify a query
•   Show a form that contains a few categories
•   Upload a few queries (~20)
•   Use 3 workers
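
As a small illustration of the data-upload step, the dashboard takes a delimited file whose column names match the variables in your HIT template. A sketch that writes such a file, assuming a template with a ${query} placeholder (the queries themselves are made up):

import csv

# Hypothetical sample; the real run would upload roughly 20 queries.
queries = ["cheap flights to austin", "python csv module", "idiot"]

with open("queries.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["query"])   # header must match the ${query} template variable
    for q in queries:
        writer.writerow([q])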




November 1, 2011         Crowdsourcing for Research and Engineering   80
DEMO


November 1, 2011   Crowdsourcing for Research and Engineering   81
November 1, 2011   Crowdsourcing for Research and Engineering   82
Relevance Judging Task
• Use a few documents from a standard
  collection used for evaluating search engines
• Ask workers to make binary judgments
• Modification: graded judging
• Use 5 workers




November 1, 2011        Crowdsourcing for Research and Engineering   83
DEMO


November 1, 2011   Crowdsourcing for Research and Engineering   84
IV
                   METHODOLOGY FOR EFFECTIVE
                            CROWDSOURCING
November 1, 2011      Crowdsourcing for Research and Engineering   85
November 1, 2011   Crowdsourcing for Research and Engineering   86
Typical Workflow
•   Define and design what to test
•   Sample data
•   Design the experiment
•   Run experiment
•   Collect data and analyze results
•   Quality control



November 1, 2011     Crowdsourcing for Research and Engineering   87
Development Framework
• Incremental approach
• Measure, evaluate, and adjust as you go
• Suitable for repeatable tasks




November 1, 2011        Crowdsourcing for Research and Engineering   88
Survey Design
•   One of the most important parts
•   Part art, part science
•   Instructions are key
•   Prepare to iterate




November 1, 2011   Crowdsourcing for Research and Engineering   89
Questionnaire Design
• Ask the right questions
• Workers may not be IR experts, so don't assume
  they share your terminology
• Show examples
• Hire a technical writer
      – Engineer writes the specification
      – Writer communicates

November 1, 2011       Crowdsourcing for Research and Engineering   90
UX Design
• Time to apply all those usability concepts
• Generic tips
      – The experiment should be self-contained.
      – Keep it short, simple, and concise.
      – Be very clear with the relevance task.
      – Engage with the worker. Avoid boring stuff.
      – Always ask for feedback (open-ended question) in
        an input box.

November 1, 2011    Crowdsourcing for Research and Engineering   91
UX Design - II
•   Presentation
•   Document design
•   Highlight important concepts
•   Colors and fonts
•   Need to grab attention
•   Localization



November 1, 2011   Crowdsourcing for Research and Engineering   92
Example - I
• Asking too much, task not clear, “do NOT/reject”
• Worker has to do a lot of stuff




November 1, 2011   Crowdsourcing for Research and Engineering   93
Example - II
• Lot of work for a few cents
• Go here, go there, copy, enter, count …




November 1, 2011   Crowdsourcing for Research and Engineering   94
A Better Example
• All information is available
      – What to do
      – Search result
      – Question to answer




November 1, 2011        Crowdsourcing for Research and Engineering   95
November 1, 2011   Crowdsourcing for Research and Engineering   96
Form and Metadata
• Form with a closed question (binary relevance) and an
  open-ended question (user feedback)
• Clear title, useful keywords
• Workers need to find your task




November 1, 2011      Crowdsourcing for Research and Engineering   97
Relevance Judging – Example I




November 1, 2011   Crowdsourcing for Research and Engineering   98
Relevance Judging – Example II




November 1, 2011    Crowdsourcing for Research and Engineering   99
Implementation
• Similar to any UX project
• Build a mock up and test it with your team
      – Yes, you need to judge some tasks
• Incorporate feedback and run a test on MTurk
  with a very small data set
      – Time the experiment
      – Do people understand the task?
• Analyze results
      – Look for spammers
      – Check completion times
• Iterate and modify accordingly
November 1, 2011     Crowdsourcing for Research and Engineering   100
Implementation – II
• Introduce quality control
      – Qualification test
      – Gold answers (honey pots)
•   Adjust passing grade and worker approval rate (see the sketch below)
•   Run experiment with new settings & same data
•   Scale on data
•   Scale on workers
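
A sketch of attaching a worker approval-rate requirement with boto 2.x is shown below; the 95% threshold is only an example, and a custom qualification test would be created separately and attached via its qualification type ID.

# Minimal sketch, assuming the boto 2.x MTurk module; credentials are placeholders.
from boto.mturk.connection import MTurkConnection
from boto.mturk.qualification import (Qualifications,
                                      PercentAssignmentsApprovedRequirement)

conn = MTurkConnection(aws_access_key_id="YOUR_KEY",
                       aws_secret_access_key="YOUR_SECRET",
                       host="mechanicalturk.sandbox.amazonaws.com")

# Only workers whose past work was approved at least 95% of the time may accept the HIT.
quals = Qualifications()
quals.add(PercentAssignmentsApprovedRequirement(comparator="GreaterThanOrEqualTo",
                                                integer_value=95))

# quals is then passed along when creating the HIT: create_hit(..., qualifications=quals)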

November 1, 2011      Crowdsourcing for Research and Engineering   101
Experiment in Production
•   Lots of tasks on MTurk at any moment
•   Need to grab attention
•   Importance of experiment metadata
•   When to schedule
      – Split a large task into batches and keep a single
        batch in the system at a time
      – Always review feedback from batch n before
        uploading n+1

November 1, 2011     Crowdsourcing for Research and Engineering   102
How Much to Pay?
• Price commensurate with task effort
      – Ex: $0.02 for yes/no answer + $0.02 bonus for optional feedback
• Ethics & market-factors: W. Mason and S. Suri, 2010.
      – e.g. non-profit SamaSource contracts workers in refugee camps
      – Predict right price given market & task: Wang et al. CSDM’11
• Uptake & time-to-completion vs. Cost & Quality
      – Too little $$, no interest or slow – too much $$, attract spammers
      – Real problem is lack of reliable QA substrate
• Accuracy & quantity
      – More pay = more work, not better (W. Mason and D. Watts, 2009)
• Heuristics: start small, watch uptake and bargaining feedback
• Worker retention (“anchoring”)
See also: L.B. Chilton et al. KDD-HCOMP 2010.
   November 1, 2011                    Crowdsourcing for Research and Engineering   103
November 1, 2011   Crowdsourcing for Research and Engineering   104
Quality Control in General
• Extremely important part of the experiment
• Approach as “overall” quality; not just for workers
• Bi-directional channel
   – You may think the worker is doing a bad job.
   – The same worker may think you are a lousy requester.




 November 1, 2011     Crowdsourcing for Research and Engineering   105
When to assess quality of work
• Beforehand (prior to main task activity)
      – How: “qualification tests” or similar mechanism
      – Purpose: screening, selection, recruiting, training
• During
      – How: assess labels as worker produces them
            • Like random checks on a manufacturing line
      – Purpose: calibrate, reward/penalize, weight
• After
      – How: compute accuracy metrics post-hoc
      – Purpose: filter, calibrate, weight, retain (HR)
      – E.g. Jung & Lease (2011), Tang & Lease (2011), ...
November 1, 2011          Crowdsourcing for Research and Engineering   106
How to assess quality of work?
• Compare worker’s label vs.
      – Known (correct, trusted) label
      – Other workers’ labels
            • P. Ipeirotis. Worker Evaluation in Crowdsourcing: Gold Data or
              Multiple Workers? Sept. 2010.
      – Model predictions of the above
            • Model the labels (Ryu & Lease, ASIS&T11)
            • Model the workers (Chen et al., AAAI’10)
• Verify worker’s label
      – Yourself
      – Tiered approach (e.g. Find-Fix-Verify)
            • Quinn and B. Bederson’09, Bernstein et al.’10
November 1, 2011           Crowdsourcing for Research and Engineering   107
Typical Assumptions
• Objective truth exists
      – no minority voice / rare insights
      – Can relax this to model “truth distribution”
• Automatic answer comparison/evaluation
      – What about free text responses? Hope from NLP…
            • Automatic essay scoring
            • Translation (BLEU: Papineni, ACL’2002)
            • Summarization (Rouge: C.Y. Lin, WAS’2004)
      – Have people do it (yourself or find-verify crowd, etc.)
November 1, 2011         Crowdsourcing for Research and Engineering   108
Distinguishing Bias vs. Noise
• Ipeirotis (HComp 2010)
• People often have consistent, idiosyncratic
  skews in their labels (bias)
      – E.g. I like action movies, so they get higher ratings
• Once detected, systematic bias can be
  calibrated for and corrected (yeah!)
• Noise, however, seems random & inconsistent
      – this is the real issue we want to focus on

November 1, 2011     Crowdsourcing for Research and Engineering   109
Comparing to known answers
• AKA: gold, honey pot, verifiable answer, trap
• Assumes you have known answers
• Cost vs. Benefit
      – Producing known answers (experts?)
      – % of work spent re-producing them
• Finer points
      – Controls against collusion
      – What if workers recognize the honey pots?
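
A minimal sketch of the bookkeeping: score each worker only on items that have a trusted label, then flag workers whose accuracy on those honey pots falls below a threshold (the data and the 0.7 cutoff below are made up).

from collections import defaultdict

gold = {"doc1": "R", "doc2": "NR", "doc3": "R"}          # trusted labels (toy data)
labels = [("w1", "doc1", "R"), ("w1", "doc2", "NR"), ("w1", "doc3", "NR"),
          ("w2", "doc1", "R"), ("w2", "doc2", "NR"), ("w2", "doc3", "R")]

correct = defaultdict(int)   # correct answers on gold items, per worker
seen = defaultdict(int)      # gold items attempted, per worker
for worker, item, label in labels:
    if item in gold:
        seen[worker] += 1
        correct[worker] += (label == gold[item])

for worker in sorted(seen):
    accuracy = correct[worker] / seen[worker]
    print(worker, "accuracy on gold = %.2f" % accuracy,
          "(flag for review)" if accuracy < 0.7 else "")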

November 1, 2011    Crowdsourcing for Research and Engineering   110
Comparing to other workers
•   AKA: consensus, plurality, redundant labeling
•   Well-known metrics for measuring agreement
•   Cost vs. Benefit: % of work that is redundant
•   Finer points
      – Is consensus “truth” or systematic bias of group?
      – What if no one really knows what they’re doing?
            • Low-agreement across workers indicates problem is with the
              task (or a specific example), not the workers
      – Risk of collusion
• Sheng et al. (KDD 2008)
November 1, 2011          Crowdsourcing for Research and Engineering   111
Comparing to predicted label
• Ryu & Lease, ASIS&T11 (CrowdConf’11 poster)
• Catch-22 extremes
      – If model is really bad, why bother comparing?
      – If model is really good, why collect human labels?
• Exploit model confidence
      – Trust predictions proportional to confidence
      – What if model very confident and wrong?
• Active learning
      – Time sensitive: Accuracy / confidence changes
November 1, 2011     Crowdsourcing for Research and Engineering   112
Compare to predicted worker labels
• Chen et al., AAAI’10
• Avoid inefficiency of redundant labeling
      – See also: Dekel & Shamir (COLT’2009)
• Train a classifier for each worker
• For each example labeled by a worker
      – Compare to predicted labels for all other workers
• Issues
     • Sparsity: workers have to stick around to train model…
     • Time-sensitivity: New workers & incremental updates?

November 1, 2011      Crowdsourcing for Research and Engineering   113
Methods for measuring agreement
• What to look for
      – Agreement, reliability, validity
• Inter-agreement level
      – Agreement between judges
      – Agreement between judges and the gold set
• Some statistics
      –   Percentage agreement
      –   Cohen’s kappa (2 raters)
      –   Fleiss’ kappa (any number of raters)
      –   Krippendorff’s alpha
• With majority vote, what if 2 say relevant, 3 say not?
      – Use expert to break ties (Kochhar et al, HCOMP’10; GQR)
      – Collect more judgments as needed to reduce uncertainty
November 1, 2011         Crowdsourcing for Research and Engineering   114
Inter-rater reliability
• Lots of research
• Statistics books cover most of the material
• Three categories based on the goals
      – Consensus estimates
      – Consistency estimates
      – Measurement estimates




November 1, 2011       Crowdsourcing for Research and Engineering   115
Sample code
      – R packages psy and irr
      >library(psy)
      >library(irr)
      >my_data <- read.delim(file="test.txt",
        header=TRUE, sep="\t")   # tab-delimited file, one column per rater
      >kappam.fleiss(my_data, exact=FALSE)   # Fleiss' kappa (2+ raters, from irr)

      >my_data2 <- read.delim(file="test2.txt",
        header=TRUE, sep="\t")
      >ckappa(my_data2)                      # Cohen's kappa (2 raters, from psy)




November 1, 2011    Crowdsourcing for Research and Engineering   116
Kappa coefficient
• Different interpretations of kappa values
• For practical purposes you want at least moderate agreement
• Results may vary
           k                               Interpretation
           <0                              Poor agreement
           0.01 – 0.20                     Slight agreement
           0.21 – 0.40                     Fair agreement
           0.41 – 0.60                     Moderate agreement
           0.61 – 0.80                     Substantial agreement
           0.81 – 1.00                     Almost perfect agreement


November 1, 2011         Crowdsourcing for Research and Engineering   117
Detection Theory
• Sensitivity measures
      – High sensitivity: good ability to discriminate
      – Low sensitivity: poor ability
                    Stimulus         “Yes”                  “No”
                    Class
                    S2               Hits                   Misses
                    S1               False alarms           Correct
                                                            rejections

            Hit rate H = P(“yes”|S2)
            False alarm rate F = P(“yes”|S1)
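
With those definitions, the two rates (and, if desired, a sensitivity index such as d') follow directly from the four counts. A small sketch with made-up counts; scipy is used only for the z-transform.

from scipy.stats import norm

hits, misses = 40, 10                        # made-up responses to S2
false_alarms, correct_rejections = 15, 35    # made-up responses to S1

H = hits / (hits + misses)                               # hit rate, P("yes"|S2)
F = false_alarms / (false_alarms + correct_rejections)   # false-alarm rate, P("yes"|S1)
d_prime = norm.ppf(H) - norm.ppf(F)                      # standard sensitivity index

print("H = %.2f  F = %.2f  d' = %.2f" % (H, F, d_prime))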


November 1, 2011              Crowdsourcing for Research and Engineering   118
November 1, 2011   Crowdsourcing for Research and Engineering   119
Finding Consensus
• When multiple workers disagree on the
  correct label, how do we resolve this?
      – Simple majority vote (or average and round)
      – Weighted majority vote (e.g. Naive Bayes); see the sketch below
• Many papers from machine learning…
• If wide disagreement, likely there is a bigger
  problem which consensus doesn’t address
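
A minimal sketch of both schemes on a toy set of five judgments; the per-worker accuracy estimates used as weights are made up, and a Naive-Bayes-style weighting would use log-odds of those accuracies instead.

from collections import Counter, defaultdict

labels = [("w1", "R"), ("w2", "R"), ("w3", "NR"), ("w4", "NR"), ("w5", "NR")]
accuracy = {"w1": 0.9, "w2": 0.8, "w3": 0.6, "w4": 0.55, "w5": 0.5}   # toy estimates

# Simple majority vote: every worker counts equally.
majority = Counter(label for _, label in labels).most_common(1)[0][0]

# Weighted vote: each vote counts in proportion to the worker's estimated accuracy.
weights = defaultdict(float)
for worker, label in labels:
    weights[label] += accuracy.get(worker, 0.5)
weighted = max(weights, key=weights.get)

print("majority:", majority, "| weighted:", weighted)   # the two can disagree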


November 1, 2011     Crowdsourcing for Research and Engineering   120
Quality Control on MTurk
• Rejecting work & Blocking workers (more later…)
    – Requestors don’t want bad PR or complaint emails
    – Common practice: always pay, block as needed
• Approval rate: easy to use, but value?
    – P. Ipeirotis. Be a Top Mechanical Turk Worker: You Need $5
      and 5 Minutes. Oct. 2010
    – Many requestors don’t ever reject…
• Qualification test
    – Pre-screen workers’ capabilities & effectiveness
    – Example and pros/cons in next slides…
• Geographic restrictions
• Mechanical Turk Masters (June 23, 2011)
    – Recent addition, degree of benefit TBD…
  November 1, 2011         Crowdsourcing for Research and Engineering   121
Tools and Packages for MTurk
• QA infrastructure layers atop MTurk promote
  useful separation-of-concerns from task
      – TurkIt
            • Quik Turkit provides nearly realtime services
      –   Turkit-online (??)
      –   Get Another Label (& qmturk)
      –   Turk Surveyor
      –   cv-web-annotation-toolkit (image labeling)
      –   Soylent
      –   Boto (python library)
            • Turkpipe: submit batches of jobs using the command line.
• More needed…
November 1, 2011           Crowdsourcing for Research and Engineering    122
A qualification test snippet
<Question>
  <QuestionIdentifier>question1</QuestionIdentifier>
  <QuestionContent>
    <Text>Carbon monoxide poisoning is</Text>
  </QuestionContent>
  <AnswerSpecification>
    <SelectionAnswer>
       <StyleSuggestion>radiobutton</StyleSuggestion>
         <Selections>
           <Selection>
              <SelectionIdentifier>1</SelectionIdentifier>
              <Text>A chemical technique</Text>
           </Selection>
           <Selection>
              <SelectionIdentifier>2</SelectionIdentifier>
              <Text>A green energy treatment</Text>
           </Selection>
           <Selection>
               <SelectionIdentifier>3</SelectionIdentifier>
               <Text>A phenomena associated with sports</Text>
           </Selection>
           <Selection>
               <SelectionIdentifier>4</SelectionIdentifier>
               <Text>None of the above</Text>
           </Selection>
         </Selections>
    </SelectionAnswer>
  </AnswerSpecification>
</Question>
  November 1, 2011                  Crowdsourcing for Research and Engineering   123
Qualification tests: pros and cons
• Advantages
      – Great tool for controlling quality
      – Adjust passing grade
• Disadvantages
      –   Extra cost to design and implement the test
      –   May turn off workers, hurt completion time
      –   Refresh the test on a regular basis
      –   Hard to verify subjective tasks like judging relevance
• Try creating task-related questions to get worker
  familiar with task before starting task in earnest
November 1, 2011        Crowdsourcing for Research and Engineering   124
More on quality control & assurance
• HR issues: recruiting, selection, & retention
      – e.g., post/tweet, design a better qualification test,
        bonuses, …
• Collect more redundant judgments…
      – at some point defeats cost savings of
        crowdsourcing
      – 5 workers is often sufficient



November 1, 2011     Crowdsourcing for Research and Engineering   125
Robots and Captchas
• Some reports of robots on MTurk
      – E.g. McCreadie et al. (2011)
      – violation of terms of service
      – Artificial artificial artificial intelligence
• Captchas seem ideal, but…
      – Robots have abused MTurk by having turkers solve captchas so
        the robots can access web resources
      – Turker wisdom is therefore to avoid such HITs
• What to do?
      –   Use standard captchas, notify workers
      –   Block robots in other ways (e.g. external HITs)
      –   Catch robots through standard QC, response times
      –   Use HIT-specific captchas (Kazai et al., 2011)
November 1, 2011          Crowdsourcing for Research and Engineering   126
Was the task difficult?
• Ask workers to rate difficulty of a search topic
• 50 topics; 5 workers, $0.01 per task




November 1, 2011        Crowdsourcing for Research and Engineering   127
Other quality heuristics
• Justification/feedback as quasi-captcha
      – Successfully proven in past experiments
      – Should be optional
      – Automatically verifying feedback was written by a
        person may be difficult (classic spam detection task)
• Broken URL/incorrect object
      – Leave an outlier in the data set
      – Workers will tell you
      – If somebody answers “excellent” on a graded
        relevance test for a broken URL => probably spammer

November 1, 2011        Crowdsourcing for Research and Engineering   128
Dealing with bad workers
• Pay for “bad” work instead of rejecting it?
   – Pro: preserve reputation, admit if poor design at fault
   – Con: promote fraud, undermine approval rating system
• Use bonus as incentive
   – Pay the minimum $0.01 and $0.01 for bonus
   – Better than rejecting a $0.02 task
• If spammer “caught”, block from future tasks
   – May be easier to always pay, then block as needed
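
In boto 2.x terms, the always-pay-then-block policy is roughly two calls; a sketch, with placeholder IDs and credentials.

from boto.mturk.connection import MTurkConnection

conn = MTurkConnection(aws_access_key_id="YOUR_KEY",
                       aws_secret_access_key="YOUR_SECRET",
                       host="mechanicalturk.sandbox.amazonaws.com")

# Pay even suspect work to protect your requester reputation ...
conn.approve_assignment("ASSIGNMENT_ID", feedback="Thanks for your work.")

# ... but keep identified spammers out of future batches.
conn.block_worker("WORKER_ID", reason="Failed repeated gold checks.")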

 November 1, 2011         Crowdsourcing for Research and Engineering   129
Worker feedback
• Real feedback received via email after rejection
• Worker XXX
    I did. If you read these articles most of them have
    nothing to do with space programs. I’m not an idiot.

• Worker XXX
    As far as I remember there wasn't an explanation about
    what to do when there is no name in the text. I believe I
    did write a few comments on that, too. So I think you're
    being unfair rejecting my HITs.




November 1, 2011     Crowdsourcing for Research and Engineering   130
Real email exchange with worker after rejection
WORKER: this is not fair , you made me work for 10 cents and i lost my 30 minutes
of time ,power and lot more and gave me 2 rejections at least you may keep it
pending. please show some respect to turkers

REQUESTER: I'm sorry about the rejection. However, in the directions given in the
hit, we have the following instructions: IN ORDER TO GET PAID, you must judge all 5
webpages below *AND* complete a minimum of three HITs.

Unfortunately, because you only completed two hits, we had to reject those hits.
We do this because we need a certain amount of data on which to make decisions
about judgment quality. I'm sorry if this caused any distress. Feel free to contact me
if you have any additional questions or concerns.

WORKER: I understood the problems. At that time my kid was crying and i went to
look after. that's why i responded like that. I was very much worried about a hit
being rejected. The real fact is that i haven't seen that instructions of 5 web page
and started doing as i do the dolores labs hit, then someone called me and i went
to attend that call. sorry for that and thanks for your kind concern.
  November 1, 2011           Crowdsourcing for Research and Engineering         131
Exchange with worker
•   Worker XXX
    Thank you. I will post positive feedback for you at
    Turker Nation.

Me: was this a sarcastic comment?

•   I took a chance by accepting some of your HITs to see if
    you were a trustworthy author. My experience with you
    has been favorable so I will put in a good word for you
    on that website. This will help you get higher quality
    applicants in the future, which will provide higher
    quality work, which might be worth more to you, which
    hopefully means higher HIT amounts in the future.




November 1, 2011        Crowdsourcing for Research and Engineering   132
Build Your Reputation as a Requestor
• Word of mouth effect
      – Workers trust the requester (pay on time, clear
        explanation if there is a rejection)
      – Experiments tend to go faster
      – Announce forthcoming tasks (e.g. tweet)
• Disclose your real identity?



November 1, 2011    Crowdsourcing for Research and Engineering   133
Other practical tips
• Sign up as worker and do some HITs
• “Eat your own dog food”
• Monitor discussion forums
• Address feedback (e.g., poor guidelines,
  payments, passing grade, etc.)
• Everything counts!
      – Overall design only as strong as weakest link


November 1, 2011      Crowdsourcing for Research and Engineering   134
Content quality
• People like to work on things that they like
• TREC ad-hoc vs. INEX
      – TREC experiments took twice as long to complete
      – INEX (Wikipedia), TREC (LA Times, FBIS)
• Topics
      – INEX: Olympic games, movies, salad recipes, etc.
      – TREC: cosmic events, Schengen agreement, etc.
• Content and judgments according to modern times
      – Airport security docs are pre 9/11
      – Antarctic exploration (global warming)

November 1, 2011       Crowdsourcing for Research and Engineering   135
Content quality - II
• Document length
• Randomize content
• Avoid worker fatigue
      – Judging 100 documents on the same subject can
        be tiring, leading to decreasing quality




November 1, 2011      Crowdsourcing for Research and Engineering   136
Presentation
• People scan documents for relevance cues
• Document design
• Highlighting (no more than 10% of the text)




November 1, 2011   Crowdsourcing for Research and Engineering   137
Presentation - II




November 1, 2011    Crowdsourcing for Research and Engineering   138
Relevance justification
• Why settle for a label?
• Let workers justify answers
      – cf. Zaidan et al. (2007) “annotator rationales”
• INEX
      – 22% of assignments with comments
• Must be optional
• Let’s see how people justify

November 1, 2011        Crowdsourcing for Research and Engineering   139
“Relevant” answers
 [Salad Recipes]
 Doesn't mention the word 'salad', but the recipe is one that could be considered a
    salad, or a salad topping, or a sandwich spread.
 Egg salad recipe
 Egg salad recipe is discussed.
 History of salad cream is discussed.
 Includes salad recipe
 It has information about salad recipes.
 Potato Salad
 Potato salad recipes are listed.
 Recipe for a salad dressing.
 Salad Recipes are discussed.
 Salad cream is discussed.
 Salad info and recipe
 The article contains a salad recipe.
 The article discusses methods of making potato salad.
 The recipe is for a dressing for a salad, so the information is somewhat narrow for
    the topic but is still potentially relevant for a researcher.
 This article describes a specific salad. Although it does not list a specific recipe,
    it does contain information relevant to the search topic.
 gives a recipe for tuna salad
 relevant for tuna salad recipes
 relevant to salad recipes
 this is on-topic for salad recipes




November 1, 2011            Crowdsourcing for Research and Engineering             140
“Not relevant” answers
[Salad Recipes]
About gaming not salad recipes.
Article is about Norway.
Article is about Region Codes.
Article is about forests.
Article is about geography.
Document is about forest and trees.
Has nothing to do with salad or recipes.
Not a salad recipe
Not about recipes
Not about salad recipes
There is no recipe, just a comment on how salads fit into meal formats.
There is nothing mentioned about salads.
While dressings should be mentioned with salads, this is an article on one specific
    type of dressing, no recipe for salads.
article about a swiss tv show
completely off-topic for salad recipes
not a salad recipe
not about salad recipes
totally off base



November 1, 2011            Crowdsourcing for Research and Engineering                141
November 1, 2011   Crowdsourcing for Research and Engineering   142
Feedback length

• Workers will justify answers
• Has to be optional for good
  feedback
• In E51, mandatory comments
  – Length dropped
  – “Relevant” or “Not Relevant”



  November 1, 2011    Crowdsourcing for Research and Engineering   143
Other design principles
• Text alignment
• Legibility
• Reading level: complexity of words and sentences
• Attractiveness (worker’s attention & enjoyment)
• Multi-cultural / multi-lingual
• Who is the audience (e.g. target worker community)
      – Special needs communities (e.g. simple color blindness)
• Parsimony
• Cognitive load: mental rigor needed to perform task
• Exposure effect
November 1, 2011        Crowdsourcing for Research and Engineering   144
Platform alternatives
• Why MTurk
      – Amazon brand, lots of research papers
      – Speed, price, diversity, payments
• Why not
      – Crowdsourcing != MTurk
      – Spam, no analytics, must build tools for worker & task quality
• How to build your own crowdsourcing platform
      –   Back-end
      –   Template language for creating experiments
      –   Scheduler
      –   Payments?


November 1, 2011        Crowdsourcing for Research and Engineering   145
The human side
• As a worker
    –   I hate when instructions are not clear
    –   I’m not a spammer – I just don’t get what you want
    –   Boring task
    –   Good pay is important but not the only condition for engagement
• As a requester
    – Attrition
    – Balancing act: a task that would produce the right results and
      is appealing to workers
    – I want your honest answer for the task
    – I want qualified workers; system should do some of that for me
• Managing crowds and tasks is a daily activity
    – more difficult than managing computers
 November 1, 2011       Crowdsourcing for Research and Engineering   146
Things that work
•   Qualification tests
•   Honey-pots
•   Good content and good presentation
•   Economy of attention
•   Things to improve
      – Manage workers at different levels of expertise,
        including spammers and borderline cases.
      – Mix pools of workers with different profiles and
        expertise levels.

November 1, 2011     Crowdsourcing for Research and Engineering   147
Things that need work
• UX and guidelines
      – Help the worker
      – Cost of interaction
•   Scheduling and refresh rate
•   Exposure effect
•   Sometimes we just don’t agree
•   How crowdsourcable is your task

November 1, 2011       Crowdsourcing for Research and Engineering   148
V.
                   FUTURE TRENDS: FROM LABELING TO
                              HUMAN COMPUTATION
November 1, 2011         Crowdsourcing for Research and Engineering   149
The Turing Test (Alan Turing, 1950)




November 1, 2011   Crowdsourcing for Research and Engineering   150
November 1, 2011   Crowdsourcing for Research and Engineering   151
The Turing Test (Alan Turing, 1950)




November 1, 2011   Crowdsourcing for Research and Engineering   152
What is a Computer?




November 1, 2011      Crowdsourcing for Research and Engineering   153
• What was old is new
• “Crowdsourcing: A New
  Branch of Computer Science”
  (March 29, 2011)

• See also: M. Croarken
  (2003), Tabulating the
  heavens: computing the
  Nautical Almanac in
  18th-century England
                                                 Princeton University Press, 2005
  November 1, 2011   Crowdsourcing for Research and Engineering           154
Davis et al. (2010) The HPU.








November 1, 2011   Crowdsourcing for Research and Engineering   155
Remembering the Human in HPU
• Not just turning a mechanical crank




November 1, 2011   Crowdsourcing for Research and Engineering   156
Human Computation
Rebirth of people as ‘computists’; people do tasks computers cannot (do well)
Stage 1: Detecting robots
     – CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart
     – No useful work produced; people just answer questions with known answers

Stage 2: Labeling data (at scale)
     – E.g. ESP game, typical use of MTurk
     – Game changer for AI: starving for data

Stage 3: General “human computation” (HPU)
     – people do arbitrarily sophisticated tasks (i.e. compute arbitrary functions)
     – HPU as core component in system architecture, many “HPC” invocations
     – blend HPU with automation for a new class of hybrid applications
     – New tradeoffs possible in latency/cost vs. functionality/accuracy

  November 1, 2011             Crowdsourcing for Research and Engineering               157
Mobile Phone App: “Amazon Remembers”




November 1, 2011   Crowdsourcing for Research and Engineering   158
Soylent: A Word Processor with a Crowd Inside

 • Bernstein et al., UIST 2010




 November 1, 2011   Crowdsourcing for Research and Engineering   159
CrowdSearch and mCrowd
• T. Yan, MobiSys 2010




November 1, 2011   Crowdsourcing for Research and Engineering   160
Translation by monolingual speakers
• C. Hu, CHI 2009




November 1, 2011    Crowdsourcing for Research and Engineering   161
Wisdom of Crowds (WoC)
Requires
• Diversity
• Independence
• Decentralization
• Aggregation

Input: large, diverse sample
     (to increase likelihood of overall pool quality)
Output: consensus or selection (aggregation)
November 1, 2011    Crowdsourcing for Research and Engineering   162
WoC vs. Ensemble Learning
• Combine multiple models to improve performance
  over any constituent model
   – Can use many weak learners to make a strong one
   – Compensate for poor models with extra computation
• Works better with diverse, independent learners
• cf. NIPS 2010-2011 Workshops
   – Computational Social Science & the Wisdom of Crowds
• More investigation needed of traditional feature-
  based machine learning & ensemble methods for
  consensus labeling with crowdsourcing
  November 1, 2011   Crowdsourcing for Research and Engineering   163
Unreasonable Effectiveness of Data
• Massive free Web data
  changed how we train
  learning systems
  – Banko and Brill (2001).
    Human Language Tech.
  – Halevy et al. (2009). IEEE
    Intelligent Systems.

 • How might access to cheap & plentiful labeled
   data change the balance again?
  November 1, 2011   Crowdsourcing for Research and Engineering   164
Active Learning
• Minimize number of labels to achieve goal
  accuracy rate of classifier
      – Select examples to label to maximize learning
• Vijayanarasimhan and Grauman (CVPR 2011)
      – Simple margin criterion: select maximally uncertain
        examples to label next (see the sketch below)

      – Finding which examples are uncertain can be
        computationally intensive (workers have to wait)
      – Use locality-sensitive hashing to find uncertain
        examples in sub-linear time
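
A sketch of the margin criterion itself, leaving out the hashing speed-up: rank unlabeled examples by the gap between their two most probable classes under the current model and send the smallest-gap examples to the crowd. The probability matrix below is made up.

import numpy as np

# Toy predicted class probabilities for five unlabeled examples (rows).
proba = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.70, 0.30],
                  [0.51, 0.49],
                  [0.85, 0.15]])

sorted_p = np.sort(proba, axis=1)
margin = sorted_p[:, -1] - sorted_p[:, -2]   # small margin = uncertain example

k = 2
to_label_next = np.argsort(margin)[:k]       # indices of the k most uncertain examples
print("send to crowd:", to_label_next)       # -> [3 1]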
November 1, 2011     Crowdsourcing for Research and Engineering   165
Active Learning (2)
• V&G report each learning iteration ~ 75 min
      – 15 minutes for model training & selection
      – 60 minutes waiting for crowd labels
• Leaving workers idle may lose them, slowing
  uptake and completion times
• Keep workers occupied
      – Mason and Suri (2010): paid waiting room
      – Laws et al. (EMNLP 2011): parallelize labeling and
        example selection via producer-consumer model
            • Workers consume examples, produce labels
            • Model consumes label, produces examples
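
A toy sketch of that producer-consumer structure using Python's standard queue module: one thread stands in for the crowd (consuming examples, producing labels) while the main loop stands in for the model (consuming labels, producing new examples to label). The labeling and selection logic here are placeholders.

import queue, random, threading

examples, labels = queue.Queue(), queue.Queue()

def crowd_worker():
    """Stand-in for the crowd: consume examples, produce (example, label) pairs."""
    while True:
        x = examples.get()
        if x is None:                  # sentinel: no more work
            break
        labels.put((x, random.choice(["R", "NR"])))   # placeholder "judgment"

threading.Thread(target=crowd_worker, daemon=True).start()

pool = list(range(10))                 # unlabeled pool (toy data)
for x in pool[:3]:                     # seed the queue so workers are never idle
    examples.put(x)

collected = []
while len(collected) < len(pool):
    collected.append(labels.get())     # "model" consumes a label ...
    nxt = len(collected) + 2           # ... and selects which example to label next
    if nxt < len(pool):
        examples.put(pool[nxt])
examples.put(None)                     # shut the worker down
print(collected)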
November 1, 2011        Crowdsourcing for Research and Engineering   166
MapReduce with human computation
• Commonalities
      – Large task divided into smaller sub-problems
      – Work distributed among worker nodes (workers)
      – Collect all answers and combine them
      – Varying performance of heterogeneous
        CPUs/HPUs
• Variations
      – Human response latency / size of “cluster”
      – Some tasks are not suitable
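The analogy can be sketched in a few lines of Python; post_hit is a hypothetical placeholder for a real platform call, not an actual API. The “map” step posts one micro-task per chunk of input, and the “reduce” step combines the answers.

```python
# A toy "MapReduce over people": split the work, farm out small units,
# then combine the answers. post_hit() is a placeholder for a real platform call.

def post_hit(chunk):
    """Pretend to post one micro-task and wait for a worker's answer."""
    return {"chunk": chunk, "summary": chunk[:20]}     # placeholder answer

def human_map(chunks):
    # In practice these run in parallel across many workers (the "cluster"),
    # with highly variable latency per HPU.
    return [post_hit(chunk) for chunk in chunks]

def human_reduce(answers):
    # Combine partial answers; in a real workflow the reducer could itself be a HIT.
    return " / ".join(answer["summary"] for answer in answers)

paragraphs = ["First paragraph of a long document.", "Second paragraph of the document."]
print(human_reduce(human_map(paragraphs)))
```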

November 1, 2011    Crowdsourcing for Research and Engineering   167
A Few Questions
• How should we balance automation vs.
  human computation? Which does what?

• Who’s the right person for the job?

• How do we handle complex tasks? Can we
  decompose them into smaller tasks? How?


November 1, 2011    Crowdsourcing for Research and Engineering   168
Research problems – operational
• Methodology
      – Budget, people, documents, queries, presentation,
        incentives, etc.
      – Scheduling
      – Quality
• What’s the best “mix” of HC for a task?
• Which tasks are suitable for HC?
• Can I crowdsource my task?
      – Eickhoff and de Vries, WSDM 2011 CSDM Workshop

November 1, 2011    Crowdsourcing for Research and Engineering   169
More problems
• Human factors vs. outcomes
• Editors vs. workers
• Pricing tasks
• Predicting worker quality from observable
  properties (e.g. task completion time)
• HIT / Requestor ranking or recommendation
• Expert search: who are the right workers, given the
  task's nature and constraints?
• Ensemble methods for Crowd Wisdom consensus
November 1, 2011    Crowdsourcing for Research and Engineering   170
Problems: crowds, clouds and algorithms
• Infrastructure
     – Current platforms are very rudimentary
     – No tools for data analysis
• Dealing with uncertainty (propagate rather than mask)
     –   Temporal and labeling uncertainty
     –   Learning algorithms
     –   Search evaluation
     –   Active learning (which example is likely to be labeled correctly)
• Combining CPU + HPU
     – Human Remote Call?
     – Procedural vs. declarative?
     – Integration points with enterprise systems
 November 1, 2011          Crowdsourcing for Research and Engineering        171
CrowdForge: MapReduce for
     Automation + Human Computation




       Kittur et al., CHI 2011


November 1, 2011         Crowdsourcing for Research and Engineering   172
Conclusions
•   Crowdsourcing works and is here to stay
•   Fast turnaround, easy to experiment, cheap
•   Still have to design the experiments carefully!
•   Usability considerations
•   Worker quality
•   User feedback extremely useful



November 1, 2011   Crowdsourcing for Research and Engineering   173
Conclusions - II
• Lots of opportunities to improve current platforms
• Integration with current systems
• While MTurk was first to market in the micro-task vertical,
  many other vendors are emerging with different
  affordances or value-added features

• Many open research problems …



November 1, 2011    Crowdsourcing for Research and Engineering   174
Conclusions – III
• Important to know your limitations and be
  ready to collaborate
• Lots of different skills and expertise required
      – Social/behavioral science
      – Human factors
      – Algorithms
      – Economics
      – Distributed systems
      – Statistics

November 1, 2011    Crowdsourcing for Research and Engineering   175
VIII
                                 REFERENCES & RESOURCES
November 1, 2011   Crowdsourcing for Research and Engineering   176
Books
• Omar Alonso, Gabriella Kazai, and Stefano
  Mizzaro. (2012). Crowdsourcing for Search
  Engine Evaluation: Why and How.

• Law and von Ahn (2011).
  Human Computation




  November 1, 2011   Crowdsourcing for Research and Engineering   177
More Books
                   July 2010, kindle-only: “This book introduces you to the
                   top crowdsourcing sites and outlines step by step with
                   photos the exact process to get started as a requester on
                    Amazon Mechanical Turk.”




November 1, 2011             Crowdsourcing for Research and Engineering   178
2011 Tutorials and Keynotes
•   By Omar Alonso and/or Matthew Lease
     –   CLEF: Crowdsourcing for Information Retrieval Experimentation and Evaluation (Sep. 20, Omar only)
     –   CrowdConf (Nov. 1, this is it!)
     –   IJCNLP: Crowd Computing: Opportunities and Challenges (Nov. 10, Matt only)
     –   WSDM: Crowdsourcing 101: Putting the WSDM of Crowds to Work for You (Feb. 9)
     –   SIGIR: Crowdsourcing for Information Retrieval: Principles, Methods, and Applications (July 24)

•   AAAI: Human Computation: Core Research Questions and State of the Art
     –   Edith Law and Luis von Ahn, August 7
•   ASIS&T: How to Identify Ducks In Flight: A Crowdsourcing Approach to Biodiversity Research and
    Conservation
     –   Steve Kelling, October 10, ebird
•   EC: Conducting Behavioral Research Using Amazon's Mechanical Turk
     –   Winter Mason and Siddharth Suri, June 5
•   HCIC: Quality Crowdsourcing for Human Computer Interaction Research
      –   Ed Chi, June 14-18 (about HCIC)
     –   Also see his: Crowdsourcing for HCI Research with Amazon Mechanical Turk
•   Multimedia: Frontiers in Multimedia Search
     –   Alan Hanjalic and Martha Larson, Nov 28
•   VLDB: Crowdsourcing Applications and Platforms
      –   Anhai Doan, Michael Franklin, Donald Kossmann, and Tim Kraska
•   WWW: Managing Crowdsourced Human Computation
     –   Panos Ipeirotis and Praveen Paritosh
    November 1, 2011                    Crowdsourcing for Research and Engineering                           179
2011 Workshops & Conferences
•   AAAI-HCOMP: 3rd Human Computation Workshop (Aug. 8)
•   ACIS: Crowdsourcing, Value Co-Creation, & Digital Economy Innovation (Nov. 30 – Dec. 2)
•   Crowdsourcing Technologies for Language and Cognition Studies (July 27)
•   CHI-CHC: Crowdsourcing and Human Computation (May 8)
•   CIKM: BooksOnline (Oct. 24, “crowdsourcing … online books”)
•   CrowdConf 2011 -- 2nd Conf. on the Future of Distributed Work (Nov. 1-2)
•   Crowdsourcing: Improving … Scientific Data Through Social Networking (June 13)
•   EC: Workshop on Social Computing and User Generated Content (June 5)
•   ICWE: 2nd International Workshop on Enterprise Crowdsourcing (June 20)
•   Interspeech: Crowdsourcing for speech processing (August)
•   NIPS: Second Workshop on Computational Social Science and the Wisdom of Crowds (Dec. TBD)
•   SIGIR-CIR: Workshop on Crowdsourcing for Information Retrieval (July 28)
•   TREC-Crowd: Year 1 of TREC Crowdsourcing Track (Nov. 16-18)
•   UbiComp: 2nd Workshop on Ubiquitous Crowdsourcing (Sep. 18)
•   WSDM-CSDM: Crowdsourcing for Search and Data Mining (Feb. 9)
    November 1, 2011             Crowdsourcing for Research and Engineering                   180
Things to Come in 2012
• AAAI Symposium: Wisdom of the Crowd
   – March 26-28
• Year 2 of TREC Crowdsourcing Track
• Human Computation workshop/conference (TBD)
• Journal Special Issues
   – Springer’s Information Retrieval:
     Crowdsourcing for Information Retrieval
   – Hindawi’s Advances in Multimedia Journal:
     Multimedia Semantics Analysis via Crowdsourcing Geocontext
   – IEEE Internet Computing: Crowdsourcing (Sept./Oct. 2012)
   – IEEE Transactions on Multimedia:
     Crowdsourcing in Multimedia (proposal in review)
  November 1, 2011        Crowdsourcing for Research and Engineering   181
Thank You!
Crowdsourcing news & information:
  ir.ischool.utexas.edu/crowd

For further questions, contact us at:
  omar.alonso@microsoft.com
  ml@ischool.utexas.edu




Cartoons by Mateo Burtch (buta@sonic.net)
November 1, 2011   Crowdsourcing for Research and Engineering   182
Recent Overview Papers
• Alex Quinn and Ben Bederson. Human
  Computation: A Survey and Taxonomy of a
  Growing Field. In Proceedings of CHI 2011.
• Man-Ching Yuen, Irwin King, and Kwong-Sak
  Leung. A Survey of Crowdsourcing Systems.
  SocialCom 2011.
• A. Doan, R. Ramakrishnan, A. Halevy.
  Crowdsourcing Systems on the World-Wide
  Web. Communications of the ACM, 2011.

November 1, 2011        Crowdsourcing for Research and Engineering   183
Resources
A Few Blogs
 Behind Enemy Lines (P.G. Ipeirotis, NYU)
 Deneme: a Mechanical Turk experiments blog (Greg Little, MIT)
 CrowdFlower Blog
 http://experimentalturk.wordpress.com
 Jeff Howe

A Few Sites
 The Crowdsortium
 Crowdsourcing.org
 CrowdsourceBase (for workers)
 Daily Crowdsource

MTurk Forums and Resources
 Turker Nation: http://turkers.proboards.com
 http://www.turkalert.com (and its blog)
 Turkopticon: report/avoid shady requestors
 Amazon Forum for MTurk
November 1, 2011             Crowdsourcing for Research and Engineering   184
Bibliography
   J. Barr and L. Cabrera. “AI gets a Brain”, ACM Queue, May 2006.
   Bernstein, M. et al. Soylent: A Word Processor with a Crowd Inside. UIST 2010. Best Student Paper award.
   Bederson, B.B., Hu, C., & Resnik, P. Translation by Iterative Collaboration between Monolingual Users, Proceedings of Graphics
    Interface (GI 2010), 39-46.
   N. Bradburn, S. Sudman, and B. Wansink. Asking Questions: The Definitive Guide to Questionnaire Design, Jossey-Bass, 2004.
   C. Callison-Burch. “Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk”, EMNLP 2009.
   P. Dai, Mausam, and D. Weld. “Decision-Theoretic Control of Crowd-Sourced Workflows”, AAAI, 2010.
   J. Davis et al. “The HPU”, IEEE Computer Vision and Pattern Recognition Workshop on Advancing Computer Vision with Human
    in the Loop (ACVHL), June 2010.
   M. Gashler, C. Giraud-Carrier, T. Martinez. Decision Tree Ensemble: Small Heterogeneous Is Better Than Large Homogeneous, ICMLA 2008.
   D. A. Grier. When Computers Were Human. Princeton University Press, 2005. ISBN 0691091579
   S. Hacker and L. von Ahn. “Matchin: Eliciting User Preferences with an Online Game”, CHI 2009.
   J. Heer, M. Bostock. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design”, CHI 2010.
   P. Heymann and H. Garcia-Molina. “Human Processing”, Technical Report, Stanford Info Lab, 2010.
   J. Howe. “Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business”. Crown Business, New York, 2008.
   P. Hsueh, P. Melville, V. Sindhwani. “Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria”. NAACL HLT
    Workshop on Active Learning and NLP, 2009.
   B. Huberman, D. Romero, and F. Wu. “Crowdsourcing, attention and productivity”. Journal of Information Science, 2009.
   P.G. Ipeirotis. The New Demographics of Mechanical Turk. March 9, 2010. PDF and Spreadsheet.
   P.G. Ipeirotis, R. Chandrasekar and P. Bennett. Report on the human computation workshop. SIGKDD Explorations v11 no 2 pp. 80-83, 2010.
   P.G. Ipeirotis. Analyzing the Amazon Mechanical Turk Marketplace. CeDER-10-04 (Sept. 11, 2010)

     November 1, 2011                         Crowdsourcing for Research and Engineering                                       185
Bibliography (2)
   A. Kittur, E. Chi, and B. Suh. “Crowdsourcing user studies with Mechanical Turk”, SIGCHI 2008.
   Aniket Kittur, Boris Smus, Robert E. Kraut. CrowdForge: Crowdsourcing Complex Work. CHI 2011
   Adriana Kovashka and Matthew Lease. “Human and Machine Detection of … Similarity in Art”. CrowdConf 2010.
   K. Krippendorff. "Content Analysis", Sage Publications, 2003
   G. Little, L. Chilton, M. Goldman, and R. Miller. “TurKit: Tools for Iterative Tasks on Mechanical Turk”, HCOMP 2009.
   T. Malone, R. Laubacher, and C. Dellarocas. Harnessing Crowds: Mapping the Genome of Collective Intelligence.
    2009.
   W. Mason and D. Watts. “Financial Incentives and the ’Performance of Crowds’”, HCOMP Workshop at KDD 2009.
   J. Nielsen. “Usability Engineering”, Morgan-Kaufman, 1994.
   A. Quinn and B. Bederson. “A Taxonomy of Distributed Human Computation”, Technical Report HCIL-2009-23, 2009
   J. Ross, L. Irani, M. Six Silberman, A. Zaldivar, and B. Tomlinson. “Who are the Crowdworkers?: Shifting
    Demographics in Amazon Mechanical Turk”. CHI 2010.
   F. Scheuren. “What is a Survey” (http://www.whatisasurvey.info) 2004.
   R. Snow, B. O’Connor, D. Jurafsky, and A. Y. Ng. “Cheap and Fast But is it Good? Evaluating Non-Expert Annotations
    for Natural Language Tasks”. EMNLP-2008.
   V. Sheng, F. Provost, P. Ipeirotis. “Get Another Label? Improving Data Quality … Using Multiple, Noisy Labelers”
    KDD 2008.
   S. Weber. “The Success of Open Source”, Harvard University Press, 2004.
   L. von Ahn. Games with a purpose. Computer, 39 (6), 92–94, 2006.
   L. von Ahn and L. Dabbish. “Designing Games with a purpose”. CACM, Vol. 51, No. 8, 2008.

November 1, 2011                      Crowdsourcing for Research and Engineering                                    186
Bibliography (3)
   Shuo Chen et al. What if the Irresponsible Teachers Are Dominating? A Method of Training on Samples and
    Clustering on Teachers. AAAI 2010.
   Paul Heymann, Hector Garcia-Molina: Turkalytics: analytics for human computation. WWW 2011.
   Florian Laws, Christian Scheible and Hinrich Schütze. Active Learning with Amazon Mechanical Turk.
    EMNLP 2011.
   C.Y. Lin. Rouge: A package for automatic evaluation of summaries. Proceedings of the workshop on text
    summarization branches out (WAS), 2004.
   C. Marshall and F. Shipman “The Ownership and Reuse of Visual Media”, JCDL, 2011.
   Hohyon Ryu and Matthew Lease. Crowdworker Filtering with Support Vector Machine. ASIS&T 2011.
   Wei Tang and Matthew Lease. Semi-Supervised Consensus Labeling for Crowdsourcing. ACM SIGIR
    Workshop on Crowdsourcing for Information Retrieval (CIR), 2011.
   S. Vijayanarasimhan and K. Grauman. Large-Scale Live Active Learning: Training Object Detectors with
    Crawled Data and Crowds. CVPR 2011.
   Stephen Wolfson and Matthew Lease. Look Before You Leap: Legal Pitfalls of Crowdsourcing. ASIS&T 2011.




November 1, 2011                    Crowdsourcing for Research and Engineering                                187
Crowdsourcing in IR: 2008-2010
   2008
          O. Alonso, D. Rose, and B. Stewart. “Crowdsourcing for relevance evaluation”, SIGIR Forum, Vol. 42, No. 2.

   2009
          O. Alonso and S. Mizzaro. “Can we get rid of TREC Assessors? Using Mechanical Turk for … Assessment”. SIGIR Workshop on the Future of IR Evaluation.
          P.N. Bennett, D.M. Chickering, A. Mityagin. Learning Consensus Opinion: Mining Data from a Labeling Game. WWW.
          G. Kazai, N. Milic-Frayling, and J. Costello. “Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments”, SIGIR.
          G. Kazai and N. Milic-Frayling. “… Quality of Relevance Assessments Collected through Crowdsourcing”. SIGIR Workshop on the Future of IR Evaluation.
          Law et al. “SearchWar”. HCOMP.
          H. Ma, R. Chandrasekar, C. Quirk, and A. Gupta. “Improving Search Engines Using Human Computation Games”, CIKM 2009.

   2010
          SIGIR Workshop on Crowdsourcing for Search Evaluation.
          O. Alonso, R. Schenkel, and M. Theobald. “Crowdsourcing Assessments for XML Ranked Retrieval”, ECIR.
          K. Berberich, S. Bedathur, O. Alonso, G. Weikum “A Language Modeling Approach for Temporal Information Needs”, ECIR.
          C. Grady and M. Lease. “Crowdsourcing Document Relevance Assessment with Mechanical Turk”. NAACL HLT Workshop on … Amazon's Mechanical Turk.
           Grace Hui Yang, Anton Mityagin, Krysta M. Svore, and Sergey Markov. “Collecting High Quality Overlapping Labels at Low Cost”. SIGIR.
          G. Kazai. “An Exploration of the Influence that Task Parameters Have on the Performance of Crowds”. CrowdConf.
          G. Kazai. “… Crowdsourcing in Building an Evaluation Platform for Searching Collections of Digitized Books”., Workshop on Very Large Digital Libraries (VLDL)
          Stephanie Nowak and Stefan Ruger. How Reliable are Annotations via Crowdsourcing? MIR.
          Jean-François Paiement, Dr. James G. Shanahan, and Remi Zajac. “Crowdsourcing Local Search Relevance”. CrowdConf.
          Maria Stone and Omar Alonso. “A Comparison of On-Demand Workforce with Trained Judges for Web Search Relevance Evaluation”. CrowdConf.
          T. Yan, V. Kumar, and D. Ganesan. CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones. MobiSys pp. 77--90, 2010.




     November 1, 2011                                   Crowdsourcing for Research and Engineering                                                             188
Crowdsourcing in IR: 2011
   WSDM Workshop on Crowdsourcing for Search and Data Mining.
   SIGIR Workshop on Crowdsourcing for Information Retrieval.


   O. Alonso and R. Baeza-Yates. “Design and Implementation of Relevance Assessments using Crowdsourcing, ECIR 2011.
   Roi Blanco, Harry Halpin, Daniel Herzig, Peter Mika, Jeffrey Pound, Henry Thompson, Thanh D. Tran. “Repeatable and
    Reliable Search System Evaluation using Crowd-Sourcing”. SIGIR 2011.
   Yen-Ta Huang, An-Jung Cheng, Liang-Chi Hsieh, Winston H. Hsu, Kuo-Wei Chang. “Region-Based Landmark Discovery by
    Crowdsourcing Geo-Referenced Photos.” SIGIR 2011.
    Hyun Joon Jung, Matthew Lease. “Improving Consensus Accuracy via Z-score and Weighted Voting”. HCOMP 2011.
   G. Kasneci, J. Van Gael, D. Stern, and T. Graepel, CoBayes: Bayesian Knowledge Corroboration with Assessors of
    Unknown Areas of Expertise, WSDM 2011.
    Gabriella Kazai. “In Search of Quality in Crowdsourcing for Search Engine Evaluation”, ECIR 2011.
   Gabriella Kazai, Jaap Kamps, Marijn Koolen, Natasa Milic-Frayling. “Crowdsourcing for Book Search Evaluation: Impact of Quality
    on Comparative System Ranking.” SIGIR 2011.
    Abhimanu Kumar, Matthew Lease. “Learning to Rank From a Noisy Crowd”. SIGIR 2011.
   Edith Law, Paul N. Bennett, and Eric Horvitz. “The Effects of Choice in Routing Relevance Judgments”. SIGIR 2011.




     November 1, 2011                       Crowdsourcing for Research and Engineering                                  189

  • 25. Example: cQ&A, Social Search, & Polling November 1, 2011 Crowdsourcing for Research and Engineering 25
  • 26. Example: cQ&A • Earn Money (real or virtual) • Have fun (or pass the time) • Socialize with others • Obtain recognition or prestige • Do Good (altruism) • Learn something new • Obtain something else • Create self-serving resource November 1, 2011 Crowdsourcing for Research and Engineering 26
  • 27. Example: reCaptcha November 1, 2011 Crowdsourcing for Research and Engineering 27
  • 28. Example: reCaptcha • Earn Money (real or virtual) • Have fun (or pass the time) • Socialize with others • Obtain recognition or prestige • Do Good (altruism) • Learn something new • Obtain something else • Create self-serving resource Is there an existing human activity you can harness for another purpose? November 1, 2011 Crowdsourcing for Research and Engineering 28
  • 29. Example: Mechanical Turk J. Pontin. Artificial Intelligence, With Help From the Humans. New York Times (March 25, 2007) November 1, 2011 Crowdsourcing for Research and Engineering 29
  • 30. Example: Mechanical Turk • Earn Money (real or virtual) • Have fun (or pass the time) • Socialize with others • Obtain recognition or prestige • Do Good (altruism) • Learn something new • Obtain something else • Create self-serving resource November 1, 2011 Crowdsourcing for Research and Engineering 30
  • 31. Look Before You Leap • Wolfson & Lease (2011) • Identify a few potential legal pitfalls to know about when considering crowdsourcing – employment law – patent inventorship – data security and the Federal Trade Commission – copyright ownership – securities regulation of crowdfunding • Take-away: don’t panic, just be mindful of the law November 1, 2011 Crowdsourcing for Research and Engineering 31
  • 32. Example: SamaSource Incentive for YOU: Do Good Terminology: channels November 1, 2011 Crowdsourcing for Research and Engineering 32
  • 33. Who are the workers? • A. Baio, November 2008. The Faces of Mechanical Turk. • P. Ipeirotis. March 2010. The New Demographics of Mechanical Turk • J. Ross, et al. Who are the Crowdworkers?... CHI 2010. November 1, 2011 Crowdsourcing for Research and Engineering 33
  • 34. MTurk Demographics • 2008-2009 studies found less global and diverse than previously thought – US – Female – Educated – Bored – Money is secondary November 1, 2011 Crowdsourcing for Research and Engineering 34
  • 35. 2010 shows increasing diversity 47% US, 34% India, 19% other (P. Ipeirotis. March 2010) November 1, 2011 Crowdsourcing for Research and Engineering 35
  • 36. MICRO-TASKS++ AND EXAMPLES November 1, 2011 Crowdsourcing for Research and Engineering 36
  • 37. Chess machine unveiled in 1770 by Wolfgang von Kempelen (1734–1804) • “Micro-task” crowdsourcing marketplace • On-demand, scalable, real-time workforce • Online since 2005 (and still in “beta”) • Programmer’s API & “Dashboard” GUI • Sponsorship: TREC 2011 Crowdsourcing Track (pending) November 1, 2011 Crowdsourcing for Research and Engineering 37
  • 38. Does anyone really use it? Yes! http://www.mturk-tracker.com (P. Ipeirotis’10) From 1/09 – 4/10, 7M HITs from 10K requestors worth $500,000 USD (significant under-estimate) November 1, 2011 Crowdsourcing for Research and Engineering 38
  • 39. • Labor on-demand, Channels, Quality control features • Sponsorship – Research Workshops: CSE’10, CSDM’11, CIR’11, – TREC 2011 Crowdsourcing Track November 1, 2011 Crowdsourcing for Research and Engineering 39
  • 40. CloudFactory • Information below from Mark Sears (Oct. 18, 2011) • Cloud Labor API – Tools to design virtual assembly lines – workflows with multiple tasks chained together • Focus on self-serve tools for people to easily design crowd-powered assembly lines that can be easily integrated into software applications • Interfaces: command-line, RESTful API, and Web • Each “task station” can have either a human or robot worker assigned – web software services (AlchemyAPI, SendGrid, Google APIs, Twilio, etc.) or local software can be combined with human computation • Many built-in "best practices" – “Tournament Stations” where multiple results are compared by other cloud workers until confidence in the best answer is reached – “Improver Stations” have workers improve and correct work by other workers – Badges are earned by cloud workers passing tests created by requesters – Training and tools to create skill tests will be flexible – Algorithms to detect and kick out spammers/cheaters/lazy/bad workers • Sponsorship: TREC 2012 Crowdsourcing Track November 1, 2011 Crowdsourcing for Research and Engineering 40
  • 41. More Crowd Labor Platforms • Clickworker • CloudCrowd • CrowdSource • DoMyStuff • Humanoid (by Matt Swanson et al.) • Microtask • MobileWorks (by Anand Kulkarni) • myGengo • SmartSheet • vWorker • Industry heavy-weights – Elance – Liveops – oDesk – uTest • and more… November 1, 2011 Crowdsourcing for Research and Engineering 41
  • 42. Why Micro-Tasks? • Easy, cheap and fast • Ready-to use infrastructure, e.g. – MTurk payments, workforce, interface widgets – CrowdFlower quality control mechanisms, etc. – Many others … • Allows early, iterative, frequent trials – Iteratively prototype and test new ideas – Try new tasks, test when you want & as you go • Many successful examples of use reported November 1, 2011 Crowdsourcing for Research and Engineering 42
  • 43. Micro-Task Issues • Process – Task design, instructions, setup, iteration • Choose crowdsourcing platform (or roll your own) • Human factors – Payment / incentives, interface and interaction design, communication, reputation, recruitment, retention • Quality Control / Data Quality – Trust, reliability, spam detection, consensus labeling November 1, 2011 Crowdsourcing for Research and Engineering 43
  • 44. Legal Disclaimer: Caution Tape and Silver Bullets • Often still involves more art than science • Not a magic panacea, but another alternative – one more data point for analysis, complements other methods • Quality may be traded off for time/cost/effort • Hard work & experimental design still required! November 1, 2011 Crowdsourcing for Research and Engineering 44
  • 45. Hello World Demo • We’ll show a simple, short demo of MTurk • This is a teaser highlighting things we’ll discuss – Don’t worry about details; we’ll revisit them • Specific task unimportant • Big idea: easy, fast, cheap to label with MTurk! November 1, 2011 Crowdsourcing for Research and Engineering 45
  • 46. Jane saw the man with the binoculars November 1, 2011 Crowdsourcing for Research and Engineering 46
  • 47. DEMO November 1, 2011 Crowdsourcing for Research and Engineering 47
  • 48. Traditional Data Collection • Setup data collection software / harness • Recruit participants • Pay a flat fee for experiment or hourly wage • Characteristics – Slow – Expensive – Tedious – Sample Bias November 1, 2011 Crowdsourcing for Research and Engineering 48
  • 49. Research Using Micro-Tasks • Let’s see examples of micro-task usage – Many areas: IR, NLP, computer vision, user studies, usability testing, psychological studies, surveys, … • Check bibliography at end for more references November 1, 2011 Crowdsourcing for Research and Engineering 49
  • 50. NLP Example – Dialect Identification November 1, 2011 Crowdsourcing for Research and Engineering 50
  • 51. NLP Example – Spelling correction November 1, 2011 Crowdsourcing for Research and Engineering 51
  • 52. NLP Example – Machine Translation • Manual evaluation of translation quality is slow and expensive • High agreement between non-experts and experts • $0.10 to translate a sentence C. Callison-Burch. “Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk”, EMNLP 2009. B. Bederson et al. Translation by Iterative Collaboration between Monolingual Users, GI 2010 November 1, 2011 Crowdsourcing for Research and Engineering 52
  • 53. NLP Example – Snow et al. (2008) • 5 Tasks – Affect recognition – Word similarity – Recognizing textual entailment – Event temporal ordering – Word sense disambiguation • high agreement between crowd labels and expert “gold” labels – assumes training data for worker bias correction • 22K labels for $26 ! November 1, 2011 Crowdsourcing for Research and Engineering 53
  • 54. Computer Vision – Painting Similarity Kovashka & Lease, CrowdConf’10 November 1, 2011 Crowdsourcing for Research and Engineering 54
  • 55. User Studies • Investigate attitudes about saving, sharing, publishing, and removing online photos • Survey – A scenario-based probe of respondent attitudes, designed to yield quantitative data – A set of questions (closed and open-ended) – Importance of recent activity – 41 questions – 7-point scale • 250 respondents C. Marshall and F. Shipman. “The Ownership and Reuse of Visual Media”, JCDL 2011. November 1, 2011 Crowdsourcing for Research and Engineering 55
  • 56. Remote Usability Testing • Liu et al. (in preparation) • Compares remote usability testing using MTurk and CrowdFlower (not uTest) vs. traditional on-site testing • Advantages – More Participants – More Diverse Participants – High Speed – Low Cost • Disadvantages – Lower Quality Feedback – Less Interaction – Greater need for quality control – Less Focused User Groups November 1, 2011 Crowdsourcing for Research and Engineering 56
  • 57. IR Example – Relevance and ads November 1, 2011 Crowdsourcing for Research and Engineering 57
  • 58. IR Example – Product Search November 1, 2011 Crowdsourcing for Research and Engineering 58
  • 59. IR Example – Snippet Evaluation • Study on summary lengths • Determine preferred result length • Asked workers to categorize web queries • Asked workers to evaluate snippet quality • Payment between $0.01 and $0.05 per HIT M. Kaisser, M. Hearst, and L. Lowe. “Improving Search Results Quality by Customizing Summary Lengths”, ACL/HLT, 2008. November 1, 2011 Crowdsourcing for Research and Engineering 59
  • 60. IR Example – Relevance Assessment • Replace TREC-like relevance assessors with MTurk? • Selected topic “space program” (011) • Modified original 4-page instructions from TREC • Workers more accurate than original assessors! • 40% provided justification for each answer O. Alonso and S. Mizzaro. “Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment”, SIGIR Workshop on the Future of IR Evaluation, 2009. November 1, 2011 Crowdsourcing for Research and Engineering 60
  • 61. IR Example – Timeline Annotation • Workers annotate timeline on politics, sports, culture • Given a timex (1970s, 1982, etc.) suggest something • Given an event (Vietnam, World cup, etc.) suggest a timex K. Berberich, S. Bedathur, O. Alonso, G. Weikum “A Language Modeling Approach for Temporal Information Needs”. ECIR 2010 November 1, 2011 Crowdsourcing for Research and Engineering 61
  • 62. How can I get started? • You have an idea • Easy, cheap, fast, and iterative sounds good • Can you test your idea via crowdsourcing? • Is my idea crowdsourcable? • How do I start? • What do I need? November 1, 2011 Crowdsourcing for Research and Engineering 62
  • 63. Tip for Getting Started: do work Try doing work before you create work for others! November 1, 2011 Crowdsourcing for Research and Engineering 63
  • 64. II AMAZON MECHANICAL TURK November 1, 2011 Crowdsourcing for Research and Engineering 64
  • 65. Mechanical What? November 1, 2011 Crowdsourcing for Research and Engineering 65
  • 66. MTurk: The Requester • Sign up with your Amazon account • Amazon payments • Purchase prepaid HITs • There is no minimum or up-front fee • MTurk collects a 10% commission • The minimum commission charge is $0.005 per HIT November 1, 2011 Crowdsourcing for Research and Engineering 66
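Using the fee figures on this slide (10% commission with a $0.005 minimum), a quick back-of-the-envelope cost estimate might look like the sketch below. It treats the minimum as applying per assignment, which is a simplified reading of the slide; current MTurk fees differ from these 2011 numbers.

```python
# Back-of-the-envelope batch cost using the 2011 fee figures on this slide
# (10% commission, $0.005 minimum); the minimum is treated as per assignment.

def batch_cost(n_hits, workers_per_hit, reward,
               commission_rate=0.10, min_commission=0.005):
    assignments = n_hits * workers_per_hit
    rewards = assignments * reward
    commission = assignments * max(commission_rate * reward, min_commission)
    return rewards + commission

# 200 queries, 3 workers each, $0.02 per judgment:
# $12.00 in rewards + $3.00 commission (the minimum dominates) = $15.00
print(batch_cost(200, 3, 0.02))
```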
  • 67. MTurk Dashboard • Three tabs – Design – Publish – Manage • Design – HIT Template • Publish – Make work available • Manage – Monitor progress November 1, 2011 Crowdsourcing for Research and Engineering 67
  • 68. MTurk: Dashboard - II November 1, 2011 Crowdsourcing for Research and Engineering 68
  • 69. MTurk API • Amazon Web Services API • Rich set of services • Command line tools • More flexibility than dashboard November 1, 2011 Crowdsourcing for Research and Engineering 69
  • 70. MTurk Dashboard vs. API • Dashboard – Easy to prototype – Setup and launch an experiment in a few minutes • API – Ability to integrate AMT as part of a system – Ideal if you want to run experiments regularly – Schedule tasks November 1, 2011 Crowdsourcing for Research and Engineering 70
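To make the dashboard-vs-API contrast concrete, here is a minimal sketch of an API-driven batch. The `MTurkClient` object and its `create_hit`, `is_complete`, and `fetch_results` methods are hypothetical placeholders, not the actual AWS API; libraries such as Boto expose comparable operations. The point is simply that code can publish, poll, and collect results on a schedule, which the dashboard cannot do for you.

```python
# Hypothetical sketch only: MTurkClient, create_hit, is_complete, and
# fetch_results are placeholder names standing in for a real MTurk client.

import time

def run_batch(client, template, items, workers_per_item=3, poll_seconds=60):
    hit_ids = [client.create_hit(template, item, assignments=workers_per_item)
               for item in items]                      # publish all HITs
    results = {}
    while len(results) < len(hit_ids):                 # poll until everything is done
        for hit_id in hit_ids:
            if hit_id not in results and client.is_complete(hit_id):
                results[hit_id] = client.fetch_results(hit_id)
        time.sleep(poll_seconds)
    return results                                     # hand off to quality control
```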
  • 71. Working on MTurk • Sign up with your Amazon account • Tabs – Account: work approved/rejected – HIT: browse and search for work – Qualifications: browse & search qualifications • Start turking! November 1, 2011 Crowdsourcing for Research and Engineering 71
  • 72. Why Eytan Adar hates MTurk Research (at least sort of) • Overly-narrow focus on Turk & other platforms – Identify general vs. platform-specific problems – Academic vs. Industrial problems • Lack of appreciation of interdisciplinary nature – Some problems well-studied in other areas – Human behavior hasn’t changed much • Turks aren’t Martians – How many prior user studies do we have to reproduce on MTurk before we can get over it? November 1, 2011 Crowdsourcing for Research and Engineering 72
  • 73. III RELEVANCE JUDGING & CROWDSOURCING November 1, 2011 Crowdsourcing for Research and Engineering 73
  • 74. November 1, 2011 Crowdsourcing for Research and Engineering 74
  • 75. Motivating Example: Relevance Judging • Relevance of search results is difficult to judge – Highly subjective – Expensive to measure • Professional editors commonly used • Potential benefits of crowdsourcing – Scalability (time and cost) – Diversity of judgments November 1, 2011 Crowdsourcing for Research and Engineering 75
  • 76. November 1, 2011 Crowdsourcing for Research and Engineering 76
  • 77. Started with a joke … November 1, 2011 Crowdsourcing for Research and Engineering 77
  • 78. Results for {idiot} at WSDM 2011 February 2011: 5/7 (R), 2/7 (NR) – Most of the time those TV reality stars have absolutely no talent. They do whatever they can to make a quick dollar. Most of the time the reality tv stars don not have a mind of their own. R – Most are just celebrity wannabees. Many have little or no talent, they just want fame. R – I can see this one going both ways. A particular sort of reality star comes to mind, though, one who was voted off Survivor because he chose not to use his immunity necklace. Sometimes the label fits, but sometimes it might be unfair. R – Just because someone else thinks they are an "idiot", doesn't mean that is what the word means. I don't like to think that any one person's photo would be used to describe a certain term. NR – While some reality-television stars are genuinely stupid (or cultivate an image of stupidity), that does not mean they can or should be classified as "idiots." Some simply act that way to increase their TV exposure and potential earnings. Other reality-television stars are really intelligent people, and may be considered as idiots by people who don't like them or agree with them. It is too subjective an issue to be a good result for a search engine. NR – Have you seen the knuckledraggers on reality television? They should be required to change their names to idiot after appearing on the show. You could put numbers after the word idiot so we can tell them apart. R – Although I have not followed too many of these shows, those that I have encountered have for a great part a very common property. That property is that most of the participants involved exhibit a shallow self-serving personality that borders on social pathological behavior. To perform or act in such an abysmal way could only be an act of an idiot. R November 1, 2011 Crowdsourcing for Research and Engineering 78
  • 79. Two Simple Examples of MTurk 1. Ask workers to classify a query 2. Ask workers to judge document relevance Steps • Define high-level task • Design & implement interface & backend • Launch, monitor progress, and assess work • Iterate design November 1, 2011 Crowdsourcing for Research and Engineering 79
  • 80. Query Classification Task • Ask the user to classify a query • Show a form that contains a few categories • Upload a few queries (~20) • Use 3 workers November 1, 2011 Crowdsourcing for Research and Engineering 80
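One concrete way to prepare this task: MTurk's template interface fills a ${column_name} placeholder from each column of an uploaded CSV, so the batch input is just a CSV of queries. The sketch below assumes the HIT template uses a ${query} placeholder; the four queries are made-up examples.

```python
# Write the batch input file for the query-classification template.

import csv

queries = ["britney spears", "used car prices", "jaguar", "how to tie a tie"]

with open("query_batch.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["query"])       # header must match the template placeholder
    for q in queries:
        writer.writerow([q])
# Upload query_batch.csv when publishing, requesting 3 assignments per HIT.
```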
  • 81. DEMO November 1, 2011 Crowdsourcing for Research and Engineering 81
  • 82. November 1, 2011 Crowdsourcing for Research and Engineering 82
  • 83. Relevance Judging Task • Use a few documents from a standard collection used for evaluating search engines • Ask user to make binary judgments • Modification: graded judging • Use 5 workers November 1, 2011 Crowdsourcing for Research and Engineering 83
  • 84. DEMO November 1, 2011 Crowdsourcing for Research and Engineering 84
  • 85. IV METHODOLOGY FOR EFFECTIVE CROWDSOURCING November 1, 2011 Crowdsourcing for Research and Engineering 85
  • 86. November 1, 2011 Crowdsourcing for Research and Engineering 86
  • 87. Typical Workflow • Define and design what to test • Sample data • Design the experiment • Run experiment • Collect data and analyze results • Quality control November 1, 2011 Crowdsourcing for Research and Engineering 87
  • 88. Development Framework • Incremental approach • Measure, evaluate, and adjust as you go • Suitable for repeatable tasks November 1, 2011 Crowdsourcing for Research and Engineering 88
  • 89. Survey Design • One of the most important parts • Part art, part science • Instructions are key • Prepare to iterate November 1, 2011 Crowdsourcing for Research and Engineering 89
  • 90. Questionnaire Design • Ask the right questions • Workers may not be IR experts so don’t assume the same understanding in terms of terminology • Show examples • Hire a technical writer – Engineer writes the specification – Writer communicates November 1, 2011 Crowdsourcing for Research and Engineering 90
  • 91. UX Design • Time to apply all those usability concepts • Generic tips – Experiment should be self-contained. – Keep it short and simple. Brief and concise. – Be very clear with the relevance task. – Engage with the worker. Avoid boring stuff. – Always ask for feedback (open-ended question) in an input box. November 1, 2011 Crowdsourcing for Research and Engineering 91
  • 92. UX Design - II • Presentation • Document design • Highlight important concepts • Colors and fonts • Need to grab attention • Localization November 1, 2011 Crowdsourcing for Research and Engineering 92
  • 93. Examples - I • Asking too much, task not clear, “do NOT/reject” • Worker has to do a lot of stuff November 1, 2011 Crowdsourcing for Research and Engineering 93
  • 94. Example - II • Lot of work for a few cents • Go here, go there, copy, enter, count … November 1, 2011 Crowdsourcing for Research and Engineering 94
  • 95. A Better Example • All information is available – What to do – Search result – Question to answer November 1, 2011 Crowdsourcing for Research and Engineering 95
  • 96. November 1, 2011 Crowdsourcing for Research and Engineering 96
  • 97. Form and Metadata • Form with a closed question (binary relevance) and an open-ended question (user feedback) • Clear title, useful keywords • Workers need to find your task November 1, 2011 Crowdsourcing for Research and Engineering 97
  • 98. Relevance Judging – Example I November 1, 2011 Crowdsourcing for Research and Engineering 98
  • 99. Relevance Judging – Example II November 1, 2011 Crowdsourcing for Research and Engineering 99
  • 100. Implementation • Similar to a UX • Build a mock up and test it with your team – Yes, you need to judge some tasks • Incorporate feedback and run a test on MTurk with a very small data set – Time the experiment – Do people understand the task? • Analyze results – Look for spammers – Check completion times • Iterate and modify accordingly November 1, 2011 Crowdsourcing for Research and Engineering 100
  • 101. Implementation – II • Introduce quality control – Qualification test – Gold answers (honey pots) • Adjust passing grade and worker approval rate • Run experiment with new settings & same data • Scale on data • Scale on workers November 1, 2011 Crowdsourcing for Research and Engineering 101
  • 102. Experiment in Production • Lots of tasks on MTurk at any moment • Need to grab attention • Importance of experiment metadata • When to schedule – Split a large task into batches and have 1 single batch in the system – Always review feedback from batch n before uploading n+1 November 1, 2011 Crowdsourcing for Research and Engineering 102
  • 103. How Much to Pay? • Price commensurate with task effort – Ex: $0.02 for yes/no answer + $0.02 bonus for optional feedback • Ethics & market-factors: W. Mason and S. Suri, 2010. – e.g. non-profit SamaSource contracts workers in refugee camps – Predict right price given market & task: Wang et al. CSDM’11 • Uptake & time-to-completion vs. Cost & Quality – Too little $$: no interest or slow – Too much $$: attracts spammers – Real problem is lack of reliable QA substrate • Accuracy & quantity – More pay = more work, not better (W. Mason and D. Watts, 2009) • Heuristics: start small, watch uptake and bargaining feedback • Worker retention (“anchoring”) See also: L.B. Chilton et al. KDD-HCOMP 2010. November 1, 2011 Crowdsourcing for Research and Engineering 103
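One simple way to operationalize "price commensurate with task effort" is to work back from an estimated completion time and a target effective hourly rate, then round to whole cents. The numbers below are illustrative only, not recommendations.

```python
# Derive a starting per-HIT reward from task time and a target hourly rate.

def reward_per_hit(seconds_per_hit, target_hourly_rate):
    raw = target_hourly_rate * seconds_per_hit / 3600.0
    return max(0.01, round(raw, 2))   # rewards are in whole cents, $0.01 minimum

print(reward_per_hit(20, 3.00))       # a 20-second judgment at $3/hour -> 0.02
```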
  • 104. November 1, 2011 Crowdsourcing for Research and Engineering 104
  • 105. Quality Control in General • Extremely important part of the experiment • Approach as “overall” quality; not just for workers • Bi-directional channel – You may think the worker is doing a bad job. – The same worker may think you are a lousy requester. November 1, 2011 Crowdsourcing for Research and Engineering 105
  • 106. When to assess quality of work • Beforehand (prior to main task activity) – How: “qualification tests” or similar mechanism – Purpose: screening, selection, recruiting, training • During – How: assess labels as worker produces them • Like random checks on a manufacturing line – Purpose: calibrate, reward/penalize, weight • After – How: compute accuracy metrics post-hoc – Purpose: filter, calibrate, weight, retain (HR) – E.g. Jung & Lease (2011), Tang & Lease (2011), ... November 1, 2011 Crowdsourcing for Research and Engineering 106
  • 107. How to assess quality of work? • Compare worker’s label vs. – Known (correct, trusted) label – Other workers’ labels • P. Ipeirotis. Worker Evaluation in Crowdsourcing: Gold Data or Multiple Workers? Sept. 2010. – Model predictions of the above • Model the labels (Ryu & Lease, ASIS&T11) • Model the workers (Chen et al., AAAI’10) • Verify worker’s label – Yourself – Tiered approach (e.g. Find-Fix-Verify) • Quinn and B. Bederson’09, Bernstein et al.’10 November 1, 2011 Crowdsourcing for Research and Engineering 107
  • 108. Typical Assumptions • Objective truth exists – no minority voice / rare insights – Can relax this to model “truth distribution” • Automatic answer comparison/evaluation – What about free text responses? Hope from NLP… • Automatic essay scoring • Translation (BLEU: Papineni, ACL’2002) • Summarization (Rouge: C.Y. Lin, WAS’2004) – Have people do it (yourself or find-verify crowd, etc.) November 1, 2011 Crowdsourcing for Research and Engineering 108
  • 109. Distinguishing Bias vs. Noise • Ipeirotis (HComp 2010) • People often have consistent, idiosyncratic skews in their labels (bias) – E.g. I like action movies, so they get higher ratings • Once detected, systematic bias can be calibrated for and corrected (yeah!) • Noise, however, seems random & inconsistent – this is the real issue we want to focus on November 1, 2011 Crowdsourcing for Research and Engineering 109
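A minimal sketch of the kind of calibration this slide has in mind, assuming graded judgments (e.g. 1-5 relevance) and a small set of gold items: estimate each worker's average offset on gold and subtract it from their other labels. Random noise would not be corrected by this.

```python
# Calibration sketch for consistent bias on graded labels.
# gold: {item: true_grade}; worker_labels: {worker: {item: grade}}.

def bias_offsets(gold, worker_labels):
    offsets = {}
    for worker, labels in worker_labels.items():
        diffs = [g - gold[item] for item, g in labels.items() if item in gold]
        offsets[worker] = sum(diffs) / len(diffs) if diffs else 0.0
    return offsets

def debias(grade, worker, offsets):
    # subtract the worker's estimated systematic offset; noise is untouched
    return grade - offsets.get(worker, 0.0)
```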
  • 110. Comparing to known answers • AKA: gold, honey pot, verifiable answer, trap • Assumes you have known answers • Cost vs. Benefit – Producing known answers (experts?) – % of work spent re-producing them • Finer points – Controls against collusion – What if workers recognize the honey pots? November 1, 2011 Crowdsourcing for Research and Engineering 110
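A small sketch of the honey-pot check, assuming each worker's labels and the gold answers are available as dictionaries: score workers only on the trap items and flag anyone below a chosen accuracy threshold (the 0.7 cutoff is arbitrary).

```python
# Honey-pot check: score each worker only on items with known answers.
# gold: {item: answer}; worker_labels: {worker: {item: answer}}.

def flag_workers(gold, worker_labels, min_accuracy=0.7):
    flagged = []
    for worker, labels in worker_labels.items():
        trapped = [item for item in labels if item in gold]
        if not trapped:
            continue                      # this worker never saw a honey pot
        correct = sum(labels[item] == gold[item] for item in trapped)
        if correct / len(trapped) < min_accuracy:
            flagged.append(worker)
    return flagged
```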
  • 111. Comparing to other workers • AKA: consensus, plurality, redundant labeling • Well-known metrics for measuring agreement • Cost vs. Benefit: % of work that is redundant • Finer points – Is consensus “truth” or systematic bias of group? – What if no one really knows what they’re doing? • Low-agreement across workers indicates problem is with the task (or a specific example), not the workers – Risk of collusion • Sheng et al. (KDD 2008) November 1, 2011 Crowdsourcing for Research and Engineering 111
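Before reaching for the chance-corrected statistics on the next slides, raw observed agreement between two workers on the items they both judged is a quick first look; a minimal version follows.

```python
# Observed (percentage) agreement between two workers on shared items.
# labels_a, labels_b: {item: label} for each worker.

def percent_agreement(labels_a, labels_b):
    shared = set(labels_a) & set(labels_b)
    if not shared:
        return None                       # nothing judged in common
    return sum(labels_a[i] == labels_b[i] for i in shared) / len(shared)
```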
  • 112. Comparing to predicted label • Ryu & Lease, ASIS&T11 (CrowdConf’11 poster) • Catch-22 extremes – If model is really bad, why bother comparing? – If model is really good, why collect human labels? • Exploit model confidence – Trust predictions proportional to confidence – What if model very confident and wrong? • Active learning – Time sensitive: Accuracy / confidence changes November 1, 2011 Crowdsourcing for Research and Engineering 112
  • 113. Compare to predicted worker labels • Chen et al., AAAI’10 • Avoid inefficiency of redundant labeling – See also: Dekel & Shamir (COLT’2009) • Train a classifier for each worker • For each example labeled by a worker – Compare to predicted labels for all other workers • Issues • Sparsity: workers have to stick around to train model… • Time-sensitivity: New workers & incremental updates? November 1, 2011 Crowdsourcing for Research and Engineering 113
  • 114. Methods for measuring agreement • What to look for – Agreement, reliability, validity • Inter-agreement level – Agreement between judges – Agreement between judges and the gold set • Some statistics – Percentage agreement – Cohen’s kappa (2 raters) – Fleiss’ kappa (any number of raters) – Krippendorff’s alpha • With majority vote, what if 2 say relevant, 3 say not? – Use expert to break ties (Kochhar et al, HCOMP’10; GQR) – Collect more judgments as needed to reduce uncertainty November 1, 2011 Crowdsourcing for Research and Engineering 114
  • 115. Inter-rater reliability • Lots of research • Statistics books cover most of the material • Three categories based on the goals – Consensus estimates – Consistency estimates – Measurement estimates November 1, 2011 Crowdsourcing for Research and Engineering 115
  • 116. Sample code – R packages psy and irr
> library(psy)
> library(irr)
> my_data <- read.delim(file="test.txt", header=TRUE, sep="\t")
> kappam.fleiss(my_data, exact=FALSE)   # Fleiss' kappa (any number of raters)
> my_data2 <- read.delim(file="test2.txt", header=TRUE, sep="\t")
> ckappa(my_data2)                      # Cohen's kappa (2 raters)
November 1, 2011 Crowdsourcing for Research and Engineering 116
  • 117. k coefficient • Different interpretations of k • For practical purposes you need to be >= moderate • Results may vary
k             Interpretation
< 0           Poor agreement
0.01 – 0.20   Slight agreement
0.21 – 0.40   Fair agreement
0.41 – 0.60   Moderate agreement
0.61 – 0.80   Substantial agreement
0.81 – 1.00   Almost perfect agreement
November 1, 2011 Crowdsourcing for Research and Engineering 117
  • 118. Detection Theory • Sensitivity measures – High sensitivity: good ability to discriminate – Low sensitivity: poor ability
Stimulus class   “Yes”          “No”
S2 (signal)      Hits           Misses
S1 (noise)       False alarms   Correct rejections
Hit rate H = P(“yes”|S2), False alarm rate F = P(“yes”|S1)
November 1, 2011 Crowdsourcing for Research and Engineering 118
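A worked example of the rates above, plus d′ (a standard sensitivity index from detection theory that the slide alludes to but does not name), computed with the inverse normal CDF as d′ = z(H) − z(F).

```python
# Hit rate, false alarm rate, and d' = z(H) - z(F).

from statistics import NormalDist

def rates(hits, misses, false_alarms, correct_rejections):
    H = hits / (hits + misses)                               # P("yes" | S2, signal)
    F = false_alarms / (false_alarms + correct_rejections)   # P("yes" | S1, noise)
    return H, F

def d_prime(H, F):
    z = NormalDist().inv_cdf
    return z(H) - z(F)

H, F = rates(hits=80, misses=20, false_alarms=30, correct_rejections=70)
print(H, F, d_prime(H, F))   # 0.8, 0.3 -> d' of about 1.37
```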
  • 119. November 1, 2011 Crowdsourcing for Research and Engineering 119
  • 120. Finding Consensus • When multiple workers disagree on the correct label, how do we resolve this? – Simple majority vote (or average and round) – Weighted majority vote (e.g. naive bayes) • Many papers from machine learning… • If wide disagreement, likely there is a bigger problem which consensus doesn’t address November 1, 2011 Crowdsourcing for Research and Engineering 120
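A minimal consensus sketch covering both rules above: plain majority vote, and a weighted vote in which each worker's vote counts in proportion to some accuracy estimate (e.g. from honey pots). This is a simplification rather than the full naive Bayes formulation; ties are returned unresolved so they can be escalated to an expert or to extra judgments.

```python
# Majority vote with optional per-worker weights; ties return None.

from collections import defaultdict

def weighted_vote(labels, weights=None):
    """labels: {worker: label}; weights: {worker: weight}, default 1.0 each."""
    totals = defaultdict(float)
    for worker, label in labels.items():
        totals[label] += (weights or {}).get(worker, 1.0)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None                       # tie: collect more labels or ask an expert
    return ranked[0][0]

print(weighted_vote({"w1": "R", "w2": "R", "w3": "NR"}))                  # R
print(weighted_vote({"w1": "R", "w2": "NR"}, {"w1": 0.9, "w2": 0.6}))     # R
```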
  • 121. Quality Control on MTurk • Rejecting work & Blocking workers (more later…) – Requestors don’t want bad PR or complaint emails – Common practice: always pay, block as needed • Approval rate: easy to use, but value? – P. Ipeirotis. Be a Top Mechanical Turk Worker: You Need $5 and 5 Minutes. Oct. 2010 – Many requestors don’t ever reject… • Qualification test – Pre-screen workers’ capabilities & effectiveness – Example and pros/cons in next slides… • Geographic restrictions • Mechanical Turk Masters (June 23, 2011) – Recent addition, degree of benefit TBD… November 1, 2011 Crowdsourcing for Research and Engineering 121
  • 122. Tools and Packages for MTurk • QA infrastructure layers atop MTurk promote useful separation-of-concerns from task – TurkIt • Quik Turkit provides nearly realtime services – Turkit-online (??) – Get Another Label (& qmturk) – Turk Surveyor – cv-web-annotation-toolkit (image labeling) – Soylent – Boto (python library) • Turkpipe: submit batches of jobs using the command line. • More needed… November 1, 2011 Crowdsourcing for Research and Engineering 122
  • 123. A qualification test snippet
<Question>
  <QuestionIdentifier>question1</QuestionIdentifier>
  <QuestionContent>
    <Text>Carbon monoxide poisoning is</Text>
  </QuestionContent>
  <AnswerSpecification>
    <SelectionAnswer>
      <StyleSuggestion>radiobutton</StyleSuggestion>
      <Selections>
        <Selection>
          <SelectionIdentifier>1</SelectionIdentifier>
          <Text>A chemical technique</Text>
        </Selection>
        <Selection>
          <SelectionIdentifier>2</SelectionIdentifier>
          <Text>A green energy treatment</Text>
        </Selection>
        <Selection>
          <SelectionIdentifier>3</SelectionIdentifier>
          <Text>A phenomenon associated with sports</Text>
        </Selection>
        <Selection>
          <SelectionIdentifier>4</SelectionIdentifier>
          <Text>None of the above</Text>
        </Selection>
      </Selections>
    </SelectionAnswer>
  </AnswerSpecification>
</Question>
November 1, 2011 Crowdsourcing for Research and Engineering 123
  • 124. Qualification tests: pros and cons • Advantages – Great tool for controlling quality – Adjust passing grade • Disadvantages – Extra cost to design and implement the test – May turn off workers, hurt completion time – Refresh the test on a regular basis – Hard to verify subjective tasks like judging relevance • Try creating task-related questions to get worker familiar with task before starting task in earnest November 1, 2011 Crowdsourcing for Research and Engineering 124
  • 125. More on quality control & assurance • HR issues: recruiting, selection, & retention – e.g., post/tweet, design a better qualification test, bonuses, … • Collect more redundant judgments… – at some point defeats cost savings of crowdsourcing – 5 workers is often sufficient November 1, 2011 Crowdsourcing for Research and Engineering 125
  • 126. Robots and Captchas • Some reports of robots on MTurk – E.g. McCreadie et al. (2011) – violation of terms of service – Artificial artificial artificial intelligence • Captchas seem ideal, but… – There is abuse of robots using turkers to solve captchas so they can access web resources – Turker wisdom is therefore to avoid such HITs • What to do? – Use standard captchas, notify workers – Block robots other ways (e.g. external HITs) – Catch robots through standard QC, response times – Use HIT-specific captchas (Kazai et al., 2011) November 1, 2011 Crowdsourcing for Research and Engineering 126
  • 127. Was the task difficult? • Ask workers to rate difficulty of a search topic • 50 topics; 5 workers, $0.01 per task November 1, 2011 Crowdsourcing for Research and Engineering 127
  • 128. Other quality heuristics • Justification/feedback as quasi-captcha – Successfully proven in past experiments – Should be optional – Automatically verifying feedback was written by a person may be difficult (classic spam detection task) • Broken URL/incorrect object – Leave an outlier in the data set – Workers will tell you – If somebody answers “excellent” on a graded relevance test for a broken URL => probably spammer November 1, 2011 Crowdsourcing for Research and Engineering 128
  • 129. Dealing with bad workers • Pay for “bad” work instead of rejecting it? – Pro: preserve reputation, admit if poor design at fault – Con: promote fraud, undermine approval rating system • Use bonus as incentive – Pay the minimum $0.01 and $0.01 for bonus – Better than rejecting a $0.02 task • If spammer “caught”, block from future tasks – May be easier to always pay, then block as needed November 1, 2011 Crowdsourcing for Research and Engineering 129
  • 130. Worker feedback • Real feedback received via email after rejection • Worker XXX I did. If you read these articles most of them have nothing to do with space programs. I’m not an idiot. • Worker XXX As far as I remember there wasn't an explanation about what to do when there is no name in the text. I believe I did write a few comments on that, too. So I think you're being unfair rejecting my HITs. November 1, 2011 Crowdsourcing for Research and Engineering 130
  • 131. Real email exchange with worker after rejection WORKER: this is not fair , you made me work for 10 cents and i lost my 30 minutes of time ,power and lot more and gave me 2 rejections at least you may keep it pending. please show some respect to turkers REQUESTER: I'm sorry about the rejection. However, in the directions given in the hit, we have the following instructions: IN ORDER TO GET PAID, you must judge all 5 webpages below *AND* complete a minimum of three HITs. Unfortunately, because you only completed two hits, we had to reject those hits. We do this because we need a certain amount of data on which to make decisions about judgment quality. I'm sorry if this caused any distress. Feel free to contact me if you have any additional questions or concerns. WORKER: I understood the problems. At that time my kid was crying and i went to look after. that's why i responded like that. I was very much worried about a hit being rejected. The real fact is that i haven't seen that instructions of 5 web page and started doing as i do the dolores labs hit, then someone called me and i went to attend that call. sorry for that and thanks for your kind concern. November 1, 2011 Crowdsourcing for Research and Engineering 131
  • 132. Exchange with worker • Worker XXX Thank you. I will post positive feedback for you at Turker Nation. Me: was this a sarcastic comment? • I took a chance by accepting some of your HITs to see if you were a trustworthy author. My experience with you has been favorable so I will put in a good word for you on that website. This will help you get higher quality applicants in the future, which will provide higher quality work, which might be worth more to you, which hopefully means higher HIT amounts in the future. November 1, 2011 Crowdsourcing for Research and Engineering 132
  • 133. Build Your Reputation as a Requestor • Word of mouth effect – Workers trust the requester (pay on time, clear explanation if there is a rejection) – Experiments tend to go faster – Announce forthcoming tasks (e.g. tweet) • Disclose your real identity? November 1, 2011 Crowdsourcing for Research and Engineering 133
  • 134. Other practical tips • Sign up as worker and do some HITs • “Eat your own dog food” • Monitor discussion forums • Address feedback (e.g., poor guidelines, payments, passing grade, etc.) • Everything counts! – Overall design only as strong as weakest link November 1, 2011 Crowdsourcing for Research and Engineering 134
  • 135. Content quality • People like to work on things that they like • TREC ad-hoc vs. INEX – TREC experiments took twice as long to complete – INEX (Wikipedia), TREC (LA Times, FBIS) • Topics – INEX: Olympic games, movies, salad recipes, etc. – TREC: cosmic events, Schengen agreement, etc. • Content and judgments according to modern times – Airport security docs are pre 9/11 – Antarctic exploration (global warming) November 1, 2011 Crowdsourcing for Research and Engineering 135
  • 136. Content quality - II • Document length • Randomize content • Avoid worker fatigue – Judging 100 documents on the same subject can be tiring, leading to decreasing quality November 1, 2011 Crowdsourcing for Research and Engineering 136
  • 137. Presentation • People scan documents for relevance cues • Document design • Highlighting no more than 10% November 1, 2011 Crowdsourcing for Research and Engineering 137
  • 138. Presentation - II November 1, 2011 Crowdsourcing for Research and Engineering 138
  • 139. Relevance justification • Why settle for a label? • Let workers justify answers – cf. Zaidan et al. (2007) “annotator rationales” • INEX – 22% of assignments with comments • Must be optional • Let’s see how people justify November 1, 2011 Crowdsourcing for Research and Engineering 139
  • 140. “Relevant” answers [Salad Recipes] Doesn't mention the word 'salad', but the recipe is one that could be considered a salad, or a salad topping, or a sandwich spread. Egg salad recipe Egg salad recipe is discussed. History of salad cream is discussed. Includes salad recipe It has information about salad recipes. Potato Salad Potato salad recipes are listed. Recipe for a salad dressing. Salad Recipes are discussed. Salad cream is discussed. Salad info and recipe The article contains a salad recipe. The article discusses methods of making potato salad. The recipe is for a dressing for a salad, so the information is somewhat narrow for the topic but is still potentially relevant for a researcher. This article describes a specific salad. Although it does not list a specific recipe, it does contain information relevant to the search topic. gives a recipe for tuna salad relevant for tuna salad recipes relevant to salad recipes this is on-topic for salad recipes November 1, 2011 Crowdsourcing for Research and Engineering 140
  • 141. “Not relevant” answers [Salad Recipes] About gaming not salad recipes. Article is about Norway. Article is about Region Codes. Article is about forests. Article is about geography. Document is about forest and trees. Has nothing to do with salad or recipes. Not a salad recipe Not about recipes Not about salad recipes There is no recipe, just a comment on how salads fit into meal formats. There is nothing mentioned about salads. While dressings should be mentioned with salads, this is an article on one specific type of dressing, no recipe for salads. article about a swiss tv show completely off-topic for salad recipes not a salad recipe not about salad recipes totally off base November 1, 2011 Crowdsourcing for Research and Engineering 141
  • 142. November 1, 2011 Crowdsourcing for Research and Engineering 142
  • 143. Feedback length • Workers will justify answers • Has to be optional for good feedback • In E51, mandatory comments – Length dropped – “Relevant” or “Not Relevant” November 1, 2011 Crowdsourcing for Research and Engineering 143
  • 144. Other design principles • Text alignment • Legibility • Reading level: complexity of words and sentences • Attractiveness (worker’s attention & enjoyment) • Multi-cultural / multi-lingual • Who is the audience (e.g. target worker community) – Special needs communities (e.g. simple color blindness) • Parsimony • Cognitive load: mental rigor needed to perform task • Exposure effect November 1, 2011 Crowdsourcing for Research and Engineering 144
  • 145. Platform alternatives • Why MTurk – Amazon brand, lots of research papers – Speed, price, diversity, payments • Why not – Crowdsourcing != Mturk – Spam, no analytics, must build tools for worker & task quality • How to build your own crowdsourcing platform – Back-end – Template language for creating experiments – Scheduler – Payments? November 1, 2011 Crowdsourcing for Research and Engineering 145
  • 146. The human side • As a worker – I hate when instructions are not clear – I’m not a spammer – I just don’t get what you want – Boring task – A good pay is ideal but not the only condition for engagement • As a requester – Attrition – Balancing act: a task that would produce the right results and is appealing to workers – I want your honest answer for the task – I want qualified workers; system should do some of that for me • Managing crowds and tasks is a daily activity – more difficult than managing computers November 1, 2011 Crowdsourcing for Research and Engineering 146
  • 147. Things that work • Qualification tests • Honey-pots • Good content and good presentation • Economy of attention • Things to improve – Manage workers in different levels of expertise including spammers and potential cases. – Mix different pools of workers based on different profile and expertise levels. November 1, 2011 Crowdsourcing for Research and Engineering 147
  • 148. Things that need work • UX and guidelines – Help the worker – Cost of interaction • Scheduling and refresh rate • Exposure effect • Sometimes we just don’t agree • How crowdsourcable is your task November 1, 2011 Crowdsourcing for Research and Engineering 148
  • 149. V. FUTURE TRENDS: FROM LABELING TO HUMAN COMPUTATION November 1, 2011 Crowdsourcing for Research and Engineering 149
  • 150. The Turing Test (Alan Turing, 1950) November 1, 2011 Crowdsourcing for Research and Engineering 150
  • 151. November 1, 2011 Crowdsourcing for Research and Engineering 151
  • 152. The Turing Test (Alan Turing, 1950) November 1, 2011 Crowdsourcing for Research and Engineering 152
  • 153. What is a Computer? November 1, 2011 Crowdsourcing for Research and Engineering 153
  • 154. • What was old is new • “Crowdsourcing: A New Branch of Computer Science” (March 29, 2011) • See also: M. Croarken (2003), Tabulating the heavens: computing the Nautical Almanac in 18th-century England; D. A. Grier, When Computers Were Human, Princeton University Press, 2005 November 1, 2011 Crowdsourcing for Research and Engineering 154
  • 155. Davis et al. (2010). The HPU. November 1, 2011 Crowdsourcing for Research and Engineering 155
  • 156. Remembering the Human in HPU • Not just turning a mechanical crank November 1, 2011 Crowdsourcing for Research and Engineering 156
  • 157. Human Computation Rebirth of people as ‘computists’; people do tasks computers cannot (do well) Stage 1: Detecting robots – CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart – No useful work produced; people just answer questions with known answers Stage 2: Labeling data (at scale) – E.g. ESP game, typical use of MTurk – Game changer for AI: starving for data Stage 3: General “human computation” (HPU) – people do arbitrarily sophisticated tasks (i.e. compute arbitrary functions) – HPU as core component in system architecture, many “HPC” invocations – blend HPU with automation for a new class of hybrid applications – New tradeoffs possible in latency/cost vs. functionality/accuracy November 1, 2011 Crowdsourcing for Research and Engineering 157
  • 158. Mobile Phone App: “Amazon Remembers” November 1, 2011 Crowdsourcing for Research and Engineering 158
  • 159. Soylent: A Word Processor with a Crowd Inside • Bernstein et al., UIST 2010 November 1, 2011 Crowdsourcing for Research and Engineering 159
  • 160. CrowdSearch and mCrowd • T. Yan, MobiSys 2010 November 1, 2011 Crowdsourcing for Research and Engineering 160
  • 161. Translation by monolingual speakers • C. Hu, CHI 2009 November 1, 2011 Crowdsourcing for Research and Engineering 161
  • 162. Wisdom of Crowds (WoC) Requires • Diversity • Independence • Decentralization • Aggregation Input: large, diverse sample (to increase likelihood of overall pool quality) Output: consensus or selection (aggregation) November 1, 2011 Crowdsourcing for Research and Engineering 162
  • 163. WoC vs. Ensemble Learning • Combine multiple models to improve performance over any constituent model – Can use many weak learners to make a strong one – Compensate for poor models with extra computation • Works better with diverse, independent learners • cf. NIPS 2010-2011 Workshops – Computational Social Science & the Wisdom of Crowds • More investigation needed of traditional feature-based machine learning & ensemble methods for consensus labeling with crowdsourcing November 1, 2011 Crowdsourcing for Research and Engineering 163
  • 164. Unreasonable Effectiveness of Data • Massive free Web data changed how we train learning systems – Banko and Brill (2001). Human Language Tech. – Halevy et al. (2009). IEEE Intelligent Systems. • How might access to cheap & plentiful labeled data change the balance again? November 1, 2011 Crowdsourcing for Research and Engineering 164
  • 165. Active Learning • Minimize number of labels to achieve goal accuracy rate of classifier – Select examples to label to maximize learning • Vijayanarasimhan and Grauman (CVPR 2011) – Simple margin criteria: select maximally uncertain examples to label next – Finding which examples are uncertain can be computationally intensive (workers have to wait) – Use locality-sensitive hashing to find uncertain examples in sub-linear time November 1, 2011 Crowdsourcing for Research and Engineering 165
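A sketch of the "simple margin" criterion described above: rank unlabeled examples by the gap between their top two predicted class probabilities and send the most uncertain ones to the crowd next. The `predict_proba` function is assumed to come from whatever classifier is being trained; the hashing trick Vijayanarasimhan and Grauman use to make this fast is not shown.

```python
# "Simple margin" selection: smallest gap between the top two predicted class
# probabilities = most uncertain example.

def most_uncertain(examples, predict_proba, batch_size=100):
    def margin(x):
        probs = sorted(predict_proba(x), reverse=True)
        return probs[0] - probs[1]        # small margin = high uncertainty
    return sorted(examples, key=margin)[:batch_size]
```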
  • 166. Active Learning (2) • V&G report each learning iteration ~ 75 min – 15 minutes for model training & selection – 60 minutes waiting for crowd labels • Leaving workers idle may lose them, slowing uptake and completion times • Keep workers occupied – Mason and Suri (2010): paid waiting room – Laws et al. (EMNLP 2011): parallelize labeling and example selection via producer-consumer model • Workers consume examples, produce labels • Model consumes label, produces examples November 1, 2011 Crowdsourcing for Research and Engineering 166
  • 167. MapReduce with human computation • Commonalities – Large task divided into smaller sub-problems – Work distributed among worker nodes (workers) – Collect all answers and combine them – Varying performance of heterogeneous CPUs/HPUs • Variations – Human response latency / size of “cluster” – Some tasks are not suitable November 1, 2011 Crowdsourcing for Research and Engineering 167
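A toy illustration of the parallel, with `crowd_do` standing in for whatever HPU call a real system would make; the point is only the split, distribute, and combine shape, not a real MapReduce runtime.

```python
# Split work into sub-problems, let (human or machine) workers handle each
# piece, then combine the answers. crowd_do is a placeholder for a real HPU call.

def crowd_map_reduce(items, crowd_do, combine):
    partial = [crowd_do(item) for item in items]   # in practice: parallel HITs
    return combine(partial)

# e.g. crowd_map_reduce(documents, crowd_do=collect_5_judgments, combine=merge_votes)
```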
  • 168. A Few Questions • How should we balance automation vs. human computation? Which does what? • Who’s the right person for the job? • How do we handle complex tasks? Can we decompose them into smaller tasks? How? November 1, 2011 Crowdsourcing for Research and Engineering 168
  • 169. Research problems – operational • Methodology – Budget, people, document, queries, presentation, incentives, etc. – Scheduling – Quality • What’s the best “mix” of HC for a task? • What are the tasks suitable for HC? • Can I crowdsource my task? – Eickhoff and de Vries, WSDM 2011 CSDM Workshop November 1, 2011 Crowdsourcing for Research and Engineering 169
  • 170. More problems • Human factors vs. outcomes • Editors vs. workers • Pricing tasks • Predicting worker quality from observable properties (e.g. task completion time) • HIT / Requestor ranking or recommendation • Expert search: who are the right workers given task nature and constraints • Ensemble methods for Crowd Wisdom consensus November 1, 2011 Crowdsourcing for Research and Engineering 170
  • 171. Problems: crowds, clouds and algorithms • Infrastructure – Current platforms are very rudimentary – No tools for data analysis • Dealing with uncertainty (propagate rather than mask) – Temporal and labeling uncertainty – Learning algorithms – Search evaluation – Active learning (which example is likely to be labeled correctly) • Combining CPU + HPU – Human Remote Call? – Procedural vs. declarative? – Integration points with enterprise systems November 1, 2011 Crowdsourcing for Research and Engineering 171
  • 172. CrowdForge: MapReduce for Automation + Human Computation Kittur et al., CHI 2011 November 1, 2011 Crowdsourcing for Research and Engineering 172
  • 173. Conclusions • Crowdsourcing works and is here to stay • Fast turnaround, easy to experiment, cheap • Still have to design the experiments carefully! • Usability considerations • Worker quality • User feedback extremely useful November 1, 2011 Crowdsourcing for Research and Engineering 173
  • 174. Conclusions - II • Lots of opportunities to improve current platforms • Integration with current systems • While MTurk first to-market in micro-task vertical, many other vendors are emerging with different affordances or value-added features • Many open research problems … November 1, 2011 Crowdsourcing for Research and Engineering 174
  • 175. Conclusions – III • Important to know your limitations and be ready to collaborate • Lots of different skills and expertise required – Social/behavioral science – Human factors – Algorithms – Economics – Distributed systems – Statistics November 1, 2011 Crowdsourcing for Research and Engineering 175
  • 176. VIII REFERENCES & RESOURCES November 1, 2011 Crowdsourcing for Research and Engineering 176
  • 177. Books • Omar Alonso, Gabriella Kazai, and Stefano Mizzaro. (2012). Crowdsourcing for Search Engine Evaluation: Why and How. • Law and von Ahn (2011). Human Computation November 1, 2011 Crowdsourcing for Research and Engineering 177
  • 178. More Books July 2010, kindle-only: “This book introduces you to the top crowdsourcing sites and outlines step by step with photos the exact process to get started as a requester on Amazon Mechanical Turk.“ November 1, 2011 Crowdsourcing for Research and Engineering 178
  • 179. 2011 Tutorials and Keynotes • By Omar Alonso and/or Matthew Lease – CLEF: Crowdsourcing for Information Retrieval Experimentation and Evaluation (Sep. 20, Omar only) – CrowdConf (Nov. 1, this is it!) – IJCNLP: Crowd Computing: Opportunities and Challenges (Nov. 10, Matt only) – WSDM: Crowdsourcing 101: Putting the WSDM of Crowds to Work for You (Feb. 9) – SIGIR: Crowdsourcing for Information Retrieval: Principles, Methods, and Applications (July 24) • AAAI: Human Computation: Core Research Questions and State of the Art – Edith Law and Luis von Ahn, August 7 • ASIS&T: How to Identify Ducks In Flight: A Crowdsourcing Approach to Biodiversity Research and Conservation – Steve Kelling, October 10, ebird • EC: Conducting Behavioral Research Using Amazon's Mechanical Turk – Winter Mason and Siddharth Suri, June 5 • HCIC: Quality Crowdsourcing for Human Computer Interaction Research – Ed Chi (June 14-18, about HCIC) – Also see his: Crowdsourcing for HCI Research with Amazon Mechanical Turk • Multimedia: Frontiers in Multimedia Search – Alan Hanjalic and Martha Larson, Nov 28 • VLDB: Crowdsourcing Applications and Platforms – Anhai Doan, Michael Franklin, Donald Kossmann, and Tim Kraska • WWW: Managing Crowdsourced Human Computation – Panos Ipeirotis and Praveen Paritosh November 1, 2011 Crowdsourcing for Research and Engineering 179
  • 180. 2011 Workshops & Conferences • AAAI-HCOMP: 3rd Human Computation Workshop (Aug. 8) • ACIS: Crowdsourcing, Value Co-Creation, & Digital Economy Innovation (Nov. 30 – Dec. 2) • Crowdsourcing Technologies for Language and Cognition Studies (July 27) • CHI-CHC: Crowdsourcing and Human Computation (May 8) • CIKM: BooksOnline (Oct. 24, “crowdsourcing … online books”) • CrowdConf 2011 -- 2nd Conf. on the Future of Distributed Work (Nov. 1-2) • Crowdsourcing: Improving … Scientific Data Through Social Networking (June 13) • EC: Workshop on Social Computing and User Generated Content (June 5) • ICWE: 2nd International Workshop on Enterprise Crowdsourcing (June 20) • Interspeech: Crowdsourcing for speech processing (August) • NIPS: Second Workshop on Computational Social Science and the Wisdom of Crowds (Dec. TBD) • SIGIR-CIR: Workshop on Crowdsourcing for Information Retrieval (July 28) • TREC-Crowd: Year 1 of TREC Crowdsourcing Track (Nov. 16-18) • UbiComp: 2nd Workshop on Ubiquitous Crowdsourcing (Sep. 18) • WSDM-CSDM: Crowdsourcing for Search and Data Mining (Feb. 9) November 1, 2011 Crowdsourcing for Research and Engineering 180
  • 181. Things to Come in 2012 • AAAI Symposium: Wisdom of the Crowd – March 26-28 • Year 2 of TREC Crowdsourcing Track • Human Computation workshop/conference (TBD) • Journal Special Issues – Springer’s Information Retrieval: Crowdsourcing for Information Retrieval – Hindawi’s Advances in Multimedia Journal: Multimedia Semantics Analysis via Crowdsourcing Geocontext – IEEE Internet Computing: Crowdsourcing (Sept./Oct. 2012) – IEEE Transactions on Multimedia: Crowdsourcing in Multimedia (proposal in review) November 1, 2011 Crowdsourcing for Research and Engineering 181
  • 182. Thank You! Crowdsourcing news & information: ir.ischool.utexas.edu/crowd For further questions, contact us at: omar.alonso@microsoft.com ml@ischool.utexas.edu Cartoons by Mateo Burtch (buta@sonic.net) November 1, 2011 Crowdsourcing for Research and Engineering 182
  • 183. Recent Overview Papers • Alex Quinn and Ben Bederson. Human Computation: A Survey and Taxonomy of a Growing Field. In Proceedings of CHI 2011. • Man-Ching Yuen, Irwin King, and Kwong-Sak Leung. A Survey of Crowdsourcing Systems. SocialCom 2011. • A. Doan, R. Ramakrishnan, A. Halevy. Crowdsourcing Systems on the World-Wide Web. Communications of the ACM, 2011. November 1, 2011 Crowdsourcing for Research and Engineering 183
  • 184. Resources
A Few Blogs
• Behind Enemy Lines (P.G. Ipeirotis, NYU)
• Deneme: a Mechanical Turk experiments blog (Greg Little, MIT)
• CrowdFlower Blog
• http://experimentalturk.wordpress.com
• Jeff Howe
A Few Sites
• The Crowdsortium
• Crowdsourcing.org
• CrowdsourceBase (for workers)
• Daily Crowdsource
MTurk Forums and Resources
• Turker Nation: http://turkers.proboards.com
• http://www.turkalert.com (and its blog)
• Turkopticon: report/avoid shady requestors
• Amazon Forum for MTurk
November 1, 2011 Crowdsourcing for Research and Engineering 184
Bibliography
• J. Barr and L. Cabrera. “AI Gets a Brain”, ACM Queue, May 2006.
• M. Bernstein et al. Soylent: A Word Processor with a Crowd Inside. UIST 2010. Best Student Paper award.
• B.B. Bederson, C. Hu, and P. Resnik. Translation by Iterative Collaboration between Monolingual Users. Proceedings of Graphics Interface (GI 2010), 39-46.
• N. Bradburn, S. Sudman, and B. Wansink. Asking Questions: The Definitive Guide to Questionnaire Design, Jossey-Bass, 2004.
• C. Callison-Burch. “Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon’s Mechanical Turk”, EMNLP 2009.
• P. Dai, Mausam, and D. Weld. “Decision-Theoretic Control of Crowd-Sourced Workflows”, AAAI 2010.
• J. Davis et al. “The HPU”, IEEE Computer Vision and Pattern Recognition Workshop on Advancing Computer Vision with Humans in the Loop (ACVHL), June 2010.
• M. Gashler, C. Giraud-Carrier, and T. Martinez. Decision Tree Ensemble: Small Heterogeneous Is Better Than Large Homogeneous, ICMLA 2008.
• D. A. Grier. When Computers Were Human. Princeton University Press, 2005. ISBN 0691091579.
• S. Hacker and L. von Ahn. “Matchin: Eliciting User Preferences with an Online Game”, CHI 2009.
• J. Heer and M. Bostock. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design”, CHI 2010.
• P. Heymann and H. Garcia-Molina. “Human Processing”, Technical Report, Stanford InfoLab, 2010.
• J. Howe. “Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business”. Crown Business, New York, 2008.
• P. Hsueh, P. Melville, and V. Sindhwani. “Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria”. NAACL HLT Workshop on Active Learning and NLP, 2009.
• B. Huberman, D. Romero, and F. Wu. “Crowdsourcing, attention and productivity”. Journal of Information Science, 2009.
• P.G. Ipeirotis. The New Demographics of Mechanical Turk. March 9, 2010. PDF and spreadsheet.
• P.G. Ipeirotis, R. Chandrasekar, and P. Bennett. Report on the Human Computation Workshop. SIGKDD Explorations, v. 11, no. 2, pp. 80-83, 2010.
• P.G. Ipeirotis. Analyzing the Amazon Mechanical Turk Marketplace. CeDER-10-04 (Sept. 11, 2010).

November 1, 2011   Crowdsourcing for Research and Engineering   185
Bibliography (2)
• A. Kittur, E. Chi, and B. Suh. “Crowdsourcing User Studies with Mechanical Turk”, SIGCHI 2008.
• Aniket Kittur, Boris Smus, and Robert E. Kraut. CrowdForge: Crowdsourcing Complex Work. CHI 2011.
• Adriana Kovashka and Matthew Lease. “Human and Machine Detection of … Similarity in Art”. CrowdConf 2010.
• K. Krippendorff. “Content Analysis”, Sage Publications, 2003.
• G. Little, L. Chilton, M. Goldman, and R. Miller. “TurKit: Tools for Iterative Tasks on Mechanical Turk”, HCOMP 2009.
• T. Malone, R. Laubacher, and C. Dellarocas. Harnessing Crowds: Mapping the Genome of Collective Intelligence. 2009.
• W. Mason and D. Watts. “Financial Incentives and the ‘Performance of Crowds’”, HCOMP Workshop at KDD 2009.
• J. Nielsen. “Usability Engineering”, Morgan Kaufmann, 1994.
• A. Quinn and B. Bederson. “A Taxonomy of Distributed Human Computation”, Technical Report HCIL-2009-23, 2009.
• J. Ross, L. Irani, M. Six Silberman, A. Zaldivar, and B. Tomlinson. “Who are the Crowdworkers?: Shifting Demographics in Amazon Mechanical Turk”. CHI 2010.
• F. Scheuren. “What is a Survey” (http://www.whatisasurvey.info), 2004.
• R. Snow, B. O’Connor, D. Jurafsky, and A. Y. Ng. “Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks”. EMNLP 2008.
• V. Sheng, F. Provost, and P. Ipeirotis. “Get Another Label? Improving Data Quality … Using Multiple, Noisy Labelers”. KDD 2008.
• S. Weber. “The Success of Open Source”, Harvard University Press, 2004.
• L. von Ahn. Games with a Purpose. Computer, 39 (6), 92–94, 2006.
• L. von Ahn and L. Dabbish. “Designing Games with a Purpose”. CACM, Vol. 51, No. 8, 2008.

November 1, 2011   Crowdsourcing for Research and Engineering   186
Bibliography (3)
• Shuo Chen et al. What if the Irresponsible Teachers Are Dominating? A Method of Training on Samples and Clustering on Teachers. AAAI 2010.
• Paul Heymann and Hector Garcia-Molina. Turkalytics: Analytics for Human Computation. WWW 2011.
• Florian Laws, Christian Scheible, and Hinrich Schütze. Active Learning with Amazon Mechanical Turk. EMNLP 2011.
• C.Y. Lin. ROUGE: A Package for Automatic Evaluation of Summaries. Proceedings of the Workshop on Text Summarization Branches Out (WAS), 2004.
• C. Marshall and F. Shipman. “The Ownership and Reuse of Visual Media”, JCDL 2011.
• Hohyon Ryu and Matthew Lease. Crowdworker Filtering with Support Vector Machine. ASIS&T 2011.
• Wei Tang and Matthew Lease. Semi-Supervised Consensus Labeling for Crowdsourcing. ACM SIGIR Workshop on Crowdsourcing for Information Retrieval (CIR), 2011.
• S. Vijayanarasimhan and K. Grauman. Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds. CVPR 2011.
• Stephen Wolfson and Matthew Lease. Look Before You Leap: Legal Pitfalls of Crowdsourcing. ASIS&T 2011.

November 1, 2011   Crowdsourcing for Research and Engineering   187
Crowdsourcing in IR: 2008-2010
• 2008
     – O. Alonso, D. Rose, and B. Stewart. “Crowdsourcing for relevance evaluation”, SIGIR Forum, Vol. 42, No. 2.
• 2009
     – O. Alonso and S. Mizzaro. “Can we get rid of TREC Assessors? Using Mechanical Turk for … Assessment”. SIGIR Workshop on the Future of IR Evaluation.
     – P.N. Bennett, D.M. Chickering, and A. Mityagin. Learning Consensus Opinion: Mining Data from a Labeling Game. WWW.
     – G. Kazai, N. Milic-Frayling, and J. Costello. “Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments”, SIGIR.
     – G. Kazai and N. Milic-Frayling. “… Quality of Relevance Assessments Collected through Crowdsourcing”. SIGIR Workshop on the Future of IR Evaluation.
     – Law et al. “SearchWar”. HCOMP.
     – H. Ma, R. Chandrasekar, C. Quirk, and A. Gupta. “Improving Search Engines Using Human Computation Games”, CIKM 2009.
• 2010
     – SIGIR Workshop on Crowdsourcing for Search Evaluation.
     – O. Alonso, R. Schenkel, and M. Theobald. “Crowdsourcing Assessments for XML Ranked Retrieval”, ECIR.
     – K. Berberich, S. Bedathur, O. Alonso, and G. Weikum. “A Language Modeling Approach for Temporal Information Needs”, ECIR.
     – C. Grady and M. Lease. “Crowdsourcing Document Relevance Assessment with Mechanical Turk”. NAACL HLT Workshop on … Amazon's Mechanical Turk.
     – Grace Hui Yang, Anton Mityagin, Krysta M. Svore, and Sergey Markov. “Collecting High Quality Overlapping Labels at Low Cost”. SIGIR.
     – G. Kazai. “An Exploration of the Influence that Task Parameters Have on the Performance of Crowds”. CrowdConf.
     – G. Kazai. “… Crowdsourcing in Building an Evaluation Platform for Searching Collections of Digitized Books”. Workshop on Very Large Digital Libraries (VLDL).
     – Stephanie Nowak and Stefan Rüger. How Reliable are Annotations via Crowdsourcing? MIR.
     – Jean-François Paiement, James G. Shanahan, and Remi Zajac. “Crowdsourcing Local Search Relevance”. CrowdConf.
     – Maria Stone and Omar Alonso. “A Comparison of On-Demand Workforce with Trained Judges for Web Search Relevance Evaluation”. CrowdConf.
     – T. Yan, V. Kumar, and D. Ganesan. CrowdSearch: Exploiting Crowds for Accurate Real-Time Image Search on Mobile Phones. MobiSys, pp. 77-90, 2010.

November 1, 2011   Crowdsourcing for Research and Engineering   188
Crowdsourcing in IR: 2011
• WSDM Workshop on Crowdsourcing for Search and Data Mining.
• SIGIR Workshop on Crowdsourcing for Information Retrieval.
• O. Alonso and R. Baeza-Yates. “Design and Implementation of Relevance Assessments using Crowdsourcing”, ECIR 2011.
• Roi Blanco, Harry Halpin, Daniel Herzig, Peter Mika, Jeffrey Pound, Henry Thompson, and Thanh D. Tran. “Repeatable and Reliable Search System Evaluation using Crowd-Sourcing”. SIGIR 2011.
• Yen-Ta Huang, An-Jung Cheng, Liang-Chi Hsieh, Winston H. Hsu, and Kuo-Wei Chang. “Region-Based Landmark Discovery by Crowdsourcing Geo-Referenced Photos.” SIGIR 2011.
• Hyun Joon Jung and Matthew Lease. “Improving Consensus Accuracy via Z-score and Weighted Voting”. HCOMP 2011.
• G. Kasneci, J. Van Gael, D. Stern, and T. Graepel. CoBayes: Bayesian Knowledge Corroboration with Assessors of Unknown Areas of Expertise. WSDM 2011.
• Gabriella Kazai. “In Search of Quality in Crowdsourcing for Search Engine Evaluation”, ECIR 2011.
• Gabriella Kazai, Jaap Kamps, Marijn Koolen, and Natasa Milic-Frayling. “Crowdsourcing for Book Search Evaluation: Impact of Quality on Comparative System Ranking.” SIGIR 2011.
• Abhimanu Kumar and Matthew Lease. “Learning to Rank From a Noisy Crowd”. SIGIR 2011.
• Edith Law, Paul N. Bennett, and Eric Horvitz. “The Effects of Choice in Routing Relevance Judgments”. SIGIR 2011.

November 1, 2011   Crowdsourcing for Research and Engineering   189