SlideShare a Scribd company logo
1 of 235
Download to read offline
User:ClueBot NG
HotCat
Engineering at the
Intersection of Productive
Efficiency, Ideology, and
Ethical AI in Wikipedia
Wikimedia
Research
Aaron Halfaker
ahalfaker@wikimedia.org
Planet
Earth
(6.97 billion)
data from March 2012
Planet
Earth
(6.97 billion)
data from March 2012
1:10
Monthly
Wikipedia
Readers
(500 million)
Monthly Wikipedia Editors
(113,304)
Planet
Earth
(6.97 billion)
data from March 2012
1:10000
1:10
Monthly
Wikipedia
Readers
(500 million)
Monthly Wikipedia Editors
(113,304)
Planet
Earth
(6.97 billion)
data from March 2012
Monthly
Wikipedia
Readers
(500 million)
1:10
1:10000
Monthly Wikipedia Editors
(113,304)
Planet
Earth
(6.97 billion)
data from March 2012
Monthly
Wikipedia
Readers
(500 million)
1:10
1:10000
[edit]
Monthly Wikipedia Editors
(100,000)
1:10,000
Wikimedia Foundation Staff
(~225)
1:1000
Monthly Wikipedia
Readers (500M)
Wikimedia
Research
Story Time!
1. Wikipedia as a socio-technical system
○ Systems-thinking & Biological metaphors ~
1. Wikipedia as a socio-technical system
○ Systems-thinking & Biological metaphors
2. Critique of algorithmic quality control
○ Standpoint epistemology
○ Encoding of ideology in technology
~
1. Wikipedia as a socio-technical system
○ Systems-thinking & Biological metaphors
2. Critique of algorithmic quality control
○ Standpoint epistemology
○ Encoding of ideology in technology
3. Infrastructure for socio-technical change
○ “progress catalyst”
○ Hearing to speech vs. Speaking to be heard
○ The dangers of “subjective algorithms”
~
Part 1. The
socio-technical of
Wikipedia
What is Wikipedia?
The world’s largest encyclopedia
~5 million articles
in English
A wiki
● Anyone can edit
● Shared authorship
Flipped publication model
● publish first
● review later (maybe)
An online community
● ~100k active volunteer editors
WikiProject Video
Games
WikiProject Medicine
A system
Available
Human
Attention
A system
Available
Human
Attention
Input
Output
A system
Available
Human
Attention
Input
Output
Socio-technical
Social Technical
https://commons.wikimedia.org/wiki/File:Commodore_P
ET_4016,_Google_NY_office_computer_museum.jpg
https://commons.wikimedia.org/wiki/File:Kadokawa_office_(2).jpg
Social Technical
https://commons.wikimedia.org/wiki/File:Marcus_Bai
ns_Line.png https://commons.wikimedia.org/wiki/File:Balsa-gpg.png
Socio-technical
Technologist
Techno-biologist?
Bacterium Paramecium
Bacterium Paramecium
DNA
ribosomes
vacuole
Bacterium Paramecium
DNA
ribosomes
vacuole
vacuole
contractile vacuole
endoplasmic
reticulum
oral groove
macronucleus
micronucleus
Bacterium Paramecium
Bacterium Paramecium
Dunbar’s number: ~150
Via commons. CC-BY-SA 2.0
Bacterium ParameciumBare-amecium Organelles
Robots
& tools
Policies &
Guidelines
socio - technical
socio-technical
~
System with specialized
sub-systems
System with specialized
sub-systems
Work allocation
● Identify, prioritize and assign tasks
Work allocation
● Largely for free due to Linus’ Law
Linus’s Law?
Linus’s Law?
Eric
Raymond
Linus’s Law?
“given enough eyeballs, all bugs are shallow”
Linus’s Law?
“given enough eyeballs, all bugs are shallow”
yessss…
bugssss….
Linus’s Law?
“given enough eyeballs, all bugs are shallow”
Linus’s Law?
“given enough eyeballs, all bugs are shallow”
!
Linus’s Law?
visibility is critical to open collaboration
A corollary for Wikipedia
Given that enough people see an incomplete
article, all potential contributions to that article
will be easy for someone.
!
Works in theory -- Stvilia, B., et al. (2008). Information quality work organization in
Wikipedia. JASIST, 59(6), 983-1001.
Part of becoming a Wikipedian -- Bryant, S. L., Forte, A., & Bruckman, A. (2005,
November). Becoming Wikipedian. GROUP (pp. 1-10). ACM.
We can support visibility with technology -- Cosley, D., et al. (2007, January).
SuggestBot: using intelligent task routing to help people find work in wikipedia. IUI
(pp. 32-41). ACM.
Bad things happen when we take it away -- Schneider, J., et al. (2014, August).
Accept, decline, postpone. OpenSym(p. 26). ACM.
Work allocation
● Identify, prioritize and assign tasks
Regulation of behavior
● Norm formation, propagation and enforcement
Prescriptive
Idea for a Norm
Descriptive
Observation of
consistent behavior
Prescriptive
Idea for a Norm
Descriptive
Observation of
consistent behavior
ESSAY
Prescriptive
Idea for a Norm
Descriptive
Observation of
consistent behavior
Requests for
Comments
ESSAY
rabble
rabble
rabble
Prescriptive
Idea for a Norm
Descriptive
Observation of
consistent behavior
Requests for
Comments
GUIDELINE
POLICY
ESSAY
rabble
rabble
rabble
GUIDELINE
POLICY
ESSAY
Informal Formalized
Morgan, J. T., & Zachry, M. (2010,
November). Negotiating with angry
mastodons: the wikipedia policy
environment as genre ecology. In
Proceedings of the 16th ACM
international conference on Supporting
group work (pp. 165-168). ACM.
Halfaker, A., Geiger, R. S., Morgan, J.
T., & Riedl, J. (2012). The rise and
decline of an open collaboration system:
How Wikipedia’s reaction to popularity is
causing its decline. American Behavioral
Scientist, 0002764212469365.
Informal Formalized
Growth of regulations
Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2012). The rise and decline of an open collaboration system: How Wikipedia’s
reaction to popularity is causing its decline. American Behavioral Scientist, 0002764212469365.
Growth of regulations
Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2012). The rise and decline of an open collaboration system: How Wikipedia’s
reaction to popularity is causing its decline. American Behavioral Scientist, 0002764212469365.
Growth of regulations
Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2012). The rise and decline of an open collaboration system: How Wikipedia’s
reaction to popularity is causing its decline. American Behavioral Scientist, 0002764212469365.
Work allocation
● Identify, prioritize and assign tasks
Regulation of behavior
● Norm formation, propagation and enforcement
Quality control
● Identify and remove damage
Quality control
Quality control
!
Quality control
!
Fully automated
Machine learning
● Fast (~ 5 seconds)[1]
● No human effort
● Only obvious vandalism
Semi automated
Human computation
● Still pretty fast (~ 30
seconds)[1]
● Minimizes human effort
● Humans catch most
vandalism at a glance
1. R. Stuart Geiger & Aaron Halfaker. When the Levee Breaks (2013). WikiSym.
Quality control
!
Fully automated
Semi automated
Banning (Admins)
Quality control
!
Fully automated
Semi automated
Banning (Admins)
Innate
- Fast
- General
- Local
Adaptive
- Slow
- Specific
- Global
Halfaker, A. & Riedl, J. (2012) Bots and Cyborgs,
IEEE Computing 45(3) (p. 79-82)
Geiger, R. S., & Ribes, D. (2010, February). The
work of sustaining order in wikipedia: the banning
of a vandal. CSCW (pp. 117-126). ACM.
Work allocation
● Identify, prioritize and assign tasks
Regulation of behavior
● Norm formation, propagation and enforcement
Quality control
● Identify and remove damage
Community management
● Newcomer socialization, dispute mediation & training
https://commons.wikimedia.org/wiki/File:Flickr_-_Official_U.S._Navy_Imagery_-_Sailor%27s_daug
hter_operates_a_fire_hose_with_crew_member_assistance..jpg
6,000 newcomers per day
https://commons.wikimedia.org/wiki/File:Flickr_-_Official_U.S._Navy_Imagery_-_Sailor%27s_daug
hter_operates_a_fire_hose_with_crew_member_assistance..jpg
Host
Bot
vandals
good-faith
6,000 newcomers per day
https://commons.wikimedia.org/wiki/File:Flickr_-_Official_U.S._Navy_Imagery_-_Sailor%27s_daug
hter_operates_a_fire_hose_with_crew_member_assistance..jpg
Host
Bot
good-faith
vandals
6,000 newcomers per day
Work allocation
● Identify, prioritize and assign tasks
Regulation of behavior
● Form, propagate and enforce norms
Quality control
● Identify and remove damage
Community management
● Socialize & train newcomers; mediate disputes
Reflection (Adaptation)
● Where are we going? Where do we want to go? How do we want to get there?
Reflection (Adaptation)
Where are we going?
Where do we want to go?
How do we get there?
?
Reflection
Adaptation
https://commons.wikimedia.org/wiki/File:Morphine_to_Apomorphine.png CC-SY-SA 4.0
https://commons.wikimedia.org/wiki/File:Morphine_to_Apomorphine.png CC-SY-SA 4.0
https://commons.wikimedia.org/wiki/File:Gear_reducer.gif CC-BY-SA 3.0
Norms
User:ClueBot NG
Available
Human
Attention
Part 2. Critique of
algorithmic quality
control
● :m:Research:The_Rise_and_Decline
● Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2012). The rise and decline of an open collaboration
system: How Wikipedia’s reaction to popularity is causing its decline. American Behavioral Scientist,
0002764212469365.
https://commons.wikimedia.org/wiki/File:Flickr_-_Official_U.S._Navy_Imagery_-_Sailor%27s_daug
hter_operates_a_fire_hose_with_crew_member_assistance..jpg
User:ClueBot NG
The machine classifier
https://commons.wikimedia.org/wiki/File:Artificial.intelligence.jpg
Public Domain
The machine classifier
is_anon
chrs_added
chrs_removed
cust_comment
repeated_chrs
longest_token
badwords_added
The machine classifier
is_anon
chrs_added
chrs_removed
cust_comment
repeated_chrs
longest_token
badwords_added
?
BAD!
The machine classifier
is_anon
chrs_added
chrs_removed
cust_comment
repeated_chrs
longest_token
badwords_added
?
Good.
160k
Edits
Per
Day
10%
90%
Need review
Probably OK
Counter vandalism
Without machine prediction: Reviewing 160k edits per day...
267 Hours
(33 people * 8 hours)
With machine prediction: Reviewing 16k edits per day…
27 Hours
(4 people * 8 hours)
User:ClueBot NG
What happened?
Standpoints & objectivity
Standpoints & objectivity
Donna Haraway
Via commons CC-By_SA 3.0
Standpoints & objectivity
Donna Haraway
Via commons CC-By_SA 3.0
Male scientists
Female* scientists
Standpoints & objectivity
Donna Haraway
Via commons CC-By_SA 3.0
Male scientists
Female* scientists
Standpoints & objectivity
Donna Haraway
Via commons CC-By_SA 3.0
Male scientists
Female* scientists
Standpoints & objectivity
Donna Haraway
Via commons CC-By_SA 3.0
Male scientists
Female* scientists
Standpoints & objectivity
Donna Haraway
Via commons CC-By_SA 3.0
Male scientists
Female* scientists
● reproductive
competition
● dominance
● communication
● social grooming
Standpoints & objectivity
Your standpoint gives you a view of what’s
important & valuable. (comm vs. dominance)
Standpoints & objectivity
Your standpoint gives you a view of what’s
important & valuable. (comm vs. dominance)
An objectivity is what you construct to reify
this value. (grooming patterns vs. sex partners)
Standpoints & objectivity
Your standpoint gives you a view of what’s
important & valuable. (comm vs. dominance)
An objectivity is what you construct to reify
this value. (grooming patterns vs. sex partners)
Standpoints and objectivities can be merged
and extended.
Standpoints & objectivity
Donna Haraway
Via commons CC-By_SA 3.0
Male scientists
Female* scientists
● reproductive
competition
● dominance
● communication
● social grooming
Expanded standpoints
Multiple objectivities
Expanded understanding
● Wikipedia is a firehose
● Bad edits must be reverted
● Minimize effort wasted on
quality control work
Standpoint
● Wikipedia is a firehose
● Bad edits must be reverted
● Minimize effort wasted on
quality control work ClueBot NG
Standpoint Objectivity
● Wikipedia is a firehose
● Bad edits must be reverted
● Minimize effort wasted on
quality control work
MASSIVE SUCCESS!
Standpoint Objectivity
ClueBot NG
User:ClueBot NG
THE
INTERNET
RecentChanges
~90% reduction
in workload
267
hours
Reviewing all of the edits
That’s 33 people working
8 hours each.
27
hours
Reviewing edits WITH
MACHINE LEARNING.
That’s 4 people working
7 hours each.
Woah woah woah! All the
newcomers are leaving!
Suh, B., et al. (2009). The singularity is not
near: slowing growth of Wikipedia. WikiSym.
ACM.
Woah woah woah! All the
newcomers are leaving!
Suh, B., et al. (2009). The singularity is not
near: slowing growth of Wikipedia. WikiSym.
ACM.
We forgot to design for
socializing newcomers!
Halfaker, A. et al. (2012) The Rise and
Decline of an Open Collaboration
System. American Behavioral
Scientist.
8%
160k
2% Vandals
Doing fine90%
Newcomers who need help
Work allocation
Regulation of behavior
Quality control
Community management
Reflection (Adaptation)
Work allocation
Regulation of behavior
Quality control
Community management
Reflection (Adaptation)
Hey! Maybe we should also value newcomers
having a good experience -- and design for that.
● Wikipedia is a firehose
● Bad edits must be reverted
● Minimize effort wasted on
quality control work
● Wikipedia is a firehose ✓
● Bad edits must be reverted
● Minimize effort wasted on
quality control work
● Wikipedia is a firehose ✓
● Bad edits must be reverted ✓
● Minimize effort wasted on
quality control work
● Wikipedia is a firehose ✓
● Bad edits must be reverted ✓
● Minimize effort wasted on
quality control work ✓
● Wikipedia is a firehose ✓
● Bad edits must be reverted ✓
● Minimize effort wasted on
quality control work ✓
● Socialize & train the newcomers!
● More newcomers -- major Wikimedia
Foundation goal
● New spaces developed for supporting
newcomers
2012 - 2015 : The conversation incorporates
new standpoint
The Coop The Teahouse
Win?
User:ClueBot NG
THE
INTERNET
RecentChanges
~90% reduction
in workload
First, the sugar
Huggle is an amazing piece of software. I have no doubt it
represents the state of the art in distributed quality control. This
software and its users are responsible for critical work. It’s
developers and users are wonderful people. We owe them a lot
for their thankless work, so let me take this opportunity to say:
Thank You
Now for the medicine
Huggle circa 2009
Good
Bad
Huggle circa 2009
Good
Bad
Probably
bad
Probably
less bad
Huggle circa 2009
Huggle in 2015
Huggle in 2015Good Bad
Good Bad
Probably
bad
Probably
less bad
Huggle in 2015
Good Bad
Probably
bad
Probably
less bad
Welcome!
● Quality control in Wikipedia still not designed
with newcomer socialization in mind
● Quality control in Wikipedia still not designed
with newcomer socialization in mind
● Newcomers (especially those who don’t
conform) remain marginalized
● Quality control in Wikipedia still not designed
with newcomer socialization in mind
● Newcomers (especially those who don’t
conform) remain marginalized
● Still not seeing gains in the retention of
good-faith newcomers
Why?
Why?
Why did we see changes to some of the sub-systems in
Wikipedia, but not others?
Part 3: Infrastructure for
socio-technical change
The machine classifier
https://commons.wikimedia.org/wiki/File:Artificial.intelligence.jpg
Public Domain
BAD!
The machine classifier
is_anon
chrs_added
chrs_removed
cust_comment
repeated_chrs
longest_token
badwords_added
?
Good.
User:ClueBot NG
?? ?
User:ClueBot NG
?? ?
Wikipedia is a firehose ✓
Bad edits must be reverted ✓
Minimize effort wasted on
quality control work ✓
Socialize & train the newcomers! ✓
User:ClueBot NG
?? ?
Wikipedia is a firehose ✓
Bad edits must be reverted ✓
Minimize effort wasted on
quality control work ✓
Socialize & train the newcomers! ✓
?
User:ClueBot NG
?? ?
Wikipedia is a firehose ✓
Bad edits must be reverted ✓
Minimize effort wasted on
quality control work ✓
Socialize & train the newcomers! ✓
?
● 20+ research papers on
Wikipedia damage detection
● Machine classification not
part of standard CS degree
● Many volunteer tool devs
don’t have a CS degree
anyway.
Labor intensive, performance
considerations, etc.
User:ClueBot NG
?? ?
Wikipedia is a firehose ✓
Bad edits must be reverted ✓
Minimize effort wasted on
quality control work ✓
Socialize & train the newcomers! ✓
?
● 20+ research papers on
Wikipedia damage detection
● Machine classification not
part of standard CS degree
● Many wiki-tool devs don’t
have a CS degree anyway.
Labor intensive, performance
considerations, etc.
User:ClueBot NG
?? ?
Authored by computer scientists with extensive skills in
machine learning and distributed systems.
149F
72F
Progress ----->
Energy
Better quality
control &
socialization
Building
a machine
learning
models
User:ClueBot NG
Progress ----->
Energy
Progress catalyst
User:ClueBot NG
Better quality
control &
socialization
Progress ----->
Energy
Progress catalyst
User:ClueBot NG
Better quality
control &
socialization
Scoring Platform
User:ClueBot NG
?? ?
Scoring Platform
?
User:ClueBot NG
?
User:ClueBot NG
Host
Bot
Scoring Platform
?
User:ClueBot NG
Host
Bot
Scoring Platform
http://ores.wmflabs.org/scores/enwiki/damaging/638307884
http://ores.wmflabs.org/scores/enwiki/damaging/638307884
“English Wikipedia”
http://ores.wmflabs.org/scores/enwiki/damaging/638307884
Is this edit damaging?
http://ores.wmflabs.org/scores/enwiki/damaging/638307884
http://ores.wmflabs.org/scores/enwiki/damaging/638307884
http://ores.wmflabs.org/scores/enwiki/damaging/638307884
"638307884": {
"prediction": false,
"probability": {
"false": 0.942,
"true": 0.058
}
}
http://ores.wmflabs.org/scores/enwiki/damaging/642215410
http://ores.wmflabs.org/scores/enwiki/damaging/638307884
"638307884": {
"prediction": false,
"probability": {
"false": 0.942,
"true": 0.058
}
}
"638307884": {
"prediction": false,
"probability": {
"false": 0.942,
"true": 0.058
}
}
http://ores.wmflabs.org/scores/enwiki/damaging/642215410
http://ores.wmflabs.org/scores/enwiki/damaging/638307884
"642215410": {
"prediction": true,
"probability": {
"false": 0.080,
"true": 0.920
}
}
● Fast (median delay 0.5 seconds)
● Scale-able & redundant
● Comparable to the state-of-the-art
Progress ----->
Energy
~20,000 lines of code +
1 advanced degree in
Computer Science
Connect to
ORES via
HTTP
Better quality
control
User:ClueBot NG
~20,000 lines of code +
1 advanced degree in
Computer Science
Connect to
ORES via
HTTP
User:ClueBot NG
Progress ----->
Energy
Progress ----->
Energy
~20,000 lines of code +
1 advanced degree in
Computer Science
Connect to
ORES via
HTTP
User:ClueBot NG
Progress ----->
Energy
~20,000 lines of code +
1 advanced degree in
Computer Science
Connect to
ORES via
HTTP
User:ClueBot NG
- 30 volunteer
developed tools
- 3 major Wikimedia
Product initiatives
- Some really cool
data science
projects!
Part 3.5: Feminist
inspiration
Critique → Design
“Subjective Algorithms”
“Subjective Algorithms”"algorithms, often aided
by big data, now make
decisions in subjective
realms where there is no
right decision, and no
anchor with which to
judge outcomes."
Tufekci, Z. (2015). Algorithms in our Midst: Information, Power
and Choice when Software is Everywhere. CSCW (pp.
1918-1918). ACM.
“Subjective algorithms”
https://commons.wikimedia.org/wiki/File:Zeynep_Tufekci_2_(5805394290).jpg
“Subjective Algorithms”
What is good? relevant?
important? desirable? valuable?
"algorithms, often aided
by big data, now make
decisions in subjective
realms where there is no
right decision, and no
anchor with which to
judge outcomes."
Tufekci, Z. (2015). Algorithms in our Midst: Information, Power
and Choice when Software is Everywhere. CSCW (pp.
1918-1918). ACM.
“Subjective algorithms”
https://commons.wikimedia.org/wiki/File:Zeynep_Tufekci_2_(5805394290).jpg
Who is allowed to participate? Who gets labeled “bad-faith”?
What types of contributions will be labeled “damaging”?
BAD!
is_anon
chrs_added
chrs_removed
cust_comment
repeated_chrs
longest_token
badwords_added
?
Good.
BAD!
and some important
good stuff
is_anon
chrs_added
chrs_removed
cust_comment
repeated_chrs
longest_token
badwords_added
?
Good.
and some important bad
stuff
Please exercise extreme caution to avoid
encoding racism or other biases into an AI
scheme. [...] Wnt (talk) 12:58, 20 February
2015 (UTC)
From Wikipedia:Wikipedia_Signpost/2015-02-18/Special_report
Please exercise extreme caution to avoid
encoding racism or other biases into an AI
scheme. [...] Wnt (talk) 12:58, 20 February
2015 (UTC)
From Wikipedia:Wikipedia_Signpost/2015-02-18/Special_report
?
Civil
Harassment
Please exercise extreme caution to avoid
encoding racism or other biases into an AI
scheme. [...] Wnt (talk) 12:58, 20 February
2015 (UTC)
From Wikipedia:Wikipedia_Signpost/2015-02-18/Special_report
?
and messages from South Africans
Two stories
Damaging edit
is_anon
chrs_added
chrs_removed
cust_comment
repeated_chrs
longest_token
badwords_added
?
Good edit
Two stories
Damaging edit
● The Italian word “ha”
● Anonymous editors
is_anon
chrs_added
chrs_removed
cust_comment
repeated_chrs
longest_token
badwords_added
?
Good edit
The Italian “ha”
Literally: Not a laughing matter
:it:Progetto:Patrolling/ORES
:m:Talk:ORES#Checklist for itwiki setup.
Damaging edit
...
informals_added
badwords_added
...
?
Good edit
Damaging edit
...
informals_added
badwords_added
...
?
Good edit
Badwords: Curse words, racial slurs and other offensive terminology
Informals: Casual speak that would be welcome on a talk page, but not within an
article.
Damaging edit
...
informals_added
badwords_added
...
?
Good edit
Badwords: Curse words, racial slurs and other offensive terminology
Informals: Casual speak that would be welcome on a talk page, but not within an
article.
For example "hello" or "hahaha".
Badwords: Curse words, racial slurs and other offensive terminology
Informals: Casual speak that would be welcome on a talk page, but not within an
article.
Damaging edit
...
informals_added
badwords_added
...
?
Good edit
For example "hello" or "hahaha".
revscoring/languages/english.py#L163 revscoring/languages/tests/test_english.py#L103
revscoring/languages/english.py#L163 revscoring/languages/tests/test_english.py#L103
“Ha” is laughing in English, but “Ha” is not laughing in Italian!
… <snip> ...
Anonymous editors
Anonymous editors
Otherwise, anons seemed to dominate false-positive reports from
every wiki
… maybe anons are really
bad.
… maybe anons are really
bad.
● Generally, anon edits are twice as likely to be
vandalism
… maybe anons are really
bad.
● Generally, anon edits are twice as likely to be
vandalism
● 90% of anonymous edits are good
https://ores.wmflabs.org/v2/scores/enwiki/damaging/642345235?feature.revision.user.is_anon=false
https://ores.wmflabs.org/v2/scores/enwiki/damaging/642345235?feature.revision.user.is_anon=false
{"prediction": false,
"probability": {"false": 0.656,
" true": 0.344}}
https://ores.wmflabs.org/v2/scores/enwiki/damaging/642345235?feature.revision.user.is_anon=false
{"prediction": false,
"probability": {"false": 0.656,
" true": 0.344}}
https://ores.wmflabs.org/v2/scores/enwiki/damaging/642345235?feature.revision.user.is_anon=true
{"prediction": false,
"probability": {"false": 0.541,
" true": 0.459}}
https://ores.wmflabs.org/v2/scores/enwiki/damaging/642345235?feature.revision.user.is_anon=false
{"prediction": false,
"probability": {"false": 0.656,
" true": 0.344}}
https://ores.wmflabs.org/v2/scores/enwiki/damaging/642345235?feature.revision.user.is_anon=true
{"prediction": false,
"probability": {"false": 0.541,
" true": 0.459}}
Just by being “anon”, we score this edit 11.5%
more likely to be damaging to the article.
Anon editor
{"feature.revision.user.is_anon": true,
"feature...seconds_since_registration": 0,
"feature.revision.user.has_advanced_rights": false,
"feature.revision.user.is_admin": false,
"feature.revision.user.is_bot": false,
"feature.revision.user.is_curator": false}
New editor (2h since registration)
{"feature.revision.user.is_anon": false,
"feature...seconds_since_registration": 18000,
"feature.revision.user.has_advanced_rights": false,
"feature.revision.user.is_admin": false,
"feature.revision.user.is_bot": false,
"feature.revision.user.is_curator": false}
User:EpochFail (8 years since registration)
{"feature.revision.user.is_anon": false,
"feature...seconds_since_registration": 257995021,
"feature.revision.user.has_advanced_rights": false,
"feature.revision.user.is_admin": false,
"feature.revision.user.is_bot": false,
"feature.revision.user.is_curator": false}
User classesModeling strategies
https://en.wikipedia.org/wiki/Gradient_boosting
https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM
Gradient Boosting Linear SVMNatural
user
class
Gradient Boosting Linear SVMNatural
user
class
Gradient Boosting Linear SVMNatural
user
class
???
Gradient Boosting Linear SVMNatural
user
class
OK. What if every edit was anon?
Gradient Boosting Linear SVMAnon
user
class
Gradient Boosting Linear SVMAnon
user
class
Gradient Boosting Linear SVMAnon
user
class
What if every edit were from a
newcomer?
Gradient Boosting Linear SVMNewbie
user
class
Gradient Boosting Linear SVMNewbie
user
class
Gradient Boosting Linear SVMNewbie
user
class
What if I (EpochFail) saved all the
edits?
Gradient Boosting Linear SVMEpochFail
user
class
Gradient Boosting Linear SVMEpochFail
user
class
Gradient Boosting Linear SVMEpochFail
user
class
Gradient Boosting Linear SVMEpochFail
user
class
Mwahahahaha!
Dec. 2015:
- Gradient boosting deployed
Dec. 2015:
- Gradient boosting deployed
Still needs work:
- Bias against anons/newcomers lessened,
but not gone
- New sources of signal
- HashingVectorization
- Probabilistic Context-free Grammars
1. Wikipedia as a socio-technical system
○ Systems-thinking & Biological metaphors ~
1. Wikipedia as a socio-technical system
○ Systems-thinking & Biological metaphors
2. Critique of algorithmic quality control
○ Standpoint epistemology
○ Encoding of ideology in technology
~
1. Wikipedia as a socio-technical system
○ Systems-thinking & Biological metaphors
2. Critique of algorithmic quality control
○ Standpoint epistemology
○ Encoding of ideology in technology
3. Infrastructure for socio-technical change
○ “progress catalyst”
○ Hearing to speech vs. Speaking to be heard
○ The dangers of “subjective algorithms”
~
Thanks!
SCIENCE
SCIENCE
SCIENCE
SCIENCE
SCIENCE
SCIENCE SCIENCE
Props to my collaborators
● User:Ladsgroup
● User:Staeiou
● User:He7d3r
● User:とある白い猫
● User:Awight
● User:YuviPanda
● User:Danilo.mac
● User:Jtmorgan
● User:Aetilley
● User:Riedl
Aaron Halfaker
ahalfaker@wikimedia.org
enwp.org/User:EpochFail
https://twitter.com/halfak
● Communities as living, adaptive systems
● Algorithms in social spaces
● Empowerment as technological intervention
● Machine learning support for collaborative
work
Quora ML Workshop: Engineering at the Intersection of Productive Efficiency, Ideology, and Ethical AI in Wikipedia

More Related Content

Similar to Quora ML Workshop: Engineering at the Intersection of Productive Efficiency, Ideology, and Ethical AI in Wikipedia

Wikimedia Foundation presentation to MS Technology Management students at Uni...
Wikimedia Foundation presentation to MS Technology Management students at Uni...Wikimedia Foundation presentation to MS Technology Management students at Uni...
Wikimedia Foundation presentation to MS Technology Management students at Uni...webbyj
 
Strategicplan2011
Strategicplan2011Strategicplan2011
Strategicplan2011Kim Ross
 
Intervention Strategies for Increasing Engagement in Crowdsourcing
Intervention Strategies for Increasing Engagement in CrowdsourcingIntervention Strategies for Increasing Engagement in Crowdsourcing
Intervention Strategies for Increasing Engagement in CrowdsourcingSmart-Society-Project
 
Open access for researchers and research managers
Open access  for researchers and research managersOpen access  for researchers and research managers
Open access for researchers and research managersIryna Kuchma
 
Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Don Pellegrino
 
Repurposing Wikipedia: Wikipedia as data set and analytical device
Repurposing Wikipedia: Wikipedia as data set and analytical deviceRepurposing Wikipedia: Wikipedia as data set and analytical device
Repurposing Wikipedia: Wikipedia as data set and analytical deviceDigital Methods Initiative
 
Descending Mount Everest. Steps towards applied Wikipedia research
Descending Mount Everest. Steps towards applied Wikipedia researchDescending Mount Everest. Steps towards applied Wikipedia research
Descending Mount Everest. Steps towards applied Wikipedia researchDario Taraborelli
 
Wikis For Nonprofits
Wikis For NonprofitsWikis For Nonprofits
Wikis For NonprofitsJulie Spriggs
 
Crawling and Scraping tutorial at the Digital Methods Summer School 2013
Crawling and Scraping tutorial at the Digital Methods Summer School 2013Crawling and Scraping tutorial at the Digital Methods Summer School 2013
Crawling and Scraping tutorial at the Digital Methods Summer School 2013Digital Methods Initiative
 
Noshir Contractor - WebSci'09 - Society On-Line
Noshir Contractor - WebSci'09 - Society On-LineNoshir Contractor - WebSci'09 - Society On-Line
Noshir Contractor - WebSci'09 - Society On-Linebutest
 
Crowdsourcing and the NIST Digital Archives: Using the 'crowd' to describe NI...
Crowdsourcing and the NIST Digital Archives: Using the 'crowd' to describe NI...Crowdsourcing and the NIST Digital Archives: Using the 'crowd' to describe NI...
Crowdsourcing and the NIST Digital Archives: Using the 'crowd' to describe NI...Andrea Medina-Smith
 
The MGI and AI
The MGI and AIThe MGI and AI
The MGI and AIaimsnist
 
Calit2: An Experiment in Social Networks
Calit2: An Experiment in Social NetworksCalit2: An Experiment in Social Networks
Calit2: An Experiment in Social NetworksLarry Smarr
 
NASA CoECI Presentaion on Crowdsourcing and Challenges
NASA CoECI Presentaion on Crowdsourcing and ChallengesNASA CoECI Presentaion on Crowdsourcing and Challenges
NASA CoECI Presentaion on Crowdsourcing and ChallengesSteve Rader
 
Michael Edson, Relevance, Existence, and Smithsonian Strategy, for OCLC "Web ...
Michael Edson, Relevance, Existence, and Smithsonian Strategy, for OCLC "Web ...Michael Edson, Relevance, Existence, and Smithsonian Strategy, for OCLC "Web ...
Michael Edson, Relevance, Existence, and Smithsonian Strategy, for OCLC "Web ...Michael Edson
 
digital scholarship: how open publication and co-creation could transform sci...
digital scholarship: how open publication and co-creation could transform sci...digital scholarship: how open publication and co-creation could transform sci...
digital scholarship: how open publication and co-creation could transform sci...@cristobalcobo
 
Can machines understand the scientific literature?
Can machines understand the scientific literature?Can machines understand the scientific literature?
Can machines understand the scientific literature?petermurrayrust
 

Similar to Quora ML Workshop: Engineering at the Intersection of Productive Efficiency, Ideology, and Ethical AI in Wikipedia (20)

Wikimedia Foundation presentation to MS Technology Management students at Uni...
Wikimedia Foundation presentation to MS Technology Management students at Uni...Wikimedia Foundation presentation to MS Technology Management students at Uni...
Wikimedia Foundation presentation to MS Technology Management students at Uni...
 
Wikipedia l
Wikipedia lWikipedia l
Wikipedia l
 
Strategicplan2011
Strategicplan2011Strategicplan2011
Strategicplan2011
 
Intervention Strategies for Increasing Engagement in Crowdsourcing
Intervention Strategies for Increasing Engagement in CrowdsourcingIntervention Strategies for Increasing Engagement in Crowdsourcing
Intervention Strategies for Increasing Engagement in Crowdsourcing
 
Open access for researchers and research managers
Open access  for researchers and research managersOpen access  for researchers and research managers
Open access for researchers and research managers
 
Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...Interactive Visualization Systems and Data Integration Methods for Supporting...
Interactive Visualization Systems and Data Integration Methods for Supporting...
 
Repurposing Wikipedia: Wikipedia as data set and analytical device
Repurposing Wikipedia: Wikipedia as data set and analytical deviceRepurposing Wikipedia: Wikipedia as data set and analytical device
Repurposing Wikipedia: Wikipedia as data set and analytical device
 
Descending Mount Everest. Steps towards applied Wikipedia research
Descending Mount Everest. Steps towards applied Wikipedia researchDescending Mount Everest. Steps towards applied Wikipedia research
Descending Mount Everest. Steps towards applied Wikipedia research
 
Wikis For Nonprofits
Wikis For NonprofitsWikis For Nonprofits
Wikis For Nonprofits
 
Crawling and Scraping tutorial at the Digital Methods Summer School 2013
Crawling and Scraping tutorial at the Digital Methods Summer School 2013Crawling and Scraping tutorial at the Digital Methods Summer School 2013
Crawling and Scraping tutorial at the Digital Methods Summer School 2013
 
Noshir Contractor - WebSci'09 - Society On-Line
Noshir Contractor - WebSci'09 - Society On-LineNoshir Contractor - WebSci'09 - Society On-Line
Noshir Contractor - WebSci'09 - Society On-Line
 
British Library Datasets Programme Feb 2011
British Library Datasets Programme Feb 2011British Library Datasets Programme Feb 2011
British Library Datasets Programme Feb 2011
 
Crowdsourcing and the NIST Digital Archives: Using the 'crowd' to describe NI...
Crowdsourcing and the NIST Digital Archives: Using the 'crowd' to describe NI...Crowdsourcing and the NIST Digital Archives: Using the 'crowd' to describe NI...
Crowdsourcing and the NIST Digital Archives: Using the 'crowd' to describe NI...
 
The MGI and AI
The MGI and AIThe MGI and AI
The MGI and AI
 
Calit2: An Experiment in Social Networks
Calit2: An Experiment in Social NetworksCalit2: An Experiment in Social Networks
Calit2: An Experiment in Social Networks
 
NASA CoECI Presentaion on Crowdsourcing and Challenges
NASA CoECI Presentaion on Crowdsourcing and ChallengesNASA CoECI Presentaion on Crowdsourcing and Challenges
NASA CoECI Presentaion on Crowdsourcing and Challenges
 
Michael Edson, Relevance, Existence, and Smithsonian Strategy, for OCLC "Web ...
Michael Edson, Relevance, Existence, and Smithsonian Strategy, for OCLC "Web ...Michael Edson, Relevance, Existence, and Smithsonian Strategy, for OCLC "Web ...
Michael Edson, Relevance, Existence, and Smithsonian Strategy, for OCLC "Web ...
 
digital scholarship: how open publication and co-creation could transform sci...
digital scholarship: how open publication and co-creation could transform sci...digital scholarship: how open publication and co-creation could transform sci...
digital scholarship: how open publication and co-creation could transform sci...
 
Chicago School of Data Book
Chicago School of Data BookChicago School of Data Book
Chicago School of Data Book
 
Can machines understand the scientific literature?
Can machines understand the scientific literature?Can machines understand the scientific literature?
Can machines understand the scientific literature?
 

Recently uploaded

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dashnarutouzumaki53779
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Visualising and forecasting stocks using Dash
Visualising and forecasting stocks using DashVisualising and forecasting stocks using Dash
Visualising and forecasting stocks using Dash
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Quora ML Workshop: Engineering at the Intersection of Productive Efficiency, Ideology, and Ethical AI in Wikipedia