Disinformation on the Web: Impact, Characteristics and Detection of Wikipedia Hoaxes
Srijan Kumar Univ. of Maryland
Robert West Stanford Univ.
Jure Leskovec Stanford Univ.
Originally presented at the 25th International World Wide Web Conference,
Montreal, Canada, April 2016
Web: Source of information
62% of adults in the U.S.A. rely on social media for news
28% of 18-24 year olds use social media as their primary news source
Web: Source of false information
Types of false information
Misinformation: honest mistake
Disinformation: deliberate lie to mislead
Hoax: “deliberately fabricated falsehood made to masquerade as truth”
Why Wikipedia?
Wikipedia: The free encyclopedia that anyone can edit
Easy to add (false) information
• Freely accessible
• Large reach
• Major source of information for many
Hoaxes on Wikipedia
Data: Wikipedia Hoaxes
Hoax article vs hoax facts
21,218 hoax articles
Hoax lifecycle:
Wikipedia hoaxes
Impact of hoaxes: Can we quantify their impact?
Characteristics of hoaxes: What are the hoaxes like?
Detection of hoaxes: Can we find them?
Impact of hoaxes
“The worst hoaxes are those which
(a) last for a long time,
(b) receive significant traffic,
(c) are relied upon by credible news media.”
Jimmy Wales on Quora
Impact of hoaxes
“The worst hoaxes are those which
(a) last for a long time”
[Figure: time t between patrolling and flagging; plot annotations 0.99 and 0.90]
Impact of hoaxes
“The worst hoaxes are those which
(b) receive significant traffic”
[Figure: number n of pageviews per day; axis ticks at 10, 100, 500]
Impact of hoaxes
“The worst hoaxes are those which
(c) are relied upon by credible news media”
1.08 active inlinks per hoax article, on average
7% of hoax articles have at least 5 active inlinks
Wikipedia hoaxes
Impact of hoaxes: Most hoaxes are caught soon, but some hoaxes are impactful
Characteristics of hoaxes: What are the hoaxes like?
Detection of hoaxes: Can we find them?
Hoax
Successful hoax: pass patrol; survive for a month; viewed 100+/day
Failed hoax: flagged and deleted during patrol
Non-hoax
Wrongly flagged: temporarily flagged
Legitimate articles: never flagged
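The four categories above can be read as a small decision rule. The following is a minimal sketch, not the authors' code: the field names (`is_hoax`, `passed_patrol`, `days_survived`, `views_per_day`, `ever_flagged`) are hypothetical, "a month" is assumed to mean 30 days, and the extra label "other hoax" is an assumption for hoaxes that pass patrol without meeting both thresholds.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # All field names are hypothetical; the slides do not define a schema.
    is_hoax: bool
    passed_patrol: bool = False
    days_survived: float = 0.0
    views_per_day: float = 0.0
    ever_flagged: bool = False


def categorize(a: Article) -> str:
    """Map an article to one of the categories named on the slide."""
    if a.is_hoax:
        if not a.passed_patrol:
            return "failed hoax"  # flagged and deleted during patrol
        # Assumption: "a month" = 30 days, and both thresholds must hold.
        if a.days_survived >= 30 and a.views_per_day >= 100:
            return "successful hoax"
        return "other hoax"  # assumption: not covered by the slide
    # Non-hoax articles: temporarily flagged vs never flagged.
    return "wrongly flagged" if a.ever_flagged else "legitimate"
```

For example, `categorize(Article(is_hoax=False, ever_flagged=True))` falls into the "wrongly flagged" group.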
Characteristics of hoaxes
Appearance: how the article looks
Link-network: how the article connects
Support: how other articles refer to it
Editor: how the article creator looks
Characteristics of hoaxes
Surprisingly, hoax articles are longer than non-hoax articles!
Features:
o Plain-text length
Characteristics of hoaxes
Surprisingly, hoax articles are longer than non-hoax articles, but they mostly have plain text and have fewer web and wiki links.
Features:
o Plain-text length
o Plain-text-to-markup ratio
o Wiki-link density
o Web-link density
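The four appearance features can be approximated directly from an article's wikitext. This is a rough sketch under stated assumptions, not the authors' extraction pipeline: plain-text length is counted in words, the ratio is taken as plain-text characters over raw-markup characters, densities are links per 100 words of raw markup, and the regex-based markup stripping only handles simple, non-nested cases. The function names are made up for illustration.

```python
import re

# [[target]] or [[target|label]]
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
# Bare or bracketed external URLs
WEB_LINK = re.compile(r"https?://[^\s\]|<>]+")


def strip_markup(wikitext: str) -> str:
    """Very rough wikitext-to-plain-text conversion (illustrative only)."""
    text = re.sub(r"<ref[^>]*>.*?</ref>|<[^>]+>", " ", wikitext, flags=re.S)
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)  # simple, non-nested templates
    text = WIKI_LINK.sub(lambda m: m.group(2) or m.group(1), text)
    text = re.sub(r"\[https?://\S+\s*([^\]]*)\]", r"\1", text)  # keep link label
    text = re.sub(r"'{2,}|={2,}", "", text)  # bold/italic quotes, heading marks
    return re.sub(r"\s+", " ", text).strip()


def appearance_features(wikitext: str) -> dict:
    plain = strip_markup(wikitext)
    n_words = max(len(wikitext.split()), 1)  # words before markup removal
    return {
        "plain_text_length": len(plain.split()),
        # Assumption: ratio of characters after vs before markup removal.
        "plain_text_to_markup_ratio": len(plain) / max(len(wikitext), 1),
        # Assumption: density = links per 100 words of raw markup.
        "wiki_link_density": 100 * len(WIKI_LINK.findall(wikitext)) / n_words,
        "web_link_density": 100 * len(WEB_LINK.findall(wikitext)) / n_words,
    }
```

Under these definitions, an article that is long but mostly plain text would show a high plain-text length, a ratio close to 1, and low link densities.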
Characteristics of hoaxes
Clustering coefficient = 0: incoherent article
Clustering coefficient > 0: coherent article
Legitimate articles are more coherent than successful hoaxes
Appearance: hoaxes mostly have text and few references.
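One way to compute the clustering coefficient used as the coherence signal is to look at the articles an article links to and ask what fraction of pairs among them are themselves linked. The sketch below assumes a hypothetical `links` dictionary mapping each article title to the set of titles it links to, and treats a link in either direction as a connection; the slides do not specify these details.

```python
from itertools import combinations


def ego_clustering_coefficient(article: str, links: dict) -> float:
    """Fraction of pairs of linked-to articles that link to each other.

    `links` maps a title to the set of titles it links to (hypothetical
    input format). Links are treated as undirected for this purpose.
    """
    neighbors = sorted(links.get(article, set()) - {article})
    if len(neighbors) < 2:
        return 0.0  # no pairs to compare
    pairs = list(combinations(neighbors, 2))
    connected = sum(
        1
        for u, v in pairs
        if v in links.get(u, set()) or u in links.get(v, set())
    )
    return connected / len(pairs)
```

A value of 0 means none of the linked-to articles refer to one another (an incoherent set of links); any value above 0 means at least some of them are related.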
Characteristics of hoaxes
Mentions of hoaxes are fewer in number.
Features:
o Number of prior mentions
Link-network: hoaxes have incoherent wikilinks.
Characteristics of hoaxes
Mentions of hoaxes are fewer in number, are mostly created by the article creator or anonymously, and are more recent.
Features:
o Number of prior mentions
o Creator of first mention
o Time since first mention
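The three support features can be derived from a list of places where the article's title was mentioned in other articles before the article itself was created. This is a minimal sketch with a hypothetical `Mention` record; how mentions are actually found in the edit history is outside its scope, and the day-based time unit is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Mention:
    time: datetime
    editor: Optional[str]  # None stands for an anonymous (IP) edit


def support_features(mentions, created: datetime, creator: str) -> dict:
    """Number of prior mentions, who made the first one, and its age in days."""
    prior = sorted((m for m in mentions if m.time < created), key=lambda m: m.time)
    if not prior:
        return {
            "num_prior_mentions": 0,
            "first_mention_by": None,
            "days_since_first_mention": 0.0,
        }
    first = prior[0]
    if first.editor is None:
        who = "anonymous"
    elif first.editor == creator:
        who = "article creator"
    else:
        who = "other editor"
    return {
        "num_prior_mentions": len(prior),
        "first_mention_by": who,
        "days_since_first_mention": (created - first.time).total_seconds() / 86400,
    }
```

A hoax would typically show few prior mentions, a first mention by the article creator or an anonymous editor, and a short time since that first mention.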
Characteristics of hoaxes
Hoax creators are more recently registered and have less editing experience.
Features:
o Creator’s time since registration
o Creator’s experience
Support: hoaxes have few, recent, suspicious mentions.
Wikipedia Hoaxes
Impact of hoaxes: Most hoaxes are caught soon, but some hoaxes are impactful
Characteristics of hoaxes: Hoaxes are different from non-hoaxes in many respects
Detection of hoaxes: Can we find them?
Detection of hoaxes
Will a hoax get past patrol? AUC = 71% (appearance features)
Is an article a hoax? AUC = 98% (editor and network features)
Is an article flagged as hoax really one? AUC = 86% (editor and support features)
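For readers unfamiliar with AUC: it is the probability that a classifier scores a randomly chosen positive example (for instance, a hoax) higher than a randomly chosen negative one, with ties counted as half. The sketch below computes it by direct pairwise comparison; it illustrates the metric only and says nothing about the classifier used in the talk. The function and argument names are made up.

```python
def auc(scores_pos, scores_neg) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    scores_pos: classifier scores of positive examples (e.g. hoaxes)
    scores_neg: classifier scores of negative examples (e.g. non-hoaxes)
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC of 50% corresponds to random ranking and 100% to a perfect ranking, which puts the 71%, 98% and 86% figures in context.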
We discovered previously unknown hoaxes!
Flagged by us and deleted by Wikipedia administrators
Steve Moertel: American popcorn entrepreneur
Article survived over 6 years 11 months!
Can readers identify hoaxes?
320 random hoax and non-hoax pairs
10 raters on Amazon Mechanical Turk rated each pair
Results
Random: 50%
Human: 66%
Classifier: 86%
Casual readers are easily fooled by hoaxes.
Accurate detection needs non-appearance features.
What fools humans?
Humans get fooled when an article looks more “genuine” and is assumed to be credible.
Comparing easy- vs hard-to-identify hoaxes
How to identify misinformation on the web?
● Appearance
○ How well referenced is the information source?
○ What is the content of the article?
● Editor
○ Who created the information?
● Network
○ How related is this information to the other information it references?
● Support
○ Is there any evidence of the information prior to its creation?
Wikipedia Hoaxes
Impact of hoaxes: Most hoaxes are caught soon, but some hoaxes are impactful
Characteristics of hoaxes: Hoaxes are different from non-hoaxes in many respects
Detection of hoaxes: Non-appearance features are important to detect hoaxes
Thank you!
srijan@cs.umd.edu
http://cs.umd.edu/~srijan

Call Girls Near The Suryaa Hotel New Delhi 9873777170
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
Call Girls South Delhi Delhi reach out to us at ☎ 9711199012
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
『澳洲文凭』买詹姆士库克大学毕业证书成绩单办理澳洲JCU文凭学位证书
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 

Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes

  • 1. Disinformation on the Web: Impact, Characteristics and Detection of Wikipedia Hoaxes. Srijan Kumar (Univ. of Maryland), Robert West (Stanford Univ.), Jure Leskovec (Stanford Univ.). Originally presented at the 25th International World Wide Web Conference, Montreal, Canada, April 2016.
  • 2. Web: Source of information. 62% of adults in the U.S.A. rely on social media for news; 28% of 18-24 year olds use social media as their primary news source.
  • 3. Web: Source of false information.
  • 4. Types of false information. Misinformation: honest mistake. Disinformation: deliberate lie to mislead. Hoax: “deliberately fabricated falsehood made to masquerade as truth” (Wikipedia).
  • 5. Why Wikipedia? The free encyclopedia that anyone can edit. Easy to add (false) information. Freely accessible; large reach; major source of information for many.
  • 7. Data: Wikipedia Hoaxes. Hoax article vs hoax facts.
  • 8. Data: Wikipedia Hoaxes. Hoax article vs hoax facts. 21,218 hoax articles. Hoax lifecycle (figure).
  • 9. Wikipedia hoaxes. Impact of hoaxes: quantify their impact? Characteristics of hoaxes: what are the hoaxes like? Detection of hoaxes: can we find them?
  • 10. Impact of hoaxes. “The worst hoaxes are those which (a) last for a long time, (b) receive significant traffic, (c) are relied upon by credible news media.” (Jimmy Wales on Quora)
  • 11. Impact of hoaxes. “The worst hoaxes are those which (a) last for a long time.” [Plot: time t between patrolling and flagging; annotated values 0.99 and 0.90]
  • 12. Impact of hoaxes. “The worst hoaxes are those which (b) receive significant traffic.” [Plot: number n of pageviews per day; marked values 10, 100, 500]
  • 13. Impact of hoaxes. “The worst hoaxes are those which (c) are relied upon by credible news media.” 1.08 active inlinks per hoax article, on average; 7% of hoax articles have at least 5 active inlinks.
  • 14. Wikipedia hoaxes. Impact of hoaxes: most hoaxes are caught soon, but some hoaxes are impactful. Characteristics of hoaxes: what are the hoaxes like? Detection of hoaxes: can we find them?
  • 15. Hoax: successful hoax (passes patrol, survives for a month, viewed 100+/day); failed hoax (flagged and deleted during patrol). Non-hoax: wrongly flagged (temporarily flagged); legitimate articles (never flagged).
  • 16. Characteristics of hoaxes. Appearance: how the article looks. Link-network: how the article connects. Support: how other articles refer to it. Editor: how the article creator looks.
  • 17. Characteristics of hoaxes. Surprisingly, hoax articles are longer than non-hoax articles! Features: plain-text length. Appearance: how the article looks. Link-network: how the article connects. Support: how other articles refer to it. Editor: how the article creator looks.
  • 18. Characteristics of hoaxes. Surprisingly, hoax articles are longer than non-hoax articles, but they consist mostly of plain text and have fewer web and wiki links. Features: plain-text length; plain-text-to-markup ratio; wiki-link density; web-link density. Appearance: how the article looks. Link-network: how the article connects. Support: how other articles refer to it. Editor: how the article creator looks.
  • 19. Characteristics of hoaxes. Clustering coefficient = 0: incoherent article. Clustering coefficient > 0: coherent article. Legitimate articles are more coherent than successful hoaxes. Appearance: hoaxes mostly have text and few references. Link-network: how the article connects. Support: how other articles refer to it. Editor: how the article creator looks.
  • 20. Characteristics of hoaxes. Hoaxes have fewer mentions. Features: number of prior mentions. Appearance: hoaxes mostly have text and few references. Link-network: hoaxes have incoherent wikilinks. Support: how other articles refer to it. Editor: how the article creator looks.
  • 21. Characteristics of hoaxes. Hoaxes have fewer mentions; these mentions are mostly created by the article creator or anonymously, and are more recent. Features: number of prior mentions; creator of first mention; time since first mention. Appearance: hoaxes mostly have text and few references. Link-network: hoaxes have incoherent wikilinks. Support: how other articles refer to it. Editor: how the article creator looks.
  • 22. Characteristics of hoaxes. Hoax creators are more recently registered and have less editing experience. Features: creator’s time since registration; creator’s experience. Appearance: hoaxes mostly have text and few references. Link-network: hoaxes have incoherent wikilinks. Support: hoaxes have few, recent, suspicious mentions. Editor: how the article creator looks.
  • 23. Wikipedia Hoaxes. Impact of hoaxes: most hoaxes are caught soon, but some hoaxes are impactful. Characteristics of hoaxes: hoaxes are different from non-hoaxes in many respects. Detection of hoaxes: can we find them?
  • 24. Detection of hoaxes. Will a hoax get past patrol? AUC = 71% (appearance features). Is an article a hoax? AUC = 98% (editor and network features). Is an article flagged as a hoax really one? AUC = 86% (editor and support features).
  • 25. We discovered previously unknown hoaxes! Flagged by us and deleted by Wikipedia administrators. Example: Steve Moertel, “American popcorn entrepreneur”; the article survived over 6 years 11 months!
  • 26. Can readers identify hoaxes? 320 random hoax and non-hoax pairs; 10 raters on Amazon Mechanical Turk rated each pair. Results: 50% random, 66% human, 86% classifier. Casual readers are gullible to hoaxes. Accurate detection needs non-appearance features.
  • 27. What fools humans? Comparing easy- vs hard-to-identify hoaxes: humans get fooled when an article looks more “genuine” and is therefore assumed to be credible.
  • 28. How to identify misinformation on the web? Appearance: How well referenced is the information source? What is the content of the article? Editor: Who created the information? Network: How related is this information to the other information it references? Support: Is there any evidence of the information prior to its creation?
  • 29. Wikipedia Hoaxes. Impact of hoaxes: most hoaxes are caught soon, but some hoaxes are impactful. Characteristics of hoaxes: hoaxes are different from non-hoaxes in many respects. Detection of hoaxes: non-appearance features are important to detect hoaxes.
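
The link-network idea on slide 19 (clustering coefficient = 0 for an incoherent article, > 0 for a coherent one) can be illustrated with a minimal sketch. It assumes the coefficient is the fraction of pairs of linked articles that also link to each other, with links treated as undirected; the slides do not spell out the exact definition. The function name, the graph representation, and the toy graph are all hypothetical.

```python
from itertools import combinations


def ego_clustering_coefficient(outlinks, link_graph):
    """Fraction of pairs of linked articles that are connected to each other.

    outlinks:   set of article titles that the article under study links to.
    link_graph: dict mapping a title to the set of titles it links to.
    Links are treated as undirected, which is a simplifying assumption.
    """
    neighbors = sorted(outlinks)
    if len(neighbors) < 2:
        return 0.0  # no pairs to compare
    pairs = list(combinations(neighbors, 2))
    connected = sum(
        1
        for a, b in pairs
        if b in link_graph.get(a, set()) or a in link_graph.get(b, set())
    )
    return connected / len(pairs)


# Hypothetical toy link graph, for illustration only.
toy_graph = {
    "Popcorn": {"Maize", "Snack food"},
    "Maize": {"Snack food"},
    "Snack food": set(),
    "Opera": set(),
    "Volcano": set(),
    "Chess": set(),
}

# An article whose wikilinks point to mutually related pages.
coherent = ego_clustering_coefficient({"Popcorn", "Maize", "Snack food"}, toy_graph)
# An article whose wikilinks point to unrelated pages.
incoherent = ego_clustering_coefficient({"Opera", "Volcano", "Chess"}, toy_graph)
print(coherent, incoherent)
```

In this toy setup the first article's links form a fully connected group, while the second article's links share nothing, which mirrors the coherent vs incoherent contrast on the slide.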

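The detection results on slide 24 are reported as AUC. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen hoax receives a higher classifier score than a randomly chosen non-hoax (ties count as half). The labels and scores are made-up illustrative values, not data from the study, and the function is not the authors' evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (hoax) or 0 (non-hoax).
    scores: classifier scores, higher meaning "more likely a hoax".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores, for illustration only.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auc = auc(example_labels, example_scores)
print(example_auc)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking, which is the scale on which the 71%, 98% and 86% figures should be read.
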
Editor's Notes

  1. The Web is a space for all, where anyone can read, publish and share information. It is rapidly becoming one of the major sources of news and information for everyone. In fact, 62% of adults in the USA rely on social media for news, and more than a quarter of young people between the ages of 18 and 24 rely primarily on social media for news, even more than they rely on television.
  2. And in the third dimension, we look at how much the hoax article has spread across the web. For that, we use Wikipedia's server click logs to look at which links were clicked from across the web, both within and outside Wikipedia, that led to the hoax article. We find that, on average, each hoax article has 1.08 inlinks that were actually clicked, bringing a reader to the article. These links came from search engines, social networks, and from within Wikipedia too.