Disinformation on the Web: impact, characteristics and detection of Wikipedia hoaxes
1. Disinformation on the Web: Impact, Characteristics and Detection of Wikipedia Hoaxes
Srijan Kumar Univ. of Maryland
Robert West Stanford Univ.
Jure Leskovec Stanford Univ.
Originally presented at the 25th International World Wide Web Conference, Montreal, Canada, April 2016
2. Web: Source of information
• 62% of adults in the U.S.A. rely on social media for news
• 28% of 18-to-24-year-olds use social media as their primary news source
4. Types of false information
• Misinformation: an honest mistake
• Disinformation: a deliberate lie to mislead
• Hoax: “deliberately fabricated falsehood made to masquerade as truth” (Wikipedia)
5. Why Wikipedia?
The free encyclopedia that anyone can edit
• Easy to add (false) information
• Freely accessible
• Large reach
• Major source of information for many
10. Impact of hoaxes
“The worst hoaxes are those which
(a) last for a long time,
(b) receive significant traffic,
(c) are relied upon by credible news media.”
Jimmy Wales on Quora
11. Impact of hoaxes
“The worst hoaxes are those which
(a) last for a long time”
[Figure: time t between patrolling and flagging]
12. Impact of hoaxes
“The worst hoaxes are those which
(b) receive significant traffic”
[Figure: number n of pageviews per day]
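Slides 11 and 12 plot distributions of the time t between patrolling and flagging and of the number n of pageviews per day. Assuming these plots read as complementary CDFs (the fraction of hoaxes whose value is at least t or n), a minimal sketch of how such a curve can be computed; the function name and the survival times are hypothetical:

```python
from bisect import bisect_left
from typing import Sequence


def ccdf(values: Sequence[float], thresholds: Sequence[float]) -> list[float]:
    """Return, for each threshold t, the fraction of values that are >= t."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    # bisect_left(xs, t) counts the values strictly below t.
    return [(n - bisect_left(xs, t)) / n for t in thresholds]


# Hypothetical survival times in hours (time between patrolling and flagging).
survival_hours = [0.5, 1, 2, 24, 24 * 30, 24 * 400]
print(ccdf(survival_hours, [1, 24, 24 * 365]))
```

The same function applies unchanged to daily pageview counts, e.g. `ccdf(views_per_day, [10, 100, 500])`.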
13. Impact of hoaxes
“The worst hoaxes are those which
(c) are relied upon by credible news media”
• 1.08 active inlinks per hoax article, on average
• 7% of hoax articles have at least 5 active inlinks
15.
Hoax:
• Successful hoax: passes patrol, survives for a month, viewed 100+ times/day
• Failed hoax: flagged and deleted during patrol
Non-hoax:
• Wrongly flagged: temporarily flagged
• Legitimate articles: never flagged
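The four article categories above can be expressed as a small decision procedure. This is a sketch only: the record fields are hypothetical, and "survive for a month" is assumed to mean at least 30 days.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # All field names are hypothetical; they are not the paper's dataset schema.
    is_hoax: bool            # ground truth: the article is a hoax
    passed_patrol: bool      # survived new-page patrol without being deleted
    ever_flagged: bool       # was tagged as a hoax at some point
    survival_days: float     # days the article stayed live
    mean_daily_views: float  # average pageviews per day while live


def categorize(a: Article) -> str:
    if a.is_hoax:
        if not a.passed_patrol:
            return "failed hoax"      # flagged and deleted during patrol
        if a.survival_days >= 30 and a.mean_daily_views >= 100:
            return "successful hoax"  # passed patrol, lived a month, 100+ views/day
        return "other hoax"           # passed patrol but missed a threshold
    # Non-hoaxes: flagged only temporarily, or never flagged at all.
    return "wrongly flagged" if a.ever_flagged else "legitimate"


print(categorize(Article(True, True, True, 45, 150)))  # successful hoax
```

Hoaxes that pass patrol but fall short of the survival or traffic thresholds fit neither hoax category on the slide, so the sketch reports them separately.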
16. Characteristics of hoaxes
• Appearance: how the article looks
• Link-network: how the article connects
• Support: how other articles refer to it
• Editor: how the article creator looks
18. Characteristics of hoaxes
Surprisingly, hoax articles are longer than non-hoax articles! But they consist mostly of plain text and have fewer web and wiki links.
Features:
o Plain-text length
o Plain-text-to-markup ratio
o Wiki-link density
o Web-link density
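A rough sketch of how the four appearance features might be computed from raw wikitext. Everything here is an assumption for illustration: the regex-based markup stripping is crude, the plain-text-to-markup ratio is approximated as the share of wikitext characters that survive stripping, and link densities are measured per 100 words of plain text.

```python
import re


def strip_markup(wikitext: str) -> str:
    """Very rough wikitext-to-plain-text conversion (illustrative only)."""
    t = re.sub(r"<ref[^>]*/>", "", wikitext)
    t = re.sub(r"<ref[^>]*>.*?</ref>", "", t, flags=re.S)
    prev = None
    while prev != t:  # remove (possibly nested) templates from the inside out
        prev, t = t, re.sub(r"\{\{[^{}]*\}\}", "", t)
    t = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", t)   # [[target|label]] -> label
    t = re.sub(r"\[https?://\S+\s*([^\]]*)\]", r"\1", t)       # [url label] -> label
    t = re.sub(r"<[^>]+>", "", t)                              # leftover HTML tags
    t = re.sub(r"'{2,}", "", t)                                # bold/italic quotes
    t = re.sub(r"^=+\s*(.*?)\s*=+\s*$", r"\1", t, flags=re.M)  # section headings
    return t.strip()


def appearance_features(wikitext: str) -> dict[str, float]:
    plain = strip_markup(wikitext)
    words = max(len(plain.split()), 1)  # avoid division by zero
    wiki_links = len(re.findall(r"\[\[[^\]]+\]\]", wikitext))
    web_links = len(re.findall(r"https?://", wikitext))
    return {
        "plain_text_length": len(plain),
        # Approximation: fraction of the source characters that are plain text.
        "plain_text_ratio": len(plain) / max(len(wikitext), 1),
        "wiki_link_density": 100 * wiki_links / words,  # wikilinks per 100 words
        "web_link_density": 100 * web_links / words,    # web links per 100 words
    }


print(appearance_features("'''Steve''' is a [[popcorn]] maker.<ref>[http://example.com src]</ref>"))
```

A production version would use a real wikitext parser rather than regular expressions; the regexes only keep the sketch self-contained.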
19. Characteristics of hoaxes
• Clustering coefficient = 0: incoherent article
• Clustering coefficient > 0: coherent article
Legitimate articles are more coherent than successful hoaxes.
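One way to make the coherence measure concrete, assuming it is a local clustering coefficient computed over the pages an article links to: the fraction of pairs of linked-to pages that are themselves connected, ignoring link direction. The link map and page names below are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(article: str, links: dict[str, set[str]]) -> float:
    """Fraction of pairs of `article`'s outlinked pages that link to each other."""
    neighbors = links.get(article, set()) - {article}
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pairs to close

    def linked(u: str, v: str) -> bool:
        # Treat the wikilink graph as undirected.
        return v in links.get(u, set()) or u in links.get(v, set())

    closed = sum(1 for u, v in combinations(sorted(neighbors), 2) if linked(u, v))
    return closed / (k * (k - 1) / 2)


links = {
    "Real": {"Popcorn", "Maize"},      # the two linked pages link to each other
    "Popcorn": {"Maize"},
    "Hoax": {"Popcorn", "Quantum physics"},  # linked pages are unrelated
}
print(clustering_coefficient("Real", links), clustering_coefficient("Hoax", links))  # 1.0 0.0
```

NetworkX offers an equivalent computation via `networkx.clustering` on an undirected graph; the pure-Python version just avoids the dependency.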
21. Characteristics of hoaxes
Hoaxes have fewer prior mentions in other articles; these mentions are mostly added by the article's creator or anonymously, and are more recent.
Features:
o Number of prior mentions
o Creator of first mention
o Time since first mention
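A sketch of the three support features, under stated assumptions: a "prior mention" is taken to be a mention of the article's title inserted into another article before the article itself was created, and "time since first mention" is measured at the article's creation time. The `Mention` record and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Mention:
    added_at: datetime        # when the mention was inserted into another article
    added_by: Optional[str]   # username, or None for an anonymous (IP) edit


def support_features(created_at: datetime, creator: str, mentions: list[Mention]) -> dict:
    prior = [m for m in mentions if m.added_at < created_at]
    if not prior:
        return {"num_prior_mentions": 0, "first_mention_by": None,
                "days_since_first_mention": None}
    first = min(prior, key=lambda m: m.added_at)
    if first.added_by is None:
        by = "anonymous"
    elif first.added_by == creator:
        by = "article creator"  # the creator seeded a mention before creating the page
    else:
        by = "other user"
    return {
        "num_prior_mentions": len(prior),
        "first_mention_by": by,
        "days_since_first_mention": (created_at - first.added_at).total_seconds() / 86400,
    }


mentions = [Mention(datetime(2010, 1, 1), "alice"), Mention(datetime(2010, 1, 5), None),
            Mention(datetime(2010, 2, 1), "bob")]  # the last one comes after creation
print(support_features(datetime(2010, 1, 11), "alice", mentions))
```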
22. Characteristics of hoaxes
Hoax creators registered more recently and have less editing experience.
Features:
o Creator’s time since registration
o Creator’s experience
• Appearance: hoaxes mostly have text and few references.
• Link-network: hoaxes have incoherent wikilinks.
• Support: hoaxes have few, recent, suspicious mentions.
• Editor: how the article creator looks
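The two editor features can be sketched as follows, assuming "experience" means the number of edits the creator made before creating the article, and that anonymous creators have no registration time. The function and argument names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def editor_features(article_created: datetime,
                    registered: Optional[datetime],
                    edit_times: list[datetime]) -> dict[str, Optional[float]]:
    # Only edits made before the article existed count toward experience.
    prior_edits = sum(1 for t in edit_times if t < article_created)
    days = (None if registered is None  # anonymous creator: no account
            else (article_created - registered).total_seconds() / 86400)
    return {"days_since_registration": days, "prior_edit_count": prior_edits}


print(editor_features(datetime(2010, 1, 3), datetime(2010, 1, 1),
                      [datetime(2010, 1, 2), datetime(2010, 1, 4)]))
```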
24. Detection of hoaxes
• Will a hoax get past patrol? AUC = 71% (appearance features)
• Is an article a hoax? AUC = 98% (editor and network features)
• Is an article flagged as a hoax really one? AUC = 86% (editor and support features)
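AUC (area under the ROC curve) can be read as the probability that a classifier scores a randomly chosen positive example above a randomly chosen negative one, so 50% is chance level and 100% is perfect. A minimal pure-Python sketch of this rank-based (Mann-Whitney) definition; the function name and scores are hypothetical:

```python
def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the input size; fine for illustration.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for hoaxes (positives) and non-hoaxes (negatives).
print(roc_auc([0.9, 0.8, 0.4], [0.3, 0.5]))  # 5 of 6 pairs ranked correctly
```

Evaluating on randomly drawn hoax/non-hoax pairs and counting how often the hoax receives the higher score estimates the same quantity, which is why a random guesser sits at 50%.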
25. We discovered previously unknown hoaxes!
Flagged by us and deleted by Wikipedia administrators
Example: Steve Moertel, “American popcorn entrepreneur”: the article survived for over 6 years and 11 months!
26. Can readers identify hoaxes?
320 random hoax/non-hoax pairs; each pair was rated by 10 raters on Amazon Mechanical Turk.
Results: random 50%, human 66%, classifier 86%
Casual readers are easily fooled by hoaxes.
Accurate detection needs non-appearance features.
27. What fools humans?
Humans are fooled when an article looks more “genuine” and is therefore assumed to be credible.
Comparing easy- vs. hard-to-identify hoaxes
28. How to identify misinformation on the web?
● Appearance
○ How well referenced is the information source?
○ What is the content of the article?
● Editor
○ Who created the information?
● Network
○ How related is this information to the other information it references?
● Support
○ Is there any evidence of this information prior to its creation?
29. Wikipedia Hoaxes
• Impact of hoaxes: most hoaxes are caught quickly, but some are impactful
• Characteristics of hoaxes: hoaxes differ from non-hoaxes in many respects
• Detection of hoaxes: non-appearance features are important for detecting hoaxes
The Web is a space for everyone, where anyone can read, publish, and share information.
It is rapidly becoming one of the major sources of news and information.
In fact, 62% of adults in the USA rely on social media for news, and more than a quarter of young people aged 18 to 24 rely primarily on social media for news, even more than they rely on television.
In the third dimension, we look at how far the hoax article has spread across the Web.
For this, we use Wikipedia's server click logs to identify which links, both within and outside Wikipedia, were clicked to reach the hoax article.
We find that, on average, each hoax article has 1.08 active inlinks, i.e., links that readers actually clicked to reach the article. These links came from search engines, social networks, and from within Wikipedia itself.
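The active-inlink statistics can be sketched as follows, assuming click logs are available as (referrer, target) pairs and that an active inlink is a distinct referrer with at least one logged click into the hoax. The log format, function names, and page names are hypothetical.

```python
from collections import defaultdict


def active_inlinks(clicks: list[tuple[str, str]], hoaxes: set[str]) -> dict[str, int]:
    """Count distinct referrers with at least one logged click into each hoax."""
    sources: defaultdict[str, set[str]] = defaultdict(set)
    for referrer, target in clicks:
        if target in hoaxes:
            sources[target].add(referrer)
    return {h: len(sources[h]) for h in hoaxes}  # hoaxes with no clicks get 0


def summarize(counts: dict[str, int], k: int = 5) -> tuple[float, float]:
    """Mean active inlinks per hoax, and the share of hoaxes with at least k."""
    n = len(counts)
    mean = sum(counts.values()) / n
    share = sum(1 for c in counts.values() if c >= k) / n
    return mean, share


clicks = [("google.com", "Hoax_A"), ("google.com", "Hoax_A"),
          ("Popcorn", "Hoax_A"), ("facebook.com", "Hoax_B"), ("Popcorn", "Real")]
counts = active_inlinks(clicks, {"Hoax_A", "Hoax_B", "Hoax_C"})
print(counts["Hoax_A"], summarize(counts))  # 2 (1.0, 0.0)
```

Repeated clicks from the same referrer count once, since the measure is about how many distinct places lead readers to the hoax, not how much traffic each sends.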