Carlos Castillo, Mohammed El-Haddad, Jürgen Pfeffer and Matt Stempeck: Characterizing the Life Cycle of Online News Stories Using Social Media Reactions. In CSCW. Baltimore, USA. February 2014.
1. Characterizing the Life Cycle of Online News Stories Using Social Media Reactions
Carlos Castillo, Mohammed El-Haddad, Matt Stempeck, Jürgen Pfeffer
Twitter: @ChaToX
2. Outline
Carlos Castillo – @chatox
http://www.chato.cl/research/
• Determining classes of news articles
• Predicting traffic using social media
3. Usage analysis in online news
• Aikat (1998)
– Short dwell times; more traffic on weekdays, less on weekends; bursty traffic.
• Crane and Sornette (2008), Yang and Leskovec (2011), Lehmann et al. (2012)
– Behavioral classes of attention online
4. Analysis of social media responses
• SocialFlow whitepaper (Lotan, Gaffney, and Meyer 2011)
– Al Jazeera, BBC News, CNN, The Economist, Fox News and The New York Times
• Hu et al. (2011)
– Tweets during speech of US president
5. Predictive Web Analytics (references)
6. Data collection
• Three weeks in October 2012
• “Beacon” embedded in Al Jazeera pages
– Real-time data processing
– Apache S4 application for online processing
– Cassandra (NoSQL database) for storage
≈ 3M visits
≈ 200K social media reactions
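The aggregation step behind this kind of collection can be sketched in a few lines. This is not the S4 application from the talk: it is a minimal, in-memory illustration that assumes each beacon hit has already been reduced to a hypothetical `(article_id, seconds_since_publication)` pair, and that visits are counted in fixed 5-minute intervals (an assumed bucket size).

```python
from collections import Counter
from typing import Iterable, Tuple


def bucket_visits(
    events: Iterable[Tuple[str, float]],
    interval_seconds: int = 300,
) -> Counter:
    """Count visits per (article_id, interval index).

    `events` yields hypothetical (article_id, seconds_since_publication)
    pairs; the interval length is an assumption, not taken from the slides.
    """
    counts: Counter = Counter()
    for article_id, elapsed in events:
        # Interval 0 covers [0, interval_seconds), interval 1 the next one, etc.
        counts[(article_id, int(elapsed // interval_seconds))] += 1
    return counts


# Illustrative events only; these are not data from the study.
sample_events = [("a", 10), ("a", 290), ("a", 300), ("b", 20)]
per_interval = bucket_visits(sample_events)
```

In a streaming setting the same counter would be updated one event at a time and periodically flushed to storage.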
8. News vs In-Depth
News examples:
• US state of Maryland abolishes death penalty (May 2nd, 2013)
• Hundreds arrested in China over 'fake' meat (May 3rd, 2013)
In-Depth examples:
• Spirits of Japan shrine haunt Asian relations (May 2nd, 2013)
• Interactive: Powering the Gulf (May 2nd, 2013)
9. News (322) and In-Depth (139)
Tag clouds extracted from titles of articles
16. Examples
Decreasing (78%):
● Almost all breaking news
● Sometimes delayed due to timezone differences, e.g. Hurricane Sandy
Steady or Increasing (12%):
● Ongoing news: Obama/Romney, Worker strikes in SA, Syrian unrest
● Articles updated with supporting content
Rebounding (10%):
● Articles picked up by external sources or social media (typically single source of traffic)
● Background articles to new developments
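To make the three classes concrete, here is a minimal sketch of how a per-interval visit series could be labeled. The slides do not state the criteria used to assign classes, so the rules below (comparing the two halves of the series, and looking for a later peak of at least `rebound_ratio` times the initial peak) are hypothetical heuristics chosen for illustration only.

```python
from typing import Sequence


def classify_traffic(visits: Sequence[float], rebound_ratio: float = 0.5) -> str:
    """Label a visit series as one of the three classes named on the slide.

    Hypothetical heuristic, not the authors' procedure:
    - "steady or increasing": the second half has at least as many visits
      as the first half;
    - "rebounding": after the post-peak minimum, traffic climbs back to at
      least `rebound_ratio` times the peak;
    - "decreasing": everything else.
    """
    if len(visits) < 4:
        raise ValueError("need at least 4 intervals")
    half = len(visits) // 2
    if sum(visits[half:]) >= sum(visits[:half]):
        return "steady or increasing"
    peak_idx = max(range(len(visits)), key=lambda i: visits[i])
    tail = visits[peak_idx + 1:]
    if tail:
        trough = min(range(len(tail)), key=lambda i: tail[i])
        after_trough = tail[trough + 1:]
        if (
            after_trough
            and max(after_trough) > tail[trough]
            and max(after_trough) >= rebound_ratio * visits[peak_idx]
        ):
            return "rebounding"
    return "decreasing"


# Illustrative series only; these are not data from the study.
breaking = classify_traffic([100, 60, 30, 15, 8, 4])
ongoing = classify_traffic([10, 12, 11, 13, 12, 14])
picked_up = classify_traffic([100, 50, 10, 5, 60, 30])
```

The thresholds are free parameters; any real use would need them tuned against labeled examples.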
17. Prediction of visits
• Short-term traffic is to a large extent correlated with long-term traffic
• Social media signals are correlated with traffic and shelf-life
– More reactions → more traffic
– More discussion → longer shelf-life
• Can we predict visits after 7 days from the first 30 minutes?
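One common way to exploit the correlation between short-term and long-term traffic is a linear fit on log-transformed counts. The sketch below assumes that form; it is not claimed to be the model used in the talk, and the function names and the sample numbers are made up for illustration. Social media signals could be added as extra features, which would require a multivariate fit rather than this single-variable one.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(
    early: Sequence[float], final: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of log(final) = intercept + slope * log(early).

    `early` holds visits in the first observation window (e.g. 30 minutes),
    `final` the visits after the long horizon (e.g. 7 days). All counts
    must be positive.
    """
    xs = [math.log(e) for e in early]
    ys = [math.log(f) for f in final]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_final(early_visits: float, intercept: float, slope: float) -> float:
    """Predict long-horizon visits for a new article from its early visits."""
    return math.exp(intercept + slope * math.log(early_visits))


# Synthetic training pairs (final is exactly 20x early), for illustration only.
intercept, slope = fit_log_linear([10, 50, 200, 1000], [200, 1000, 4000, 20000])
estimate = predict_final(100, intercept, slope)
```

Working in log space keeps a handful of very popular articles from dominating the fit.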
18. Predicting traffic and shelf-life online has a long history
• Predicting long-term behavior and half-life from short-term observations
– Observations = comments, visits, votes, …
– Behavior = total comments, total visits, …
– 10+ papers specifically on web traffic
• Bit.ly (2011, 2012)
– Studies half-life per topic and platform
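Half-life and shelf-life can both be read as "time until a given fraction of all visits has arrived". The slides do not give an exact definition, so the sketch below takes the fraction as a parameter: 0.5 corresponds to a half-life, and a larger value is one possible (assumed) way to define shelf-life. The function name is hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def time_to_fraction(visits: Sequence[float], fraction: float = 0.5) -> int:
    """Return how many intervals pass before `fraction` of all visits is reached.

    `visits` is a per-interval count series starting at publication time.
    The result is measured in intervals (1 means the first interval sufficed).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(visits)
    if total <= 0:
        raise ValueError("series has no visits")
    target = fraction * total
    for index, running_total in enumerate(accumulate(visits)):
        if running_total >= target:
            return index + 1
    return len(visits)  # only reachable through floating-point rounding


# Illustrative series only; these are not data from the study.
half_life = time_to_fraction([40, 30, 20, 10], fraction=0.5)
shelf_life = time_to_fraction([40, 30, 20, 10], fraction=0.8)
```

Multiplying the result by the interval length converts it to clock time.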
24. http://fast.qcri.org/
25. What did we learn?
• Three classes of traffic: Decreasing, Steady or Increasing, Rebounding
– Roughly 80:10:10 ratio
• News vs In-Depth: different behavior
• Social media signals are useful to understand and predict visits
26. Invitation:
ECML/PKDD Discovery Challenge 2014
• Open competition on predictive Web Analytics
• Data provided by Chartbeat Inc.