Presented at IZEAfest in Orlando, FL
Social network and sharing analysis including:
+Document analysis at scale: Meme tracking combined with other variables like sentiment and bias
+Social network at scale: Information cascades and virality, inference of social networks given meme-like information as contagions
+The node level perspective and its effects on what an individual sees and shares: Illusions, effort and overload, topics, personality and demographics
+Personas and segmentation: Grouping based on demographics and interests
1. The Science of Sharing
Jason Baldridge
Co-founder, People Pattern
Associate Professor, The University of Texas at Austin
@jasonbaldridge
2. Preliminary notes
• This talk incorporates results and images from many different research papers by people working primarily in social
network analysis.
• As such, this talk is a synthesis of that work put together into a narrative to introduce key abilities and results. I felt this
high-level view was the best way to discuss “The Science of Sharing”, rather than relying primarily on my own work or work
done at People Pattern. Also, I was really impressed by the work researchers are doing in social network analysis and
wanted to share even a glimpse of the problems they are tackling and what they are finding.
• The high-level progression of this talk is:
• Document analysis at scale: meme tracking combined with other variables like sentiment and bias
• Social network at scale: information cascades and virality, inference of social networks given meme-like information as
contagions.
• The node level perspective and its effects on what an individual sees and shares: Illusions, effort and overload, topics,
personality and demographics.
• Personas and segmentation: grouping based on demographics and interests.
• The last item is work done at People Pattern. I stress that neither I nor People Pattern was involved with the research
papers cited in the other slides. My own academic research focuses on natural language processing, especially machine
learning for learning syntactic parsers and performing geolocation using text. For more on those topics, see:
http://www.jasonbaldridge.com/papers
• In the actual talk, I didn’t cover the slides on email overload (to keep things to 30 minutes), but for this deck, I’ve put them
back in their place.
• References and links to PDFs of all cited work are at the end of this deck. They are also available on this post on my blog:
https://bcomposes.wordpress.com/2015/10/23/references-for-my-izeafest-talk/
Link to livestream of the talk:
https://youtu.be/8_aFymHQZbM?t=6h51m18s
3. Meme tracking
Leskovec et al. (2009). “Meme-tracking and the Dynamics of the News Cycle.”
Automatic detection and tracking of memes over time.
4. Meme tracking
Leskovec et al. (2009). “Meme-tracking and the Dynamics of the News Cycle.”
Meme oscillation heartbeat from blogs to mainstream media.
5. Quoting Patterns in
Political Coverage
Niculae et al. (2015). “QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns.”
Measuring bias is subjective and hard.
Personal estimates of bias are influenced by the availability heuristic.
57% of Americans perceive media as biased.
73% of conservatives think bias is liberal.
11% of liberals think bias is liberal.
Similarly: husbands and wives both estimate their
contributions to family activities differently.
[Lee & Waite (2005): http://www.jstor.org/stable/3600272]
Read this!
6. Quoting Patterns in
Political Coverage
Niculae et al. (2015). “QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns.”
Automated tracking of quotations from Obama’s speeches.
Red: quoted in
conservative media. Blue: quoted in
liberal media.
7. Quoting Patterns in
Political Coverage
Niculae et al. (2015). “QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns.”
Dimensionality reduction reveals two main bias dimensions:
(one) independent-mainstream & (two) foreign-liberal-conservative.
8. Quoting Patterns in
Political Coverage
Niculae et al. (2015). “QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns.”
Sentiment across two bias dimensions:
more mainstream & conservative correlates with negative sentiment.
9. Structural Virality
Goel et al. (2015). “The Structural Virality of Online Diffusion”
Information cascades can propagate via broadcast and viral diffusion.
Most cascades contain both broadcast and viral spreading.
[Figure panels: Broadcast, Viral.]
10. Structural Virality
Goel et al. (2015). “The Structural Virality of Online Diffusion”
Twitter cascades characterized by structural virality,
increasing down and to the right.
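A minimal sketch of how a structural virality score can be computed, assuming the definition I understand Goel et al. to use: the average shortest-path distance between all pairs of nodes in a cascade tree. The function name and the two toy cascades are hypothetical and only illustrate the contrast between broadcast and viral shapes.

```python
from collections import deque

def structural_virality(edges):
    """Average shortest-path distance over all node pairs of a cascade tree.

    Assumes `edges` describes one connected tree of (parent, child) pairs.
    """
    neighbors = {}
    for parent, child in edges:
        neighbors.setdefault(parent, set()).add(child)
        neighbors.setdefault(child, set()).add(parent)
    n = len(neighbors)
    if n < 2:
        raise ValueError("a cascade needs at least two nodes")

    total = 0
    for source in neighbors:
        # Breadth-first search gives distances from `source` to every node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
    # Each unordered pair was counted twice, so divide by n * (n - 1).
    return total / (n * (n - 1))

# Hypothetical toy cascades with five nodes each.
broadcast = [("root", f"u{i}") for i in range(4)]   # one source, four direct adopters
viral = [(f"u{i}", f"u{i + 1}") for i in range(4)]  # a chain of person-to-person shares

print(structural_virality(broadcast))  # 1.6
print(structural_virality(viral))      # 2.0
```

With the same number of adopters, the chain scores higher than the star, which matches the intuition that viral spread is deeper and broadcast spread is shallower.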
11. Structural Virality
Goel et al. (2015). “The Structural Virality of Online Diffusion”
Petition cascades are smallest, but have highest structural virality.
12. Structural Virality
Goel et al. (2015). “The Structural Virality of Online Diffusion”
99% of content
adoptions terminate in
a single generation
The largest image and
video cascades are low on
structural virality.
Broadcast is by far the dominant mode to reach large audiences.
This means pay-to-play when you need to go big reliably.
13. Predicting memes using
network structure
Weng et al. (2014). “Predicting Successful Memes using Network and Community Structure.”
Viral (a,b) and non-viral (c,d) memes diffuse differently at start.
14. Predicting memes using
network structure
Weng et al. (2014). “Predicting Successful Memes using Network and Community Structure.”
Network configurations of early adopters impact virality.
These relationships are better predictors than early popularity.
15. Rumor cascades
Friggeri et al. (2015). “Rumor Cascades”
Spread of false rumors is kept in check in social networks.
False Cabela’s Obamacare receipt cascade on Facebook. Snopes links (red dots) typically end a branch of a rumor cascade.
16. Rumor cascades
Friggeri et al. (2015). “Rumor Cascades”
Being snoped increases the likelihood of deletion of the original post.
False rumors are more likely to be deleted.
18. Information propagation
Gomez Rodriguez et al. (2014). “Uncovering the structure and temporal dynamics of information propagation.”
Contagion model: Information infects nodes, which become active.
Information spreads from active nodes along the network edges.
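A minimal sketch of a contagion process on a network, assuming a simple discrete-time independent cascade: each newly active node gets one chance to activate each inactive neighbor. This is a simplification for illustration, not the model from the cited paper, and the graph, node names, and `transmit_prob` parameter are hypothetical.

```python
import random

def simulate_cascade(graph, seeds, transmit_prob, rng):
    """Spread activation from `seeds` along directed edges of `graph`.

    `graph` maps each node to the list of nodes it can infect.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                # One independent transmission attempt per edge.
                if neighbor not in active and rng.random() < transmit_prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active

# Hypothetical toy network: one news site feeding three blogs.
toy_graph = {
    "news": ["blog_a", "blog_b"],
    "blog_a": ["blog_c"],
    "blog_b": ["blog_c"],
    "blog_c": [],
}

rng = random.Random(0)
print(sorted(simulate_cascade(toy_graph, ["news"], 0.5, rng)))
```

Running the simulation many times with different seeds gives a distribution over cascade sizes for a given network and transmission probability.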
19. Information propagation
Gomez Rodriguez et al. (2014). “Uncovering the structure and temporal dynamics of information propagation.”
Given information cascades, infer network using contagion model.
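To make the inference direction concrete, here is a crude heuristic sketch, not the algorithm from the cited paper: given only the times at which nodes picked up each piece of information, score a candidate edge higher when one node repeatedly becomes active shortly before another. The exponential weighting, the `decay` parameter, and the toy cascades are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def score_edges(cascades, decay=1.0):
    """Score directed edges (src, dst) from observed activation times.

    Each cascade maps node -> activation time. Short gaps between an
    earlier and a later node contribute more to that pair's score.
    """
    scores = defaultdict(float)
    for cascade in cascades:
        for src, t_src in cascade.items():
            for dst, t_dst in cascade.items():
                if t_src < t_dst:
                    scores[(src, dst)] += math.exp(-decay * (t_dst - t_src))
    return dict(scores)

# Hypothetical activation times for three pieces of information.
toy_cascades = [
    {"a": 0.0, "b": 1.0, "c": 5.0},
    {"a": 0.0, "b": 0.5},
    {"b": 0.0, "c": 4.0},
]

ranked = sorted(score_edges(toy_cascades).items(), key=lambda kv: kv[1], reverse=True)
for edge, score in ranked:
    print(edge, round(score, 3))
```

On this toy data the pair ("a", "b") ranks first because "b" follows "a" quickly in two separate cascades.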
20. Information propagation
Gomez Rodriguez et al. (2014). “Uncovering the structure and temporal dynamics of information propagation.”
Inferred structure shows emerging and vanishing clusters.
Red: mainstream media. Blue: blogs.
[Figure panels: March 2011, June 2011, October 2011.]
21. Information propagation
Gomez Rodriguez et al. (2014). “Uncovering the structure and temporal dynamics of information propagation.”
Evolution of network for Fukushima articles.
22. Information propagation
Gomez Rodriguez et al. (2014). “Uncovering the structure and temporal dynamics of information propagation.”
Information generally flows from mainstream media to blogs.
Blogs play a crucial role in information dissemination in civil movements.
23. Information propagation
Gomez Rodriguez et al. (2014). “Uncovering the structure and temporal dynamics of information propagation.”
Blogs and mainstream media swap influence during course of event.
Increased blog influence proportion correlates with social unrest.
24. Is virality/contagion
a bad metaphor?
Taylor Swift has 65 million Twitter followers who can
receive her messages. One individual cannot sneeze on
and infect that many people simultaneously.
The likelihood of disease infection increases
independently with exposure to different infected
individuals, but “infection” by an idea increases greatly
when exposed to it by multiple, independent parties.
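One way to sketch the contrast above, under assumptions of my own: treat disease-like spread as independent chances per exposure, and treat idea adoption as a threshold on the number of distinct sources. Both function names and the numbers are hypothetical, and the threshold rule is only one common way to formalize reinforcement from multiple parties.

```python
def independent_exposure_prob(per_contact_prob, exposures):
    """Chance of infection when every exposure is an independent trial."""
    return 1.0 - (1.0 - per_contact_prob) ** exposures

def threshold_adoption(distinct_sources, threshold):
    """Idea adoption that only happens once enough distinct sources agree."""
    return distinct_sources >= threshold

for k in range(1, 5):
    print(
        k,
        round(independent_exposure_prob(0.1, k), 3),
        threshold_adoption(k, threshold=3),
    )
```

Under the independent model each extra exposure adds a little less than the one before it, while under the threshold model nothing happens until the third source and then adoption flips on.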
25. Majority illusion
Lerman et al. (2015). “The Majority Illusion in Social Networks.”
Friendship paradox: most people have fewer friends than
their friends have, on average.
This generalizes to any node attribute, which may
explain why people overestimate their friends’
alcohol consumption.
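The friendship paradox is easy to check on a small graph. This is a minimal sketch on a hypothetical star-shaped network (one well-connected hub and four people who only know the hub); the function name is an illustration, not something from the cited paper.

```python
def fraction_with_fewer_friends(adjacency):
    """Fraction of nodes whose degree is below their friends' average degree."""
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    fewer = 0
    for node, nbrs in adjacency.items():
        if not nbrs:
            continue  # isolated nodes have no friends to compare against
        friends_avg = sum(degree[f] for f in nbrs) / len(nbrs)
        if degree[node] < friends_avg:
            fewer += 1
    return fewer / len(adjacency)

# Hypothetical undirected network stored as node -> set of friends.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
    "d": {"hub"},
}

print(fraction_with_fewer_friends(star))  # 0.8
```

Four of the five people have one friend, and that friend has four, so 80% of the network has fewer friends than their friends do.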
26. Majority illusion
Lerman et al. (2015). “The Majority Illusion in Social Networks.”
The connectedness of “infected” people greatly impacts the perception of others.
A minority opinion can appear extremely popular for each individual (left side).
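A minimal sketch of the effect on a hypothetical five-person star network, assuming a node "sees a majority" when strictly more than half of its neighbors hold the opinion. The function name and the cutoff are my assumptions for illustration.

```python
def majority_illusion_fraction(adjacency, active):
    """Fraction of nodes with strictly more than half of their neighbors active."""
    fooled = 0
    for node, nbrs in adjacency.items():
        if nbrs and sum(n in active for n in nbrs) > len(nbrs) / 2:
            fooled += 1
    return fooled / len(adjacency)

# Hypothetical undirected network stored as node -> set of neighbors.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
    "d": {"hub"},
}

# In both cases exactly one person in five (20%) holds the opinion.
print(majority_illusion_fraction(star, {"hub"}))  # 0.8
print(majority_illusion_fraction(star, {"a"}))    # 0.0
```

The same 20% minority looks like a unanimous majority to 80% of the network when the well-connected node holds it, and is nearly invisible when a poorly connected node does.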
27. Majority illusion
Lerman et al. (2015). “The Majority Illusion in Social Networks.”
The size of majority illusion in Digg and political blogs, varying
the number and connectedness of infected nodes.
28. Strength of weak ties
Brokerage positions expose their network to diverse information.
Weak links are easy to establish in social media, but they increase cognitive load.
[Figure panels: Embedded position, Brokerage position.]
Kang and Lerman (2015). “User Effort and Network Structure Mediate Access to Information in Networks.”
29. User effort and
network structure
Kang and Lerman (2015). “User Effort and Network Structure Mediate Access to Information in Networks.”
Twitter users with more diverse networks see more diverse content.
More active users (red dots) see more diverse content regardless.
30. User effort and
network structure
Kang and Lerman (2015). “User Effort and Network Structure Mediate Access to Information in Networks.”
High network diversity users tend to see more general topics.
Low diversity users tend to focus on one or two niche topics.
31. Information overload
Gomez-Rodriguez et al. (2014). “Quantifying Information Overload in Social Media and its Impact on Social Contagions”
After incoming rate passes 30 tweets per hour, retweeting drops.
32. Information overload
Gomez-Rodriguez et al. (2014). “Quantifying Information Overload in Social Media and its Impact on Social Contagions”
Users are responsive until 50-100 tweets/hour,
then give up or resort to other techniques or tools.
33. Information overload
Gomez-Rodriguez et al. (2014). “Quantifying Information Overload in Social Media and its Impact on Social Contagions”
More background information leads to smaller cascades.
34. Email overload
Kooti et al. (2015). “Evolution of Conversations in the Age of Email Overload.”
Longer email threads have shorter, quicker responses.
Long last response signals end of thread.
35. Email overload
Kooti et al. (2015). “Evolution of Conversations in the Age of Email Overload.”
Emails received on weekends are replied to more slowly & tersely.
36. Email overload
Kooti et al. (2015). “Evolution of Conversations in the Age of Email Overload.”
People email more as they get more emails, but get buried.
37. Email overload
Kooti et al. (2015). “Evolution of Conversations in the Age of Email Overload.”
Younger people are less sensitive to overload than older ones.
38. Information diets
Most users consume one or two dominant topics.
Kulshrestha et al. (2015). “Characterizing Information Diets of Social Media Users.”
39. Information diets
Kulshrestha et al. (2015). “Characterizing Information Diets of Social Media Users.”
Social media concentrates more on real-time topics.
40. Personality classification
Yarkoni (2010). “Personality in 100,000 Words: A large-scale analysis of personality and word use among bloggers.”
Language production provides a window on personality at scale.
41. Personality classification
Iacobelli et al. (2015). “Large Scale Personality Classification of Bloggers.”
Bigrams as indicators of high/low scorers in personality classification.
[Table: example bigrams for high scorers and low scorers on each trait: Neuroticism, Extroversion, Openness, Agreeableness, Conscientiousness.]
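For readers unfamiliar with the term, a bigram is a pair of adjacent words. The sketch below only shows how bigrams can be counted from raw text; the tokenizer, the function name, and the sample sentence are hypothetical, and the cited paper's actual feature pipeline is not described in these slides.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count adjacent word pairs after lowercasing and simple tokenization."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))

# Hypothetical sample text, not taken from any blog corpus.
sample = "I love my family and I love my friends"
for pair, count in bigram_counts(sample).most_common(3):
    print(pair, count)
```

Counts like these, gathered per author, are the kind of feature a classifier could then relate to personality scores.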
42. Ad Targeting
and Personality
Chen et al. (2015). “Making Use of Derived Personality: The Case of Social Media Ad Targeting.”
Twitter users whose language indicates higher openness and lower
neuroticism are more likely to respond positively to an ad.
43. Antisocial Behavior Online
Cheng et al. (2015). “Antisocial Behavior in Online Discussion Communities.”
Comparing banned & normal users (in retrospect): banned users wrote
posts that are less relevant, harder to read, and less positive.
FBU: Future banned users
NBU: Never banned users
45. Tailored audiences
People Pattern and Smarty Pants Vitamins case study.
Human analysis and machine learning can be used to characterize
and identify personas using social media profiles.
46. Tailored audiences
People Pattern and Smarty Pants Vitamins case study.
Interest prediction and extraction of interest-specific keywords.
Promoted tweet copy informed by persona-based keywords.
47. Tailored audiences
People Pattern and Smarty Pants Vitamins case study.
Persona-based campaigns with audience-driven ad copy
produced higher engagement at lower cost per conversion.
[Bar charts: conversions (axis 0 to 240) and cost per conversion (axis 0 to 40) for three campaigns: Control, Overscheduled Parent, Grab & Go.]
48. Sub-micro segmentation
We have limited attention and many options.
The best, most relevant content is often created by those
with very similar passions, interests, and demographics.
Doresa Jennings
• PhD, BGSU
• Lives in the southern USA
• Mother of profoundly gifted children
• Homeschooler
• Commitment to STEM
• African-American
Cheryl Baldridge
• JD, Yale
• Lives in the southern USA
• Mother of profoundly gifted children
• Homeschooler
• Commitment to STEM
• African-American
Dr. J creates a lot of original text and video.
My busy wife makes time for it all.
Other content is less compelling for her.
http://kdacademy.blogspot.com/
https://www.youtube.com/user/DAJedu
49. Conclusion
Authentic, original
content is the most
compelling.
Audience understanding is essential:
demographics, personality and microsegment relevance.
Pay-to-play to reliably get
your word out.
Content consumers must
constantly manage
information overload.
Large scale analysis of
networks and documents
reveals hidden patterns.
50. References
• Chen et al. (2015). “Making Use of Derived Personality: The Case of Social Media Ad Targeting.” - http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10508
• Cheng et al. (2015). “Antisocial Behavior in Online Discussion Communities.” - http://arxiv.org/abs/1504.00680
• Friggeri et al. (2015). “Rumor Cascades.” - http://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8122
• Goel et al. (2015). “The Structural Virality of Online Diffusion.” - https://5harad.com/papers/twiral.pdf
• Gomez-Rodriguez et al. (2014). “Quantifying Information Overload in Social Media and its Impact on Social Contagions.” - http://arxiv.org/abs/1403.6838
• Gomez Rodriguez et al. (2014). “Uncovering the structure and temporal dynamics of information propagation.” - http://www.mpi-sws.org/~manuelgr/pubs/S2050124214000034a.pdf
• Iacobelli et al. (2015). “Large Scale Personality Classification of Bloggers.” - http://www.research.ed.ac.uk/portal/files/12949424/Iacobelli_Gill_et_al_2011_Large_scale_personality_classification_of_bloggers.pdf
51. References
• Kang and Lerman (2015). “User Effort and Network Structure Mediate Access to Information in Networks.” - http://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/view/10483
• Kooti et al. (2015). “Evolution of Conversations in the Age of Email Overload.” - http://arxiv.org/abs/1504.00704
• Kulshrestha et al. (2015). “Characterizing Information Diets of Social Media Users.” - https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/viewFile/10595/10505
• Lerman et al. (2015). “The Majority Illusion in Social Networks.” - http://arxiv.org/abs/1506.03022
• Leskovec et al. (2009). “Meme-tracking and the Dynamics of the News Cycle.” - http://www.memetracker.org/quotes-kdd09.pdf
• Niculae et al. (2015). “QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns.” - http://snap.stanford.edu/quotus/
• Weng et al. (2014). “Predicting Successful Memes using Network and Community Structure.” - http://arxiv.org/abs/1403.6199
• Yarkoni (2010). “Personality in 100,000 Words: A large-scale analysis of personality and word use among bloggers.” - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2885844/