2. TREND
DETECTION, TRACKING & TRANSITION
in Social Networks
1. Definition & General Idea
2. Web Samples in Trend Hunting
3. Detection Approaches
4. Architecture: TwitterMonitor
5. Detection: MemeTracker
6. Classification: ExoEndo
SemioNet: Semantic Social Network Analysis
3. REFERENCES
Mathioudakis, Michael, and Nick Koudas. "Twittermonitor: trend
detection over the twitter stream." Proceedings of the 2010 ACM
SIGMOD International Conference on Management of data. ACM,
2010.
Leskovec, Jure, Lars Backstrom, and Jon Kleinberg. "Meme-tracking
and the dynamics of the news cycle." Proceedings of the 15th ACM
SIGKDD international conference on Knowledge discovery and
data mining. ACM, 2009.
Naaman, Mor, Hila Becker, and Luis Gravano. "Hip and trendy:
Characterizing emerging trends on Twitter." Journal of the American
Society for Information Science and Technology 62.5 (2011): 902-918.
Becker, Hila, Mor Naaman, and Luis Gravano. "Beyond Trending
Topics: Real-World Event Identification on Twitter." ICWSM 11 (2011):
438-441.
4. Trend Analysis
The Science of Studying
Changes in Social Patterns,
Including Fashion, Technology
& Consumer Behavior
Horizontal Analysis
The General Movement
over TIME of a
Statistically Detectable
Change
Fundamentally, a Method
for Understanding HOW &
WHY Things have Changed
– or will Change – over TIME
6. APPROACH
Text Mining
Topic Identification & Clustering
"Kilroy was here" was a
piece of graffiti that
became popular in the
1940s, and existed under
various names in
different countries,
illustrating how a meme
can be modified through
replication
Memes
A meme (/ˈmiːm/) is "an idea, behavior, or
style that spreads from person to person
within a culture" … through writing,
speech, gestures, rituals, or other
imitable phenomena with a mimicked
theme. … Memes are cultural analogues to genes in that
they self-replicate, mutate, and
respond to selective pressures.
7. GroupBurst: Assesses Co-occurrences of Bursty Keywords in Recent Tweets
One-pass
Real-time
Adjustable against spam
Theoretically sound!
Adjustable against SPURIOUS bursts: coincidental bursts of a keyword over a short period of time
Context Extraction Algorithms (PCA, SVD) & Grapevine’s Entity Extractor to Add more
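The idea of grouping bursty keywords by their co-occurrence in recent tweets can be illustrated with a minimal sketch. This is not TwitterMonitor's own algorithm: the burst test below is a simple mean-plus-standard-deviation threshold chosen for illustration, and the function names, the `z`, `min_count`, and `min_cooccurrence` parameters, and the sample tweets are all assumptions.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def bursty_keywords(history, current, z=2.0, min_count=3):
    """Flag words whose count in the current window clearly exceeds their past counts.

    history: list of Counters, one per past time window (hypothetical input format).
    current: Counter of word counts in the current window.
    """
    bursty = set()
    for word, count in current.items():
        past = [window.get(word, 0) for window in history] or [0]
        threshold = mean(past) + z * pstdev(past)
        # min_count guards against spurious bursts of rare words.
        if count >= min_count and count > threshold:
            bursty.add(word)
    return bursty


def group_bursty(bursty, recent_tweets, min_cooccurrence=2):
    """Merge bursty keywords that co-occur in enough recent tweets (union-find)."""
    pair_counts = Counter()
    for tweet in recent_tweets:
        present = sorted(bursty & set(tweet.lower().split()))
        pair_counts.update(combinations(present, 2))

    parent = {word: word for word in bursty}

    def find(word):
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path compression
            word = parent[word]
        return word

    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            parent[find(a)] = find(b)

    groups = {}
    for word in bursty:
        groups.setdefault(find(word), set()).add(word)
    return sorted(sorted(group) for group in groups.values())


history = [Counter({"coffee": 5}), Counter({"coffee": 6}), Counter({"coffee": 5})]
recent = [
    "earthquake hits chile", "chile earthquake felt strongly", "big earthquake chile",
    "coffee time", "coffee again", "coffee please", "coffee break", "coffee now",
]
current = Counter(word for tweet in recent for word in tweet.lower().split())
trend_groups = group_bursty(bursty_keywords(history, current), recent)
```

In this toy input, "coffee" is frequent but not unusual compared with its history, so only the keywords that suddenly appear together end up in a group.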
271 Million Monthly Active Users
500 Million Tweets (140 ch) Per Day
78% Active Users on Mobile
77% Accounts Outside U.S.
Supports 35+ languages
8. MemeTracking
News Cycle
Tracking News Evolution
Quotes & Memes
Integral Part of Journalistic Practice
Travel Relatively Intact with Mutational Variants
Clustering by Graph
9. Item: Each News Article/Blog Post
Phrase: A Quoted String That Occurs in Items
MemeTracking …
10. Phrase Graph
DAG
|P| < |Q|
“senseless killing”
“enough of senseless
killing”
“Hear our voice. We have had enough of this
senseless killing”
Directed Edit Distance(P, Q) < δ
Word Consecutive Overlap(P, Q) > k
P → Q
$W_{P,Q} \propto \dfrac{1}{\text{Directed Edit Distance}(P, Q)}$
$W_{P,Q} \propto \text{Total Number of } Q \text{ in Corpus}$
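A rough sketch of how such a phrase graph could be built is given here. Several choices are assumptions made for illustration: phrase length |P| is measured in words, the directed edit distance is taken as the word-level edit distance from P to its best-matching substring of Q, either of the two closeness conditions is treated as sufficient for an edge, the weight is computed as count(Q) / (1 + distance) to avoid division by zero, and the default values of `delta` and `k` are hypothetical.

```python
def directed_edit_distance(p, q):
    """Word-level edit distance from phrase p to its best-matching substring of q."""
    pw, qw = p.lower().split(), q.lower().split()
    prev = [0] * (len(qw) + 1)  # matching may start anywhere in q at no cost
    for i in range(1, len(pw) + 1):
        curr = [i] + [0] * len(qw)
        for j in range(1, len(qw) + 1):
            substitute = prev[j - 1] + (pw[i - 1] != qw[j - 1])
            curr[j] = min(substitute, prev[j] + 1, curr[j - 1] + 1)
        prev = curr
    return min(prev)  # matching may end anywhere in q


def consecutive_overlap(p, q):
    """Length in words of the longest run of consecutive words shared by p and q."""
    pw, qw = p.lower().split(), q.lower().split()
    best, prev = 0, [0] * (len(qw) + 1)
    for i in range(1, len(pw) + 1):
        curr = [0] * (len(qw) + 1)
        for j in range(1, len(qw) + 1):
            if pw[i - 1] == qw[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def build_phrase_graph(phrase_counts, delta=2, k=10):
    """Return {(P, Q): weight} for edges from shorter phrases to longer ones."""
    edges = {}
    for p in phrase_counts:
        for q, q_count in phrase_counts.items():
            if len(p.split()) >= len(q.split()):
                continue  # edges only point from shorter to strictly longer phrases
            distance = directed_edit_distance(p, q)
            if distance < delta or consecutive_overlap(p, q) > k:
                edges[(p, q)] = q_count / (1 + distance)
    return edges


short = "senseless killing"
medium = "enough of senseless killing"
long_ = "Hear our voice. We have had enough of this senseless killing"
graph = build_phrase_graph({short: 5, medium: 3, long_: 2})
```

Because every edge goes from a shorter phrase to a strictly longer one, the resulting graph cannot contain a cycle, which is what makes it a DAG.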
11. Phrase Clusters
Directed Acyclic Graph (DAG) Partitioning
Given a Weighted DAG, Delete a Set of Edges of
Min Total Weight So That Each of the Resulting
Components is Single-Rooted.
NP-hard
Heuristic
1. Start from the roots.
2. Proceed down the DAG and greedily assign each node to the cluster to
which it has the most edges.
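The greedy heuristic can be sketched as follows. The input format (a dictionary mapping each node to its weighted successors, with roots being the nodes that have no successors) and the tie-breaking rule (total edge weight, then name) are assumptions for illustration. Recursion is used so that each node is assigned only after all the nodes it points to, which has the same effect as working from the roots down.

```python
from collections import defaultdict


def partition_dag(edges):
    """Greedily assign every node of a DAG to the cluster of a single root.

    edges: {node: {successor: weight}}; nodes without successors are roots.
    Returns {node: root of the cluster the node was assigned to}.
    """
    nodes = set(edges) | {s for successors in edges.values() for s in successors}
    cluster = {}

    def assign(node):
        if node in cluster:
            return cluster[node]
        successors = edges.get(node, {})
        if not successors:
            cluster[node] = node  # a root names its own cluster
            return node
        votes = defaultdict(lambda: [0, 0.0])  # root -> [edge count, total weight]
        for successor, weight in successors.items():
            root = assign(successor)
            votes[root][0] += 1
            votes[root][1] += weight
        # Most edges wins; ties are broken by total weight, then by name.
        best = max(votes, key=lambda r: (votes[r][0], votes[r][1], str(r)))
        cluster[node] = best
        return best

    for node in sorted(nodes, key=str):
        assign(node)
    return cluster


example = {
    "a": {"r1": 1.0},
    "b": {"a": 2.0, "r1": 1.0, "r2": 5.0},
    "c": {"r2": 1.0},
}
clusters = partition_dag(example)
```

Node "b" has two edges into the cluster of "r1" and one into the cluster of "r2", so it joins "r1" even though its single edge to "r2" is heavier. Edges that cross cluster boundaries are the ones the partitioning deletes.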
13. Result
Volume Distribution
Dataset
3 Months: Aug 1 to Oct 31, 2008
~1M Docs per Day from 1.65 Million Sites!
47M Phrases, 22M Distinct
9 Hours of Clustering Process Time
35,800 Non-trivial Clusters (at least two phrases)
15. Other Findings
Time lag between the news media and blogs
$f(n_j)\,\delta(t - t_j)$
$n_j$ = number of items previously written for cluster $j$
$t$ = the current time
$t_j$ = the time when $j$ was first produced
Recency → $\delta$ is monotonically decreasing in $t - t_j$
Imitation → $f$ is monotonically increasing in $n_j$, $f(0) > 0$
$t \to 0^-$: $a = 0.076$, $b = 1.77$
$t \to 0^+$: $a = 0.092$, $b = 2.15$
Quotes migrating from blogs to news media: 3.5%
Each Cluster
Modeling the news trend
Imitation≠Recency
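A toy simulation can show how imitation and recency interact in the expression $f(n_j)\,\delta(t - t_j)$. It is a sketch under stated assumptions, not the original experiment: one new cluster appears per time step, a single source writes one item per step, and the particular choices $f(n) = n + 1$ and $\delta(\text{age}) = 1/(1 + \text{age})$ are hypothetical functions that merely satisfy the monotonicity conditions.

```python
import random


def simulate_threads(steps, f=lambda n: n + 1.0,
                     delta=lambda age: 1.0 / (1.0 + age), seed=0):
    """Simulate item counts per cluster under imitation (f) and recency (delta)."""
    rng = random.Random(seed)
    births, counts = [], []
    for t in range(steps):
        # Assumption: exactly one new cluster is produced at every time step.
        births.append(t)
        counts.append(0)
        # Each cluster j is chosen with probability proportional to f(n_j) * delta(t - t_j).
        weights = [f(n) * delta(t - tj) for n, tj in zip(counts, births)]
        j = rng.choices(range(len(counts)), weights=weights, k=1)[0]
        counts[j] += 1
    return births, counts


births, counts = simulate_threads(200, seed=1)
top_volume = max(counts)
```

Setting `f` to a constant removes imitation and setting `delta` to a constant removes recency, which makes it easy to compare the volume distributions produced by each ingredient alone.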
17. Characterizing Trends
“trends in trend data.” Meta Trend
Taxonomy of the trends
Key Distinguishing Features of Trends
Not only the Textual Content
Social Network Structure
Ties
Geographic
Actions: Retweet, Reply, Mention, Hashtag
18. Trends
Exogenous
Broadcast-media
Broadcast of local media
“fight” (boxing event)
“Ravens” (football game)
Broadcast of global/national media
“Kanye” (Kanye West acts up at the MTV Video Music Awards)
“Lost Finale” (series finale of Lost).
Global News
Breaking
“earthquake” (Chile earthquake)
“Tsunami” (Hawaii tsunami warning)
“Beyoncé” (Beyoncé cancels Malaysia concert).
Nonbreaking
“HCR” (health care reform)
“Tiger” (Tiger Woods apologizes)
“iPad” (toward the launch of Apple’s popular device).
National Holidays & Memorial Days
“Halloween,” “Valentine’s.”
Local Participatory & Physical
Planned
“marathon,”
“superbowl” (Super Bowl viewing parties)
“patrick’s” (St. Patrick’s Day Parade).
Unplanned
“rainy,” “snow.”
Endogenous
Memes
#in2010 (in December 2009, users imagine their near future)
“November” (users marking the beginning of the month on November 1)
Retweets
Fan Community Activities
“2pac” (the anniversary of the death of hip-hop artist Tupac Shakur).
Characterizing Trends …
19. Ttwitter: Trends from twitter.com
Tterm freq.: Trends from Simple Trend Detector
Tquality: Trends for Quality Analysis (Supervised Categories)
Tquantity: Trends for Computing Features
20. Content Features
• Average number of words/characters
• Proportion of messages with URLs, with unique URLs, with hashtags excluding/including trend terms
• Top unique hashtag?
• Similarity to centroid
Interaction Features
• Proportion of retweets, replies, mentions
Time-based Features
• Exponential fit head, tail
• Logarithmic fit head, tail
Participation Features
• Messages per author
• Proportion of messages from top author
• Proportion of messages from top 10% of authors
Social Network Features
• Level of reciprocity
• Maximal eigenvector centrality
• Maximal degree centrality
• Transitivity
• Density
• Average component size
21. Exogenous vs. Endogenous Trends
H1.1 Content features: Exo higher proportion of URLs, smaller proportion of hashtags
H1.2 Interaction features: Exo fewer retweets, similar number of replies
H1.3 Time features: Exo different for the head period before the trend peak, but similar to endogenous trends in the tail period after the trend peak
H1.4 Social network features: Exo fewer connections, less reciprocity
23. IDEA
Automatic Categorization of Trends
Photography Trend: Selfie Images
Trust Trend: Trustful Users, Trustful Tweets
Untrendy People! Users Who Counteract the Trends
Editor's Notes
Vertical Analysis: Financial Managers Set One Accounting Item as the Benchmark & Compare other Items with the Numerical Standard
In contrast with
Horizontal Analysis: Study of Performance Trends over Time
Short / Intermediate / Long
Past / Now / Future
Automatic trend detection over the twitter stream
distinctive phrases that travel relatively intact through on-line text; developing scalable algorithms for clustering textual variants of such phrases, we identify a broad class of memes that exhibit wide spread
mutation. As a result, a central computational challenge in this approach is to find robust ways of extracting and identifying all the mutational variants of each of these distinctive phrases, and to group them together.
Words as Tokens
This latter dependence is important, since we particularly wish to preserve edges (p, q) when the inclusion of p in q is supported by many occurrences of q.
Collections of Phrases Deemed to be Close Textual Variants of One Another
CCDF: Complementary Cumulative Distribution Function
If the quantity of interest is power-law distributed with exponent γ, $p(x) \propto x^{-\gamma}$, then, since $P(X \ge x) = \int_x^\infty p(u)\,du \propto x^{-(\gamma - 1)}$ for γ > 1, the CCDF plotted on log-log axes will be a straight line with slope −(γ − 1).
the tail is much heavier
This means that variants of popular phrases, like “lipstick on a pig,” are much “stickier” than would be expected from the overall phrase volume distribution.
Popular phrases have many variants and each of them appears more frequently than an “average” phrase.
To put “lipstick on a pig” (which does not make it a lady) is a rhetorical expression conveying that superficial or cosmetic changes are a futile attempt to disguise the true nature of a product.
Persian equivalent: “Whether you dress in gold brocade or in satin, you are still the same cardoon seller.”
Embellishment (makeup)
Focus on the 1,000 threads with the largest total volumes (i.e., the largest number of mentions).
Thread volume in blogs typically reaches its peak 2.5 hours after the peak thread volume in the news sources. Thread volume in news sources increases slowly but decreases quickly, while in blogs the increase is rapid and the decrease much slower.
reflect an ever-updating real-time live image of our society.
Exogenous Trends
• Broadcast-media events:
◦ Broadcast of local media events: “fight” (boxing event), “Ravens” (football game).
◦ Broadcast of global/national media events: “Kanye” (Kanye West acts up at the MTV Video Music Awards), “Lost Finale” (series finale of Lost).
• Global news events:
◦ Breaking news events: “earthquake” (Chile earthquake), “Tsunami” (Hawaii tsunami warning), “Beyoncé” (Beyoncé cancels Malaysia concert).
◦ Nonbreaking news events: “HCR” (health care reform), “Tiger” (Tiger Woods apologizes), “iPad” (toward the launch of Apple’s popular device).
• National holidays and memorial days: “Halloween,” “Valentine’s.”
• Local participatory and physical events:
◦ Planned events: “marathon,” “superbowl” (Super Bowl viewing parties), “patrick’s” (St. Patrick’s Day Parade).
◦ Unplanned events: “rainy,” “snow.”
Endogenous Trends
• Memes: #in2010 (in December 2009, users imagine their near future), “November” (users marking the beginning of the month on November 1)
• Retweets (users “forwarding” en masse a single tweet from a popular user): “determination” (users retweeting LL Cool J’s post about said concept).
• Fan community activities: “2pac” (the anniversary of the death of hip-hop artist Tupac Shakur).
Breaking News vs. Other Exogenous Trends
H2.1: Interaction features of breaking events will be different than those of other exogenous trends, with more retweets (forwarding), but fewer replies (conversation).
H2.2: Time features of breaking events will be different for the head period, showing more rapid growth, and a better fit to the functions’ curve (i.e., less noise) compared to other exogenous trends.
H2.3: Social network features of breaking events will be different than those of other exogenous trends.
Local Events vs. Other Exogenous Trends
H3.1: Content features of local events will be different than those of other exogenous trends.
H3.2: Interaction features of local events will be different than those of other exogenous trends; in particular, local events will have more replies (conversation).
H3.3: Time features of local events will be different than those of other exogenous trends.
H3.4: Social network features of local events will be different than those of other exogenous trends; in particular, local events will have denser networks, more connectivity, and higher reciprocity.
Memes vs. Retweet Endogenous Trends
H4.1: Content features of memes will be different than those of retweet trends.
H4.2: Interaction features of memes will be different than those of retweet trends; in particular, retweet trends will have significantly more retweet (forwarding) messages (this hypothesis is included as a “sanity check” since the retweet trends are defined by having a large proportion of retweets).
H4.3: Time features of memes will be different than those of retweet trends.
H4.4: Participation features of memes will be different than those of retweet trends.
H4.5: Social network features of memes will be different than those of retweet trends; in particular, meme trends will have more connectivity and higher reciprocity than retweet trends.