Who will follow whom? Exploiting Semantics for Link Prediction in Attention-Information Networks
1. Who will follow whom?
Exploiting Semantics for Link Prediction in
Attention-Information Networks
International Semantic Web Conference 2012. Boston, US
Matthew Rowe (1), Milan Stankovic (2,3), Harith Alani (4)
(1) School of Computing and Communications, Lancaster University, Lancaster, UK
(2) Hypios Research, 187 rue du Temple, 75003 Paris, France
(3) Université Paris-Sorbonne, 28 rue Serpente, 75006 Paris
(4) Knowledge Media Institute, The Open University, Milton Keynes, UK
@mrowebot | m.rowe@lancaster.ac.uk
http://www.matthew-rowe.com | http://www.lancs.ac.uk/staff/rowem/
2. Background Problem Formulation Approach Experiments Summary
Attention Information Networks
The intersection of information and social networks
[Yin et al., 2011]:
Users can follow other users: u subscribes to v
u = Follower, v = Followee
User u is paying attention to the content from user v
u → v
Users become ‘Information Hubs’ [Romero and Kleinberg, 2010]
Tune in to get real time event information
E.g. #Sandy, #Arabspring, #Londonriots
People become social sensors
Attention is paid to the information that users publish
Who will follow whom? Exploiting Semantics for Link Prediction 2 / 22
3. Attention Economics
Large uptake/adoption of Attention-Information Networks:
31.9% increase in Twitter users in 2011
Attention becomes a limited commodity
“What counts now is what is most scarce now, namely attention.”
[Goldhaber, 1997]
Users must consider who they wish to subscribe to
Whose content do I wish to receive?
Who interests me?
If we can understand who will follow whom & follower decisions:
Predict social capital based on expected network growth;
Facilitate audience building
Of interest to digital marketing firms, e.g. for boosting a client’s presence
4. Outline
Problem Formulation
Related Work
Follower-Decision Hypotheses
Formulating the Problem
Approach
Features
Concept Disambiguation with User Contexts
Experiments
Dataset
Experimental Setup
Results: Prediction Accuracy
Results: Follower-Decision Patterns
5. Related Work
Network-topology approaches [Golder and Yardi, 2010,
Yin et al., 2011, Backstrom and Leskovec, 2011]:
Path structures, common followers and common friends
Local metadata approaches [Schifanella et al., 2010,
Leroy et al., 2010, Brzozowski and Romero, 2011]:
Common tags (Flickr, YouTube), group information (on Flickr)
Local metadata approaches use tags or group memberships, but not
concepts
No examination of the follower-decision behaviour patterns
And no exploration of divergent follower decision behaviour
6. Follower-Decision Hypotheses
H1. Following a user is performed when there is a topical
affinity between the follower and the followee
[Schifanella et al., 2010] found social and topical homophily to be
correlated on Flickr
H2. Users who do not focus on specific topics do not base
their follower-decisions on topical information but on social
factors
Unfocussed users show divergent decision behaviour
H3. Users who are more socially connected are driven by social
rather than topical factors
High-degree users are driven by social network effects
7. Formulating the Problem
A directed social network is a graph $G = \langle V, E \rangle$, where:
V denotes the set of users (nodes), and
E is the set of directed edges between nodes; $\langle u, v \rangle \in E$ means that u follows v
An egocentric social network (egonet) of u is denoted by $\Gamma(u)$
$\Gamma^{-}(u)$ denotes the follower network of u (incoming edges)
$\Gamma^{+}(u)$ denotes the followee network of u (outgoing edges)
A given user u is provided with a set of recommended users $R(u)$
$R(u) \cap \Gamma^{+}(u) = \emptyset$
Goal: induce a function between users and recommendations:
$f : V \times R \to \{0, 1\}$
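As a rough illustration of the definitions above, the follower and followee networks can be held as plain sets built from an edge list. This is a minimal sketch, not the authors' implementation: the user names, the helper names and the choice to exclude u from its own recommendations are assumptions made for the example.

```python
from collections import defaultdict

# Toy directed edge list: (u, v) means "u follows v". All names are made up.
EDGES = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("dave", "alice")]

def build_egonets(edges):
    """Return (followees, followers), each mapping a user to a set of users."""
    followees = defaultdict(set)  # Gamma^+(u): users that u follows
    followers = defaultdict(set)  # Gamma^-(u): users that follow u
    for u, v in edges:
        followees[u].add(v)
        followers[v].add(u)
    return followees, followers

def valid_recommendations(u, candidates, followees):
    """Keep only candidates outside Gamma^+(u), so R(u) and Gamma^+(u) are disjoint.

    Dropping u itself is an extra assumption of this sketch.
    """
    return {v for v in candidates if v != u and v not in followees[u]}

followees, followers = build_egonets(EDGES)
recs = valid_recommendations("alice", {"bob", "dave", "erin"}, followees)
print(sorted(recs))  # ['dave', 'erin']
```

The classifier f would then be trained on one feature vector per pair (u, v) with v drawn from such a filtered set.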
8. Predicting Links in Attention-Information Networks
Given our problem setting we want to:
1. Identify the best performing general model;
2. Explore follower-decision behaviour and how it differs between users
The problem is a binary classification task over pairwise features
computed between u and each of their recommendations (v ∈ R(u))
To enable accurate prediction and explore different factors behind
link creation we implement:
Social features: based on the network-structure
Topical features: based on content published by u and v
Visibility features: based on the user noticing a prospective followee
We now explain the various features which are computed between u
and v ∈ R(u)...
9. Social
Social features account for the topology of the network and the existence
of edges present within the network prior to recommendations
Mutual Followers Count: Measures the overlap of the follower sets
(i.e. the set of users connecting into a given user) between u and v .
Mutual Followees Count: Measures the overlap of the followee sets
Mutual Friends Count: Measures the overlap of the friend sets
(friendship is denoted by a bi-directional edge between nodes)
Mutual Neighbours Count: Measures the overlap of the egocentric
networks of u and v whilst ignoring the directions of the links in the
network
[Zhou et al., 2009, Yin et al., 2011, Backstrom and Leskovec, 2011]
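The four social features reduce to set intersections over the follower and followee sets. The sketch below assumes those sets are available as dictionaries; the toy edges and the function name are illustrative only.

```python
from collections import defaultdict

# Toy edges, (a, b) meaning "a follows b"; purely illustrative data.
EDGES = [("u", "a"), ("u", "b"), ("v", "b"), ("v", "c"),
         ("a", "u"), ("a", "v"), ("b", "u"), ("c", "v")]

followees, followers = defaultdict(set), defaultdict(set)
for src, dst in EDGES:
    followees[src].add(dst)
    followers[dst].add(src)

def social_features(u, v):
    """Overlap counts between the egocentric networks of u and v."""
    friends_u = followers[u] & followees[u]      # bi-directional edges of u
    friends_v = followers[v] & followees[v]
    neighbours_u = followers[u] | followees[u]   # edge direction ignored
    neighbours_v = followers[v] | followees[v]
    return {
        "mutual_followers": len(followers[u] & followers[v]),
        "mutual_followees": len(followees[u] & followees[v]),
        "mutual_friends": len(friends_u & friends_v),
        "mutual_neighbours": len(neighbours_u & neighbours_v),
    }

features = social_features("u", "v")
print(features)
```

Each count lies in {0} ∪ N+, matching the output domain given for the social features.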
10. Topical (I)
In attention-information networks users pay attention to the content of
other users
Topical features use: a) tags, b) concept bags, c) concept graphs
Tag Vectors: Examining tag/keyword overlap between u and v
[Schifanella et al., 2010]
Cosine Similarity: between the tag vectors of u and v
Concept Bags: Examining overlap of concepts
Return concepts from user content, then derive the concept bag
vector
Cosine Similarity: similarity between the concept bag vectors of u
and v
Jensen-Shannon Divergence: probability distribution divergence
between concept bag vectors of u and v
Greater divergence means greater dissimilarity between topics
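Both bag-based measures can be sketched over simple frequency dictionaries. This is an illustrative example under assumptions: the concept names are invented, bags are assumed non-empty, and the base-2 logarithm (which bounds the divergence by 1) is a choice of this sketch rather than something the slides specify.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse frequency vectors (dicts)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def js_divergence(a, b):
    """Jensen-Shannon divergence between the normalised bags a and b."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both concept bags must be non-empty")
    keys = a.keys() | b.keys()
    p = {k: a.get(k, 0) / total_a for k in keys}
    q = {k: b.get(k, 0) / total_b for k in keys}
    m = {k: 0.5 * (p[k] + q[k]) for k in keys}  # mixture distribution

    def kl(x, y):
        # Terms with x[k] == 0 contribute nothing to the KL divergence.
        return sum(x[k] * math.log2(x[k] / y[k]) for k in keys if x[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

bag_u = {"Photography": 3, "Travel": 1}
bag_v = {"Photography": 3, "Travel": 1}
bag_w = {"Cooking": 2}
print(js_divergence(bag_u, bag_v), js_divergence(bag_u, bag_w))
```

Identical bags give a divergence of 0 and bags with no shared concept give the maximum, so larger values indicate more dissimilar topics.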
11. Topical (II)
[Figure: concept graph with concepts C1, C2 and C3 and users u and v]
Concept Graphs: Semantic relatedness of users using graph-based
metrics
Measure distances between concepts from tags of u and v : d(ci , cj )
Distance measures have two varieties, based on input tags:
1. Tag Intersection: Intersection of the tag sets of u and v
2. All Tags: All tags from the tag sets of u and v
Three distance measures were computed for d(ci , cj ) using the above sets:
Shortest Path: least number of steps from ci to cj (Bellman-Ford
algorithm)
Hitting Time: number of steps for a random walker to leave ci and
reach cj [Fouss et al., 2007]
Commute Time: number of steps for a random walker to leave ci
and reach cj , and then return to ci
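The three distances can be illustrated on a tiny concept graph. The sketch assumes an undirected, connected graph with unit edge weights and computes hitting times by fixed-point iteration on h(i) = 1 + mean of h over the neighbours of i, with h(target) = 0. The graph, the function names and the iterative solver are assumptions of this example, not details taken from the slides.

```python
import math

# Hypothetical undirected concept graph: node -> set of neighbouring concepts.
GRAPH = {"c1": {"c2"}, "c2": {"c1", "c3"}, "c3": {"c2"}}

def shortest_path_lengths(graph, source):
    """Bellman-Ford with unit edge weights; returns hop counts from source."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0
    for _ in range(len(graph) - 1):
        changed = False
        for a in graph:
            for b in graph[a]:
                if dist[a] + 1 < dist[b]:
                    dist[b] = dist[a] + 1
                    changed = True
        if not changed:
            break
    return dist

def hitting_time(graph, source, target, tol=1e-12, max_iter=100_000):
    """Expected number of random-walk steps to first reach target from source."""
    h = {node: 0.0 for node in graph}
    for _ in range(max_iter):
        new = {
            n: 0.0 if n == target
            else 1.0 + sum(h[k] for k in graph[n]) / len(graph[n])
            for n in graph
        }
        delta = max(abs(new[n] - h[n]) for n in graph)
        h = new
        if delta < tol:
            break
    return h[source]

def commute_time(graph, a, b):
    """Expected steps to go from a to b and back again."""
    return hitting_time(graph, a, b) + hitting_time(graph, b, a)

print(shortest_path_lengths(GRAPH, "c1")["c3"])
print(round(hitting_time(GRAPH, "c1", "c3"), 6), round(commute_time(GRAPH, "c1", "c3"), 6))
```

On this three-node path the shortest path from c1 to c3 is 2 hops, while the random-walk measures are larger because the walker can step backwards; on bigger graphs a direct linear solve would usually replace the iteration.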
12. Visibility
The presence of information published by a prospective followee could
influence users in their follower-decisions
Retweet Count: total number of times a given user (v ) has been
retweeted by members of the followee network belonging to u
Mention Count: total number of times a given user (v ) has been
mentioned by members of the followee network belonging to u
Comment Count: total number of times a given user (v ) has had
their content commented on by members of the followee network
belonging to u
Weighted Counts: weight each count by the reply frequency between u
and the corresponding ego-network member
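A visibility count amounts to filtering interaction events by whether the acting user is one of u's followees. The sketch below shows the retweet case with an optional per-member weight; the event format, the names and the weight values are hypothetical, and the weighting scheme is only one plausible reading of the weighted counts.

```python
def retweet_count(u, v, followees, retweets, weights=None):
    """Count retweets of v made by members of u's followee network.

    retweets is a list of (retweeter, original_author) pairs.
    weights, if given, maps a followee of u to a reply-frequency weight
    (a hypothetical input; missing members default to a weight of 0).
    """
    total = 0.0 if weights is not None else 0
    for retweeter, author in retweets:
        if author == v and retweeter in followees.get(u, set()):
            total += weights.get(retweeter, 0.0) if weights is not None else 1
    return total

followees = {"u": {"a", "b"}}
retweets = [("a", "v"), ("a", "v"), ("b", "v"), ("c", "v"), ("a", "x")]

plain = retweet_count("u", "v", followees, retweets)
weighted = retweet_count("u", "v", followees, retweets, weights={"a": 0.5, "b": 0.25})
print(plain, weighted)  # 3 1.25
```

Mention and comment counts would follow the same pattern with a different event list.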
13. Features Summary
Type        Feature Name                             Output Domain
Social      Mutual Followers Count                   {0} ∪ N+
            Mutual Followees Count                   {0} ∪ N+
            Mutual Friends Count                     {0} ∪ N+
            Mutual Neighbours Count                  {0} ∪ N+
Topical     Tag Vectors - Cosine                     [0, 1]
            Concept Bags - Cosine                    [0, 1]
            Concept Bags - JS-Divergence             R+
            Concept Graphs - Int - Shortest Path     N+
            Concept Graphs - All - Shortest Path     N+
            Concept Graphs - Int - Hitting Time      R+
            Concept Graphs - All - Hitting Time      R+
            Concept Graphs - Int - Commute Time      R+
            Concept Graphs - All - Commute Time      R+
Visibility  Retweet Count                            {0} ∪ N+
            Mention Count                            {0} ∪ N+
            Comment Count                            {0} ∪ N+
            Weighted Retweet Count                   {0} ∪ R+
            Weighted Mention Count                   {0} ∪ R+
            Weighted Comment Count                   {0} ∪ R+
14. Concept Disambiguation with User Contexts
Distances across the concept graph capture semantic relatedness
Distance metrics require a mapping between a tag and a concept...
Polysemy Problem: one tag can be mapped to multiple concepts
[Cantador et al., 2011] propose ‘distributional aggregation’ to
choose the most representative tag for a web resource:
Voting mechanism: Tag usage frequency amongst a collection of
users
Our voting mechanism: concept frequency given the user
For a given tag: count the frequency of each candidate concept in the
user's concept bag CTu , and choose the most frequent
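The user-context vote can be sketched as a lookup followed by an argmax over the user's concept bag. The tag-to-concept candidates, the concept names and the tie-breaking rule (first candidate listed wins) are assumptions made for this example.

```python
from collections import Counter

def disambiguate(tag, candidates, concept_bag):
    """Return the candidate concept for tag that is most frequent in the bag.

    candidates maps a tag to a list of possible concepts (hypothetical input).
    Returns None when the tag has no candidate concepts.
    """
    options = candidates.get(tag, [])
    if not options:
        return None
    # max() keeps the first option on ties, so list order breaks ties.
    return max(options, key=lambda concept: concept_bag.get(concept, 0))

candidates = {"apple": ["Apple_(fruit)", "Apple_Inc."]}
concept_bag = Counter({"Apple_Inc.": 5, "IPhone": 3, "Apple_(fruit)": 1})

chosen = disambiguate("apple", candidates, concept_bag)
print(chosen)  # Apple_Inc.
```

A user whose content is mostly about technology therefore has the polysemous tag resolved to the technology concept, which is the concept then used for distances in the concept graph.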
16. Experimental Setup
1. General Follower Prediction: seeking a follower model
Randomly selected 10% of users and built pairwise feature vectors
2. Binned Follower Prediction: seeking behaviour-specific models
Divided users into 10 bins based on: a) concept-bag entropy, b)
out-degree
Selected all the users from low and high bins, built feature vectors
Divided each dataset into an 80:20 split for training and testing
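The binning and splitting steps might look like the sketch below. It assumes Shannon entropy over the concept bag, equal-frequency bins obtained by ranking users, and a seeded random shuffle for the 80:20 split; the slides do not say how the bin boundaries or the split were chosen, so these are illustrative choices.

```python
import math
import random

def concept_entropy(bag):
    """Shannon entropy (bits) of a concept bag given as concept -> count."""
    total = sum(bag.values())
    return -sum((c / total) * math.log2(c / total) for c in bag.values() if c > 0)

def assign_bins(scores, n_bins=10):
    """Rank users by score and place them into n_bins equal-frequency bins."""
    ranked = sorted(scores, key=scores.get)
    return {user: i * n_bins // len(ranked) for i, user in enumerate(ranked)}

def split_80_20(rows, seed=0):
    """Shuffle rows reproducibly and return (train, test) with an 80:20 ratio."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * 0.8)
    return rows[:cut], rows[cut:]

# Toy scores: user i has score i, so bins follow the user index.
scores = {f"user{i:02d}": float(i) for i in range(20)}
bins = assign_bins(scores)
low_bin = sorted(u for u, b in bins.items() if b == 0)
high_bin = sorted(u for u, b in bins.items() if b == 9)
train, test = split_80_20(range(10))
print(low_bin, high_bin, len(train), len(test))
```

Users in the lowest and highest bins would then feed the behaviour-specific models, with out-degree substituted for entropy in the second binning.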
For each experiment:
1. Model Selection
2. Pattern Analysis
Evaluation Measures:
1. Area under the receiver operating characteristic curve (AUC)
2. Matthews correlation coefficient (MCC)
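Both evaluation measures can be computed directly from labels and predictions. The sketch below uses the standard confusion-matrix formula for MCC and the pairwise-ranking definition of AUC (the fraction of positive/negative pairs ranked correctly, ties counting one half); the labels and scores are made-up values.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
y_score = [0.9, 0.4, 0.5, 0.1]
print(round(mcc(y_true, y_pred), 4), auc(y_true, y_score))  # 0.5774 0.75
```

MCC ranges from -1 to 1 with 0 corresponding to chance-level prediction, which suits class-imbalanced link data, while an AUC of 0.5 corresponds to random ranking.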
17. Results: Prediction Accuracy
General Follower Prediction Model
Topical features perform significantly better
Models significantly outperform the random model
Binned Follower Prediction Models
Concept entropy: low - topical features; high - social features
Degree: low and high - topical features
Visibility features have little effect on predictions (majority are zero)
[Figure: (c) AUC and (d) MCC for the Social, Topical, Visibility and All feature sets on the Full, Entropy − Low, Entropy − High, Degree − Low and Degree − High datasets]
18. Results: Follower-Decision Patterns
Connections are formed...
In the General Follower Prediction Model when:
users share neighbours
users are closer in terms of the subjects they discuss
In the Binned Follower Prediction Model
for low entropy and low degree users when:
same feature pattern as the general model
for high entropy users when:
users have an overlap of subscribers
tags differ, but similar concepts!
for high degree users when:
users listen to the same people
users share a topical affinity with the same pattern as the general
model
19. Findings
General behaviour pattern: topical homophily
[Schifanella et al., 2010] found socially close users to have high tag
cosine
Our approach detects latent patterns based on concept graphs
On common followers:
[Golder and Yardi, 2010, Brzozowski and Romero, 2011] found
mutual audience to correlate with link creation
We find that: mutual followers should be reduced in the general
model
On common neighbours:
[Leroy et al., 2010] found an increase in mutual neighbours to
correlate with link creation
Similar effect in our findings
Divergent behaviour for high entropy users: suggests a need for
bespoke models
20. Conclusions
Our approach for link prediction outperforms: a) a random baseline,
b) existing network-structure approaches
General follower-decision model identified topical homophily effects
Accounting for behaviour uncovered different follower-decisions:
Unfocussed users follow users with whom they have conceptual
affinity
Concept-graphs allowed for latent effects to be identified
Applicable over the linked data graph
Can improve recommendations by accounting for behaviour and
building bespoke models:
Growing the platform’s network and increasing social capital
Understand who will follow whom, and audience growth
21. Future Work
Apply our approach over Twitter and YouTube: are findings
consistent?
Extract concepts from content, measure distances across the Linked
Data graph
Inclusion of more nuanced user behaviour
Conjecture: performance is conditioned on time-sensitive user
behaviour
User Churn: detecting the complement of link creation
25 days of Twitter logs show this (red):
$\Delta(u) = |\Gamma^{-}_{t'}(u)| - |\Gamma^{-}_{t}(u)|$ (1)
[Figure: c(∆) plotted against ∆, for ∆ from −40 to 40 and c(∆) from 0 to 6000]
22. Questions
Twitter: @mrowebot
Email: m.rowe@lancaster.ac.uk
WWW: http://www.matthew-rowe.com
WWW: http://www.lancs.ac.uk/staff/rowem/