Community analysis using graph representation learning on social networks

Marco Brambilla and Mattia Gasparini
Politecnico di Milano

Introduction
• The development of platforms such as Instagram and
Facebook has increased the level of interaction among
people
• A wide variety of social network data can be exploited to map
user behavior
• Graphs are a natural fit for modeling the
interactions among these users

Problem Statement
• Analysis of communities on online social
networks by applying machine learning to graphs
• Representation learning is used to extract valuable
information about users inside the community
• Classification of consumer and business users
• Grouping of similar users

Representation Learning
• Define a continuous representation (embedding) for each node of the
graph, so that machine learning techniques can be applied
to graphs easily
• The embedding of a node u is based on the nodes in its neighbourhood

Node2vec
• Embeddings are computed using the
node2vec algorithm [1], included in the Stanford
Network Analysis Platform (SNAP) library
• The algorithm computes the embeddings by solving an
optimization problem:
$$\max_f \sum_{u \in V} \log \Pr(N_S(u) \mid f(u))$$
[1] Grover and Leskovec. 2016. node2vec: Scalable Feature Learning for Networks.
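
As a rough illustration of how the neighbourhood N_S(u) in the objective can be sampled, the sketch below implements the second-order biased random walk described in [1], with return parameter p and in-out parameter q. The adjacency dictionary `toy_adj`, the function name `biased_walk` and the toy graph are assumptions made for this example; it does not reproduce the SNAP implementation.

```python
import random


def biased_walk(adj, start, length, p=1.0, q=1.0, seed=0):
    """Sample one node2vec-style walk of at most `length` nodes from `start`."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step: no previous node yet, so choose uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move away from prev
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Hypothetical undirected toy graph (not data from the case study).
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
walk = biased_walk(toy_adj, "a", length=6, p=0.5, q=2.0)
```

In this sketch a small p makes the walk return to the previous node more often, while a small q pushes it further away; the nodes visited by many such walks would play the role of N_S(u).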

Node2vec
[Figure: input graph and output embeddings]
The node2vec algorithm computes embeddings such that
similarities between graph nodes are preserved
between the corresponding vectors.

Case Study
• Emerging Italian fashion brand: Emporio Le Sirenuse
• Products: luxury swimsuits and dresses
• The case study focuses on the brand, its competitors
and their communities, each defined as the set of a brand's
followers on the social network
http://www.fashiondatasensing.polimi.it/

Related Work
• Users’ communities defined using graph’s structural
properties [himelboim2017, deeb2017, guerrero2017]
• Brand-related communities have a specific role,
with business strategies as final target [ramadan2018,
kim2014, campbell2014]
• Fashion brands gain major advantages from social
media [brambilla2017, schmidt2017]

Analysis Pipeline
The proposed solution defines a method to handle all the steps of the analysis.

1 – Data Collection
• Web scraping of the data of 10 brands and their followers
from Instagram
• Time window: from 1st January 2017 to 1st November 2017
• Final database: 400K users, 10M posts

2 – Graph Construction
• Graphs are built from several entities: the users that we
want to analyze ($U_t$), their posts ($P$), the hashtags
referenced in the posts ($T$) and the mentioned users ($U_m$)
• Correspondingly, three different types of edges are
defined:
o $E_{owner} = \{(e_1, e_2) \mid e_1 \in U_t,\ e_2 \in P\}$
o $E_{tag} = \{(e_1, e_2) \mid e_1 \in P,\ e_2 \in T\}$
o $E_{mention} = \{(e_1, e_2) \mid e_1 \in P,\ e_2 \in U_m\}$
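
A minimal sketch of this construction step, assuming each scraped post is available as a dictionary with the hypothetical fields `id`, `owner`, `hashtags` and `mentions` (the actual schema of the collected data is not given in the slides). Nodes are tagged with their entity type so that a user and a hashtag with the same name stay distinct.

```python
from collections import defaultdict


def build_graph(posts):
    """Return a dict mapping edge type -> set of (source, target) node pairs."""
    edges = defaultdict(set)
    for post in posts:
        post_node = ("post", post["id"])
        # Owner edge: analyzed user -> post
        edges["owner"].add((("user", post["owner"]), post_node))
        # Tag edges: post -> hashtag
        for tag in post["hashtags"]:
            edges["tag"].add((post_node, ("hashtag", tag)))
        # Mention edges: post -> mentioned user
        for name in post["mentions"]:
            edges["mention"].add((post_node, ("mentioned_user", name)))
    return edges


# Hypothetical toy posts, for illustration only.
toy_posts = [
    {"id": 1, "owner": "alice", "hashtags": ["summer", "positano"], "mentions": ["brand_x"]},
    {"id": 2, "owner": "bob", "hashtags": ["summer"], "mentions": []},
]
graph = build_graph(toy_posts)
```

Keeping the edges grouped by type makes it straightforward to select only the owner and tag edges, or only the owner and mention edges, when a narrower network is needed.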

2 – Graph Construction
• Three graph models are used for the analysis:
1. Mixed network: $G_M = (\{U, P, T\}, \{E_{owner}, E_{tag}, E_{mention}\})$
2. Hashtags network: $G_h = (\{U_t, P, T\}, \{E_{owner}, E_{tag}\})$
3. Mentions network: $G_m = (\{U_t, U_m, P\}, \{E_{owner}, E_{mention}\})$
• $G_h$ and $G_m$ are subgraphs of $G_M$: they map the
influence of specific social media aspects

Example Hashtags Network
The central part of the graph features
the most connected nodes, which
correspond to the users that
have many hashtags in common.

3 – Graph Reduction
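• A reduction process is applied to $G_h$ and $G_m$ to obtain «classical» social
networks, where the nodes are the users and the edges are weighted
by the number of shared entities:
$$w_{ij} = \begin{cases} |t_i \cap t_j|, & \text{if } i, j \in G_h \\ |m_i \cap m_j|, & \text{if } i, j \in G_m \end{cases}$$
where $i, j \in U_t$, $t_i, t_j \subseteq T$ are the hashtags of users $i$ and $j$, and $m_i, m_j \subseteq U_m$ are the users they mention
• The reduced hashtags network $\bar{G}_h$ and the reduced mentions network $\bar{G}_m$ are
generated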
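
The reduction can be sketched as follows, assuming the hashtags (or mentioned users) of each analyzed user have already been gathered into a set. The function name `reduce_network` and the toy data are assumptions for this example; pairs with no shared entity are simply left without an edge.

```python
from itertools import combinations


def reduce_network(user_entities):
    """Map each user pair to the number of entities the two users share."""
    weights = {}
    for u, v in combinations(sorted(user_entities), 2):
        shared = len(user_entities[u] & user_entities[v])
        if shared > 0:
            weights[(u, v)] = shared  # edge weight = size of the intersection
    return weights


# Hypothetical hashtag sets per user.
toy_hashtags = {
    "alice": {"summer", "positano", "swimwear"},
    "bob": {"summer", "swimwear"},
    "carol": {"food"},
}
reduced = reduce_network(toy_hashtags)
```

This pairwise loop is quadratic in the number of users, so for a community of realistic size an inverted index from entity to users would likely be a better starting point.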

Reduced Graph Example
Reduced mentions network $\bar{G}_m$: edges
are weighted by the number of mentioned users
that two users have in common.

4 – Features Extraction
• Both the heterogeneous networks $G_{h,m}$ and the reduced
networks $\bar{G}_{h,m}$ are used to extract the embeddings
• The feature vector dimension is fixed for each of the two types
of networks: $d_G = 8$ and $d_{\bar{G}} = 4$, respectively
• Hyper-parameter tuning for $p$ and $q$ is performed in a supervised
setting
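
One plausible way to tune p and q in a supervised setting is a plain grid search, sketched below. The callables `embed` and `score` are placeholders assumed for this example: `embed` would run node2vec with the given parameters and `score` would evaluate the resulting embeddings on the labelled users. The slides do not specify the search strategy or the grid that was actually used.

```python
from itertools import product


def tune_p_q(embed, score, p_values, q_values):
    """Return (best_score, best_p, best_q) over the grid of candidate values."""
    best = None
    for p, q in product(p_values, q_values):
        embeddings = embed(p=p, q=q)  # hypothetical embedding step
        s = score(embeddings)         # hypothetical supervised evaluation
        if best is None or s > best[0]:
            best = (s, p, q)
    return best


# Stand-in functions so the sketch runs end to end; they do no real embedding.
def toy_embed(p, q):
    return {"p": p, "q": q}


def toy_score(embeddings):
    return -abs(embeddings["p"] - 1.0) - abs(embeddings["q"] - 0.5)


grid = [0.25, 0.5, 1.0, 2.0]
best_score, best_p, best_q = tune_p_q(toy_embed, toy_score, grid, grid)
```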

5 – Classification
• Domain-specific task:
«Discriminate between consumer and non-consumer
users»
• Ground truth of 351 labelled users, defined with
domain experts
• Three feature sets are tested:
• Social media account data (#followers, #following,
#posts, bio)
• Complete network embeddings
• Reduced network embeddings
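
To make the task concrete, the sketch below classifies users from their embedding vectors with a nearest-centroid rule and estimates accuracy by leave-one-out validation. Both the classifier and the validation scheme are assumptions chosen for brevity, since the slides do not name the model that was used; the 4-dimensional toy vectors are invented and only mimic the size of the reduced network embeddings.

```python
def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(train, vector):
    """train: list of (vector, label). Return the label of the nearest class centroid."""
    by_label = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}
    return min(centroids, key=lambda label: squared_distance(centroids[label], vector))


def leave_one_out_accuracy(samples):
    """Hold out each sample in turn and predict it from all the others."""
    hits = 0
    for i, (vec, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        hits += predict(train, vec) == label
    return hits / len(samples)


# Invented embeddings with invented labels, for illustration only.
toy_samples = [
    ([0.9, 0.1, 0.0, 0.2], "consumer"),
    ([1.0, 0.0, 0.1, 0.1], "consumer"),
    ([0.8, 0.2, 0.1, 0.0], "consumer"),
    ([0.0, 1.0, 0.9, 0.8], "non-consumer"),
    ([0.1, 0.9, 1.0, 0.9], "non-consumer"),
    ([0.2, 0.8, 0.8, 1.0], "non-consumer"),
]
accuracy = leave_one_out_accuracy(toy_samples)
```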

5 – Classification Experiment
The description of a user is informative only if a good fraction of its neighborhood
is exploited, which is not always feasible for complete networks.

5 – Classification Experiment on Reduced Networks
Performance and the number of classified users increase with the number of user nodes
included in the model, even if those nodes are not classified themselves: they enrich the neighborhood and,
as a consequence, the feature vectors.

6 – Clustering
• Reduced hashtags networks $\bar{G}_h$ are used as a proxy for
content-based similarity
• K-means is applied to the extracted feature vectors
• Focus on the $\bar{G}_h$ of the Emporio Le Sirenuse community
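
For reference, a compact version of K-means (Lloyd's algorithm) on small feature vectors is sketched below, together with the inertia (within-cluster sum of squared distances) for several values of k. The initialisation, the iteration count and the toy points are assumptions for this example; a library implementation would normally be preferred in practice.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm. Return (labels, centroids, inertia)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Invented 2-dimensional points forming two well-separated groups.
toy_points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
labels, centroids, inertia = kmeans(toy_points, 2)
inertia_by_k = {k: kmeans(toy_points, k)[2] for k in range(1, 5)}
```

Plotting `inertia_by_k` against k and looking for the point where the curve flattens is a common heuristic for choosing the number of clusters.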

6 – Clustering Network Input
Reduced hashtags network $\bar{G}_h$ of the
Emporio Le Sirenuse
community.

6 – Clustering Features
Embeddings extracted from the
network.
The first two feature components
are used for visualization.

6 – Clustering Output
K selection: plot of inertia
against number of clusters

6 – Output Network
Application of
clustering output to
the reduced network

6 – Cluster Validation: Domain Experts
• Domain experts are provided with a subset of users for each
cluster
• Manual inspection of user profiles, providing feedback
about the patterns present in each cluster

6 – Cluster Validation: Experts Feedback
• Clusters 0, 1 and 2 are very well defined: professional
users, such as showrooms and other brands
• Cluster 3 contains regular users who share content
about holidays in Italy
• Clusters 3, 4, 5 and 6 are composed mostly of regular
users, too

6 – Cluster Labels
Cluster labels are extracted from the set of hashtags shared by at least two users inside the
cluster.
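
The labelling rule can be sketched as below: for each cluster, count how many distinct users employ each hashtag and keep those used by at least two of them. The function name `cluster_labels`, the `min_users` parameter and the toy data are assumptions made for this example.

```python
from collections import Counter


def cluster_labels(user_hashtags, user_cluster, min_users=2):
    """Per cluster, list the hashtags used by at least `min_users` users, most shared first."""
    counts = {}
    for user, tags in user_hashtags.items():
        counter = counts.setdefault(user_cluster[user], Counter())
        counter.update(set(tags))  # count each hashtag once per user
    return {
        cluster: [
            tag
            for tag, n in sorted(counter.items(), key=lambda item: (-item[1], item[0]))
            if n >= min_users
        ]
        for cluster, counter in counts.items()
    }


# Invented users, hashtags and cluster assignments.
toy_hashtags = {
    "alice": ["pasta", "pizza", "pasta"],
    "bob": ["pizza", "wine"],
    "carol": ["vintage", "retro"],
    "dave": ["vintage", "pizza"],
}
toy_clusters = {"alice": 0, "bob": 0, "carol": 1, "dave": 1}
labels = cluster_labels(toy_hashtags, toy_clusters)
```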

6 – Final Result
[Figure: clustered network annotated with the cluster labels FOOD, LUXURY, HIPSTER, INTERNATIONAL, INTERIOR DESIGN, VINTAGE, ITALIAN HOLIDAYS]

Conclusion
• Results:
• Definition of an effective method to analyze
communities in the social network domain
• Modeling of user similarities through network features
• Detection of content-driven sub-communities
• Future work:
• Inclusion of the time variable

Questions?
Contacts:
Marco Brambilla: marco.brambilla@polimi.it
Mattia Gasparini: mattia.gasparini@polimi.it
@marcobrambi @datascience_mi
http://www.fashiondatasensing.polimi.it/
http://datascience.deib.polimi.it
