1. Social Network Analysis 2012
Introduction to Social Network Analysis
Part I
Katarina Stanoevska-Slabeva, Miriam Meckel, Thomas Plotkowiak
2. Agenda
1. Introduction to networks ~ 1h
– Types
– Research Areas
2. Introduction to network measures ~ 1h
– For whole networks
– For actors
• Centrality measures
3. Workshop ~ 2h
– Import your Facebook Data
– Analyze your Data
– Export your Data
© Thomas Plotkowiak 2010
4. 1.1 Network Types
Domain Aspects:
• Non-Social Networks
– Computer Networks
– Power Grid Networks
– Road Networks
– Neural Networks ….
• Social Networks
– Real Life
• Friendship
• Marriage
• Sexual Contact
– Online
• Mobile Networks
• Friendship in OSN
General Aspects:
• Direct vs. Indirect Connection
− One Mode
− Two Mode
• Temporal Aspects
− Changing in Time
− Static
• Topological Aspects
− (Non)Directed
− (Non)Valued
− Shapes (Ring, Star,…)
6. Airline Networks
Source: Northwest Airlines WorldTraveler Magazine
7. Railway Networks
Source: TRTA, March 2003 - Tokyo rail map
9. Flavor Networks
A flavor network that captures the flavor compounds shared by culinary ingredients. Each node denotes an ingredient, the node color
indicates food category, and node size reflects the ingredient prevalence in recipes. Two ingredients are connected if they share a
significant number of flavor compounds, link thickness representing the number of shared compounds between the two ingredients.
(Barabasi et al 2012)
11. Twitter News-Sharing Networks
News sharing network of NYT. Nodes are individuals who predominantly share news stories on topics given by the legend.
Links are “follow” relationships between individuals. Cosmopolitan, local scene, national liberal, national conservative, and
national diverse are tightly connected groups. (Herdagdelen 2012)
12. Political Blog Networks
Color corresponds to political orientation, size reflects the number of citations
received from the top 40 blogs, and line thickness reflects the number of citations
between two blogs. (Adamic 2004)
18. Two Mode Networks
A fragment of the Scottish directorates (1904-5) network: directors (grey) and firms (black). Data taken from The Anatomy of Scottish Capital (John
Scott and Michael Hughes): 64 nonfinancial firms, 8 banks, 14 insurance companies, and 22 investment companies.
19. Scientific Knowledge Networks
Circles represent individual journals. The lines that connect journals are clicks from users. Colors correspond to
the AAT classification of the journal. Labels have been assigned to local clusters of journals that correspond to
particular scientific disciplines. (Bollen et al 2012)
22. Sociometry and Social Network Analysis
Sociometry studies interpersonal relations. Society is not an
aggregate of individuals and their characteristics (as statisticians
assume) but a structure of interpersonal ties. Therefore, the
individual is not the basic social unit. The social atom consists
of an individual and his or her social, economic, or cultural ties.
Social atoms are linked into groups, and, ultimately, society
consists of interrelated groups.
24. Practical applications
• Businesses use SNA to analyze and improve communication flow in
their organization, or with their networks of partners and
customers
• Law enforcement agencies (and the army) use SNA to identify
criminal and terrorist networks from traces of communication that
they collect; and then identify key players in these networks
• Social Network Sites like Facebook use basic elements of SNA to identify
and recommend potential friends based on friends-of-friends
• Civil society organizations use SNA to uncover conflicts of interest
in hidden connections between government bodies, lobbies and
businesses
• Network operators (telephony, cable, mobile) use SNA-like
methods to optimize the structure and capacity of their networks
25. Example of a Sociogram
Choices of twenty-six girls living in one dormitory at a New York state training school.
The girls were asked to choose the girls they liked best as their dining-table partners.
26. Different Levels of Analysis
[Figure: levels of analysis, labeled Dyad, Best Friend, Ego-Net, Primary Group, 2-step Partial network, and Global-Network]
27. Why should we make a distinction?
1. Ego-network
– Have data on a respondent (ego) and the people they are connected
to (alters).
– May include estimates of connections among alters
2. Partial network
– Ego networks plus some amount of tracing to reach contacts of
contacts
– Something less than a full account of connections among all pairs of
actors in the relevant population
3. Complete or “Global” data
– Data on all actors within a particular (relevant) boundary
– Never exactly complete (due to missing data), but boundaries are set
Different forms of analysis methods and perspectives have emerged
based on the scope of the analyzed network.
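The difference between an ego network and a partial network can be sketched in a few lines of Python. This is only an illustration under assumptions: the `ties` data, the names in it, and the helper `ego_network` are hypothetical and not part of the workshop material.

```python
# Hypothetical friendship data: each key is a person, each value is the set of
# people they are tied to (undirected, so every tie appears in both directions).
ties = {
    "ego": {"anna", "ben", "cara"},
    "anna": {"ego", "ben"},
    "ben": {"ego", "anna", "dave"},
    "cara": {"ego"},
    "dave": {"ben", "erin"},
    "erin": {"dave"},
}

def ego_network(ties, ego, steps=1):
    """Return the sub-network of everyone within `steps` ties of `ego`."""
    members = {ego}
    frontier = {ego}
    for _ in range(steps):
        # Move one tie outward, ignoring people we have already collected.
        frontier = {n for m in frontier for n in ties[m]} - members
        members |= frontier
    # Keep only ties whose two endpoints are both inside the sub-network.
    return {m: ties[m] & members for m in members}

one_step = ego_network(ties, "ego", steps=1)   # ego plus alters
two_step = ego_network(ties, "ego", steps=2)   # adds contacts of contacts
print(sorted(one_step))
print(sorted(two_step))
```

With `steps=1` the result is a plain ego network (including the ties among alters); with `steps=2` it becomes a partial network that still leaves out more distant actors.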
29. 1.3 Research Areas
• Research on networks
• What are their properties? What is their structure?
• Does structure matter? For example, how stable are the networks?
• Are all networks similar to each other (no matter what domain)?
• Research on actors
• What positions exist? What position do certain actors have?
• Does position matter? Does a role matter?
• Research on dynamics
• How do actors act in networks? What typical behaviors can we find?
• How do networks form? How do they evolve?
• Research on diffusion
• What flows on the edges in the network?
• For example, how fast does information flow? Where does it flow to?
• How can we influence it?
30. Research on Network Structure
• Example: What does the Internet look like? (Britt)
33. Research on Network Dynamics
• Example: Friendship Network Formation (Snijders)
[Figure: snapshots of the network at t=0, t=1, t=2, and t=3]
35. Research on Diffusion
[Figure legend: adoption period, from "Adopted 1Q Post Launch" to "Adopted 8Q Post Launch"]
37. Metrics for whole networks
• Density
• Average Degree
• Average Distance
• Diameter
• Number of Components
• … Next Session: More advanced metrics for whole networks (degree distributions, clustering, hierarchy, etc.)
38. Density
• Density: Number of ties, expressed as a percentage of the
number of ordered pairs (directed networks) or unordered pairs (undirected networks)
Example graphs: low density: 25%; high density: 39%
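The density definition can be written down directly. The following is a minimal sketch; the function name `density` and the example numbers are assumptions chosen for illustration.

```python
def density(num_nodes, num_ties, directed=False):
    """Share of possible pairs that are actually tied (0.0 to 1.0)."""
    possible = num_nodes * (num_nodes - 1)      # ordered pairs
    if not directed:
        possible //= 2                          # unordered pairs
    return num_ties / possible

# Hypothetical network: 5 people and 4 ties.
print(density(5, 4))                  # 4 of 10 unordered pairs
print(density(5, 4, directed=True))   # 4 of 20 ordered pairs
```

Multiplying the result by 100 gives the percentage form used on the slide.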
39. Average Degree
• Average number of links per person
Example graphs:
Density: 0.47, Average Degree: 4
Density: 0.14, Average Degree: 4
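Two networks can share the same average degree and still differ strongly in density, because density also depends on the number of nodes. A small sketch, assuming undirected networks and hypothetical sizes (10 and 30 nodes, not taken from the slide):

```python
def average_degree(num_nodes, num_ties):
    """Average number of ties per node in an undirected network."""
    return 2 * num_ties / num_nodes   # every tie adds 1 to two degrees

def density_from_average_degree(num_nodes, avg_degree):
    """Density of an undirected network, derived from its average degree."""
    return avg_degree / (num_nodes - 1)

# Hypothetical sizes: both networks have an average degree of 4.
for n, m in [(10, 20), (30, 60)]:
    k = average_degree(n, m)
    print(n, k, round(density_from_average_degree(n, k), 2))
```

The larger network is much sparser even though each person has, on average, the same number of ties.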
40. Average Distance
• Average geodesic distance between all pairs of nodes
Example graphs: avg. distance 1.9 and avg. distance 2.4
41. Diameter
• Maximum distance (= the length of the longest shortest
path)
Example graphs: diameter 3 and diameter 3
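Both average distance and diameter follow from the geodesic distances between all pairs of nodes. The sketch below uses breadth-first search on a hypothetical four-person chain; it assumes a connected, undirected network stored as a dictionary of neighbor sets.

```python
from collections import deque
from itertools import combinations

def distances_from(graph, source):
    """Breadth-first search: geodesic distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_distance_and_diameter(graph):
    """Assumes a connected, undirected graph given as {node: set_of_neighbors}."""
    all_dist = {node: distances_from(graph, node) for node in graph}
    pair_dists = [all_dist[u][v] for u, v in combinations(graph, 2)]
    return sum(pair_dists) / len(pair_dists), max(pair_dists)

# Hypothetical chain of four people: a - b - c - d
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
avg, diam = average_distance_and_diameter(chain)
print(round(avg, 2), diam)
```

In a disconnected network some pairs have no path at all, so a real analysis would have to decide how to treat them; this sketch does not.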
42. Number of Components
• Component Ratio (CR): Number of components minus 1, divided
by number of nodes minus 1
CR is 1 when all nodes are isolates.
CR is 0 when all nodes are in one component.
Example: CR = (3-1)/(14-1) = 0.154
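The component ratio can be computed by counting connected components with a simple graph traversal. A minimal sketch on a hypothetical six-node network (the data and function names are illustrative assumptions):

```python
def count_components(graph):
    """Number of connected components in an undirected graph {node: set_of_neighbors}."""
    seen = set()
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1            # found a node of a not-yet-visited component
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components

def component_ratio(graph):
    """(components - 1) / (nodes - 1), as defined above."""
    return (count_components(graph) - 1) / (len(graph) - 1)

# Hypothetical network: a triangle, a pair, and one isolate (3 components, 6 nodes).
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}, 6: set()}
print(count_components(g), component_ratio(g))
```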
44. Centrality Measures
• Distance
• Degree Centrality
• Degree Prestige
• Closeness Centrality
• Betweenness Centrality
• Eigenvector Centrality & Pagerank
45. Example – Communication ties within a sawmill
H – Hispanic
E – English
M – Mill
P – Planer section
Y – Yard
Vertex labels indicate the ethnicity and the type of work of each employee; for example,
HP-10 is a Hispanic employee (H) working in the planer section (P).
46. Distance
• The larger the number of sources accessible to a person, the
easier it is to obtain information. Social ties constitute a social
capital that may be used to mobilize social resources.
A geodesic is the shortest path between two vertices.
The distance from vertex u to vertex v is the length of the
geodesic from u to v.
47. Degree Centrality
• The simplest indicator of a node's centrality is the number of its
neighbors (its degree in a simple undirected network).
The degree centrality of a node is its degree.
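Degree centrality needs nothing more than a neighbor count. The sketch below uses a hypothetical four-node network; the normalization by n - 1 is an addition for comparability and is not part of the definition on the slide.

```python
# Hypothetical undirected network as {node: set_of_neighbors}.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

# Degree centrality of a node is simply its number of neighbors.
degree = {node: len(neighbors) for node, neighbors in graph.items()}

# Dividing by the largest possible degree (n - 1) makes networks of
# different sizes comparable; this normalization is an addition here.
n = len(graph)
normalized = {node: d / (n - 1) for node, d in degree.items()}

print(degree)
print(normalized)
```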
[Figure: example nodes with degrees 4 and 3]
48. Degree Centrality for whole networks
Degree centralization of a network is the variation in the degrees
of vertices divided by the maximum degree variation that is
possible in a network of the same size.
Example graphs: Degree Centralization = 1; Degree Centralization = 0.17
49. Prestige Centrality = Indegree
• Prestige can be expressed as the relative indegree of an
actor (degree prestige):
\[ P_d(n_j) = \frac{x_{+j}}{g - 1} \]
where x_{+j} is the indegree of actor n_j and g is the number of actors.
[Figure: example directed network with nodes 1 to 11]
Prestige of node 3:
\[ P_d(n_3) = \frac{x_{+3}}{11 - 1} = \frac{2}{10} = 0.2 \]
Notice: Prestige does not depend on the size of the group, and its value lies between
0 and 1 (star).
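Degree prestige as relative indegree can be sketched as follows. The arc list is hypothetical (the figure is not available here); it is only chosen so that actor 3 receives two choices among 11 actors, which gives the value 0.2 stated for node 3.

```python
def degree_prestige(nodes, arcs):
    """Relative indegree: choices received divided by (g - 1)."""
    indegree = {node: 0 for node in nodes}
    for _sender, receiver in arcs:
        indegree[receiver] += 1
    g = len(nodes)
    return {node: indegree[node] / (g - 1) for node in nodes}

# Hypothetical directed network with 11 actors; actor 3 receives two choices.
nodes = range(1, 12)
arcs = [(1, 3), (2, 3), (3, 4), (3, 5), (4, 6), (5, 7)]
prestige = degree_prestige(nodes, arcs)
print(prestige[3])
```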
50. Closeness Centrality
• Closeness centrality: A person is central if, with respect to
the network relation, he or she is very close to all other
persons. Such a central position improves the efficiency of an
actor's communication: the actor is able to disseminate and
receive information fast.
\[ C_c(n_i) = \frac{g - 1}{\sum_{j=1}^{g} d(n_i, n_j)} \]
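The closeness formula, (g - 1) divided by the sum of geodesic distances to all other actors, can be sketched with a breadth-first search. The five-person chain is a hypothetical example, and the code assumes a connected, undirected network.

```python
from collections import deque

def closeness(graph, node):
    """(g - 1) divided by the sum of geodesic distances from node to all others.

    Assumes a connected, undirected graph given as {node: set_of_neighbors}.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in dist:
                dist[neighbor] = dist[current] + 1
                queue.append(neighbor)
    return (len(graph) - 1) / sum(dist.values())

# Hypothetical chain: a - b - c - d - e
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(round(closeness(chain, "c"), 2))   # middle of the chain
print(round(closeness(chain, "a"), 2))   # end of the chain
```

The actor in the middle of the chain gets the highest score because the distances to everyone else are shortest from there.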
51. Closeness Centrality
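[Figure: example network with nodes 1 to 11]
Geodesic distances from node 3:
n_i   n_j   d
3     1     1
3     2     1
3     4     1
3     5     1
3     6     2
3     7     2
3     8     3
3     9     4
3     10    5
3     11    5
Sum         23
\[ C_c(n_3) = \frac{11 - 1}{23} = 0.43 \]
(As transcribed, the ten distances listed add up to 25, which would give 10/25 = 0.40; the slide states a sum of 23.)
Closeness centrality per node:
n     C_c
1     0.27
2     0.29
3     0.43
4     0.45
5     0.45
6     0.45
7     0.45
8     0.45
9     0.37
10    0.27
Notice: We are only analyzing symmetrical relations and fully connected networks.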
52. Closeness Centrality for whole networks
• Centralisation is a structural property of a group and not a
relational attribute of individual actors.
• The index for centralisation is computed by summing the
differences between the centrality of the most central actor and
the centrality of all other actors, and dividing by the maximum
possible value for such a network.
\[ C_D = \frac{\sum_{i=1}^{g} \left[ C_D(n^*) - C_D(n_i) \right]}{(g - 1)(g - 2)} \]
where n^* is the most central actor and g is the number of actors.
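The centralisation index can be checked on two extreme cases. This sketch implements the degree-based version with the denominator (g - 1)(g - 2); the star and ring networks are hypothetical examples.

```python
def degree_centralization(graph):
    """Sum of (max degree - degree) over all nodes, divided by (g - 1)(g - 2)."""
    degrees = [len(neighbors) for neighbors in graph.values()]
    g = len(degrees)
    top = max(degrees)
    return sum(top - d for d in degrees) / ((g - 1) * (g - 2))

# Hypothetical five-node networks as {node: set_of_neighbors}.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}

print(degree_centralization(star))   # one dominant actor
print(degree_centralization(ring))   # everyone equally central
```

The star reaches the maximum because a single actor holds all ties, while in the ring no actor stands out.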
53. Centralisation II
• Centralisation is always high when only one node has a high
degree of centrality and the remaining nodes are not central.
• Notice: Only the comparison of data from a fixed group at
different points in time allows for interpretable results (analogous to
network density).
Example graphs: Closeness Centralization = 1; Closeness Centralization = 0.43
54. Betweenness Centrality
• Betweenness centrality: Persons (cutpoints) that connect
two otherwise unconnected subpopulations are actors
with a high betweenness centrality score.
• Notice: We are assuming that information always travels on
the shortest paths!
\[ C_b(n_i) = \frac{\sum_{j \neq k;\; i \neq j,k} g_{jk}(n_i) / g_{jk}}{(g - 1)(g - 2)} \]
where g_{jk} is the number of shortest paths between j and k, and g_{jk}(n_i) is the number of those paths that pass through n_i.
* Use (g-1)(g-2)/2 as the denominator for undirected graphs.
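A possible way to compute normalized betweenness for an undirected network is Brandes' accumulation scheme, a standard algorithm swapped in here for illustration (the slides do not say how the values were computed). The 11-node edge list is an assumption: the figure is not available, so the edges were chosen to be consistent with the distances and betweenness values listed for the example network.

```python
from collections import deque

def betweenness(graph):
    """Normalized betweenness for an undirected graph {node: set_of_neighbors}."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {s: 0}
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing path shares to predecessors.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    g = len(graph)
    # Each unordered pair was counted from both ends, so dividing by
    # (g - 1)(g - 2) equals dividing the pair sum by (g - 1)(g - 2) / 2.
    return {v: score[v] / ((g - 1) * (g - 2)) for v in graph}

# Assumed edge list for an 11-node example network (not given in the text).
edges = [(1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (4, 6),
         (5, 7), (6, 8), (7, 8), (8, 9), (9, 10), (9, 11)]
graph = {v: set() for v in range(1, 12)}
for u, v in edges:
    graph[u].add(v)
    graph[v].add(u)

scores = betweenness(graph)
for node in sorted(scores):
    print(node, scores[node])
```

Under this assumed edge list, node 8 scores 22/45, nodes 3 and 9 score 17/45, and nodes 4 to 7 score 10/45, which agree with the reported 0.48, 0.37 and 0.22 when truncated to two decimals.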
55. Betweenness centrality
• Notice: In directed networks it is possible that some actors
cannot be reached by others but are themselves able to reach
other nodes.
[Figure: example network with nodes 1 to 11]
Node:          1   2   3      4      5      6      7      8      9      10   11
Betweenness:   0   0   0.37   0.22   0.22   0.22   0.22   0.48   0.37   0    0
56. Hue (from red=0 to blue=max) shows the node betweenness.
57. Eigenvector Centrality
Don Corleone did not have many strong
ties. He was a man of few words, yet he
could make an offer you can’t refuse.
Don Corleone surrounded himself with
his sons and his trusted capos, who in
turn, handled the day to day management
issues of the family.
58. Eigenvector Centrality
Make x_i proportional to the average of the centralities of i's
network neighbors:
\[ x_i = \frac{1}{\lambda} \sum_{j=1}^{n} A_{ij} x_j \]
where λ is a constant. In matrix-vector notation we can write
\[ x = \frac{1}{\lambda} A x \]
The value λ is an eigenvalue of matrix A if there exists a non-zero
vector x such that Ax = λx. Vector x is then an eigenvector of matrix A.
The largest eigenvalue is called the principal eigenvalue.
The corresponding eigenvector is the principal eigenvector.
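The principal eigenvector can be approximated by repeatedly applying x ← Ax and rescaling (power iteration). This is a minimal sketch under assumptions: the adjacency matrix is a hypothetical four-node network, and the method is only shown for a connected network that contains a triangle, where the iteration settles down; it is not presented as the way Gephi computes the measure.

```python
def eigenvector_centrality(adjacency, iterations=200):
    """Principal eigenvector of a symmetric 0/1 matrix via power iteration."""
    n = len(adjacency)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        # One step of x <- A x
        new_x = [sum(adjacency[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Rescale so the largest entry is 1; the scale factor approximates lambda.
        lam = max(new_x)
        x = [value / lam for value in new_x]
    return lam, x

# Hypothetical network: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
lam, x = eigenvector_centrality(A)
print(round(lam, 3), [round(v, 3) for v in x])
```

Node 2 ends up with the highest score, and node 3, whose only neighbor is node 2, scores lower than the triangle members even though its single contact is well connected.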
59. Centralities in comparison
• Degree: How many people can
this person reach directly?
• Betweenness: How likely is this
person to be the most direct
route between two people in
the network?
• Closeness: How fast can this
person reach everyone in the
network?
• Eigenvector: How well is this
person connected to other
well-connected people?
61. Process
1. Import Data with Netvizz
2. Process with Gephi
1. Open
2. Layout
3. Ranking (Degree)
4. Statistics
5. Ranking (Betweenness)
6. Layout (Size Adjust)
7. Labels
8. Community detection
9. Filter
10. Label Adjust
11. Preview
3. Export
62. Netvizz
1. Sign in to your Facebook account
2. Search for the Netvizz application
3. Choose parameters you would like to include in the data (e.g.
gender, wall posts count, interface language)
4. Analyze either
– Your personal friend network today
– [OR] one of your groups listed at the bottom
5. Wait for the application to create the .gdf file and download it
(right click, save as)
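If you want to inspect the downloaded .gdf file outside of Gephi, a small reader can be sketched in Python. This assumes the common GDF layout (a `nodedef>` header line, node rows, then an `edgedef>` header line and edge rows, all comma-separated); the sample data and its column names are hypothetical and will differ from what Netvizz actually exports.

```python
import csv
import io

def read_gdf(handle):
    """Parse a GDF stream into (nodes, edges) lists of dicts.

    Assumes a `nodedef>` header, node rows, an `edgedef>` header, edge rows.
    """
    nodes, edges = [], []
    columns, target = [], None
    for row in csv.reader(handle):
        if not row:
            continue
        if row[0].startswith(("nodedef>", "edgedef>")):
            target = nodes if row[0].startswith("nodedef>") else edges
            row[0] = row[0].split(">", 1)[1]
            # Header cells look like "name VARCHAR"; keep only the column name.
            columns = [cell.split()[0] for cell in row]
        elif target is not None:
            target.append(dict(zip(columns, row)))
    return nodes, edges

# Hypothetical sample standing in for a downloaded file.
sample = io.StringIO(
    "nodedef>name VARCHAR,label VARCHAR,sex VARCHAR\n"
    "1,Alice,female\n"
    "2,Bob,male\n"
    "3,Cara,female\n"
    "edgedef>node1 VARCHAR,node2 VARCHAR\n"
    "1,2\n"
    "2,3\n"
)
nodes, edges = read_gdf(sample)
print(len(nodes), len(edges))
```

For a real file, `open(path, newline="", encoding="utf-8")` can be passed instead of the in-memory sample.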
63. Gephi
• Gephi is an open-source network analysis and visualization
software package.
• Envisioned as providing "easy and broad access to network
data", it's advertised as being "Like Photoshop for graphs."
• Gephi has been used in a number of research projects in
academia, journalism, and elsewhere.
• The Gephi Team: Mathieu Bastian, Sebastien Heymann, Julian
Bilcke, Mathieu Jacomy, Franck Ghitalla
64. Gephi: 1. Open
• From File menu select
Open and then select
the .gdf file you saved
from Netvizz
• At first it looks like a big
hairball, so we'll change
the layout to make some
sense of the connections
65. Gephi: 2. Layout
• From the Layout module on
the left side choose Force
Atlas* from the dropdown
menu, then click Run
– Force atlas makes connected
nodes attract each other, while
unconnected nodes are
pushed towards the periphery
• Click stop when it seems
that the layout has
converged towards a stable
state
*For graphs with a large number of nodes or edges, choose the Yifan Hu layout instead
66. Gephi: 3. Ranking (Degree)
1. Choose the Ranking-Nodes
tab in the top left module
and choose Degree from the
dropdown menu
– Degree = number of
connections
2. Hover your mouse over the
gradient bar, then double click
on each triangle to choose a
color for each side of the
range
– Try to use bright colors for the
highest degree and dark for
lowest
3. Click Apply
67. Gephi: 4. Statistics
• Click the Statistics tab in the
top right module
• Click Run next to Average
path length
– Choose directed from the popup
menu
• Click Close when the graph
report shows up
68. Gephi: 5. Ranking (Betweenness)
• Return to Ranking in the
top left module and open the
"Choose a rank parameter"
dropdown
– Choose Betweenness Centrality
from the dropdown menu
• Click on the icon for size,
instead of color
– Set min size to 10 and max
size to 50 (experiment a little)
• Click Apply
69. Gephi: 6. Layout
• To keep the larger nodes
from overlapping smaller
ones, go to the Layout
tab and check the Adjust
by sizes box
• Click Run and then Stop
70. Gephi: 7. Labels
• Click the bold black T in
the toolbar at the bottom
of the window to turn
labels on
• Click the black letter A in
the same toolbar to select
the Size Mode for the
labels, and choose the node
size option
• Use the slider on the right
to adjust the size
• You can also change the
font style by clicking next
to the slider
71. Gephi: 8. Community Detection
• Go back to the statistics tab
on the right and click Run
next to Modularity
– Check randomize and click OK
• Go to the partition tab in the
top left module and click the
refresh arrow
• Choose modularity class
from the dropdown menu
– Right click to randomize colors
• Click Apply
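To get a feeling for what the Modularity statistic measures, the following sketch scores a given grouping of nodes with Newman's modularity Q. It is an illustration only: it evaluates a partition that is supplied by hand and does not search for the best partition the way Gephi's community detection does. The network and both groupings are hypothetical.

```python
def modularity(edges, community):
    """Newman's modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    inside = {}        # ties with both ends in the same community
    degree_sum = {}    # total degree of each community
    for u, v in edges:
        for node in (u, v):
            c = community[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            inside[c] = inside.get(c, 0) + 1
    return sum(
        inside.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

# Hypothetical network: two triangles joined by a single tie.
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
good = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}   # one class per triangle
bad = {1: "A", 2: "B", 3: "A", 4: "B", 5: "A", 6: "B"}    # classes cut across triangles

print(round(modularity(edges, good), 3))
print(round(modularity(edges, bad), 3))
```

A grouping that follows the two triangles scores clearly higher than one that cuts across them, which is the kind of grouping a modularity class is meant to capture.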
72. Gephi: 9. Filter
• Go to Filters in the top right
module and open the
Topology Folder
– Drag the degree range to the
box below ("Drag filter here")
• Click on Degree Range to
open the Parameters
– Click on the "0" and change it to
a slightly higher value
– This removes the nodes that are
not connected to many other
nodes
• Click Filter
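The effect of a degree range filter can be imitated in a few lines. This sketch assumes that the threshold is applied once to the degrees in the full network (not repeatedly until nothing changes); the network and the function name are hypothetical.

```python
def filter_by_degree(graph, min_degree):
    """Drop nodes whose degree in the full graph is below min_degree."""
    keep = {node for node, nbrs in graph.items() if len(nbrs) >= min_degree}
    # Ties to removed nodes disappear together with those nodes.
    return {node: graph[node] & keep for node in keep}

# Hypothetical undirected network as {node: set_of_neighbors}.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
filtered = filter_by_degree(graph, min_degree=2)
print(sorted(filtered))
```

Raising the lower bound from 0 to 2 removes the weakly connected node "d" and leaves the tightly knit core.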
73. Gephi: 10. Label Adjust
1. Go to the Layout module
on the left
2. Choose the Label Adjust layout
so that the labels do not
overlap
3. Click Run and then Stop
74. Gephi: 11. Preview
1. At the very top click on the
Preview tab
2. Under Node, check the box
"Show Labels"
3. Click Refresh at the bottom,
and choose your label font
4. Play around with the
options until you like your
graph (Don't forget to click
refresh every time)
75. Gephi: 12. Export
• To Export your graph for publication in SVG or PDF
click the Export button
• Save
76. Gephi: 13. Make sense out of it
[Figure: friendship network with labeled clusters]
• Friends from swimming club
• Roommate & swimming club
• Friends from staying in Japan
• Friends from studies at the University of Mannheim
• Friends from studies at the University of Waterloo
• Joined me on the exchange to Canada
• Friends from school
77. Hungry? Need More Data?
• Use NodeXL
• Write your own crawlers
(ask me)
• Use existing archives
– http://snap.stanford.edu/
– http://vlado.fmf.uni-lj.si/pub/networks/pajek
– http://vlado.fmf.uni-lj.si/pub/networks/data/ucinet/ucidata.htm
• Collect by surveys
78. Time to read a book on SNA. But which?
79. Interactive Summary
The biggest advantage I can gain by using SNA is…
The most important fact about SNA for me is…
The concept that made the most sense for me today was…
The biggest danger in using SNA is …
If I use SNA in the future, I will try to make sure that…
If I use SNA in my next project I will use it for …
I should change my perspective on networks in considering …
I have changed my opinion about SNA, finding out that…
I missed today that …
Before attending this seminar I didn't know that …
I wish we could have covered…
If I forget almost everything that I learned today, I will still remember …
The most important thing today for me was …