1. Dynamics in large scale networks
John Clements
Supervised by: Dr. Babak Farzad, Dr. Henryk Fukś
Brock University
jc09xs@brocku.ca
February 01 2016
John Clements (Brock University) Dynamics in large scale networks February 01 2016 1 / 65
2. Table of Contents
1 Introduction
  Definitions
2 A Brief History of Large network dynamics
  Patterns in the removal of nodes from large networks.
  Network properties
3 High Clustering
4 Node expiration
  Connectivity and node expiration.
  Degree
  Clustering Coefficient
  Conclusions
5 The server merger
  Graphical evolution
  The servers before the merger
  The merger.
  Degree differences.
6 Graph motifs
  Finding bounds on the 3 node subgraphs.
4. Graph Theory Definition
Graph
A graph G is an ordered pair (V(G), E(G)) consisting of a set V(G) of
vertices and a set E(G) of edges that form connections between them.
5. Network analysis Definitions
Degree
The degree of a vertex $v$ in a graph $G$, denoted $k_G(v)$, is the number of
edges of $G$ incident with $v$. [?]
Clustering coefficient:
The clustering coefficient of a node $v$ is:
\[ c_v = \frac{2T(v)}{k_v (k_v - 1)} \]
where $T(v)$ is the number of triangles $v$ is involved in (i.e. the number of
connected pairs of its neighbors). The clustering coefficient of a node of
degree 0 or 1 is set to 0. [?]
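The clustering coefficient definition above can be sketched in a few lines of Python. This is only an illustration: the adjacency representation (a dict mapping each node to the set of its neighbors) and the function name `clustering_coefficient` are assumptions made here, not part of the original analysis.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Local clustering coefficient c_v = 2 T(v) / (k_v (k_v - 1)).

    `adj` is assumed to map every node to the set of its neighbors
    in an undirected simple graph (hypothetical representation).
    """
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        # Degree 0 or 1: the coefficient is defined as 0.
        return 0.0
    # T(v): pairs of neighbors of v that are themselves connected.
    triangles = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * triangles / (k * (k - 1))
```

For example, in a triangle on nodes 1, 2, 3 with an extra pendant node 4 attached to node 1, node 1 has degree 3 and sits in one triangle, so its coefficient is 2·1/(3·2) = 1/3.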
7. A brief overview of large network dynamics
There is a truly enormous number of studies and analyses of real-world
large networks, including nearly every type of online network.
Alongside these studies are network models.
But the removal process of nodes from large networks has rarely been
studied empirically, and it has been incorporated into very few dynamic
network models.
Many dynamic models have been proposed; these often include an edge
removal process, while models that use a node removal process are much rarer.
Examples: six degrees of separation, the actor network.
These studies range from single snapshots and painstakingly gathered
survey data to event-based dynamic studies.
Most of these studies do not account for the removal of nodes.
Another area important to us is network modeling: many of the studies of
real-world networks propose a model or provide best fits of one.
Models
• Nodal attribute models
• Exponential random graphs
8. The two datasets.
The businesses competing for ad space on Google and Bing.
Why did we choose these networks in particular?
We looked at the removal, or lapse, process for businesses in a network of
businesses competing for ad space on Google and Bing.
The network of friendships among avatars in the MMOFPS
PlanetSide 2.
We looked for patterns in the removal, or expiration, of avatars in the
massively multiplayer online game (MMOG) PlanetSide 2.
• Look at the merger of two servers.
• Examine the removal or lapse process for avatars, looking for a simple rule.
Why did we collect these?
• We thought we could find patterns in the removal or lapse of these
nodes.
• Competition example
9. Crawler overview.
1 Gather a list of active avatar Ids from the server we want to crawl.
Add them to the queue of Ids to check.
2 Get the friend lists of all avatar Ids in the queue from the API. If a
request succeeds, remove that Id from the queue and add it to the list of
visited Ids.
3 Go through each friend list and identify which Ids are valid. Save each
of these valid friend relationships to the edge set.
4 While there are Ids in the queue, go to step 2.
5 Record the edge list in a SQL table.
6 Gather the avatar attributes for each of the Ids found in the crawl and
record them in a SQL table.
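Steps 1 to 4 of the crawler can be sketched roughly as follows. This is a minimal illustration, not the crawler that was actually used: `fetch_friends` and `is_valid` are hypothetical callables standing in for the game API and the validity check, the retry limit `max_attempts` is an assumption, and so is the choice to add newly discovered friends to the queue. The SQL steps 5 and 6 are left out.

```python
from collections import deque


def crawl(seed_ids, fetch_friends, is_valid=lambda avatar_id: True, max_attempts=3):
    """Breadth-first crawl of friend lists; returns (visited_ids, edge_set).

    fetch_friends(avatar_id) is a hypothetical stand-in for the API call;
    it should return an iterable of friend Ids or raise on failure.
    """
    queue = deque(dict.fromkeys(seed_ids))  # step 1: queue of Ids to check
    queued = set(queue)                     # every Id ever placed in the queue
    visited = set()
    attempts = {}
    edges = set()
    while queue:                            # step 4: loop until the queue is empty
        avatar_id = queue.popleft()
        try:
            friends = fetch_friends(avatar_id)  # step 2: request the friend list
        except Exception:
            # Failed request: retry later, up to max_attempts (assumed policy).
            attempts[avatar_id] = attempts.get(avatar_id, 0) + 1
            if attempts[avatar_id] < max_attempts:
                queue.append(avatar_id)
            continue
        visited.add(avatar_id)
        for friend_id in friends:           # step 3: keep only valid friendships
            if friend_id == avatar_id or not is_valid(friend_id):
                continue
            edges.add(tuple(sorted((avatar_id, friend_id))))
            if friend_id not in queued:     # assumption: discovered Ids are crawled too
                queued.add(friend_id)
                queue.append(friend_id)
    return visited, edges
```

Storing each friendship as a sorted pair keeps the edge set undirected, so a relationship reported from both ends is recorded once.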
10. PlanetSide 2 avatar attributes
The PlanetSide 2 servers
Datasets
Server location   7 days   44 days
US East           EW       Emerald
US West           CW       Connery
EU                MW       Miller
Our data is drawn from three
PlanetSide 2 servers:
• Connery, the US West server.
• Emerald, the US East server created from the merger of Waterson and Mattherson.
• Miller, an EU server.
The available avatar attributes depend on when the dataset was gathered.
Common to both datasets:
• Id
• Name
11. Exclusive
Avatars online in the last 44 days,
stored in Connery, Emerald and Miller.
• Includes the server merger
• Starts on the 23rd of May.
Avatars online in the last 7 days,
referred to by CW, EW and MW.
• Outfit Id
• Outfit size
• Creation date
• Login count
• Last login date
• Total time played and time played by month
• Number of kills and deaths by month.
12. Correlation between attributes
Average attribute correlation matrix for CW.
              Degree   CC       Br      Kills   Deaths  K/D     Time    Outfit Size
Degree         1.000
CC            -0.005    1.000
Br             0.305    0.059   1.000
Kills          0.210    0.006   0.499   1.000
Deaths         0.206    0.001   0.445   0.794   1.000
K/D            0.103    0.004   0.403   0.324   0.194   1.000
Time           0.246    0.004   0.510   0.792   0.892   0.280   1.000
Outfit Size    0.024   -0.033   0.088   0.003   0.065   0.008   0.056   1.000
CC: clustering coefficient; Br: battle rank; K/D: kill/death ratio.
13. Google Ad network visualization
14. High Clustering
Clustering coefficient in the PlanetSide 2 snapshots.
Advertisement networks.
16. Avatar states
Active vs Inactive
• An avatar with activity after the previous snapshot is active.
• An inactive avatar is any avatar that is not active but is seen active
again in the future.
Avatar states
• A new avatar is any avatar created after the previous snapshot.
• A dead or abandoned avatar is any avatar that never returns from
inactivity.
• The third group is the immediately abandoned (IA) new avatars.
17. Small world for long diameters
            Advertisement network    Random graphs
            Bing       Google        Bing               Google
Diameter    7          8             3                  4
APL         2.528      2.752         2.945 (0.00180)    5.108 (0.00696)
Table: The diameter and average path length (APL) of the competition network
18. The clustering coefficient distribution of Google.
Panels: Full; Without the spike
19. The clustering coefficient distribution of Bing.
Panels: Full; Without the spike
20. Emerald August 18th: a typical distribution of avatar clustering coefficient
Panels: Full; Without the spike
22. Edges connecting failed companies.
Compared with edges in random subgraphs.
23. Edge dynamics
                       CW        EW        MW
Edge formation
existing ↔ existing    29.97%    29.23%    25.15%
existing ↔ new          4.74%     5.27%     4.85%
existing ↔ IA           3.31%     3.71%     3.32%
new ↔ new               0.45%     0.53%     0.56%
new ↔ IA                0.30%     0.41%     0.39%
IA ↔ IA                 0.23%     0.29%     0.30%
Edge deletion
One removed            53.15%    54.70%    52.01%
Both removed            4.30%     5.02%     4.54%
Broken                  4.06%     4.23%     4.69%
Unstable                0.29%     0.30%     0.26%
24. Power law Degree Distribution
The impact of the degree
• Power law: $x^{-\alpha}$
• Exponential: $e^{-\lambda x}$
• Power law with exponential cutoff: $x^{-\alpha} e^{-\lambda x}$
25. Degree of failed companies
Panels: Bing; Google
26. Degree of Dead Avatars
27. Clustering coefficient distribution of failed companies
Panels: Bing; Google
28. The normalized battle rank distribution.
29. Avatar state by size of outfit.
31. Conclusions
• Generally, the nodes that were removed from both networks were
peripheral, in unimportant positions.
• But none of the patterns we did find were strong indicators in the end.
33. We initially started collecting the PlanetSide 2 dataset to capture the merger
of two servers.
• This is the first time that a server merger has been captured and
studied.
• It provides an easily studied analog to real-world mergers of populations.
34. Reading the graphs
These images were created with Gephi [?] using the ForceAtlas2 layout.
Node size scales linearly with degree, and colour is assigned by origin and
faction according to the following key.
Colour Key
Origin        Faction
              NC    TR    VS
Waterson
Mattherson
Neither
35. Merger: Waterson June 23
36. Merger: Mattherson June 23
37. June 30th
38. July 14th
39. August 4th
40. August 18th
45. Assortativity:
Assortativity measures the tendency of nodes to be connected to nodes
similar to themselves in some way.
The assortativity coefficient is defined as follows:
\[ r = \frac{\sum_i e_{i,i} - \sum_i a_i^2}{1 - \sum_i a_i^2} \]
where $e_{i,j}$ is the fraction of edges connecting vertices of type $i$ to
vertices of type $j$, and $a_i$ is the fraction of edges connecting to a vertex
of type $i$ (so that $a_i = \sum_j e_{i,j}$).
The minimum is:
\[ r_{\min} = \frac{-\sum_i a_i^2}{1 - \sum_i a_i^2} \]
which occurs when $e_{i,i} = 0$ for all $i$, i.e. when no edge connects two
vertices of the same type.
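A small sketch of how the assortativity coefficient could be computed from an edge list. It assumes an undirected graph in which each edge contributes half its weight to $e_{i,j}$ and half to $e_{j,i}$, so that the mixing matrix is symmetric and $a_i = \sum_j e_{i,j}$. The function name and the `node_type` mapping (for example, avatar Id to origin server) are hypothetical.

```python
from collections import defaultdict


def assortativity(edges, node_type):
    """Assortativity r = (sum_i e_ii - sum_i a_i^2) / (1 - sum_i a_i^2).

    `edges` is an iterable of (u, v) pairs; `node_type` maps each node
    to its type. Both structures are illustrative assumptions.
    """
    edges = list(edges)
    if not edges:
        raise ValueError("assortativity is undefined for a graph with no edges")
    ends = 2 * len(edges)  # every undirected edge has two ends
    e = defaultdict(float)
    for u, v in edges:
        tu, tv = node_type[u], node_type[v]
        e[(tu, tv)] += 1 / ends
        e[(tv, tu)] += 1 / ends
    types = {t for pair in e for t in pair}
    # a_i: fraction of edge ends attached to vertices of type i.
    a = {t: sum(e.get((t, s), 0.0) for s in types) for t in types}
    same_type = sum(e.get((t, t), 0.0) for t in types)
    sum_a2 = sum(x * x for x in a.values())
    if sum_a2 == 1.0:
        raise ValueError("assortativity is undefined when all edge ends share one type")
    return (same_type - sum_a2) / (1 - sum_a2)
```

With two types of equal weight, a graph whose edges all stay within a type gives r = 1, and one whose edges all cross between types gives the minimum, r = -1.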
46. The assortativity by origin.
47. Degree difference of cross origin edges.
48. Degree difference of Mattherson to Waterson edges.
50. Graph motif
Finding the potential motifs in a Barabási-Albert graph.
Definition
The graph motifs of a network are patterns that occur significantly more
often in it than expected in an ensemble of networks [?].
The significance of a motif is measured with a simple Z score.
Significance
\[ Z = \frac{M - \bar{M}_r}{\sigma_r} \]
where $M$ is the number of occurrences of the subgraph in the network, and
$\bar{M}_r$ and $\sigma_r$ are the mean and standard deviation of the number
found in the ensemble.
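The Z score can be illustrated with a short Python helper. The function name is hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the slides do not say which is meant.

```python
from statistics import mean, stdev


def motif_z_score(observed_count, ensemble_counts):
    """Z = (M - mean(M_r)) / sigma_r for one subgraph type.

    `ensemble_counts` holds the count of the same subgraph in each
    random graph of the ensemble (at least two graphs are needed).
    """
    mu = mean(ensemble_counts)
    sigma = stdev(ensemble_counts)  # assumption: sample standard deviation
    if sigma == 0:
        raise ValueError("the ensemble counts have zero spread; Z is undefined")
    return (observed_count - mu) / sigma
```

For instance, if a subgraph occurs 16 times in the network and 8, 10 and 12 times in three ensemble graphs, the ensemble mean is 10, the sample standard deviation is 2, and Z = 3.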
51. History of network motifs
Introduced by Shen-Orr et al.
Most research has focused on finding motifs efficiently, with tools such as:
• FANMOD 2006
• KAVOSH 2009
In 2013 Johan Ugander found extremal bounds on the subgraphs that can
occur in any network, as a function of its density.
55. The Barabási-Albert Algorithm
The algorithm takes two parameters: $N$, the number of nodes in the final
graph, and $m$, the number of edges each new node forms to existing nodes.
• Create a graph with $m$ unconnected nodes.
• While there are fewer than $N$ nodes in the network, add a node with $m$
edges to existing nodes.
• The probability of choosing an existing node $v$ is proportional to its
degree [?]:
\[ P(v) = \frac{k_v}{\sum_{i \in V(G)} k_i} \tag{1} \]
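The steps above can be sketched in plain Python as follows. This is one possible reading rather than the implementation used for the results: because the $m$ starting nodes all have degree 0, the sketch assumes the first added node connects to all of them, and it draws distinct targets by repeated sampling from a degree-weighted pool, which only approximates exact degree-proportional selection without replacement. The function name is hypothetical.

```python
import random


def barabasi_albert(n_nodes, m, seed=None):
    """Return the edge list of a BA-style graph on nodes 0..n_nodes-1."""
    if not 1 <= m < n_nodes:
        raise ValueError("need 1 <= m < n_nodes")
    rng = random.Random(seed)
    edges = []
    # Each node appears in `pool` once per incident edge, so a uniform
    # draw from the pool picks a node with probability k_v / sum_i k_i.
    pool = []
    # Assumption: the first new node links to all m initial nodes.
    targets = list(range(m))
    for new_node in range(m, n_nodes):
        edges.extend((new_node, target) for target in targets)
        pool.extend(targets)
        pool.extend([new_node] * m)
        # Pick m distinct targets for the next node, weighted by degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        targets = list(chosen)
    return edges
```

Every added node contributes exactly $m$ edges, so a graph built this way with $N$ nodes has $m(N - m)$ edges.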
56. Random graph ensemble
• Traditionally the ensemble consists of random graphs with the same
degree distribution as the original network.
• However, this method results in some correlations that arise from the
degree distribution itself.
• So we used an ensemble of $G_{n,p}$ random graphs with the same density
as the original.
$G_{n,p}$ random graph
There are two parameters, $n$ and $p$: generate a graph with $n$ nodes and, for
every pair of nodes, add an edge independently with probability $p$. The
expected density of such a graph is equal to $p$.
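A minimal sketch of the $G_{n,p}$ construction and of the density it is matched on. The function names are illustrative, and the quadratic loop over all node pairs is chosen for clarity rather than speed.

```python
import random
from itertools import combinations


def gnp_random_graph(n, p, seed=None):
    """Edge list of a G(n, p) graph: each pair is linked independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


def density(n, edges):
    """Fraction of the n(n-1)/2 possible edges that are present."""
    return 2 * len(edges) / (n * (n - 1))
```

Generating many such graphs with $p$ set to the density of the original network gives an ensemble whose average density matches the original, without fixing the degree distribution.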
58. The 4 undirected triads
Possible triads
Triad         Probability in $G_{n,p}$
Empty         $(1-p)^3$
One edge      $3p(1-p)^2$
Open triad    $3p^2(1-p)$
Triangle      $p^3$
The density of a completed BA graph is:
\[ p = \frac{2m(N-m)}{N(N-1)} \tag{2} \]
So we can easily compute the expected number of each triad in a $G_{n,p}$
random graph:
\[ \binom{N}{3}(1-p)^3, \qquad \binom{N}{3}\,3p(1-p)^2, \qquad \binom{N}{3}\,3p^2(1-p), \qquad \binom{N}{3}\,p^3 \]
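The expected triad counts can be evaluated directly. The sketch below assumes the density formula for a completed BA graph given above; the function names and the dictionary keys are illustrative choices.

```python
from math import comb


def ba_density(n_nodes, m):
    """Density p = 2m(N - m) / (N(N - 1)) of a completed BA graph."""
    return 2 * m * (n_nodes - m) / (n_nodes * (n_nodes - 1))


def expected_triads(n_nodes, p):
    """Expected number of each 3-node subgraph in a G(n, p) graph."""
    triples = comb(n_nodes, 3)  # number of 3-node subsets
    return {
        "empty": triples * (1 - p) ** 3,
        "one_edge": triples * 3 * p * (1 - p) ** 2,
        "open_triad": triples * 3 * p ** 2 * (1 - p),
        "triangle": triples * p ** 3,
    }
```

Since the four probabilities sum to 1, the four expected counts always add up to $\binom{N}{3}$, which is a convenient sanity check.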
59. Empirical tests
The triangles created by low values of m.
60. Probabilistic bounds.
The graph at time $t$ depends entirely on the parameters $N$ and $m$.
Since we start from an empty graph and add $m$ edges with every node, the
number of edges at any given time must be:
\[ |E(G_t)| = m(t - m) \tag{3} \]
As a result, many other graph parameters can be calculated at any given
step, such as the density:
\[ D = \frac{2m(t-m)}{t(t-1)} \tag{4} \]
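As a quick illustrative check of the edge count and density formulas, take $m = 2$ and $t = 10$ (values chosen here only as an example, not taken from the slides):

```latex
\[
  |E(G_{10})| = 2\,(10 - 2) = 16,
  \qquad
  D = \frac{2 \cdot 2 \cdot (10 - 2)}{10 \cdot 9} = \frac{32}{90} \approx 0.356 .
\]
```

This agrees with counting directly: a graph on 10 nodes has at most 45 edges, and 16/45 is approximately 0.356.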
Examples of edge probabilities and bounds.
61. Additive bounds.
• So when
\[ N > \frac{8m^2 - 1 + \sqrt{16m^3 - 16m^2 + 1}}{8m - 2} \]
the probability of an edge is greater in the BA graph than in a $G_{n,p}$ graph.
• We can simply count how many of each subgraph are added at each step, and
so find bounds, at least for small motifs.
• The expected number of any 3-node subgraph in our ensemble is simply
$\binom{N}{3} P$, where $P$ is the probability of such a subgraph in a $G_{n,p}$
random graph.
62. Bounds on triads in the BA graph vs the expected number in the
ensemble.
63. Triangles
At each timestep we add $m$ edges and at most $\binom{m}{2}$ triangles, and at
least $\binom{m}{2}$ open triads are created at each step after the second.
Barabási-Albert
The upper bound on the number of triangle subgraphs is:
\[ \sum_{t=m+1}^{N} \binom{m}{2} = \frac{1}{2} m(m-1)(N-m) \tag{5} \]
Random Graph Ensemble
The expected number of triangle subgraphs in a $G_{n,p}$ random graph is:
\[ \binom{N}{3} p^3 = \frac{4}{3}\,\frac{(N-2)(N-m)^3 m^3}{(N-1)^2 N^2} \tag{6} \]
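The closed form for the expected number of triangles follows from substituting the BA density $p = \frac{2m(N-m)}{N(N-1)}$ into $\binom{N}{3} p^3$. The intermediate step, written out here as a sketch for readers who want to verify it:

```latex
\[
  \binom{N}{3} p^3
  = \frac{N(N-1)(N-2)}{6} \cdot \frac{8\,m^3 (N-m)^3}{N^3 (N-1)^3}
  = \frac{4}{3}\,\frac{(N-2)(N-m)^3 m^3}{(N-1)^2 N^2} .
\]
```

The same substitution, with $p^3$ replaced by the probability of the triad in question, gives the ensemble expectations for the other three triads.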
64. Triangles
Trivially:
\[ \frac{1}{2} m(m-1)(N-m) \quad\text{vs.}\quad \frac{4}{3}\,\frac{(N-2)(N-m)^3 m^3}{(N-1)^2 N^2} \]
for all $0 < m < N$.
65. Open Triad.
Random Graph Ensemble
The expected number of open triads in a $G_{n,p}$ random graph is:
\[ \binom{N}{3}\,3p^2(1-p) = \frac{2(N-2)(N-m)^2 m^2 (N^2 - 2Nm + 2m^2 - N)}{(N-1)^2 N^2} \tag{5} \]
Barabási-Albert
The lower bound on the number of open triads in a BA graph exceeds this
expectation when:
\[ \frac{1}{2} m(m+1)(N-m) \ge \frac{2(N-2)(N-m)^2 m^2 (N^2 - 2Nm + 2m^2 - N)}{(N-1)^2 N^2} \tag{6} \]
66. Open Triad.
Solution
Therefore, when $m$ and $N$ are chosen such that $m \ge \frac{N}{2} - 1$, the
open triad will be a motif of the resulting graph.
67. One Edge.
Barabási-Albert
The minimum number of subgraphs containing a single edge:
\[ \sum_{t=m+1}^{N} \left[ \binom{t-1}{2} - \binom{t-1-m}{2} \right] = \frac{1}{2} m(N-2)(N-m) \tag{5} \]
Random Graph Ensemble
The expected number of subgraphs containing exactly one edge in a
$G_{n,p}$ random graph is:
\[ \binom{N}{3}\,3p(1-p)^2 = \frac{(N-2)(N-m)\,m\,(N^2 - 2Nm + 2m^2 - N)^2}{(N-1)^2 N^2} \tag{6} \]
68. One Edge.
So this bound on the number of one-edge subgraphs in the BA graph is greater
than the expected number in $G_{n,p}$ when both of the following hold:
\[ m < \frac{1}{2} N + \frac{1}{2}\sqrt{(\sqrt{2}-1)N^2 + (2-\sqrt{2})N} \tag{5} \]
\[ m > \frac{1}{2} N - \frac{1}{2}\sqrt{(\sqrt{2}-1)N^2 + (2-\sqrt{2})N} \tag{6} \]
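The thresholds on $m$ can be checked numerically. The sketch below assumes the two one-edge expressions from the slides, $\frac{1}{2}m(N-2)(N-m)$ for the BA graph and $\binom{N}{3}\,3p(1-p)^2$ for the ensemble with the BA density $p = \frac{2m(N-m)}{N(N-1)}$. Dividing one by the other reduces the comparison to $(1-p)^2 < \frac{1}{2}$, which is where the $\sqrt{2}$ terms come from. All function names are hypothetical.

```python
from math import sqrt


def one_edge_ba_bound(n, m):
    """BA-side count of one-edge triads: m (N - 2)(N - m) / 2."""
    return m * (n - 2) * (n - m) / 2


def one_edge_expected(n, m):
    """Expected one-edge triads in G(n, p) with the BA density."""
    q = n * n - 2 * n * m + 2 * m * m - n
    return (n - 2) * (n - m) * m * q * q / ((n - 1) ** 2 * n ** 2)


def one_edge_thresholds(n):
    """Roots of the quadratic in m at which the two counts are equal."""
    half_width = sqrt((sqrt(2) - 1) * n * n + (2 - sqrt(2)) * n) / 2
    return n / 2 - half_width, n / 2 + half_width
```

For $N = 20$ the thresholds are roughly 3.34 and 16.66, so the BA-side count exceeds the ensemble expectation for $m$ from 4 to 16 and falls below it for $m \le 3$ and $m \ge 17$.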
69. Empty.
Barabási-Albert
The upper bound on the number of empty triads in a BA graph is:
\[ \binom{m}{3} + \sum_{t=m+1}^{N} \binom{t-1-m}{2} = \frac{1}{6}(N-2)(N^2 - 3Nm + 3m^2 - N) \tag{7} \]
Random Graph Ensemble
The expected number of empty triads in a $G_{n,p}$ graph is:
\[ \binom{N}{3}(1-p)^3 = \frac{1}{6}\,\frac{(N-2)(N^2 - 2Nm + 2m^2 - N)^3}{(N-1)^2 N^2} \tag{8} \]
70. Empty.
For all $N > 5$ the upper bound is less than the expected value of empty
subgraphs in the ensemble.
71. Final bounds
Probabilistic bounds
When the difference between $m$ and $N$ is such that
\[ N \ge \frac{8m^2 - \sqrt{16m^3 - 16m^2 + 1} - 1}{8m - 2} \]
holds, the triangle and empty triads will never be motifs.
72. Final bounds
Additive bounds
The open triad will be a motif of a BA graph whenever
\[ m \ge \frac{N}{2} - 1 \]
for any $N > 2$.
The single-edge triad can only be a motif when
\[ m < \frac{1}{2} N + \frac{1}{2}\sqrt{(\sqrt{2}-1)N^2 + (2-\sqrt{2})N} \]
and
\[ m > \frac{1}{2} N - \frac{1}{2}\sqrt{(\sqrt{2}-1)N^2 + (2-\sqrt{2})N} \]
for any valid $N$.
73. Future work
The formation of a server's social network.
Continuing from the server merger, we have records of the formation of
several new servers that could be investigated, letting us learn how the
original servers' structure came to be.
Identifying the social structure of players
Continuing from the work on the removal of players from the PlanetSide 2
network and from the ad network, it would be very helpful to have a better
way of identifying the parent players or companies. This could change
the networks' structure greatly.
75. Future Work: Additional database analysis.
By their very nature, large datasets will always have more unanswered
questions. There are a huge number of potential relationships between the
networks, the actors and their removal that we did not have time to test,
for example how many of the removed avatars had a typo in their name.
This suggests some future work that seems interesting but is either outside
the scope of large network analysis or simply something we did not have
time to do.