Prediction in Dynamic Graph Sequences
Emile Richard
CMLA-ENS Cachan & 1000mercis
Supervisors: Th. Evgeniou (INSEAD) and N. Vayatis (CMLA-ENS Cachan)
January 20, 2012
Table of contents
Context
Motivation
Data Description
Problem Formulation
Random Graph Models
Link Prediction Heuristics
Framework
Algorithms
Two-stage optimization
Joint Optimization in W and S
Variants
Discussion
References
From Big Data to Business Decisions
1000mercis: interactive marketing and advertisement
(emailing, mobile, viral games)
1. Send fewer ads: email is free, so it is easy to overwhelm consumers
2. Make consumers happy: serendipity
3. Act sustainably: avoid long-term fatigue
4. Earn more: up to 5 times!
Prediction in Relational Databases?
Recommender systems
Links: to select recommendations, offline fine-tuning
Sales volumes: prepare for or push trends
Resource allocation: consumers and contributors in UGC [Zhang11], stock management
Understanding of data through relevant feature extraction
[Figures: log-scale weekly counts over 300 weeks of returning and of new sellers, products, buyers, and commission]
Similar Problems
The Netflix Prize: $1M for a 10% improvement in accuracy
Amazon: 35% of sales generated by recommendations [Linden03]
CRM optimization: acquisition, cross-selling, churn management, prediction of top-selling items, etc.
Similar Problems in Computational Biology¹
Understanding the underlying mechanisms of biological
systems
Inference procedures for analysis of effects of biological
pathways in cancer progression
Study the effect of potential drugs/treatments on gene
regulatory networks in cancer cells
¹ After a discussion with Ali Shohaie
Case Study
Data: C-to-C website
Recommendation newsletters and banners
Management of promotional assets and pressure on users
Domain       | users | products | daily sales
Music        | 0.4M  | 60K      | 2K
Books        | 1.2M  | 1.7M     | 18K
Electronics  | 0.5M  | 60K      | 2K
Video Games  | 0.9M  | 0.2M     | 9K
Heterogeneous Domains
[Figures: per-domain densities (Video Games, Music, Electronic Devices, Books) of log(clustering coefficient) on the user side and on the product side; densities of log(degree), log(d^(2)/degree), and log(d^(3)/d^(2)) on both sides; joint user × product degree distributions for Books and Music]
Dynamic Graphs
Nodes linked by edges that appear over time
Web applications, Economics, Biology, Drug discovery
(Social networks users, Friendship)
(Users and products, Purchases or clicks)
(Websites, Hyperlinks)
(Proteins, Interaction)
Prediction at Descriptor (macro) and Edge (micro) Levels
Network effect: cause and symptom of the evolution of node features, e.g. popularity, homophily, centrality, diffusion level
Simultaneously predict node features and future links
Complex Networks?
Degrees of freedom ∼ n², with n the number of nodes
Latent factors: r ≪ n, with r the number of latent factors
Intrinsic dimensionality reduced to ∼ rn ≪ n²
Kepler’s Laws of networks
Random Graph Models
Erdős–Rényi [Bollobas01]: nodes connected with uniform probability. No chance of prediction
Preferential Attachment [Albert02]: reproduces power-law degree distributions. Rich-get-richer
Block models [Nowicki01]: k blocks or clusters form the structure of the graph. Community structure
Latent Factor Model [Hoff02, Krivitsky10]: node latent factors z_i, z_j, pair-wise covariate descriptors x_{i,j}

P(Y | X, Z, θ) = ∏_{i≠j} P(Y_{i,j} | X_{i,j}, Z_i, Z_j, θ)

log odds(y_{i,j} = 1 | x_{i,j}, z_i, z_j, α, β) ∝ α − β x_{i,j} + ‖z_i − z_j‖₂
Parameter Estimation
Exponential Random Graph Families[Wasserman96]
Graph z: realization of a random variable Z

P_θ(Z = z) = exp(θᵀω(z) − Ψ(θ))

θ ∈ R^Q : vector of parameters
ω : sufficient statistics on the graph z, ω(z) ∈ R^Q
Ψ : a normalization factor
Parameter Estimation by Maximizing Log-likelihood
Nearest Neighbors and Walks
Hypothesis: a graph G is partially observed; we aim to find the hidden edges [Kleinberg07]
Friends of my friends are likely to be my friends.
A ∈ {0, 1}^{n×n} : the social adjacency matrix

(A²)_{i,j} = Σ_{k=1}^n A_{i,k} A_{k,j} = # paths of length 2 from i to j = # common friends of i and j

Random Walks
Take W = D^{−1} A, where D is the diagonal matrix of degrees

Katz = Σ_{k=1}^∞ β^k W^k = (I_n − βW)^{−1} − I_n
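These two heuristics can be sketched directly in NumPy; the 4-node graph and the value of β below are made-up illustrations, not data from the deck:

```python
import numpy as np

# Toy undirected friendship graph (4 nodes; adjacency chosen for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

# Common-neighbor scores: (A^2)_{ij} = number of length-2 paths from i to j.
common = A @ A

# Katz index on the random-walk matrix W = D^{-1} A:
# sum_{k>=1} beta^k W^k = (I - beta W)^{-1} - I; the series converges
# because W is row-stochastic (spectral radius 1) and beta < 1.
W = np.diag(1.0 / A.sum(axis=1)) @ A
beta = 0.5
katz = np.linalg.inv(np.eye(4) - beta * W) - np.eye(4)
```

Nodes 0 and 3 are not adjacent but share neighbor 1, so `common[0, 3]` is 1 and their Katz score is positive.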
Bipartite Graphs of Marketplaces
[Figure: bipartite graph linking users u1–u4 to products p1–p5]
Who bought this also bought that.
M ∈ {0, 1}^{#users×#products} : transactions
(M Mᵀ M)_{i,j} : number of times product j was purchased by users having purchased the same products as a given user i
Random Walks: apply the unipartite formulas to the block matrix
[ 0   M ]
[ Mᵀ  0 ]
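A minimal sketch of these bipartite scores, with a hypothetical 4-user × 5-product transaction matrix (the entries are invented for illustration):

```python
import numpy as np

# Hypothetical transaction matrix: 4 users x 5 products.
M = np.array([[1, 1, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 0, 1]], dtype=float)

# (M M^T M)_{ij}: co-purchase score of product j for user i,
# i.e. purchases of j by users who share products with user i.
scores = M @ M.T @ M

# Random-walk heuristics reuse the unipartite formulas on the symmetric
# block matrix [[0, M], [M^T, 0]], whose nodes are users then products.
n_u, n_p = M.shape
B = np.block([[np.zeros((n_u, n_u)), M],
              [M.T, np.zeros((n_p, n_p))]])
```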
Low-Rank
A = U diag(σ_i) Vᵀ (SVD)
Define ‖X‖_∗ = Σ_i σ_i(X)
and D_τ(A) = U diag(max(σ_i − τ, 0)) Vᵀ : the shrinkage operator
The rank-r matrix closest to A is U diag(σ_1, …, σ_r, 0, …, 0) Vᵀ

Fact: argmin_X ½‖X − A‖_F² + τ‖X‖_∗ = D_τ(A)
[Figure: block-wise adjacency matrix, 60 × 60, nz = 1400]
Matrix Completion [Srebro05, Candes08, Koltchinskii11] estimates A by minimizing
½‖ω(A) − ω(X)‖₂² + τ‖X‖_∗
for a linear mapping ω : R^{n×n} → R^Q
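A sketch of the shrinkage operator D_τ via the SVD; the random 6 × 6 test matrix is an illustrative assumption, and the stated Fact can be checked numerically by comparing objective values:

```python
import numpy as np

def shrink(A, tau):
    """Singular value shrinkage: D_tau(A) = U diag(max(sigma_i - tau, 0)) V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def objective(X, A, tau):
    # (1/2)||X - A||_F^2 + tau * ||X||_*
    return (0.5 * np.linalg.norm(X - A, 'fro') ** 2
            + tau * np.linalg.svd(X, compute_uv=False).sum())

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
S = shrink(A, 1.0)  # should beat both A and nearby perturbations on the objective
```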
Link Prediction: Statistical and Spectral Properties
Statistics on the number of triangles and the lengths of paths in the graph are stable
Spectral functions [Kunegis09] of the adjacency and stochastic matrices kill low eigenvalues
If A = U diag(σ_i) Vᵀ is the SVD, then U diag(f(σ_i)) Vᵀ is called a spectral function.
[Figure: spectral functions f(σ) on [0, 1]: σ², ∝ (1 − βσ)^{−1} − 1, and max(σ − τ, 0)]
Leading Insight
Link prediction heuristics implicitly suggest that
1. the graph sequence fits some slowly varying feature map
2. the spectrum of the graphs is regular
Define a regularization formulation of the problem in order to leverage the trade-offs and select the best features.
Obstacle to matrix completion: ω(A) is to be predicted.
Notations
Time steps t ∈ {1, 2, …, T}
Adjacency matrices A_t ∈ {0, 1}^{n×n} : the graph sequence
Feature map ω : R^{n×n} → R^Q, linear (degrees, clusters)
Q ≪ n²
Prediction of A_{T+1} : score matrix S ∈ R^{n×n}
Assumptions
1. Stationarity of successive feature vectors
∃f : R^Q → R^Q, ∀t, ω(A_{t+1}) = f(ω(A_t)) + ε_t
2. Simplicity of S
S low rank [Srebro05]; penalize the trace norm ‖S‖_∗
Quantities to control
1. Features predictor
J₁(f) = Σ_{t=1}^{T−1} ℓ(ω(A_{t+1}), f(ω(A_t))) + κ‖f‖_H
2. Predicted features matching the predicted graph features (coupling term)
J₂(f, S) = ℓ(ω(S), f(ω(A_T)))
3. Penalty on S
J₃(S) = τ‖S‖_∗
Convex Optimization Problem
Let
X = [ω(A₁); …; ω(A_{T−1})], Y = [ω(A₂); …; ω(A_T)] ∈ R^{(T−1)×Q}
(feature vectors stacked as rows). We take linear predictors, f(ω) = ωᵀW, and define the convex objective
L = J₁ + J₂ + J₃ = ½‖XW − Y‖_F² + (κ/2)‖W‖_F² + ½‖ω(A_T)ᵀW − ω(S)‖₂² + τ‖S‖_∗
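The objective can be sketched numerically; the random linear feature map P, the sequence of random graphs, and the values of κ and τ below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, Q, T = 8, 4, 6

# Hypothetical linear feature map: omega(A) = P @ vec(A).
P = rng.standard_normal((Q, n * n)) / n

def omega(A):
    return P @ A.ravel()

# Random adjacency matrices standing in for A_1, ..., A_T.
As = [(rng.random((n, n)) < 0.3).astype(float) for _ in range(T)]
X = np.stack([omega(A) for A in As[:-1]])  # rows omega(A_1) ... omega(A_{T-1})
Y = np.stack([omega(A) for A in As[1:]])   # rows omega(A_2) ... omega(A_T)

def loss(W, S, kappa=0.1, tau=0.1):
    # L = 1/2 ||XW - Y||_F^2 + kappa/2 ||W||_F^2
    #     + 1/2 ||omega(A_T)^T W - omega(S)||_2^2 + tau ||S||_*
    fit = 0.5 * np.linalg.norm(X @ W - Y, 'fro') ** 2
    ridge = 0.5 * kappa * np.linalg.norm(W, 'fro') ** 2
    couple = 0.5 * np.linalg.norm(omega(As[-1]) @ W - omega(S)) ** 2
    trace = tau * np.linalg.svd(S, compute_uv=False).sum()
    return fit + ridge + couple + trace
```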
Split and Alternately Minimize
Splitting: L_η(S, S̄) = τ‖S‖_∗ + h(S̄, ν), subject to S = S̄
Alternately minimize in S and S̄:
m_G(S̄) = argmin_S τ‖S‖_∗ + ⟨∇h(S̄), S − S̄⟩ + (1/2µ)‖S − S̄‖_F²
m_H(S) = argmin_{S̄} h(S̄, ν) + ⟨τ ∂‖S‖_∗, S̄ − S⟩ + (1/2µ)‖S̄ − S‖_F²
Algorithm 1: Link Discovery Algorithm
Parameters: τ, ν, η
Initialization: W₀ = Z₁ = A_T, α₁ = 0
for k = 1, 2, … do
    S_k ← m_G(Z_k) and S̄_k ← m_H(S_k)
    W_k ← ½(S_k + S̄_k)
    α_{k+1} ← ½(1 + √(1 + 4α_k²))
    Z_{k+1} ← W_k + (α_k(S_k − W_{k−1}) − (W_k − W_{k−1})) / α_{k+1}
end for
Joint Optimization in W and S
Minimization of L by proximal gradient descent

L(S, W) = g(S, W) + Γ(S, W)
g(S, W) = ½‖XW − Y‖_F² + ½‖ω(A_T)ᵀW − ω(S)‖₂² : smoothly differentiable fit term
Γ(S, W) = (κ/2)‖W‖_F² + τ‖S‖_∗ : convex penalty

Explicit proximal operator:
prox_{θΓ}(S, W) = argmin_{(Z,V)} θΓ(Z, V) + ½‖S − Z‖_F² + ½‖W − V‖_F²
              = (D_{θτ}(S), W/(1 + θκ))

(S_{k+1}, W_{k+1}) = prox_{θ_k Γ}((S_k, W_k) − θ_k ∇g(S_k, W_k))

FISTA [Beck09] for the optimal convergence rate
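The closed-form proximal step above can be sketched as follows; the 2 × 2 inputs and parameter values are made up to make the effect visible (singular values of S shrink by θτ, W shrinks by the ridge factor):

```python
import numpy as np

def prox_gamma(S, W, theta, tau, kappa):
    """Closed-form prox of theta * Gamma:
    shrink the singular values of S by theta*tau (nuclear-norm part)
    and scale W by 1 / (1 + theta*kappa) (ridge part)."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    S_new = U @ np.diag(np.maximum(s - theta * tau, 0.0)) @ Vt
    return S_new, W / (1.0 + theta * kappa)

# Diagonal S makes the shrinkage explicit: singular values 3 and 1 become 1 and 0.
S_new, W_new = prox_gamma(np.diag([3.0, 1.0]), np.ones((2, 2)),
                          theta=1.0, tau=2.0, kappa=1.0)
```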
Variant 1: Graph Regularization Constraint
Want i ∼_S j ⇒ f(i) ∼_H f(j)
Control the Laplacian-like [Chen10] inner product
J₄(f, S) = Σ_{i,j} S_{i,j} ‖f(i) − f(j)‖_H² = ⟨S, (‖f(i) − f(j)‖_H²)_{i,j}⟩
Other possibility: J₄(f, S) = ⟨S, Gram(f)⟩
L_{graph regularization} = L + λJ₄
Issue: non-convex regularizers
Algorithms:
1. Gradient descent with hyper-parameters that keep the objective inside the convexity domain
2. Projected gradient descent inside the convexity domain
Gradient Descent Convergence Area
Empirical Results
Data                   |     Marketing     |         Synthetic
Method                 | ∆Sales |  ∆Graph  |   ∆Sales    |   ∆Graph
Our solution           |  0.62  |   0.28   | 0.13 ± .002 | 0.21 ± .003
Rank-free prediction   |  0.64  |   0.31   | 0.19 ± .008 | 0.24 ± .01
AR                     |  0.80  |    -     | 0.66 ± .007 |     -
ARIMA                  |  0.78  |    -     | 0.17 ± .02  |     -
VAR                    |  1.02  |    -     | 0.42 ± .09  |     -
MC with shrinkage      |   -    |   0.38   |      -      | 0.22 ± .003

Sales prediction metric (to be minimized): ∆Sales = ‖ω(A_{T+1}) − f(ω(A_T))‖₂ / ‖ω(A_{T+1})‖₂
Graph completion metric (to be minimized): ∆Graph = ‖A_{T+1} − S‖_F / ‖A_{T+1}‖_F
Convexity Domain
[Figure: surfaces over (s, w) of J₄ ≈ sw², of κ|f|² + ν|S − A_T|² ≈ s² + w², and of their sum λJ₄ + κ|f|² + ν|S − A_T|² ≈ sw² + s² + w²]
J₄ is not jointly convex in (S, f)
λJ₄ + κ‖W‖_F² + ν‖S − A_T‖_F² is convex inside
E = { S ∈ R₊^{n×n}, W ∈ R^{n×d} : ‖W‖_F² ≤ √(νκ) / (2λ) }
Variant 2: Sparsity Constraint
L_sparse(S, W) = L(S, W) + γ‖S‖_{1,1} (lasso)
Split S into S and S̄ and add an equality constraint
Synthetic data: n = 100, Q = 15, T = 200
10 runs for cross-validation, 10 runs for test
AUC on S reported

Nearest Neighbors   Static Low Rank   L_sparse          L
0.9767 ± 0.0076     0.9751 ± 0.0362   0.9812 ± 0.0008   0.9778 ± 0.0071
Synthetic Data Generation
Let, for all k ∈ {1, …, r},
U_t^{(i,k)} = (1 / (√(2π) σ_{i,k})) exp(−(t − µ_{i,k})² / (2σ_{i,k}²)) + ε_{i,k}
quantify the taste of user i for feature k at time t, and V_t^{(i,k)} the weight of feature k for item i, and take
A_t^{(i,j)} = 1{U^{(i)}(t) > θ} · 1{V^{(j)}(t) > θ}
A_t is
1. sparse,
2. of rank at most r,
3. and its latent factors evolve slowly, provided the σ's are not too small.
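A sketch of one possible reading of this generator (sizes, the threshold θ, and the per-feature interpretation of the indicator product are illustrative assumptions, not the deck's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, r, T = 30, 20, 3, 50
theta = 0.03  # hypothetical threshold

# Gaussian bumps in time: user i's taste for latent feature k peaks at mu_u[i, k].
mu_u = rng.uniform(0, T, (n_users, r))
sig_u = rng.uniform(5, 15, (n_users, r))
mu_v = rng.uniform(0, T, (n_items, r))
sig_v = rng.uniform(5, 15, (n_items, r))

def bumps(t, mu, sig):
    # Gaussian density in t, one value per (node, feature) pair.
    return np.exp(-(t - mu) ** 2 / (2 * sig ** 2)) / (np.sqrt(2 * np.pi) * sig)

def adjacency(t):
    U = bumps(t, mu_u, sig_u)  # (n_users, r) latent tastes at time t
    V = bumps(t, mu_v, sig_v)  # (n_items, r) latent item weights at time t
    # Edge (i, j) if some feature k is active on both sides; before thresholding
    # to {0, 1}, the product (U > theta)(V > theta)^T has rank at most r.
    return ((U > theta).astype(float) @ (V > theta).astype(float).T > 0).astype(float)
```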
Scalability
D_τ(A) is dense, even for sparse A
Fact [Srebro05]: ‖S‖_∗ = ½ min_{UVᵀ=S} (‖U‖_F² + ‖V‖_F²)
Instead of fixing τ, fix r and take U, V ∈ R^{n×r}
Define
J(U, V, W) = ½‖XW − Y‖_F² + ½‖ω(A_T)ᵀW − ω(UVᵀ)‖₂² + (κ/2)‖W‖_F² + (λ/2)(‖U‖_F² + ‖V‖_F²)
Parallel Stochastic Gradient Algorithms [Recht11]
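The Srebro fact can be checked numerically: the balanced factorization U√Σ, V√Σ taken from the SVD attains the bound, so half the sum of its squared Frobenius norms equals the nuclear norm (the random 8 × 8 matrix is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((8, 8))

# Nuclear norm via the SVD.
U, s, Vt = np.linalg.svd(S)
nuc = s.sum()

# Balanced factors F1 = U sqrt(Sigma), F2 = V sqrt(Sigma) satisfy F1 F2^T = S
# and attain ||S||_* = (1/2)(||F1||_F^2 + ||F2||_F^2).
F1 = U @ np.diag(np.sqrt(s))
F2 = (np.diag(np.sqrt(s)) @ Vt).T
half_sum = 0.5 * (np.linalg.norm(F1, 'fro') ** 2 + np.linalg.norm(F2, 'fro') ** 2)
```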
Store Recommendation Lists
Each feature leads to a specific list of recommendations
Store top-k lists
Learn optimal combinations / aggregations
... work in progress
Conclusion
Introduction of a regularization formulation for link prediction in graph sequences
Several variants detailed and empirically tested
Perspective for scalable algorithms
Perspective for theoretical analysis and understanding of the
problem
References
Réka Albert and Albert-László Barabási.
Statistical mechanics of complex networks.
Reviews of Modern Physics, 74:47–97, 2002.
A. Beck and M. Teboulle.
A fast iterative shrinkage-thresholding algorithm for linear
inverse problems.
SIAM Journal of Imaging Sciences, 2(1):183–202, 2009.
B. Bollobás.
Random Graphs, volume 73 of Cambridge Studies in Advanced Mathematics, 2nd edition.
Cambridge University Press, Cambridge, 2001.
Emmanuel J. Candès and Terence Tao.
A singular value thresholding algorithm for matrix completion.
SIAM Journal on Optimization, 20(4):1956–1982, 2008.
Xi Chen, Seyoung Kim, Qihang Lin, Jaime G. Carbonell, and
Eric P. Xing.
Graph-structured multi-task regression and an efficient
optimization method for general fused lasso.
arXiv, 2010.
Donald Goldfarb and Shiqian Ma.
Fast alternating linearization methods for minimizing the sum
of two convex functions.
Technical Report, Department of IEOR, Columbia University,
2009.
P. D. Hoff, A. E. Raftery, and M. S. Handcock.
Latent space approaches to social network analysis.
Journal of the Royal Statistical Society, 97, 2002.
David Liben-Nowell and Jon Kleinberg.
The link-prediction problem for social networks.
Journal of the American Society for Information Science and
Technology, 58(7):1019–1031, 2007.
Vladimir Koltchinskii, Karim Lounici, and Alexandre Tsybakov.
Nuclear norm penalization and optimal rates for noisy matrix
completion.
Annals of Statistics, 2011.
P. N. Krivitsky and M. S. Handcock.
A Separable Model for Dynamic Networks.
ArXiv e-prints, November 2010.
Jérôme Kunegis and Andreas Lommatzsch.
Learning spectral graph transformations for link prediction.
In Proceedings of the 26th Annual International Conference on
Machine Learning, ICML ’09, pages 561–568, New York, NY,
USA, 2009. ACM.
G. Linden, B. Smith, and J. York.
Amazon.com recommendations: Item-to-item collaborative filtering.
IEEE Internet Computing, 2003.
K. Nowicki and T. Snijders.
Estimation and prediction for stochastic blockstructures.
Journal of the American Statistical Association, 96:1077–1087, 2001.
Benjamin Recht and Christopher Re.
Parallel stochastic gradient algorithms for large-scale matrix
completion.
Submitted for publication, 2011.
Emile Richard, Nicolas Baskiotis, Theodoros Evgeniou, and
Nicolas Vayatis.
Link discovery using graph feature tracking.
Proceedings of Neural Information Processing Systems (NIPS),
2010.
Nathan Srebro, Jason D. M. Rennie, and Tommi S. Jaakkola.
Maximum-margin matrix factorization.
In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Proceedings of Neural Information Processing Systems 17, pages 1329–1336. MIT Press, Cambridge, MA, 2005.
Stanley Wasserman and Philippa Pattison.
Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p∗.
Psychometrika, 61(3):401–425, September 1996.
K. Zhang, Th. Evgeniou, V. Padmanabhan, and E. Richard.
Content contributor management and network effects in a ugc
environment.
Marketing Science, 2011.