Prediction in Dynamic Graph Sequences
Emile Richard
CMLA-ENS Cachan & 1000mercis
Supervisors: Th. Evgeniou (INSEAD) and N. Vayatis (CMLA-ENS Cachan)
January 20, 2012
Table of contents
Context
Motivation
Data Description
Problem Formulation
Random Graph Models
Link Prediction Heuristics
Framework
Algorithms
Two-stage optimization
Joint Optimization in W and S
Variants
Discussion
References
From Big Data to Business Decisions
1000mercis: interactive marketing and advertisement
(emailing, mobile, viral games)
1. Send fewer ads: email is free, so it is easy to overwhelm consumers
2. Make consumers happy: serendipity
3. Act sustainably: avoid long-term fatigue
4. Earn more: up to 5 times!
Prediction in Relational Databases?
Recommender systems
Links: to select recommendations, offline fine-tuning
Sales volumes: prepare for or push trends
Resource allocation: consumers and contributors in UGC [Zhang11], stock management
Understanding of data through relevant feature extraction
[Figures: log-scale weekly counts over 300 weeks of returning and of new sellers, products, buyers, and commission]
Similar Problems
The Netflix Prize: $1M for a 10% improvement in accuracy
Amazon: 35% of sales generated by recommendations [Linden03]
CRM optimization: acquisition, cross-selling, churn management, prediction of top-selling items, etc.
Similar Problems in Computational Biology¹
Understanding the underlying mechanisms of biological
systems
Inference procedures for analysis of effects of biological
pathways in cancer progression
Study the effect of potential drugs/treatments on gene
regulatory networks in cancer cells
¹ After a discussion with Ali Shohaie
Case Study
Data: C-to-C website
Recommendation newsletters and banners
Management of promotional assets and pressure on users
Domain       | users | products | daily sales
Music        | 0.4M  | 60K      | 2K
Books        | 1.2M  | 1.7M     | 18K
Electronics  | 0.5M  | 60K      | 2K
Video Games  | 0.9M  | 0.2M     | 9K
Heterogeneous Domains
[Figures: per-domain densities (Video Games, Music, Electronic Devices, Books) of log(clustering coefficient) on the user side and on the product side; densities of log(degree), log(d^(2)/degree), and log(d^(3)/d^(2)) on both sides; joint user × product degree distributions for Books and Music]
Dynamic Graphs
Nodes linked by edges that appear over time
Web applications, Economics, Biology, Drug discovery
(Social networks users, Friendship)
(Users and products, Purchases or clicks)
(Websites, Hyperlinks)
(Proteins, Interaction)
Prediction at Descriptor (macro) and Edge (micro) Levels
Network effect: cause and symptom of the evolution of node features, e.g. popularity, homophily, centrality, diffusion level
Simultaneously predict node features and future links
Complex Networks?
Degrees of freedom ∼ n², with n the number of nodes
Latent factors: r ≪ n, with r the number of latent factors
Intrinsic dimensionality reduced to ∼ rn ≪ n²
Kepler’s Laws of networks
Random Graph Models
Erdős–Rényi [Bollobas01]: nodes connected with uniform probability. No chance of prediction
Preferential Attachment [Albert02]: reproduces power-law degree distributions. Rich-get-richer
Block models [Nowicki01]: k blocks or clusters form the structure of the graph. Community structure
Latent Factor Model [Hoff02, Krivitsky10]: node latent factors z_i, z_j, pair-wise covariate descriptors x_{i,j}

P(Y | X, Z, θ) = ∏_{i≠j} P(Y_{i,j} | X_{i,j}, Z_i, Z_j, θ)

log odds(y_{i,j} = 1 | x_{i,j}, z_i, z_j, α, β) ∝ α − β x_{i,j} + ‖z_i − z_j‖₂
Parameter Estimation
Exponential Random Graph Families[Wasserman96]
Graph z: realization of a random variable Z

P_θ(Z = z) = exp(θᵀω(z) − Ψ(θ))

θ ∈ R^Q : vector of parameters
ω : sufficient statistics on the graph z, ω(z) ∈ R^Q
Ψ : a normalization factor
Parameter Estimation by Maximizing Log-likelihood
Nearest Neighbors and Walks
Hypothesis: a graph G is partially observed; we aim to find the hidden edges [Kleinberg07]
Friends of my friends are likely to be my friends.
A ∈ {0, 1}^{n×n} : the social adjacency matrix

(A²)_{i,j} = Σ_{k=1}^n A_{i,k} A_{k,j} = # paths of length 2 from i to j = # common friends of i and j

Random Walks
Take W = D^{−1} A, where D is the diagonal matrix of degrees

Katz = Σ_{k=1}^∞ β^k W^k = (I_n − βW)^{−1} − I_n
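These two heuristics can be sketched directly in NumPy; the 4-node graph and the value of β below are made-up illustrations, not data from the deck:

```python
import numpy as np

# Toy undirected friendship graph (4 nodes; adjacency chosen for illustration).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

# Common-neighbor scores: (A^2)_{ij} = number of length-2 paths from i to j.
common = A @ A

# Katz index on the random-walk matrix W = D^{-1} A:
# sum_{k>=1} beta^k W^k = (I - beta W)^{-1} - I; the series converges
# because W is row-stochastic (spectral radius 1) and beta < 1.
W = np.diag(1.0 / A.sum(axis=1)) @ A
beta = 0.5
katz = np.linalg.inv(np.eye(4) - beta * W) - np.eye(4)
```

Nodes 0 and 3 are not adjacent but share neighbor 1, so `common[0, 3]` is 1 and their Katz score is positive.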
Bipartite Graphs of Marketplaces
[Figure: bipartite graph linking users u1–u4 to products p1–p5]
Who bought this also bought that.
M ∈ {0, 1}^{#users×#products} : transactions
(M Mᵀ M)_{i,j} : number of times product j was purchased by users having purchased the same products as a given user i
Random Walks: apply the unipartite formulas to the block matrix
[ 0   M ]
[ Mᵀ  0 ]
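A minimal sketch of these bipartite scores, with a hypothetical 4-user × 5-product transaction matrix (the entries are invented for illustration):

```python
import numpy as np

# Hypothetical transaction matrix: 4 users x 5 products.
M = np.array([[1, 1, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 0, 1]], dtype=float)

# (M M^T M)_{ij}: co-purchase score of product j for user i,
# i.e. purchases of j by users who share products with user i.
scores = M @ M.T @ M

# Random-walk heuristics reuse the unipartite formulas on the symmetric
# block matrix [[0, M], [M^T, 0]], whose nodes are users then products.
n_u, n_p = M.shape
B = np.block([[np.zeros((n_u, n_u)), M],
              [M.T, np.zeros((n_p, n_p))]])
```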
Low-Rank
A = U diag(σ_i) Vᵀ (SVD)
Define ‖X‖_∗ = Σ_i σ_i(X)
and D_τ(A) = U diag(max(σ_i − τ, 0)) Vᵀ : the shrinkage operator
The rank-r matrix closest to A is U diag(σ_1, …, σ_r, 0, …, 0) Vᵀ

Fact: argmin_X ½‖X − A‖_F² + τ‖X‖_∗ = D_τ(A)
[Figure: block-wise adjacency matrix, 60 × 60, nz = 1400]
Matrix Completion [Srebro05, Candes08, Koltchinskii11] estimates A by minimizing
½‖ω(A) − ω(X)‖₂² + τ‖X‖_∗
for a linear mapping ω : R^{n×n} → R^Q
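A sketch of the shrinkage operator D_τ via the SVD; the random 6 × 6 test matrix is an illustrative assumption, and the stated Fact can be checked numerically by comparing objective values:

```python
import numpy as np

def shrink(A, tau):
    """Singular value shrinkage: D_tau(A) = U diag(max(sigma_i - tau, 0)) V^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def objective(X, A, tau):
    # (1/2)||X - A||_F^2 + tau * ||X||_*
    return (0.5 * np.linalg.norm(X - A, 'fro') ** 2
            + tau * np.linalg.svd(X, compute_uv=False).sum())

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
S = shrink(A, 1.0)  # should beat both A and nearby perturbations on the objective
```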
Link Prediction: Statistical and Spectral Properties
Statistics on the number of triangles and the lengths of paths in the graph are stable
Spectral functions [Kunegis09] of the adjacency and stochastic matrices kill low eigenvalues
If A = U diag(σ_i) Vᵀ is the SVD, then U diag(f(σ_i)) Vᵀ is called a spectral function.
[Figure: spectral functions f(σ) on [0, 1]: σ², ∝ (1 − βσ)^{−1} − 1, and max(σ − τ, 0)]
Leading Insight
Link prediction heuristics implicitly suggest that
1. the graph sequence fits some slowly varying feature map
2. the spectrum of the graphs is regular
Define a regularization formulation of the problem in order to leverage the trade-offs and select the best features.
Obstacle to matrix completion: ω(A) is to be predicted.
Notations
Time steps t ∈ {1, 2, …, T}
Adjacency matrices A_t ∈ {0, 1}^{n×n} : the graph sequence
Feature map ω : R^{n×n} → R^Q, linear (degrees, clusters)
Q ≪ n²
Prediction of A_{T+1} : score matrix S ∈ R^{n×n}
Assumptions
1. Stationarity of successive feature vectors
∃f : R^Q → R^Q, ∀t, ω(A_{t+1}) = f(ω(A_t)) + ε_t
2. Simplicity of S
S low rank [Srebro05]; penalize the trace norm ‖S‖_∗
Quantities to control
1. Features predictor
J₁(f) = Σ_{t=1}^{T−1} ℓ(ω(A_{t+1}), f(ω(A_t))) + κ‖f‖_H
2. Predicted features matching the predicted graph features (coupling term)
J₂(f, S) = ℓ(ω(S), f(ω(A_T)))
3. Penalty on S
J₃(S) = τ‖S‖_∗
Convex Optimization Problem
Let
X = [ω(A₁); …; ω(A_{T−1})], Y = [ω(A₂); …; ω(A_T)] ∈ R^{(T−1)×Q}
(feature vectors stacked as rows). We take linear predictors, f(ω) = ωᵀW, and define the convex objective
L = J₁ + J₂ + J₃ = ½‖XW − Y‖_F² + (κ/2)‖W‖_F² + ½‖ω(A_T)ᵀW − ω(S)‖₂² + τ‖S‖_∗
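The objective can be sketched numerically; the random linear feature map P, the sequence of random graphs, and the values of κ and τ below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, Q, T = 8, 4, 6

# Hypothetical linear feature map: omega(A) = P @ vec(A).
P = rng.standard_normal((Q, n * n)) / n

def omega(A):
    return P @ A.ravel()

# Random adjacency matrices standing in for A_1, ..., A_T.
As = [(rng.random((n, n)) < 0.3).astype(float) for _ in range(T)]
X = np.stack([omega(A) for A in As[:-1]])  # rows omega(A_1) ... omega(A_{T-1})
Y = np.stack([omega(A) for A in As[1:]])   # rows omega(A_2) ... omega(A_T)

def loss(W, S, kappa=0.1, tau=0.1):
    # L = 1/2 ||XW - Y||_F^2 + kappa/2 ||W||_F^2
    #     + 1/2 ||omega(A_T)^T W - omega(S)||_2^2 + tau ||S||_*
    fit = 0.5 * np.linalg.norm(X @ W - Y, 'fro') ** 2
    ridge = 0.5 * kappa * np.linalg.norm(W, 'fro') ** 2
    couple = 0.5 * np.linalg.norm(omega(As[-1]) @ W - omega(S)) ** 2
    trace = tau * np.linalg.svd(S, compute_uv=False).sum()
    return fit + ridge + couple + trace
```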
Split and Alternately Minimize
Splitting: L_η(S, S̄) = τ‖S‖_∗ + h(S̄, ν), subject to S = S̄
Alternately minimize in S and S̄:
m_G(S̄) = argmin_S τ‖S‖_∗ + ⟨∇h(S̄), S − S̄⟩ + (1/2µ)‖S − S̄‖_F²
m_H(S) = argmin_{S̄} h(S̄, ν) + ⟨τ ∂‖S‖_∗, S̄ − S⟩ + (1/2µ)‖S̄ − S‖_F²
Algorithm 1: Link Discovery Algorithm
Parameters: τ, ν, η
Initialization: W₀ = Z₁ = A_T, α₁ = 0
for k = 1, 2, … do
    S_k ← m_G(Z_k) and S̄_k ← m_H(S_k)
    W_k ← ½(S_k + S̄_k)
    α_{k+1} ← ½(1 + √(1 + 4α_k²))
    Z_{k+1} ← W_k + (α_k(S_k − W_{k−1}) − (W_k − W_{k−1})) / α_{k+1}
end for
Joint Optimization in W and S
Minimization of L by proximal gradient descent

L(S, W) = g(S, W) + Γ(S, W)
g(S, W) = ½‖XW − Y‖_F² + ½‖ω(A_T)ᵀW − ω(S)‖₂² : smoothly differentiable fit term
Γ(S, W) = (κ/2)‖W‖_F² + τ‖S‖_∗ : convex penalty

Explicit proximal operator:
prox_{θΓ}(S, W) = argmin_{(Z,V)} θΓ(Z, V) + ½‖S − Z‖_F² + ½‖W − V‖_F²
              = (D_{θτ}(S), W/(1 + θκ))

(S_{k+1}, W_{k+1}) = prox_{θ_k Γ}((S_k, W_k) − θ_k ∇g(S_k, W_k))

FISTA [Beck09] for the optimal convergence rate
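The closed-form proximal step above can be sketched as follows; the 2 × 2 inputs and parameter values are made up to make the effect visible (singular values of S shrink by θτ, W shrinks by the ridge factor):

```python
import numpy as np

def prox_gamma(S, W, theta, tau, kappa):
    """Closed-form prox of theta * Gamma:
    shrink the singular values of S by theta*tau (nuclear-norm part)
    and scale W by 1 / (1 + theta*kappa) (ridge part)."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    S_new = U @ np.diag(np.maximum(s - theta * tau, 0.0)) @ Vt
    return S_new, W / (1.0 + theta * kappa)

# Diagonal S makes the shrinkage explicit: singular values 3 and 1 become 1 and 0.
S_new, W_new = prox_gamma(np.diag([3.0, 1.0]), np.ones((2, 2)),
                          theta=1.0, tau=2.0, kappa=1.0)
```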
Variant 1: Graph Regularization Constraint
Want i ∼_S j ⇒ f(i) ∼_H f(j)
Control the Laplacian-like [Chen10] inner product
J₄(f, S) = Σ_{i,j} S_{i,j} ‖f(i) − f(j)‖_H² = ⟨S, (‖f(i) − f(j)‖_H²)_{i,j}⟩
Other possibility: J₄(f, S) = ⟨S, Gram(f)⟩
L_{graph regularization} = L + λJ₄
Issue: non-convex regularizers
Algorithms:
1. Gradient descent with hyper-parameters that keep the objective inside the convexity domain
2. Projected gradient descent inside the convexity domain
Gradient Descent Convergence Area
Empirical Results
Data                   |     Marketing     |         Synthetic
Method                 | ∆Sales |  ∆Graph  |   ∆Sales    |   ∆Graph
Our solution           |  0.62  |   0.28   | 0.13 ± .002 | 0.21 ± .003
Rank-free prediction   |  0.64  |   0.31   | 0.19 ± .008 | 0.24 ± .01
AR                     |  0.80  |    -     | 0.66 ± .007 |     -
ARIMA                  |  0.78  |    -     | 0.17 ± .02  |     -
VAR                    |  1.02  |    -     | 0.42 ± .09  |     -
MC with shrinkage      |   -    |   0.38   |      -      | 0.22 ± .003

Sales prediction metric (to be minimized): ∆Sales = ‖ω(A_{T+1}) − f(ω(A_T))‖₂ / ‖ω(A_{T+1})‖₂
Graph completion metric (to be minimized): ∆Graph = ‖A_{T+1} − S‖_F / ‖A_{T+1}‖_F
Convexity Domain
[Figure: surfaces over (s, w) of J₄ ≈ sw², of κ|f|² + ν|S − A_T|² ≈ s² + w², and of their sum λJ₄ + κ|f|² + ν|S − A_T|² ≈ sw² + s² + w²]
J₄ is not jointly convex in (S, f)
λJ₄ + κ‖W‖_F² + ν‖S − A_T‖_F² is convex inside
E = { S ∈ R₊^{n×n}, W ∈ R^{n×d} : ‖W‖_F² ≤ √(νκ) / (2λ) }
Variant 2: Sparsity Constraint
L_sparse(S, W) = L(S, W) + γ‖S‖_{1,1} (lasso)
Split S into S and S̄ and add an equality constraint
Synthetic data: n = 100, Q = 15, T = 200
10 runs for cross-validation, 10 runs for test
AUC on S reported

Nearest Neighbors   Static Low Rank   L_sparse          L
0.9767 ± 0.0076     0.9751 ± 0.0362   0.9812 ± 0.0008   0.9778 ± 0.0071
Synthetic Data Generation
Let, for all k ∈ {1, …, r},
U_t^{(i,k)} = (1 / (√(2π) σ_{i,k})) exp(−(t − µ_{i,k})² / (2σ_{i,k}²)) + ε_{i,k}
quantify the taste of user i for feature k at time t, and V_t^{(i,k)} the weight of feature k for item i, and take
A_t^{(i,j)} = 1{U^{(i)}(t) > θ} · 1{V^{(j)}(t) > θ}
A_t is
1. sparse,
2. of rank at most r,
3. and its latent factors evolve slowly, provided the σ's are not too small.
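A sketch of one possible reading of this generator (sizes, the threshold θ, and the per-feature interpretation of the indicator product are illustrative assumptions, not the deck's exact settings):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, r, T = 30, 20, 3, 50
theta = 0.03  # hypothetical threshold

# Gaussian bumps in time: user i's taste for latent feature k peaks at mu_u[i, k].
mu_u = rng.uniform(0, T, (n_users, r))
sig_u = rng.uniform(5, 15, (n_users, r))
mu_v = rng.uniform(0, T, (n_items, r))
sig_v = rng.uniform(5, 15, (n_items, r))

def bumps(t, mu, sig):
    # Gaussian density in t, one value per (node, feature) pair.
    return np.exp(-(t - mu) ** 2 / (2 * sig ** 2)) / (np.sqrt(2 * np.pi) * sig)

def adjacency(t):
    U = bumps(t, mu_u, sig_u)  # (n_users, r) latent tastes at time t
    V = bumps(t, mu_v, sig_v)  # (n_items, r) latent item weights at time t
    # Edge (i, j) if some feature k is active on both sides; before thresholding
    # to {0, 1}, the product (U > theta)(V > theta)^T has rank at most r.
    return ((U > theta).astype(float) @ (V > theta).astype(float).T > 0).astype(float)
```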
Scalability
D_τ(A) is dense, even for sparse A
Fact [Srebro05]: ‖S‖_∗ = ½ min_{UVᵀ=S} (‖U‖_F² + ‖V‖_F²)
Instead of fixing τ, fix r and take U, V ∈ R^{n×r}
Define
J(U, V, W) = ½‖XW − Y‖_F² + ½‖ω(A_T)ᵀW − ω(UVᵀ)‖₂² + (κ/2)‖W‖_F² + (λ/2)(‖U‖_F² + ‖V‖_F²)
Parallel Stochastic Gradient Algorithms [Recht11]
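The Srebro fact can be checked numerically: the balanced factorization U√Σ, V√Σ taken from the SVD attains the bound, so half the sum of its squared Frobenius norms equals the nuclear norm (the random 8 × 8 matrix is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((8, 8))

# Nuclear norm via the SVD.
U, s, Vt = np.linalg.svd(S)
nuc = s.sum()

# Balanced factors F1 = U sqrt(Sigma), F2 = V sqrt(Sigma) satisfy F1 F2^T = S
# and attain ||S||_* = (1/2)(||F1||_F^2 + ||F2||_F^2).
F1 = U @ np.diag(np.sqrt(s))
F2 = (np.diag(np.sqrt(s)) @ Vt).T
half_sum = 0.5 * (np.linalg.norm(F1, 'fro') ** 2 + np.linalg.norm(F2, 'fro') ** 2)
```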
Store Recommendation Lists
Each feature leads to a specific list of recommendations
Store top-k lists
Learn optimal combinations / aggregations
... work in progress
Conclusion
Introduction of a regularization formulation for link prediction in graph sequences
Several variants detailed and empirically tested
Perspective for scalable algorithms
Perspective for theoretical analysis and understanding of the
problem
References
Réka Albert and Albert-László Barabási.
Statistical mechanics of complex networks.
Reviews of Modern Physics, 74:47–97, 2002.
A. Beck and M. Teboulle.
A fast iterative shrinkage-thresholding algorithm for linear
inverse problems.
SIAM Journal of Imaging Sciences, 2(1):183–202, 2009.
B. Bollobás.
Random Graphs, volume 73 of Cambridge Studies in Advanced Mathematics, 2nd edition.
Cambridge University Press, Cambridge, 2001.
Emmanuel J. Candès and Terence Tao.
A singular value thresholding algorithm for matrix completion.
SIAM Journal on Optimization, 20(4):1956–1982, 2008.
Xi Chen, Seyoung Kim, Qihang Lin, Jaime G. Carbonell, and
Eric P. Xing.
Graph-structured multi-task regression and an efficient
optimization method for general fused lasso.
arXiv, 2010.
Donald Goldfarb and Shiqian Ma.
Fast alternating linearization methods for minimizing the sum
of two convex functions.
Technical Report, Department of IEOR, Columbia University,
2009.
P. D. Hoff, A. E. Raftery, and M. S. Handcock.
Latent space approaches to social network analysis.
Journal of the Royal Statistical Society, 97, 2002.
David Liben-Nowell and Jon Kleinberg.
The link-prediction problem for social networks.
Journal of the American Society for Information Science and
Technology, 58(7):1019–1031, 2007.
Vladimir Koltchinskii, Karim Lounici, and Alexandre Tsybakov.
Nuclear norm penalization and optimal rates for noisy matrix
completion.
Annals of Statistics, 2011.
P. N. Krivitsky and M. S. Handcock.
A Separable Model for Dynamic Networks.
ArXiv e-prints, November 2010.
Jérôme Kunegis and Andreas Lommatzsch.
Learning spectral graph transformations for link prediction.
In Proceedings of the 26th Annual International Conference on
Machine Learning, ICML ’09, pages 561–568, New York, NY,
USA, 2009. ACM.
G. Linden, B. Smith, and J. York.
Amazon.com recommendations: Item-to-item collaborative filtering.
IEEE Internet Computing, 2003.
K. Nowicki and T. Snijders.
Estimation and prediction for stochastic blockstructures.
Journal of the American Statistical Association, 96:1077–1087, 2001.
Benjamin Recht and Christopher Re.
Parallel stochastic gradient algorithms for large-scale matrix
completion.
Submitted for publication, 2011.
Emile Richard, Nicolas Baskiotis, Theodoros Evgeniou, and
Nicolas Vayatis.
Link discovery using graph feature tracking.
Proceedings of Neural Information Processing Systems (NIPS),
2010.
Nathan Srebro, Jason D. M. Rennie, and Tommi S. Jaakkola.
Maximum-margin matrix factorization.
In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors, Proceedings of Neural Information Processing Systems 17, pages 1329–1336. MIT Press, Cambridge, MA, 2005.
Stanley Wasserman and Philippa Pattison.
Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p∗.
Psychometrika, 61(3):401–425, September 1996.
K. Zhang, Th. Evgeniou, V. Padmanabhan, and E. Richard.
Content contributor management and network effects in a ugc
environment.
Marketing Science, 2011.