2014 9-22

The Chow-Liu algorithm based on the MDL with discrete and continuous variables
Joe Suzuki
Osaka University
AIGM 2014, Paris

The Chow-Liu Algorithm
$P_{1,\ldots,N}$: probability of $X^{(1)},\ldots,X^{(N)}$ ($N \ge 1$)
$G = (V, E)$: undirected graph
$E := \{\}$, $V := \{1,\ldots,N\}$, $\mathcal{E} := \{\{i,j\} \mid i \neq j,\ i,j \in V\}$
while $\mathcal{E} \neq \{\}$:
1. choose $\{i,j\} \in \mathcal{E}$ that maximizes $I(i,j)$
2. remove $\{i,j\}$ from $\mathcal{E}$
3. if no loop is generated, add $\{i,j\}$ to $E$
Mutual information of $X^{(i)}, X^{(j)}$:
$$I(i,j) := \sum_{x^{(i)}}\sum_{x^{(j)}} P_{i,j}(x^{(i)},x^{(j)}) \log\frac{P_{i,j}(x^{(i)},x^{(j)})}{P_i(x^{(i)})\,P_j(x^{(j)})}$$
The tree $E$ with $\sum_{\{i,j\}\in E} I(i,j) \to \max$ attains $D(P_{1,\ldots,N}\|Q) \to \min$, where $Q$ is the dendroid distribution defined by $E$.

Example
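$$Q(x^{(1)},x^{(2)},x^{(3)},x^{(4)}) = \frac{P_{1,2}(x^{(1)},x^{(2)})}{P_1(x^{(1)})P_2(x^{(2)})}\cdot\frac{P_{1,3}(x^{(1)},x^{(3)})}{P_1(x^{(1)})P_3(x^{(3)})}\cdot\frac{P_{1,4}(x^{(1)},x^{(4)})}{P_1(x^{(1)})P_4(x^{(4)})}\cdot P_1(x^{(1)})P_2(x^{(2)})P_3(x^{(3)})P_4(x^{(4)})$$
$$= P(x^{(1)})\,P(x^{(2)}\mid x^{(1)})\,P(x^{(3)}\mid x^{(1)})\,P(x^{(4)}\mid x^{(1)})$$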
| $i$ | 1 | 1 | 2 | 1 | 2 | 3 |
|---|---|---|---|---|---|---|
| $j$ | 2 | 3 | 3 | 4 | 4 | 4 |
| $I(i,j)$ | 12 | 10 | 8 | 6 | 4 | 2 |
[Figure: four panels on nodes 1, 2, 3, 4 (top row 1, 3; bottom row 2, 4) showing the tree grown edge by edge]
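The greedy loop on the algorithm slide is Kruskal's maximum-spanning-tree procedure with $I(i,j)$ as edge weights. The sketch below replays the example table; `chow_liu` and its union-find helper are hypothetical names for illustration, not the author's code.

```python
# Hypothetical sketch of the Chow-Liu greedy loop (Kruskal with union-find).
def chow_liu(n_nodes, weights):
    """Scan pairs {i, j} by decreasing weight and keep a pair only
    if it does not close a loop."""
    parent = list(range(n_nodes + 1))  # union-find forest over nodes 1..n_nodes

    def find(u):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for i, j in sorted(weights, key=weights.get, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # i and j lie in different components: no loop
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Mutual information values from the example table
I = {(1, 2): 12, (1, 3): 10, (2, 3): 8, (1, 4): 6, (2, 4): 4, (3, 4): 2}
print(chow_liu(4, I))  # [(1, 2), (1, 3), (1, 4)]
```

The result $\{1,2\},\{1,3\},\{1,4\}$ is exactly the factorization of $Q$ above; $\{2,3\}$, $\{2,4\}$ and $\{3,4\}$ are rejected because each would close a loop.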

Dendroid Distribution
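$X^{(1)},\ldots,X^{(N)}$: discrete random variables
$V := \{1,\ldots,N\}$
$E \subseteq \{\{i,j\} \mid i \neq j,\ i,j \in V\}$
$$Q(x^{(1)},\ldots,x^{(N)}\mid E) = \prod_{\{i,j\}\in E}\frac{P_{i,j}(x^{(i)},x^{(j)})}{P_i(x^{(i)})\,P_j(x^{(j)})}\prod_{i\in V}P_i(x^{(i)})$$
$\{P_i(x^{(i)})\}_{i\in V}$, $\{P_{i,j}(x^{(i)},x^{(j)})\}_{i\neq j}$: obtained from $P_{1,\ldots,N}(x^{(1)},\ldots,x^{(N)})$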
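The Chow-Liu property stated on the algorithm slide can be checked numerically with this dendroid formula. The sketch enumerates the three spanning trees on three binary variables and confirms that the tree maximizing $\sum I(i,j)$ also minimizes $D(P_{1,2,3}\|Q)$. The joint distribution (a hypothetical chain $X^{(1)}\to X^{(2)}\to X^{(3)}$, 0-based indices in the code) and all function names are assumptions for illustration.

```python
import itertools
import math

# Hypothetical helper names; a numerical sketch, not the author's implementation.


def marginal(P, idx):
    """Marginal of a joint pmf P (dict: outcome tuple -> prob) on coordinates idx."""
    out = {}
    for x, p in P.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(P, i, j):
    Pij, Pi, Pj = marginal(P, (i, j)), marginal(P, (i,)), marginal(P, (j,))
    return sum(p * math.log(p / (Pi[(a,)] * Pj[(b,)]))
               for (a, b), p in Pij.items() if p > 0)


def dendroid(P, E, x):
    """Q(x | E) = prod_{{i,j} in E} P_ij / (P_i P_j) * prod_{i in V} P_i."""
    P1 = [marginal(P, (i,)) for i in range(len(x))]
    q = math.prod(P1[i][(x[i],)] for i in range(len(x)))
    for i, j in E:
        q *= marginal(P, (i, j))[(x[i], x[j])] / (P1[i][(x[i],)] * P1[j][(x[j],)])
    return q


def kl(P, E):
    """D(P || Q(. | E)) in nats."""
    return sum(p * math.log(p / dendroid(P, E, x)) for x, p in P.items() if p > 0)


# Hypothetical chain X1 -> X2 -> X3 with binary values
p1 = [0.6, 0.4]
p2_given_1 = [[0.9, 0.1], [0.2, 0.8]]
p3_given_2 = [[0.7, 0.3], [0.1, 0.9]]
P = {(a, b, c): p1[a] * p2_given_1[a][b] * p3_given_2[b][c]
     for a, b, c in itertools.product((0, 1), repeat=3)}

trees = [((0, 1), (0, 2)), ((0, 1), (1, 2)), ((0, 2), (1, 2))]
best_by_mi = max(trees, key=lambda E: sum(mutual_information(P, i, j) for i, j in E))
best_by_kl = min(trees, key=lambda E: kl(P, E))
print(best_by_mi, best_by_kl)  # both ((0, 1), (1, 2))
```

For the true chain, $Q(\cdot\mid E)$ reproduces $P$ exactly, so its divergence is zero up to rounding.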

Contribution
Starting from data: learning rather than approximation
Given data $x^n = \{(x^{(1)}_i,\ldots,x^{(N)}_i)\}_{i=1}^n$ rather than the distribution $P_{1,\ldots,N}$
In any database, some fields are discrete and others continuous.
Joe Suzuki: A Construction of Bayesian Networks from Databases Based on an MDL Principle, UAI 1993
David Edwards et al.: Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests, BMC Bioinformatics 2010
Joe Suzuki: Learning Bayesian network structures when discrete and continuous variables are present, PGM 2014

Maximum Likelihood (ML)
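$\{\hat P_i(x^{(i)})\}_{i\in V}$, $\{\hat P_{i,j}(x^{(i)},x^{(j)})\}_{i\neq j}$ are obtained from $x^n$
ML estimation of MI:
$$\hat I(i,j) := \sum_{x^{(i)}}\sum_{x^{(j)}}\hat P_{i,j}(x^{(i)},x^{(j)})\log\frac{\hat P_{i,j}(x^{(i)},x^{(j)})}{\hat P_i(x^{(i)})\,\hat P_j(x^{(j)})}$$
Empirical entropy given $E$ (minus the maximum log-likelihood given $E$):
$$\hat H^n(x^n\mid E) := n\sum_{i\in V}\hat H(i) - n\sum_{\{i,j\}\in E}\hat I(i,j)$$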
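ML seeks a tree even if $X^{(1)},\ldots,X^{(N)}$ are independent, since $\hat I(i,j) \ge 0$ and adding an edge never increases $\hat H^n(x^n\mid E)$.
The true graph is therefore not obtained even as $n\to\infty$.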
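A minimal sketch of the ML quantities, assuming each variable is given as a list of discrete values (one list per column of $x^n$). The function names are hypothetical. It shows that $\hat I(i,j)$ is a plug-in estimate and that $\hat H^n(x^n\mid E)$ can only decrease as edges are added.

```python
import math
from collections import Counter

# Hypothetical helper names; plug-in (ML) estimates in nats.


def empirical_entropy(xs):
    """ML estimate hat-H(i)."""
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in Counter(xs).values())


def empirical_mi(xs, ys):
    """ML estimate hat-I(i, j): plug relative frequencies into the MI formula."""
    n = len(xs)
    cxy, cx, cy = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # hat-P_ij / (hat-P_i hat-P_j) = c * n / (c_x * c_y)
    return sum(c / n * math.log(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())


def empirical_entropy_given_tree(columns, E):
    """hat-H^n(x^n | E) = n * sum_i hat-H(i) - n * sum_{{i,j} in E} hat-I(i, j)."""
    n = len(columns[0])
    return (n * sum(empirical_entropy(col) for col in columns)
            - n * sum(empirical_mi(columns[i], columns[j]) for i, j in E))


x = [0, 0, 1, 1]
y = [0, 1, 0, 1]   # every pair (a, b) occurs once, so hat-I(x, y) = 0 exactly
print(empirical_mi(x, y), empirical_mi(x, x))  # 0.0 and log(2) ≈ 0.693
```

On real samples from independent variables, $\hat I(i,j)$ is almost never exactly zero, which is why the ML criterion keeps every admissible edge.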

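Prior Distribution over Forests $(V, E)$
$p_{ij}$: the prior probability of $X^{(i)} \perp\!\!\!\perp X^{(j)}$ (independence)
$$\pi(E) := \frac{1}{K}\prod_{\{i,j\}\in E}\frac{1-p_{ij}}{p_{ij}},\qquad K := \sum_{E}\prod_{\{i,j\}\in E}\frac{1-p_{ij}}{p_{ij}}$$
where the sum in $K$ runs over all forests $E$ on $V$.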

Minimum Description Length (Suzuki, UAI-1993)
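$$R(i) = \int P(\{x^{(i)}_k\}_{k=1}^n\mid\theta)\,w(\theta)\,d\theta,\qquad R(i,j) = \int P(\{x^{(i)}_k, x^{(j)}_k\}_{k=1}^n\mid\theta)\,w(\theta)\,d\theta$$
$$R^n(x^n\mid E) := \prod_{\{i,j\}\in E}\frac{R(i,j)}{R(i)R(j)}\prod_{i\in V}R(i),\qquad L(x^n\mid E) := -\log R^n(x^n\mid E)$$
Description length:
$$l(x^n) = -\log\pi(E) + L(x^n\mid E) \to \min$$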
Bayesian estimation of MI:
$$J(i,j) := \frac{1}{n}\log\frac{R(i,j)}{R(i)R(j)}$$
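The slides leave the prior $w(\theta)$ unspecified. Assuming the Dirichlet$(1/2,\ldots,1/2)$ (Krichevsky-Trofimov) prior, which yields the sequential factor $\frac{c(a)+1/2}{i-1+|A|/2}$ also used on the "Computing $g^n(x^n)$" slide, $R(i)$ and $R(i,j)$ become simple sequential products and $J(i,j)$ follows directly. Function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helper names; assumes the KT (Dirichlet(1/2)) prior w(theta).


def log_kt(symbols, alphabet_size):
    """log R for a Dirichlet(1/2, ..., 1/2) mixture, computed sequentially:
    the factor for the symbol at 0-based position i is
    (count of that symbol so far + 1/2) / (i + |A|/2)."""
    counts = Counter()
    logr = 0.0
    for i, s in enumerate(symbols):
        logr += math.log((counts[s] + 0.5) / (i + alphabet_size / 2))
        counts[s] += 1
    return logr


def bayes_mi(xs, ys, alpha_x, alpha_y):
    """J(i, j) = (1/n) log R(i, j) / (R(i) R(j)); alpha_* = number of values."""
    n = len(xs)
    log_r_ij = log_kt(list(zip(xs, ys)), alpha_x * alpha_y)
    return (log_r_ij - log_kt(xs, alpha_x) - log_kt(ys, alpha_y)) / n


x = [0, 0, 1, 1] * 10
y = [0, 1, 0, 1] * 10   # hat-I(x, y) = 0 for these columns
print(bayes_mi(x, y, 2, 2), bayes_mi(x, x, 2, 2))  # negative, then positive
```

Unlike $\hat I(i,j)$, the Bayesian estimate is negative when the data show no dependence, because the joint mixture pays for more free parameters.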

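If we expand using an approximation, we find the following, where
$\alpha(i)$: number of values $X^{(i)}$ takes
$k(E)$: number of parameters in $E$, i.e. $k(E) = \sum_{i\in V}(\alpha(i)-1) + \sum_{\{i,j\}\in E}(\alpha(i)-1)(\alpha(j)-1)$
$$L(x^n\mid E) \approx \hat H^n(x^n\mid E) + \frac{1}{2}k(E)\log n$$
$$l(x^n) \approx \hat H^n(x^n\mid E) + \frac{1}{2}k(E)\log n - \log\pi(E)$$
$$J(i,j) \approx \hat I(i,j) - \frac{1}{2n}(\alpha(i)-1)(\alpha(j)-1)\log n$$
With the prior $\pi(E)$, the pair $\{i,j\}$ is scored by $J(i,j) + \frac{1}{n}\log\frac{1-p_{ij}}{p_{ij}}$.
The orders of choosing edges are different: $J(i,j)$ can be negative, so MDL yields a forest, whereas $\hat I(i,j) \ge 0$ always yields a tree.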
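The approximation makes the difference between ML and MDL concrete: the greedy loop is unchanged, but pairs are ranked by $J(i,j)$ and a pair is kept only while its score is positive. The sketch uses the large-$n$ approximation of $J(i,j)$ and, as a simplifying assumption, omits the prior term (equivalently $p_{ij} = 1/2$). The inputs and the name `mdl_forest` are hypothetical.

```python
import math

# Hypothetical sketch; prior term omitted (p_ij = 1/2 assumed).


def mdl_forest(n_nodes, mi_hat, alpha, n):
    """Greedy forest from J(i, j) ~ hat-I(i, j) - (alpha_i - 1)(alpha_j - 1) log(n) / (2n)."""
    def j_approx(e):
        i, j = e
        return mi_hat[e] - (alpha[i] - 1) * (alpha[j] - 1) * math.log(n) / (2 * n)

    parent = list(range(n_nodes + 1))  # union-find over nodes 1..n_nodes

    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u

    forest = []
    for e in sorted(mi_hat, key=j_approx, reverse=True):
        if j_approx(e) <= 0:
            break  # every remaining pair would lengthen the description
        ri, rj = find(e[0]), find(e[1])
        if ri != rj:  # no loop
            parent[ri] = rj
            forest.append(e)
    return forest


mi_hat = {(1, 2): 0.5, (1, 3): 0.001, (2, 3): 0.01}
alpha = {1: 2, 2: 2, 3: 2}
print(mdl_forest(3, mi_hat, alpha, n=100))  # [(1, 2)]
```

ML would still attach node 3, since every $\hat I(i,j) > 0$. With $n = 100$ the penalty $\log(100)/200 \approx 0.023$ exceeds both $\hat I(1,3)$ and $\hat I(2,3)$, so this MDL sketch leaves node 3 isolated; for much larger $n$ the penalty shrinks and the weaker pairs are admitted again.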

Universality
Universal measure w.r.t. a finite set $A$: there exists $R^n$ such that
$$\frac{1}{n}\log\frac{P^n(x^n)}{R^n(x^n)} \to 0\qquad (x^n\in A^n)$$
with $P^n$-probability one as $n\to\infty$, for any $P^n$.
$$P(i) = \prod_{k=1}^n P(x^{(i)}_k),\qquad P(i,j) = \prod_{k=1}^n P(x^{(i)}_k, x^{(j)}_k)$$
$$\frac{1}{n}\log\frac{P(i)}{R(i)} \to 0,\qquad \frac{1}{n}\log\frac{P(i,j)}{R(i,j)} \to 0$$
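For a binary alphabet, the Krichevsky-Trofimov measure (an assumed concrete choice of $R^n$; the slides only require some universal measure) satisfies $\log\frac{P^n(x^n)}{R^n(x^n)} \le \frac{1}{2}\log n + \log 2$ (natural log) for every sequence and every i.i.d. $P^n$, so the per-symbol gap vanishes. A quick check with hypothetical names:

```python
import math
import random

# Hypothetical helper names; binary KT measure as the universal R^n.


def log_kt_binary(bits):
    """log R^n(x^n) for the binary Krichevsky-Trofimov measure."""
    counts = [0, 0]
    logr = 0.0
    for i, b in enumerate(bits):  # i symbols already seen; |A|/2 = 1
        logr += math.log((counts[b] + 0.5) / (i + 1))
        counts[b] += 1
    return logr


def per_symbol_gap(bits, theta):
    """(1/n) log P^n(x^n) / R^n(x^n) for an i.i.d. Bernoulli(theta) source."""
    n, ones = len(bits), sum(bits)
    log_p = ones * math.log(theta) + (n - ones) * math.log(1 - theta)
    return (log_p - log_kt_binary(bits)) / n


rng = random.Random(0)
theta = 0.3
for n in (100, 1000, 10000):
    bits = [1 if rng.random() < theta else 0 for _ in range(n)]
    print(n, per_symbol_gap(bits, theta))
```

The printed gaps shrink roughly like $\frac{\log n}{2n}$, in line with the convergence stated above.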

Consistency
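$$Q^n(x^n\mid E) := \prod_{\{i,j\}\in E}\frac{P(i,j)}{P(i)P(j)}\prod_{i\in V}P(i)$$
With probability one as $n\to\infty$, for any $Q^n(\cdot\mid E)$,
$$\frac{1}{n}\log\frac{Q^n(x^n\mid E)}{R^n(x^n\mid E)} \to 0$$
For large $n$,
$$\pi(E_1)Q^n(x^n\mid E_1) \ge \pi(E_2)Q^n(x^n\mid E_2) \iff \pi(E_1)R^n(x^n\mid E_1) \ge \pi(E_2)R^n(x^n\mid E_2)$$
A maximum posterior probability forest is obtained for large $n$.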

ML vs MDL
| | ML | MDL |
|---|---|---|
| Choice of $E$ | Minimize $\hat H^n(x^n\mid E)$ | Minimize $\hat H^n(x^n\mid E) + \frac{1}{2}k(E)\log n - \log\pi(E)$ |
| Choice of $\{i,j\}$ | Maximize $\hat I(i,j)$ | Maximize $J(i,j)$ |
| Criterion | Fitness of $x^n$ to $E$ | Fitness of $x^n$ to $E$ and simplicity of $E$ |
| Consistency | No | Yes |

When Density Exists
When a density $f$ exists for $X$ (Ryabko, 2009):
$A_0 := \{A\}$, and $A_{j+1}$ is a refinement of $A_j$
For each $j$, $x^n = (x_1,\ldots,x_n)\in\mathbb{R}^n \mapsto (a^{(j)}_1,\ldots,a^{(j)}_n)\in A_j^n$
[Figure: nested partitions $A_1, A_2, \ldots, A_j$]
$$g^n_j(x^n) = \frac{R^n_j(a^{(j)}_1,\ldots,a^{(j)}_n)}{\lambda(a^{(j)}_1)\cdots\lambda(a^{(j)}_n)},\qquad j = 1, 2, \ldots$$
$\lambda$: Lebesgue measure (width of an interval); $R^n_j$: universal measure w.r.t. $A_j$
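As a concrete, assumed instance, take $A = [0,1)$, let $A_j$ be the $2^j$ dyadic intervals of width $2^{-j}$, and use the Krichevsky-Trofimov measure in closed form for $R^n_j$. Then $g^n_j$ is a histogram-type density estimate; its logarithm is computed below. The partition choice and the names are illustrative.

```python
import math
from collections import Counter

# Hypothetical helper names; dyadic partitions of [0, 1) and KT measure assumed.


def log_kt_counts(counts, m):
    """Closed-form log R^n for the KT measure on an alphabet of size m:
    sum_a [lgamma(c_a + 1/2) - lgamma(1/2)] + lgamma(m/2) - lgamma(n + m/2).
    Cells that never occur contribute zero, so only observed counts are needed."""
    n = sum(counts.values())
    return (sum(math.lgamma(c + 0.5) - math.lgamma(0.5) for c in counts.values())
            + math.lgamma(m / 2) - math.lgamma(n + m / 2))


def log_g_level(xs, j):
    """log g_j^n(x^n) for the dyadic partition A_j of [0, 1) into 2**j intervals."""
    m = 2 ** j
    cells = Counter(min(int(x * m), m - 1) for x in xs)  # a_i^(j) for each x_i
    width = 1.0 / m                                      # lambda(a) of every cell
    return log_kt_counts(cells, m) - len(xs) * math.log(width)


print(math.exp(log_g_level([0.1, 0.7], 1)))  # ≈ 0.5
```

Here the KT measure gives $R^2_1 = \frac{1}{2}\cdot\frac{1}{4} = \frac{1}{8}$ and each cell has width $\frac{1}{2}$, so $g^2_1 = \frac{1/8}{1/4} = \frac{1}{2}$.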

When Density Exists
$\sum_j w_j = 1$, $w_j \ge 0$
$$g^n(x^n) := \sum_{j=1}^{\infty} w_j\,g^n_j(x^n)$$
$f$: density function; $f_j$: density function of level $j$; $f^n(x^n) := f(x_1)\cdots f(x_n)$
Ryabko 2009: for any $f$ such that $D(f\|f_j) \to 0$ ($j\to\infty$),
$$\frac{1}{n}\log\frac{f^n(x^n)}{g^n(x^n)} \to 0$$
as $n\to\infty$.

When a Density Does Not Exist
Extensions of Ryabko 2009:
Remove the assumption that a density exists.
Remove the restriction on the density class: "for any $f$ such that $D(f\|f_j) \to 0$ ($j\to\infty$)" becomes "for any $f$".

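When a density does not exist for $X$ (Suzuki 2011):
$B_1 := \{\{1\},\{2,3,\ldots\}\}$
$B_2 := \{\{1\},\{2\},\{3,4,\ldots\}\}$
$\ldots$
$B_k := \{\{1\},\{2\},\ldots,\{k\},\{k+1,k+2,\ldots\}\}$
$\ldots$
For each level $k$, $x^n = (x_1,\ldots,x_n)\in\mathbb{N}^n \mapsto (b^{(k)}_1,\ldots,b^{(k)}_n)\in B_k^n$
$$\eta(\{k\}) = \frac{1}{k} - \frac{1}{k+1}$$
$$g^n_k(x^n) := \frac{R^n_k(b^{(k)}_1,\ldots,b^{(k)}_n)}{\eta(b^{(k)}_1)\cdots\eta(b^{(k)}_n)}$$
$\sum_k \omega_k = 1$, $\omega_k \ge 0$, $$g^n(x^n) := \sum_{k=1}^{\infty}\omega_k\,g^n_k(x^n)$$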
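A sketch of $g^n_k$ for positive-integer data, again assuming the Krichevsky-Trofimov form for $R^n_k$ (the slides do not fix it). The tail cell $\{k+1,k+2,\ldots\}$ has measure $\sum_{m>k}\left(\frac{1}{m}-\frac{1}{m+1}\right) = \frac{1}{k+1}$ by telescoping, so the cell measures of $B_k$ sum to one. Function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helper names; KT measure over the k + 1 cells of B_k assumed.


def eta_cell(cell, k):
    """eta of a cell of B_k: the singleton {cell} if cell <= k, else the tail {k+1, k+2, ...}."""
    if cell <= k:
        return 1.0 / cell - 1.0 / (cell + 1)
    return 1.0 / (k + 1)  # telescoping sum over m > k


def log_g_k(xs, k):
    """log g_k^n(x^n) for x_i in {1, 2, ...}."""
    m = k + 1                            # |B_k|
    cells = [min(x, k + 1) for x in xs]  # b_i^(k): every x > k maps to the tail cell
    counts = Counter()
    log_r = 0.0
    for i, b in enumerate(cells):
        log_r += math.log((counts[b] + 0.5) / (i + m / 2))
        counts[b] += 1
    return log_r - sum(math.log(eta_cell(b, k)) for b in cells)


print(sum(eta_cell(c, 3) for c in range(1, 5)))  # cells {1}, {2}, {3}, {4, 5, ...}
```

For $x^2 = (1, 5)$ at level $k = 1$, both cells have measure $\frac{1}{2}$ and the KT factor is $\frac{1}{8}$, giving $g^2_1 = \frac{1}{2}$.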

$D(f\|f_j)\not\to 0$ as $j\to\infty$ (1)
[Figure: a density $f$ on the $x$-axis (ticks at 0 and 1) and a histogram sequence $C_0, C_1, C_2, C_3, \ldots$]

$D(f\|f_j)\not\to 0$ as $j\to\infty$ (2)
[Figure: a density $f$ on the $x$-axis (ticks at 0 and 1) and a histogram sequence $C_0, C_1, C_2, C_3, \ldots$]

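$D(f\|f_j)\to 0$ as $j\to\infty$
Universal histogram sequence $\{C_k\}_{k=0}^{\infty}$
[Figure: histogram sequence $C_0, C_1, C_2, C_3, \ldots$ on the $x$-axis]
Suzuki 2013: for any (generalized) density $f$, as $n\to\infty$ with probability one,
$$\frac{1}{n}\log\frac{f^n(x^n)}{g^n(x^n)} \to 0$$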

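Computing $g^n(x^n)$
Input $x^n\in A^n$, output $g^n(x^n)$
1. For each $k = 1,\ldots,K$, $L_k := 0$ ($L_k$ accumulates $-\log g^n_k(x^n)$)
2. For each $k = 1,\ldots,K$ and each $a\in A_k$, $c_k(a) := 0$
3. For each $i = 1,\ldots,n$ and each $k = 1,\ldots,K$:
   1. find $a_i\in A_k$ from $x_i\in A$
   2. $L_k := L_k - \log\frac{c_k(a_i)+1/2}{i-1+|A_k|/2} + \log\lambda(a_i)$
   3. $c_k(a_i) := c_k(a_i) + 1$
4. $g^n(x^n) := \frac{1}{K}\sum_{k=1}^{K}\exp(-L_k)$

Universal measure w.r.t. $A_k$:
$$R^n_k(x^n) = \prod_{i=1}^{n}\frac{c(a^{(k)}_i)+1/2}{i-1+|A_k|/2}$$
where $c(a^{(k)}_i)$ is the number of times $a^{(k)}_i$ occurs among $a^{(k)}_1,\ldots,a^{(k)}_{i-1}$.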
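The steps above translate almost line for line into Python. The sketch assumes $A = [0,1)$ with dyadic levels $A_k$ ($|A_k| = 2^k$, so $\lambda(a) = 2^{-k}$) and the uniform weights $1/K$ of step 4; that step is evaluated with a log-sum-exp to avoid underflow. The name `log_g_mixture` is hypothetical.

```python
import math

# Hypothetical sketch of the sequential algorithm on dyadic partitions of [0, 1).


def log_g_mixture(xs, K):
    """log g^n(x^n) with g^n = (1/K) sum_k g_k^n over dyadic levels k = 1..K."""
    L = [0.0] * (K + 1)                  # L[k] accumulates -log g_k^n(x^n); index 0 unused
    counts = [{} for _ in range(K + 1)]  # counts[k][a] = c_k(a)
    for i, x in enumerate(xs):           # 0-based i equals i - 1 in the slides
        for k in range(1, K + 1):
            m = 2 ** k                   # |A_k|
            a = min(int(x * m), m - 1)   # step 3.1: cell a_i of A_k containing x_i
            c = counts[k].get(a, 0)
            L[k] += -math.log((c + 0.5) / (i + m / 2)) + math.log(1.0 / m)  # step 3.2
            counts[k][a] = c + 1         # step 3.3
    # Step 4 in the log domain: log((1/K) * sum_k exp(-L[k]))
    logs = [-L[k] for k in range(1, K + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs) / K)


print(math.exp(log_g_mixture([0.1, 0.7], 2)))  # ≈ 7/12
```

With $x^2 = (0.1, 0.7)$, level 1 gives $g^2_1 = \frac{1}{2}$ and level 2 gives $g^2_2 = \frac{2}{3}$, so $g^2(x^2) = \frac{7}{12}$. Dyadic cells are located here by scaling rather than by binary search, which is a simplification possible only for this particular partition.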

Computation: $O(nN^2K)$
Computing $g^n(x^n)$ and $g^n(x^n,y^n)$: $O(nN^2K)$ ($O(nN^2)$ for the discrete case)
Proportional to $n$ and to $N + N(N-1)/2$
$a^{(1)}_i \mapsto a^{(2)}_i \mapsto \cdots \mapsto a^{(K)}_i$: binary search, proportional to $K$
$g^n(x^n,y^n)$ can be obtained by $\sum_{k=1}^{K}\omega_k\,g^n_{k,k}(x^n,y^n)$ rather than $\sum_{j=1}^{J}\sum_{k=1}^{K}\omega_{jk}\,g^n_{jk}(x^n,y^n)$.
Computing MI and finding the forest: $N(N-1)/2$
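Putting the pieces together for one pair of continuous variables, the sketch estimates $J(i,j) = \frac{1}{n}\log\frac{g^n(x^n,y^n)}{g^n(x^n)\,g^n(y^n)}$, using only the diagonal levels $(k,k)$ for the joint estimate as the slide suggests. The data are assumed rescaled to $[0,1)$, the partitions are dyadic, $R^n$ is Krichevsky-Trofimov and the weights are uniform over $K$ levels; these choices and all names are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical helper names; dyadic cells, KT measure, uniform weights 1/K assumed.


def log_kt(counts, m):
    """Closed-form log of the KT measure for observed cell counts on m cells."""
    n = sum(counts.values())
    return (sum(math.lgamma(c + 0.5) - math.lgamma(0.5) for c in counts.values())
            + math.lgamma(m / 2) - math.lgamma(n + m / 2))


def log_mean_exp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals) / len(vals))


def cell(x, m):
    return min(int(x * m), m - 1)  # index of the dyadic interval of width 1/m


def log_g_1d(xs, K):
    n = len(xs)
    # Level k: 2**k cells of width 2**-k, so dividing by prod lambda adds n*k*log 2.
    return log_mean_exp([log_kt(Counter(cell(x, 2 ** k) for x in xs), 2 ** k)
                         + n * k * math.log(2) for k in range(1, K + 1)])


def log_g_2d(xs, ys, K):
    """Joint estimate from the K diagonal levels (k, k) instead of all K*K pairs."""
    n = len(xs)
    terms = []
    for k in range(1, K + 1):
        m = 2 ** k
        cnt = Counter((cell(x, m), cell(y, m)) for x, y in zip(xs, ys))
        terms.append(log_kt(cnt, m * m) + n * 2 * k * math.log(2))  # cell area 4**-k
    return log_mean_exp(terms)


def continuous_mi(xs, ys, K=4):
    """Estimate of J(i, j) = (1/n) log g(x, y) / (g(x) g(y))."""
    return (log_g_2d(xs, ys, K) - log_g_1d(xs, K) - log_g_1d(ys, K)) / len(xs)


xs = [(i + 0.5) / 200 for i in range(200)]
print(continuous_mi(xs, xs))  # clearly positive, since y = x
```

Running this for each of the $N(N-1)/2$ pairs and feeding the scores to the greedy forest loop gives the overall procedure; restricting the joint estimate to diagonal levels keeps the cost linear in $K$.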
