2014 9-22
1. The Chow-Liu algorithm based on the MDL with discrete and continuous variables
Joe Suzuki
Osaka University
AIGM 2014, Paris
Joe Suzuki (Osaka University), The Chow-Liu algorithm based on the MDL with discrete and continuous variables, AIGM 2014, Paris, 1 / 26
2. The Chow-Liu Algorithm
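Chow-Liu
$P_{1,\dots,N}$: probability distribution of $X^{(1)}, \dots, X^{(N)}$ ($N \ge 1$)
$G = (V, E)$: undirected graph
$E := \{\}$, $V := \{1, \dots, N\}$, $\mathcal{E} := \{\{i,j\} \mid i \ne j,\ i, j \in V\}$
do while $\mathcal{E} \ne \{\}$:
  1. choose $\{i,j\} \in \mathcal{E}$ that maximizes $I(i,j)$
  2. remove $\{i,j\}$ from $\mathcal{E}$
  3. if no loop is generated, add $\{i,j\}$ to $E$
Mutual information of $X^{(i)}, X^{(j)}$:
$$I(i,j) := \sum_{x^{(i)}} \sum_{x^{(j)}} P_{i,j}(x^{(i)}, x^{(j)}) \log \frac{P_{i,j}(x^{(i)}, x^{(j)})}{P_i(x^{(i)})\, P_j(x^{(j)})}$$
The resulting tree $E$ satisfies
$$\sum_{\{i,j\} \in E} I(i,j) \to \max \iff D(P_{1,\dots,N} \,\|\, Q) \to \min$$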
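The mutual information formula above can be checked numerically on small joint tables. The following is a minimal sketch, assuming the joint distribution is given as a dictionary from value pairs to probabilities; the function name and the two toy tables are illustrative and not part of the slides.

```python
from math import log

def mutual_information(joint):
    """I(i, j) for a joint table given as {(x_i, x_j): probability}."""
    p_i, p_j = {}, {}
    for (a, b), p in joint.items():
        # Marginals are obtained by summing the joint over the other variable.
        p_i[a] = p_i.get(a, 0.0) + p
        p_j[b] = p_j.get(b, 0.0) + p
    return sum(p * log(p / (p_i[a] * p_j[b]))
               for (a, b), p in joint.items() if p > 0)

# Two hypothetical binary pairs: independent, and identical.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
identical = {(0, 0): 0.5, (1, 1): 0.5}
print(mutual_information(independent))  # 0.0
print(mutual_information(identical))    # log 2 in nats
```

Independent variables give zero, and two copies of a fair bit give log 2, which is the range within which the edge weights of the algorithm fall for binary variables.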
3. The Chow-Liu Algorithm
Example
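$$Q(x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}) = \frac{P_{1,2}(x^{(1)}, x^{(2)})}{P_1(x^{(1)}) P_2(x^{(2)})} \cdot \frac{P_{1,3}(x^{(1)}, x^{(3)})}{P_1(x^{(1)}) P_3(x^{(3)})} \cdot \frac{P_{1,4}(x^{(1)}, x^{(4)})}{P_1(x^{(1)}) P_4(x^{(4)})} \cdot P_1(x^{(1)}) P_2(x^{(2)}) P_3(x^{(3)}) P_4(x^{(4)})$$
$$= P(x^{(1)})\, P(x^{(2)} \mid x^{(1)})\, P(x^{(3)} \mid x^{(1)})\, P(x^{(4)} \mid x^{(1)})$$

i          1    1    2    1    2    3
j          2    3    3    4    4    4
I(i, j)   12   10    8    6    4    2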
[Figure: four snapshots of the graph on vertices 1, 2, 3, 4 as the algorithm proceeds; the final tree has edges {1,2}, {1,3}, {1,4}]
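The example can be replayed in code. This is a minimal sketch of the greedy edge selection, assuming a Kruskal-style implementation with a union-find structure for the loop test; the slides do not say how the loop test is implemented, and the function name `chow_liu` is illustrative. The weights are the I(i, j) values from the table above.

```python
def chow_liu(n_vertices, weights):
    """Greedy maximum-weight spanning tree; weights maps (i, j) -> I(i, j)."""
    parent = list(range(n_vertices + 1))  # vertices are numbered from 1

    def find(v):
        # Follow parent pointers to the component root, halving the path.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for (i, j), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:          # adding {i, j} generates no loop
            parent[ri] = rj
            tree.append((i, j))
    return tree

weights = {(1, 2): 12, (1, 3): 10, (2, 3): 8, (1, 4): 6, (2, 4): 4, (3, 4): 2}
tree = chow_liu(4, weights)
print(tree)  # [(1, 2), (1, 3), (1, 4)]
```

Edge {2,3} is skipped because vertices 2 and 3 are already connected through vertex 1, which matches the factorization of Q in the example.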
4. The Chow-Liu Algorithm
Dendroid Distribution
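$X^{(1)}, \dots, X^{(N)}$: discrete random variables
$V := \{1, \dots, N\}$
$E \subseteq \{\{i,j\} \mid i \ne j,\ i, j \in V\}$
$$Q(x^{(1)}, \dots, x^{(N)} \mid E) = \prod_{\{i,j\} \in E} \frac{P_{i,j}(x^{(i)}, x^{(j)})}{P_i(x^{(i)})\, P_j(x^{(j)})} \prod_{i \in V} P_i(x^{(i)})$$
$\{P_i(x^{(i)})\}_{i \in V}$, $\{P_{i,j}(x^{(i)}, x^{(j)})\}_{i \ne j}$: marginals obtained from $P_{1,\dots,N}(x^{(1)}, \dots, x^{(N)})$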
5. The Chow-Liu Algorithm
Contribution
Starting from Data
  Learning rather than approximation
  distribution $P_{1,\dots,N}$
  data $x^n = \{(x_i^{(1)}, \dots, x_i^{(N)})\}_{i=1}^{n}$
In any database,
  some fields are discrete and others continuous
Joe Suzuki: A Construction of Bayesian Networks from Databases Based on an MDL Principle, UAI 1993
David Edwards et al.: Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests, BMC Bioinformatics 2010
Joe Suzuki: Learning Bayesian network structures when discrete and continuous variables are present, PGM 2014
7. The Chow-Liu Algorithm
Maximum Likelihood (ML)
$\{\hat{P}_i(x^{(i)})\}_{i \in V}$ and $\{\hat{P}_{i,j}(x^{(i)}, x^{(j)})\}_{i \ne j}$ are obtained from $x^n$
ML estimation of MI:
$$\hat{I}(i,j) := \sum_{x^{(i)}} \sum_{x^{(j)}} \hat{P}_{i,j}(x^{(i)}, x^{(j)}) \log \frac{\hat{P}_{i,j}(x^{(i)}, x^{(j)})}{\hat{P}_i(x^{(i)})\, \hat{P}_j(x^{(j)})}$$
Empirical entropy given $E$ (minus the log-likelihood given $E$):
$$\hat{H}^n(x^n \mid E) := n \sum_{i \in V} \hat{H}(i) - n \sum_{\{i,j\} \in E} \hat{I}(i,j)$$
ML seeks a tree even if $X^{(1)}, \dots, X^{(N)}$ are independent
  The true graph is not obtained even as $n \to \infty$
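The reason ML always returns a full tree is that the plug-in estimate of mutual information is never negative, so every loop-free edge looks worth adding. A minimal sketch, assuming simulated binary data (the seed, sample size, and function name are illustrative and not from the slides):

```python
import random
from collections import Counter
from math import log

def empirical_mi(xs, ys):
    """Plug-in (ML) estimate of mutual information from paired samples."""
    n = len(xs)
    c_xy, c_x, c_y = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # (c/n) * log((c/n) / ((c_x/n) * (c_y/n))) simplifies to the expression below.
    return sum(c / n * log(c * n / (c_x[a] * c_y[b]))
               for (a, b), c in c_xy.items())

random.seed(0)
n = 200
xs = [random.randint(0, 1) for _ in range(n)]
ys = [random.randint(0, 1) for _ in range(n)]  # drawn independently of xs
print(empirical_mi(xs, ys))  # non-negative, although the true value is 0
```

Because the estimate cannot go below zero, maximizing it never gives a reason to leave two components unconnected.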
8. The Chow-Liu Algorithm
Prior Distribution over Forest (V; E)
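$p_{ij}$: the prior probability of $X^{(i)} \perp\!\!\!\perp X^{(j)}$
$$\pi(E) := \frac{1}{K} \prod_{\{i,j\} \in E} \frac{1 - p_{ij}}{p_{ij}}$$
$$K := \sum_{E} \prod_{\{i,j\} \in E} \frac{1 - p_{ij}}{p_{ij}}$$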
9. The Chow-Liu Algorithm
Minimum Description Length (Suzuki, UAI-1993)
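$$R(i) = \int P(\{x_k^{(i)}\}_{k=1}^{n} \mid \theta)\, w(\theta)\, d\theta$$
$$R(i,j) = \int P(\{(x_k^{(i)}, x_k^{(j)})\}_{k=1}^{n} \mid \theta)\, w(\theta)\, d\theta$$
$$R^n(x^n \mid E) := \prod_{\{i,j\} \in E} \frac{R(i,j)}{R(i)\, R(j)} \prod_{i \in V} R(i)$$
$$L(x^n \mid E) := -\log R^n(x^n \mid E)$$
Description length:
$$l(x^n) = -\log \pi(E) + L(x^n \mid E) \to \min$$
Bayesian estimation of MI:
$$J(i,j) := \frac{1}{n} \log \frac{R(i,j)}{R(i)\, R(j)}$$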
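The quantities R(i), R(i, j) and J(i, j) can be computed exactly for discrete data once a weight w is fixed. The slide leaves w unspecified; the sketch below assumes the Dirichlet(1/2, ..., 1/2) weight, for which the integral reduces to the sequential Krichevsky-Trofimov product. The function names are illustrative, and the prior term involving p_ij is left out.

```python
from collections import Counter
from math import log

def log_R(seq, alphabet_size):
    """log of the Krichevsky-Trofimov mixture probability of seq."""
    counts = Counter()
    total = 0.0
    for i, symbol in enumerate(seq):   # i = number of symbols already seen
        total += log((counts[symbol] + 0.5) / (i + alphabet_size / 2))
        counts[symbol] += 1
    return total

def bayes_mi(xs, ys, alpha_x, alpha_y):
    """J = (1/n) log R(i, j) / (R(i) R(j)); pairs use alphabet size alpha_x * alpha_y."""
    n = len(xs)
    joint = log_R(list(zip(xs, ys)), alpha_x * alpha_y)
    return (joint - log_R(xs, alpha_x) - log_R(ys, alpha_y)) / n

xs = [0, 1] * 50
print(bayes_mi(xs, xs, 2, 2))          # positive: identical columns
print(bayes_mi(xs, [0] * 100, 2, 2))   # negative: the second column is constant
```

Unlike the ML estimate, this value can fall below zero, which is what later allows an edge to be rejected.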
11. nd
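$k(E)$: number of parameters in $E$
$\alpha(i)$: number of values $X^{(i)}$ takes
$$L(x^n \mid E) \approx \hat{H}^n(x^n \mid E) + \frac{1}{2} k(E) \log n$$
$$l(x^n) \approx \hat{H}^n(x^n \mid E) + \frac{1}{2} k(E) \log n - \log \pi(E)$$
$$J(i,j) \approx \hat{I}(i,j) - \frac{1}{2n} (\alpha(i) - 1)(\alpha(j) - 1) \log n + \frac{1}{n} \log \frac{1 - p_{ij}}{p_{ij}}$$
The order in which edges are chosen differs between $J(i,j)$ and $\hat{I}(i,j)$.
$J(i,j)$ can be negative, so it yields a forest, whereas $\hat{I}(i,j)$ always yields a tree.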
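The effect of the penalty can be seen on a toy case. This is a minimal sketch, assuming the sign convention in which the prior term enters as +(1/n) log((1 - p_ij)/p_ij), consistent with minimizing -log π(E) + L(x^n | E), and assuming p_ij = 1/2 so that the prior term vanishes. The ML estimates in `mi_hat` are hypothetical numbers, not results from the slides.

```python
from math import log

def approx_J(mi_hat, alpha_i, alpha_j, n, p_ij=0.5):
    """Approximate J(i, j): ML estimate minus the MDL penalty plus the prior term."""
    penalty = (alpha_i - 1) * (alpha_j - 1) * log(n) / (2 * n)
    prior = log((1 - p_ij) / p_ij) / n
    return mi_hat - penalty + prior

def forest_edges(n_vertices, scores):
    """Greedy selection by decreasing score; edges with score <= 0 are never added."""
    component = {v: v for v in range(1, n_vertices + 1)}
    chosen = []
    for (i, j), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s <= 0:
            break
        if component[i] != component[j]:      # no loop is generated
            old, new = component[j], component[i]
            for v in component:
                if component[v] == old:
                    component[v] = new
            chosen.append((i, j))
    return chosen

n = 100
mi_hat = {(1, 2): 0.20, (1, 3): 0.05, (2, 3): 0.04, (1, 4): 0.01}  # hypothetical
scores = {e: approx_J(v, 2, 2, n) for e, v in mi_hat.items()}
print(forest_edges(4, scores))  # [(1, 2), (1, 3)]
```

With binary variables and n = 100 the penalty is about 0.023, so the weak edge {1,4} scores below zero and vertex 4 stays isolated: the result is a forest rather than a spanning tree.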
13. finite set A
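There exists $R^n$ s.t.
$$\frac{1}{n} \log \frac{P^n(x^n)}{R^n(x^n)} \to 0 \quad (x^n \in A^n)$$
with $P^n$-probability one as $n \to \infty$, for any $P^n$.
$$P(i) = \prod_{k=1}^{n} P(x_k^{(i)}), \qquad P(i,j) = \prod_{k=1}^{n} P(x_k^{(i)}, x_k^{(j)})$$
$$\frac{1}{n} \log \frac{P(i)}{R(i)} \to 0, \qquad \frac{1}{n} \log \frac{P(i,j)}{R(i,j)} \to 0$$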
14. The Chow-Liu Algorithm
Consistency
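$$Q^n(x^n \mid E) := \prod_{\{i,j\} \in E} \frac{P(i,j)}{P(i)\, P(j)} \prod_{i \in V} P(i)$$
With probability 1 as $n \to \infty$, for any $Q^n(\cdot \mid E)$:
$$\frac{1}{n} \log \frac{Q^n(x^n \mid E)}{R^n(x^n \mid E)} \to 0$$
For large $n$,
$$\pi(E_1)\, Q(x^n \mid E_1) \le \pi(E_2)\, Q(x^n \mid E_2) \iff \pi(E_1)\, R(x^n \mid E_1) \le \pi(E_2)\, R(x^n \mid E_2)$$
A forest with maximum posterior probability is obtained for large $n$.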
15. The Chow-Liu Algorithm
ML vs MDL
                    ML                                  MDL
Choice of E         minimize $\hat{H}^n(x^n \mid E)$    minimize $\hat{H}^n(x^n \mid E) + \frac{1}{2} k(E) \log n - \log \pi(E)$
Choice of {i, j}    maximize $\hat{I}(i,j)$             maximize $J(i,j)$
Criteria            fitness of $x^n$ to $E$             fitness of $x^n$ to $E$ and simplicity of $E$
Consistency         No                                  Yes
16. When Density Exists
When density f exists for X (Ryabko, 2009)
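$A_0 := \{A\}$
$A_{j+1}$ is a refinement of $A_j$
for each $j$, $x^n = (x_1, \dots, x_n) \in \mathbb{R}^n \mapsto (a_1^{(j)}, \dots, a_n^{(j)}) \in A_j^n$
[Figure: the real line partitioned into successively finer intervals at levels $A_1$, $A_2$, ..., $A_j$]
$$g_1^n(x^n) = \frac{R_1^n(a_1^{(1)}, \dots, a_n^{(1)})}{\lambda(a_1^{(1)}) \cdots \lambda(a_n^{(1)})}$$
$$g_2^n(x^n) = \frac{R_2^n(a_1^{(2)}, \dots, a_n^{(2)})}{\lambda(a_1^{(2)}) \cdots \lambda(a_n^{(2)})}$$
$$g_j^n(x^n) = \frac{R_j^n(a_1^{(j)}, \dots, a_n^{(j)})}{\lambda(a_1^{(j)}) \cdots \lambda(a_n^{(j)})}$$
$\lambda$: Lebesgue measure (width of interval), $R_j^n$: universal measure w.r.t. $A_j$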
18. When Density Exists
$\sum_j w_j = 1$, $w_j > 0$
$$g^n(x^n) := \sum_{j=1}^{\infty} w_j\, g_j^n(x^n)$$
$f$: density function
$f_j$: density function of level $j$
$f^n(x^n) := f(x_1) \cdots f(x_n)$
Ryabko 2009
For any $f$ s.t. $D(f \| f_j) \to 0$ ($j \to \infty$),
$$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0$$
as $n \to \infty$
19. When Density Does Not Exist
Extensions from Ryabko 2009
Remove the assumption that a density exists.
Remove the restriction on the density class:
"for any $f$ s.t. $D(f \| f_j) \to 0$ ($j \to \infty$)" becomes "for any $f$"
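20. When Density Does Not Exist
When density does not exist for X (Suzuki 2011)
$B_1 := \{\{1\}, \{2, 3, \dots\}\}$
$B_2 := \{\{1\}, \{2\}, \{3, 4, \dots\}\}$
...
$B_k := \{\{1\}, \{2\}, \dots, \{k\}, \{k+1, k+2, \dots\}\}$
...
for each level $k$, $x^n = (x_1, \dots, x_n) \in \mathbb{N}^n \mapsto (b_1^{(k)}, \dots, b_n^{(k)}) \in B_k^n$
$$\eta(\{k\}) = \frac{1}{k} - \frac{1}{k+1}$$
$$g_k^n(x^n) := \frac{R_k^n(b_1^{(k)}, \dots, b_n^{(k)})}{\eta(b_1^{(k)}) \cdots \eta(b_n^{(k)})}$$
$\sum_k \omega_k = 1$, $\omega_k > 0$,
$$g^n(x^n) := \sum_{k=1}^{\infty} \omega_k\, g_k^n(x^n)$$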
21. When Density Does Not Exist
$D(f \| f_j) \not\to 0$ as $j \to \infty$ (1)
$$\int_{1/2}^{1} f(x)\, dx > 0$$
[Figure: histogram sequence $C_0, C_1, C_2, C_3, \dots$ drawn over the $x$-axis from 0 to 1]
22. When Density Does Not Exist
$D(f \| f_j) \not\to 0$ as $j \to \infty$ (2)
$$\int_{1}^{\infty} f(x)\, dx > 0$$
[Figure: histogram sequence $C_0, C_1, C_2, C_3, \dots$ drawn over the $x$-axis from 0 to 1]
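23. When Density Does Not Exist
$D(f \| f_j) \to 0$ as $j \to \infty$
Universal histogram sequence $\{C_k\}_{k=0}^{\infty}$
[Figure: histograms $C_0, C_1, C_2, C_3, \dots$ drawn along the $x$-axis]
Suzuki 2013
For any (generalized) density $f$, with probability 1 as $n \to \infty$,
$$\frac{1}{n} \log \frac{f^n(x^n)}{g^n(x^n)} \to 0$$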
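24. When Density Does Not Exist
Computing $g^n(x^n)$
Input $x^n \in A^n$, output $g^n(x^n)$
  1. For each $k = 1, \dots, K$: $g_k^n(x^n) := 0$
  2. For each $k = 1, \dots, K$ and each $a \in A_k$: $c_k(a) := 0$
  3. For each $i = 1, \dots, n$ and each $k = 1, \dots, K$:
     1. Find $a_i \in A_k$ from $x_i \in A$
     2. $g_k^n(x^n) := g_k^n(x^n) - \log \dfrac{c_k(a_i) + 1/2}{i - 1 + |A_k|/2} + \log \eta_X(a_i)$
     3. $c_k(a_i) := c_k(a_i) + 1$
  4. $g^n(x^n) := \dfrac{1}{K} \sum_{k=1}^{K} g_k^n(x^n)$
As written, steps 1 to 3 accumulate $-\log g_k^n(x^n)$, so the average in step 4 is presumably taken after exponentiating, with uniform weights $1/K$.
Universal measure w.r.t. $A_k$:
$$R_k^n(x^n) = \prod_{i=1}^{n} \frac{c(a_i^{(k)}) + 1/2}{i - 1 + |A_k|/2}$$
where $c(a_i^{(k)})$ is the number of occurrences of $a_i^{(k)}$ among the first $i - 1$ symbols.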
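The procedure above can be sketched for one concrete choice of partitions. This is a minimal sketch under assumptions that are not on the slide: the data lie in [0, 1), level k splits [0, 1) into 2^k equal intervals, the reference measure of a cell is its width, and the K levels get uniform weights 1/K. Each level accumulates the negative log of g_k^n, and the final mixture is formed with a log-sum-exp step to avoid underflow.

```python
from math import exp, log

def log_g(xs, K):
    """log g^n(x^n) for x_i in [0, 1), mixing K dyadic histogram levels uniformly."""
    neg_logs = []
    for k in range(1, K + 1):
        cells = 2 ** k                 # |A_k| under the dyadic assumption
        counts = [0] * cells           # c_k(a)
        total = 0.0                    # accumulates -log g_k^n(x^n)
        for i, x in enumerate(xs):     # i = number of symbols already seen
            a = min(int(x * cells), cells - 1)   # cell of A_k containing x
            total -= log((counts[a] + 0.5) / (i + cells / 2))
            total += log(1.0 / cells)  # log of the cell width
            counts[a] += 1
        neg_logs.append(total)
    # g^n = (1/K) * sum_k exp(-neg_logs[k]), evaluated stably in the log domain.
    m = min(neg_logs)
    return -m + log(sum(exp(m - t) for t in neg_logs) / K)

concentrated = [0.1] * 20
spread = [i / 20 for i in range(20)]
print(log_g(concentrated, 3) > log_g(spread, 3))  # True
```

A sample piled up at one point receives a much higher estimated density than an evenly spread one, because the finer levels keep assigning it to a single narrow cell.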
25. When Density Does Not Exist
Computation: $O(nN^2K)$
Computing $g^n(x^n)$ and $g^n(x^n, y^n)$
  $O(nN^2K)$ ($O(nN^2)$ for the discrete case)
  Proportional to $n$ and to $N + N(N-1)/2$
  $a_i^{(1)} \mapsto a_i^{(2)} \mapsto \cdots \mapsto a_i^{(K)}$: binary search
  Proportional to $K$
  $g^n(x^n, y^n)$ can be obtained by $\sum_{k=1}^{K} \omega_k\, g_{k,k}^n(x^n, y^n)$ rather than $\sum_{j=1}^{J} \sum_{k=1}^{K} \omega_{jk}\, g_{jk}^n(x^n, y^n)$.
Computing MI and finding the forest
  $N(N-1)/2$
27. When Density Does Not Exist
Bayesian Estimator of Mutual Information
$$J(i,j) = \frac{1}{n} \log \frac{g^n(i,j)}{g^n(i)\, g^n(j)} + \frac{1}{n} \log \frac{1 - p_{ij}}{p_{ij}}$$
          age         height      menarche    sex         igf1        tanner      testvol     weight
age       NA          0.7627465   0.8521553   0.01010264  0.5138440   0.52534862  0.1997714   0.6091554
height    NA          NA          0.6706380   0.26225428  0.4132932   0.68547041  0.3105466   0.9269808
menarche  NA          NA          NA          0.68786102  0.4919746   0.84283639  0.0000000   0.6456718
sex       NA          NA          NA          NA          0.2778511   0.08923994  0.1083901   0.1925525
igf1      NA          NA          NA          NA          NA          0.47529101  0.2272998   0.3722551
tanner    NA          NA          NA          NA          NA          NA          0.3796768   0.6420483
testvol   NA          NA          NA          NA          NA          NA          NA          0.2409487
weight    NA          NA          NA          NA          NA          NA          NA          NA
28. When Density Does Not Exist
R package ISwR, data set juul2
The juul data frame has 1339 rows and 6 columns. It contains a reference
sample of the distribution of insulin-like growth factor (IGF-I), one
observation per subject in various ages, with the bulk of the data collected
in connection with school physical examinations.
[Figure: graph over the variables age, menarche, weight, height, sex, tanner, igf1 and testvol]
29. When Density Does Not Exist
Experiments
n                          100      500      1000     2000
J_n(i, j)                  0.90     0.99     1.86     3.15
HSIC                       0.50     9.51     40.28    185.53
(a) N = 4
n                          100      500      1000     2000
perfectly matching rate    0.52     0.60     0.72     0.79
K-L divergence loss        0.0169   0.00303  0.00152  0.000405
execution time (sec)       1.64     12.71    22.45    51.24
(b) N = 4
n                          100      500      1000     2000
perfectly matching rate    0.18     0.31     0.38     0.59
K-L divergence loss        0.0652   0.00800  0.00575  0.00298
execution time (sec)       4.27     24.44    52.5     116.1
31. Conclusion
Established Chow-Liu learning based on MDL without assuming that the variables are either discrete or continuous
  Theoretical analysis w.r.t. $n$, $N$, $K$ ($K$: quantization depth)
  Realistic computation using R
Insight:
  The implementation is not hard
  The computation is proportional to $K$
Future Work:
  Optimal $K$ w.r.t. $n$, $N$
  Exponential memory w.r.t. $K$
  R package publication