3. M&S
1. Scalar Function

Velocity of a projectile launched with speed $V_0$ at angle $\theta$:
$v_x = V_0 \cos\theta$,  $v_y = V_0 \sin\theta - gt$
$\vec{v} = (V_0 \cos\theta)\,\vec{i} + (V_0 \sin\theta - gt)\,\vec{j}$

Its energy is a scalar function of the state:
$E = mgh + \frac{1}{2}mv^2$
*Total Energy = Potential + Kinetic Energy
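The scalar energy above can be evaluated directly. A minimal sketch (the function names and the value $g = 9.81$ are my own choices, not from the slides):

```python
import math

def total_energy(m, h, v):
    """Total energy = potential (m*g*h) + kinetic (m*v^2/2)."""
    g = 9.81  # gravitational acceleration in m/s^2 (assumed)
    return m * g * h + 0.5 * m * v ** 2

def velocity(V0, theta, t, g=9.81):
    """Velocity components of the projectile at time t."""
    vx = V0 * math.cos(theta)          # horizontal: V0*cos(theta), constant
    vy = V0 * math.sin(theta) - g * t  # vertical: V0*sin(theta) - g*t
    return vx, vy
```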
4. M&S
2. Principle of Minimum Energy

Principle of Maximum Entropy: equilibrium at fixed internal energy ($S$ maximized)
Principle of Minimum Energy: equilibrium at fixed entropy ($E$ minimized)
[Figure: $E$ and $S$ curves marking stable equilibrium and unstable points]
5. M&S
In Neural Networks

Supervised model: $E(W_{ij}, x_i, y_j)$
  $x_i$: input variables, $y_j$: output variables, $W_{ij}$: weights
  Energy = $-$Correlation

Unsupervised model: $E(W_{ij}, x_i)$
  $x_i$: input variables, $W_{ij}$: weights
  Energy with input variables = $-$Correlation
6. M&S
In Neural Networks

Unsupervised model with hidden units: $E(W_{ij}, v_i, h_j)$
  $v_i$: visible variables, $h_j$: hidden variables, $W_{ij}$: weights
  Energy = $-$Correlation
22. M&S
Macrostate Vs. Microstate

EX) Three coin tosses — the microstates:
HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
Total number of microstates: 8

In a physical system, each microstate specifies every particle's position, velocity, …
(Microstate 1, Microstate 2, Microstate 3, …)
27. M&S
Boltzmann Distribution

$P(E_i) = \dfrac{e^{-E_i/k_B T}}{\sum_j e^{-E_j/k_B T}}$

Number of cases (multiplicity):
$W = \dfrac{N!}{N_0!\,N_1!\,N_2!\cdots}$
  $N$: number of particles of the total system, $N_i$: $i$th microstate's number of particles

$S \approx \ln W$ → Maximum Entropy!

Number of cases for example occupancies: $(N, 0, 0, 0, \dots)$, $(N-2, 2, 0, 0, \dots)$, $(N-3, 2, 1, 0, \dots)$, …
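The Boltzmann distribution above is easy to check numerically. A minimal sketch (function name and the unit choice $k_B T = 1$ are my own):

```python
import numpy as np

def boltzmann_probs(energies, kT=1.0):
    """P(E_i) = exp(-E_i / kT) / sum_j exp(-E_j / kT)."""
    w = np.exp(-np.asarray(energies, dtype=float) / kT)
    return w / w.sum()

p = boltzmann_probs([0.0, 1.0, 2.0])
# probabilities sum to 1, and lower-energy states are more probable
```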
36. M&S
Restriction – NO connections within H and within V, respectively

Boltzmann Machine: $v_1\,v_2\,v_3\,v_4$ and $h_1\,h_2\,h_3$, with connections among all units
Restricted Boltzmann Machine: $v_1\,v_2\,v_3\,v_4$ and $h_1\,h_2\,h_3$, with connections only between the two layers
37. M&S
Restriction – NO connections within H and within V, respectively

Restricted Boltzmann Machine: $v_1\,v_2\,v_3\,v_4$, $h_1\,h_2\,h_3$

Conditionally Independent!
$P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$
$P(h_1, h_2 \mid v) = P(h_1 \mid v)\,P(h_2 \mid v)$

General Form:
$P(h \mid v) = \prod_j P(h_j \mid v)$
$P(v \mid h) = \prod_i P(v_i \mid h)$
38. M&S
Energy from Hopfield Network

$E(v, h) = -\sum_i \sum_j w_{ij} v_i h_j - \sum_i b_i v_i - \sum_j c_j h_j$
  $b_i$: bias of $v$, $c_j$: bias of $h$

$v_1\,v_2\,v_3\,v_4$, $h_1\,h_2\,h_3$

$P(v, h) = \dfrac{e^{-E(v,h)}}{\sum_{v,h} e^{-E(v,h)}}$
$P(v) = \dfrac{\sum_h e^{-E(v,h)}}{\sum_{v,h} e^{-E(v,h)}}$
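The energy and the normalized probability $P(v,h)$ above can be sketched directly; the brute-force partition function is only feasible for toy sizes (function names are my own, not from the slides):

```python
import numpy as np
from itertools import product

def rbm_energy(v, h, W, b, c):
    """E(v,h) = -v^T W h - b^T v - c^T h."""
    return -(v @ W @ h) - (b @ v) - (c @ h)

def rbm_prob(v, h, W, b, c):
    """P(v,h) = exp(-E(v,h)) / Z, with Z summed over every binary
    configuration -- tractable only for tiny models."""
    nv, nh = W.shape
    Z = sum(np.exp(-rbm_energy(np.array(vv, float), np.array(hh, float), W, b, c))
            for vv in product([0, 1], repeat=nv)
            for hh in product([0, 1], repeat=nh))
    return np.exp(-rbm_energy(v, h, W, b, c)) / Z
```

With all parameters zero, every one of the $2^{n_v + n_h}$ configurations has equal probability, which is a quick sanity check.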
39. M&S
Two Important Conditional Probabilities! – First

$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$
$\sigma(x) = \dfrac{1}{1 + e^{-x}}$

$v_1\,v_2\,v_3\,v_4$, $h_1\,h_2\,h_3$
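The conditional probability above is a single matrix product followed by a sigmoid; a minimal vectorized sketch (names are my own):

```python
import numpy as np

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, c):
    """P(h_j = 1 | v) = sigma(sum_i w_ij v_i + c_j), computed for all j at once."""
    return sigmoid(v @ W + c)
```

With zero weights and biases, every hidden unit is on with probability $\sigma(0) = 0.5$.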
41. M&S
Generative Vs. Discriminative Model

<Generative Model>: models $P(x, y)$ or $P(x \mid y)$
EX) Gaussians, Sigmoid Belief Networks, Bayesian Networks — the RBM belongs here

<Discriminative Model>: models $P(y \mid x)$
EX) Neural Network, Logistic Regression, Support Vector Machine
43. M&S
Maximum Likelihood Estimator

EX) What is the probability of heads for this coin?
Observed: H H T
$L(\theta) = P(x \mid \theta) = p \cdot p \cdot (1 - p) = p^2 - p^3$
$\dfrac{\partial L}{\partial p} = 2p - 3p^2 = 0 \;\Rightarrow\; p = \dfrac{2}{3}$
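The analytic optimum $p = 2/3$ can be confirmed numerically by maximizing the likelihood over a grid (the grid size is an arbitrary choice of mine):

```python
import numpy as np

# L(p) = p^2 (1 - p) for the observations H, H, T; maximize over a grid
p = np.linspace(0.0, 1.0, 100001)
L = p ** 2 * (1.0 - p)
p_hat = p[np.argmax(L)]   # lands near the analytic optimum p = 2/3
```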
44. M&S
Learning in RBM
Cost = Negative Log-Likelihood (NLL)

$NLL(\theta \mid v) = -\ln P(v \mid \theta) = -\ln \sum_h e^{-E(v,h)} + \ln \sum_{v,h} e^{-E(v,h)}$

Gradient Descent for NLL:
$\dfrac{\partial NLL(\theta \mid v)}{\partial \theta} = \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{data} - \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{model}$
  $\langle \cdots \rangle$: Expectation

Positive Phase: free energy of the single clamped data configuration (1)
Negative Phase: free energy over all configurations (∞)
47. M&S
Markov Chain Monte Carlo (MCMC)
1. Markov Chain

$p(z_t \mid x_{1:t}, z_{1:t}, u_{1:t}) = p(z_t \mid x_t)$
$p(x_t \mid x_{1:t-1}, z_{1:t}, u_{1:t}) = p(x_t \mid x_{t-1}, u_t)$

In a first-order Markov chain the next state depends only on the immediately
preceding one; in a second- or higher-order chain the next state depends on
two or more preceding states.
48. M&S
Markov Chain Monte Carlo (MCMC)
2. Monte Carlo – compute a value statistically by using random numbers

EX) Compute the circular constant:
$\dfrac{\pi}{4} \approx \dfrac{\text{Number of samples in the circle}}{\text{Total number of samples}}$,  counting samples with $x^2 + y^2 \le 1$

<Evaluation> vs. Sampling
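The circle example above can be sketched in a few lines; the sample count and seed are arbitrary choices of mine:

```python
import random

def estimate_pi(n_samples, seed=0):
    """pi/4 ~= (# samples with x^2 + y^2 <= 1) / (total # samples)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()   # uniform point in the unit square
        if x * x + y * y <= 1.0:            # falls inside the quarter circle
            inside += 1
    return 4.0 * inside / n_samples
```

With 200,000 samples the statistical error is on the order of $0.005$, so the estimate lands close to $3.14159$.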
49. M&S
Gibbs Sampling

Multi-dimensional variables $x_1, x_2, x_3, \dots$ with joint probability $p(x_1, x_2, x_3, \dots)$

- Algorithm -
1. Set up initial values randomly
2. Sample each variable from its conditional distribution $p(x_i \mid x_{-i})$ (joint or conditional probability)
3. Repeat until the chain reaches its stationary distribution

$r_0 \to r_1 \to r_2 \to \dots$ via $p(r_1 \mid r_0)$, $p(r_2 \mid r_1)$, …
EX) $(1,0,0,0,1,1,0,\dots) \to (0,1,0,1,1,1,0,\dots) \to (1,0,1,0,1,1,1,\dots)$
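The three steps above can be sketched on a toy continuous target; a bivariate standard normal with correlation $\rho$ is a common textbook choice (the target, names, and seed are my own, not from the slides):

```python
import math
import random

def gibbs_bivariate_normal(n_steps, rho=0.8, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation rho,
    using the exact conditionals x|y ~ N(rho*y, 1-rho^2), y|x ~ N(rho*x, 1-rho^2)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x = y = 0.0                    # step 1: (arbitrary) initial values
    samples = []
    for _ in range(n_steps):       # step 2: sample each variable in turn
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        samples.append((x, y))
    return samples                 # step 3: the chain approaches the joint
```

After enough steps, the empirical means are near 0 and the empirical covariance is near $\rho$.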
50. M&S
k-th Contrastive Divergence

- Characteristics -
1. Use real data as the initial values
2. The k-th sample approximates the expectation under the desired distribution

EX) $k = 2$: data $(1,0,0,0,1,1,0,\dots) \to r_1 = (0,1,0,1,1,1,0,\dots) \to r_2 = (1,0,1,0,1,1,1,\dots)$ via $p(r_1 \mid r_0)$, $p(r_2 \mid r_1)$
51. M&S
k-th Contrastive Divergence

- Characteristics -
1. Use real data as the initial values
2. The k-th sample approximates the expectation under the desired distribution
3. k = 1 is enough for convergence, since the real data is used as the initial value

EX) data $(1,0,0,0,1,1,0,\dots) \to r_1 = (0,1,0,1,1,1,0,\dots)$ via $p(r_1 \mid r_0)$
53. M&S
Approximation in RBM

$\left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{model}$:  $\langle f(x) \rangle_{model} \approx \dfrac{1}{m} \sum_m f(x_m)$   (MCMC / Gibbs sampling)

With contrastive divergence ($k = 1$):  $\langle f(x) \rangle \approx \dfrac{1}{m} \sum_m f(x_m) \approx f(x_k)$
54. M&S
Sampling Algorithm in RBM
1st Step – use real data as the initial value

Input data: $v = (1, 0, 1, 1)$, $v_i \in \{0, 1\}$
Hidden units: $h_1, h_2, h_3$, $h_j \in \{0, 1\}$
55. M&S
Sampling Algorithm in RBM
2nd Step – sample each hidden unit from its conditional probability, starting from the initial values

Input data: $v = (1, 0, 1, 1)$, $v_i \in \{0, 1\}$
Hidden units: $h_1 = 1$; $h_2, h_3$ not yet sampled; $h_j \in \{0, 1\}$
56. M&S
Two Important Conditional Probabilities! – First

$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$
$\sigma(x) = \dfrac{1}{1 + e^{-x}}$

$v_1\,v_2\,v_3\,v_4$, $h_1\,h_2\,h_3$
57. M&S
Sampling Algorithm in RBM
2nd Step – sample each hidden unit from its conditional probability, starting from the initial values

Input data: $v = (1, 0, 1, 1)$, $v_i \in \{0, 1\}$
Hidden units: $h_1 = 1$; $h_2, h_3$ not yet sampled; $h_j \in \{0, 1\}$
$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$
58. M&S
Sampling Algorithm in RBM
2nd Step – sample each hidden unit from its conditional probability, starting from the initial values

Input data: $v = (1, 0, 1, 1)$, $v_i \in \{0, 1\}$
Hidden units: $h_1 = 1$, $h_2 = 0$; $h_3$ not yet sampled; $h_j \in \{0, 1\}$
$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$
59. M&S
Sampling Algorithm in RBM
2nd Step – sample each hidden unit from its conditional probability, starting from the initial values

Input data: $v = (1, 0, 1, 1)$, $v_i \in \{0, 1\}$
Hidden units: $h = (1, 0, 1)$, $h_j \in \{0, 1\}$
$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$
60. M&S
Sampling Algorithm in RBM
3rd Step – sample each input unit from its conditional probability, starting from the sampled hidden units

Visible units: $v = (0, 0, 1, 1)$, $v_i \in \{0, 1\}$; hidden units: $h = (1, 0, 1)$, $h_j \in \{0, 1\}$
Reconstruction! Generative Model!
62. M&S
Sampling Algorithm in RBM
3rd Step – sample each input unit from its conditional probability, starting from the sampled hidden units

Visible units: $v = (0, 0, 1, 1)$, $v_i \in \{0, 1\}$; hidden units: $h = (1, 0, 1)$, $h_j \in \{0, 1\}$
Reconstruction! Generative Model!
$P(v_i = 1 \mid h) = \sigma\!\left(\sum_j w_{ij} h_j + b_i\right)$
65. M&S
Sampling Algorithm in RBM
3rd Step – sample each input unit from its conditional probability, starting from the sampled hidden units

Visible units: $v = (0, 0, 1, 0)$, $v_i \in \{0, 1\}$; hidden units: $h = (1, 0, 1)$, $h_j \in \{0, 1\}$
Reconstruction! Generative Model!
$P(v_i = 1 \mid h) = \sigma\!\left(\sum_j w_{ij} h_j + b_i\right)$   (CD, $k = 1$)
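The steps walked through above (data → sampled hidden units → reconstruction) can be sketched in NumPy; the 4×3 layer sizes match the drawings, while the weight initialization and seed are illustrative choices of mine:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_chain(v0, W, b, c, rng):
    """Clamp data v0, sample h0 ~ P(h|v0), then sample the
    reconstruction v1 ~ P(v|h0) -- the k = 1 chain of CD."""
    ph0 = sigmoid(v0 @ W + c)                        # P(h_j = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float) # sample each hidden unit
    pv1 = sigmoid(h0 @ W.T + b)                      # P(v_i = 1 | h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float) # sample each visible unit
    return h0, v1

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))   # 4 visible, 3 hidden units (as drawn)
b, c = np.zeros(4), np.zeros(3)
h0, v1 = cd1_chain(np.array([1.0, 0.0, 1.0, 1.0]), W, b, c, rng)
```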
66. M&S
Sampling Algorithm in RBM
4th Step – perform the chain $k$ times (CD-$k$)

$(v_1\,v_2\,\dots,\; h_1\,h_2\,\dots)$ at $t = 0$ → $t = 1$ → … → $t = \infty \approx k$
67. M&S
Sampling Algorithm in RBM
4th Step – perform the chain $k$ times (CD, $k = 1$)

$(v_1\,v_2\,\dots,\; h_1\,h_2\,\dots)$ at $t = 0$ (Data) → $t = 1$ → … → $t = \infty \approx k$ (Model)
68. M&S
Learning in RBM

Gradient Descent for NLL:
$\dfrac{\partial NLL(\theta \mid v)}{\partial \theta} = \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{data} - \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{model}$,  $\langle \cdots \rangle$: Expectation

$\theta = \{w, b, c\}$
$E(v, h) = -\sum_i \sum_j w_{ij} v_i h_j - \sum_i b_i v_i - \sum_j c_j h_j$

$\dfrac{\partial E(v,h)}{\partial w_{ij}} = -v_i h_j$,  $\dfrac{\partial E(v,h)}{\partial b_i} = -v_i$,  $\dfrac{\partial E(v,h)}{\partial c_j} = -h_j$
69. M&S
Learning in RBM

Gradient Descent for NLL:
$\dfrac{\partial NLL(\theta \mid v)}{\partial \theta} = \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{data} - \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{model}$,  $\theta = \{w, b, c\}$

$\dfrac{\partial E(v,h)}{\partial w_{ij}} = -v_i h_j$,  $\dfrac{\partial E(v,h)}{\partial b_i} = -v_i$,  $\dfrac{\partial E(v,h)}{\partial c_j} = -h_j$

$\Delta w_{ij} = \eta_w \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \right)$
$\Delta b_i = \eta_b \left( \langle v_i \rangle_{data} - \langle v_i \rangle_{model} \right)$
$\Delta c_j = \eta_c \left( \langle h_j \rangle_{data} - \langle h_j \rangle_{model} \right)$
70. M&S
Learning in RBM

$\Delta w_{ij} = \eta_w \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \right)$
$\Delta b_i = \eta_b \left( \langle v_i \rangle_{data} - \langle v_i \rangle_{model} \right)$
$\Delta c_j = \eta_c \left( \langle h_j \rangle_{data} - \langle h_j \rangle_{model} \right)$

where $\langle v_i h_j \rangle = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right) v_i$,  $\langle h_j \rangle = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$,  $\langle v_i \rangle = v_i$
72. M&S
Learning in RBM

CD-$k$ parameter updates — the Data term uses the datapoint $v$, the Model term uses the $k$-step sample $v^{(k)}$:

$w_{ij}^{t+1} = w_{ij}^{t} + \eta_w \left( \sigma\!\left(\sum_i w_{ij} v_i + c_j\right) v_i - \sigma\!\left(\sum_i w_{ij} v_i^{(k)} + c_j\right) v_i^{(k)} \right)$
$b_i^{t+1} = b_i^{t} + \eta_b \left( v_i - v_i^{(k)} \right)$
$c_j^{t+1} = c_j^{t} + \eta_c \left( \sigma\!\left(\sum_i w_{ij} v_i + c_j\right) - \sigma\!\left(\sum_i w_{ij} v_i^{(k)} + c_j\right) \right)$
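The update rules above can be sketched as a single CD-1 training step in NumPy. This is a minimal sketch under my own assumptions: one data vector at a time, a shared learning rate, and reconstruction probabilities (rather than binary samples) for the model statistics, a common CD variant:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, W, b, c, lr, rng):
    """One CD-1 update of (W, b, c):
    Delta ~ (data statistics) - (model statistics from the reconstruction)."""
    ph_data = sigmoid(v_data @ W + c)                   # P(h=1 | v_data)
    h = (rng.random(ph_data.shape) < ph_data).astype(float)
    v_model = sigmoid(h @ W.T + b)                      # reconstruction probabilities
    ph_model = sigmoid(v_model @ W + c)                 # P(h=1 | reconstruction)
    W += lr * (np.outer(v_data, ph_data) - np.outer(v_model, ph_model))
    b += lr * (v_data - v_model)
    c += lr * (ph_data - ph_model)
    return W, b, c

rng = np.random.default_rng(0)
W = np.zeros((4, 3)); b = np.zeros(4); c = np.zeros(3)
for _ in range(100):
    cd1_update(np.array([1.0, 0.0, 1.0, 0.0]), W, b, c, 0.1, rng)
# visible biases drift toward the data pattern: b rises where v_data = 1
```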
73. M&S
Intuition for RBM
Cost = Negative Log-Likelihood (NLL)

$NLL(\theta \mid v) = -\ln \sum_h e^{-E(v,h)} + \ln \sum_{v,h} e^{-E(v,h)}$

Data term: lowers $E(v,h)$ at (datapoint + hidden(datapoint))
Model term: raises $E(v,h)$ at (reconstruction + hidden(reconstruction)), obtained by sampling
[Figure: energy surface over global configurations]
74. M&S
Intuition for RBM
Cost = Negative Log-Likelihood (NLL)

$NLL(\theta \mid v) = -\ln \sum_h e^{-E(v,h)} + \ln \sum_{v,h} e^{-E(v,h)}$
  Data term: $E(v,h)$ at the datapoint; Model term: $E(v,h)$ over all configurations
[Figure: energy surface over global configurations]
75. M&S
Intuition for RBM
Cost = Negative Log-Likelihood (NLL)

$NLL(\theta \mid v) = -\ln \sum_h e^{-E(v,h)} + \ln \sum_{v,h} e^{-E(v,h)}$

Sampling direction: from the datapoint toward the global minimum of the energy surface
[Figure: energy surface over global configurations, with the datapoint and global minimum marked]
76. M&S
Intuition for RBM
Cost = Negative Log-Likelihood (NLL)

$NLL(\theta \mid v) = -\ln \sum_h e^{-E(v,h)} + \ln \sum_{v,h} e^{-E(v,h)}$

Sampling direction: from the datapoint toward low energy, where
$P(v_i) = \dfrac{e^{-E(v_i)}}{\sum_j e^{-E(v_j)}}$
  $v_i$: $i$th configuration; the sum runs over the overall configurations
78. M&S
Intuition for RBM

Sampling direction: toward the global minimum
$(v_1\,v_2\,\dots,\; h_1\,h_2\,\dots)$ at $t = 0$ → $t = 1$
$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$,  $P(v_i = 1 \mid h) = \sigma\!\left(\sum_j w_{ij} h_j + b_i\right)$

Boltzmann Distribution:
$P(v_i, h_j) = \dfrac{e^{-E(v_i, h_j)}}{\sum_{k,l} e^{-E(v_k, h_l)}}$
[Figure: $P(E_i)$ vs. energy $E_i$]
82. M&S
Intuition for RBM

Sampling direction: toward the global minimum
$(v_1\,v_2\,\dots,\; h_1\,h_2\,\dots)$ at $t = 0$ → $t = 1$ → … → $t = \infty$
[Figure: energy surface over global configurations, with sampling approaching the global minimum]