3. M&S
1. Scalar Function

Velocity of a projectile launched with speed $V_0$ at angle $\theta$:
$v_x = V_0 \cos\theta$,  $v_y = V_0 \sin\theta - gt$
$\vec{v} = (V_0 \cos\theta)\,\vec{i} + (V_0 \sin\theta - gt)\,\vec{j}$

Its energy is a scalar function of the state:
$E = mgh + \frac{1}{2}mv^2$
*Total Energy = Potential + Kinetic Energy
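The scalar energy above can be evaluated directly. A minimal sketch (the function names and the value $g = 9.81$ are my own choices, not from the slides):

```python
import math

def total_energy(m, h, v):
    """Total energy = potential (m*g*h) + kinetic (m*v^2/2)."""
    g = 9.81  # gravitational acceleration in m/s^2 (assumed)
    return m * g * h + 0.5 * m * v ** 2

def velocity(V0, theta, t, g=9.81):
    """Velocity components of the projectile at time t."""
    vx = V0 * math.cos(theta)          # horizontal: V0*cos(theta), constant
    vy = V0 * math.sin(theta) - g * t  # vertical: V0*sin(theta) - g*t
    return vx, vy
```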
4. M&S
2. Principle of Minimum Energy

Principle of Maximum Entropy: equilibrium at fixed internal energy ($S$ maximized)
Principle of Minimum Energy: equilibrium at fixed entropy ($E$ minimized)
[Figure: $E$ and $S$ curves marking stable equilibrium and unstable points]
5. M&S
In Neural Networks

Supervised model: $E(W_{ij}, x_i, y_j)$
  $x_i$: input variables, $y_j$: output variables, $W_{ij}$: weights
  Energy = $-$Correlation

Unsupervised model: $E(W_{ij}, x_i)$
  $x_i$: input variables, $W_{ij}$: weights
  Energy with input variables = $-$Correlation
6. M&S
In Neural Networks

Unsupervised model with hidden units: $E(W_{ij}, v_i, h_j)$
  $v_i$: visible variables, $h_j$: hidden variables, $W_{ij}$: weights
  Energy = $-$Correlation
22. M&S
Macrostate Vs. Microstate

EX) Three coin tosses — the microstates:
HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
Total number of microstates: 8

In a physical system, each microstate specifies every particle's position, velocity, …
(Microstate 1, Microstate 2, Microstate 3, …)
27. M&S
Boltzmann Distribution

$P(E_i) = \dfrac{e^{-E_i/k_B T}}{\sum_j e^{-E_j/k_B T}}$

Number of cases (multiplicity):
$W = \dfrac{N!}{N_0!\,N_1!\,N_2!\cdots}$
  $N$: number of particles of the total system, $N_i$: $i$th microstate's number of particles

$S \approx \ln W$ → Maximum Entropy!

Number of cases for example occupancies: $(N, 0, 0, 0, \dots)$, $(N-2, 2, 0, 0, \dots)$, $(N-3, 2, 1, 0, \dots)$, …
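The Boltzmann distribution above is easy to check numerically. A minimal sketch (function name and the unit choice $k_B T = 1$ are my own):

```python
import numpy as np

def boltzmann_probs(energies, kT=1.0):
    """P(E_i) = exp(-E_i / kT) / sum_j exp(-E_j / kT)."""
    w = np.exp(-np.asarray(energies, dtype=float) / kT)
    return w / w.sum()

p = boltzmann_probs([0.0, 1.0, 2.0])
# probabilities sum to 1, and lower-energy states are more probable
```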
36. M&S
Restriction – NO connections within H and within V, respectively

Boltzmann Machine: $v_1\,v_2\,v_3\,v_4$ and $h_1\,h_2\,h_3$, with connections among all units
Restricted Boltzmann Machine: $v_1\,v_2\,v_3\,v_4$ and $h_1\,h_2\,h_3$, with connections only between the two layers
37. M&S
Restriction – NO connections within H and within V, respectively

Restricted Boltzmann Machine: $v_1\,v_2\,v_3\,v_4$, $h_1\,h_2\,h_3$

Conditionally Independent!
$P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$
$P(h_1, h_2 \mid v) = P(h_1 \mid v)\,P(h_2 \mid v)$

General Form:
$P(h \mid v) = \prod_j P(h_j \mid v)$
$P(v \mid h) = \prod_i P(v_i \mid h)$
38. M&S
Energy from Hopfield Network

$E(v, h) = -\sum_i \sum_j w_{ij} v_i h_j - \sum_i b_i v_i - \sum_j c_j h_j$
  $b_i$: bias of $v$, $c_j$: bias of $h$

$v_1\,v_2\,v_3\,v_4$, $h_1\,h_2\,h_3$

$P(v, h) = \dfrac{e^{-E(v,h)}}{\sum_{v,h} e^{-E(v,h)}}$
$P(v) = \dfrac{\sum_h e^{-E(v,h)}}{\sum_{v,h} e^{-E(v,h)}}$
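The energy and the normalized probability $P(v,h)$ above can be sketched directly; the brute-force partition function is only feasible for toy sizes (function names are my own, not from the slides):

```python
import numpy as np
from itertools import product

def rbm_energy(v, h, W, b, c):
    """E(v,h) = -v^T W h - b^T v - c^T h."""
    return -(v @ W @ h) - (b @ v) - (c @ h)

def rbm_prob(v, h, W, b, c):
    """P(v,h) = exp(-E(v,h)) / Z, with Z summed over every binary
    configuration -- tractable only for tiny models."""
    nv, nh = W.shape
    Z = sum(np.exp(-rbm_energy(np.array(vv, float), np.array(hh, float), W, b, c))
            for vv in product([0, 1], repeat=nv)
            for hh in product([0, 1], repeat=nh))
    return np.exp(-rbm_energy(v, h, W, b, c)) / Z
```

With all parameters zero, every one of the $2^{n_v + n_h}$ configurations has equal probability, which is a quick sanity check.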
39. M&S
Two Important Conditional Probabilities! – First

$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$
$\sigma(x) = \dfrac{1}{1 + e^{-x}}$

$v_1\,v_2\,v_3\,v_4$, $h_1\,h_2\,h_3$
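The conditional probability above is a single matrix product followed by a sigmoid; a minimal vectorized sketch (names are my own):

```python
import numpy as np

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, c):
    """P(h_j = 1 | v) = sigma(sum_i w_ij v_i + c_j), computed for all j at once."""
    return sigmoid(v @ W + c)
```

With zero weights and biases, every hidden unit is on with probability $\sigma(0) = 0.5$.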
41. M&S
Generative Vs. Discriminative Model

<Generative Model>: models $P(x, y)$ or $P(x \mid y)$
EX) Gaussians, Sigmoid Belief Networks, Bayesian Networks — the RBM belongs here

<Discriminative Model>: models $P(y \mid x)$
EX) Neural Network, Logistic Regression, Support Vector Machine
43. M&S
Maximum Likelihood Estimator

EX) What is the probability of heads for this coin?
Observed: H H T
$L(\theta) = P(x \mid \theta) = p \cdot p \cdot (1 - p) = p^2 - p^3$
$\dfrac{\partial L}{\partial p} = 2p - 3p^2 = 0 \;\Rightarrow\; p = \dfrac{2}{3}$
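The analytic optimum $p = 2/3$ can be confirmed numerically by maximizing the likelihood over a grid (the grid size is an arbitrary choice of mine):

```python
import numpy as np

# L(p) = p^2 (1 - p) for the observations H, H, T; maximize over a grid
p = np.linspace(0.0, 1.0, 100001)
L = p ** 2 * (1.0 - p)
p_hat = p[np.argmax(L)]   # lands near the analytic optimum p = 2/3
```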
44. M&S
Learning in RBM
Cost = Negative Log-Likelihood (NLL)

$NLL(\theta \mid v) = -\ln P(v \mid \theta) = -\ln \sum_h e^{-E(v,h)} + \ln \sum_{v,h} e^{-E(v,h)}$

Gradient Descent for NLL:
$\dfrac{\partial NLL(\theta \mid v)}{\partial \theta} = \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{data} - \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{model}$
  $\langle \cdots \rangle$: Expectation

Positive Phase: free energy of the single clamped data configuration (1)
Negative Phase: free energy over all configurations (∞)
47. M&S
Markov Chain Monte Carlo (MCMC)
1. Markov Chain

$p(z_t \mid x_{1:t}, z_{1:t}, u_{1:t}) = p(z_t \mid x_t)$
$p(x_t \mid x_{1:t-1}, z_{1:t}, u_{1:t}) = p(x_t \mid x_{t-1}, u_t)$

In a first-order Markov chain the next state depends only on the immediately
preceding one; in a second- or higher-order chain the next state depends on
two or more preceding states.
48. M&S
Markov Chain Monte Carlo (MCMC)
2. Monte Carlo – compute a value statistically by using random numbers

EX) Compute the circular constant:
$\dfrac{\pi}{4} \approx \dfrac{\text{Number of samples in the circle}}{\text{Total number of samples}}$,  counting samples with $x^2 + y^2 \le 1$

<Evaluation> vs. Sampling
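The circle example above can be sketched in a few lines; the sample count and seed are arbitrary choices of mine:

```python
import random

def estimate_pi(n_samples, seed=0):
    """pi/4 ~= (# samples with x^2 + y^2 <= 1) / (total # samples)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()   # uniform point in the unit square
        if x * x + y * y <= 1.0:            # falls inside the quarter circle
            inside += 1
    return 4.0 * inside / n_samples
```

With 200,000 samples the statistical error is on the order of $0.005$, so the estimate lands close to $3.14159$.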
49. M&S
Gibbs Sampling

Multi-dimensional variables $x_1, x_2, x_3, \dots$ with joint probability $p(x_1, x_2, x_3, \dots)$

- Algorithm -
1. Set up initial values randomly
2. Sample each variable from its conditional distribution $p(x_i \mid x_{-i})$ (joint or conditional probability)
3. Repeat until the chain reaches its stationary distribution

$r_0 \to r_1 \to r_2 \to \dots$ via $p(r_1 \mid r_0)$, $p(r_2 \mid r_1)$, …
EX) $(1,0,0,0,1,1,0,\dots) \to (0,1,0,1,1,1,0,\dots) \to (1,0,1,0,1,1,1,\dots)$
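The three steps above can be sketched on a toy continuous target; a bivariate standard normal with correlation $\rho$ is a common textbook choice (the target, names, and seed are my own, not from the slides):

```python
import math
import random

def gibbs_bivariate_normal(n_steps, rho=0.8, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation rho,
    using the exact conditionals x|y ~ N(rho*y, 1-rho^2), y|x ~ N(rho*x, 1-rho^2)."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x = y = 0.0                    # step 1: (arbitrary) initial values
    samples = []
    for _ in range(n_steps):       # step 2: sample each variable in turn
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        samples.append((x, y))
    return samples                 # step 3: the chain approaches the joint
```

After enough steps, the empirical means are near 0 and the empirical covariance is near $\rho$.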
50. M&S
k-th Contrastive Divergence

- Characteristics -
1. Use real data as the initial values
2. The k-th sample approximates the expectation under the desired distribution

EX) $k = 2$: data $(1,0,0,0,1,1,0,\dots) \to r_1 = (0,1,0,1,1,1,0,\dots) \to r_2 = (1,0,1,0,1,1,1,\dots)$ via $p(r_1 \mid r_0)$, $p(r_2 \mid r_1)$
51. M&S
k-th Contrastive Divergence

- Characteristics -
1. Use real data as the initial values
2. The k-th sample approximates the expectation under the desired distribution
3. k = 1 is enough for convergence, since the real data is used as the initial value

EX) data $(1,0,0,0,1,1,0,\dots) \to r_1 = (0,1,0,1,1,1,0,\dots)$ via $p(r_1 \mid r_0)$
53. M&S
Approximation in RBM

$\left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{model}$:  $\langle f(x) \rangle_{model} \approx \dfrac{1}{m} \sum_m f(x_m)$   (MCMC / Gibbs sampling)

With contrastive divergence ($k = 1$):  $\langle f(x) \rangle \approx \dfrac{1}{m} \sum_m f(x_m) \approx f(x_k)$
54. M&S
Sampling Algorithm in RBM
1st Step – use real data as the initial value

Input data: $v = (1, 0, 1, 1)$, $v_i \in \{0, 1\}$
Hidden units: $h_1, h_2, h_3$, $h_j \in \{0, 1\}$
55. M&S
Sampling Algorithm in RBM
2nd Step – sample each hidden unit from its conditional probability, starting from the initial values

Input data: $v = (1, 0, 1, 1)$, $v_i \in \{0, 1\}$
Hidden units: $h_1 = 1$; $h_2, h_3$ not yet sampled; $h_j \in \{0, 1\}$
56. M&S
Two Important Conditional Probabilities! – First

$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$
$\sigma(x) = \dfrac{1}{1 + e^{-x}}$

$v_1\,v_2\,v_3\,v_4$, $h_1\,h_2\,h_3$
57. M&S
Sampling Algorithm in RBM
2nd Step – sample each hidden unit from its conditional probability, starting from the initial values

Input data: $v = (1, 0, 1, 1)$, $v_i \in \{0, 1\}$
Hidden units: $h_1 = 1$; $h_2, h_3$ not yet sampled; $h_j \in \{0, 1\}$
$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$
58. M&S
Sampling Algorithm in RBM
2nd Step – sample each hidden unit from its conditional probability, starting from the initial values

Input data: $v = (1, 0, 1, 1)$, $v_i \in \{0, 1\}$
Hidden units: $h_1 = 1$, $h_2 = 0$; $h_3$ not yet sampled; $h_j \in \{0, 1\}$
$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$
59. M&S
Sampling Algorithm in RBM
2nd Step – sample each hidden unit from its conditional probability, starting from the initial values

Input data: $v = (1, 0, 1, 1)$, $v_i \in \{0, 1\}$
Hidden units: $h = (1, 0, 1)$, $h_j \in \{0, 1\}$
$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$
60. M&S
Sampling Algorithm in RBM
3rd Step – sample each input unit from its conditional probability, starting from the sampled hidden units

Visible units: $v = (0, 0, 1, 1)$, $v_i \in \{0, 1\}$; hidden units: $h = (1, 0, 1)$, $h_j \in \{0, 1\}$
Reconstruction! Generative Model!
62. M&S
Sampling Algorithm in RBM
3rd Step – sample each input unit from its conditional probability, starting from the sampled hidden units

Visible units: $v = (0, 0, 1, 1)$, $v_i \in \{0, 1\}$; hidden units: $h = (1, 0, 1)$, $h_j \in \{0, 1\}$
Reconstruction! Generative Model!
$P(v_i = 1 \mid h) = \sigma\!\left(\sum_j w_{ij} h_j + b_i\right)$
65. M&S
Sampling Algorithm in RBM
3rd Step – sample each input unit from its conditional probability, starting from the sampled hidden units

Visible units: $v = (0, 0, 1, 0)$, $v_i \in \{0, 1\}$; hidden units: $h = (1, 0, 1)$, $h_j \in \{0, 1\}$
Reconstruction! Generative Model!
$P(v_i = 1 \mid h) = \sigma\!\left(\sum_j w_{ij} h_j + b_i\right)$   (CD, $k = 1$)
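The steps walked through above (data → sampled hidden units → reconstruction) can be sketched in NumPy; the 4×3 layer sizes match the drawings, while the weight initialization and seed are illustrative choices of mine:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_chain(v0, W, b, c, rng):
    """Clamp data v0, sample h0 ~ P(h|v0), then sample the
    reconstruction v1 ~ P(v|h0) -- the k = 1 chain of CD."""
    ph0 = sigmoid(v0 @ W + c)                        # P(h_j = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float) # sample each hidden unit
    pv1 = sigmoid(h0 @ W.T + b)                      # P(v_i = 1 | h0)
    v1 = (rng.random(pv1.shape) < pv1).astype(float) # sample each visible unit
    return h0, v1

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))   # 4 visible, 3 hidden units (as drawn)
b, c = np.zeros(4), np.zeros(3)
h0, v1 = cd1_chain(np.array([1.0, 0.0, 1.0, 1.0]), W, b, c, rng)
```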
66. M&S
Sampling Algorithm in RBM
4th Step – perform the chain $k$ times (CD-$k$)

$(v_1\,v_2\,\dots,\; h_1\,h_2\,\dots)$ at $t = 0$ → $t = 1$ → … → $t = \infty \approx k$
67. M&S
Sampling Algorithm in RBM
4th Step – perform the chain $k$ times (CD, $k = 1$)

$(v_1\,v_2\,\dots,\; h_1\,h_2\,\dots)$ at $t = 0$ (Data) → $t = 1$ → … → $t = \infty \approx k$ (Model)
68. M&S
Learning in RBM

Gradient Descent for NLL:
$\dfrac{\partial NLL(\theta \mid v)}{\partial \theta} = \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{data} - \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{model}$,  $\langle \cdots \rangle$: Expectation

$\theta = \{w, b, c\}$
$E(v, h) = -\sum_i \sum_j w_{ij} v_i h_j - \sum_i b_i v_i - \sum_j c_j h_j$

$\dfrac{\partial E(v,h)}{\partial w_{ij}} = -v_i h_j$,  $\dfrac{\partial E(v,h)}{\partial b_i} = -v_i$,  $\dfrac{\partial E(v,h)}{\partial c_j} = -h_j$
69. M&S
Learning in RBM

Gradient Descent for NLL:
$\dfrac{\partial NLL(\theta \mid v)}{\partial \theta} = \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{data} - \left\langle \dfrac{\partial E(v,h)}{\partial \theta} \right\rangle_{model}$,  $\theta = \{w, b, c\}$

$\dfrac{\partial E(v,h)}{\partial w_{ij}} = -v_i h_j$,  $\dfrac{\partial E(v,h)}{\partial b_i} = -v_i$,  $\dfrac{\partial E(v,h)}{\partial c_j} = -h_j$

$\Delta w_{ij} = \eta_w \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \right)$
$\Delta b_i = \eta_b \left( \langle v_i \rangle_{data} - \langle v_i \rangle_{model} \right)$
$\Delta c_j = \eta_c \left( \langle h_j \rangle_{data} - \langle h_j \rangle_{model} \right)$
70. M&S
Learning in RBM

$\Delta w_{ij} = \eta_w \left( \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model} \right)$
$\Delta b_i = \eta_b \left( \langle v_i \rangle_{data} - \langle v_i \rangle_{model} \right)$
$\Delta c_j = \eta_c \left( \langle h_j \rangle_{data} - \langle h_j \rangle_{model} \right)$

where $\langle v_i h_j \rangle = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right) v_i$,  $\langle h_j \rangle = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$,  $\langle v_i \rangle = v_i$
72. M&S
Learning in RBM

CD-$k$ parameter updates — the Data term uses the datapoint $v$, the Model term uses the $k$-step sample $v^{(k)}$:

$w_{ij}^{t+1} = w_{ij}^{t} + \eta_w \left( \sigma\!\left(\sum_i w_{ij} v_i + c_j\right) v_i - \sigma\!\left(\sum_i w_{ij} v_i^{(k)} + c_j\right) v_i^{(k)} \right)$
$b_i^{t+1} = b_i^{t} + \eta_b \left( v_i - v_i^{(k)} \right)$
$c_j^{t+1} = c_j^{t} + \eta_c \left( \sigma\!\left(\sum_i w_{ij} v_i + c_j\right) - \sigma\!\left(\sum_i w_{ij} v_i^{(k)} + c_j\right) \right)$
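The update rules above can be sketched as a single CD-1 training step in NumPy. This is a minimal sketch under my own assumptions: one data vector at a time, a shared learning rate, and reconstruction probabilities (rather than binary samples) for the model statistics, a common CD variant:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v_data, W, b, c, lr, rng):
    """One CD-1 update of (W, b, c):
    Delta ~ (data statistics) - (model statistics from the reconstruction)."""
    ph_data = sigmoid(v_data @ W + c)                   # P(h=1 | v_data)
    h = (rng.random(ph_data.shape) < ph_data).astype(float)
    v_model = sigmoid(h @ W.T + b)                      # reconstruction probabilities
    ph_model = sigmoid(v_model @ W + c)                 # P(h=1 | reconstruction)
    W += lr * (np.outer(v_data, ph_data) - np.outer(v_model, ph_model))
    b += lr * (v_data - v_model)
    c += lr * (ph_data - ph_model)
    return W, b, c

rng = np.random.default_rng(0)
W = np.zeros((4, 3)); b = np.zeros(4); c = np.zeros(3)
for _ in range(100):
    cd1_update(np.array([1.0, 0.0, 1.0, 0.0]), W, b, c, 0.1, rng)
# visible biases drift toward the data pattern: b rises where v_data = 1
```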
73. M&S
Intuition for RBM
Cost = Negative Log-Likelihood (NLL)

$NLL(\theta \mid v) = -\ln \sum_h e^{-E(v,h)} + \ln \sum_{v,h} e^{-E(v,h)}$

Data term: lowers $E(v,h)$ at (datapoint + hidden(datapoint))
Model term: raises $E(v,h)$ at (reconstruction + hidden(reconstruction)), obtained by sampling
[Figure: energy surface over global configurations]
74. M&S
Intuition for RBM
Cost = Negative Log-Likelihood (NLL)

$NLL(\theta \mid v) = -\ln \sum_h e^{-E(v,h)} + \ln \sum_{v,h} e^{-E(v,h)}$
  Data term: $E(v,h)$ at the datapoint; Model term: $E(v,h)$ over all configurations
[Figure: energy surface over global configurations]
75. M&S
Intuition for RBM
Cost = Negative Log-Likelihood (NLL)

$NLL(\theta \mid v) = -\ln \sum_h e^{-E(v,h)} + \ln \sum_{v,h} e^{-E(v,h)}$

Sampling direction: from the datapoint toward the global minimum of the energy surface
[Figure: energy surface over global configurations, with the datapoint and global minimum marked]
76. M&S
Intuition for RBM
Cost = Negative Log-Likelihood (NLL)

$NLL(\theta \mid v) = -\ln \sum_h e^{-E(v,h)} + \ln \sum_{v,h} e^{-E(v,h)}$

Sampling direction: from the datapoint toward low energy, where
$P(v_i) = \dfrac{e^{-E(v_i)}}{\sum_j e^{-E(v_j)}}$
  $v_i$: $i$th configuration; the sum runs over the overall configurations
78. M&S
Intuition for RBM

Sampling direction: toward the global minimum
$(v_1\,v_2\,\dots,\; h_1\,h_2\,\dots)$ at $t = 0$ → $t = 1$
$P(h_j = 1 \mid v) = \sigma\!\left(\sum_i w_{ij} v_i + c_j\right)$,  $P(v_i = 1 \mid h) = \sigma\!\left(\sum_j w_{ij} h_j + b_i\right)$

Boltzmann Distribution:
$P(v_i, h_j) = \dfrac{e^{-E(v_i, h_j)}}{\sum_{k,l} e^{-E(v_k, h_l)}}$
[Figure: $P(E_i)$ vs. energy $E_i$]
82. M&S
Intuition for RBM

Sampling direction: toward the global minimum
$(v_1\,v_2\,\dots,\; h_1\,h_2\,\dots)$ at $t = 0$ → $t = 1$ → … → $t = \infty$
[Figure: energy surface over global configurations, with sampling approaching the global minimum]