2. OUTLINE
• Unsupervised Feature Learning
• Deep vs. Shallow Architectures
• Restricted Boltzmann Machines
• Deep Belief Networks
• Greedy Layer-wise Deep Training Algorithm
• Conclusion
3. Unsupervised Feature Learning
• Transformation of "raw" inputs to a representation
• We have mostly unlabeled data, so we need an unsupervised way of learning
• DBNs are graphical models which learn to extract a deep hierarchical representation of the training data.
4. Deep vs. Shallow Architectures
• Perceptrons, multilayer NNs (which cannot exploit unlabeled data), SVMs, …
• Shallow architectures contain a fixed feature layer (or basis function) and a weight-combination layer
• Deep architectures are compositions of many layers of adaptive non-linear components (DBNs, CNNs, …)
5. Restricted Boltzmann Machines
• The main building block of a DBN is a bipartite undirected graphical model called the Restricted Boltzmann Machine (RBM).
• More technically, a Restricted Boltzmann Machine is a stochastic neural network (neural network meaning we have neuron-like units whose binary activations depend on the neighbors they're connected to; stochastic meaning these activations have a probabilistic element) consisting of a layer of visible units and a layer of hidden units.
• Restriction? To make learning easier, we restrict the network so that no visible unit is connected to any other visible unit and no hidden unit is connected to any other hidden unit.
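As a concrete illustration, here is a minimal NumPy sketch of sampling in an RBM; the layer sizes and weight scale are arbitrary assumptions, not values from the slides. The point is that, because of the restriction, the units within a layer are conditionally independent given the other layer, so a whole layer can be sampled in one vectorized step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 visible units, 3 hidden units.
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # visible-hidden weights
b = np.zeros(n_visible)   # visible biases
c = np.zeros(n_hidden)    # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v):
    # No hidden-hidden connections, so the hidden units are conditionally
    # independent given v: p(h_j = 1 | v) = sigmoid(c_j + v . W[:, j]).
    p = sigmoid(c + v @ W)
    return p, (rng.random(n_hidden) < p).astype(float)

def sample_visible(h):
    # Symmetrically, the visible units are independent given h.
    p = sigmoid(b + W @ h)
    return p, (rng.random(n_visible) < p).astype(float)

v0 = rng.integers(0, 2, size=n_visible).astype(float)
p_h, h0 = sample_hidden(v0)
p_v, v1 = sample_visible(h0)  # one step of block Gibbs sampling
```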
6. Deep Belief Networks
• DBNs can be viewed as a composition of simple, unsupervised networks, i.e. RBMs + sigmoid belief networks
• The greatest advantage of DBNs is their capability of "learning features", achieved by a layer-by-layer learning strategy in which higher-level features are learned from the previous layers
7. Greedy Layer-wise Deep Training
• Idea: DBNs can be formed by “stacking” RBMs
• Each layer is trained as a Restricted Boltzmann Machine.
• Train layers sequentially starting from bottom (observed data) layer. (Greedy
layer-wise)
• Each layer learns a higher-level representation of the layer below. The
training criterion does not depend on the labels. (Unsupervised)
8. Greedy Layer-wise Deep Training
• The principle of greedy layer-wise unsupervised training can be
applied to DBNs with RBMs as the building blocks for each layer
[Hinton06], [Bengio07]
• 1. Train the first layer as an RBM that models the raw input x = h0 as its visible layer.
• 2. Use that first layer to obtain a representation of the input that will be used as data for the second layer. Two common solutions exist: this representation can be chosen as the mean activations p(h1 = 1 | h0) or as samples from p(h1 | h0).
• 3. Train the second layer as an RBM, taking the transformed data (samples or mean
activations) as training examples (for the visible layer of that RBM).
• 4. Iterate (2 and 3) for the desired number of layers, each time propagating upward either
samples or mean values.
• 5. Fine-tune all the parameters of this deep architecture with respect to a proxy for the DBN log-likelihood, or with respect to a supervised training criterion (after adding extra learning machinery to convert the learned representation into supervised predictions, e.g. a linear classifier).
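Steps 1-4 can be sketched in NumPy. This is a toy illustration, not a faithful reproduction of [Hinton06]: it uses single-step contrastive divergence (CD-1), tiny made-up layer sizes and data, and propagates mean activations upward (one of the two options in step 2); the fine-tuning of step 5 is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Train one RBM with single-step contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    b = np.zeros(n_visible)  # visible biases
    c = np.zeros(n_hidden)   # hidden biases
    for _ in range(epochs):
        for v0 in data:
            # Positive phase: hidden probabilities given the data.
            ph0 = sigmoid(c + v0 @ W)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            # Negative phase: one reconstruction step.
            pv1 = sigmoid(b + W @ h0)
            ph1 = sigmoid(c + pv1 @ W)
            # CD-1 approximation to the log-likelihood gradient.
            W += lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
            b += lr * (v0 - pv1)
            c += lr * (ph0 - ph1)
    return W, b, c

def train_dbn(data, layer_sizes):
    """Greedy layer-wise training: RBMs bottom-up, mean activations upward."""
    layers, rep = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(rep, n_hidden)
        layers.append((W, b, c))
        rep = sigmoid(c + rep @ W)  # mean activations become the next "data"
    return layers

X = rng.integers(0, 2, size=(20, 8)).astype(float)  # toy binary data
dbn = train_dbn(X, layer_sizes=[6, 4])
```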
12. DBNs Training
• After layer-wise unsupervised pre-training, good initializations are obtained
• Fine-tune the whole network (e.g. by backpropagation/wake-sleep) w.r.t. a supervised criterion
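A minimal sketch of the supervised fine-tuning stage, assuming a single pre-trained layer and a logistic classifier stacked on top; the sizes, data, and labeling rule below are made up for illustration, and a real run would start from the greedily pre-trained weights:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical stand-ins for one pre-trained layer (Ws, cs) and labeled data.
Ws = [rng.normal(scale=0.1, size=(8, 4))]
cs = [np.zeros(4)]
X = rng.random((20, 8))
y = (X.sum(axis=1) > 4).astype(float)  # a learnable toy labeling rule

# Extra machinery: a linear (logistic) classifier on top of the features.
w_out = rng.normal(scale=0.1, size=4)
b_out = 0.0

lr, losses = 0.1, []
for _ in range(300):
    h = sigmoid(cs[0] + X @ Ws[0])      # forward through pre-trained layer
    p = sigmoid(b_out + h @ w_out)      # predicted P(y = 1 | x)
    losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
    g = (p - y) / len(X)                # cross-entropy gradient w.r.t. logits
    gh = np.outer(g, w_out) * h * (1 - h)  # backprop into the feature layer
    w_out -= lr * h.T @ g               # update the classifier...
    b_out -= lr * g.sum()
    Ws[0] -= lr * X.T @ gh              # ...and fine-tune the pre-trained
    cs[0] -= lr * gh.sum(axis=0)        # weights themselves
```

Here backpropagation adjusts both the added classifier and the pre-trained weights, which is what distinguishes fine-tuning from simply training a classifier on frozen features.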
13. Conclusion
• Deep learning exhibits more intelligent behavior (learning features from data) compared with traditional machine learning, which relies on fixed, hand-engineered features.
• A central idea, referred to as greedy layer-wise unsupervised pre-training, was to learn a hierarchy of features one level at a time, using unsupervised feature learning to learn a new transformation at each level to be composed with the previously learned transformations; essentially, each iteration of unsupervised feature learning adds one layer of weights to a deep neural network. Finally, the set of layers could be combined to initialize a deep supervised predictor, such as a neural network classifier, or a deep generative model.
15. References
• Dandan Mo. A Survey on Deep Learning: One Small Step toward AI. 2012.
• Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18(7):1527–1554, 2006.
• Yoshua Bengio. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.