Fundamentals of Deep Learning
STANLEY WANG
SOLUTION ARCHITECT, TECH LEAD
@SWANG68
http://www.linkedin.com/in/stanley-wang-a2b143b
What is Deep Learning?
Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.
• Multiple-Layer Deep Neural Networks
• Works for Media and Unstructured Data
• Automatic Feature Engineering
• Complex Architectures, Computationally Intensive
From Deep Learning to Artificial Intelligence
Evolution of Deep Learning
Neuron Perceptron Computing Model
[Figure: a single-neuron perceptron — inputs feed an input layer, weighted connections feed an output layer, and the error against the desired output d drives a weight update.]
Perceptron: $y = f\left(b + \sum_i w_i x_i\right)$
Activation function: a hard threshold, $f(z) = 1$ if $z > 0$ and $0$ otherwise.
Learning: $w_i \leftarrow w_i + \eta \, (d - y) \, x_i$, where $d$ is the desired output and $\eta$ is the learning rate.
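To make the model concrete, here is a minimal sketch of that update rule in Python; the AND data, learning rate, and epoch count are illustrative choices, not from the deck:

```python
import numpy as np

def train_perceptron(X, d, epochs=20, lr=0.1):
    """Classic perceptron rule: w <- w + lr * (target - prediction) * x,
    with the bias folded in as w[0]."""
    X = np.hstack([np.ones((len(X), 1)), X])  # prepend a constant bias input
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            y = 1 if x @ w > 0 else 0         # hard-threshold activation
            w += lr * (target - y) * x        # changes weights only on mistakes
    return w

# Illustrative, linearly separable data: the AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
d = np.array([0, 0, 0, 1])
print(train_perceptron(X, d))  # a separating weight vector for AND
```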
Artificial Neural Networks
Historical Background: First Generation ANN
• Perceptron (~1960) used a layer of hand-coded features and tried to recognize objects by learning how to weight these features.
  – There was a neat learning algorithm for adjusting the weights.
  – But perceptron nodes are fundamentally limited in what they can learn to do.
[Figure: sketch of a typical perceptron from the 1960s — input features feed a layer of non-adaptive, hand-coded features, which feed the output class labels (e.g., "bomb" vs. "toy").]
Multiple Layer Perceptron ANN (1960~1985)
[Figure: a multilayer perceptron — an input vector feeds one or more hidden layers, which feed the outputs. Compare the outputs with the correct answer to get an error signal, then back-propagate the error signal to get derivatives for learning.]
BP Algorithm
Activations: each unit applies the logistic function to its weighted input,
$y_j = \sigma(z_j) = \dfrac{1}{1 + e^{-z_j}}, \qquad z_j = b_j + \sum_i y_i w_{ij}$
(the sigmoid rises from 0 to 1, passing through 0.5 at $z_j = 0$).
The error: $E = \tfrac{1}{2} \sum_j (t_j - y_j)^2$ for targets $t_j$.
Update weights: $\Delta w_{ij} = -\eta \, \dfrac{\partial E}{\partial w_{ij}}$, with the error signal back-propagated through the layers to obtain the derivatives.
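As a worked illustration of these equations, the following sketch trains a tiny two-layer sigmoid network on XOR with plain NumPy. The architecture, data, and hyperparameters are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Illustrative data: XOR, which a single-layer perceptron cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # 2 inputs -> 4 hidden
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)   # 4 hidden -> 1 output
lr = 0.5

for _ in range(5000):
    # Forward pass: compute activations layer by layer.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass: for E = 0.5*(y - t)^2 and sigmoid units,
    # the error signal is delta = dE/dnet = (y - t) * y * (1 - y).
    delta2 = (y - t) * y * (1 - y)
    delta1 = (delta2 @ W2.T) * h * (1 - h)        # back-propagate to hidden layer
    # Gradient-descent updates: w <- w - lr * dE/dw.
    W2 -= lr * h.T @ delta2; b2 -= lr * delta2.sum(0)
    W1 -= lr * X.T @ delta1; b1 -= lr * delta1.sum(0)

print(np.round(y.ravel(), 2))  # should approach [0, 1, 1, 0]
```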
Back Propagation Algorithm
Advantages
• A multilayer perceptron network can be trained by the back-propagation algorithm to perform, in principle, any mapping between the input and the output.
Disadvantages
• It requires labeled training data, yet almost all data is unlabeled.
• The learning time does not scale well:
  o It is very slow in networks with multiple hidden layers.
  o It can get stuck in poor local optima.
Support Vector Machines
• Vapnik and his co-workers developed a very clever type of
perceptron called a Support Vector Machine.
o Instead of hand-coding the layer of non-adaptive features, each
training example is used to create a new feature using a fixed
recipe.
• The feature computes how similar a test example is to that
training example.
o Then a clever optimization technique is used to select the best
subset of the features and to decide how to weight each feature
when classifying a test case.
• But it's still just a perceptron, and it has all the same limitations.
• In the 1990s, many researchers abandoned neural networks with multiple adaptive hidden layers because Support Vector Machines worked better.
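The idea can be illustrated with scikit-learn (an assumed dependency, not mentioned in the deck): with an RBF kernel, each training example induces a similarity feature, and the optimizer keeps and weights a subset of them — the support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Illustrative data: points inside vs. outside a circle (not linearly separable).
X = rng.normal(0, 1, (200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)

# The RBF kernel computes how similar a test example is to each training example.
clf = SVC(kernel="rbf").fit(X, y)
print(clf.support_vectors_.shape)  # the training examples kept as features
print(clf.score(X, y))             # training accuracy
```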
Deep Learning Neural Networks Strike Back
Ideas of Deep Learning
Deep Learning - Architectures
Deep Learning – Pre-Training
Deep Learning Architecture Types
• Feed Forward
  o MLPs
  o Auto Encoders
  o RBMs
• Recurrent
  o Multi Modal
  o LSTMs
  o Stateful
Deep Architecture – Stack of Auto Encoder
Deep Architecture - Stacked RBMs
Deep Architecture - Recursive Neural Network
Deep Architecture – Recurrent Neural Network
Deep Architecture - Convolutional Neural Network
Why Is Deep Learning So Successful?
Different Levels of Knowledge Abstraction
Composing Features on Features.
Types of Deep Learning Training Protocol
Greedy Layer-Wise Training
• Train the first layer on your data without the labels (unsupervised).
  o Since there are no targets at this level, labels don't help. You can also use the more abundant unlabeled data that is not part of the training set (i.e., self-taught learning).
• Then freeze the first layer's parameters and start training the second layer, using the output of the first layer as the unsupervised input to the second layer.
• Repeat this for as many layers as desired.
  o This builds our set of robust features.
• Use the outputs of the final layer as inputs to a supervised layer/model and train the last supervised layer(s), leaving the early weights frozen.
• Unfreeze all weights and fine-tune the full network by training with a supervised approach, given the pre-trained weight settings. (A minimal sketch of this procedure follows the list.)
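A minimal NumPy sketch of the layer-wise loop, assuming sigmoid auto-encoder layers with tied weights; the helper name train_autoencoder, the data, the layer sizes, and all rates are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, epochs=200, lr=0.1):
    """Train one sigmoid auto-encoder layer with tied weights (illustrative,
    squared-error reconstruction loss); return the encoder weights."""
    W = rng.normal(0, 0.1, (X.shape[1], n_hidden))
    for _ in range(epochs):
        H = sigmoid(X @ W)                 # encode
        R = sigmoid(H @ W.T)               # decode with the transposed weights
        dR = (R - X) * R * (1 - R)         # error signal at the reconstruction
        dH = (dR @ W) * H * (1 - H)        # back-propagated to the code layer
        W -= lr * (X.T @ dH + dR.T @ H) / len(X)  # gradient wrt the tied W
    return W

# Illustrative unlabeled data and layer sizes.
X = rng.random((100, 20))
weights, inp = [], X
for n_hidden in (16, 8):                   # train one layer at a time
    W = train_autoencoder(inp, n_hidden)   # earlier layers stay frozen
    weights.append(W)
    inp = sigmoid(inp @ W)                 # its output feeds the next layer
# 'inp' now holds the top-level features for a supervised layer/model,
# after which all weights can be unfrozen and fine-tuned end to end.
```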
Unsupervised Greedy Layer-Wise Training Procedure
Benefit of Greedy Layer-Wise Training
• Greedy layer-wise training avoids many of the problems of trying to train a deep net in a supervised fashion:
  o Each layer gets full learning focus in its turn, since it is the only current "top" layer.
  o It can take advantage of unlabeled data.
  o When you finally tune the entire network with supervised training, the weights have already been adjusted so that you start in a good error basin and only need fine-tuning. This helps with:
    • Ineffective early-layer learning
    • Deep-network local minima
• The two most common approaches:
  o Stacked Auto-Encoders
  o Deep Belief Networks
What Is Auto-Encoding?
What Can an Auto-Encoder Do?
• A type of unsupervised learning that tries to discover generic features of the data.
  o It learns the identity function by learning important sub-features, not by just passing the data through.
  o You can use just the new features in the new training set, or concatenate both.
Deep Learning Auto Encoding
Deep Learning Auto Encoding: How To?
Deep Stacked Auto Encoder Architecture
Stacked Auto-Encoders Approach
• Stack many sparse auto-encoders in succession and train them using greedy
layer-wise training
• Drop the decode output layer each time
• Do supervised training on the last layer using final features
• Finally do supervised training on the entire network to fine-tune all weights
What Are Sparse Encoders?
• Auto-encoders will often do a dimensionality reduction
  o PCA-like or non-linear dimensionality reduction
• This leads to a "dense" representation, which is nice in terms of parsimony
  o All features typically have non-zero values for any input, and the combination of values contains the compressed information
• However, this distributed and entangled representation can often make it more difficult for successive layers to pick out the salient features
• A sparse representation uses more features, where at any given time a significant number of the features will have a 0 value
  o This leads to more localist, variable-length encodings, where a particular node (or small group of nodes) with value 1 signifies the presence of a feature (a small set of bases)
  o A type of simplicity bottleneck (regularizer)
  o This is easier for subsequent layers to use for learning
Implementation of Sparse Auto-Encoder
• Use more hidden nodes in the encoder
• Use regularization techniques that encourage sparseness, e.g., so that a significant portion of nodes have 0 output for any given input
  o A penalty in the learning function for non-zero nodes, combined with weight decay
• De-noising auto-encoder
  o Stochastically corrupt each training instance, but still train the auto-encoder to decode the uncorrupted instance, forcing it to learn the conditional dependencies within the instance
  o Better empirical results; handles missing values well
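A sketch of the de-noising variant in NumPy, assuming tied weights and masking noise; the function name, sizes, and rates are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_denoising_autoencoder(X, n_hidden, noise=0.3, epochs=200, lr=0.1):
    """Illustrative de-noising auto-encoder with tied weights: corrupt the
    input stochastically, but reconstruct the *uncorrupted* instance."""
    W = rng.normal(0, 0.1, (X.shape[1], n_hidden))
    for _ in range(epochs):
        X_tilde = X * (rng.random(X.shape) > noise)  # drop features at random
        H = sigmoid(X_tilde @ W)                     # encode the corrupted input
        R = sigmoid(H @ W.T)                         # decode
        dR = (R - X) * R * (1 - R)                   # error vs. the CLEAN input
        dH = (dR @ W) * H * (1 - H)
        W -= lr * (X_tilde.T @ dH + dR.T @ H) / len(X)
    return W

# Overcomplete code (more hidden nodes than inputs), as suggested above.
W = train_denoising_autoencoder(rng.random((100, 20)), n_hidden=30)
```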
General Belief Nets
• A belief net is a directed acyclic graph composed of stochastic variables.
• It poses two problems:
  o The inference problem: infer the states of the unobserved variables.
  o The learning problem: adjust the interactions between variables to make the network more likely to generate the observed data.
[Figure: layers of stochastic hidden causes with directed connections down to the visible effects.]
We use nets composed of layers of stochastic binary variables with weighted connections, though other types of variables can be generalized as well.
Stochastic Binary Units
(Bernoulli Variables)
• Variables with state of 1
or 0;
• The probability of turning
on is determined by the
weighted input from
other units (plus a bias)
$p(s_i = 1) = \dfrac{1}{1 + \exp\left(-b_i - \sum_j s_j w_{ji}\right)}$
(The turn-on probability is a logistic function of the unit's total input $b_i + \sum_j s_j w_{ji}$, rising smoothly from 0 to 1.)
Learning Rule for Sigmoid Belief Nets
• Learning is easy if we can get
an unbiased sample from the
posterior distribution over
hidden states given the
observed data.
• For each unit, maximize the log probability that its binary state in the sample from the posterior would be generated by the sampled binary states of its parents.
For a unit $i$ with parents $j$ and connection weights $w_{ji}$:
$p_i \equiv p(s_i = 1) = \dfrac{1}{1 + \exp\left(-\sum_j s_j w_{ji}\right)}$
and the weight update, with learning rate $\varepsilon$, is
$\Delta w_{ji} = \varepsilon \, s_j \, (s_i - p_i)$
Problems with Deep Belief Nets
Since DBNs are directed graphical models, the posterior over the hidden units given input data is intractable due to the "explaining away" effect: even if two hidden causes are independent a priori, they become dependent when we observe an effect that they can both influence.
Solution: complementary priors, which ensure the posterior over the hidden units satisfies the independence constraints.
[Figure: the classic explaining-away example — two hidden causes, "truck hits house" and "earthquake" (each with bias -10), both connect with weight 20 to the observed effect "house jumps" (bias -20). Given that the house jumped, the posterior over the two causes is p(1,1)=.0001, p(1,0)=.4999, p(0,1)=.4999, p(0,0)=.0001: observing one cause explains away the other.]
Complementary Priors
• Definition of complementary priors: consider observations x and hidden variables y. For a given likelihood function P(x|y), the prior over y, P(y), is called the complementary prior of P(x|y) if P(x,y) = P(x|y)P(y) leads to a posterior P(y|x) that factorizes.
• Infinite directed model with tied weights, complementary priors, and Gibbs sampling:
  o Recall that RBMs have the factorization property
$P(\mathbf{v} \mid \mathbf{h}) = \prod_{i=1}^{m} P(v_i \mid \mathbf{h}), \qquad P(\mathbf{h} \mid \mathbf{v}) = \prod_{j=1}^{n} P(h_j \mid \mathbf{v})$
  o The definition of the RBM energy function makes it a proper model with two sets of conditional independencies (complementary priors for both v and h).
  o Since we need to estimate the data distribution P(v), we can run Gibbs sampling, alternating between P(v|h) and P(h|v), for infinitely many steps. This procedure is analogous to unrolling the single RBM into an infinite directed stack of RBMs with tied weights (due to the complementary priors), where each RBM takes its input from the hidden layer of the RBM below.
Restricted Boltzmann Machines
• Restrict the connectivity to make learning easier:
  o Only one layer of hidden units.
  o No connections between hidden units.
• The hidden units are conditionally independent given the visible states.
  o We can quickly get an unbiased sample from the posterior distribution when given a data vector, which is a big advantage over directed belief nets.
[Figure: a bipartite graph — hidden units j in one layer, visible units i in the other, with connections only between the layers.]
Energy of a Joint Configuration
The energy of a joint configuration with $\mathbf{v}$ on the visible units and $\mathbf{h}$ on the hidden units is
$E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} v_i \, h_j \, w_{ij}$
where $v_i$ is the binary state of visible unit $i$, $h_j$ is the binary state of hidden unit $j$, and $w_{ij}$ is the weight between units $i$ and $j$. Consequently,
$-\dfrac{\partial E(\mathbf{v}, \mathbf{h})}{\partial w_{ij}} = v_i \, h_j$
Weights, Energies and Probabilities
• Each possible joint configuration of the visible and hidden units has an energy.
  o The energy is determined by the weights and biases, as in a Hopfield net.
• The energy of a joint configuration of the visible and hidden units determines its probability:
$p(\mathbf{v}, \mathbf{h}) \propto e^{-E(\mathbf{v}, \mathbf{h})}$
• The probability of a configuration over the visible units is found by summing the probabilities of all the joint configurations that contain it.
Using Energies to Define Probabilities
• The probability of a joint configuration over both visible and hidden units depends on the energy of that joint configuration compared with the energies of all other joint configurations:
$p(\mathbf{v}, \mathbf{h}) = \dfrac{e^{-E(\mathbf{v}, \mathbf{h})}}{\sum_{\mathbf{u}, \mathbf{g}} e^{-E(\mathbf{u}, \mathbf{g})}}$
• The probability of a configuration of the visible units is the sum of the probabilities of all the joint configurations that contain it:
$p(\mathbf{v}) = \dfrac{\sum_{\mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}}{\sum_{\mathbf{u}, \mathbf{g}} e^{-E(\mathbf{u}, \mathbf{g})}}$
The common denominator, summed over all joint configurations $(\mathbf{u}, \mathbf{g})$, is the partition function.
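For a small enough RBM these quantities can be computed exactly by brute-force enumeration, which makes the role of the partition function concrete. This illustrative NumPy sketch uses a toy 3-visible, 2-hidden RBM with random weights and no biases, matching the energy function above:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
W = rng.normal(0, 1, (3, 2))  # toy weights: 3 visible x 2 hidden

def energy(v, h):
    # E(v, h) = -sum_ij v_i h_j w_ij
    return -v @ W @ h

# Enumerate every joint configuration to compute the partition function Z.
configs = [(np.array(v), np.array(h))
           for v in product([0, 1], repeat=3)
           for h in product([0, 1], repeat=2)]
Z = sum(np.exp(-energy(v, h)) for v, h in configs)

def p_joint(v, h):
    return np.exp(-energy(v, h)) / Z

def p_visible(v):
    # Sum out the hidden units: p(v) = sum_h e^{-E(v,h)} / Z.
    return sum(p_joint(v, np.array(h)) for h in product([0, 1], repeat=2))

print(p_visible(np.array([1, 0, 1])))
# Sanity check: probabilities over all visible vectors sum to 1.
print(sum(p_visible(np.array(v)) for v in product([0, 1], repeat=3)))
```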
Maximum Likelihood RBM Learning Algorithm
$\dfrac{\partial \log p(\mathbf{v})}{\partial w_{ij}} = \langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{\infty}$
where $\langle v_i h_j \rangle^{0}$ is the expectation with the data clamped (t = 0) and $\langle v_i h_j \rangle^{\infty}$ is the expectation at equilibrium (t = infinity).
Start with a training vector on the visible units. Then alternate between updating all the hidden units in parallel and updating all the visible units in parallel.
[Figure: the alternating Gibbs chain between visible units i and hidden units j at t = 0, 1, 2, ..., infinity; a sample at t = infinity is a "fantasy" drawn from the model's equilibrium distribution.]
A Quick Way to Learn an RBM
• Start with a training vector on the visible units.
• Update all the hidden units in parallel.
• Update all the visible units in parallel to get a "reconstruction".
• Update the hidden units again.
This gives the one-step weight update
$\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{1} \right)$
where $\langle \cdot \rangle^{0}$ is measured on the data (t = 0) and $\langle \cdot \rangle^{1}$ on the reconstruction (t = 1).
Contrastive divergence: this is not following the gradient of the log likelihood, but it works well; it approximately follows the gradient of another objective function.
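A minimal NumPy sketch of this CD-1 procedure for a binary RBM; the helper name cd1_step, the bias terms, and all sizes and rates are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_step(V0, W, a, b, lr=0.05):
    """One contrastive-divergence (CD-1) update on a batch of visible vectors.
    W: visible x hidden weights; a, b: visible and hidden biases."""
    # Positive phase: sample the hidden units given the data.
    ph0 = sigmoid(V0 @ W + b)
    H0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One step of alternating Gibbs sampling: reconstruct, then re-infer.
    pv1 = sigmoid(H0 @ W.T + a)
    V1 = (rng.random(pv1.shape) < pv1).astype(float)   # the "reconstruction"
    ph1 = sigmoid(V1 @ W + b)
    # Delta w_ij = lr * (<v_i h_j>^0 - <v_i h_j>^1), averaged over the batch.
    n = len(V0)
    W += lr * (V0.T @ ph0 - V1.T @ ph1) / n
    a += lr * (V0 - V1).mean(0)
    b += lr * (ph0 - ph1).mean(0)
    return W, a, b

# Illustrative toy data: random binary patterns for a 6-visible, 3-hidden RBM.
V = rng.integers(0, 2, (50, 6)).astype(float)
W = rng.normal(0, 0.1, (6, 3)); a = np.zeros(6); b = np.zeros(3)
for _ in range(500):
    W, a, b = cd1_step(V, W, a, b)
```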
Restricted Boltzmann Machines
RBM Model Learning
Deep Belief Network
Why Pre-Training Works
Deep Learning Use Cases: IR
DL Use Cases: Fraud Detection
DL NLP: Unified Architecture
DL Use Cases: NLP