The Back Propagation Learning Algorithm




  BP is extensively used and studied.
  Local minima.
  Learning can be slow.
  Practical examples.
  Handling time.




Local Minima



Algorithms based on gradient descent can become stuck
in local minima.

[Figure: three sketches of the error E plotted against a weight wi, showing gradient descent reaching different local and global minima.]

In practice, however, local minima tend not to be a problem;
the main problem is the speed of convergence.
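
A minimal sketch of this behaviour (plain Python; the toy error surface E(w) = w⁴ − 2w² + 0.5w is an assumption, chosen to have one shallow and one deep minimum):

    # Toy surface: a shallow local minimum near w = +0.93 and a deeper
    # global minimum near w = -1.05; where gradient descent ends up
    # depends entirely on where it starts.
    def dE(w):
        return 4 * w**3 - 4 * w + 0.5      # dE/dw

    for w0 in (1.5, -1.5):
        w = w0
        for _ in range(200):
            w -= 0.01 * dE(w)              # plain gradient descent, eta = 0.01
        print(f"start {w0:+.1f} -> converged to w = {w:+.3f}")
    # start +1.5 gets stuck in the local minimum; start -1.5 finds the global one.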




Learning can be Slow



The more layers, the slower learning becomes:

        Δw_ij = −η δ_i v_j,        where δ_i = (y_i − t_i) y_i (1 − y_i)

        Δu_jk = −η δ̂_j x_k,        where δ̂_j = (Σ_i δ_i w_ij) v_j (1 − v_j)

            ⋮

Each error term δ modifies the one before it by a y(1 − y)-like
factor. Since y is a sigmoidal function (0 < y < 1), then

        0 ≤ y(1 − y) ≤ 0.25

The more layers, the smaller the effective errors get, and the
slower the network learns.
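
A minimal sketch of this shrinkage (plain Python; the 0.25 factor per layer is the best case, reached at y = 0.5 and ignoring the weights):

    from math import exp

    def sigmoid(x):
        return 1.0 / (1.0 + exp(-x))

    y = sigmoid(0.0)                  # y = 0.5, where y(1 - y) peaks
    delta = 1.0                       # error term at the output layer
    for layer in range(1, 6):
        delta *= y * (1 - y)          # one y(1 - y) factor per layer
        print(f"after {layer} layers: |delta| <= {delta:.6f}")
    # after 5 layers: |delta| <= 0.000977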




Speeding up Learning



A simple method of speeding up learning is to add a
momentum term:

        Δw(t + 1) = −η ∂E/∂w + α Δw(t)

where 0 ≤ α < 1.


Each weight is given some “inertia” or “momentum”, so it
tends to change in the direction of its average change.
When the weight change is the same on every iteration
(e.g. when travelling over a plateau):


        Δw(t + 1) = Δw(t)

        (1 − α) Δw(t + 1) = −η ∂E/∂w

        Δw(t + 1) = −(η / (1 − α)) ∂E/∂w

So, if α = 0.9, the effective learning rate is 10η.
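
A minimal sketch of this fixed point (plain Python, assuming a constant gradient of 1 on the plateau):

    eta, alpha = 0.25, 0.9
    grad = 1.0                            # constant dE/dw on the plateau
    dw = 0.0                              # previous change, Delta w(t)
    for t in range(100):
        dw = -eta * grad + alpha * dw     # Delta w(t+1) = -eta dE/dw + alpha Delta w(t)
    print(dw, -eta / (1 - alpha) * grad)  # both ~ -2.5: a 10x effective rate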

Higher-order techniques (e.g. conjugate gradient) are faster.

Encoder networks
    Momentum = 0.9, Learning Rate = 0.25



[Figure: simulator screenshot — the total error falls from 10.0 to near 0.0 over 402 epochs; 8 input patterns (Pat 1–8) are mapped to 8 output patterns.]



  8 inputs: local encoding, 1 of 8 active.
  Task: reproduce the input at the output layer through a hidden-unit “bottleneck” (see the sketch after this list).
  After 400 epochs, activation of hidden units:
     Pattern   Hidden units   |   Pattern   Hidden units
        1        1 1 1        |      5        1 0 0
        2        0 0 0        |      6        0 0 1
        3        1 1 0        |      7        0 1 0
        4        1 0 1        |      8        0 1 1
  Also called “self-supervised” networks.
  Related to PCA (a statistical method).
  Application: compression.
  Local vs distributed representations.
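
A minimal sketch of such an 8-3-8 encoder (numpy assumed; momentum 0.9 and learning rate 0.25 follow the slide, while the bias units, weight initialisation and epoch count are guesses):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.eye(8)                               # local encoding, 1 of 8 active

    def sig(a):
        return 1.0 / (1.0 + np.exp(-a))

    W1 = rng.uniform(-0.5, 0.5, (9, 3))         # input (+ bias) -> 3 hidden
    W2 = rng.uniform(-0.5, 0.5, (4, 8))         # hidden (+ bias) -> 8 outputs
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)
    eta, alpha = 0.25, 0.9                      # learning rate, momentum
    ones = np.ones((8, 1))

    for epoch in range(4000):
        h = sig(np.hstack([X, ones]) @ W1)      # hidden activations
        y = sig(np.hstack([h, ones]) @ W2)      # output activations
        d2 = (y - X) * y * (1 - y)              # output error terms
        d1 = (d2 @ W2[:3].T) * h * (1 - h)      # hidden error terms
        dW2 = -eta * np.hstack([h, ones]).T @ d2 + alpha * dW2
        dW1 = -eta * np.hstack([X, ones]).T @ d1 + alpha * dW1
        W2 += dW2
        W1 += dW1

    print((h > 0.5).astype(int))                # roughly binary 3-bit codes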

Example: NetTalk




Sejnowski, T. & Rosenberg, C. (1986). Parallel networks that learn
    to pronounce English text. Complex Systems 1, 145–168.

task: to convert continuous text into speech.
input: a window of letters from English text drawn from
   a 1000-word dictionary. A 7-letter window gives the context
   needed to disambiguate, e.g., “brave” and “gave” vs “have”.
output: phonetic representation of speech (which can be
   fed into a synthesiser).



[Figure: NetTalk schematic — a 7-letter window of text (“T h i s  i s  t h e  i n p u t”) feeds a layer of hidden units, which output the phoneme (/s/) for the centre letter.]




Example: NetTalk

Architecture: 7 × 29 input units (one 29-unit group per letter
of the window), 80 hidden units in a single layer, and 26
output units.




  Input: each letter encoded using 1 of 29 units (26 letters +
    3 for punctuation) — see the sketch below.
  Output: distributed representation across 21 features,
    including vowel height and position in mouth; 5 features
    for stress.
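
A minimal sketch of this window encoding (plain Python; the exact 29-symbol alphabet is an assumption):

    alphabet = "abcdefghijklmnopqrstuvwxyz ,."  # 29 symbols (assumed set)

    def encode(window):
        # one 1-of-29 group per letter: 7 x 29 = 203 input units
        assert len(window) == 7
        vec = []
        for ch in window:
            unit = [0] * 29
            unit[alphabet.index(ch)] = 1
            vec += unit
        return vec

    x = encode("this is")                       # centre letter is 's'
    print(len(x), sum(x))                       # 203 7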

Performance:

   90% correct on training set.
   80–87% correct on test set.
   Two small hidden layers better than one big layer.

Babbling during learning?
Hidden representations: vowels vs consonants?

Example: Hand Written Zip Code Recognition




LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, L.
    & Jackel, L. (1989). Backpropagation applied to handwritten zip
    code recognition. Neural Computation 1, 541–551.

task: Network is to learn to recognise handwritten digits
    taken from U.S. Mail.
input: Digitised handwritten digits.
output: One of 10 units is to be most active – the unit
   that represents the correctly recognised numeral.




Example: Hand Written Zip Code Recognition



[Figure: real input — normalised digits from the testing set.]




  Knowledge of the task constrains the architecture.
  “Feature detectors” are useful.
  Implemented by weight sharing.
  Reduces free parameters and speeds up learning.




Example: Hand Written Zip Code Recognition




Architecture (output to input):

  Output: 10 units (digits 0–9), fully connected to H3 (310 weights).
  H3: 30 hidden units, fully connected to H2 (5790 weights).
  H2.1 … H2.12: 12 maps of 4 × 4 = 16 hidden units; each unit applies
      8 kernels of 5 × 5 to 8 of the 12 H1 maps (38592 links,
      2592 weights).
  H1.1 … H1.12: 12 maps of 8 × 8 = 64 hidden units; 12 kernels of
      5 × 5 over the input (19968 links, 1068 weights).
  Input: 16 × 16 digitised grayscale image.

Before weight sharing: 64660 links.
After weight sharing: 9760 weights.
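
A minimal sketch reproducing these counts (plain Python; one unshared bias per unit is the assumption that makes the slide's totals come out):

    h1_links   = 12 * 64 * (5 * 5 + 1)        # 12 maps of 8x8 units, 5x5 kernel + bias -> 19968
    h1_weights = 12 * 5 * 5 + 12 * 64         # 12 shared kernels + per-unit biases     -> 1068
    h2_links   = 12 * 16 * (8 * 5 * 5 + 1)    # 12 maps of 4x4 units, 8 kernels + bias  -> 38592
    h2_weights = 12 * 8 * 5 * 5 + 12 * 16     # shared kernels + per-unit biases        -> 2592
    h3  = 30 * (12 * 16 + 1)                  # fully connected to all 192 H2 units     -> 5790
    out = 10 * (30 + 1)                       # fully connected to H3                   -> 310

    print(h1_links + h2_links + h3 + out)     # 64660 links before sharing
    print(h1_weights + h2_weights + h3 + out) # 9760 free weights after sharing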


Example: Hand Written Zip Code Recognition



Performance:
[Figure: error rate (%) against training passes; the test-set curve levels off above the training-set curve.]

   Hidden units developed spatial filters (centre-surround).
   Better than an earlier study that used specialised
   hand-crafted features (Denker et al., 1989).




Handling temporal sequences




  “Spatialise” time (e.g. NetTalk).
  Add context units with fixed connections that hold some
  trace over time (see the sketch below).
  Standard b.p. can be used in these cases (fig 7.5 of HKP).
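
A minimal sketch of one such fixed-connection trace (plain Python; the decay constant 0.7 is a made-up example):

    mu = 0.7                                  # fixed decay; not learned
    context = 0.0
    for x in [1.0, 0.0, 0.0, 1.0, 0.0]:       # an input sequence
        context = mu * context + (1 - mu) * x
        print(f"input {x:.0f} -> context {context:.3f}")
    # the context value feeds the feedforward net alongside the
    # current input, so standard b.p. still applies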




  For fully recurrent networks, b.p. has been extended to Real-
  Time Recurrent Learning (Williams & Zipser, 1989).



Summary




  Back propagation is a popular training method.
  Hidden units find useful internal representations.
  Extendable to temporal sequences.
  Problems: learning can be slow, and there is no convergence
  theorem; different architectures (number of layers) and
  learning rates must be tried.
  Biological plausibility?
  1. Who provides the targets?
  2. Can signals (errors) backpropagate from one cell
     to another?




