5. MLP (Multi-Layer Perceptron)

  1. Fundamentals of neural networks: MLP (Multi-Layer Perceptron)
  2. Topics: artificial neural network, optimizer, mini-batch, activation functions, loss functions, batch normalization, avoiding overfitting (weight decay, dropout)
  3. A single neuron (perceptron)
  4. AND and OR gates, each using one perceptron. AND: weights (0.5, 0.5), bias -0.7; OR: weights (1, 1), bias -0.5. Truth tables: AND outputs 1 only for input (1, 1); OR outputs 1 for every input except (0, 0). (A NumPy sketch of these gates follows the list.)
  5. Quiz: XOR gate. XOR outputs 1 when exactly one input is 1: (0,0)→0, (0,1)→1, (1,0)→1, (1,1)→0. Can an XOR gate be built with only one perceptron?
  6. A single perceptron is linear: its decision boundary is a straight line in the (x1, x2) plane. For AND the boundary is 0.5x1 + 0.5x2 - 0.7 = 0 (crossing the x1 axis at 1.4); for OR it is x1 + x2 - 0.5 = 0 (crossing the x1 axis at 0.5). Each line separates the four inputs (0,0), (0,1), (1,0), (1,1) correctly for its gate.
  7. XOR is not linearly separable: no single straight line separates {(0,0), (1,1)} from {(0,1), (1,0)}, so one perceptron is not enough. Lab: keras/concept/NN_concept.ipynb (a Keras sketch follows the list).
  8. Feature transformation
  9. Feature transformation
  10. Adding hidden layers lets the network handle harder classification problems. Source: https://www.intechopen.com/books/artificial-neural-networks-architectures-and-applications/applications-of-artificial-neural-networks-in-chemical-problems
  11. Chaining more perceptrons together gives a neural network. With input x = (x1, x2), weights wij, and biases bj, each unit computes aj = w1j·x1 + w2j·x2 + bj, or in matrix form A = XW + B: one layer is just a simple function. (A NumPy sketch follows the list.)
  12. Multi-layer neural network: an MLP uses multiple hidden layers between the input and output layers to extract meaningful features. A neural network = a function. (A Keras sketch follows the list.)
  13. A network with two hidden layers, written in matrix form: the input X = (x1, x2) has shape 1×2, the weight matrices have shapes W^(1): 2×3, W^(2): 3×3, W^(3): 3×2, and the output Y = (y1, y2) has shape 1×2. Each layer computes A^(1) = X W^(1) + B^(1), with B^(1) = (b1^(1), b2^(1), b3^(1)), and similarly for the later layers. (A NumPy sketch follows the list.)
  14. Training neural networks
  15. Training of multi-layer networks: find network weights that minimize the training error between the true and estimated labels of the training examples. (A loss-function sketch follows the list.)
  16. Optimizer. Back-propagation: gradients are computed from the output layer back toward the input layer and combined using the chain rule. SGD (stochastic gradient descent): compute the weight update with respect to one training example at a time, cycling through the training examples in random order over multiple epochs → slow convergence, because updating one randomly chosen sample at a time is slow. Mini-batch SGD computes a batch of samples simultaneously → faster to complete one epoch. (A mini-batch SGD sketch follows the list.)
  17. (figure-only slide; no recoverable text)
  18. Mini-batch vs. epoch, and the partial-fit method. Mini-batch training is expected to be called several times consecutively on different chunks of a dataset so as to implement out-of-core or online learning; this is especially useful when the whole dataset is too big to fit in memory at once. One epoch = one full pass over all the training data; the data are split into chunks according to the mini-batch size. For example, with 1000 samples: batch size = 100 gives 10 chunks and 10 updates per epoch; batch size = 10 gives 100 chunks and 100 updates per epoch. How to choose the batch size? Do not set it too large; common values are 28, 32, 128, 256, … (A partial_fit sketch follows the list.)
  19. Overview of Neural Network
  20. Adaptive learning rate/gradient algorithms: to avoid getting stuck in a local minimum and to further increase training speed, use optimizers such as Adagrad, Momentum, RMSProp, Adam, … (A Keras optimizer sketch follows the list.)
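The sketches below are illustrative companions to the slides above; function names, layer sizes, and hyperparameters are assumptions unless they appear on a slide. First, for slide 4, a single perceptron with a step activation and the weights and biases shown on the slide: (0.5, 0.5, -0.7) for AND and (1, 1, -0.5) for OR.

```python
def perceptron(x1, x2, w1, w2, b):
    # Weighted sum followed by a step activation: output 1 if the sum is positive.
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def AND(x1, x2):
    return perceptron(x1, x2, 0.5, 0.5, -0.7)  # weights and bias from slide 4

def OR(x1, x2):
    return perceptron(x1, x2, 1.0, 1.0, -0.5)  # weights and bias from slide 4

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
```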
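For slide 7, a minimal Keras sketch in the spirit of the referenced NN_concept.ipynb (whose exact contents are not reproduced here): a tiny MLP with one hidden layer learns XOR, which a single perceptron cannot. The hidden-layer size, activations, optimizer, and epoch count are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# XOR truth table from slide 5.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")
y = np.array([[0], [1], [1], [0]], dtype="float32")

# The hidden layer transforms the inputs into a space where the two
# classes become linearly separable (slides 8-10).
model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(4, activation="tanh"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.05),
              loss="binary_crossentropy")
model.fit(X, y, epochs=1000, verbose=0)
print(model.predict(X).round())  # expected: 0, 1, 1, 0
```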
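For slide 11, a NumPy sketch of one layer as the simple function A = XW + B; the numerical values are made up for illustration.

```python
import numpy as np

X = np.array([[1.0, 0.5]])           # input (x1, x2), shape 1x2
W = np.array([[0.1, 0.3, 0.5],       # w11 w12 w13
              [0.2, 0.4, 0.6]])      # w21 w22 w23, shape 2x3
B = np.array([[0.1, 0.2, 0.3]])      # biases (b1, b2, b3), shape 1x3

A = X @ W + B                        # a_j = w1j*x1 + w2j*x2 + b_j
print(A, A.shape)                    # shape (1, 3)
```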
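For slide 12, a sketch of the same idea in Keras: an MLP with several hidden layers is just a callable function from inputs to outputs. The layer sizes follow the 2 → 3 → 3 → 2 shapes of slide 13; the sigmoid activations are an assumption.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A neural network = a function: y = mlp(x).
mlp = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(3, activation="sigmoid"),  # hidden layer 1
    layers.Dense(3, activation="sigmoid"),  # hidden layer 2
    layers.Dense(2),                        # output layer
])

y = mlp(np.array([[1.0, 0.5]], dtype="float32"))
print(y.numpy())                            # shape (1, 2)
```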
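For slide 13, the same forward pass written directly in NumPy with the matrix shapes given on the slide (W^(1): 2×3, W^(2): 3×3, W^(3): 3×2); the random weight values and the sigmoid activation are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[1.0, 0.5]])                      # input, shape 1x2
W1, B1 = rng.normal(size=(2, 3)), np.zeros((1, 3))
W2, B2 = rng.normal(size=(3, 3)), np.zeros((1, 3))
W3, B3 = rng.normal(size=(3, 2)), np.zeros((1, 2))

A1 = sigmoid(X @ W1 + B1)                       # first hidden layer, 1x3
A2 = sigmoid(A1 @ W2 + B2)                      # second hidden layer, 1x3
Y = A2 @ W3 + B3                                # output layer, 1x2
print(Y.shape)                                  # (1, 2)
```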
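For slide 15, a small NumPy illustration of a training error between true and estimated labels; mean squared error and binary cross-entropy are shown as two common choices (the slide's own example formula is not reproduced here).

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])        # true labels
y_pred = np.array([0.9, 0.2, 0.6, 0.8])        # network outputs (probabilities)

mse = np.mean((y_true - y_pred) ** 2)
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print(mse, bce)
```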
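For slide 16, a sketch of mini-batch SGD on a linear model with squared loss; the data, learning rate, and batch size are made up. The points being illustrated are the shuffle-then-slice batching and the update w ← w − lr · gradient, applied once per mini-batch rather than once per single example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=1000)

w, lr, batch_size = np.zeros(2), 0.1, 32

for epoch in range(5):
    order = rng.permutation(len(X))                # random order each epoch
    for start in range(0, len(X), batch_size):     # one update per mini-batch
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)  # gradient of the squared loss
        w -= lr * grad                             # SGD update

print(w)                                           # close to [2, -3]
```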
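For slide 18, a scikit-learn sketch of the partial-fit idea: an MLPClassifier is fed one 100-sample chunk at a time, as one would when the full dataset does not fit in memory. The toy data and network size are arbitrary; note that `classes` must be supplied on the first call to partial_fit.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)            # toy labels

clf = MLPClassifier(hidden_layer_sizes=(8,), random_state=0)
classes = np.array([0, 1])

# 1000 samples, batch size 100 -> 10 chunks, i.e. 10 updates per epoch.
for epoch in range(5):
    for start in range(0, len(X), 100):
        clf.partial_fit(X[start:start + 100], y[start:start + 100], classes=classes)

print(clf.score(X, y))
```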
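For slide 20, a Keras sketch showing how the listed adaptive learning-rate/gradient algorithms are selected; the learning rates are illustrative defaults, not recommendations, and the one-layer model exists only to have something to compile.

```python
from tensorflow import keras

# The optimizers listed on slide 20, as provided by Keras.
optimizers = {
    "sgd_momentum": keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "adagrad": keras.optimizers.Adagrad(learning_rate=0.01),
    "rmsprop": keras.optimizers.RMSprop(learning_rate=0.001),
    "adam": keras.optimizers.Adam(learning_rate=0.001),
}

model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=optimizers["adam"], loss="binary_crossentropy")
```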
