This presentation explains, step by step through two examples, how the backpropagation algorithm updates the weights of artificial neural networks (ANNs). Readers should have a basic understanding of how ANNs work, partial derivatives, and the multivariate chain rule.
The presentation won't dive directly into the details of the algorithm but will start by training a very simple network. This is because backpropagation is applied to a network after training, so we should train the network first in order to see the benefits of the backpropagation algorithm and how to use it.
Backpropagation Explained: How ANNs Update Weights Step-by-Step
1. Backpropagation: Understanding How to Update ANNs Weights Step-by-Step
Ahmed Fawzy Gad
ahmed.fawzy@ci.menofia.edu.eg
MENOUFIA UNIVERSITY
FACULTY OF COMPUTERS AND INFORMATION
INFORMATION TECHNOLOGY
2. Train then Update
• The backpropagation algorithm is used to update the NN weights when they are not able to make the correct predictions. Hence, we should train the NN before applying backpropagation.
Initial Weights → Training → Prediction
3. Train then Update
• (Diagram extended from the previous slide.)
Initial Weights → Training → Prediction → Backpropagation → Update
4. Neural Network Training Example
Training Data:
X1 = 0.1, X2 = 0.3, desired output = 0.03
Initial Weights:
W1 = 0.5, W2 = 0.2, b = 1.83
Network diagram: inputs X1 = 0.1 and X2 = 0.3 feed a single neuron through weights W1 and W2, with a bias input +1 weighted by b.
5. Network Training
• Steps to train our network:
1. Prepare the activation function input (sum of products between inputs and weights).
2. Calculate the activation function output.
6. Network Training: Sum of Products
• After calculating the sum of products (SOP) between the inputs and weights, the next step is to use this SOP as the input to the activation function.
s = X1*W1 + X2*W2 + b
s = 0.1*0.5 + 0.3*0.2 + 1.83
s = 1.94
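The sum-of-products step above can be sketched in a few lines of Python (a minimal illustration; the variable names are mine, not from the slides):

```python
# Inputs, weights, and bias from the training example.
X1, X2 = 0.1, 0.3
W1, W2 = 0.5, 0.2
b = 1.83

# Sum of products (SOP): the input to the activation function.
s = X1 * W1 + X2 * W2 + b
print(round(s, 2))  # 1.94
```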
7. Network Training: Activation Function
• In this example, the sigmoid activation function is used.
• Based on the SOP calculated previously, the output is as follows:
f(s) = 1/(1 + e^(−s))
f(s) = 1/(1 + e^(−1.94)) = 1/(1 + 0.144) = 1/1.144 = 0.874
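The sigmoid step can be checked directly (a small sketch; `sigmoid` is my own helper name):

```python
import math

# Sigmoid activation applied to the SOP s = 1.94.
def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

out = sigmoid(1.94)
print(round(out, 3))  # 0.874
```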
8. Network Training: Prediction Error
• After getting the predicted output, the next step is to measure the prediction error of the network.
• We can use the squared error function, defined as follows:
E = (1/2)(desired − predicted)²
• Based on the predicted output, the prediction error is:
E = (1/2)(0.03 − 0.874)² = (1/2)(−0.844)² = (1/2)(0.713) = 0.357
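The squared-error function can be sketched as follows (a minimal illustration; note the slide rounds the intermediate 0.844² to 0.713, which is why it reports 0.357, while computing without intermediate rounding gives about 0.356):

```python
# Squared error between desired and predicted outputs.
def squared_error(desired, predicted):
    return 0.5 * (desired - predicted) ** 2

E = squared_error(0.03, 0.874)
print(round(E, 3))  # ~0.356 without intermediate rounding
```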
9. How to Minimize the Prediction Error?
• There is a prediction error, and it should be minimized until it reaches an acceptable value.
What should we do in order to minimize the error?
• There must be something we can change in order to minimize the error. In our example, the only parameters we can change are the weights.
How do we update the weights?
• We can use the weights update equation:
W_new = W_old + η(d − Y)X
10. Weights Update Equation
• We can use the weights update equation:
W_new = W_old + η(d − Y)X
where:
• W_new: the new (updated) weights.
• W_old: the current weights. [1.83, 0.5, 0.2]
• η: the network learning rate. 0.01
• d: the desired output. 0.03
• Y: the predicted output. 0.874
• X: the current input at which the network made a false prediction. [+1, 0.1, 0.3]
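The update rule can be applied directly in Python (a sketch; the list order [b, W1, W2] follows the slide's vectors; note W2 comes out as about 0.197 here, while the slide reports 0.198, presumably due to rounding):

```python
# Weights update: W_new = W_old + eta * (d - Y) * X
W_old = [1.83, 0.5, 0.2]   # [b, W1, W2]
X = [1.0, 0.1, 0.3]        # [+1 bias input, X1, X2]
eta = 0.01                 # learning rate
d, Y = 0.03, 0.874         # desired and predicted outputs

W_new = [w + eta * (d - Y) * x for w, x in zip(W_old, X)]
print([round(w, 3) for w in W_new])  # [1.822, 0.499, 0.197]
```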
12. Weights Update Equation
• The new weights are:
W1_new = 0.499, W2_new = 0.198, b_new = 1.822
• Based on the new weights, the network will be re-trained.
13. Weights Update Equation
• The new weights are:
W1_new = 0.499, W2_new = 0.198, b_new = 1.822
• Based on the new weights, the network will be re-trained.
• Continue these operations until the prediction error reaches an acceptable value:
1. Updating the weights.
2. Retraining the network.
3. Calculating the prediction error.
Network diagram with the updated weights: W1 = 0.499, W2 = 0.198, b = 1.822.
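The update–retrain–measure loop described above can be sketched as follows (a minimal illustration of the same single-neuron sigmoid network; `acceptable_error` is an arbitrary threshold I chose, not a value from the slides):

```python
import math

def forward(W, X):
    # Sum of products followed by sigmoid activation.
    s = sum(w * x for w, x in zip(W, X))
    return 1.0 / (1.0 + math.exp(-s))

X = [1.0, 0.1, 0.3]        # [+1 bias input, X1, X2]
W = [1.83, 0.5, 0.2]       # [b, W1, W2]
d, eta = 0.03, 0.01
acceptable_error = 0.001   # arbitrary stopping threshold

for step in range(100_000):
    Y = forward(W, X)
    E = 0.5 * (d - Y) ** 2
    if E <= acceptable_error:
        break
    # Slide's update rule: W_new = W_old + eta * (d - Y) * X
    W = [w + eta * (d - Y) * x for w, x in zip(W, X)]

print(step, round(E, 4))
```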
14. Why is the Backpropagation Algorithm Important?
• The backpropagation algorithm is used to answer these questions and to understand the effect of each weight on the prediction error.
Old Weights → New Weights
15. Forward Vs. Backward Passes
• When training a neural network, there are two passes: forward and backward.
• The goal of the backward pass is to know how each weight affects the total error; in other words, how does changing the weights change the prediction error?
Forward →
← Backward
16. Backward Pass
• Let us work with a simpler example:
Y = X²Z + H
• How do we answer this question: what is the effect on the output Y of a change in the variable X?
• This question is answered using derivatives. The derivative of Y with respect to X (∂Y/∂X) tells us the effect of changing the variable X on the output Y.
17. Calculating Derivatives
• For Y = X²Z + H, the derivative ∂Y/∂X can be calculated as follows:
∂Y/∂X = ∂/∂X (X²Z + H)
• Based on these two derivative rules:
Square rule: ∂/∂X (X²) = 2X
Constant rule: ∂/∂X (C) = 0
• The result will be:
∂Y/∂X = 2XZ + 0 = 2XZ
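The result ∂Y/∂X = 2XZ can be checked numerically with a finite difference (a small illustration; the sample values X = 3, Z = 2, H = 5 are mine, not from the slides):

```python
def Y(X, Z, H):
    return X ** 2 * Z + H

# Analytic derivative from the slide: dY/dX = 2XZ.
X, Z, H = 3.0, 2.0, 5.0
analytic = 2 * X * Z  # 12.0

# Central finite-difference approximation of dY/dX.
eps = 1e-6
numeric = (Y(X + eps, Z, H) - Y(X - eps, Z, H)) / (2 * eps)

print(analytic, round(numeric, 6))  # 12.0 12.0
```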
18. Prediction Error – Weight Derivative
• What is the effect of each weight W on the prediction error E?
E = (1/2)(desired − predicted)²
• Change in Y wrt X: ∂Y/∂X
• Change in E wrt W: ∂E/∂W
35. Interpreting Derivatives
• There are two useful pieces of information from the derivatives calculated previously: the derivative sign and the derivative magnitude (MAG).
Derivative Sign:
• Positive: increasing/decreasing the weight increases/decreases the error.
• Negative: increasing/decreasing the weight decreases/increases the error.
Derivative Magnitude:
• Positive sign: increasing/decreasing the weight by P increases/decreases the error by MAG*P.
• Negative sign: increasing/decreasing the weight by P decreases/increases the error by MAG*P.
• In our example, ∂E/∂W1 = 0.01 and ∂E/∂W2 = 0.03. Because both are positive, we would like to decrease the weights in order to decrease the prediction error.
36. Updating Weights
• Each weight will be updated based on its derivative according to this equation:
Wi_new = Wi_old − η * ∂E/∂Wi
Updating W1:
W1_new = W1 − η * ∂E/∂W1 = 0.5 − 0.01 * 0.0093 = 0.49991
Updating W2:
W2_new = W2 − η * ∂E/∂W2 = 0.2 − 0.01 * 0.028 = 0.1997
Continue updating the weights according to their derivatives and re-train the network until reaching an acceptable error.
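Putting the pieces together, one gradient-descent step for this network can be sketched as follows (a sketch, assuming the standard chain rule for sigmoid activation with squared error, ∂E/∂Wi = (Y − d)·Y·(1 − Y)·Xi; the slides deriving it are not shown in this excerpt):

```python
import math

# Forward pass for the single-neuron network.
X1, X2 = 0.1, 0.3
W1, W2, b = 0.5, 0.2, 1.83
d, eta = 0.03, 0.01

s = X1 * W1 + X2 * W2 + b
Y = 1.0 / (1.0 + math.exp(-s))

# Chain rule (standard for sigmoid + squared error):
# dE/dWi = (Y - d) * Y * (1 - Y) * Xi
dE_ds = (Y - d) * Y * (1 - Y)
dE_dW1 = dE_ds * X1
dE_dW2 = dE_ds * X2

# Gradient-descent update: Wi_new = Wi_old - eta * dE/dWi
W1_new = W1 - eta * dE_dW1
W2_new = W2 - eta * dE_dW2
print(round(W1_new, 5), round(W2_new, 4))  # 0.49991 0.1997
```

Note that computing the derivatives without intermediate rounding reproduces the slide's results (0.49991 and 0.1997) exactly.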