Backpropagation: Understanding How to Update ANN Weights Step-by-Step
Ahmed Fawzy Gad
ahmed.fawzy@ci.menofia.edu.eg
MENOUFIA UNIVERSITY
FACULTY OF COMPUTERS AND INFORMATION
INFORMATION TECHNOLOGY
Train then Update
• The backpropagation algorithm is used to update the NN weights when they are not able to make correct predictions. Hence, we should train the NN before applying backpropagation.
[Diagram: Initial Weights → Training → Prediction → Backpropagation → Update]
Neural Network Training Example
Training Data:
X1 = 0.1, X2 = 0.3, Output = 0.03
Initial Weights:
W1 = 0.5, W2 = 0.2, b = 1.83
[Diagram: a single neuron with inputs X1 = 0.1 and X2 = 0.3, weights W1 = 0.5 and W2 = 0.2, and a +1 bias input with b = 1.83, feeding an In/Out activation node]
Network Training
• Steps to train our network:
1. Prepare the activation function input (the sum of products between the inputs and weights).
2. Calculate the activation function output.
Network Training: Sum of Products
• The first step is to calculate the sum of products (sop) between the inputs and weights; this sop is then used as the input to the activation function.
s = X1*W1 + X2*W2 + b
s = 0.1*0.5 + 0.3*0.2 + 1.83
s = 1.94
Network Training: Activation Function
• In this example, the sigmoid activation function is used:
f(s) = 1 / (1 + e^(-s))
• Based on the sop calculated previously, the output is as follows:
f(s) = 1 / (1 + e^(-1.94)) = 1 / (1 + 0.144) = 1 / 1.144
f(s) = 0.874
Network Training: Prediction Error
• After getting the predicted output, the next step is to measure the prediction error of the network.
• We can use the squared error function defined as follows:
E = 1/2 (desired - predicted)^2
• Based on the predicted output, the prediction error is:
E = 1/2 (0.03 - 0.874)^2 = 1/2 (-0.844)^2 = 1/2 (0.713) = 0.357
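As a quick check, the forward pass and error above can be reproduced in a few lines of Python. This is a minimal sketch using the slide's numbers; the variable names are mine, not the author's.

```python
import math

# Single-neuron forward pass with the slide's numbers.
X1, X2 = 0.1, 0.3             # inputs
W1, W2, b = 0.5, 0.2, 1.83    # initial weights and bias
desired = 0.03                # desired output

s = X1 * W1 + X2 * W2 + b                # sum of products: 1.94
predicted = 1 / (1 + math.exp(-s))       # sigmoid output: ~0.874
E = 0.5 * (desired - predicted) ** 2     # squared error: ~0.356

print(s, predicted, E)
```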
How to Minimize the Prediction Error?
• There is a prediction error, and it should be minimized until it reaches an acceptable value.
What should we do in order to minimize the error?
• There must be something to change in order to minimize the error. In our example, the only parameters we can change are the weights.
How do we update the weights?
• We can use the weights update equation:
W_new = W_old + η(d - Y)X
Weights Update Equation
W_new = W_old + η(d - Y)X
• W_new: the new (updated) weights.
• W_old: the current weights. [1.83, 0.5, 0.2]
• η: the network learning rate. 0.01
• d: the desired output. 0.03
• Y: the predicted output. 0.874
• X: the current input at which the network made a false prediction. [+1, 0.1, 0.3]
Weights Update Equation
W_new = W_old + η(d - Y)X
= [1.83, 0.5, 0.2] + 0.01(0.03 - 0.874)[+1, 0.1, 0.3]
= [1.83, 0.5, 0.2] + (-0.0084)[+1, 0.1, 0.3]
= [1.83, 0.5, 0.2] + [-0.0084, -0.00084, -0.0025]
= [1.822, 0.499, 0.198]
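The same vector update can be written compactly in Python. A sketch; the [b, W1, W2] ordering follows the slide:

```python
# Weights update rule from the slide: W_new = W_old + eta * (d - Y) * X,
# applied element-wise with the weight vector ordered as [b, W1, W2].
eta = 0.01
W_old = [1.83, 0.5, 0.2]
X = [1.0, 0.1, 0.3]          # +1 is the bias input
d, Y = 0.03, 0.874           # desired and predicted outputs

W_new = [w + eta * (d - Y) * x for w, x in zip(W_old, X)]
print(W_new)                 # ~[1.8216, 0.49916, 0.19747]
```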
Weights Update Equation
• The new weights are:
W1_new = 0.499, W2_new = 0.198, b_new = 1.822
• Based on the new weights, the network will be re-trained.
Weights Update Equation
• Continue these operations until the prediction error reaches an acceptable value:
1. Update the weights.
2. Retrain the network.
3. Calculate the prediction error.
[Diagram: the same neuron re-drawn with the updated weights W1 = 0.499, W2 = 0.198, b = 1.822]
Why is the Backpropagation Algorithm Important?
• The update equation above moves from the old weights to new weights without telling us how each individual weight contributes to the error. The backpropagation algorithm is used to answer exactly that question: to understand the effect of each weight on the prediction error.
[Diagram: Old Weights → ? → New Weights]
Forward vs. Backward Passes
• When training a neural network, there are two passes: forward and backward.
• The goal of the backward pass is to know how each weight affects the total error. In other words: how does changing the weights change the prediction error?
Backward Pass
• Let us work with a simpler example:
Y = X^2 Z + H
• How do we answer this question: what is the effect on the output Y of a change in the variable X?
• This question is answered using derivatives. The derivative of Y with respect to X (∂Y/∂X) tells us the effect of changing the variable X on the output Y.
Calculating Derivatives
• The derivative ∂Y/∂X can be calculated as follows:
∂Y/∂X = ∂/∂X (X^2 Z + H)
• Based on these two derivative rules:
Square: ∂/∂X (X^2) = 2X
Constant: ∂/∂X (C) = 0
• The result will be:
∂Y/∂X = 2XZ + 0 = 2XZ
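A quick numerical check of this result: approximating ∂Y/∂X with a finite difference should match 2XZ. The values below are arbitrary choices for illustration, not from the slides.

```python
# Finite-difference check of dY/dX = 2*X*Z for Y = X^2 * Z + H.
def Y(X, Z, H):
    return X**2 * Z + H

X, Z, H = 1.5, 2.0, 3.0      # arbitrary test values
eps = 1e-6
numeric = (Y(X + eps, Z, H) - Y(X - eps, Z, H)) / (2 * eps)
analytic = 2 * X * Z
print(numeric, analytic)     # both ~6.0
```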
Prediction Error – Weight Derivative
• What we need for backpropagation is the relationship between the error E and each weight W.
• Just as ∂Y/∂X gives the change in Y with respect to X, the derivative ∂E/∂W gives the change in E with respect to W:
E = 1/2 (desired - predicted)^2
Prediction Error – Weight Derivative
• Start from the error function:
E = 1/2 (desired - predicted)^2
• desired = 0.03 (a constant).
• predicted = f(s) = 1 / (1 + e^(-s)), where s = X1*W1 + X2*W2 + b.
• Substituting step by step, the error can be written directly in terms of the weights:
E = 1/2 (desired - 1 / (1 + e^(-s)))^2
E = 1/2 (desired - 1 / (1 + e^(-(X1*W1 + X2*W2 + b))))^2
Multivariate Chain Rule
• The chain from the weights to the prediction error is: Weights → sop → Predicted Output → Prediction Error
Weights: W1, W2
sop: s = X1*W1 + X2*W2 + b
Predicted Output: f(s) = 1 / (1 + e^(-s))
Prediction Error: E = 1/2 (desired - predicted)^2
• Differentiating the error directly with respect to a weight is complex:
∂E/∂W = ∂/∂W [1/2 (desired - 1 / (1 + e^(-(X1*W1 + X2*W2 + b))))^2]
• The multivariate chain rule breaks this derivative into a product of simpler partial derivatives along the chain:
∂E/∂W1 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W1
∂E/∂W2 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W2
Let's calculate these individual partial derivatives.
Error-Predicted (∂E/∂Predicted) Partial Derivative
E = 1/2 (desired - predicted)^2
Partial derivative:
∂E/∂Predicted = ∂/∂Predicted [1/2 (desired - predicted)^2]
= 2 * 1/2 (desired - predicted)^(2-1) * (0 - 1)
= (desired - predicted) * (-1)
= predicted - desired
Substitution:
∂E/∂Predicted = predicted - desired = 0.874 - 0.03
∂E/∂Predicted = 0.844
Predicted-sop (∂Predicted/∂s) Partial Derivative
Predicted = 1 / (1 + e^(-s))
Partial derivative:
∂Predicted/∂s = ∂/∂s [1 / (1 + e^(-s))]
∂Predicted/∂s = (1 / (1 + e^(-s))) * (1 - 1 / (1 + e^(-s)))
Substitution:
∂Predicted/∂s = (1 / (1 + e^(-1.94))) * (1 - 1 / (1 + e^(-1.94)))
= (1 / (1 + 0.144)) * (1 - 1 / (1 + 0.144))
= (1 / 1.144) * (1 - 1 / 1.144)
= 0.874 * (1 - 0.874) = 0.874 * 0.126
∂Predicted/∂s = 0.11
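This identity, f'(s) = f(s)(1 - f(s)), can also be verified numerically with a finite difference. A small sketch, not part of the slides:

```python
import math

# Check the sigmoid derivative f'(s) = f(s) * (1 - f(s)) at s = 1.94.
def sigmoid(s):
    return 1 / (1 + math.exp(-s))

s, eps = 1.94, 1e-6
numeric = (sigmoid(s + eps) - sigmoid(s - eps)) / (2 * eps)
analytic = sigmoid(s) * (1 - sigmoid(s))
print(numeric, analytic)  # both ~0.110
```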
Sop-W1 (∂s/∂W1) Partial Derivative
s = X1*W1 + X2*W2 + b
Partial derivative:
∂s/∂W1 = ∂/∂W1 (X1*W1 + X2*W2 + b)
= 1 * X1 * W1^(1-1) + 0 + 0 = X1 * W1^0 = X1 * (1)
∂s/∂W1 = X1
Substitution:
∂s/∂W1 = X1 = 0.1

Sop-W2 (∂s/∂W2) Partial Derivative
s = X1*W1 + X2*W2 + b
Partial derivative:
∂s/∂W2 = ∂/∂W2 (X1*W1 + X2*W2 + b)
= 0 + 1 * X2 * W2^(1-1) + 0 = X2 * W2^0 = X2 * (1)
∂s/∂W2 = X2
Substitution:
∂s/∂W2 = X2 = 0.3
Error-W1 (∂E/∂W1) Partial Derivative
• After calculating each individual derivative, we can multiply all of them to get the desired relationship between the prediction error and each weight.
∂E/∂W1 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W1
Calculated derivatives:
∂E/∂Predicted = 0.844, ∂Predicted/∂s = 0.11, ∂s/∂W1 = 0.1
∂E/∂W1 = 0.844 * 0.11 * 0.1
∂E/∂W1 = 0.0093
Error-W2 (∂E/∂W2) Partial Derivative
∂E/∂W2 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W2
Calculated derivatives:
∂E/∂Predicted = 0.844, ∂Predicted/∂s = 0.11, ∂s/∂W2 = 0.3
∂E/∂W2 = 0.844 * 0.11 * 0.3
∂E/∂W2 = 0.028
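In Python, the whole chain for both weights is a few multiplications. A sketch recomputing every factor from scratch (variable names are mine):

```python
import math

# Chain-rule gradients for the single-neuron example:
# dE/dW_i = (predicted - desired) * predicted * (1 - predicted) * X_i
X1, X2 = 0.1, 0.3
W1, W2, b = 0.5, 0.2, 1.83
desired = 0.03

s = X1 * W1 + X2 * W2 + b
predicted = 1 / (1 + math.exp(-s))

dE_dpred = predicted - desired            # ~0.844
dpred_ds = predicted * (1 - predicted)    # ~0.110
dE_dW1 = dE_dpred * dpred_ds * X1         # ~0.0093
dE_dW2 = dE_dpred * dpred_ds * X2         # ~0.028
print(dE_dW1, dE_dW2)
```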
Interpreting Derivatives
• There are two useful pieces of information in the derivatives calculated previously: the sign and the magnitude (MAG).
Derivative sign:
• Positive: increasing/decreasing the weight increases/decreases the error.
• Negative: increasing/decreasing the weight decreases/increases the error.
Derivative magnitude:
• Positive sign: increasing/decreasing the weight by P increases/decreases the error by MAG*P.
• Negative sign: increasing/decreasing the weight by P decreases/increases the error by MAG*P.
• In our example, because both ∂E/∂W1 = 0.0093 and ∂E/∂W2 = 0.028 are positive, we would like to decrease the weights in order to decrease the prediction error.
Updating Weights
• Each weight will be updated based on its derivative according to this equation:
Wi_new = Wi_old - η * ∂E/∂Wi
Updating W1:
W1_new = W1 - η * ∂E/∂W1 = 0.5 - 0.01 * 0.0093
W1_new = 0.49991
Updating W2:
W2_new = W2 - η * ∂E/∂W2 = 0.2 - 0.01 * 0.028
W2_new = 0.1997
Continue updating the weights according to the derivatives and re-training the network until an acceptable error is reached.
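Putting forward pass, derivatives, and update together gives a complete training loop. The sketch below is a minimal implementation of the procedure the slides describe, with an assumed threshold of 1e-3 standing in for "an acceptable error":

```python
import math

# Gradient-descent training loop for the single-neuron example.
X = [0.1, 0.3]
W = [0.5, 0.2]
b = 1.83
desired, eta = 0.03, 0.01

for step in range(100000):
    s = X[0] * W[0] + X[1] * W[1] + b
    predicted = 1 / (1 + math.exp(-s))
    E = 0.5 * (desired - predicted) ** 2
    if E < 1e-3:                 # assumed "acceptable error" threshold
        break
    # Chain rule: dE/dW_i = (predicted - desired) * predicted*(1-predicted) * X_i
    delta = (predicted - desired) * predicted * (1 - predicted)
    W[0] -= eta * delta * X[0]
    W[1] -= eta * delta * X[1]
    b -= eta * delta * 1.0       # the bias input is +1

print(step, predicted, E)
```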
Second Example: Backpropagation for a NN with a Hidden Layer

ANN with Hidden Layer
Training Data:
X1 = 0.1, X2 = 0.3, Output = 0.03
Initial Weights:
W1 = 0.5, W2 = 0.1, W3 = 0.62, W4 = 0.2, W5 = -0.2, W6 = 0.3, b1 = 0.4, b2 = -0.1, b3 = 1.83
[Diagram: Initial Weights → Training → Prediction → Backpropagation → Update, as in the first example]
Forward Pass – Hidden Layer Neurons
Neuron h1:
h1_in = X1*W1 + X2*W2 + b1 = 0.1*0.5 + 0.3*0.1 + 0.4
h1_in = 0.48
h1_out = 1 / (1 + e^(-h1_in)) = 1 / (1 + e^(-0.48))
h1_out = 0.618
Neuron h2:
h2_in = X1*W3 + X2*W4 + b2 = 0.1*0.62 + 0.3*0.2 - 0.1
h2_in = 0.022
h2_out = 1 / (1 + e^(-h2_in)) = 1 / (1 + e^(-0.022))
h2_out = 0.506
Forward Pass – Output Layer Neuron
out_in = h1_out*W5 + h2_out*W6 + b3 = 0.618*(-0.2) + 0.506*0.3 + 1.83
out_in = 1.858
out_out = 1 / (1 + e^(-out_in)) = 1 / (1 + e^(-1.858))
out_out = 0.865
Forward Pass – Prediction Error
desired = 0.03, Predicted = out_out = 0.865
E = 1/2 (desired - out_out)^2 = 1/2 (0.03 - 0.865)^2
E = 0.349
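The forward pass for this network maps directly to code. A minimal sketch with the slide's values (the names h1_out, out_out, etc. mirror the slides):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Forward pass for the network with one hidden layer (slide values).
X1, X2 = 0.1, 0.3
W1, W2, W3, W4, W5, W6 = 0.5, 0.1, 0.62, 0.2, -0.2, 0.3
b1, b2, b3 = 0.4, -0.1, 1.83
desired = 0.03

h1_out = sigmoid(X1 * W1 + X2 * W2 + b1)           # ~0.618
h2_out = sigmoid(X1 * W3 + X2 * W4 + b2)           # ~0.506
out_out = sigmoid(h1_out * W5 + h2_out * W6 + b3)  # ~0.865
E = 0.5 * (desired - out_out) ** 2                 # ~0.349
print(h1_out, h2_out, out_out, E)
```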
Partial Derivatives Calculation
• The derivatives to calculate are: ∂E/∂W1, ∂E/∂W2, ∂E/∂W3, ∂E/∂W4, ∂E/∂W5, ∂E/∂W6
E-W5 (∂E/∂W5) Partial Derivative
∂E/∂W5 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂W5

Partial derivative ∂E/∂out_out:
∂E/∂out_out = ∂/∂out_out [1/2 (desired - out_out)^2]
= 2 * 1/2 (desired - out_out)^(2-1) * (0 - 1)
= (desired - out_out) * (-1)
∂E/∂out_out = out_out - desired
Substitution:
∂E/∂out_out = 0.865 - 0.03
∂E/∂out_out = 0.835

Partial derivative ∂out_out/∂out_in:
∂out_out/∂out_in = ∂/∂out_in [1 / (1 + e^(-out_in))]
= (1 / (1 + e^(-out_in))) * (1 - 1 / (1 + e^(-out_in)))
Substitution:
= (1 / (1 + e^(-1.858))) * (1 - 1 / (1 + e^(-1.858)))
= (1 / 1.156) * (1 - 1 / 1.156)
= 0.865 * (1 - 0.865) = 0.865 * 0.135
∂out_out/∂out_in = 0.117

Partial derivative ∂out_in/∂W5:
∂out_in/∂W5 = ∂/∂W5 (h1_out*W5 + h2_out*W6 + b3)
= 1 * h1_out * W5^(1-1) + 0 + 0
∂out_in/∂W5 = h1_out
Substitution:
∂out_in/∂W5 = h1_out = 0.618

Multiplying the calculated derivatives:
∂E/∂W5 = 0.835 * 0.117 * 0.618
∂E/∂W5 = 0.060
E-W6 (∂E/∂W6) Partial Derivative
∂E/∂W6 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂W6
• ∂E/∂out_out = 0.835 and ∂out_out/∂out_in = 0.117 were calculated previously.
Partial derivative ∂out_in/∂W6:
∂out_in/∂W6 = ∂/∂W6 (h1_out*W5 + h2_out*W6 + b3)
= 0 + 1 * h2_out * W6^(1-1) + 0
∂out_in/∂W6 = h2_out
Substitution:
∂out_in/∂W6 = h2_out = 0.506
Multiplying the calculated derivatives:
∂E/∂W6 = 0.835 * 0.117 * 0.506
∂E/∂W6 = 0.049
E-W1 (∂E/∂W1) Partial Derivative
∂E/∂W1 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂h1_out * ∂h1_out/∂h1_in * ∂h1_in/∂W1
• ∂E/∂out_out = 0.835 and ∂out_out/∂out_in = 0.117 were calculated previously.

Partial derivative ∂out_in/∂h1_out:
∂out_in/∂h1_out = ∂/∂h1_out (h1_out*W5 + h2_out*W6 + b3)
= h1_out^(1-1) * W5 + 0 + 0
∂out_in/∂h1_out = W5
Substitution:
∂out_in/∂h1_out = W5 = -0.2

Partial derivative ∂h1_out/∂h1_in:
∂h1_out/∂h1_in = ∂/∂h1_in [1 / (1 + e^(-h1_in))]
= (1 / (1 + e^(-h1_in))) * (1 - 1 / (1 + e^(-h1_in)))
Substitution:
= (1 / (1 + e^(-0.48))) * (1 - 1 / (1 + e^(-0.48)))
∂h1_out/∂h1_in = 0.236

Partial derivative ∂h1_in/∂W1:
∂h1_in/∂W1 = ∂/∂W1 (X1*W1 + X2*W2 + b1)
= X1 * W1^(1-1) + 0 + 0
∂h1_in/∂W1 = X1
Substitution:
∂h1_in/∂W1 = X1 = 0.1

Multiplying the calculated derivatives:
∂E/∂W1 = 0.835 * 0.117 * (-0.2) * 0.236 * 0.1
∂E/∂W1 = -0.0005
E-W2 (∂E/∂W2) Partial Derivative
∂E/∂W2 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂h1_out * ∂h1_out/∂h1_in * ∂h1_in/∂W2
• All factors except the last were calculated previously: ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117, ∂out_in/∂h1_out = -0.2, ∂h1_out/∂h1_in = 0.236.
Partial derivative ∂h1_in/∂W2:
∂h1_in/∂W2 = ∂/∂W2 (X1*W1 + X2*W2 + b1)
= 0 + X2 * W2^(1-1) + 0
∂h1_in/∂W2 = X2
Substitution:
∂h1_in/∂W2 = X2 = 0.3
Multiplying the calculated derivatives:
∂E/∂W2 = 0.835 * 0.117 * (-0.2) * 0.236 * 0.3
∂E/∂W2 = -0.0014
E-W3 (∂E/∂W3) Partial Derivative
∂E/∂W3 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂h2_out * ∂h2_out/∂h2_in * ∂h2_in/∂W3
• ∂E/∂out_out = 0.835 and ∂out_out/∂out_in = 0.117 were calculated previously.

Partial derivative ∂out_in/∂h2_out:
∂out_in/∂h2_out = ∂/∂h2_out (h1_out*W5 + h2_out*W6 + b3)
= 0 + h2_out^(1-1) * W6 + 0
∂out_in/∂h2_out = W6
Substitution:
∂out_in/∂h2_out = W6 = 0.3

Partial derivative ∂h2_out/∂h2_in:
∂h2_out/∂h2_in = ∂/∂h2_in [1 / (1 + e^(-h2_in))]
= (1 / (1 + e^(-h2_in))) * (1 - 1 / (1 + e^(-h2_in)))
Substitution:
= (1 / (1 + e^(-0.022))) * (1 - 1 / (1 + e^(-0.022)))
∂h2_out/∂h2_in = 0.25

Partial derivative ∂h2_in/∂W3:
∂h2_in/∂W3 = ∂/∂W3 (X1*W3 + X2*W4 + b2)
= X1 * W3^(1-1) + 0 + 0
∂h2_in/∂W3 = X1
Substitution:
∂h2_in/∂W3 = X1 = 0.1

Multiplying the calculated derivatives:
∂E/∂W3 = 0.835 * 0.117 * 0.3 * 0.25 * 0.1
∂E/∂W3 = 0.0007
E-W4 (∂E/∂W4) Partial Derivative
∂E/∂W4 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂h2_out * ∂h2_out/∂h2_in * ∂h2_in/∂W4
• All factors except the last were calculated previously: ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117, ∂out_in/∂h2_out = 0.3, ∂h2_out/∂h2_in = 0.25.
Partial derivative ∂h2_in/∂W4:
∂h2_in/∂W4 = ∂/∂W4 (X1*W3 + X2*W4 + b2)
= 0 + X2 * W4^(1-1) + 0
∂h2_in/∂W4 = X2
Substitution:
∂h2_in/∂W4 = X2 = 0.3
Multiplying the calculated derivatives:
∂E/∂W4 = 0.835 * 0.117 * 0.3 * 0.25 * 0.3
∂E/∂W4 = 0.0022
All Error-Weights Partial Derivatives
∂E/∂W1 = -0.0005
∂E/∂W2 = -0.0014
∂E/∂W3 = 0.0007
∂E/∂W4 = 0.0022
∂E/∂W5 = 0.060
∂E/∂W6 = 0.049
Updated Weights
W1_new = W1 - η * ∂E/∂W1 = 0.5 - 0.01 * (-0.0005) = 0.500005
W2_new = W2 - η * ∂E/∂W2 = 0.1 - 0.01 * (-0.0014) = 0.100014
W3_new = W3 - η * ∂E/∂W3 = 0.62 - 0.01 * 0.0007 = 0.619993
W4_new = W4 - η * ∂E/∂W4 = 0.2 - 0.01 * 0.0022 = 0.199978
W5_new = W5 - η * ∂E/∂W5 = -0.2 - 0.01 * 0.060 = -0.2006
W6_new = W6 - η * ∂E/∂W6 = 0.3 - 0.01 * 0.049 = 0.29951
Continue updating the weights according to the derivatives and re-training the network until an acceptable error is reached.
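Finally, the backward pass for the hidden-layer network in code: each gradient is the product of the factors along its chain. A self-contained sketch of the slides' procedure (variable names are mine):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Forward pass (as before), then the backward pass: each gradient is the
# product of the partial derivatives along the path from E to that weight.
X1, X2 = 0.1, 0.3
W1, W2, W3, W4, W5, W6 = 0.5, 0.1, 0.62, 0.2, -0.2, 0.3
b1, b2, b3 = 0.4, -0.1, 1.83
desired, eta = 0.03, 0.01

h1_out = sigmoid(X1 * W1 + X2 * W2 + b1)
h2_out = sigmoid(X1 * W3 + X2 * W4 + b2)
out_out = sigmoid(h1_out * W5 + h2_out * W6 + b3)

dE_dout = out_out - desired               # ~0.835
dout_din = out_out * (1 - out_out)        # ~0.117
dh1 = h1_out * (1 - h1_out)               # ~0.236
dh2 = h2_out * (1 - h2_out)               # ~0.250

dE_dW5 = dE_dout * dout_din * h1_out             # ~0.060
dE_dW6 = dE_dout * dout_din * h2_out             # ~0.049
dE_dW1 = dE_dout * dout_din * W5 * dh1 * X1      # ~-0.0005
dE_dW2 = dE_dout * dout_din * W5 * dh1 * X2      # ~-0.0014
dE_dW3 = dE_dout * dout_din * W6 * dh2 * X1      # ~0.0007
dE_dW4 = dE_dout * dout_din * W6 * dh2 * X2      # ~0.0022

# Gradient-descent update, one per weight.
for name, w, g in [("W1", W1, dE_dW1), ("W2", W2, dE_dW2),
                   ("W3", W3, dE_dW3), ("W4", W4, dE_dW4),
                   ("W5", W5, dE_dW5), ("W6", W6, dE_dW6)]:
    print(name, w - eta * g)
```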

More Related Content

What's hot

Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Simplilearn
 
Back propagation
Back propagationBack propagation
Back propagationNagarajan
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer VisionSungjoon Choi
 
Machine learning Lecture 2
Machine learning Lecture 2Machine learning Lecture 2
Machine learning Lecture 2Srinivasan R
 
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...Simplilearn
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRUananth
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement LearningUsman Qayyum
 
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...Simplilearn
 
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Simplilearn
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANNMohamed Talaat
 
Deep Feed Forward Neural Networks and Regularization
Deep Feed Forward Neural Networks and RegularizationDeep Feed Forward Neural Networks and Regularization
Deep Feed Forward Neural Networks and RegularizationYan Xu
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Simplilearn
 
Randomized Algorithm- Advanced Algorithm
Randomized Algorithm- Advanced AlgorithmRandomized Algorithm- Advanced Algorithm
Randomized Algorithm- Advanced AlgorithmMahbubur Rahman
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnnKuppusamy P
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its ApplicationsDr Ganesh Iyer
 
Long Short Term Memory
Long Short Term MemoryLong Short Term Memory
Long Short Term MemoryYan Xu
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Universitat Politècnica de Catalunya
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision treesKnoldus Inc.
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningAmr Rashed
 

What's hot (20)

Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
 
Back propagation
Back propagationBack propagation
Back propagation
 
Machine learning
Machine learningMachine learning
Machine learning
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
Machine learning Lecture 2
Machine learning Lecture 2Machine learning Lecture 2
Machine learning Lecture 2
 
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
Artificial Neural Network | Deep Neural Network Explained | Artificial Neural...
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
 
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
 
Artificial Neural Networks - ANN
Artificial Neural Networks - ANNArtificial Neural Networks - ANN
Artificial Neural Networks - ANN
 
Deep Feed Forward Neural Networks and Regularization
Deep Feed Forward Neural Networks and RegularizationDeep Feed Forward Neural Networks and Regularization
Deep Feed Forward Neural Networks and Regularization
 
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
Machine Learning Algorithms | Machine Learning Tutorial | Data Science Algori...
 
Randomized Algorithm- Advanced Algorithm
Randomized Algorithm- Advanced AlgorithmRandomized Algorithm- Advanced Algorithm
Randomized Algorithm- Advanced Algorithm
 
Recurrent neural networks rnn
Recurrent neural networks   rnnRecurrent neural networks   rnn
Recurrent neural networks rnn
 
Machine Learning and its Applications
Machine Learning and its ApplicationsMachine Learning and its Applications
Machine Learning and its Applications
 
Long Short Term Memory
Long Short Term MemoryLong Short Term Memory
Long Short Term Memory
 
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
Loss functions (DLAI D4L2 2017 UPC Deep Learning for Artificial Intelligence)
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 

Similar to Backpropagation Explained: How ANNs Update Weights Step-by-Step

Lecture 5 backpropagation
Lecture 5 backpropagationLecture 5 backpropagation
Lecture 5 backpropagationParveenMalik18
 
Stochastic optimal control & rl
Stochastic optimal control & rlStochastic optimal control & rl
Stochastic optimal control & rlChoiJinwon3
 
Variational Autoencoder Tutorial
Variational Autoencoder Tutorial Variational Autoencoder Tutorial
Variational Autoencoder Tutorial Hojin Yang
 
Kalman filter for Beginners
Kalman filter for BeginnersKalman filter for Beginners
Kalman filter for Beginnerswinfred lu
 
Koh_Liang_ICML2017
Koh_Liang_ICML2017Koh_Liang_ICML2017
Koh_Liang_ICML2017Masa Kato
 
Learning group em - 20171025 - copy
Learning group   em - 20171025 - copyLearning group   em - 20171025 - copy
Learning group em - 20171025 - copyShuai Zhang
 
Orbital_Simulation (2).pptx
Orbital_Simulation (2).pptxOrbital_Simulation (2).pptx
Orbital_Simulation (2).pptxMSPrasad7
 
Solving Poisson Equation using Conjugate Gradient Method and its implementation
Solving Poisson Equation using Conjugate Gradient Methodand its implementationSolving Poisson Equation using Conjugate Gradient Methodand its implementation
Solving Poisson Equation using Conjugate Gradient Method and its implementationJongsu "Liam" Kim
 
DL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfDL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfsagayalavanya2
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GANNAVER Engineering
 
Passivity-based control of rigid-body manipulator
Passivity-based control of rigid-body manipulatorPassivity-based control of rigid-body manipulator
Passivity-based control of rigid-body manipulatorHancheol Choi
 
Intro to Quant Trading Strategies (Lecture 6 of 10)
Intro to Quant Trading Strategies (Lecture 6 of 10)Intro to Quant Trading Strategies (Lecture 6 of 10)
Intro to Quant Trading Strategies (Lecture 6 of 10)Adrian Aley
 
PR 113: The Perception Distortion Tradeoff
PR 113: The Perception Distortion TradeoffPR 113: The Perception Distortion Tradeoff
PR 113: The Perception Distortion TradeoffTaeoh Kim
 
Confirmatory Bayesian Online Change Point Detection in the Covariance Structu...
Confirmatory Bayesian Online Change Point Detection in the Covariance Structu...Confirmatory Bayesian Online Change Point Detection in the Covariance Structu...
Confirmatory Bayesian Online Change Point Detection in the Covariance Structu...JeeyeonHan
 
SUEC 高中 Adv Maths (Quadratic Equation in One Variable)
SUEC 高中 Adv Maths (Quadratic Equation in One Variable)SUEC 高中 Adv Maths (Quadratic Equation in One Variable)
SUEC 高中 Adv Maths (Quadratic Equation in One Variable)tungwc
 
Sampling method : MCMC
Sampling method : MCMCSampling method : MCMC
Sampling method : MCMCSEMINARGROOT
 
Reinforcement Learning basics part1
Reinforcement Learning basics part1Reinforcement Learning basics part1
Reinforcement Learning basics part1Euijin Jeong
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machinesJinho Lee
 

Similar to Backpropagation Explained: How ANNs Update Weights Step-by-Step (20)

Lecture 5 backpropagation
Lecture 5 backpropagationLecture 5 backpropagation
Lecture 5 backpropagation
 
Stochastic optimal control & rl
Stochastic optimal control & rlStochastic optimal control & rl
Stochastic optimal control & rl
 
Variational Autoencoder Tutorial
Variational Autoencoder Tutorial Variational Autoencoder Tutorial
Variational Autoencoder Tutorial
 
Kalman filter for Beginners
Kalman filter for BeginnersKalman filter for Beginners
Kalman filter for Beginners
 
Dl meetup 07-04-16
Dl meetup 07-04-16Dl meetup 07-04-16
Dl meetup 07-04-16
 
Koh_Liang_ICML2017
Koh_Liang_ICML2017Koh_Liang_ICML2017
Koh_Liang_ICML2017
 
Learning group em - 20171025 - copy
Learning group   em - 20171025 - copyLearning group   em - 20171025 - copy
Learning group em - 20171025 - copy
 
Orbital_Simulation (2).pptx
Orbital_Simulation (2).pptxOrbital_Simulation (2).pptx
Orbital_Simulation (2).pptx
 
Solving Poisson Equation using Conjugate Gradient Method and its implementation
Solving Poisson Equation using Conjugate Gradient Methodand its implementationSolving Poisson Equation using Conjugate Gradient Methodand its implementation
Solving Poisson Equation using Conjugate Gradient Method and its implementation
 
Instrumental Variables
Instrumental VariablesInstrumental Variables
Instrumental Variables
 
DL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdfDL_lecture3_regularization_I.pdf
DL_lecture3_regularization_I.pdf
 
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN[GAN by Hung-yi Lee]Part 1: General introduction of GAN
[GAN by Hung-yi Lee]Part 1: General introduction of GAN
 
Passivity-based control of rigid-body manipulator
Passivity-based control of rigid-body manipulatorPassivity-based control of rigid-body manipulator
Passivity-based control of rigid-body manipulator
 
Intro to Quant Trading Strategies (Lecture 6 of 10)
Intro to Quant Trading Strategies (Lecture 6 of 10)Intro to Quant Trading Strategies (Lecture 6 of 10)
Intro to Quant Trading Strategies (Lecture 6 of 10)
 
PR 113: The Perception Distortion Tradeoff
PR 113: The Perception Distortion TradeoffPR 113: The Perception Distortion Tradeoff
PR 113: The Perception Distortion Tradeoff
 
Confirmatory Bayesian Online Change Point Detection in the Covariance Structu...
Confirmatory Bayesian Online Change Point Detection in the Covariance Structu...Confirmatory Bayesian Online Change Point Detection in the Covariance Structu...
Confirmatory Bayesian Online Change Point Detection in the Covariance Structu...
 
SUEC 高中 Adv Maths (Quadratic Equation in One Variable)
SUEC 高中 Adv Maths (Quadratic Equation in One Variable)SUEC 高中 Adv Maths (Quadratic Equation in One Variable)
SUEC 高中 Adv Maths (Quadratic Equation in One Variable)
 
Sampling method : MCMC
Sampling method : MCMCSampling method : MCMC
Sampling method : MCMC
 
Reinforcement Learning basics part1
Reinforcement Learning basics part1Reinforcement Learning basics part1
Reinforcement Learning basics part1
 
Support vector machines
Support vector machinesSupport vector machines
Support vector machines
 

More from Ahmed Gad

ICEIT'20 Cython for Speeding-up Genetic Algorithm
ICEIT'20 Cython for Speeding-up Genetic AlgorithmICEIT'20 Cython for Speeding-up Genetic Algorithm
ICEIT'20 Cython for Speeding-up Genetic AlgorithmAhmed Gad
 
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...Ahmed Gad
 
Python for Computer Vision - Revision 2nd Edition
Python for Computer Vision - Revision 2nd EditionPython for Computer Vision - Revision 2nd Edition
Python for Computer Vision - Revision 2nd EditionAhmed Gad
 
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...Ahmed Gad
 
M.Sc. Thesis - Automatic People Counting in Crowded Scenes
M.Sc. Thesis - Automatic People Counting in Crowded ScenesM.Sc. Thesis - Automatic People Counting in Crowded Scenes
M.Sc. Thesis - Automatic People Counting in Crowded ScenesAhmed Gad
 
Derivation of Convolutional Neural Network from Fully Connected Network Step-...
Derivation of Convolutional Neural Network from Fully Connected Network Step-...Derivation of Convolutional Neural Network from Fully Connected Network Step-...
Derivation of Convolutional Neural Network from Fully Connected Network Step-...Ahmed Gad
 
Introduction to Optimization with Genetic Algorithm (GA)
Introduction to Optimization with Genetic Algorithm (GA)Introduction to Optimization with Genetic Algorithm (GA)
Introduction to Optimization with Genetic Algorithm (GA)Ahmed Gad
 
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...Ahmed Gad
 
Avoid Overfitting with Regularization
Avoid Overfitting with RegularizationAvoid Overfitting with Regularization
Avoid Overfitting with RegularizationAhmed Gad
 
Genetic Algorithm (GA) Optimization - Step-by-Step Example
Genetic Algorithm (GA) Optimization - Step-by-Step ExampleGenetic Algorithm (GA) Optimization - Step-by-Step Example
Genetic Algorithm (GA) Optimization - Step-by-Step ExampleAhmed Gad
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisAhmed Gad
 
Computer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and GradientComputer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and GradientAhmed Gad
 
Python for Computer Vision - Revision
Python for Computer Vision - RevisionPython for Computer Vision - Revision
Python for Computer Vision - RevisionAhmed Gad
 
Anime Studio Pro 10 Tutorial as Part of Multimedia Course
Anime Studio Pro 10 Tutorial as Part of Multimedia CourseAnime Studio Pro 10 Tutorial as Part of Multimedia Course
Anime Studio Pro 10 Tutorial as Part of Multimedia CourseAhmed Gad
 
Brief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsBrief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsAhmed Gad
 
Operations in Digital Image Processing + Convolution by Example
Operations in Digital Image Processing + Convolution by ExampleOperations in Digital Image Processing + Convolution by Example
Operations in Digital Image Processing + Convolution by ExampleAhmed Gad
 
MATLAB Code + Description : Real-Time Object Motion Detection and Tracking
MATLAB Code + Description : Real-Time Object Motion Detection and TrackingMATLAB Code + Description : Real-Time Object Motion Detection and Tracking
MATLAB Code + Description : Real-Time Object Motion Detection and TrackingAhmed Gad
 
MATLAB Code + Description : Very Simple Automatic English Optical Character R...
MATLAB Code + Description : Very Simple Automatic English Optical Character R...MATLAB Code + Description : Very Simple Automatic English Optical Character R...
MATLAB Code + Description : Very Simple Automatic English Optical Character R...Ahmed Gad
 
Graduation Project - Face Login : A Robust Face Identification System for Sec...
Graduation Project - Face Login : A Robust Face Identification System for Sec...Graduation Project - Face Login : A Robust Face Identification System for Sec...
Graduation Project - Face Login : A Robust Face Identification System for Sec...Ahmed Gad
 
Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...
Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...
Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...Ahmed Gad
 

More from Ahmed Gad (20)

ICEIT'20 Cython for Speeding-up Genetic Algorithm
ICEIT'20 Cython for Speeding-up Genetic AlgorithmICEIT'20 Cython for Speeding-up Genetic Algorithm
ICEIT'20 Cython for Speeding-up Genetic Algorithm
 
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
NumPyCNNAndroid: A Library for Straightforward Implementation of Convolutiona...
 
Python for Computer Vision - Revision 2nd Edition
Python for Computer Vision - Revision 2nd EditionPython for Computer Vision - Revision 2nd Edition
Python for Computer Vision - Revision 2nd Edition
 
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...
Multi-Objective Optimization using Non-Dominated Sorting Genetic Algorithm wi...
 
M.Sc. Thesis - Automatic People Counting in Crowded Scenes
M.Sc. Thesis - Automatic People Counting in Crowded ScenesM.Sc. Thesis - Automatic People Counting in Crowded Scenes
M.Sc. Thesis - Automatic People Counting in Crowded Scenes
 
Derivation of Convolutional Neural Network from Fully Connected Network Step-...
Derivation of Convolutional Neural Network from Fully Connected Network Step-...Derivation of Convolutional Neural Network from Fully Connected Network Step-...
Derivation of Convolutional Neural Network from Fully Connected Network Step-...
 
Introduction to Optimization with Genetic Algorithm (GA)
Introduction to Optimization with Genetic Algorithm (GA)Introduction to Optimization with Genetic Algorithm (GA)
Introduction to Optimization with Genetic Algorithm (GA)
 
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
Derivation of Convolutional Neural Network (ConvNet) from Fully Connected Net...
 
Avoid Overfitting with Regularization
Avoid Overfitting with RegularizationAvoid Overfitting with Regularization
Avoid Overfitting with Regularization
 
Genetic Algorithm (GA) Optimization - Step-by-Step Example
Genetic Algorithm (GA) Optimization - Step-by-Step ExampleGenetic Algorithm (GA) Optimization - Step-by-Step Example
Genetic Algorithm (GA) Optimization - Step-by-Step Example
 
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression AnalysisICCES 2017 - Crowd Density Estimation Method using Regression Analysis
ICCES 2017 - Crowd Density Estimation Method using Regression Analysis
 
Computer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and GradientComputer Vision: Correlation, Convolution, and Gradient
Computer Vision: Correlation, Convolution, and Gradient
 
Python for Computer Vision - Revision
Python for Computer Vision - RevisionPython for Computer Vision - Revision
Python for Computer Vision - Revision
 
Anime Studio Pro 10 Tutorial as Part of Multimedia Course
Anime Studio Pro 10 Tutorial as Part of Multimedia CourseAnime Studio Pro 10 Tutorial as Part of Multimedia Course
Anime Studio Pro 10 Tutorial as Part of Multimedia Course
 
Brief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNsBrief Introduction to Deep Learning + Solving XOR using ANNs
Brief Introduction to Deep Learning + Solving XOR using ANNs
 
Operations in Digital Image Processing + Convolution by Example
Operations in Digital Image Processing + Convolution by ExampleOperations in Digital Image Processing + Convolution by Example
Operations in Digital Image Processing + Convolution by Example
 
MATLAB Code + Description : Real-Time Object Motion Detection and Tracking
MATLAB Code + Description : Real-Time Object Motion Detection and TrackingMATLAB Code + Description : Real-Time Object Motion Detection and Tracking
MATLAB Code + Description : Real-Time Object Motion Detection and Tracking
 
MATLAB Code + Description : Very Simple Automatic English Optical Character R...
MATLAB Code + Description : Very Simple Automatic English Optical Character R...MATLAB Code + Description : Very Simple Automatic English Optical Character R...
MATLAB Code + Description : Very Simple Automatic English Optical Character R...
 
Graduation Project - Face Login : A Robust Face Identification System for Sec...
Graduation Project - Face Login : A Robust Face Identification System for Sec...Graduation Project - Face Login : A Robust Face Identification System for Sec...
Graduation Project - Face Login : A Robust Face Identification System for Sec...
 
Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...
Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...
Introduction to MATrices LABoratory (MATLAB) as Part of Digital Signal Proces...
 

Recently uploaded

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 

Recently uploaded (20)

毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 

Backpropagation Explained: How ANNs Update Weights Step-by-Step

  • 1. Backpropagation: Understanding How to Update ANNs Weights Step-by-Step Ahmed Fawzy Gad ahmed.fawzy@ci.menofia.edu.eg MENOUFIA UNIVERSITY FACULTY OF COMPUTERS AND INFORMATION INFORMATION TECHNOLOGY ‫المنوفية‬ ‫جامعة‬ ‫والمعلومات‬ ‫الحاسبات‬ ‫كلية‬ ‫المعلومات‬ ‫تكنولوجيا‬ ‫المنوفية‬ ‫جامعة‬
  • 2. Train then Update • The backpropagation algorithm is used to update the NN weights when they are not able to make the correct predictions. Hence, we should train the NN before applying backpropagation. Initial Weights PredictionTraining
  • 3. Train then Update • The backpropagation algorithm is used to update the NN weights when they are not able to make the correct predictions. Hence, we should train the NN before applying backpropagation. Initial Weights PredictionTraining BackpropagationUpdate
  • 4. Neural Network Training Example 𝐗 𝟏 𝐗 𝟐 𝐎𝐮𝐭𝐩𝐮𝐭 𝟎. 𝟏 𝟎. 𝟑 𝟎. 𝟎𝟑 𝐖𝟏 𝐖𝟐 𝐛 𝟎. 𝟓 𝟎. 𝟓 1. 𝟖𝟑 Training Data Initial Weights 𝟎. 𝟏 In Out 𝑾 𝟏 = 𝟎. 𝟓 𝑾 𝟐 = 𝟎. 𝟐 +𝟏 𝒃 = 𝟏. 𝟖𝟑 𝟎. 𝟑 𝑿 𝟏 In Out 𝑾 𝟏 𝑾 𝟐 +𝟏 𝒃 𝑿 𝟐
  • 5. Network Training • Steps to train our network: 1. Prepare activation function input (sum of products between inputs and weights). 2. Activation function output. 𝟎. 𝟏 In Out 𝑾 𝟏 = 𝟎. 𝟓 𝑾 𝟐 = 𝟎. 𝟐 +𝟏 𝒃 = 𝟏. 𝟖𝟑 𝟎. 𝟑
  • 6. Network Training: Sum of Products • After calculating the sop between inputs and weights, next is to use this sop as the input to the activation function. 𝟎. 𝟏 In Out 𝑾 𝟏 = 𝟎. 𝟓 𝑾 𝟐 = 𝟎. 𝟐 +𝟏 𝒃 = 𝟏. 𝟖𝟑 𝟎. 𝟑 𝒔 = 𝑿1 ∗ 𝑾1 + 𝑿2 ∗ 𝑾2 + 𝒃 𝒔 = 𝟎. 𝟏 ∗ 𝟎. 𝟓 + 𝟎. 𝟑 ∗ 𝟎. 𝟐 + 𝟏. 𝟖𝟑 𝒔 = 𝟏. 𝟗𝟒
  • 7. Network Training: Activation Function • In this example, the sigmoid activation function is used. • Based on the sop calculated previously, the output is as follows: 𝟎. 𝟏 In Out 𝑾 𝟏 = 𝟎. 𝟓 𝑾 𝟐 = 𝟎. 𝟐 +𝟏 𝒃 = 𝟏. 𝟖𝟑 𝟎. 𝟑 𝒇 𝒔 = 𝟏 𝟏 + 𝒆−𝒔 𝒇 𝒔 = 𝟏 𝟏 + 𝒆−𝟏.𝟗𝟒 = 𝟏 𝟏 + 𝟎. 𝟏𝟒𝟒 = 𝟏 𝟏. 𝟏𝟒𝟒 𝒇 𝒔 = 𝟎. 𝟖𝟕𝟒
  • 8. Network Training: Prediction Error • After getting the predicted outputs, next is to measure the prediction error of the network. • We can use the squared error function defined as follows: • Based on the predicted output, the prediction error is: 𝟎. 𝟏 In Out 𝑾 𝟏 = 𝟎. 𝟓 𝑾 𝟐 = 𝟎. 𝟐 +𝟏 𝒃 = 𝟏. 𝟖𝟑 𝟎. 𝟑 𝑬 = 𝟏 𝟐 𝒅𝒆𝒔𝒊𝒓𝒆𝒅 − 𝒑𝒓𝒆𝒅𝒊𝒄𝒕𝒆𝒅 𝟐 𝑬 = 𝟏 𝟐 𝟎. 𝟎𝟑 − 𝟎. 𝟖𝟕𝟒 𝟐 = 𝟏 𝟐 −𝟎. 𝟖𝟒𝟒 𝟐 = 𝟏 𝟐 𝟎. 𝟕𝟏𝟑 = 𝟎. 𝟑𝟓𝟕
  • 9. How to Minimize Prediction Error? • There is a prediction error and it should be minimized until reaching an acceptable error. What should we do in order to minimize the error? • There must be something to change in order to minimize the error. In our example, the only parameter to change is the weight. How to update the weights? • We can use the weights update equation: 𝑾 𝒏𝒆𝒘 = 𝑾 𝒐𝒍𝒅 + η 𝒅 − 𝒀 𝑿
  • 10. Weights Update Equation • We can use the weights update equation:  𝑾 𝒏𝒆𝒘: new updated weights.  𝑾 𝒐𝒍𝒅: current weights. [1.83, 0.5, 0.2]  η: network learning rate. 0.01  𝒅: desired output. 0.03  𝒀: predicted output. 0.874  𝑿: current input at which the network made false prediction. [+1, 0.1, 0.3] 𝑾 𝒏𝒆𝒘 = 𝑾 𝒐𝒍𝒅 + η 𝒅 − 𝒀 𝑿
  • 11. Weights Update Equation 𝑾 𝒏𝒆𝒘 = 𝑾 𝒐𝒍𝒅 + η 𝒅 − 𝒀 𝑿 = [𝟏. 𝟖𝟑, 𝟎. 𝟓, 𝟎. 𝟐 + 𝟎. 𝟎𝟏 𝟎. 𝟎𝟑 − 𝟎. 𝟖𝟕𝟒 [+𝟏, 𝟎. 𝟏, 𝟎. 𝟑 = [𝟏. 𝟖𝟑, 𝟎. 𝟓, 𝟎. 𝟐 + −𝟎. 𝟎𝟎𝟖𝟒[+𝟏, 𝟎. 𝟏, 𝟎. 𝟑 = [𝟏. 𝟖𝟑, 𝟎. 𝟓, 𝟎. 𝟐 + [−𝟎. 𝟎𝟎𝟖𝟒, −𝟎. 𝟎𝟎𝟎𝟖𝟒, −𝟎. 𝟎𝟎𝟐𝟓 = [𝟏. 𝟖𝟐𝟐, 𝟎. 𝟒𝟗𝟗, 𝟎. 𝟏𝟗𝟖
  • 12. Weights Update Equation • The new weights are: • Based on the new weights, the network will be re-trained. 𝑾 𝟏𝒏𝒆𝒘 𝑾 𝟐𝒏𝒆𝒘 𝒃 𝒏𝒆𝒘 𝟎. 𝟏𝟗𝟖 𝟎. 𝟒𝟗𝟗 𝟏. 𝟖𝟐𝟐 𝟎. 𝟏 In Out 𝑾 𝟏 = 𝟎. 𝟓 𝑾 𝟐 = 𝟎. 𝟐 +𝟏 𝒃 = 𝟏. 𝟖𝟑 𝟎. 𝟑
  • 13. Weights Update Equation • The new weights are: • Based on the new weights, the network will be re-trained. • Continue these operations until prediction error reaches an acceptable value. 1. Updating weights. 2. Retraining network. 3. Calculating prediction error. 𝑾 𝟏𝒏𝒆𝒘 𝑾 𝟐𝒏𝒆𝒘 𝒃 𝒏𝒆𝒘 𝟎. 𝟏𝟗𝟖 𝟎. 𝟒𝟗𝟗 𝟏. 𝟖𝟐𝟐 𝟎. 𝟏 In Out 𝑾 𝟏 = 𝟎. 𝟒𝟗𝟗 𝑾 𝟐 = 𝟎. 𝟏𝟗𝟖 +𝟏 𝒃 = 𝟏. 𝟖22 𝟎. 𝟑
  • 14. Why Backpropagation Algorithm is Important? • The backpropagation algorithm is used to answer these questions and understand effect of each weight over the prediction error. New Weights !Old Weights
  • 15. Forward Vs. Backward Passes • When training a neural network, there are two passes: forward and backward. • The goal of the backward pass is to know how each weight affects the total error. In other words, how changing the weights changes the prediction error? Forward Backward
• 16. Backward Pass • Let us work with a simpler example: Y = X^2*Z + H • How to answer this question: what is the effect on the output Y given a change in variable X? • This question is answered using derivatives. The derivative of Y wrt X (∂Y/∂X) will tell us the effect of changing the variable X on the output Y.
• 17. Calculating Derivatives • The derivative ∂Y/∂X can be calculated as follows: ∂Y/∂X = ∂/∂X (X^2*Z + H) • Based on these two derivative rules: Square: ∂/∂X (X^2) = 2X; Constant: ∂/∂X (C) = 0 • The result will be: ∂Y/∂X = 2XZ + 0 = 2XZ
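This result can also be checked symbolically. A minimal sketch, assuming the SymPy library is available:

```python
import sympy as sp

X, Z, H = sp.symbols('X Z H')
Y = X**2 * Z + H
print(sp.diff(Y, X))  # prints 2*X*Z, matching the hand derivation
```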
• 18. Prediction Error – Weight Derivative • E = 1/2 (desired - predicted)^2 • Just as ∂Y/∂X gives the change in Y wrt X, the derivative ∂E/∂W gives the change in E wrt W.
• 19. Prediction Error – Weight Derivative • E = 1/2 (desired - predicted)^2
• 20. Prediction Error – Weight Derivative • E = 1/2 (desired - predicted)^2
• 21. Prediction Error – Weight Derivative • E = 1/2 (desired - predicted)^2, where desired = 0.03 (constant)
• 22. Prediction Error – Weight Derivative • E = 1/2 (desired - predicted)^2, where desired = 0.03 (constant) and Predicted = f(s) = 1/(1 + e^(-s))
• 23. Prediction Error – Weight Derivative • Substituting the predicted output: E = 1/2 (desired - 1/(1 + e^(-s)))^2
• 24. Prediction Error – Weight Derivative • E = 1/2 (desired - 1/(1 + e^(-s)))^2
• 25. Prediction Error – Weight Derivative • E = 1/2 (desired - 1/(1 + e^(-s)))^2, where s = X1*W1 + X2*W2 + b
• 26. Prediction Error – Weight Derivative • Substituting the sop: E = 1/2 (desired - 1/(1 + e^(-(X1*W1 + X2*W2 + b))))^2
• 27. Multivariate Chain Rule • The chain from the weights to the error: Weights (W1, W2) → sop: s = X1*W1 + X2*W2 + b → Predicted Output: f(s) = 1/(1 + e^(-s)) → Prediction Error: E = 1/2 (desired - predicted)^2 • ∂E/∂W = ∂/∂W (1/2 (desired - 1/(1 + e^(-(X1*W1 + X2*W2 + b))))^2), which is calculated using the chain rule.
• 28. Multivariate Chain Rule • Following the chain from E back to each weight: ∂E/∂W1 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W1 and ∂E/∂W2 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W2 • Let's calculate these individual partial derivatives.
• 29. Error-Predicted (∂E/∂Predicted) Partial Derivative • E = 1/2 (desired - predicted)^2 • Partial derivative: ∂E/∂Predicted = ∂/∂Predicted (1/2 (desired - predicted)^2) = 2 * 1/2 (desired - predicted)^(2-1) * (0 - 1) = (desired - predicted) * (-1) = predicted - desired • Substitution: ∂E/∂Predicted = 0.874 - 0.03 = 0.844
• 30. Predicted-sop (∂Predicted/∂s) Partial Derivative • Predicted = 1/(1 + e^(-s)) • Partial derivative: ∂Predicted/∂s = ∂/∂s (1/(1 + e^(-s))) = (1/(1 + e^(-s)))(1 - 1/(1 + e^(-s))) • Substitution: ∂Predicted/∂s = (1/(1 + e^(-1.94)))(1 - 1/(1 + e^(-1.94))) = (1/1.144)(1 - 1/1.144) = 0.874(1 - 0.874) = 0.874(0.126) = 0.11
• 31. Sop-W1 (∂s/∂W1) Partial Derivative • s = X1*W1 + X2*W2 + b • Partial derivative: ∂s/∂W1 = ∂/∂W1 (X1*W1 + X2*W2 + b) = 1 * X1 * (W1)^(1-1) + 0 + 0 = X1 * (W1)^0 = X1(1) = X1 • Substitution: ∂s/∂W1 = X1 = 0.1
• 32. Sop-W2 (∂s/∂W2) Partial Derivative • s = X1*W1 + X2*W2 + b • Partial derivative: ∂s/∂W2 = ∂/∂W2 (X1*W1 + X2*W2 + b) = 0 + 1 * X2 * (W2)^(1-1) + 0 = X2 * (W2)^0 = X2(1) = X2 • Substitution: ∂s/∂W2 = X2 = 0.3
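Before multiplying these factors, each one can be sanity-checked numerically with finite differences. A minimal sketch, assuming plain Python; the step size eps is an illustrative choice:

```python
import math

X1, X2, W1, W2, b, desired = 0.1, 0.3, 0.5, 0.2, 1.83, 0.03
f = lambda v: 1 / (1 + math.exp(-v))  # sigmoid
s0 = X1 * W1 + X2 * W2 + b            # 1.94
p0 = f(s0)                            # ~0.874
eps = 1e-6

# dE/dPredicted: should be ~0.844 (predicted - desired)
dE_dp = (0.5 * (desired - (p0 + eps))**2 - 0.5 * (desired - p0)**2) / eps
# dPredicted/ds: should be ~0.11 (f(s)(1 - f(s)))
dp_ds = (f(s0 + eps) - f(s0)) / eps
# ds/dW1: should be X1 = 0.1
ds_dW1 = ((X1 * (W1 + eps) + X2 * W2 + b) - s0) / eps

print(round(dE_dp, 3), round(dp_ds, 3), round(ds_dW1, 3))  # 0.844 0.11 0.1
```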
• 33. Error-W1 (∂E/∂W1) Partial Derivative • After calculating each individual derivative, we can multiply all of them to get the desired relationship between the prediction error and each weight. • Calculated derivatives: ∂E/∂Predicted = 0.844, ∂Predicted/∂s = 0.11, ∂s/∂W1 = 0.1 • ∂E/∂W1 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W1 = 0.844 * 0.11 * 0.1 = 0.0093
• 34. Error-W2 (∂E/∂W2) Partial Derivative • Calculated derivatives: ∂E/∂Predicted = 0.844, ∂Predicted/∂s = 0.11, ∂s/∂W2 = 0.3 • ∂E/∂W2 = ∂E/∂Predicted * ∂Predicted/∂s * ∂s/∂W2 = 0.844 * 0.11 * 0.3 = 0.028
• 35. Interpreting Derivatives • There are two useful pieces of information in the derivatives calculated previously: the sign and the magnitude (MAG). • Derivative sign: Positive – increasing/decreasing the weight increases/decreases the error. Negative – increasing/decreasing the weight decreases/increases the error. • Derivative magnitude: changing the weight by P changes the error by MAG*P, in the direction given by the sign. • In our example, because both ∂E/∂W1 = 0.0093 and ∂E/∂W2 = 0.028 are positive, we would like to decrease the weights in order to decrease the prediction error.
• 36. Updating Weights • Each weight will be updated based on its derivative according to this equation: Wi_new = Wi_old - η * ∂E/∂Wi • Updating W1: W1_new = W1 - η * ∂E/∂W1 = 0.5 - 0.01 * 0.0093 = 0.49991 • Updating W2: W2_new = W2 - η * ∂E/∂W2 = 0.2 - 0.01 * 0.028 = 0.1997 • Continue updating weights according to the derivatives and re-train the network until reaching an acceptable error. A sketch of this computation follows below.
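Putting the chain-rule factors and the gradient-descent update together in code, a sketch using the numbers derived above:

```python
eta = 0.01
dE_dW1 = 0.844 * 0.11 * 0.1  # chain-rule product: ~0.0093
dE_dW2 = 0.844 * 0.11 * 0.3  # chain-rule product: ~0.0279

W1_new = 0.5 - eta * dE_dW1  # ~0.49991
W2_new = 0.2 - eta * dE_dW2  # ~0.19972
print(W1_new, W2_new)
```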
• 37. Second Example: Backpropagation for NN with Hidden Layer
• 38. ANN with Hidden Layer • Training Data: X1 = 0.1, X2 = 0.3, Output = 0.03 • Initial Weights: W1 = 0.5, W2 = 0.1, W3 = 0.62, W4 = 0.2, W5 = -0.2, W6 = 0.3, b1 = 0.4, b2 = -0.1, b3 = 1.83
• 39. ANN with Hidden Layer • Initial Weights → Training → Prediction
• 40. ANN with Hidden Layer • Initial Weights → Training → Prediction → Backpropagation (Update)
• 41. Forward Pass – Hidden Layer Neurons • h1_in = X1*W1 + X2*W2 + b1 = 0.1*0.5 + 0.3*0.1 + 0.4 = 0.48 • h1_out = 1/(1 + e^(-h1_in)) = 1/(1 + e^(-0.48)) = 0.618
• 42. Forward Pass – Hidden Layer Neurons • h2_in = X1*W3 + X2*W4 + b2 = 0.1*0.62 + 0.3*0.2 - 0.1 = 0.022 • h2_out = 1/(1 + e^(-h2_in)) = 1/(1 + e^(-0.022)) = 0.506
• 43. Forward Pass – Output Layer Neuron • out_in = h1_out*W5 + h2_out*W6 + b3 = 0.618*(-0.2) + 0.506*0.3 + 1.83 = 1.858 • out_out = 1/(1 + e^(-out_in)) = 1/(1 + e^(-1.858)) = 0.865
• 44. Forward Pass – Prediction Error • desired = 0.03, Predicted = out_out = 0.865 • E = 1/2 (desired - out_out)^2 = 1/2 (0.03 - 0.865)^2 = 0.349 • To update the weights, we need the derivatives ∂E/∂W1, ∂E/∂W2, ∂E/∂W3, ∂E/∂W4, ∂E/∂W5, ∂E/∂W6. A forward-pass sketch follows below.
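The whole forward pass of this 2-2-1 network fits in a few lines. A minimal sketch, assuming plain Python; the names simply mirror the slides:

```python
import math

sigmoid = lambda v: 1 / (1 + math.exp(-v))

X1, X2, desired = 0.1, 0.3, 0.03
W1, W2, W3, W4, W5, W6 = 0.5, 0.1, 0.62, 0.2, -0.2, 0.3
b1, b2, b3 = 0.4, -0.1, 1.83

h1_in = X1 * W1 + X2 * W2 + b1           # 0.48
h1_out = sigmoid(h1_in)                  # ~0.618
h2_in = X1 * W3 + X2 * W4 + b2           # 0.022
h2_out = sigmoid(h2_in)                  # ~0.506
out_in = h1_out * W5 + h2_out * W6 + b3  # ~1.858
out_out = sigmoid(out_in)                # ~0.865
E = 0.5 * (desired - out_out) ** 2       # ~0.349
```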
• 46. E-W5 (∂E/∂W5) Partial Derivative • ∂E/∂W5 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂W5
• 47. E-W5 (∂E/∂W5) Partial Derivative • Partial derivative: ∂E/∂out_out = ∂/∂out_out (1/2 (desired - out_out)^2) = 2 * 1/2 (desired - out_out)^(2-1) * (0 - 1) = (desired - out_out) * (-1) = out_out - desired • Substitution: ∂E/∂out_out = 0.865 - 0.03 = 0.835
• 48. E-W5 (∂E/∂W5) Partial Derivative • Partial derivative: ∂out_out/∂out_in = ∂/∂out_in (1/(1 + e^(-out_in))) = (1/(1 + e^(-out_in)))(1 - 1/(1 + e^(-out_in))) • Substitution: ∂out_out/∂out_in = (1/(1 + e^(-1.858)))(1 - 1/(1 + e^(-1.858))) = (1/1.156)(1 - 1/1.156) = 0.865(1 - 0.865) = 0.865(0.135) = 0.117
• 49. E-W5 (∂E/∂W5) Partial Derivative • Partial derivative: ∂out_in/∂W5 = ∂/∂W5 (h1_out*W5 + h2_out*W6 + b3) = 1 * h1_out * (W5)^(1-1) + 0 + 0 = h1_out • Substitution: ∂out_in/∂W5 = h1_out = 0.618
• 50. E-W5 (∂E/∂W5) Partial Derivative • ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117, ∂out_in/∂W5 = 0.618 • ∂E/∂W5 = 0.835 * 0.117 * 0.618 = 0.060
• 51. E-W6 (∂E/∂W6) Partial Derivative • ∂E/∂W6 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂W6
• 52. E-W6 (∂E/∂W6) Partial Derivative • The first two factors are already known: ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117
• 53. E-W6 (∂E/∂W6) Partial Derivative • ∂E/∂W6 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂W6 • Partial derivative: ∂out_in/∂W6 = ∂/∂W6 (h1_out*W5 + h2_out*W6 + b3) = 0 + 1 * h2_out * (W6)^(1-1) + 0 = h2_out • Substitution: ∂out_in/∂W6 = h2_out = 0.506
• 54. E-W6 (∂E/∂W6) Partial Derivative • ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117, ∂out_in/∂W6 = 0.506 • ∂E/∂W6 = 0.835 * 0.117 * 0.506 = 0.049
• 55. E-W1 (∂E/∂W1) Partial Derivative • ∂E/∂W1 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂h1_out * ∂h1_out/∂h1_in * ∂h1_in/∂W1
• 56. E-W1 (∂E/∂W1) Partial Derivative • The first two factors are already known: ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117
• 57. E-W1 (∂E/∂W1) Partial Derivative • Partial derivative: ∂out_in/∂h1_out = ∂/∂h1_out (h1_out*W5 + h2_out*W6 + b3) = (h1_out)^(1-1) * W5 + 0 + 0 = W5 • Substitution: ∂out_in/∂h1_out = W5 = -0.2
• 58. E-W1 (∂E/∂W1) Partial Derivative • Partial derivative: ∂h1_out/∂h1_in = ∂/∂h1_in (1/(1 + e^(-h1_in))) = (1/(1 + e^(-h1_in)))(1 - 1/(1 + e^(-h1_in))) • Substitution: ∂h1_out/∂h1_in = (1/(1 + e^(-0.48)))(1 - 1/(1 + e^(-0.48))) = 0.618(1 - 0.618) = 0.236
• 59. E-W1 (∂E/∂W1) Partial Derivative • Partial derivative: ∂h1_in/∂W1 = ∂/∂W1 (X1*W1 + X2*W2 + b1) = X1 * (W1)^(1-1) + 0 + 0 = X1 • Substitution: ∂h1_in/∂W1 = X1 = 0.1
• 60. E-W1 (∂E/∂W1) Partial Derivative • ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117, ∂out_in/∂h1_out = -0.2, ∂h1_out/∂h1_in = 0.236, ∂h1_in/∂W1 = 0.1 • ∂E/∂W1 = 0.835 * 0.117 * (-0.2) * 0.236 * 0.1 = -0.0005
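The five-factor product can be checked directly, a one-line sketch using the values derived above:

```python
# Chain through the output neuron and hidden neuron h1 down to W1
dE_dW1 = 0.835 * 0.117 * (-0.2) * 0.236 * 0.1
print(dE_dW1)  # ~-0.00046, i.e. ~-0.0005
```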
• 61. E-W2 (∂E/∂W2) Partial Derivative • ∂E/∂W2 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂h1_out * ∂h1_out/∂h1_in * ∂h1_in/∂W2
• 62. E-W2 (∂E/∂W2) Partial Derivative • The first four factors are already known: ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117, ∂out_in/∂h1_out = -0.2, ∂h1_out/∂h1_in = 0.236
• 63. E-W2 (∂E/∂W2) Partial Derivative • Partial derivative: ∂h1_in/∂W2 = ∂/∂W2 (X1*W1 + X2*W2 + b1) = 0 + X2 * (W2)^(1-1) + 0 = X2 • Substitution: ∂h1_in/∂W2 = X2 = 0.3
• 64. E-W2 (∂E/∂W2) Partial Derivative • ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117, ∂out_in/∂h1_out = -0.2, ∂h1_out/∂h1_in = 0.236, ∂h1_in/∂W2 = 0.3 • ∂E/∂W2 = 0.835 * 0.117 * (-0.2) * 0.236 * 0.3 = -0.0014
• 65. E-W3 (∂E/∂W3) Partial Derivative • ∂E/∂W3 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂h2_out * ∂h2_out/∂h2_in * ∂h2_in/∂W3
• 66. E-W3 (∂E/∂W3) Partial Derivative • The first two factors are already known: ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117
• 67. E-W3 (∂E/∂W3) Partial Derivative • Partial derivative: ∂out_in/∂h2_out = ∂/∂h2_out (h1_out*W5 + h2_out*W6 + b3) = 0 + (h2_out)^(1-1) * W6 + 0 = W6 • Substitution: ∂out_in/∂h2_out = W6 = 0.3
• 68. E-W3 (∂E/∂W3) Partial Derivative • Partial derivative: ∂h2_out/∂h2_in = ∂/∂h2_in (1/(1 + e^(-h2_in))) = (1/(1 + e^(-h2_in)))(1 - 1/(1 + e^(-h2_in))) • Substitution: ∂h2_out/∂h2_in = (1/(1 + e^(-0.022)))(1 - 1/(1 + e^(-0.022))) = 0.506(1 - 0.506) = 0.25
• 69. E-W3 (∂E/∂W3) Partial Derivative • Partial derivative: ∂h2_in/∂W3 = ∂/∂W3 (X1*W3 + X2*W4 + b2) = X1 * (W3)^(1-1) + 0 + 0 = X1 • Substitution: ∂h2_in/∂W3 = X1 = 0.1
• 70. E-W3 (∂E/∂W3) Partial Derivative • ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117, ∂out_in/∂h2_out = 0.3, ∂h2_out/∂h2_in = 0.25, ∂h2_in/∂W3 = 0.1 • ∂E/∂W3 = 0.835 * 0.117 * 0.3 * 0.25 * 0.1 = 0.0007
• 71. E-W4 (∂E/∂W4) Partial Derivative • ∂E/∂W4 = ∂E/∂out_out * ∂out_out/∂out_in * ∂out_in/∂h2_out * ∂h2_out/∂h2_in * ∂h2_in/∂W4
• 72. E-W4 (∂E/∂W4) Partial Derivative • The first four factors are already known: ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117, ∂out_in/∂h2_out = 0.3, ∂h2_out/∂h2_in = 0.25
• 73. E-W4 (∂E/∂W4) Partial Derivative • Partial derivative: ∂h2_in/∂W4 = ∂/∂W4 (X1*W3 + X2*W4 + b2) = 0 + X2 * (W4)^(1-1) + 0 = X2 • Substitution: ∂h2_in/∂W4 = X2 = 0.3
• 74. E-W4 (∂E/∂W4) Partial Derivative • ∂E/∂out_out = 0.835, ∂out_out/∂out_in = 0.117, ∂out_in/∂h2_out = 0.3, ∂h2_out/∂h2_in = 0.25, ∂h2_in/∂W4 = 0.3 • ∂E/∂W4 = 0.835 * 0.117 * 0.3 * 0.25 * 0.3 = 0.0022
• 75. All Error-Weights Partial Derivatives • ∂E/∂W1 = -0.0005, ∂E/∂W2 = -0.0014, ∂E/∂W3 = 0.0007, ∂E/∂W4 = 0.0022, ∂E/∂W5 = 0.060, ∂E/∂W6 = 0.049 • A sketch computing all six derivatives follows below.
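All six derivatives share the same output-layer factors, so they can be computed together. A sketch of the backward pass, reusing the forward-pass variables from the earlier sketch; the grads dictionary is an illustrative structure, not from the slides:

```python
# Common output-layer term: dE/dout_out * dout_out/dout_in
delta_out = (out_out - desired) * out_out * (1 - out_out)  # ~0.835 * 0.117

# Local sigmoid derivatives of the hidden neurons
d_h1 = h1_out * (1 - h1_out)  # ~0.236
d_h2 = h2_out * (1 - h2_out)  # ~0.25

grads = {
    'W5': delta_out * h1_out,          # ~0.060
    'W6': delta_out * h2_out,          # ~0.049
    'W1': delta_out * W5 * d_h1 * X1,  # ~-0.0005
    'W2': delta_out * W5 * d_h1 * X2,  # ~-0.0014
    'W3': delta_out * W6 * d_h2 * X1,  # ~0.0007
    'W4': delta_out * W6 * d_h2 * X2,  # ~0.0022
}
print(grads)
```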
• 76. Updated Weights • W1_new = W1 - η*∂E/∂W1 = 0.5 - 0.01*(-0.0005) = 0.500005 • W2_new = W2 - η*∂E/∂W2 = 0.1 - 0.01*(-0.0014) = 0.100014 • W3_new = W3 - η*∂E/∂W3 = 0.62 - 0.01*0.0007 = 0.619993 • W4_new = W4 - η*∂E/∂W4 = 0.2 - 0.01*0.0022 = 0.199978 • W5_new = W5 - η*∂E/∂W5 = -0.2 - 0.01*0.060 = -0.2006 • W6_new = W6 - η*∂E/∂W6 = 0.3 - 0.01*0.049 = 0.29951 • Continue updating weights according to the derivatives and re-train the network until reaching an acceptable error.
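Finally, the update step in code, continuing the sketch above. The slides update only W1–W6, though the biases could be updated the same way from their own derivatives:

```python
eta = 0.01
weights = {'W1': W1, 'W2': W2, 'W3': W3, 'W4': W4, 'W5': W5, 'W6': W6}
for name in weights:
    weights[name] -= eta * grads[name]  # Wi_new = Wi_old - eta * dE/dWi
print(weights)  # e.g. W5 -> ~-0.2006, W6 -> ~0.29951
# Re-run the forward pass with these weights, recompute the gradients,
# and repeat until the prediction error is acceptable.
```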