2. INTRO
• Number of layers L ≥ 2 (not counting the input layer)
• Backward propagation of errors, i.e. errors at the output nodes are propagated backward to the hidden/input nodes.
• Supervised learning method
• Allows quick convergence to a satisfactory error level.
4. • Initialization of Weights:
• Small random values are assigned
• Feed Forward:
• The input pattern is applied and the output calculated (x -> z -> y)
• Back Propagation of Errors:
• Is Output = Target? If not:
• Error = Target - Output
• Distribute the error back to all units in the previous layer
• Update of weights and biases:
• The error is used to change the weights in such a way that the error gets smaller. The process is repeated until the error is minimal (a sketch of all four phases follows below).
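A minimal Python sketch of the four phases for a single-hidden-layer net. The layer sizes, the learning rate, and the names V and W are illustrative assumptions, not taken from the slides; biases are omitted for brevity and the logistic sigmoid is used.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out = 4, 3, 2                # illustrative layer sizes
    x = rng.uniform(0, 1, n_in)                 # an example input pattern
    target = np.array([1.0, 0.0])               # an example target pattern

    # 1. Initialization of weights: small random values
    V = rng.uniform(-0.5, 0.5, (n_hid, n_in))   # input  -> hidden
    W = rng.uniform(-0.5, 0.5, (n_out, n_hid))  # hidden -> output

    for epoch in range(1000):
        # 2. Feed forward: x -> z -> y
        z = sigmoid(V @ x)
        y = sigmoid(W @ z)
        # 3. Back propagation of errors: Error = Target - Output, scaled by the
        #    sigmoid derivative, then distributed back to the previous layer
        err_out = y * (1 - y) * (target - y)
        err_hid = z * (1 - z) * (W.T @ err_out)
        # 4. Update the weights so the error gets smaller; repeat until minimal
        W += 0.5 * np.outer(err_out, z)
        V += 0.5 * np.outer(err_hid, x)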
5. • Bipolar Sigmoid function
f(x) = -1 + 2 / (1 + e^(-x))
• Output range of the function: (-1, 1).
Graph of the function: [figure omitted]
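A small Python sketch of this activation and its derivative; the derivative identity is a standard fact stated here for reference, not taken from the slide.

    import numpy as np

    def bipolar_sigmoid(x):
        """f(x) = -1 + 2 / (1 + e^(-x)); values lie between -1 and 1."""
        return -1.0 + 2.0 / (1.0 + np.exp(-x))

    def bipolar_sigmoid_deriv(x):
        """Standard identity: f'(x) = 0.5 * (1 + f(x)) * (1 - f(x))."""
        f = bipolar_sigmoid(x)
        return 0.5 * (1.0 + f) * (1.0 - f)

    print(bipolar_sigmoid(np.array([-5.0, 0.0, 5.0])))   # approx [-0.987, 0.0, 0.987]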
6. • First apply the inputs to the network and work out
the output – this initial output could be anything, as
the initial weights were random.
• Next work out the error for neuron B.
– ErrorB = OutputB (1 - OutputB)(TargetB - OutputB)
• The “Output(1 - Output)” term appears in the equation because it is the
derivative of the sigmoid function: for the logistic sigmoid, f'(x) = f(x)(1 - f(x)) (see the example below).
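Vectorized over a whole output layer, the rule might look like the following sketch; the numbers are made-up illustrations, and the Output*(1 - Output) factor assumes the logistic sigmoid.

    import numpy as np

    def output_error(output, target):
        # ErrorB = OutputB * (1 - OutputB) * (TargetB - OutputB), for every output neuron B
        return output * (1.0 - output) * (target - output)

    output = np.array([0.8, 0.2, 0.6])       # hypothetical network outputs
    target = np.array([1.0, 0.0, 1.0])       # hypothetical targets
    print(output_error(output, target))      # [ 0.032 -0.032  0.096]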
7. • Change the weight.
W+AB = WAB + (ErrorB x OutputA)
Notice that it is the output of the connecting neuron (A) we use, not B.
We update all the weights in the output layer in this way.
• Calculate the Errors for the hidden layer neurons.
– Unlike the output layer, we can’t calculate these directly (because we
don’t have a target), so we back-propagate them from the output layer.
• Take the Errors from the output neurons and run them
back through the weights to get the hidden layer
errors.
• If neuron A is connected to B and C, then we take the errors from B and C
to generate an error for A.
– ErrorA = OutputA (1 - OutputA)(ErrorB WAB + ErrorC WAC)
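A tiny numeric sketch of these two rules; all values are hypothetical, and no learning rate is applied because the slides omit one.

    # Hypothetical values for one hidden neuron A feeding output neurons B and C.
    output_A = 0.7
    error_B, error_C = 0.05, -0.02     # errors already computed at the output layer
    W_AB, W_AC = 0.3, -0.1             # weights from A to B and from A to C

    # Hidden-layer error uses the old weights:
    # ErrorA = OutputA (1 - OutputA)(ErrorB WAB + ErrorC WAC)
    error_A = output_A * (1 - output_A) * (error_B * W_AB + error_C * W_AC)

    # Change the weight: W+AB = WAB + (ErrorB x OutputA)
    W_AB = W_AB + error_B * output_A

    print(W_AB, error_A)               # 0.335 and 0.00357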
8. • Having obtained the errors for the hidden layer
neurons, we now proceed to change the hidden
layer weights.
• By repeating this method we can train
networks with any number of layers (see the sketch below).
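One way to express "repeat the same method for every layer" is a loop over a list of weight matrices. The following is an assumed generalization (logistic sigmoid, no biases, learning rate 0.5), not the slides' own program.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def train_step(x, target, weights, lr=0.5):
        """One forward and backward pass through any number of layers."""
        outputs = [x]
        for W in weights:                          # feed forward layer by layer
            outputs.append(sigmoid(W @ outputs[-1]))
        error = outputs[-1] * (1 - outputs[-1]) * (target - outputs[-1])
        for i in reversed(range(len(weights))):    # walk back from the output layer
            grad = np.outer(error, outputs[i])
            # propagate the error to the previous layer before changing the weights
            error = outputs[i] * (1 - outputs[i]) * (weights[i].T @ error)
            weights[i] += lr * grad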
9. • Autoassociation of patterns (vectors) with
themselves using a small number of hidden nodes:
• Training samples: vectors x of dimension n
• Hidden nodes: m < n (an n-m-n net)
• If training is successful, applying any vector x on
input nodes will generate the same x on output
nodes
• Pattern z on hidden layer becomes a compressed
representation of x (with smaller dimension m < n)
• Application: reducing transmission cost
[Diagram: n-m-n autoassociator, x (n values) -> V -> z (m values) -> W -> x (n values), with weight matrices V (input to hidden) and W (hidden to output)]
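A minimal sketch of such an n-m-n autoassociator, assuming the same update rules as above; the sizes, learning rate, epoch count, and toy data are arbitrary illustrative choices.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    rng = np.random.default_rng(0)
    n, m = 8, 3                                   # illustrative sizes with m < n
    V = rng.uniform(-0.5, 0.5, (m, n))            # input  -> hidden (compress)
    W = rng.uniform(-0.5, 0.5, (n, m))            # hidden -> output (decompress)
    patterns = rng.integers(0, 2, (10, n)).astype(float)   # toy binary vectors

    for epoch in range(2000):
        for x in patterns:
            z = sigmoid(V @ x)                    # compressed code of dimension m
            y = sigmoid(W @ z)                    # reconstruction of dimension n
            err_out = y * (1 - y) * (x - y)       # the target is the input itself
            err_hid = z * (1 - z) * (W.T @ err_out)
            W += 0.5 * np.outer(err_out, z)
            V += 0.5 * np.outer(err_hid, x)

    # Only z (m numbers) needs to be transmitted; the receiver applies W to
    # recover an approximation of x (n numbers).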
10. • Example: compressing character bitmaps.
–Each character is represented by a 7 by 9
pixel bitmap, or a binary vector of
dimension 63
–10 characters (A – J) are used in the experiment
–Neurons in the input/output layers = 63
–Neurons in the hidden layer = 24 (a hypothetical setup sketch follows below)
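A hypothetical instantiation of these sizes; the random bitmaps stand in for the real A-J patterns, which are not given in the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 63, 24                                     # 7 x 9 pixels -> 63; 24 hidden units
    bitmaps = rng.integers(0, 2, (10, 7, 9))          # placeholder for the A-J bitmaps
    patterns = bitmaps.reshape(10, n).astype(float)   # each bitmap flattened to a 63-vector
    V = rng.uniform(-0.5, 0.5, (m, n))                # 63 -> 24 (compress)
    W = rng.uniform(-0.5, 0.5, (n, m))                # 24 -> 63 (reconstruct)
    # Training proceeds exactly as in the n-m-n sketch above,
    # with each bitmap serving as its own target.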