Machine(s) Learning with Neural Networks


www.bgoncalves.com
github.com/bmtgoncalves/Neural-Networks
References
• Machine Learning - Andrew Ng
• Neural Networks for Machine Learning - Geoff Hinton
How the Brain “Works” (Cartoon version)
• Each neuron receives input from other neurons
• $10^{11}$ neurons, each with ~$10^{4}$ weights
• Weights can be positive or negative
• Weights adapt during the learning process
• “neurons that fire together wire together” (Hebb)
• Different areas perform different functions using the same structure (Modularity)
[Diagram: a neuron maps its Inputs through f(Inputs) to an Output.]
Supervised Learning - Classification
[Figure: dataset as a table with rows Sample 1…Sample N and columns Feature1…FeatureM, plus a Label column.]
• Dataset formatted as an N×M matrix of N samples and M features
• Each sample belongs to a specific class or has a specific label
• The goal of classification is to predict the class to which a previously unseen sample belongs by learning the defining regularities of each class
• K-Nearest Neighbors
• Support Vector Machines
• Neural Networks
• Two fundamental types of problems:
• Classification (discrete output value)
• Regression (continuous output value)
Supervised Learning - Overfitting
• “Learning the noise”
• “Memorization” instead of “generalization”
• How can we prevent it?
• Split the dataset into two subsets: Training and Testing
• Train the model using only the Training dataset and evaluate the results on the previously unseen Testing dataset
• Different heuristics on how to split:
• Single split
• k-fold cross validation: split the dataset into k parts, train on k−1 and evaluate on the remaining one; repeat k times and average the results (see the sketch below)
[Figure: the dataset bar divided into Training and Testing portions.]
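As a concrete illustration, here is a minimal NumPy sketch of a k-fold split (the train and evaluate calls are hypothetical placeholders, not functions from the repository):

import numpy as np

def k_fold_indices(N, k, seed=0):
    # Shuffle the N sample indices and split them into k roughly equal folds
    rng = np.random.RandomState(seed)
    return np.array_split(rng.permutation(N), k)

# Usage: train on k-1 folds, evaluate on the held-out fold, average the k scores.
# folds = k_fold_indices(len(X), 5)
# scores = []
# for i, test_idx in enumerate(folds):
#     train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
#     model = train(X[train_idx], y[train_idx])                   # hypothetical
#     scores.append(evaluate(model, X[test_idx], y[test_idx]))    # hypothetical
# print(np.mean(scores))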
Bias-Variance Tradeoff
Perceptron
[Diagram: a perceptron. Inputs x1…xN plus a constant bias input 1 are weighted by w0j…wNj and summed into $z_j = w^T x$; the activation function $\sigma(z)$ produces the output $a_j$.]
Activation Function
• Non-Linear function
• Differentiable
• non-decreasing
• Compute new sets of features
• Each layer builds up a more abstract
representation of the data
activation.py
http://github.com/bmtgoncalves/Neural-Networks
Activation Function - Sigmoid
$\sigma(z) = \frac{1}{1 + e^{-z}}$
activation.py
• Non-Linear function
• Differentiable
• non-decreasing
• Compute new sets of features
• Each layer builds up a more abstract
representation of the data
• Perhaps the most common
http://github.com/bmtgoncalves/Neural-Networks
Activation Function - ReLU
activation.py
• Non-Linear function
• Differentiable
• non-decreasing
• Compute new sets of features
• Each layer builds up a more abstract
representation of the data
• Results in faster learning than with
sigmoid
http://github.com/bmtgoncalves/Neural-Networks
Activation Function - tanh
$\sigma(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$
activation.py
• Non-Linear function
• Differentiable
• non-decreasing
• Compute new sets of features
• Each layer builds up a more abstract
representation of the data
http://github.com/bmtgoncalves/Neural-Networks
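For reference, a minimal sketch of what an activation.py along these lines might contain (the actual file in the repository may differ):

import numpy as np

def sigmoid(z):
    # Logistic sigmoid: squashes z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoidGradient(z):
    # Derivative of the sigmoid, needed later for backpropagation
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    # Rectified Linear Unit: element-wise max(0, z)
    return np.maximum(0, z)

def tanh(z):
    # Hyperbolic tangent: squashes z into (-1, 1)
    return np.tanh(z)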
Perceptron - Forward Propagation
• The output of a perceptron is determined by a sequence of steps:
• obtain the inputs
• multiply the inputs by the respective weights
• calculate the value of the activation function
• (map the activation value to the predicted output)
[Diagram: perceptron with inputs x1…xN, bias input 1, and weights w0j…wNj, computing $a_j = \sigma(w^T x)$.]
Perceptron - Forward Propagation
import numpy as np

def forward(Theta, X, active):
    N = X.shape[0]

    # Add the bias column
    X_ = np.concatenate((np.ones((N, 1)), X), 1)

    # Multiply by the weights
    z = np.dot(X_, Theta.T)

    # Apply the activation function
    a = active(z)

    return a

if __name__ == "__main__":
    Theta1 = np.load('input/Theta1.npy')
    X = np.load('input/X_train.npy')[:10]

    from activation import sigmoid
    active_value = forward(Theta1, X, sigmoid)
forward.py
http://github.com/bmtgoncalves/Neural-Networks
Perceptron - Training
• Training Procedure (see the sketch below):
• If the output is correct, do nothing
• If it incorrectly outputs 0, add the input to the weight vector
• If it incorrectly outputs 1, subtract the input from the weight vector
• Guaranteed to converge, if a correct set of weights exists
• Given enough features, perceptrons can learn almost anything
• The specific features used limit what it is possible to learn
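A minimal sketch of this training procedure, assuming binary labels y in {0, 1} and the same bias convention as forward.py above:

import numpy as np

def perceptron_train(X, y, epochs=10):
    # Classic perceptron rule for binary labels y in {0, 1}
    N, M = X.shape
    X_ = np.concatenate((np.ones((N, 1)), X), 1)  # bias column, as in forward.py
    w = np.zeros(M + 1)

    for _ in range(epochs):
        for i in range(N):
            pred = 1 if np.dot(w, X_[i]) > 0 else 0
            if pred == 0 and y[i] == 1:
                w += X_[i]   # incorrectly output 0: add the input
            elif pred == 1 and y[i] == 0:
                w -= X_[i]   # incorrectly output 1: subtract the input
    return w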
Forward Propagation
• The output of a perceptron is determined by a sequence of steps:
• obtain the inputs
• multiply the inputs by the respective weights
• calculate the output using the activation function
• To create a multi-layer perceptron, you can simply use the output of one layer as the input to the next one.
• But how can we propagate back the errors and update the weights?
[Diagram: two stacked perceptrons; the first layer's activations a1…aN, computed as $\sigma(w^T x)$, become (together with a bias input 1) the inputs to the second layer, which computes $a_k = \sigma(w^T a)$.]
Backward Propagation of Errors (BackProp)
• BackProp operates in two phases:
• Forward propagate the inputs and calculate the deltas
• Update the weights
• The error at the output is a weighted average difference between the predicted output and the observed one.
• For inner layers there is no “real output”!
Chain-rule
• From the forward propagation described above, we know that the output of a neuron is:

  $y_j = w^T x$

• But how can we calculate how to modify the weights $w_{ij}$?
• We take the derivative of the error with respect to the weights!
• Using the chain rule:

  $\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial w_{ij}}$

• And finally we can update each weight in the previous layer (see the sketch below):

  $w_{ij} \leftarrow w_{ij} - \alpha \frac{\partial E}{\partial w_{ij}}$

• where $\alpha$ is the learning rate
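To make the update rule concrete, here is a small sketch of one gradient-descent step for a single linear neuron with squared error E = (y − t)²/2 (all values are illustrative):

import numpy as np

x = np.array([1.0, 0.5, -1.2])   # inputs; x[0] = 1 is the bias input
w = np.array([0.1, -0.3, 0.8])   # weights
t = 1.0                          # target output
alpha = 0.1                      # learning rate

y = np.dot(w, x)          # forward pass: y = w^T x
dE_dy = y - t             # dE/dy for E = (y - t)^2 / 2
dE_dw = dE_dy * x         # chain rule: dy/dw_i = x_i
w = w - alpha * dE_dw     # gradient descent update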
Loss Functions
• For learning to occur, we must quantify how far off we are from the desired output. There are two common ways of doing this (both sketched in code below):
• Quadratic error function:

  $E = \frac{1}{N} \sum_n |y_n - a_n|^2$

• Cross Entropy:

  $J = -\frac{1}{N} \sum_n \left[ y_n^T \log a_n + (1 - y_n)^T \log (1 - a_n) \right]$

The Cross Entropy is complementary to the sigmoid activation in the output layer and improves its stability.
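A minimal NumPy sketch of both losses, assuming y and a are N × K arrays of targets and activations (with a strictly inside (0, 1) for the cross entropy, e.g. from a sigmoid output):

import numpy as np

def quadratic_error(y, a):
    # Mean squared error over the N samples
    return np.mean(np.sum((y - a) ** 2, axis=1))

def cross_entropy(y, a):
    # Mean cross entropy over the N samples
    return -np.mean(np.sum(y * np.log(a) + (1 - y) * np.log(1 - a), axis=1))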
A practical example - MNIST
visualize_digits.py
convert_input.py
[Figure: sample MNIST digits (an 8 and a 2); each image is flattened into one row of an N×M data matrix of Sample 1…Sample N rows and Feature1…FeatureM columns, plus a Label column.]
http://github.com/bmtgoncalves/Neural-Networks
http://yann.lecun.com/exdb/mnist/
A practical example - MNIST
visualize_digits.py
convert_input.py
The data matrix X and label vector y feed a network with 3 layers: 1 input layer, 1 hidden layer, and 1 output layer.
[Diagram: X → Θ1 → Θ2 → argmax]
http://github.com/bmtgoncalves/Neural-Networks
http://yann.lecun.com/exdb/mnist/
A practical example - MNIST
[Diagram: 784 inputs → 50 hidden units → 10 outputs, with a constant bias unit of 1 in each layer; 5000 training examples. Θ1 is 50 × 785 and Θ2 is 10 × 51.]
nn.py
forward.py
[Diagram: X → Θ1 → Θ2 → argmax]
http://github.com/bmtgoncalves/Neural-Networks
A practical example - MNIST
[Diagram: 784 → 50 → 10 network, 5000 examples; Θ1: 50 × 785, Θ2: 10 × 51.]
def predict(Theta1, Theta2, X):
    h1 = forward(Theta1, X, sigmoid)
    h2 = forward(Theta2, h1, sigmoid)
    return np.argmax(h2, 1)
nn.py
forward.py
Forward Propagation
[Diagram: X → Θ1 → Θ2 → argmax]
http://github.com/bmtgoncalves/Neural-Networks
def forward(Theta, X, active):
    N = X.shape[0]

    # Add the bias column
    X_ = np.concatenate((np.ones((N, 1)), X), 1)

    # Multiply by the weights
    z = np.dot(X_, Theta.T)

    # Apply the activation function
    a = active(z)

    return a
A practical example - MNIST
[Diagram: 784 → 50 → 10 network, 5000 examples; Θ1: 50 × 785, Θ2: 10 × 51. Forward propagation passes activation vectors through the network; backward propagation works with matrices of weight gradients (50 × 784 and 10 × 50 shown without the bias columns).]
nn.py
forward.py
[Diagram: X → Θ1 → Θ2 → argmax]
http://github.com/bmtgoncalves/Neural-Networks
Backprop
def backprop(Theta1, Theta2, X, y):
    N = X.shape[0]
    K = Theta2.shape[0]

    J = 0
    Delta2 = np.zeros(Theta2.shape)
    Delta1 = np.zeros(Theta1.shape)

    for i in range(N):
        # Forward propagation, saving intermediate results
        a1 = np.concatenate(([1], X[i]))         # Input layer
        z2 = np.dot(Theta1, a1)
        a2 = np.concatenate(([1], sigmoid(z2)))  # Hidden layer
        z3 = np.dot(Theta2, a2)
        a3 = sigmoid(z3)                         # Output layer

        y0 = one_hot(K, y[i])

        # Cross entropy
        J -= np.dot(y0.T, np.log(a3)) + np.dot((1 - y0).T, np.log(1 - a3))

        # Calculate the weight deltas
        delta_3 = a3 - y0
        delta_2 = np.dot(Theta2.T, delta_3)[1:] * sigmoidGradient(z2)

        Delta2 += np.outer(delta_3, a2)
        Delta1 += np.outer(delta_2, a1)

    J /= N
    Theta1_grad = Delta1 / N
    Theta2_grad = Delta2 / N

    return [J, Theta1_grad, Theta2_grad]

nn.py
http://github.com/bmtgoncalves/Neural-Networks
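backprop assumes a one_hot helper (and the sigmoidGradient shown in the activation sketch above) that is not on the slide; a minimal version might look like:

import numpy as np

def one_hot(K, label):
    # Length-K vector with a 1 at position `label` and 0 elsewhere
    v = np.zeros(K)
    v[label] = 1
    return v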
Training
Theta1 = init_weights(input_layer_size, hidden_layer_size)
Theta2 = init_weights(hidden_layer_size, num_labels)

iter = 0
tol = 1e-3
J_old = 1 / tol
diff = 1

while diff > tol:
    J_train, Theta1_grad, Theta2_grad = backprop(Theta1, Theta2, X_train, y_train)

    diff = abs(J_old - J_train)
    J_old = J_train

    Theta1 -= .5 * Theta1_grad
    Theta2 -= .5 * Theta2_grad
nn.py
http://github.com/bmtgoncalves/Neural-Networks
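Once the loop converges, the held-out test set gives an unbiased estimate of performance; a short usage sketch reusing the predict function shown earlier (X_test and y_test are assumed to exist alongside the training arrays):

y_pred = predict(Theta1, Theta2, X_test)
accuracy = np.mean(y_pred == y_test)
print("Test accuracy: %.3f" % accuracy)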
Bias-Variance Tradeoff
[Plot: Loss/Bias and Variance vs. model complexity, moving from high bias/low variance models on one end to low bias/high variance models on the other.]
Learning Rate
[Plot: Loss vs. Epoch for low, high, very high, and the best learning rates.]

$w_{ij} \leftarrow w_{ij} - \alpha \frac{\partial E}{\partial w_{ij}}$

• $\alpha$ defines the size of the step in the direction of $\frac{\partial E}{\partial w_{ij}}$
Tips
• online learning - update the weights after each case
  - might be useful to update the model as new data is obtained
  - subject to fluctuations
• mini-batch - update the weights after a “small” number of cases
  - batches should be balanced
  - if the dataset is redundant, the gradient estimated using only a fraction of the data is a good approximation to the full gradient
• momentum - let the gradient change the velocity of the weight change instead of the value directly
• rmsprop - divide the learning rate for each weight by a running average of “recent” gradients (see the sketch below)
• learning rate - vary it over the course of the training procedure and use different learning rates for each weight
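A minimal sketch of the momentum and rmsprop updates mentioned above (hyperparameter values are illustrative, not from the slides):

import numpy as np

def momentum_step(w, grad, velocity, alpha=0.5, mu=0.9):
    # The gradient changes the velocity; the velocity changes the weights
    velocity = mu * velocity - alpha * grad
    return w + velocity, velocity

def rmsprop_step(w, grad, avg_sq, alpha=0.01, decay=0.9, eps=1e-8):
    # Divide the step for each weight by a running average of recent gradient magnitudes
    avg_sq = decay * avg_sq + (1 - decay) * grad ** 2
    return w - alpha * grad / (np.sqrt(avg_sq) + eps), avg_sq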
Neural Network Architectures

Interpretability

“Deep” learning