Interest in Deep Learning has been growing in the past few years. With advances in software and hardware technologies, Neural Networks are making a resurgence. With interest in AI-based applications growing, and companies like IBM, Google, Microsoft, and NVIDIA investing heavily in computing and software applications, it is time to understand Deep Learning better!
In this lecture, we will get an introduction to Autoencoders and Recurrent Neural Networks and review the state of the art in hardware and software architectures. Working demos will be presented in Keras, a popular Python deep learning library with a Theano backend. This is a preview of the QuantUniversity Deep Learning Workshop that will be offered in 2017.
1. Deep Learning: An Introduction – Part II
Location:
QuantUniversity Meetup
January 19th, 2017
Boston, MA
Presented by:
Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
sri@quantuniversity.com
Copyright 2016 QuantUniversity LLC.
2. Slides and code will be available at:
http://www.analyticscertificate.com/DeepLearning
3. • Analytics advisory services
• Custom training programs
• Architecture assessments, advice, and audits
4. Sri Krishnamurthy
Founder and CEO
• Founder of QuantUniversity LLC. and www.analyticscertificate.com
• Advisory and consultancy for financial analytics
• Prior experience at MathWorks, Citigroup, and Endeca, and 25+ financial services and energy customers
• Regular columnist for Wilmott Magazine
• Author of the forthcoming book "Financial Modeling: A Case Study Approach," published by Wiley
• Chartered Financial Analyst and Certified Analytics Professional
• Teaches analytics in the Babson College MBA program and at Northeastern University, Boston
5. Quantitative Analytics and Big Data Analytics Onboarding
• Trained more than 500 students in quantitative methods, data science, and big data technologies using MATLAB, Python, and R
• Launched the Analytics Certificate Program in September
▫ New cohort in March 2017
• Coming soon: Deep Learning and Cognitive Computing certificate!
6. Events of Interest
• February 2017
▫ Apache Spark lecture – Feb 3rd
• March 2017
▫ Deep Learning Workshop – Boston – March 27-28
▫ Deep Learning Workshop – New York (date TBD)
• April 2017
▫ Anomaly Detection Workshop – Boston – April 24-25
9. Machine Learning
• Unsupervised algorithms
▫ Given a dataset with variables 𝑥ᵢ, build a model that captures the similarities in different observations and assigns them to different buckets => clustering, etc.
▫ Create a transformed representation of the original data => PCA
Obs1, Obs2, Obs3, … → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
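As a concrete illustration of both ideas (not from the original deck; the data and parameters are arbitrary), a minimal scikit-learn sketch:

```python
# Illustrative sketch (not from the deck): clustering and PCA with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X = np.random.rand(100, 5)                      # 100 observations of 5 variables x_i

# Clustering: assign each observation to one of 3 "buckets"
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)

# PCA: a transformed, lower-dimensional representation of the original data
X_2d = PCA(n_components=2).fit_transform(X)

print(labels[:3])                               # e.g. Obs1 -> Class 1, Obs2 -> Class 2, ...
print(X_2d.shape)                               # (100, 2)
```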
10. Machine Learning
• Supervised algorithms
▫ Given a set of variables 𝑥ᵢ, predict the value of another variable 𝑦 in a given data set such that 𝑦 ≈ F(𝑥)
▫ If y is numeric => prediction
▫ If y is categorical => classification
x1, x2, x3, … → Model F(x) → y
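Purely as an illustration of the prediction/classification split (again not from the deck; the data is synthetic), a scikit-learn sketch:

```python
# Illustrative sketch (not from the deck): prediction vs. classification.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.random.rand(100, 3)                            # variables x1, x2, x3
y_numeric = X @ np.array([1.0, 2.0, -1.0])            # numeric y      => prediction
y_class = (y_numeric > y_numeric.mean()).astype(int)  # categorical y  => classification

print(LinearRegression().fit(X, y_numeric).predict(X[:3]))
print(LogisticRegression().fit(X, y_class).predict(X[:3]))
```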
13. Autoencoder
• The goal is for the reconstruction 𝑥̂ to approximate the input 𝑥
• Interesting applications include
▫ Data compression
▫ Visualization
▫ Pre-training neural networks
14. Demo in Keras¹
1. https://blog.keras.io/building-autoencoders-in-keras.html
2. https://keras.io/models/model/
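A minimal sketch in the spirit of the linked Keras blog post; the layer sizes and training settings are illustrative assumptions, not the exact demo code:

```python
# Minimal autoencoder sketch (sizes are illustrative assumptions).
from keras.layers import Input, Dense
from keras.models import Model

input_dim, encoding_dim = 784, 32                          # e.g. flattened 28x28 images

inputs = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='relu')(inputs)   # compress x to a short code
decoded = Dense(input_dim, activation='sigmoid')(encoded)  # reconstruct x-hat from the code

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# The training target equals the input, so x-hat learns to approximate x:
# autoencoder.fit(x_train, x_train, epochs=50, batch_size=256)
```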
15. Autoencoders¹
• Pretraining step: train a sequence of shallow autoencoders, greedily one layer at a time, using unsupervised data.
• Fine-tuning step 1: train the last layer using supervised data.
• Fine-tuning step 2: use backpropagation to fine-tune the entire network using supervised data.
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
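The three steps could look roughly like the following Keras sketch; the layer sizes, synthetic data, and single-epoch training are assumptions for brevity:

```python
# Hedged sketch of greedy layer-wise pretraining plus the two fine-tuning steps.
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model, Sequential

x_train = np.random.rand(1000, 784)        # stand-in for unsupervised data
y_train = np.random.randint(0, 10, 1000)   # stand-in for supervised labels
layer_sizes, pretrained, data = [256, 64], [], x_train

# Pretraining: one shallow autoencoder per layer, trained greedily
for size in layer_sizes:
    inp = Input(shape=(data.shape[1],))
    code = Dense(size, activation='relu')(inp)
    recon = Dense(data.shape[1], activation='sigmoid')(code)
    ae = Model(inp, recon)
    ae.compile(optimizer='adam', loss='mse')
    ae.fit(data, data, epochs=1, batch_size=128, verbose=0)
    pretrained.append(ae.layers[1].get_weights())   # keep the encoder weights
    data = Model(inp, code).predict(data)           # encoded output feeds the next layer

# Fine-tuning step 1: freeze the pretrained layers, train only the new last layer
model = Sequential([
    Dense(256, activation='relu', input_dim=784, weights=pretrained[0], trainable=False),
    Dense(64, activation='relu', weights=pretrained[1], trainable=False),
    Dense(10, activation='softmax'),                # supervised output layer
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)

# Fine-tuning step 2: unfreeze everything and backpropagate end to end
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
```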
20. Keras
• Keras is a high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation.
• Allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
• Supports both convolutional networks and recurrent networks, as well as combinations of the two.
• Supports arbitrary connectivity schemes (including multi-input and multi-output training).
• Runs seamlessly on CPU and GPU.
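To give a feel for the API, a tiny illustrative model (not from the deck):

```python
# A minimal Keras model to show the API's flavor.
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(8, activation='relu', input_dim=4),  # hidden layer
    Dense(1),                                  # linear output for regression
])
model.compile(optimizer='adam', loss='mse')
model.summary()
```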
21. Multi-Layer Perceptron
• Use 72 observations for training and 36 for testing
• Lookback of 1 and 10
• The longer the lookback, the larger the network
Network: a hidden layer of size 8 feeding an output of size 1
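A hedged sketch of this setup: the create_dataset helper and the synthetic sine series are stand-ins (the CIF data is not reproduced here), while the 8-to-1 network and 72/36 split come from the slide:

```python
# Lookback windowing plus a small MLP (hidden size 8 -> output size 1).
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

def create_dataset(series, lookback):
    """Turn a series into (X, y) pairs: `lookback` past values predict the next one."""
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    return X, series[lookback:]

lookback = 10
series = np.sin(np.linspace(0, 20, 108))           # stand-in for the 108-point CIF series
X, y = create_dataset(series, lookback)

model = Sequential([Dense(8, activation='relu', input_dim=lookback), Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(X[:72], y[:72], epochs=100, verbose=0)   # first 72 samples for training
print(model.evaluate(X[72:], y[72:], verbose=0))   # the rest for testing
```

Note that the input layer has `lookback` units, which is why a longer lookback means a larger network.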
23. Recurrent Neural Networks¹
• Has 3 types of parameters
▫ W – input-to-hidden weights
▫ U – hidden-to-hidden weights
▫ V – hidden-to-label weights
• W, U, and V are shared across all time steps
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
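The slide lists the parameters without equations; in the standard notation that matches this W/U/V naming (an assumption, since the deck shows no formulas), the recurrence is:

```latex
% Hidden state from the current input and the previous hidden state:
h_t = \sigma(W x_t + U h_{t-1})
% Label/output distribution from the hidden state:
y_t = \operatorname{softmax}(V h_t)
```

Because W, U, and V are reused at every time step, their gradients must be summed over all the steps in which they appear.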
24. Where can Recurrent Neural Networks be used?¹
1. Vanilla mode of processing without an RNN, from fixed-sized input to fixed-sized output (e.g., image classification).
2. Sequence output (e.g., image captioning takes an image and outputs a sentence of words).
3. Sequence input (e.g., sentiment analysis where a given sentence is classified as expressing positive or negative sentiment).
4. Sequence input and sequence output (e.g., machine translation: an RNN reads a sentence in English and then outputs a sentence in French).
5. Synced sequence input and output (e.g., video classification where we wish to label each frame of the video).
1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
25. Sample applications
• Andrej Karpathy's article
▫ http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Handwriting generation demo
▫ http://www.cs.toronto.edu/~graves/handwriting.html
26. Recurrent Neural Networks
• A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.¹
• Backpropagation (computing the gradient with respect to all parameters of the network), the process used to propagate errors and update weights, needs to be modified for RNNs because of the loops.
1. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
27. Backpropagation Through Time (BPTT)¹
• BPTT begins by unfolding the recurrent neural network through time, so that each time step becomes one layer of a deep feed-forward network.
• Training then proceeds in a manner similar to training a feed-forward neural network with backpropagation, except that the training patterns are visited in sequential order.
1. https://en.wikipedia.org/wiki/Backpropagation_through_time
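A toy numeric sketch of BPTT (the one-unit linear RNN and all values are assumptions for illustration): the forward pass unfolds the recurrence, and the backward pass walks the unfolded graph in reverse, accumulating gradients for the shared weights:

```python
# Toy BPTT: a 1-unit linear RNN unfolded over T steps.
import numpy as np

T = 5
x = np.random.randn(T)            # input sequence
target = np.random.randn()        # supervise only the final output
w, u = 0.5, 0.8                   # shared input->hidden and hidden->hidden weights

# Forward pass: unfold h_t = w*x_t + u*h_{t-1} through time
h = np.zeros(T + 1)
for t in range(T):
    h[t + 1] = w * x[t] + u * h[t]
loss = 0.5 * (h[T] - target) ** 2

# Backward pass: visit the unfolded steps in reverse, accumulating dw and du
dh = h[T] - target                # dLoss/dh_T
dw = du = 0.0
for t in reversed(range(T)):
    dw += dh * x[t]               # contribution of step t to the shared w
    du += dh * h[t]               # contribution of step t to the shared u
    dh *= u                       # gradient flows back through h_t = ... + u*h_{t-1}

print(loss, dw, du)
```

Note how dh is multiplied by u at every step; this repeated multiplication is exactly what makes the gradient vanish (|u| < 1) or explode (|u| > 1), the problem discussed next.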
28. Addressing the Vanishing/Exploding Gradient Problem
• Backpropagation through time (BPTT) for RNNs is difficult due to a problem known as the vanishing/exploding gradient: the gradient becomes extremely small or extremely large by the time it reaches the earliest time steps of the unfolded network.
• This is addressed by LSTM RNNs: instead of simple neurons, LSTMs use memory cells.¹
1. http://deeplearning.net/tutorial/lstm.html
29. Demo – IMDB Dataset
• Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative)
• Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers)
• For convenience, words are indexed by overall frequency in the dataset, so that, for instance, the integer "3" encodes the 3rd most frequent word in the data
• The 2011 paper (see below) reported approximately 88% accuracy
• See
▫ https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
▫ http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
▫ http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf
30. Network
• The 5,000 most frequent words are kept, and each word is mapped to a 32-length vector
• Sequences are restricted to 500 words: longer reviews are truncated, shorter ones are padded
• LSTM layer with 100 output dimensions
• Accuracy: 84.08%
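This network can be reproduced with a short Keras script; the hyperparameters below come from the slide, while the rest follows the linked imdb_lstm example with minor assumptions (optimizer, batch size, epochs):

```python
# LSTM sentiment model for IMDB, matching the slide's description.
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.preprocessing import sequence

top_words, max_len = 5000, 500                             # 5,000 words; 500-word reviews
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
X_train = sequence.pad_sequences(X_train, maxlen=max_len)  # truncate/pad to 500
X_test = sequence.pad_sequences(X_test, maxlen=max_len)

model = Sequential([
    Embedding(top_words, 32, input_length=max_len),  # 32-length word vectors
    LSTM(100),                                       # LSTM with 100 output dimensions
    Dense(1, activation='sigmoid'),                  # positive/negative sentiment
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=3, batch_size=64,
          validation_data=(X_test, y_test))
```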
31. Using RNNs for the CIF Forecasting Problem
• Use 72 observations for training and 36 for testing
• Lookback 1
[Chart: the CIF time series, roughly 108 observations with values ranging from about 0 to 1,800]
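A hedged sketch of the RNN variant; the synthetic series again stands in for the CIF data, and the LSTM size is an assumption. The key change from the MLP is reshaping the input to [samples, timesteps, features]:

```python
# LSTM version of the lookback forecaster.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def create_dataset(series, lookback):
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    return X, series[lookback:]

lookback = 1
series = np.sin(np.linspace(0, 20, 108))        # stand-in for the CIF series
X, y = create_dataset(series, lookback)
X = X.reshape((X.shape[0], lookback, 1))        # [samples, timesteps, features]

model = Sequential([LSTM(4, input_shape=(lookback, 1)), Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(X[:72], y[:72], epochs=100, verbose=0)   # 72 training samples
mse = model.evaluate(X[72:], y[72:], verbose=0)
print('Test RMSE:', np.sqrt(mse))                  # compare against the next slide
```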
32. Result
Lookback = 1: Train score 50.54 RMSE, Test score 65.34 RMSE
Lookback = 10: Train score 41.65 RMSE, Test score 90.68 RMSE
The longer lookback fits the training data better but generalizes worse on the test set.
35. Thank you!
Members & Sponsors!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data, and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.