DA 5330 – Advanced Machine Learning
Applications
Lecture 9 – Deep Sequence Models
Maninda Edirisooriya
manindaw@uom.lk
Sequence Modeling
• Some events occur as a sequence. E.g.:
• Price of gold with time
• Velocity vector of a football during a kick
• Glucose level in blood with time
• Base pairs of a DNA sequence
• Right and left turns of a steering wheel while driving a car
• Sequence of words in an essay
• Sequence of sound frequencies in a speech
• Sequence Modeling is the set of techniques used to model events that
happen as a sequence
• Using Deep Learning for this is known as Deep Sequence Modeling
Modeling Independent Events vs. Sequences
Independent events depend only
on the input X
Sequence events depend on,
1. The input X given at the current
time step, and
2. The previous event/events
Source: https://www.youtube.com/watch?v=ySEx_Bqxvvo&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=9
Sequence Model Applications
Source: https://www.youtube.com/watch?v=ySEx_Bqxvvo&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=9
Recurrent Neural Networks (RNNs)
• RNNs are a special type of NN that can keep track of past events in
memory
• An RNN maintains a hidden state ht which represents the cumulative
history of the events
Source: https://www.youtube.com/watch?v=ySEx_Bqxvvo&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=9
Recurrent Neural Networks (RNNs)
• An RNN maintains 3 separate weight matrices to multiply with,
• Input: Wxh
• Hidden State: Whh
• Output: Why
• The non-linear activation function tanh is applied to the sum of the linear
combinations of the input vector and the previous hidden state with their
respective weight matrices, giving the new hidden state:
ht = tanh(Whh ht-1 + Wxh xt)
• Then the linear combination of the hidden state with the output weight
matrix gives the output vector: yt = Why ht
• Note that Wxh, Whh and Why are shared across all time steps (see the
sketch below)
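A minimal NumPy sketch of the forward pass just described; the dimensions (input size 3, hidden size 4, output size 2) and the random weights are illustrative assumptions:

```python
import numpy as np

# Toy dimensions and random weights, purely for illustration
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))   # input-to-hidden weight matrix
W_hh = rng.normal(size=(4, 4))   # hidden-to-hidden weight matrix
W_hy = rng.normal(size=(2, 4))   # hidden-to-output weight matrix

def rnn_step(x_t, h_prev):
    # New hidden state: tanh of the combined linear transformations
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)
    # Output: linear transformation of the hidden state
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(4)                    # initial hidden state
for x in rng.normal(size=(5, 3)):  # a sequence of 5 input vectors
    h, y = rnn_step(x, h)          # the same weights are reused at every step
```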
Training RNNs
• A loss is calculated at each time step, and their sum is taken as the
total loss
• Backpropagation is applied through the time steps of the RNN
• Compared to other NN types, RNNs are effectively much deeper, which
creates the problems of,
• Exploding Gradient problem and
• Vanishing Gradient problem
• Gradient Clipping is used as a solution for the Exploding Gradient
problem (sketched below)
• ReLU activation, identity initialization and modified versions of the RNN
such as LSTM and GRU are used to address the Vanishing Gradient problem
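A minimal PyTorch sketch of a training loop with gradient clipping; the model, data shapes and clipping threshold are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
head = nn.Linear(16, 8)                       # maps hidden states to outputs
params = list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params)
loss_fn = nn.MSELoss()

x = torch.randn(32, 10, 8)                    # 32 sequences of 10 time steps
y = torch.randn(32, 10, 8)                    # a target at every time step

for epoch in range(5):
    optimizer.zero_grad()
    hidden_states, _ = model(x)               # forward through all time steps
    loss = loss_fn(head(hidden_states), y)    # total loss over all time steps
    loss.backward()                           # backpropagation through time
    # Gradient Clipping: rescale gradients whose norm exceeds the threshold,
    # mitigating the Exploding Gradient problem
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
    optimizer.step()
```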
Backpropagation Through Time (for RNNs)
Source: https://www.youtube.com/watch?v=ySEx_Bqxvvo&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=9
RNN Application – Language Modeling
• Given a sequence of words, predicting the probability of the next word is
known as Language Modeling. E.g.:
• “Capital of Sri Lanka is _____ ” is an example where “Colombo” should be the
next word in the sentence
• In this application, words have to be treated as the input events to the RNN
• But an RNN can only take numerical values as inputs, not words
• Therefore, words have to be converted to numerical values first
Source: https://www.youtube.com/watch?v=ySEx_Bqxvvo&list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI&index=9
Converting Words Numerically
• First, the given string of words (e.g.: “Capital of Sri Lanka is”) has to
be converted to a sequence of word tokens by splitting on spaces
• This results in the list, [“Capital”, “of”, “Sri”, “Lanka”, “is”]
• Then each word should be assigned a numerical value. There are
several ways to do it
• Having a vocabulary of words (i.e. like an English dictionary) and assigning
each unique word a unique number. E.g. [“Capital”:34, “of”:567,
“Sri”:734, “Lanka”:56, “is”:346]. This is Label Encoding, which is suitable only
for ordinal values. But words are not ordinal.
• Therefore, we can use one-hot encoding for word tokens instead (see the
sketches below).
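A minimal sketch of tokenization and Label Encoding; the vocabulary indices are the arbitrary numbers from the slide's example:

```python
sentence = "Capital of Sri Lanka is"
tokens = sentence.split(" ")          # ["Capital", "of", "Sri", "Lanka", "is"]

# Vocabulary mapping each unique word to a unique number (Label Encoding)
vocabulary = {"Capital": 34, "of": 567, "Sri": 734, "Lanka": 56, "is": 346}
label_encoded = [vocabulary[token] for token in tokens]
print(label_encoded)                  # [34, 567, 734, 56, 346]
```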
One-hot Encoding of Words
• E.g.: Assuming there are 1000 words in the vocabulary (indices 0 to 999),
each word becomes a 1000-dimensional vector with a single 1 at its
vocabulary index:
• “Capital”: [0, 0, … 1, … 0] (1 at index 34)
• “of”: [0, 0, … 1, … 0] (1 at index 567)
• “Sri”: [0, 0, … 1, … 0] (1 at index 734)
• “Lanka”: [0, 0, … 1, … 0] (1 at index 56)
• “is”: [0, 0, … 1, … 0] (1 at index 346)
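A minimal sketch of one-hot encoding the same tokens, assuming the 1000-word vocabulary and the illustrative indices above:

```python
import numpy as np

VOCAB_SIZE = 1000
vocabulary = {"Capital": 34, "of": 567, "Sri": 734, "Lanka": 56, "is": 346}

def one_hot(token):
    vector = np.zeros(VOCAB_SIZE)
    vector[vocabulary[token]] = 1.0   # single 1 at the word's vocabulary index
    return vector

encoded = [one_hot(t) for t in "Capital of Sri Lanka is".split()]
```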
Word Embeddings
• However, one-hot
encodings are extremely
sparse and large in size
• Word Embedding is a
dense and memory-
efficient alternative that
also captures natural
relationships between
words
Source: https://www.scaler.com/topics/tensorflow/tensorflow-word-embeddings/
Word Embeddings of Words
• E.g.: Assuming the size of the embedding is 4,
• “Capital”: [34, 74, 85, 83]
• “of”: [63, 85, 97, 64]
• “Sri”: [36, 45, 15, 90]
• “Lanka”: [62, 37, 63, 56]
• “is”: [42, 73, 93, 69]
• As each word in the sentence corresponds to an event in both the independent
variable X and the dependent variable Y (each step's target y is the next word), training happens as,
• x0 = [0, 0, 0, 0]
• x1 = y0 = [34, 74, 85, 83]
• x2 = y1 = [63, 85, 97, 64]
• x3 = y2 = [36, 45, 15, 90]
• x4 = y3 = [62, 37, 63, 56]
• x5 = y4 = [42, 73, 93, 69]
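A minimal sketch of a trainable embedding layer in PyTorch; the vocabulary size (1000) and embedding size (4) follow the slides, while the actual values start as random numbers rather than the illustrative ones above:

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=1000, embedding_dim=4)

token_ids = torch.tensor([34, 567, 734, 56, 346])  # "Capital of Sri Lanka is"
vectors = embedding(token_ids)                     # shape: (5, 4)
# Unlike fixed one-hot vectors, these 4-dimensional embeddings are updated
# during training, letting related words move close together in the space.
```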
Sampling From The Trained Language Model
• Let the output of one time step be the input of the next time step
• Keep that going until the unknown (end) token is generated as the prediction ŷ
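A minimal sketch of this sampling loop; the module sizes and the end-token id (0) are illustrative assumptions, and in practice the model would already be trained:

```python
import torch
import torch.nn as nn

EOS_ID = 0                            # assumed id of the unknown/end token
embedding = nn.Embedding(1000, 4)
rnn = nn.RNN(input_size=4, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1000)            # hidden state -> vocabulary scores

token = torch.tensor([[34]])          # seed token, e.g. "Capital"
hidden = None
generated = []
for _ in range(20):                   # cap the generated sequence length
    out, hidden = rnn(embedding(token), hidden)
    probs = torch.softmax(head(out[:, -1]), dim=-1)
    token = torch.multinomial(probs, num_samples=1)  # sample the next token
    if token.item() == EOS_ID:        # stop once the end token is generated
        break
    generated.append(token.item())    # this output becomes the next input
```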
Limitations of RNNs
• Encoding Bottleneck: As historical information is only propagated via
the hidden state, the size of the hidden state is a bottleneck for
storing the historical context of an RNN
• Inefficient Learning due to no Parallelism: As each time step is
treated as a distinct layer during backpropagation, training time
increases with the number of time steps and the steps cannot be
processed in parallel
• No Long-Term Memory: RNNs can only keep the recent history in their
hidden state, so long-term memory gets lost as the number of time
steps grows. This issue is handled by the LSTM (Long Short-Term
Memory) and GRU (Gated Recurrent Unit) architectures
From RNN to GRU and LSTM – RNN Summary
• First, let's look at the hidden state formula of an RNN:
a<t> = g(Wa[a<t-1>, x<t>] + ba)
• Here you can see the hidden state of the previous time step and the input
are concatenated and multiplied with a single weight matrix Wa
• Then a common bias ba is added
• The activation function g is generally the tanh function
From RNN to GRU and LSTM
• A GRU (Gated Recurrent Unit) has a hidden state known as the Cell State
c<t> instead of the plain hidden state of an RNN
• As the Cell State is updated only when applicable (i.e. under gate-
controlled conditions) it can maintain a longer-term memory than an RNN
• An LSTM (Long Short-Term Memory) has both a hidden state a<t> and a cell
state c<t>, where the cell state maintains the long-term memory
• As an LSTM has both of them, it needs more memory and processing
power than a GRU
• However, an LSTM generally has better long-term memory, where a GRU
may fall short
From RNN to GRU and LSTM
• Though we explain examples (like Language Modeling) using RNNs,
due to their lack of long-term memory they are rarely used in practice
for such word-sequence scenarios
• Instead, in almost all practical implementations, GRUs or LSTMs are
used in place of RNNs
• As GRUs and LSTMs can replace RNNs in most related architectures,
we explain with RNNs for simplicity in the upcoming slides. E.g.:
• Bidirectional RNNs can be replaced with Bidirectional GRUs and Bidirectional
LSTMs
• Deep RNNs can be replaced with Deep GRUs and Deep LSTMs
• Attention models are common to RNNs, GRUs and LSTMs
From RNN to GRU and LSTM
Now let's look at the formulas of a GRU and an LSTM
GRU Formula LSTM Formula
Source: Deep Learning Specialization, Andrew NG
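The formulas themselves appear only as an image on the slide. Below is a reconstruction of the standard forms taught in the Deep Learning Specialization (σ is the sigmoid function and ⊙ is element-wise multiplication); treat it as a reference sketch rather than a transcription of the slide:

```latex
% GRU: update gate \Gamma_u and relevance gate \Gamma_r control the cell state
\tilde{c}^{\langle t\rangle} = \tanh\big(W_c[\Gamma_r \odot c^{\langle t-1\rangle},\, x^{\langle t\rangle}] + b_c\big)
\Gamma_u = \sigma\big(W_u[c^{\langle t-1\rangle},\, x^{\langle t\rangle}] + b_u\big)
\Gamma_r = \sigma\big(W_r[c^{\langle t-1\rangle},\, x^{\langle t\rangle}] + b_r\big)
c^{\langle t\rangle} = \Gamma_u \odot \tilde{c}^{\langle t\rangle} + (1 - \Gamma_u) \odot c^{\langle t-1\rangle}
a^{\langle t\rangle} = c^{\langle t\rangle}

% LSTM: separate update, forget and output gates, and a<t> distinct from c<t>
\tilde{c}^{\langle t\rangle} = \tanh\big(W_c[a^{\langle t-1\rangle},\, x^{\langle t\rangle}] + b_c\big)
\Gamma_u = \sigma\big(W_u[a^{\langle t-1\rangle},\, x^{\langle t\rangle}] + b_u\big)
\Gamma_f = \sigma\big(W_f[a^{\langle t-1\rangle},\, x^{\langle t\rangle}] + b_f\big)
\Gamma_o = \sigma\big(W_o[a^{\langle t-1\rangle},\, x^{\langle t\rangle}] + b_o\big)
c^{\langle t\rangle} = \Gamma_u \odot \tilde{c}^{\langle t\rangle} + \Gamma_f \odot c^{\langle t-1\rangle}
a^{\langle t\rangle} = \Gamma_o \odot \tanh\big(c^{\langle t\rangle}\big)
```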
Bidirectional RNN (BRNN)
• As we have learned, RNNs are used to model an event sequence in
one direction
• In other words, the RNN unit at time step t has the information from
the previous time steps t-1, t-2, …, 0
• However, in some use cases like Natural Language Understanding
(NLU), we have to process the information in a sentence not only in
one direction but both ways!
• For example, filling the missing word in “Colombo is the _______ of Sri Lanka”
needs reading the word sequence in both directions, as reading only up to
“Colombo is the” does not give enough information to fill the missing word
Bidirectional RNN (BRNN)
A BRNN is two sequences of RNN time-step units, running in opposite
directions, that are trained together
Source: https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
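A minimal PyTorch sketch of a bidirectional recurrent layer (an LSTM here, which in practice replaces the plain RNN); the sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

birnn = nn.LSTM(input_size=4, hidden_size=16, batch_first=True,
                bidirectional=True)

x = torch.randn(1, 5, 4)        # one sequence of 5 embedded tokens
out, _ = birnn(x)
# out has shape (1, 5, 32): at every time step the forward-direction and
# backward-direction hidden states (16 each) are concatenated, so each
# position sees context from both the left and the right.
```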
Deep RNNs
• RNNs naturally act as Deep NNs, because learning across time steps
is sequential, like across the layers of a Deep NN
• However, even with this expensive-to-train architecture, the RNNs we
have seen so far have only a shallow stack of neurons from the
information input (x vectors) to the information output (y vectors)
• When we need to model more complex functions with RNNs, we have
to stack several layers of RNNs, which are known as Deep RNNs
• Generally, we do not add too many deep layers, due to the increased
memory and processing requirements of Deep RNNs
Deep RNNs
Source: https://www.researchgate.net/figure/A-Deep-RNN-architecture-representing-a-bee-in-the-proposed-algorithm-Black-lines-are-the_fig1_353469535
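A minimal sketch of a Deep (stacked) recurrent network in PyTorch; the num_layers argument stacks three recurrent layers, and the sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

deep_rnn = nn.LSTM(input_size=4, hidden_size=16, num_layers=3,
                   batch_first=True)

x = torch.randn(1, 5, 4)
out, (h_n, c_n) = deep_rnn(x)
# h_n has shape (3, 1, 16): one final hidden state per stacked layer, so
# each time step's input passes through a depth of three recurrent layers.
```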
Attention Models
• Say your sequence model application is related to Natural Language
Processing (NLP), where words are used as the input vectors
• In natural language, only some of the words in a sentence are
important for getting the meaning of the sentence or filling a missing word
• The sequence models we have discussed so far give the same weight
to all time steps when making predictions, which is not the real
requirement
• Deep Learning models that are capable of giving focused attention
to only some words of the word sequence while making predictions are
known as Attention Models
Attention Models
• Instead of directly taking the y output from the
BRNN units, the outputs from all time steps are used
as information to find the most relevant words for a
different, unidirectional word sequence whose state
is denoted by St
• This process of finding the most relevant words is
known as the Attention Mechanism
• A softmax function is applied over the trained
attention weights, so that almost all the attention
can be concentrated on a single word
• The downside of Attention Models is that their
processing complexity is quadratic in the
number of time steps (see the sketch below)
Source: https://machinelearningmastery.com/how-does-attention-work-in-encoder-decoder-recurrent-neural-networks/
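A minimal sketch of one attention step: scoring every encoder time step against the current decoder state and taking a softmax-weighted sum. A simple dot-product score is used here as an illustrative assumption (the referenced encoder-decoder models learn the scoring with a small network):

```python
import torch
import torch.nn.functional as F

encoder_states = torch.randn(5, 16)   # BRNN outputs for 5 time steps
s_t = torch.randn(16)                 # current decoder state S_t

scores = encoder_states @ s_t         # one relevance score per time step
alphas = F.softmax(scores, dim=0)     # attention weights summing to 1;
                                      # the most relevant step dominates
context = alphas @ encoder_states     # weighted sum fed to the decoder
# Scoring every encoder step for every decoder step is what makes the
# overall cost quadratic in the number of time steps.
```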
Questions?