In this presentation I go over the theory and practice of applying Deep Learning to NLP problems, more specifically, building models for sentiment analysis.
All the code used in the demo can be found here:
https://github.com/ekholabs/DLinK
https://github.com/ekholabs/automated_ml
The presentation is available on YouTube: https://www.youtube.com/watch?v=eZavheF5TBE
I start at 1:06:02.
2. MACHINE LEARNING ENGINEER
WILDER RODRIGUES
• Coursera Mentor
• City.AI Ambassador
• IBM Watson AI XPRIZE contestant
• Kaggler
• Guest attendee at the AI for Good Global Summit at the UN
• X-Men geek
• Family man and father of 5 (3 kids and 2 cats)
@wilderrodrigues
https://medium.com/@wilder.rodrigues/
3. WHAT'S IN IT FOR YOU?
AGENDA
• The Basics
• Vector Representation of Words
• The Shallow
• [Deep] Neural Networks for NLP
• The Deep
• Convolutional Networks for NLP
• The Recurrent
• Long Short-Term Memory for NLP
• Where do we go from here?
• Automation of AWS GPUs with Terraform
6. HOW DOES IT WORK?
WORD2VEC
• Cosine distance between words in the vector space (see the gensim sketch after this slide):
• X = vector("biggest") − vector("big") + vector("small")
• The nearest vector to X is vector("smallest")
• Algorithms:
• Skip-Gram
• It predicts the context words from the target word.
• CBOW
• It predicts the target word from the bag of
all context words.
(Figure: Cosine Distance vs. Euclidean Distance)
The CBOW architecture predicts the current word based on the context,
and the Skip-gram predicts surrounding words given the current word.
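A minimal sketch (not part of the talk's code) of the analogy and the CBOW/Skip-gram switch, using gensim's Word2Vec. The toy corpus and hyperparameters are illustrative only; a corpus this small will not reliably reproduce the analogy, and the parameter names vector_size/epochs are those of gensim 4.x (older releases use size/iter).

# Word2Vec analogy sketch with gensim (illustrative toy corpus).
from gensim.models import Word2Vec

sentences = [
    ["big", "bigger", "biggest", "house"],
    ["small", "smaller", "smallest", "house"],
    ["the", "clouds", "are", "in", "the", "sky"],
]

# sg=0 -> CBOW (predict the target word from its context),
# sg=1 -> Skip-gram (predict the context words from the target word).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=100)

# vector("biggest") - vector("big") + vector("small") should land near "smallest".
print(model.wv.most_similar(positive=["biggest", "small"], negative=["big"], topn=1))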
13. HOW DO THEY WORK WITH TEXT?
CNNS
• Each row of the input matrix corresponds
to a word/token, i.e. each row is a
low-dimensional vector (embedding) that
represents that word/token.
• The width of the filters is usually the
same as the width of the input matrix,
i.e. the embedding dimension.
• The height may vary, but it's typically
between 2 and 5. So a 2x5 filter (height 2,
embedding dimension 5) covers 2 words
per sliding window, as in the Keras sketch below.
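A minimal Keras sketch (an assumption on my side, not necessarily the demo notebook's code) of a 1-D convolutional sentiment classifier. vocab_size, max_length and embedding_dim are illustrative; the point is that the embedding dimension plays the role of the filter width, so Conv1D only needs kernel_size, the number of words covered per sliding window.

# 1-D CNN for binary sentiment classification (illustrative hyperparameters).
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout

vocab_size = 5000     # illustrative vocabulary size
max_length = 400      # illustrative review length after padding
embedding_dim = 64    # "width" of the input matrix and of every filter

model = Sequential([
    Input(shape=(max_length,)),
    Embedding(vocab_size, embedding_dim),
    Conv1D(filters=256, kernel_size=3, activation="relu"),  # each filter spans 3 words
    GlobalMaxPooling1D(),
    Dense(256, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),  # binary sentiment: positive vs. negative
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()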
16. THE PROBLEM OF LONG-TERM DEPENDENCIES
RNNS
• Small vs. large gap between the relevant
information and the point where it is
needed for the prediction:
• "the clouds are in the sky." (small gap);
• "I grew up in France… I speak
fluent French." (large gap).
17. HOW DO THEY WORK?
LSTMS
• LSTMs' gates (see the numpy sketch after this list):
• Forget
• Decides which parts of the previous cell state are kept
and which are discarded.
• Input
• Decides which values to update and feeds a tanh
that outputs the candidate state.
• The new cell state is the previous state (scaled by the
forget gate) plus the candidate state (scaled by the input gate).
• Output
• Feeds a sigmoid function to decide which parts of
the state will be output.
• Feeds the state through a tanh function and multiplies
its output with the sigmoid result.
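A minimal numpy sketch (not the talk's code) of a single LSTM step following the gate description above; the parameter layout (one weight matrix per gate) and the tiny dimensions are illustrative.

# One LSTM time step: forget (f), input (i), candidate (g) and output (o) gates.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget: what to keep of the old state
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input: which values to update
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate state
    c_t = f * c_prev + i * g                                # update: gated old state + gated candidate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])    # output: which parts of the state to expose
    h_t = o * np.tanh(c_t)                                  # new hidden state
    return h_t, c_t

# Tiny usage example with random parameters.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = {k: rng.standard_normal((n_hid, n_in)) for k in "figo"}
U = {k: rng.standard_normal((n_hid, n_hid)) for k in "figo"}
b = {k: np.zeros(n_hid) for k in "figo"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, U, b)
print(h, c)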
22. WHERE DID I GET THIS STUFF FROM?
REFERENCES
• Efficient Estimation of Word Representations in Vector Space. Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Google, 2013.
• A Neural Probabilistic Language Model. Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin. Université de Montréal, Montréal, Québec, Canada, 2003.
• Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. University of Toronto, Toronto, Ontario, Canada, 2014.
• https://medium.com/cityai/deep-learning-for-natural-language-processing-part-i-8369895ffb98
• https://medium.com/cityai/deep-learning-for-natural-language-processing-part-ii-8b2b99b3fa1e
• https://medium.com/cityai/deep-learning-for-natural-language-processing-part-iii-96cfc6acfcc3
• http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
• https://github.com/ekholabs/DLinK
• https://github.com/ekholabs/automated_ml