MACHINE LEARNING ENGINEER
• Coursera Mentor
• City.AI Ambassador
• IBM Watson AI XPRIZE contestant
• Guest attendee at the AI for Good Global Summit at the UN
• X-Men geek
• Family man and father of 5 (3 kids and
WHAT'S IN IT FOR YOU?
• The Basics
• Vector Representation of Words
• The Shallow
• [Deep] Neural Networks for NLP
• The Deep
• Convolutional Networks for NLP
• The Recurrent
• Long Short-Term Memory for NLP
• Where do we go from here?
• Automation of AWS GPUs with Terraform
HOW DOES IT WORK?
• Cosine distance measures how similar two words are in the vector space.
• X = vector("biggest") − vector("big") + vector("small")
• The word whose vector is closest to X is "smallest" (see the first sketch below).
• Skip-gram predicts the context words from the target word.
• CBOW predicts the target word from the bag of all context words (see the gensim sketch below).
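A minimal sketch of the analogy arithmetic with toy vectors (numpy only; the vocabulary and the two-dimensional embeddings are made up for illustration, not real word2vec output — in practice the input words are also excluded from the search):

```python
import numpy as np

# Toy embeddings: dimension 0 ~ polarity, dimension 1 ~ "superlative-ness".
# In practice these come from a trained word2vec model.
vectors = {
    "big":      np.array([1.0, 0.0]),
    "biggest":  np.array([1.0, 1.0]),
    "small":    np.array([-1.0, 0.0]),
    "smallest": np.array([-1.0, 1.0]),
}

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# X = vector("biggest") - vector("big") + vector("small")
x = vectors["biggest"] - vectors["big"] + vectors["small"]

# The nearest word to X by cosine similarity should be "smallest".
best = max(vectors, key=lambda w: cosine_similarity(x, vectors[w]))
print(best)  # -> smallest
```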
[Figure: Cosine distance vs. Euclidean distance]
The CBOW architecture predicts the current word based on the context,
and the Skip-gram predicts surrounding words given the current word.
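For training, both architectures are available through gensim's Word2Vec via the sg flag (a minimal sketch assuming gensim 4.x parameter names; the toy corpus is made up for illustration):

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences (made up for illustration).
sentences = [
    ["the", "clouds", "are", "in", "the", "sky"],
    ["i", "grew", "up", "in", "france"],
    ["i", "speak", "fluent", "french"],
]

# sg=0 -> CBOW: predict the current word from the bag of context words.
cbow = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=0)

# sg=1 -> Skip-gram: predict the surrounding words from the current word.
skipgram = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1)

print(skipgram.wv.most_similar("france", topn=3))
```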
HOW DO THEY WORK WITH TEXT?
• Each row of the matrix corresponds to a word/token; that is, each row is a low-dimensional vector that represents a word/token.
• The width of the filters is usually the same as the width of the input matrix (the embedding dimension).
• The height may vary, but it's typically between 2 and 5. So a 2×5 filter over 5-dimensional embeddings covers 2 words per sliding window (see the sketch below).
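A minimal PyTorch sketch of this filter geometry (the sentence length, embedding size, and filter count are made-up illustration values; the max-over-time pooling at the end is the common practice for CNN text classifiers):

```python
import torch
import torch.nn as nn

sentence_len, embed_dim = 7, 5                  # toy sizes for illustration
x = torch.randn(1, 1, sentence_len, embed_dim)  # (batch, channels, words, embed)

# Filter width == embedding width, so the filter only slides over the word axis.
# A height of 2 means each window covers 2 consecutive words.
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=(2, embed_dim))

features = conv(x)                              # shape: (1, 4, sentence_len - 1, 1)
pooled = features.squeeze(3).max(dim=2).values  # max-over-time pooling
print(pooled.shape)                             # torch.Size([1, 4])
```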
LONG-TERM DEPENDENCIES PROBLEMS
• Small vs. large gap between the relevant information and the place where it is needed:
• "the clouds are in the sky." (small gap);
• "I grew up in France… I speak fluent French." (large gap).
HOW DO THEY WORK?
• LSTMs' gates (see the sketch below):
• Forget gate: decides whether the previous state will be passed through.
• Input gate: decides which values to update, then feeds a tanh that outputs the next candidate state.
• Cell update: computes the new state from the previous one plus the candidate state.
• Output gate: feeds a sigmoid function to decide which parts of the state will be output.
• Output: feeds the state through a tanh function and multiplies its output with the sigmoid result.
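A minimal numpy sketch of one LSTM step matching these bullets (the weight layout and names are illustrative assumptions, not any library's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x] to the four gate pre-activations."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)

    f = sigmoid(f)              # forget gate: keep or drop the old state
    i = sigmoid(i)              # input gate: which values to update
    g = np.tanh(g)              # candidate state
    c = f * c_prev + i * g      # new state: old state plus candidate state
    o = sigmoid(o)              # output gate: which parts of the state to output
    h = o * np.tanh(c)          # tanh of the state, multiplied by the sigmoid result
    return h, c

# Toy usage with made-up sizes.
n_in, n_hid = 3, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * n_hid, n_hid + n_in))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```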
WHERE DID I GET THIS STUFF FROM?
• Efficient Estimation of Word Representations in Vector Space: Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Google, 2013.
• A Neural Probabilistic Language Model: Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin. Université de Montréal, Montréal, Québec, Canada, 2003.
• Dropout: A Simple Way to Prevent Neural Networks from Overfitting: Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov. University of Toronto, Toronto, Ontario, Canada, 2014.