@graphific
Roelof Pieters
Deep Learning for Natural
Language Embeddings
2 May 2016

KTH
www.csc.kth.se/~roelof/
roelof@kth.se
Can we understand Language ?
1. Language is ambiguous:

Every sentence has many possible interpretations.
2. Language is productive:

We will always encounter new words or new
constructions
3. Language is culturally specific
Some of the challenges in Language Understanding:
2
Can we understand Language ?
1. Language is ambiguous:

Every sentence has many possible interpretations.
2. Language is productive:

We will always encounter new words or new
constructions
• plays well with others
VB ADV P NN
NN NN P DT
• fruit flies like a banana
NN NN VB DT NN
NN VB P DT NN
NN NN P DT NN
NN VB VB DT NN
• the students went to class
DT NN VB P NN
3
Some of the challenges in Language Understanding:
Can we understand Language ?
1. Language is ambiguous:

Every sentence has many possible interpretations.
2. Language is productive:

We will always encounter new words or new
constructions
4
Some of the challenges in Language Understanding:
[Karlgren 2014, NLP Sthlm Meetup]
5
Digital Media Deluge: text
[ http://lexicon.gavagai.se/lookup/en/lol ]
6
Digital Media Deluge: text
lol ?
…
Can we understand Language ?
1. Language is ambiguous:

Every sentence has many possible interpretations.
2. Language is productive:

We will always encounter new words or new
constructions
3. Language is culturally specific
Some of the challenges in Language Understanding:
7
Can we understand Language ?
8
NLP Data?
some of the (many) treebank datasets
source: http://www-nlp.stanford.edu/links/statnlp.html#Treebanks
9
Penn Treebank
That’s a lot of “manual” work:
10
• the students went to class
DT NN VB P NN
• plays well with others
VB ADV P NN
NN NN P DT
• fruit flies like a banana
NN NN VB DT NN
NN VB P DT NN
NN NN P DT NN
NN VB VB DT NN
With a lot of issues:
Penn Treebank
11
Deep Learning: Why for NLP ?
Beat state of the art at:
• Language Modeling (Mikolov et al. 2011) [WSJ ASR task]
• Speech Recognition (Dahl et al. 2012, Seide et al. 2011;
following Mohamed et al. 2011)
• Sentiment Classification (Socher et al. 2011)
and we have known for some time that it works well for
Computer Vision, of course:
• MNIST hand-written digit recognition (Ciresan et al.
2010)
• Image Recognition (Krizhevsky et al. 2012) [ImageNet]
12
One Model rules them all ?



DL approaches have been successfully applied to:
Deep Learning: Why for NLP ?
Automatic summarization Coreference resolution Discourse analysis
Machine translation Morphological segmentation Named entity recognition (NER)
Natural language generation
Natural language understanding
Optical character recognition (OCR)
Part-of-speech tagging
Parsing
Question answering
Relationship extraction
Sentence boundary disambiguation
Sentiment analysis
Speech recognition
Speech segmentation
Topic segmentation and recognition
Word segmentation
Word sense disambiguation
Information retrieval (IR)
Information extraction (IE)
Speech processing
13
• What is the meaning of a word?

(Lexical semantics)
• What is the meaning of a sentence?

([Compositional] semantics)
• What is the meaning of a longer piece of text?
(Discourse semantics)
Semantics: Meaning
14
• Language models define probability distributions over
(natural language) strings or sentences
• Joint and Conditional Probability
Language Model
15
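To make "joint and conditional probability" concrete: by the chain rule, P(w1 … wn) = P(w1) · P(w2 | w1) · … · P(wn | w1 … wn-1), and an n-gram model truncates each condition to a short history. A minimal count-based bigram sketch, with a toy corpus assumed (not from the slides):

    from collections import Counter

    corpus = [["the", "students", "went", "to", "class"],
              ["the", "students", "like", "a", "banana"]]

    bigrams, contexts = Counter(), Counter()
    for sentence in corpus:
        # Pad with sentence boundaries so every word has a predecessor.
        for prev, word in zip(["<s>"] + sentence, sentence + ["</s>"]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1

    def cond_prob(word, prev):
        """Maximum-likelihood estimate of P(word | prev)."""
        return bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0

    def sentence_prob(sentence):
        """Joint probability via the chain rule, bigram approximation."""
        p = 1.0
        for prev, word in zip(["<s>"] + sentence, sentence + ["</s>"]):
            p *= cond_prob(word, prev)
        return p

    print(sentence_prob(["the", "students", "went", "to", "class"]))  # 0.5 on this toy corpus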
• Language models define probability distributions
over (natural language) strings or sentences
Language Model
16
• Language models define probability distributions
over (natural language) strings or sentences
Language Model
17
Word senses
What is the meaning of words?
• Most words have many different senses:

dog = animal or sausage?
How are the meanings of different words related?
• Specific relations between senses:

Animal is more general than dog.
• Semantic fields:

money is related to bank
18
Word senses
Polysemy:
• A lexeme is polysemous if it has different related
senses
• bank = financial institution or building
Homonyms:
• Two lexemes are homonyms if their senses are
unrelated, but they happen to have the same spelling
and pronunciation
• bank = (financial) bank or (river) bank
19
Word senses: relations
Symmetric relations:
• Synonyms: couch/sofa

Two lemmas with the same sense
• Antonyms: cold/hot, rise/fall, in/out

Two lemmas with the opposite sense
Hierarchical relations:
• Hypernyms and hyponyms: pet/dog

The hyponym (dog) is more specific than the hypernym
(pet)
• Holonyms and meronyms: car/wheel

The meronym (wheel) is a part of the holonym (car)
20
Distributional representations
“You shall know a word by the company it keeps”

(J. R. Firth 1957)
One of the most successful ideas of modern
statistical NLP!
these words represent banking
21
Distributional hypothesis
He filled the wampimuk, passed it
around and we all drank some
We found a little, hairy wampimuk
sleeping behind the tree
(McDonald & Ramscar 2001)
22
• NLP (rule-based and statistical approaches, at
least) mainly treats words as atomic symbols:

• or in vector space:

• also known as “one hot” representation.
• Its problem ?
Word Representation
Love Candy Store
[0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …]
Candy [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 …] AND
Store [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 …] = 0 !
23
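The problem, concretely: any two distinct one-hot vectors are orthogonal, so their dot product is zero and no similarity between words is captured at all. A minimal Python sketch (toy vocabulary assumed):

    import numpy as np

    vocab = {"love": 0, "candy": 1, "store": 2}  # toy vocabulary, an assumption

    def one_hot(word):
        v = np.zeros(len(vocab))
        v[vocab[word]] = 1.0
        return v

    candy, store = one_hot("candy"), one_hot("store")
    print(np.dot(candy, store))  # 0.0: "candy" AND "store" share nothing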
Word Representation
24
Distributional semantics
Distributional meaning as co-occurrence vector:
25
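A sketch of how such a co-occurrence vector is built, reusing the wampimuk example from a few slides back; the window size is an arbitrary choice:

    from collections import defaultdict

    tokens = "we found a little hairy wampimuk sleeping behind the tree".split()
    window = 2  # symmetric context window, an assumption

    cooc = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooc[word][tokens[j]] += 1

    # Context counts for the unknown word, from which its meaning can be guessed:
    print(dict(cooc["wampimuk"]))  # {'little': 1, 'hairy': 1, 'sleeping': 1, 'behind': 1}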
Deep Distributional representations
• Taking it further:
• Continuous word embeddings
• Combine vector space semantics with the
prediction of probabilistic models
• Words are represented as a dense vector:
Candy =
26
Vector Space Model
adapted from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
27
Vector Space Model
adapted from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birth
the place where I was born
28
How?
• Representation of words as continuous vectors has a
long history (Hinton et al. 1986; Rumelhart et al. 1986;
Elman 1990)
• First neural network language model: NNLM (Bengio
et al. 2001; Bengio et al. 2003) based on earlier ideas of
distributed representations for symbols (Hinton 1986)
How?
30
Vector Space Model
Figure (edited) from Bengio, “Representation Learning and Deep Learning”, July, 2012, UCLA
In a perfect world:
the country of my birth
the place where I was born ?
…
31
How? Language Compositionality
Principle of compositionality:
the “meaning (vector) of a
complex expression (sentence) is
determined by:
- the meanings of its constituent
expressions (words) and
- the rules (grammar) used to
combine them”
— Gottlob Frege (1848 - 1925)
32
• Can theoretically (given enough units) approximate
“any” function
• and fit to “any” kind of data
• Efficient for NLP: hidden layers can be used as word
lookup tables
• Dense distributed word vectors + efficient NN
training algorithms:
• Can scale to billions of words !
Neural Networks for NLP
33
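"Hidden layers as word lookup tables" is meant literally: the first layer is a matrix with one row per vocabulary word, and embedding a word is just a row lookup. A minimal sketch (sizes and random initialization assumed):

    import numpy as np

    vocab = {"the": 0, "students": 1, "went": 2, "to": 3, "class": 4}
    dim = 50
    E = np.random.randn(len(vocab), dim) * 0.01  # lookup table: one row per word

    def embed(words):
        """Dense vectors for a word sequence: a plain row lookup."""
        return E[[vocab[w] for w in words]]

    x = embed(["the", "students", "went"])  # shape (3, 50), ready for the network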
• How do we handle the compositionality of language in
our models?
34
Compositionality
• How do we handle the compositionality of language in
our models?
• Recursion :

the same operator (same parameters) is
applied repeatedly on different components
35
Compositionality
• How do we handle the compositionality of language in
our models?
• Option 1: Recurrent Neural Networks (RNN)
36
RNN 1: Recurrent Neural Networks
• How do we handle the compositionality of language in
our models?
• Option 2: Recursive Neural Networks (also
sometimes called RNN)
37
RNN 2: Recursive Neural Networks
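The recursion in a few lines: one shared matrix composes two child vectors into a parent vector, applied bottom-up over the parse tree, in the spirit of the Socher-style models below. Dimensions, initialization, and the example tree are assumptions:

    import numpy as np

    dim = 4
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(dim, 2 * dim))  # one operator, shared at every node
    b = np.zeros(dim)

    def compose(left, right):
        """Parent vector from two children: p = tanh(W [left; right] + b)."""
        return np.tanh(W @ np.concatenate([left, right]) + b)

    # "(the students) (went (to class))": the same parameters applied at each node
    the, students, went, to, cls = (rng.normal(size=dim) for _ in range(5))
    sentence = compose(compose(the, students), compose(went, compose(to, cls)))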
• achieved SOTA in 2011 on
Language Modeling (WSJ ASR
task) (Mikolov et al.,
INTERSPEECH 2011):
• and again at ASRU 2011:
38
Recurrent Neural Networks
“Comparison to other LMs shows that RNN
LMs are state of the art by a large margin.
Improvements increase with more training data.”
“[ RNN LM trained on a] single core on 400M words in a few days,
with 1% absolute improvement in WER on state of the art setup”
Mikolov, T., Karafiát, M., Burget, L., Černocký, J.H., Khudanpur, S. (2011)

Recurrent neural network based language model
39
Recurrent Neural Networks
[figure: simple recurrent neural network for LM: input layer, hidden layer(s) with sigmoid activation, softmax output layer]
Mikolov, T., Karafiát, M., Burget, L., Černocký, J.H., Khudanpur, S. (2011)

Recurrent neural network based language model
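One forward step of this simple (Elman-style) recurrent LM as described on the slide: a sigmoid hidden state from the current word and the previous state, then a softmax over the vocabulary. A numpy sketch with assumed sizes, following the paper's s(t) = f(U w(t) + W s(t-1)), y(t) = g(V s(t)):

    import numpy as np

    Vsize, H = 10000, 100  # vocabulary size, hidden size: assumptions
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.01, size=(H, Vsize))  # input word (one-hot) -> hidden
    W = rng.normal(scale=0.01, size=(H, H))      # previous hidden -> hidden
    V = rng.normal(scale=0.01, size=(Vsize, H))  # hidden -> output scores

    def step(word_id, s_prev):
        """s(t) = sigmoid(U w(t) + W s(t-1)); y(t) = softmax(V s(t))."""
        s = 1.0 / (1.0 + np.exp(-(U[:, word_id] + W @ s_prev)))
        scores = V @ s
        e = np.exp(scores - scores.max())
        return s, e / e.sum()  # new hidden state, distribution over the next word

    s = np.zeros(H)
    for w in [12, 7, 430]:  # hypothetical word ids
        s, y = step(w, s)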
40
Recurrent Neural Networks
backpropagation through time
41
Recurrent Neural Networks
backpropagation through time
class based recurrent NN
[code (Mikolov’s RNNLM Toolkit) and more info: http://rnnlm.org/ ]
• Recursive Neural
Network for LM (Socher
et al. 2011; Socher
2014)
• achieved SOTA on the new
Stanford Sentiment Treebank
dataset (compared against
many other models):
Recursive Neural Network
42
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
Recursive Neural Tensor Network
43
code & info: http://www.socher.org/index.php/Main/ParsingNaturalScenesAndNaturalLanguageWithRecursiveNeuralNetworks
Socher, R., Liu, C.C., Ng, A.Y., Manning, C.D. (2011)

Parsing Natural Scenes and Natural Language with Recursive Neural Networks
Recursive Neural Tensor Network
44
• RNN (Socher et al.
2011a)
Recursive Neural Network
45
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
• RNN (Socher et al.
2011a)
• Matrix-Vector RNN
(MV-RNN) (Socher et
al., 2012)
Recursive Neural Network
46
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
• RNN (Socher et al.
2011a)
• Matrix-Vector RNN
(MV-RNN) (Socher et
al., 2012)
• Recursive Neural
Tensor Network (RNTN)
(Socher et al. 2013)
Recursive Neural Network
47
• negation detection:
Recursive Neural Network
48
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C. (2013)

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
info & code: http://nlp.stanford.edu/sentiment/
Recurrent vs Recursive
49
recursive net untied recursive net recurrent net
[figure: parse tree: NP over PP/IN and NP; leaves DT NN PRP$ NN]
Recurrent NN for Vector Space
50
[figure: the same parse tree, now with word vectors at the leaves: IN DT NN PRP$ NN]
Compositionality
51
Recurrent NN: Compositionality
[figure: composing the word vectors bottom-up through the tree (DT NN and PRP NN first, then NP, IN, NP)]
Compositionality
52
Recurrent NN: Compositionality
[figure: the full tree composed up to NP, PP, and S/ROOT: the weight matrices act as “rules”, the vectors as “meanings”]
Compositionality
53
Recurrent NN: Compositionality
Vector Space + Word Embeddings: Socher
54
Recurrent NN: Compositionality
Vector Space + Word Embeddings: Socher
55
Recurrent NN for Vector Space
Word Embeddings: Turian (2010)
Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning
code & info: http://metaoptimize.com/projects/wordreprs/
56
Word Embeddings: Turian (2010)
Turian, J., Ratinov, L., Bengio, Y. (2010). Word representations: A simple and general method for semi-supervised learning
code & info: http://metaoptimize.com/projects/wordreprs/
57
Word Embeddings: Collobert & Weston (2011)
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P. (2011).
Natural Language Processing (almost) from Scratch
58
Word2Vec & Linguistic Regularities: Mikolov (2013)
code & info: https://code.google.com/p/word2vec/
Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic Regularities in Continuous Space Word Representations
59
Word2Vec & Linguistic Regularities: Mikolov (2013)
• Similar to the feedforward NNLM, but:
• Non-linear hidden layer removed
• Projection layer shared for all words (not just the projection matrix)
• Thus, all words get projected into the same position: their vectors are just averaged
• Called CBOW (continuous BoW) because the order of the words is lost
• Another modification is to use words from both past and future (a window centered on the current word)
Continuous BoW (CBOW) Model
Word2Vec & Linguistic Regularities: Mikolov (2013)
• Similar to CBOW, but instead of predicting the current word from its context,
• tries to maximize classification of a word based on another word in the same sentence
• Thus, uses each current word as input to a log-linear classifier
• Predicts words within a certain window
• Observations:
- Larger window size => better-quality word vectors, but higher training time
- More distant words are usually less related to the current word than those close to it
- So give less weight to distant words by sampling them less often in the training examples
Continuous Skip-gram Model
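Both architectures are available off the shelf; a minimal gensim sketch with a toy corpus. Parameter names follow gensim 4.x (older versions use size instead of vector_size):

    from gensim.models import Word2Vec

    sentences = [["the", "students", "went", "to", "class"],
                 ["fruit", "flies", "like", "a", "banana"]]

    # sg=0 -> CBOW, sg=1 -> skip-gram; window and dimensionality as discussed above
    model = Word2Vec(sentences, vector_size=100, window=5, sg=1, min_count=1)

    # Linguistic regularities via vector arithmetic (needs a real corpus to work well):
    # model.wv.most_similar(positive=["king", "woman"], negative=["man"])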
GloVe: Global Vectors for Word Representation
(Pennington et al. 2014)
62
GloVe == Word2Vec?
Omer Levy & Yoav Goldberg (2014) Neural Word Embedding
as Implicit Matrix Factorization
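For reference, GloVe fits word vectors by weighted least squares to the log co-occurrence counts X_ij (the objective as given in the GloVe paper):

    J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

where f(X_ij) damps the influence of very frequent pairs. Levy & Goldberg's result cited above shows that skip-gram with negative sampling implicitly factorizes a shifted PMI co-occurrence matrix, which is why the two families of methods end up behaving so similarly.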
Paragraph Vectors: Dai et al. (2014)
63
Paragraph Vectors: Dai et al. (2014)
64
Paragraph vector with
distributed memory (PV-DM)
Distributed bag of words
version (PV-DBOW)
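Both variants are exposed in gensim's Doc2Vec; a minimal sketch, with dm=1 giving PV-DM and dm=0 giving PV-DBOW (gensim 4.x parameter names assumed):

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    docs = [TaggedDocument(words=["the", "students", "went", "to", "class"], tags=["d0"]),
            TaggedDocument(words=["fruit", "flies", "like", "a", "banana"], tags=["d1"])]

    pv_dm = Doc2Vec(docs, vector_size=100, dm=1, min_count=1, epochs=20)    # PV-DM
    pv_dbow = Doc2Vec(docs, vector_size=100, dm=0, min_count=1, epochs=20)  # PV-DBOW

    vec = pv_dm.infer_vector(["students", "like", "class"])  # embed unseen text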
Paragraph Vectors: Dai et al. (2014)
65
Swivel: Improving Embeddings by Noticing
What's Missing (Shazeer et al. 2016)
66
Swivel: Improving Embeddings by Noticing
What's Missing (Shazeer et al. 2016)
67
Multi-embeddings: Stanford (2012)
Eric H. Huang, Richard Socher, Christopher D. Manning, Andrew Y. Ng (2012)

Improving Word Representations via Global Context and Multiple Word Prototypes
68
Word Embeddings for MT: Mikolov (2013)
Mikolov, T., Le, Q.V., Sutskever, I. (2013).

Exploiting Similarities among Languages for Machine Translation
69
Word Embeddings for MT: Kiros (2014)
70
Recursive Embeddings for Sentiment: Socher (2013)
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A., Potts, C. (2013)

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank.
code & demo: http://nlp.stanford.edu/sentiment/index.html
71
• RNN
[figure: an RNN = the same network unrolled over time]
>Word Embeddings: Sentiment Analysis
• RNN
>Word Embeddings: Sentiment Analysis
• Bidirectional RNN
>Word Embeddings: Sentiment Analysis
• Vanishing gradient problem
• LSTM
>Word Embeddings: Sentiment Analysis
• LSTM
Most cited LSTM paper:

- Graves, Alex. Supervised
sequence labelling with recurrent
neural networks. Vol. 385.
Springer, 2012.
Introduction of the LSTM model:

- Hochreiter, S., & Schmidhuber, J. (1997). Long short-term
memory. Neural computation, 9(8), 1735-1780.

Addition of the forget gate to the LSTM model:

- Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning
to forget: Continual prediction with LSTM. Neural
computation, 12(10), 2451-2471.
>Word Embeddings: Sentiment Analysis
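The LSTM's answer to vanishing gradients is a gated, additive cell state. One step of the standard cell (with the forget gate of Gers et al. 2000) as a numpy sketch; weights and sizes are assumptions:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One LSTM step; W maps [x; h_prev] to the four stacked gates."""
        z = W @ np.concatenate([x, h_prev]) + b
        H = h_prev.size
        i = sigmoid(z[0:H])        # input gate
        f = sigmoid(z[H:2*H])      # forget gate
        o = sigmoid(z[2*H:3*H])    # output gate
        g = np.tanh(z[3*H:4*H])    # candidate cell update
        c = f * c_prev + i * g     # additive cell state: gradients survive
        return o * np.tanh(c), c

    D, H = 8, 16
    rng = np.random.default_rng(0)
    W, b = rng.normal(scale=0.1, size=(4 * H, D + H)), np.zeros(4 * H)
    h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)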
>Word Embeddings: Sentiment Analysis
• Vanishing gradient problem… solved
Wanna be Doing
Deep Learning?
Python has a wide range of deep-learning-related
libraries available
Deep Learning with Python
Low level
High level
deeplearning.net/software/theano
caffe.berkeleyvision.org
tensorflow.org/
lasagne.readthedocs.org/en/latest
and of course:
keras.io
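As a taste of the high-level end, a tiny Keras sentiment classifier along the lines discussed earlier (embedding lookup + LSTM encoder); the layer names are real Keras API, the hyperparameters are placeholders, and import paths may differ across Keras versions:

    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, Dense

    model = Sequential()
    model.add(Embedding(input_dim=10000, output_dim=128))  # word lookup table
    model.add(LSTM(128))                                   # sentence encoder
    model.add(Dense(1, activation="sigmoid"))              # positive / negative
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    # model.fit(x_train, y_train, ...) on padded integer word-id sequences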
Code & Papers?
http://gitxiv.com/ #GitXiv
Creative AI projects?
http://www.creativeai.net/ #CreativeAI
Questions?
love letters? existential dilemma’s? academic questions? gifts? 

find me at:

www.csc.kth.se/~roelof/
roelof@kth.se
@graphific
Oh, and soon we’re looking for Creative AI enthusiasts !
- job
- internship
- thesis work
in
AI (Deep Learning)

&

Creativity