WORD Embeddings
A non-exhaustive introduction to Word Embeddings
Christian S. Perone
christian.perone@gmail.com
AGENDA
INTRODUCTION
Philosophy of Language
Vector Space Model
Embeddings
Word Embeddings
Language Modeling
WORD2VEC
Introduction
Semantic Relations
Other properties
WORD MOVER'S DISTANCE
Rationale
Model
Results
Q&A
WHO AM I
Christian S. Perone
Machine Learning/Software Engineer
Blog
http://blog.christianperone.com
Open-source projects
https://github.com/perone
Twitter @tarantulae
Section I
INTRODUCTION
PHILOSOPHY OF LANGUAGE
"(...) the meaning of a word is its use in the language."
Wittgenstein, Ludwig, Philosophical Investigations (1953)
VECTOR SPACE MODEL
Interpreted lato sensu (in the broad sense), a VSM is a space where text is represented as a vector of numbers instead of its original textual representation.
There are many approaches to go from other spaces to a vector space.
Having vectors with special properties brings many advantages (see the sketch below).
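As a concrete illustration, here is a minimal sketch of a VSM using scikit-learn's CountVectorizer (the corpus and names are illustrative, not from the slides): each document becomes a vector of token counts, one dimension per vocabulary word.

from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)       # sparse document-term matrix

print(vectorizer.get_feature_names_out())  # the vocabulary, i.e. the dimensions
print(X.toarray())                         # each row is a document as a vector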
EMBEDDINGS
From a space with one dimension per word to a continuous vector space with much lower dimensionality. From one mathematical object to another, but preserving "structure".
Source: Our beloved scikit-learn.
WORD EMBEDDINGS
[Figure: a word model maps a sparse one-hot representation of the words of a sentence ("cat", "sat", "on", "mat"), e.g. cat = [ 0, 1, 0, ... ], to a dense embedding, e.g. V(cat) = [ 1.4, -1.3, ... ].]
From a sparse representation (usually one-hot encoding) to a dense representation (see the sketch below).
Embeddings can be created as a by-product of another task or by an explicit embedding model.
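To make the sparse-to-dense mapping concrete, a minimal sketch in numpy (the vocabulary, dimensionality, and random matrix are illustrative stand-ins for a learned embedding table):

import numpy as np

vocab = ["cat", "sat", "on", "mat"]
V, d = len(vocab), 3   # in practice d (e.g. 300) is much smaller than V (e.g. 100k)

# Sparse one-hot representation: one dimension per word.
one_hot = np.zeros(V)
one_hot[vocab.index("cat")] = 1.0

# Dense representation: a lookup into an embedding matrix of shape (V, d).
E = np.random.randn(V, d)  # stand-in for a matrix learned from data
v_cat = one_hot @ E        # equivalent to E[vocab.index("cat")]
print(v_cat)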
LANGUAGE MODELING
P(w_1, ..., w_n) = ∏_i P(w_i | w_1, ..., w_{i-1})

P("the cat sat on the mat") > P("the mat sat on the cat")

Useful for many different tasks, such as speech recognition, handwriting recognition, translation, etc.
Naive counting doesn't generalize: there are too many possible sentences.
"A word sequence on which the model will be tested is likely to be different from all the word sequences seen during training." [Bengio et al., 2003]
Markov assumption / how to approximate the product (next slide).
MARKOV ASSUMPTION AND N-GRAM MODELS
The Markov assumption simplifies the model: it approximates the components of the product.

Unigram: P(w_1, ..., w_n) = ∏_i P(w_i)
Bigram: P(w_i | w_1, ..., w_{i-1}) ≈ P(w_i | w_{i-1})

Extend to trigram, 4-gram, etc. (a counting sketch follows below).
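A minimal counting-based bigram sketch on a toy corpus (maximum-likelihood estimates, no smoothing; everything here is illustrative):

from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat on the log"]
tokens = [w for sentence in corpus for w in sentence.split()]
bigrams = list(zip(tokens, tokens[1:]))  # toy demo: ignores sentence boundaries

unigram_counts = Counter(tokens)
bigram_counts = Counter(bigrams)

def p_bigram(w, prev):
    # Maximum-likelihood estimate: P(w | prev) = count(prev, w) / count(prev)
    return bigram_counts[(prev, w)] / unigram_counts[prev]

print(p_bigram("cat", "the"))  # P(cat | the) = 1/4 on this toy corpus

Real language models need smoothing (or a neural model) to handle unseen n-grams, which is exactly the generalization problem mentioned above.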
WORD EMBEDDINGS
CHARACTERISTICS
Learned via language modeling; low-dimensional and dense, but with increased model complexity. Examples: neural language models, word2vec, GloVe, etc.
[Figure: the neural language model architecture. Source: Bengio et al., 2003]
The classic neural language model was proposed by Bengio et al. in 2003. After that came many other important works, by Collobert and Weston (2008) and then by Mikolov et al. (2013).
Section II
WORD2VEC
WORD2VEC
An unsupervised technique (trained through internally supervised prediction tasks) that takes a text corpus and produces word embeddings as output. Two different architectures:
[Figure 2: Graphical representation of the CBOW model and Skip-gram model. In the CBOW model, the distributed representations of the context (surrounding words w(t-2), w(t-1), w(t+1), w(t+2)) are combined (summed) to predict the word in the middle, w(t). In the Skip-gram model, the distributed representation of the input word w(t) is used to predict the context.]
Source: Exploiting Similarities among Languages for Machine Translation. Mikolov, Tomas et al., 2013.
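Training a word2vec model takes only a few lines with gensim (a sketch assuming gensim >= 4, where the parameter is vector_size; older versions call it size). The sg flag selects the architecture:

from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]

# sg=0 trains CBOW, sg=1 trains Skip-gram.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

print(model.wv["cat"])               # the learned dense vector for "cat"
print(model.wv.most_similar("cat"))  # nearest neighbors in embedding space

A real corpus (millions of sentences) is needed for the vectors to be meaningful; the toy sentences above only show the API.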
WORD2VEC
[Figure. Source: TensorFlow]
AMAZING EMBEDDINGS
Semantic relationships are often preserved under vector operations.
[Figure. Source: TensorFlow]
WORD ANALOGIES
Suppose we have the vector w ∈ R^n of any given word, such as w_king; then we can do:

w_king - w_man + w_woman ≈ w_queen

This vector operation shows that the closest word vector to the resulting vector is w_queen.
This is an amazing property of word embeddings, because it means that they carry relational information that can be used for many different tasks.
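The same analogy can be queried directly from pretrained vectors; a minimal sketch using gensim's downloader (the model name is one of gensim-data's stock models; any reasonably trained word-vector model works). Under the hood this is the vector arithmetic above followed by a cosine nearest-neighbor search:

import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")  # downloads on first use

result = wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected to be close to [('queen', ...)] for well-trained vectors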
LANGUAGE STRUCTURE
[Figure 1: Distributed word vector representations of numbers (one, two, three, four, five / uno, dos, tres, cuatro, cinco) and animals (cat, dog, cow, horse, pig / gato, perro, vaca, caballo, cerdo) in English (left) and Spanish (right). The five vectors in each language were projected down to two dimensions using PCA, and then manually rotated to accentuate their similarity. It can be seen that these concepts have similar geometric arrangements in both spaces, suggesting that the two spaces can be related by a linear transformation.]
Source: Exploiting Similarities among Languages for Machine Translation. Mikolov, Tomas et al., 2013.
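The 2-D projections in the figure are easy to reproduce; a minimal sketch with scikit-learn's PCA (reusing pretrained vectors loaded via gensim as above; the model choice is illustrative):

import gensim.downloader as api
from sklearn.decomposition import PCA

wv = api.load("glove-wiki-gigaword-50")
words = ["one", "two", "three", "four", "five"]
vectors = [wv[w] for w in words]

# Project the high-dimensional vectors down to two dimensions.
points = PCA(n_components=2).fit_transform(vectors)
for word, (x, y) in zip(words, points):
    print(f"{word}: ({x:.3f}, {y:.3f})")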
DEEP LEARNING?
Word2vec isn't Deep Learning; the model is actually very shallow.
However, there is an important relation here, because word embeddings are often used to initialize the (dense) embedding layers of deep architectures, such as LSTMs, for different tasks (see the sketch below).
Also, you can of course train word2vec models using techniques developed in the Deep Learning context.
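A minimal sketch of that initialization pattern in PyTorch (the shapes and random matrix are hypothetical stand-ins; in practice the pretrained matrix would come from word2vec or GloVe):

import torch
import torch.nn as nn

# Stand-in for a pretrained embedding matrix: one 300-d row per vocabulary word.
pretrained = torch.randn(50000, 300)

# Initialize the network's embedding layer from the pretrained vectors;
# freeze=False lets the downstream task fine-tune them.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
lstm = nn.LSTM(input_size=300, hidden_size=128, batch_first=True)

token_ids = torch.tensor([[1, 42, 7]])   # a batch with one 3-token sentence
outputs, _ = lstm(embedding(token_ids))  # embeddings feed the deep model
print(outputs.shape)                     # torch.Size([1, 3, 128])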
DEMO
Demo time for some word analogies in Portuguese, using a model trained by Kyubyong Park:
Trained on Wikipedia (pt), a 1.3 GB corpus (for comparison, Wikipedia (en) is 13.5 GB)
w ∈ R^n where n is 300
Vocabulary size is 50,246
Model available at https://github.com/Kyubyong/wordvectors
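Loading such a model is a one-liner with gensim (a sketch; the file name below is hypothetical, check the repository for the exact artifact and format):

from gensim.models import Word2Vec

# Hypothetical local path to the Portuguese model downloaded from
# https://github.com/Kyubyong/wordvectors
model = Word2Vec.load("pt/pt.bin")

print(model.wv.most_similar(positive=["rei", "mulher"], negative=["homem"], topn=3))
# rei - homem + mulher: expected to surface something close to "rainha" (queen)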
Section III
WORD MOVER'S DISTANCE
WORD MOVER'S DISTANCE
Word2vec defines a vector for each word, but how can we use this information to compare documents?
There are many approaches to represent documents, such as BOW, TF-IDF, and n-grams. However, these representations are frequently near-orthogonal: documents that share few exact terms get vectors with almost no overlap, even when their meanings are close.
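A quick illustration of the near-orthogonality problem (a sketch with scikit-learn, using the two sentences discussed on the next slide): after stop-word removal they share no terms, so their BOW vectors have zero cosine similarity despite nearly identical meaning.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Obama speaks to the media in Illinois",
    "The President greets the press in Chicago",
]

X = CountVectorizer(stop_words="english").fit_transform(docs)
print(cosine_similarity(X[0], X[1]))  # [[0.]] -- the BOW vectors are orthogonal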
WORD MOVER'S DISTANCE
Take the two sentences:
"Obama speaks to the media in Illinois"
and
"The President greets the press in Chicago"
While these sentences have no words in common, they convey nearly the same information, a fact that cannot be represented by the BOW model (Kusner, Matt J. et al., 2015).
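gensim exposes the word mover's distance on word vectors as wmdistance (a sketch; depending on the gensim version it requires the optional pyemd or POT package, and the model choice here is illustrative):

import gensim.downloader as api

wv = api.load("glove-wiki-gigaword-50")

d1 = "Obama speaks to the media in Illinois".lower().split()
d2 = "The President greets the press in Chicago".lower().split()
d3 = "The band gave a concert in Japan".lower().split()

# Lower WMD means more similar documents.
print(wv.wmdistance(d1, d2))  # the semantically close pair
print(wv.wmdistance(d1, d3))  # the unrelated pair: larger distance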
WORD MOVER'S DISTANCE
[Figure 1: An illustration of the word mover's distance. All non-stop words of both documents are embedded into a word2vec space: document 1 is "Obama speaks to the media in Illinois" ('Obama', 'speaks', 'media', 'Illinois'), document 2 is "The President greets the press in Chicago" ('President', 'greets', 'press', 'Chicago'). The distance between the two documents is the minimum cumulative distance the words of one document must travel to match the words of the other.]
Source: From Word Embeddings To Document Distances. Kusner, Matt J. et al. 2015.
WORD MOVER'S DISTANCE
[Figure 2: (Top) The components of the WMD metric between a query D0 = "Obama speaks to the media in Illinois." and two sentences with equal BOW distance: D1 = "The President greets the press in Chicago." (WMD 1.07 = 0.45 + 0.24 + 0.20 + 0.18) and D2 = "The band gave a concert in Japan." (WMD 1.63 = 0.49 + 0.42 + 0.44 + 0.28). The arrows represent flow between two words and are labeled with their distance contribution. (Bottom) The flow between two sentences with different numbers of words, D3 = "Obama speaks in Illinois." and D0 (WMD 1.30).]
Source: From Word Embeddings To Document Distances. Kusner, Matt J. et al. 2015.
WORD MOVER'S DISTANCE
[Figure 3: kNN test error (%) on 8 document classification data sets (bbcsport, twitter, recipe, ohsumed, classic, reuters, amazon, 20news), compared to canonical and state-of-the-art baseline methods: BOW (Frakes & Baeza-Yates, 1992), TF-IDF (Jones, 1972), Okapi BM25 (Robertson & Walker, 1994), LSI (Deerwester et al., 1990), LDA (Blei et al., 2003), mSDA (Chen et al., 2012), Componential Counting Grid (Perina et al., 2013), and Word Mover's Distance.]
[Figure 4: kNN test errors of the same document metrics averaged over all eight datasets, relative to kNN with BOW (= 1.0). LSI (0.72), LDA (0.60), mSDA (0.55), CCG (0.49), and WMD (0.42) all improve on BOW, with WMD the best overall, while TF-IDF and Okapi BM25 (1.15 and 1.29) do worse on average.]

Table 2. Test error percentage for different text embeddings. NIPS, AMZ, and News are word2vec (w2v) models trained on different data sets, whereas HLBL and CW were obtained with other embedding algorithms.

DATASET   | HLBL | CW   | NIPS (w2v) | AMZ (w2v) | NEWS (w2v)
BBCSPORT  | 4.5  | 8.2  | 9.5        | 4.1       | 5.0
TWITTER   | 33.3 | 33.7 | 29.3       | 28.1      | 28.3
RECIPE    | 47.0 | 51.6 | 52.7       | 47.4      | 45.1
OHSUMED   | 52.0 | 56.2 | 55.6       | 50.4      | 44.5
CLASSIC   | 5.3  | 5.5  | 4.0        | 3.8       | 3.0
REUTERS   | 4.2  | 4.6  | 7.1        | 9.1       | 3.5
AMAZON    | 12.3 | 13.3 | 13.9       | 7.8       | 7.2
Source: From Word Embeddings To Document Distances. Kusner, Matt J. et al. 2015.
Section IV
Q&A
Q&A