On the Limitations of Unsupervised Bilingual Dictionary Induction
Sebastian Ruder, Ivan Vulić, Anders Søgaard
Background: Unsupervised MT
‣ Recently: Unsupervised neural machine translation (Artetxe et al., ICLR 2018; Lample et al., ICLR 2018)
‣ Key component: Initialization via unsupervised cross-lingual alignment of word embedding spaces
Background: Cross-lingual word embeddings
‣ Cross-lingual word embeddings enable cross-lingual transfer
‣ Most common approach: Project one word embedding space into another by learning a transformation matrix $W$ between source embeddings $x_i$ and their translations $y_i$ (Mikolov et al., 2013), minimising $\sum_{i=1}^{n} \lVert W x_i - y_i \rVert^2$
‣ More recently: Use an adversarial setup to learn an unsupervised mapping
‣ Assumption: Word embedding spaces are approximately isomorphic, i.e. same number of vertices, connected the same way.
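The supervised projection above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `learn_mapping`, the toy data and the row-per-word layout are assumptions made for the example.

```python
import numpy as np

def learn_mapping(X, Y):
    """Least-squares W minimising sum_i ||W x_i - y_i||^2.

    X: (n, d) source embeddings, one row per dictionary entry.
    Y: (n, d) embeddings of the corresponding translations.
    """
    # Word vectors are stored as rows, so "W x_i = y_i for all i" reads X @ W.T = Y.
    W_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W_T.T

# Toy data (hypothetical): targets are an exact linear image of the sources.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
W_true = rng.normal(size=(4, 4))
Y = X @ W_true.T
W = learn_mapping(X, Y)
```

On this noise-free toy problem the learned `W` recovers `W_true`. Real embedding spaces are only approximately related by a linear map, so a residual error remains.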
How similar are embeddings across languages?
‣ Nearest neighbour (NN) graphs of top 10 most frequent words in English and German are not isomorphic.
‣ NN graphs of top 10 most frequent English words and their translations into German: not isomorphic
[Figure: NN graphs; panels: English, German]
How similar are embeddings across languages?
‣ NN graphs of top 10 most frequent English nouns and their translations: not isomorphic
[Figure: NN graphs; panels: English, German]
Word embeddings are not approximately isomorphic across languages.
How do we quantify similarity?
‣ Need a metric to measure how similar two NN graphs $G_1$ and $G_2$ of different languages are
‣ Propose eigenvector similarity
‣ $A_1, A_2$: adjacency matrices of $G_1, G_2$
‣ $D_1, D_2$: degree matrices of $G_1, G_2$
‣ $L_1 = D_1 - A_1$, $L_2 = D_2 - A_2$: Laplacians of $G_1, G_2$
‣ $\lambda_1, \lambda_2$: eigenvalues (spectra) of $L_1, L_2$
‣ Metric: $\Delta = \sum_{i=1}^{k} (\lambda_{1i} - \lambda_{2i})^2$ where $k = \min_j \left\{ \frac{\sum_{i=1}^{k} \lambda_{ji}}{\sum_{i=1}^{n} \lambda_{ji}} > 0.9 \right\}$
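The metric can be sketched in NumPy as follows. This is an illustrative reading, not the authors' implementation. It assumes undirected graphs given as symmetric adjacency matrices with at least one edge, eigenvalues sorted largest first, and $k$ taken as the smaller of the two per-graph cutoffs. The function names are hypothetical.

```python
import numpy as np

def laplacian_spectrum(A):
    """Eigenvalues of L = D - A for a symmetric adjacency matrix A, largest first."""
    D = np.diag(A.sum(axis=1))
    return np.sort(np.linalg.eigvalsh(D - A))[::-1]

def smallest_k(spectrum, threshold=0.9):
    """Smallest k whose k largest eigenvalues exceed `threshold` of the total sum."""
    ratios = np.cumsum(spectrum) / spectrum.sum()
    return int(np.argmax(ratios > threshold)) + 1

def eigenvector_similarity(A1, A2, threshold=0.9):
    """Sum of squared differences between the k largest Laplacian eigenvalues."""
    s1, s2 = laplacian_spectrum(A1), laplacian_spectrum(A2)
    k = min(smallest_k(s1, threshold), smallest_k(s2, threshold))
    return float(np.sum((s1[:k] - s2[:k]) ** 2))

# Toy graphs on 4 nodes: a path, the same path with relabelled nodes, and a star.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
order = [2, 0, 3, 1]
relabelled_path = path[np.ix_(order, order)]
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]], dtype=float)

delta_same = eigenvector_similarity(path, relabelled_path)
delta_diff = eigenvector_similarity(path, star)
```

The relabelled path is isomorphic to the original, so `delta_same` is zero up to floating-point error. The star has a different spectrum, so `delta_diff` is strictly positive.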
How do we quantify similarity?
‣ Quantifies how close two NN graphs are to being isospectral, i.e. having the same spectrum (same sets of eigenvalues).
‣ Isomorphic → isospectral, but isospectral ↛ isomorphic
‣ $\Delta : G_1, G_2 \to [0, \infty)$
‣ $\Delta = 0$: $G_1, G_2$ are isospectral (very similar)
‣ $\Delta \to \infty$: $G_1, G_2$ become less similar
Unsupervised cross-lingual learning assumptions
‣ Besides isomorphism, several other implicit assumptions
‣ May or may not scale to low-resource languages

                           | Conneau et al. (2018)                     | This work
Languages                  | Dependent-marking, fusional and isolating | Agglutinative, many cases
Corpora                    | Comparable (Wikipedia)                    | Different domains
Algorithms/hyperparameters | Same                                      | Different
Conneau et al. (2018)
1. Monolingual word embeddings: Learn monolingual vector spaces $X$ and $Y$.
2. Adversarial mapping: Learn a translation matrix $W$. Train discriminator to discriminate samples from $WX$ and $Y$.
Conneau et al. (2018)
3. Refinement (Procrustes analysis): Build bilingual dictionary of frequent words using $W$. Learn a new $W$ based on frequent word pairs.
4. Cross-domain similarity local scaling (CSLS): Use similarity measure that increases similarity of isolated word vectors, decreases similarity of vectors in dense areas.
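Steps 3 and 4 can be sketched as below. This is a hedged illustration under stated assumptions, not the reference implementation of Conneau et al. (2018): word vectors are stored as rows, the dictionary is given as already-aligned rows of `X` and `Y`, and the names `procrustes` and `csls` are hypothetical.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimising ||X W^T - Y||_F, with word vectors as rows."""
    # Closed-form solution: W = U V^T, where U S V^T is the SVD of Y^T X.
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

def csls(mapped_src, tgt, k=10):
    """CSLS score matrix between mapped source rows and target rows."""
    a = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    b = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = a @ b.T                                   # (n_src, n_tgt) cosine similarities
    k_src = min(k, cos.shape[1])
    k_tgt = min(k, cos.shape[0])
    # Mean similarity of each source vector to its k nearest target neighbours.
    r_src = np.sort(cos, axis=1)[:, -k_src:].mean(axis=1, keepdims=True)
    # Mean similarity of each target vector to its k nearest source neighbours.
    r_tgt = np.sort(cos, axis=0)[-k_tgt:, :].mean(axis=0, keepdims=True)
    # Vectors in dense areas have large r values and are penalised.
    return 2 * cos - r_src - r_tgt

# Toy data (hypothetical): the target space is an exact rotation of the source space.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))        # random orthogonal matrix
Y = X @ Q.T
W = procrustes(X, Y)
scores = csls(X @ W.T, Y, k=3)
```

Constraining `W` to be orthogonal preserves distances and angles within the source space, which is why the closed-form SVD solution applies. On the toy rotation, `W` recovers `Q`.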
A simple weakly supervised method
‣ Extract identically spelled words in both languages
‣ Use these as bilingual seed words
‣ Run refinement step of Conneau et al. (2018)
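Extracting the seed dictionary amounts to intersecting the two vocabularies. A minimal sketch, with a hypothetical function name and made-up toy vocabularies:

```python
def identical_string_seeds(src_vocab, tgt_vocab):
    """Return (source_index, target_index) pairs for words spelled identically."""
    tgt_index = {word: j for j, word in enumerate(tgt_vocab)}
    return [(i, tgt_index[word])
            for i, word in enumerate(src_vocab)
            if word in tgt_index]

# Toy vocabularies (made up for illustration), ordered by frequency rank.
src_vocab = ["the", "taxi", "house", "hotel", "radio"]
tgt_vocab = ["ja", "hotel", "talo", "radio", "taxi"]
seeds = identical_string_seeds(src_vocab, tgt_vocab)
```

The resulting index pairs select the rows of the source and target embedding matrices that are passed to the refinement step.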
Experiments: Bilingual dictionary induction
‣ Given a list of source language words, find the closest
target language word in the cross-lingual embedding space
‣ Compare against a gold standard dictionary
‣ Metric: Precision at 1 (P@1)
‣ Use fastText monolingual embeddings
                       | Conneau et al. (2018)                     | This work
Languages (English to) | French, German, Chinese, Russian, Spanish | Estonian (ET), Finnish (FI), Greek (EL), Hungarian (HU), Polish (PL), Turkish
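The evaluation can be sketched as follows. This is an illustration under assumptions: retrieval uses plain cosine nearest neighbours (a CSLS score matrix could be substituted), the gold dictionary maps source row indices to sets of acceptable target row indices, and the function name and toy vectors are hypothetical.

```python
import numpy as np

def precision_at_1(mapped_src, tgt, gold):
    """P@1: share of source words whose nearest target word is a gold translation.

    mapped_src: (n_src, d) source vectors already mapped into the target space.
    tgt:        (n_tgt, d) target vectors.
    gold:       dict from source row index to a set of acceptable target row indices.
    """
    a = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    b = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    nearest = (a @ b.T).argmax(axis=1)      # cosine nearest neighbour per source word
    hits = sum(int(nearest[i]) in targets for i, targets in gold.items())
    return hits / len(gold)

# Toy example (hypothetical vectors): the third source word is retrieved wrongly.
tgt = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
mapped_src = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
gold = {0: {0}, 1: {1}, 2: {2}}
p1 = precision_at_1(mapped_src, tgt, gold)
```

Two of the three toy source words retrieve their gold translation, so `p1` is 2/3.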
Impact of language similarity
‣ Unsupervised approaches are challenged by languages that
are not isolating and not dependent marking
‣ Naive supervision leads to competitive performance on
similar language pairs and better results for dissimilar pairs
[Figure: P@1 (0 to 90) for EN-ES, EN-ET, EN-FI, EN-EL, EN-HU, EN-PL, EN-TR, ET-FI; series: Unsupervised (Adversarial), Weakly supervised (Identical strings)]
Impact of language similarity
‣ Eigenvector similarity strongly correlates with BDI
performance
[Figure: eigenvector similarity (0 to 8) plotted against BDI performance (0 to 90); ρ ∼ 0.89]
Impact of domain differences
English-Spanish
[Figure: P@1 (0 to 70) and domain similarity (0 to 0.8) for the corpus pairs EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA; series: Domain similarity, Unsupervised, Weakly supervised]
‣ Source and target embeddings induced on 3 corpora: EuroParl (EP), Wikipedia (Wiki), Medical (EMEA)
‣ Unsupervised approaches break down when domains are
dissimilar
Impact of domain differences
English-Finnish
[Figure: P@1 (0 to 30) and domain similarity (0 to 0.8) for the corpus pairs EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA; series: Domain similarity, Unsupervised, Weakly supervised]
‣ Domain differences may exacerbate difficulties of
generalising across dissimilar languages
Impact of domain differences
English-Hungarian
[Figure: P@1 (0 to 30) and domain similarity (0 to 0.8) for the corpus pairs EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA; series: Domain similarity, Unsupervised, Weakly supervised]
‣ Weak supervision helps to bridge domain differences, but
performance still deteriorates
Impact of hyper-parameters
‣ Settings: English with skipgram, win=2, ngrams=3-6
‣ Vary hyper-parameters of Spanish embeddings
[Figure: P@1 (0 to 90); x-axis: ==, win=10, ngrams=2-7, and win=10 with ngrams=2-7 (the last three marked ≠); series: English-Spanish (skipgram), English-Spanish (cbow)]
Impact of hyper-parameters
[Figure: P@1 (0 to 90); x-axis: ==, win=10, ngrams=2-7, and win=10 with ngrams=2-7 (the last three marked ≠); series: English-Spanish (skipgram), English-Spanish (cbow)]
‣ Different algorithms introduce embedding spaces with wildly different structures.
Impact of dimensionality
[Figure: P@1 (0 to 90) for EN-ES, EN-ET, EN-FI, EN-EL, EN-HU, EN-PL, EN-TR; series: 300-dimensional embeddings, 40-dimensional embeddings]
‣ Worse performance overall, but better performance for dissimilar language pairs (Estonian, Finnish, Greek).
‣ Monolingual word embeddings may overfit to rare peculiarities of languages.
Impact of evaluation procedure
‣ Part-of-speech: Performance on verbs is lowest across the board.
‣ Frequency: Sensitivity to frequency for Hungarian, but less so for Spanish.
‣ Homographs: Lower precision due to loan words/proper names. High precision for free with weak supervision.

Takeaways
‣ Word embedding spaces are not approximately isomorphic across languages.
‣ We can use eigenvector similarity to characterise the relatedness of two monolingual vector spaces.
‣ Eigenvector similarity strongly correlates with unsupervised bilingual dictionary induction performance.
‣ Limitations of unsupervised bilingual dictionary induction:
  ‣ Morphologically rich languages.
  ‣ Corpora from different domains.
  ‣ Different word embedding algorithms.

Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 

On the Limitations of Unsupervised Bilingual Dictionary Induction

  • 1. On the Limitations of Unsupervised Bilingual Dictionary Induction
    Sebastian Ruder, Ivan Vulić, Anders Søgaard
  • 2. Background: Unsupervised MT
    ‣ Recently: Unsupervised neural machine translation (Artetxe et al., ICLR 2018; Lample et al., ICLR 2018)
    ‣ Key component: Initialization via unsupervised cross-lingual alignment of word embedding spaces
  • 3. Background: Cross-lingual word embeddings
    ‣ Cross-lingual word embeddings enable cross-lingual transfer
    ‣ Most common approach: Project one word embedding space into another by learning a transformation matrix $W$ between $n$ source embeddings $x_i$ and their translations $y_i$ (Mikolov et al., 2013), minimising $\sum_{i=1}^{n} \lVert W x_i - y_i \rVert^2$
    ‣ More recently: Use an adversarial setup to learn an unsupervised mapping
    ‣ Assumption: Word embedding spaces are approximately isomorphic, i.e. same number of vertices, connected the same way.
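The supervised mapping objective on slide 3 can be sketched with ordinary least squares. This is a minimal illustration, not the authors' code: the function name `learn_mapping`, the row-per-word layout of `X` and `Y`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def learn_mapping(X, Y):
    """Return W minimising sum_i ||W x_i - y_i||^2.

    X: (n, d) source embeddings, one row per word (assumed layout).
    Y: (n, d) embeddings of the corresponding translations.
    """
    # In row form the objective is ||X W^T - Y||_F^2, a least-squares problem.
    W_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W_T.T

# Synthetic check: translations are an exact linear image of the sources.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
W_true = rng.normal(size=(4, 4))
Y = X @ W_true.T
W = learn_mapping(X, Y)
```

On noise-free synthetic pairs like these, `W` recovers `W_true` up to floating-point error. Real embedding pairs are only approximately linearly related.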
  • 4. How similar are embeddings across languages?
    ‣ Nearest neighbour (NN) graphs of top 10 most frequent words in English and German are not isomorphic.
    ‣ NN graphs of top 10 most frequent English words and their translations into German
    [Figure: two NN graphs, panels "English" and "German"]
    ‣ Not isomorphic
  • 5. How similar are embeddings across languages?
    ‣ NN graphs of top 10 most frequent English nouns and their translations
    [Figure: two NN graphs, panels "English" and "German"]
    ‣ Not isomorphic
    Word embeddings are not approximately isomorphic across languages.
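A nearest neighbour graph of the kind compared on slides 4 and 5 could be built roughly as follows. The slides do not state how many neighbours are used or which similarity, so `k=2` and cosine similarity are assumptions, as are the function names and the random stand-in embeddings. Comparing sorted degree sequences is only a quick necessary check: two graphs whose sequences differ cannot be isomorphic, but equal sequences prove nothing.

```python
import numpy as np

def nn_graph(E, k=2):
    """Adjacency matrix of an undirected k-nearest-neighbour graph.

    E: (n, d) embeddings, one row per word. k is an assumed setting.
    """
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = E @ E.T                      # cosine similarities
    np.fill_diagonal(sim, -np.inf)     # a word is not its own neighbour
    n = len(E)
    A = np.zeros((n, n), dtype=int)
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    for i in range(n):
        A[i, nbrs[i]] = 1
    return np.maximum(A, A.T)          # symmetrise: edge if either side picks it

def degree_sequence(A):
    """Sorted vertex degrees; differing sequences rule out isomorphism."""
    return sorted(A.sum(axis=1).tolist())

# Random vectors stand in for the 10 most frequent words of one language.
rng = np.random.default_rng(0)
A = nn_graph(rng.normal(size=(10, 5)), k=2)
```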
  • 6. How do we quantify similarity?
    ‣ Need a metric to measure how similar two NN graphs $G_1$ and $G_2$ of different languages are
    ‣ Propose eigenvector similarity
    ‣ $A_1, A_2$: adjacency matrices of $G_1, G_2$
    ‣ $D_1, D_2$: degree matrices of $G_1, G_2$
    ‣ $L_1 = D_1 - A_1$, $L_2 = D_2 - A_2$: Laplacians of $G_1, G_2$
    ‣ $\lambda_1, \lambda_2$: eigenvalues (spectra) of $L_1, L_2$
    ‣ Metric: $\Delta = \sum_{i=1}^{k} (\lambda_{1i} - \lambda_{2i})^2$ where $k = \min_j \left\{ \frac{\sum_{i=1}^{k} \lambda_{ji}}{\sum_{i=1}^{n} \lambda_{ji}} > 0.9 \right\}$
  • 7. How do we quantify similarity?
    ‣ Quantifies how much two NN graphs are isospectral, i.e. they have the same spectrum (same sets of eigenvalues).
    ‣ Isomorphic $\rightarrow$ isospectral, but isospectral $\nrightarrow$ isomorphic
    ‣ $\Delta : G_1, G_2 \rightarrow [0, \infty)$
    ‣ $\Delta = 0$: $G_1, G_2$ are isospectral (very similar)
    ‣ $\Delta \rightarrow \infty$: $G_1, G_2$ become less similar
    ‣ Metric: $\Delta = \sum_{i=1}^{k} (\lambda_{1i} - \lambda_{2i})^2$ where $k = \min_j \left\{ \frac{\sum_{i=1}^{k} \lambda_{ji}}{\sum_{i=1}^{n} \lambda_{ji}} > 0.9 \right\}$
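The metric $\Delta$ can be sketched directly from the definitions on slides 6 and 7. One reading of the formula for $k$ is assumed here: eigenvalues are sorted largest first, each graph gets the smallest $k$ whose top-$k$ eigenvalues exceed 90% of the total, and the smaller of the two values is used. The function names and the toy graphs are illustrative, and the sketch assumes each graph has at least one edge.

```python
import numpy as np

def laplacian_spectrum(A):
    """Eigenvalues of L = D - A, sorted largest first."""
    D = np.diag(A.sum(axis=1))
    return np.sort(np.linalg.eigvalsh(D - A))[::-1]

def smallest_k(eigs, threshold=0.9):
    """Smallest k whose top-k eigenvalues exceed `threshold` of the total."""
    ratios = np.cumsum(eigs) / eigs.sum()
    return int(np.argmax(ratios > threshold)) + 1

def eigenvector_similarity(A1, A2):
    """Delta: sum of squared differences of the top-k Laplacian eigenvalues."""
    s1, s2 = laplacian_spectrum(A1), laplacian_spectrum(A2)
    k = min(smallest_k(s1), smallest_k(s2))
    return float(np.sum((s1[:k] - s2[:k]) ** 2))

# Toy graphs: a 3-vertex path, the same path relabelled, and a triangle.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
path_relabelled = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])

delta_same = eigenvector_similarity(path, path_relabelled)
delta_diff = eigenvector_similarity(path, triangle)
```

The relabelled path is isomorphic to the original, so its spectrum is identical and `delta_same` is zero up to rounding. The path and the triangle have Laplacian spectra {3, 1, 0} and {3, 3, 0}, so `delta_diff` comes out at 4.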
  • 8. Unsupervised cross-lingual learning assumptions
    ‣ Besides isomorphism, several other implicit assumptions
    ‣ May or may not scale to low-resource languages

    Aspect | Conneau et al. (2018) | This work
    Languages | Dependent-marking, fusional and isolating | Agglutinative, many cases
    Corpora | Comparable (Wikipedia) | Different domains
    Algorithms/hyperparameters | Same | Different
  • 9. Conneau et al. (2018)
    1. Monolingual word embeddings: Learn monolingual vector spaces $X$ and $Y$.
    2. Adversarial mapping: Learn a translation matrix $W$. Train discriminator to discriminate samples from $WX$ and $Y$.
  • 10. Conneau et al. (2018)
    3. Refinement (Procrustes analysis): Build bilingual dictionary of frequent words using $W$. Learn a new $W$ based on frequent word pairs.
    4. Cross-domain similarity local scaling (CSLS): Use similarity measure that increases similarity of isolated word vectors, decreases similarity of vectors in dense areas.
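Steps 3 and 4 can be sketched as below. The Procrustes step is the standard closed-form solution for an orthogonal $W$ via SVD, and the CSLS score follows the usual definition: twice the cosine similarity, minus each word's mean similarity to its $k$ nearest neighbours in the other language. The function names, the row-per-word layout, the choice `k=3` in the usage lines and the synthetic data are assumptions for illustration. This is not the released implementation.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimising sum_i ||W x_i - y_i||^2.

    X, Y: (n, d) arrays whose rows are the dictionary word pairs.
    """
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

def csls(S, T, k=10):
    """CSLS score matrix between mapped source rows S and target rows T."""
    S = S / np.linalg.norm(S, axis=1, keepdims=True)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    cos = S @ T.T
    k = min(k, cos.shape[0], cos.shape[1])
    # Mean cosine of each source word to its k nearest target words ...
    r_src = np.sort(cos, axis=1)[:, -k:].mean(axis=1)
    # ... and of each target word to its k nearest source words.
    r_tgt = np.sort(cos, axis=0)[-k:, :].mean(axis=0)
    # Words in dense regions have large r values and are penalised.
    return 2 * cos - r_src[:, None] - r_tgt[None, :]

# Synthetic check: the target space is an exact rotation Q of the source space.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
Y = X @ Q.T
W = procrustes(X, Y)
scores = csls(X @ W.T, Y, k=3)
```

On this noise-free data `W` recovers `Q`. In the refinement loop described above, the dictionary would be rebuilt from the highest-scoring pairs in `scores` and `procrustes` run again.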
  • 11. A simple weakly supervised method
    ‣ Extract identically spelled words in both languages
    ‣ Use these as bilingual seed words
    ‣ Run refinement step of Conneau et al. (2018)
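Extracting the seed dictionary amounts to intersecting the two vocabularies. A minimal sketch, in which the function name and the toy word lists are hypothetical:

```python
def identical_string_seeds(source_vocab, target_vocab):
    """Pairs (w, w) for every word form present in both vocabularies."""
    shared = set(source_vocab) & set(target_vocab)
    return sorted((w, w) for w in shared)

# Toy vocabularies, purely illustrative.
seeds = identical_string_seeds(
    ["the", "taxi", "hotel", "house"],
    ["das", "taxi", "hotel", "haus"],
)
```

The resulting pairs would then index into the two embedding matrices to form the rows used by the refinement step.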
  • 12. Experiments: Bilingual dictionary induction
    ‣ Given a list of source language words, find the closest target language word in the cross-lingual embedding space
    ‣ Compare against a gold standard dictionary
    ‣ Metric: Precision at 1 (P@1)
    ‣ Use fastText monolingual embeddings

    Aspect | Conneau et al. (2018) | This work
    Languages | (English to) French, German, Chinese, Russian, Spanish | Estonian (ET), Finnish (FI), Greek (EL), Hungarian (HU), Polish (PL), Turkish
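The evaluation can be sketched as follows, assuming plain cosine nearest neighbour retrieval and a gold dictionary that maps each source index to a set of acceptable target indices. The function name, the data layout and the toy vectors are all assumptions for illustration.

```python
import numpy as np

def precision_at_1(src_vecs, tgt_vecs, gold):
    """Fraction of source words whose nearest target word is a gold translation.

    src_vecs: (n_src, d) source embeddings already mapped into the shared space.
    tgt_vecs: (n_tgt, d) target embeddings.
    gold: dict from source row index to a set of acceptable target row indices.
    """
    S = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    T = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    nearest = (S @ T.T).argmax(axis=1)
    hits = sum(int(nearest[i]) in targets for i, targets in gold.items())
    return hits / len(gold)

# Toy data: the first two source words retrieve their gold translation, the third does not.
src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.5]])
tgt = np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]])
p_at_1 = precision_at_1(src, tgt, {0: {0}, 1: {1}, 2: {2}})
```

Here `p_at_1` is 2/3, since two of the three source words hit their gold translation.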
  • 13. Impact of language similarity
    ‣ Unsupervised approaches are challenged by languages that are not isolating and not dependent marking
    ‣ Naive supervision leads to competitive performance on similar language pairs and better results for dissimilar pairs
    [Figure: bar chart of P@1 (0 to 90) for EN-ES, EN-ET, EN-FI, EN-EL, EN-HU, EN-PL, EN-TR, ET-FI; series: Unsupervised (Adversarial), Weakly supervised (Identical strings)]
  • 14. Impact of language similarity
    ‣ Eigenvector similarity strongly correlates with BDI performance (ρ ∼ 0.89)
    [Figure: eigenvector similarity (0 to 8) plotted against BDI performance (0 to 90)]
  • 15. Impact of domain differences
    ‣ Source and target embeddings induced on 3 corpora: EuroParl (EP), Wikipedia (Wiki), Medical (EMEA)
    ‣ Unsupervised approaches break down when domains are dissimilar
    [Figure: English-Spanish. P@1 (0 to 70) and domain similarity (0 to 0.8) for the corpus pairs EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA; series: Domain similarity, Unsupervised, Weakly supervised]
  • 16. Impact of domain differences
    ‣ Domain differences may exacerbate difficulties of generalising across dissimilar languages
    [Figure: English-Finnish. P@1 (0 to 30) and domain similarity (0 to 0.8) for the corpus pairs EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA; series: Domain similarity, Unsupervised, Weakly supervised]
  • 17. Impact of domain differences
    ‣ Weak supervision helps to bridge domain differences, but performance still deteriorates
    [Figure: English-Hungarian. P@1 (0 to 30) and domain similarity (0 to 0.8) for the corpus pairs EP-EP, EP-Wiki, EP-EMEA, Wiki-EP, Wiki-Wiki, Wiki-EMEA, EMEA-EP, EMEA-Wiki, EMEA-EMEA; series: Domain similarity, Unsupervised, Weakly supervised]
  • 18. Impact of hyper-parameters
    ‣ Settings: English with skipgram, win=2, ngrams=3-6
    ‣ Vary hyper-parameters of Spanish embeddings
    [Figure: P@1 (0 to 90) for the Spanish settings "==", "win=10", "ngrams=2-7" and "win=10, ngrams=2-7", the last three marked ≠; series: English-Spanish (skipgram), English-Spanish (cbow)]
  • 19. Impact of hyper-parameters
    ‣ Different algorithms introduce embedding spaces with wildly different structures.
    [Figure: P@1 (0 to 90) for the Spanish settings "==", "win=10", "ngrams=2-7" and "win=10, ngrams=2-7", the last three marked ≠; series: English-Spanish (skipgram), English-Spanish (cbow)]
  • 20. Impact of dimensionality
    ‣ Worse performance overall, but better performance for dissimilar language pairs (Estonian, Finnish, Greek).
    ‣ Monolingual word embeddings may overfit to rare peculiarities of languages.
    [Figure: P@1 (0 to 90) for EN-ES, EN-ET, EN-FI, EN-EL, EN-HU, EN-PL, EN-TR; series: 300-dimensional embeddings, 40-dimensional embeddings]
  • 21. Impact of evaluation procedure
    ‣ Part-of-speech: Performance on verbs is lowest across the board.
    ‣ Frequency: Sensitivity to frequency for Hungarian, but less so for Spanish.
    ‣ Homographs: Lower precision due to loan words/proper names. High precision for free with weak supervision.

  • 22. Takeaways
    ‣ Word embedding spaces are not approximately isomorphic across languages.
    ‣ We can use eigenvector similarity to characterise the relatedness of two monolingual vector spaces.
    ‣ Eigenvector similarity strongly correlates with unsupervised bilingual dictionary induction performance.
    ‣ Limitations of unsupervised bilingual dictionary induction:
      ‣ Morphologically rich languages.
      ‣ Corpora from different domains.
      ‣ Different word embedding algorithms.