A Review of Deep Contextualized Word Representations (Peters+, 2018)

A brief review of the paper:
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In NAACL-HLT (pp. 2227–2237)

  1. Deep contextualized word representations
     Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
     NAACL-HLT 2018
  2. Overview
     • Propose a new type of deep contextualised word representation (ELMo) that models:
       ‣ Complex characteristics of word use (e.g., syntax and semantics)
       ‣ How these uses vary across linguistic contexts (i.e., to model polysemy)
     • Show that ELMo can improve existing neural models on various NLP tasks
     • Argue that ELMo captures more abstract linguistic characteristics in the higher layers
     (Talk outline: Overview, Method, Evaluation, Analysis, Comments)
  3. Example
     GloVe mostly learns a sport-related context, while ELMo can distinguish the word sense based on the surrounding context.
  4. Method
     • Embeddings from Language Models: ELMo
     • Learn word embeddings by building bidirectional language models (biLMs); a minimal training sketch follows below
       ‣ biLMs consist of a forward LM and a backward LM
         ✦ Forward:  $p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_1, t_2, \ldots, t_{k-1})$
         ✦ Backward: $p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_{k+1}, t_{k+2}, \ldots, t_N)$
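The two factorisations above are trained jointly by maximising the summed log-likelihood of both directions. Below is a minimal, hedged PyTorch sketch of that objective, not the paper's exact setup (the real biLM uses character-CNN inputs and ties some parameters across directions); all sizes and names here are purely illustrative.

```python
# A minimal sketch of the biLM objective: a forward LM predicts t_k from
# t_1..t_{k-1}, a backward LM predicts t_k from t_{k+1}..t_N, and the two
# log-likelihood losses are summed.
import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, HID_DIM = 1000, 64, 128  # toy sizes, not the paper's

class OneDirectionLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.lstm = nn.LSTM(EMB_DIM, HID_DIM, num_layers=2, batch_first=True)
        self.out = nn.Linear(HID_DIM, VOCAB_SIZE)

    def forward(self, tokens):                       # tokens: (batch, seq)
        hidden, _ = self.lstm(self.embed(tokens))    # (batch, seq, HID_DIM)
        return self.out(hidden)                      # logits over the next token

forward_lm, backward_lm = OneDirectionLM(), OneDirectionLM()
loss_fn = nn.CrossEntropyLoss()

def bilm_loss(tokens):
    # Forward direction: position k predicts token k+1.
    fwd_logits = forward_lm(tokens[:, :-1])
    fwd_loss = loss_fn(fwd_logits.reshape(-1, VOCAB_SIZE), tokens[:, 1:].reshape(-1))
    # Backward direction: run over the reversed sequence, so the "next" token
    # there is the previous token in the original order.
    rev = tokens.flip(dims=[1])
    bwd_logits = backward_lm(rev[:, :-1])
    bwd_loss = loss_fn(bwd_logits.reshape(-1, VOCAB_SIZE), rev[:, 1:].reshape(-1))
    return fwd_loss + bwd_loss

dummy_batch = torch.randint(0, VOCAB_SIZE, (2, 7))  # random token ids for illustration
print(bilm_loss(dummy_batch))
```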
  5. Method
     A long short-term memory (LSTM) network predicts the next word in each direction to build the biLMs.
     [Figure: the forward LM architecture expanded around position k — embedding layer x_k, two hidden LSTM layers h^LM_{k,1} and h^LM_{k,2}, and an output layer o_k, illustrated on "have a nice one ...".]
  6. Method
     ELMo represents a word t_k as a linear combination of the corresponding hidden layers (including its embedding), with task-learned weights (sketched below):
       $\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, \mathbf{h}_{k,j}^{LM}$
     where $\mathbf{h}_{k,0}^{LM} = [\mathbf{x}_k ; \mathbf{x}_k]$ is the token embedding and $\mathbf{h}_{k,j}^{LM} = [\overrightarrow{\mathbf{h}}_{k,j}^{LM} ; \overleftarrow{\mathbf{h}}_{k,j}^{LM}]$ concatenates the forward and backward LSTM states of layer j.
     [Figure: the forward and backward LMs around token t_k; their hidden layers are concatenated, weighted by s^task_0, s^task_1, s^task_2, and scaled by γ^task.]
     • Unlike usual word embeddings, ELMo is assigned to every token instead of to a type.
     • ELMo is a task-specific representation: the downstream task learns the weighting parameters.
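Below is a minimal sketch of this combination step, assuming the biLM's layer representations (layer 0 = token embedding, layers 1..L = concatenated forward/backward LSTM states) have already been computed and stacked; the class and variable names are illustrative, not the paper's code.

```python
# Task-specific ELMo mix: softmax-normalised layer weights s^task and a
# scalar gamma^task are learned by the downstream task; the ELMo vector is
# the weighted sum of the biLM layer representations for each token.
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        self.scalars = nn.Parameter(torch.zeros(num_layers))  # s^task (pre-softmax)
        self.gamma = nn.Parameter(torch.ones(1))               # gamma^task

    def forward(self, layer_reps):
        # layer_reps: (num_layers, batch, seq, dim) stacked biLM layers
        weights = torch.softmax(self.scalars, dim=0)
        mixed = (weights.view(-1, 1, 1, 1) * layer_reps).sum(dim=0)
        return self.gamma * mixed                               # ELMo^task for each token

# Dummy stacked representations: L+1 = 3 layers, batch 2, seq 7, dim 256.
layers = torch.randn(3, 2, 7, 256)
elmo_vectors = ScalarMix(num_layers=3)(layers)
print(elmo_vectors.shape)  # torch.Size([2, 7, 256])
```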
  7. Method
     ELMo can be integrated into almost any neural NLP task by simply concatenating it to the embedding layer (see the sketch below).
     [Figure: the biLMs are first trained on a large corpus; at task time the usual inputs ("have a nice ...") are enhanced by concatenating each token's ELMo vector.]
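A minimal sketch of this integration, assuming the ELMo vectors for a batch are already available (e.g. from a frozen biLM); the downstream encoder here is just a placeholder LSTM, and all names and sizes are illustrative.

```python
# Concatenate the task's usual word embedding x_k with the ELMo vector for
# the same token, then feed the enhanced inputs to the task encoder.
import torch
import torch.nn as nn

WORD_EMB_DIM, ELMO_DIM, HIDDEN = 100, 256, 128   # illustrative sizes

word_embed = nn.Embedding(1000, WORD_EMB_DIM)    # task's own embeddings
encoder = nn.LSTM(WORD_EMB_DIM + ELMO_DIM, HIDDEN, batch_first=True)

token_ids = torch.randint(0, 1000, (2, 7))       # dummy batch of token ids
elmo_vectors = torch.randn(2, 7, ELMO_DIM)       # would come from the biLM

enhanced = torch.cat([word_embed(token_ids), elmo_vectors], dim=-1)
outputs, _ = encoder(enhanced)                   # task model runs on enhanced inputs
print(outputs.shape)                             # torch.Size([2, 7, 128])
```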
  8. Evaluation
     Many linguistic tasks are improved by using ELMo: question answering, textual entailment, semantic role labelling, coreference resolution, named entity recognition, and sentiment analysis.
  9. Analysis
     The higher layer seemed to learn semantics, while the lower layer probably captured syntactic features.
     [Results shown: word sense disambiguation and PoS tagging using the biLM's individual layers.]
  10. Analysis
      The higher layer seemed to learn semantics while the lower layer probably captured syntactic features??? In the learned task weights, most models preferred the "syntactic (probably)" lower-layer features, even in sentiment analysis.
  11. Analysis
      ELMo-enhanced models can make use of small datasets more efficiently.
      [Results shown: performance versus training-set size for textual entailment and semantic role labelling.]
  12. Comments
      • Pre-trained ELMo models are available at https://allennlp.org/elmo (a hedged usage sketch follows this list)
        ‣ AllenNLP is a deep NLP library built on top of PyTorch
        ‣ AllenNLP is a product of AI2 (the Allen Institute for Artificial Intelligence), which works on other interesting projects such as Semantic Scholar
      • ELMo can process character-level inputs
        ‣ Japanese (Chinese, Korean, …) ELMo models are therefore likely to be possible
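For completeness, a hedged usage sketch of the AllenNLP interface mentioned on the slide; class names, arguments, and the required options/weights files may differ between AllenNLP versions, so check https://allennlp.org/elmo and the library documentation rather than treating this as authoritative. The file paths below are placeholders.

```python
# Illustrative (version-dependent) use of pre-trained ELMo through AllenNLP.
from allennlp.modules.elmo import Elmo, batch_to_ids

options_file = "..."   # placeholder: pre-trained options JSON (see allennlp.org/elmo)
weight_file = "..."    # placeholder: pre-trained weights file

# num_output_representations=1 requests one task-specific ELMo mix per token.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

sentences = [["have", "a", "nice", "day"], ["ELMo", "reads", "characters"]]
character_ids = batch_to_ids(sentences)                    # character-level inputs
embeddings = elmo(character_ids)["elmo_representations"][0]
print(embeddings.shape)                                    # (batch, max_len, elmo_dim)
```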
