A Review of Deep Contextualized Word Representations (Peters+, 2018)

•

26 likes•19,929 views

A brief review of the paper: Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In NAACL-HLT (pp. 2227–2237)

Engineering

Deep contextualized
word representations
Matthew E. Peters, Mark Neumann, Mohit
Iyyer, Matt Gardner, Christopher Clark,
Kenton Lee, Luke Zettlemoyer
NAACL-HLT 2018

Overview
• Propose a new type of deep contextualised word
representations (ELMo) that model:
‣ Complex characteristics of word use (e.g., syntax and
semantics)
‣ How these uses vary across linguistic contexts (i.e., to
model polysemy)
• Show that ELMo can improve existing neural models in
various NLP tasks
• Argue that ELMo can capture more abstract linguistic
characteristics in the higher level of layers
33
• Overview
• Method
• Evaluation
• Analysis
• Comments

Example
34
GloVe mostly learns
sport-related context
ELMo can distinguish the word
sense based on the context

Method
• Embeddings from Language Models: ELMo
• Learn word embeddings through building
bidirectional language models (biLMs)
‣ biLMs consist of forward and backward LMs
✦ Forward:
✦ Backward:
35
• Overview
• Method
• Evaluation
• Analysis
• Comments
p(t1, t2, …, tN) =
N
∏
k=1
p(tk |t1, t2, …, tk−1)
p(t1, t2, …, tN) =
N
∏
k=1
p(tk |tk+1, tk+2, …, tN)

Method
With long short term memory (LSTM) network,
predicting the next words in both directions to build
biLMs
36
have a nice one …
Output layer
Hidden layers
(LSTMs)
Embedding layer
…
a nice one
xk
ok
k − 1
k − 1
Expanded in the forward direction of kThe forward LM architecture
• Overview
• Method
• Evaluation
• Analysis
• Comments
tk
hLM
k1
hLM
k2

Method
ELMo represents a word as a linear combination of
corresponding hidden layers (inc. its embedding)
37
tk
xk
ok
k − 1
k − 1
hLM
k1
ok
k + 1
k + 1
hLM
k2
hLM
k1
hLM
k2
Forward LM Backward LM
tk tk
stask
0
stask
1
stask
2
Concatenate
hidden layers
hLM
k1
hLM
k2
hLM
k0
([xk ; xk])
×
×
× [hLM
kj ; hLM
kj ]
{
ELMotask
k ×γtask
∑=
Unlike usual word embeddings, ELMo is
assigned to every token instead of a type
biLMs
ELMo is a task specific
representation. A down-stream
task learns weighting parameters

ELMo can be integrated to almost all neural NLP tasks
with simple concatenation to the embedding layer
Method
38
• Overview
• Method
• Evaluation
• Analysis
• Comments
have a nice
ELMo ELMo ELMo
biLMs
Corpus
Usual inputs
Enhance inputs
with ELMosTrain

Many linguistic tasks are improved by using ELMo
39
Evaluation
• Overview
• Method
• Evaluation
• Analysis
• Comments
Q&A
Textual entailment
Sentiment analysis
Semantic role labelling
Coreference resolution
Named entity recognition

Analysis
The higher layer seemed to learn semantics while the lower
layer probably captured syntactic features
40
• Overview
• Method
• Evaluation
• Analysis
• Comments
Word sense disambiguation PoS tagging

Analysis
The higher layer seemed to learn semantics while the lower
layer probably captured syntactic features???
41
• Overview
• Method
• Evaluation
• Analysis
• Comments
Most models preferred
“syntactic (probably)” features
Even in sentiment analysis

Analysis
ELMo-enhanced models can make use of small
datasets more efficiently
42
• Overview
• Method
• Evaluation
• Analysis
• Comments
Textual entailment Semantic role labelling

Comments
• Pre-trained ELMo models are available at https://
allennlp.org/elmo
‣ AllenNLP is a deep NLP library on top of PyTorch
‣ AllenNLP is a product of AI2 (Allen Institute for
Artificial Intelligence) which works on other
interesting projects like Semantic Scholar
• ELMo can process character-level inputs
‣ Japanese (Chinese, Korean, …) ELMo models likely
to be possible
43
• Overview
• Method
• Evaluation
• Analysis
• Comments

What's hot

Word embedding ShivaniChoudhary74

BertAbdallah Bashir

BERT: Bidirectional Encoder Representations from TransformersLiangqun Lu

NLP State of the Art | BERTshaurya uppal

Introduction For seq2seq(sequence to sequence) and RNNHye-min Ahn

1909 BERT: why-and-how (CODE SEMINAR)WarNik Chow

Word embeddings, RNN, GRU and LSTMDivya Gera

Word2Vechyunyoung Lee

[Paper review] BERTJEE HYUN PARK

Introduction to Transformers for NLP - Olga PetrovaAlexey Grigorev

Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)Sergey Karayev

NLP using transformers Arvind Devaraj

Deep Learning for Natural Language Processing: Word EmbeddingsRoelof Pieters

BERTKhang Pham

Fine tune and deploy Hugging Face NLP modelsOVHcloud

Fine tuning large LMsSylvainGugger

Pre trained language modelJiWenKim

BERT MODULE FOR TEXT CLASSIFICATION.pptxManvanthBC

(Presentation)NLP Pretraining models based on deeplearning -BERT, GPT, and BARThyunyoung Lee

A Simple Explanation of XLNetDomyoung Lee

What's hot (20)

Word embedding

Bert

BERT: Bidirectional Encoder Representations from Transformers

NLP State of the Art | BERT

Introduction For seq2seq(sequence to sequence) and RNN

1909 BERT: why-and-how (CODE SEMINAR)

Word embeddings, RNN, GRU and LSTM

Word2Vec

[Paper review] BERT

Introduction to Transformers for NLP - Olga Petrova

Lecture 4: Transformers (Full Stack Deep Learning - Spring 2021)

NLP using transformers

Deep Learning for Natural Language Processing: Word Embeddings

BERT

Fine tune and deploy Hugging Face NLP models

Fine tuning large LMs

Pre trained language model

BERT MODULE FOR TEXT CLASSIFICATION.pptx

(Presentation)NLP Pretraining models based on deeplearning -BERT, GPT, and BART

A Simple Explanation of XLNet

Similar to A Review of Deep Contextualized Word Representations (Peters+, 2018)

[論文紹介] Deep contextualized word representationsOgataTomoya

State-of-the-Art Text Classification using Deep Contextual Word RepresentationsAusaf Ahmed

AINL 2016: NikolenkoLidia Pivovarova

Marek Rei - 2017 - Semi-supervised Multitask Learning for Sequence LabelingAssociation for Computational Linguistics

Cross-domain Document Retrieval: Matching between Conversational and Formal W...Jinho Choi

Tokenization and how to use it from scratchMahmoud Yasser

NLP from scratch Bryan Gummibearehausen

Jarrar: ORM in Description Logic Mustafa Jarrar

1909 paclicWarNik Chow

Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Saurabh Kaushik

Anthiil Inside workshop on NLPSatyam Saxena

Representation Learning of Text for NLPAnuj Gupta

Natural Language Processing (NLP)Yuriy Guts

Nlp research presentationSurya Sg

An Introduction to Pre-training General Language Representationszperjaccico

230915 paper summary learning to world model with language with details - pub...Seungjoon1

NLP Bootcamp 2018 : Representation Learning of text for NLPAnuj Gupta

[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...Hiroki Shimanaka

NLPJeet Das

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP TasksMasahiro Kaneko

Similar to A Review of Deep Contextualized Word Representations (Peters+, 2018) (20)

[論文紹介] Deep contextualized word representations

State-of-the-Art Text Classification using Deep Contextual Word Representations

AINL 2016: Nikolenko

Marek Rei - 2017 - Semi-supervised Multitask Learning for Sequence Labeling

Cross-domain Document Retrieval: Matching between Conversational and Formal W...

Tokenization and how to use it from scratch

NLP from scratch

Jarrar: ORM in Description Logic

1909 paclic

Engineering Intelligent NLP Applications Using Deep Learning – Part 2

Anthiil Inside workshop on NLP

Representation Learning of Text for NLP

Natural Language Processing (NLP)

Nlp research presentation

An Introduction to Pre-training General Language Representations

230915 paper summary learning to world model with language with details - pub...

NLP Bootcamp 2018 : Representation Learning of text for NLP

[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...

NLP

A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks

Recently uploaded

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N

Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis

the ladakh protest in leh ladakh 2024 sonam wangchuk.pptxhumanexperienceaaa

Porous Ceramics seminar and technical writingrakeshbaidya232001

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat

(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...ranjana rawat

Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile

(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat

Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR9953056974 Low Rate Call Girls In Saket, Delhi NCR

The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat

Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan

UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan

(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat

Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile

Introduction and different types of Ethernet.pptxupamatechverse

College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile

247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1

Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh

Recently uploaded (20)

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS

Processing & Properties of Floor and Wall Tiles.pptx

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...

the ladakh protest in leh ladakh 2024 sonam wangchuk.pptx

Porous Ceramics seminar and technical writing

High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts

(TARA) Talegaon Dabhade Call Girls Just Call 7001035870 [ Cash on Delivery ] ...

Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...

(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...

Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR

The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...

Coefficient of Thermal Expansion and their Importance.pptx

UNIT-V FMM.HYDRAULIC TURBINE - Construction and working

(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts

Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik

Introduction and different types of Ethernet.pptx

College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik

247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt

Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝

A Review of Deep Contextualized Word Representations (Peters+, 2018)

1. Deep contextualized word representations Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer NAACL-HLT 2018

2. Overview • Propose a new type of deep contextualised word representations (ELMo) that model: ‣ Complex characteristics of word use (e.g., syntax and semantics) ‣ How these uses vary across linguistic contexts (i.e., to model polysemy) • Show that ELMo can improve existing neural models in various NLP tasks • Argue that ELMo can capture more abstract linguistic characteristics in the higher level of layers 33 • Overview • Method • Evaluation • Analysis • Comments

3. Example 34 GloVe mostly learns sport-related context ELMo can distinguish the word sense based on the context

4. Method • Embeddings from Language Models: ELMo • Learn word embeddings through building bidirectional language models (biLMs) ‣ biLMs consist of forward and backward LMs ✦ Forward: ✦ Backward: 35 • Overview • Method • Evaluation • Analysis • Comments p(t1, t2, …, tN) = N ∏ k=1 p(tk |t1, t2, …, tk−1) p(t1, t2, …, tN) = N ∏ k=1 p(tk |tk+1, tk+2, …, tN)

5. Method With long short term memory (LSTM) network, predicting the next words in both directions to build biLMs 36 have a nice one … Output layer Hidden layers (LSTMs) Embedding layer … a nice one xk ok k − 1 k − 1 Expanded in the forward direction of kThe forward LM architecture • Overview • Method • Evaluation • Analysis • Comments tk hLM k1 hLM k2

6. Method ELMo represents a word as a linear combination of corresponding hidden layers (inc. its embedding) 37 tk xk ok k − 1 k − 1 hLM k1 ok k + 1 k + 1 hLM k2 hLM k1 hLM k2 Forward LM Backward LM tk tk stask 0 stask 1 stask 2 Concatenate hidden layers hLM k1 hLM k2 hLM k0 ([xk ; xk]) × × × [hLM kj ; hLM kj ] { ELMotask k ×γtask ∑= Unlike usual word embeddings, ELMo is assigned to every token instead of a type biLMs ELMo is a task specific representation. A down-stream task learns weighting parameters

7. ELMo can be integrated to almost all neural NLP tasks with simple concatenation to the embedding layer Method 38 • Overview • Method • Evaluation • Analysis • Comments have a nice ELMo ELMo ELMo biLMs Corpus Usual inputs Enhance inputs with ELMosTrain

8. Many linguistic tasks are improved by using ELMo 39 Evaluation • Overview • Method • Evaluation • Analysis • Comments Q&A Textual entailment Sentiment analysis Semantic role labelling Coreference resolution Named entity recognition

9. Analysis The higher layer seemed to learn semantics while the lower layer probably captured syntactic features 40 • Overview • Method • Evaluation • Analysis • Comments Word sense disambiguation PoS tagging

10. Analysis The higher layer seemed to learn semantics while the lower layer probably captured syntactic features??? 41 • Overview • Method • Evaluation • Analysis • Comments Most models preferred “syntactic (probably)” features Even in sentiment analysis

11. Analysis ELMo-enhanced models can make use of small datasets more efficiently 42 • Overview • Method • Evaluation • Analysis • Comments Textual entailment Semantic role labelling

12. Comments • Pre-trained ELMo models are available at https:// allennlp.org/elmo ‣ AllenNLP is a deep NLP library on top of PyTorch ‣ AllenNLP is a product of AI2 (Allen Institute for Artificial Intelligence) which works on other interesting projects like Semantic Scholar • ELMo can process character-level inputs ‣ Japanese (Chinese, Korean, …) ELMo models likely to be possible 43 • Overview • Method • Evaluation • Analysis • Comments

A Review of Deep Contextualized Word Representations (Peters+, 2018)

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to A Review of Deep Contextualized Word Representations (Peters+, 2018)

Similar to A Review of Deep Contextualized Word Representations (Peters+, 2018) (20)

Recently uploaded

Recently uploaded (20)

A Review of Deep Contextualized Word Representations (Peters+, 2018)