Neural AMR: Sequence-to-Sequence
Models for Parsing and Generation
Ioannis Konstas
joint work with Srinivasan Iyer, Mark Yatskar,
Yejin Choi, Luke Zettlemoyer
AMR graph        text

Generate from AMR: Encoder —> Attention —> Decoder (graph —> text)
Parse to AMR:      Encoder —> Attention —> Decoder (text —> graph)

Paired Training —> SOTA
Abstract Meaning Representation
(Banarescu et al., 2013)

I have known a planet that was inhabited by a lazy man.

[Figure: AMR graph — know (ARG0 —> I, ARG1 —> planet); planet (ARG1-of —> inhabit); inhabit (ARG0 —> man); man (mod —> lazy)]

‣ Rooted Directed Acyclic Graph
‣ Nodes: concepts (nouns, verbs, named entities, etc.)
‣ Edges: Semantic Role Labels
Generate from AMR (Input: AMR Graph)

[Figure: the "know" AMR graph above]

Possible outputs:
  I have known a planet that was inhabited by a lazy man.
  I knew a planet that was inhabited by a lazy man.
  I know a planet. It is inhabited by a lazy man.
Parse to AMR (Input: Text)

I have known a planet that was inhabited by a lazy man.

[Figure: the "know" AMR graph above, produced from the input text]
Applications

‣ Text Summarization (Liu et al., 2015)
  sentences —> (Parse each sentence) —> AMR graphs —> summary AMR graph —> (Generate) —> summary

‣ Machine Translation (Jones et al., 2012)
  Source: The children told that lie
  Target: その うそ は 子供 たち が つい た
          sono uso-wa kodomo-tachi-ga tsui-ta
          that lie-TOP child-and-others-NOM breathe-out-PAST
  source text —> (Parse) —> source AMR graph —> (graph-to-graph transformation) —> target AMR graph —> (Generate) —> translation
  [Figure: source AMR over tell / child / lie / that (edges ARG0, ARG1, ARG0-of) mapped to a target AMR over tsuku / kodomo tachi / sono (edges ARG0, ARG1, ARG0-of)]
Existing Approaches

Generate from AMR
‣ MT-based
  ‣ Flanigan et al. 2016, Pourdamghani and Knight 2016, Song et al. 2016
‣ Grammar-based
  ‣ Lampouras and Vlachos 2017, Mille et al. 2017

Parse to AMR
‣ Alignment-based
  ‣ Flanigan et al. 2014, 2017 (JAMR)
‣ Grammar-based
  ‣ Wang et al. 2016 (CAMR), Pust et al. 2015, Artzi et al. 2015, Damonte et al. 2017,
    Goodman et al. 2016, Puzikov et al. 2016, Brandt et al. 2017, Nguyen et al. 2017
‣ Neural-based
  ‣ Barzdins and Gosko 2016, Peng et al. 2017, van Noord and Bos 2017, Buys and Blunsom 2017
Overview
‣ Sequence-to-sequence architecture
‣ End-to-end model without intermediate representations
‣ Linearization of AMR graph to string
‣ Pre-processing
‣ Paired Training
‣ Scalable data augmentation algorithm
Sequence-to-sequence model

Encoder (input)
  know ARG0 I ARG1 (
  [Figure: one encoder hidden state per linearized input token, h_1^(s) … h_5^(s); forward and backward states are concatenated]

Decoder (output)
  ŵ = argmax_w ∏_i p(w_i | w_<i, h^(s))
  [Figure: candidate words at each decoding step — {I, The, A, …}, {know, knew, planet, …}, {a, planet, man, …}, {inhabit, inhabited, was, …}]

Attention
  Input:  know ARG0 I ARG1 ( planet ARG1-of inhabit
  Output: <s> I know the planet of …
  [Figure: at each output step the decoder attends over all encoder states]
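To make the architecture concrete, here is a minimal sketch of the attention encoder-decoder in PyTorch. It is not the authors' implementation; the single layer, the dot-product score function, and the 256-unit size are illustrative assumptions.

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class Seq2SeqAttention(nn.Module):
      def __init__(self, src_vocab, tgt_vocab, dim=256):
          super().__init__()
          self.src_emb = nn.Embedding(src_vocab, dim)
          self.tgt_emb = nn.Embedding(tgt_vocab, dim)
          # Bi-directional encoder over the linearized AMR token sequence.
          self.encoder = nn.LSTM(dim, dim // 2, bidirectional=True,
                                 batch_first=True)
          self.decoder = nn.LSTMCell(dim, dim)
          self.out = nn.Linear(2 * dim, tgt_vocab)

      def forward(self, src, tgt):
          h_s, _ = self.encoder(self.src_emb(src))     # (B, S, dim)
          h = torch.zeros(src.size(0), h_s.size(-1))   # decoder state
          c = torch.zeros_like(h)
          logits = []
          for t in range(tgt.size(1)):                 # teacher forcing
              h, c = self.decoder(self.tgt_emb(tgt[:, t]), (h, c))
              # a_i = softmax(f(h^(s), h_i)); f is a dot product here.
              scores = torch.bmm(h_s, h.unsqueeze(-1)).squeeze(-1)  # (B, S)
              alpha = F.softmax(scores, dim=-1)
              # c_i = sum_j a_ij h_j^(s)
              ctx = torch.bmm(alpha.unsqueeze(1), h_s).squeeze(1)
              logits.append(self.out(torch.cat([h, ctx], dim=-1)))
          # Softmax over these logits gives p(w_i | w_<i, h^(s)).
          return torch.stack(logits, dim=1)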
Linearization
Graph —> Depth First Search (Human-authored annotation)

hold
  :ARG0 (person
    :ARG0-of (have-role
      :ARG1 United_States
      :ARG2 official)
  )
  :ARG1 (meet
    :ARG0 (person
      :ARG1-of expert
      :ARG2-of group)
  )
  :time (date-entity 2002 1)
  :location New_York

[Figure: AMR graph — hold (ARG0 —> person, ARG1 —> meet, time —> date-entity (year 2002, month 1), location —> city "New York"); person (ARG0-of —> have-role (ARG1 —> country "United States", ARG2 —> official)); meet (ARG0 —> person (ARG1-of —> expert, ARG2-of —> group))]

US officials held an expert group meeting in January 2002 in New York .
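A minimal sketch of the depth-first linearization, assuming a toy dict-based graph encoding (node ids mapped to (edge, child) lists, plus a concept table); this illustrates the traversal, it is not the annotation tooling, and the date-entity values are omitted for brevity.

  def linearize(graph, concepts, node):
      # Emit the node's concept, then recurse over its outgoing edges
      # in depth-first order, bracketing children that have children.
      out = [concepts[node]]
      for label, child in graph.get(node, []):
          out.append(":" + label)
          sub = linearize(graph, concepts, child)
          out.append("( " + sub + " )" if graph.get(child) else sub)
      return " ".join(out)

  graph = {
      "h":  [("ARG0", "p1"), ("ARG1", "m"), ("time", "d"), ("location", "ny")],
      "p1": [("ARG0-of", "r")],
      "r":  [("ARG1", "us"), ("ARG2", "o")],
      "m":  [("ARG0", "p2")],
      "p2": [("ARG1-of", "e"), ("ARG2-of", "g")],
  }
  concepts = {"h": "hold", "p1": "person", "r": "have-role",
              "us": "United_States", "o": "official", "m": "meet",
              "p2": "person", "e": "expert", "g": "group",
              "d": "date-entity", "ny": "New_York"}
  print(linearize(graph, concepts, "h"))
  # hold :ARG0 ( person :ARG0-of ( have-role :ARG1 United_States
  #   :ARG2 official ) ) :ARG1 ( meet :ARG0 ( person :ARG1-of expert
  #   :ARG2-of group ) ) :time date-entity :location New_York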
Pre-processing
Linearization —> Anonymization

hold
  :ARG0 (person
    :ARG0-of (have-role
      :ARG1 loc_0
      :ARG2 official)
  )
  :ARG1 (meet
    :ARG0 (person
      :ARG1-of expert
      :ARG2-of group)
  )
  :time (date-entity year_0 month_0)
  :location loc_1

[Figure: the same AMR graph, with the named-entity and date subgraphs — country "United States", date-entity 2002 / 1, city "New York" — replaced by the placeholders loc_0, year_0, month_0, loc_1]

US officials held an expert group meeting in January 2002 in New York .
loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
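A minimal sketch of the anonymization step: named entities and dates are replaced with indexed placeholder tokens on both sides, and a mapping is kept so the surface forms can be restored after generation. The hand-made entity list here is an assumption (in practice it would come from the graph's name and date-entity nodes).

  def anonymize(text, entities):
      """entities: list of (surface_form, placeholder_type) pairs."""
      mapping, counts = {}, {}
      for surface, etype in entities:
          idx = counts.get(etype, 0)
          counts[etype] = idx + 1
          placeholder = f"{etype}_{idx}"
          mapping[placeholder] = surface
          text = text.replace(surface, placeholder)
      return text, mapping

  sent = "US officials held an expert group meeting in January 2002 in New York ."
  anon, mapping = anonymize(sent, [("US", "loc"), ("2002", "year"),
                                   ("January", "month"), ("New York", "loc")])
  print(anon)     # loc_0 officials held ... in month_0 year_0 in loc_1 .
  print(mapping)  # {'loc_0': 'US', 'year_0': '2002', ...}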
Experimental Setup

AMR LDC2015E86 (SemEval-2016 Task 8)
‣ Hand-annotated AMR graphs: newswire, forums
‣ ~16k training / 1k development / 1k test pairs

Train
‣ Optimize cross-entropy loss

Evaluation
‣ BLEU n-gram precision (Generation) (Papineni et al., 2002)
‣ SMATCH score (Parsing) (Cai and Knight, 2013)
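For the generation metric, a minimal sketch using NLTK's corpus-level BLEU; SMATCH, which compares the triples of predicted and gold AMR graphs, is not re-implemented here. The example pair is from the slides.

  from nltk.translate.bleu_score import corpus_bleu

  references = [["US officials held an expert group meeting "
                 "in January 2002 in New York .".split()]]
  hypotheses = ["United States officials held a meeting "
                "in January 2002 .".split()]
  # One list of reference token-lists per hypothesis.
  print(corpus_bleu(references, hypotheses))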
Experiments
‣ Vanilla experiment
‣ Limited Language Model Capacity
‣ Paired Training
‣ Data augmentation algorithm
First Attempt (Generation)

BLEU
  TreeToStr   23.0
  TSP         22.4
  PBMT        26.9
  NeuralAMR   22.0

TreeToStr: Flanigan et al., NAACL 2016
TSP: Song et al., EMNLP 2016
PBMT: Pourdamghani and Knight, INLG 2016

All prior systems use a Language Model trained on a very large corpus.
We will emulate this via data augmentation (Sennrich et al., ACL 2016).
What went wrong?

hold
  :ARG0 (person
    :ARG0-of (have-role
      :ARG1 loc_0
      :ARG2 official)
  )
  :ARG1 (meet
    :ARG0 (person
      :ARG1-of expert
      :ARG2-of group)
  )
  :time (date-entity year_0 month_0)
  :location loc_1

Reference:  US officials held an expert group meeting in January 2002 in New York .
Prediction: United States officials held held a meeting in January 2002 .

‣ Repetition ("held held")
‣ Coverage (the expert group and the location are dropped)

Why?
a) Sparsity
   [Figure: token counts on the training set — Total vocabulary vs. OOV@1 vs. OOV@5]
b) Avg sentence length: 20 words
c) Limited Language Modeling capacity
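A minimal sketch of the sparsity statistic behind the chart, under one common reading of OOV@k: word types seen at most k times in training, which would become unknown after a frequency cutoff of k. The tiny corpus is illustrative.

  from collections import Counter

  def oov_at_k(tokens, k):
      """Count word types occurring at most k times."""
      counts = Counter(tokens)
      return sum(1 for c in counts.values() if c <= k)

  train = ("US officials held an expert group meeting in January 2002 "
           "in New York . US officials held a meeting .").split()
  print(len(set(train)), oov_at_k(train, 1), oov_at_k(train, 5))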
Data Augmentation

‣ Original Dataset: ~16k graph-sentence pairs
‣ Gigaword: ~183M sentences *only* (no AMR graphs)
‣ Sample sentences with vocabulary overlap

[Figure: OOV@1 and OOV@5 rates (%) drop from the Original dataset to the Giga-200k, Giga-2M, and Giga-20M samples]
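A minimal sketch of sampling unlabeled sentences whose vocabulary overlaps the original training vocabulary, so the extra data does not flood the model with new rare words; the threshold and inputs are illustrative assumptions.

  def sample_overlapping(sentences, train_vocab, max_oov=0):
      # Keep sentences with at most max_oov tokens outside the vocabulary.
      for sent in sentences:
          oov = sum(1 for t in sent.split() if t not in train_vocab)
          if oov <= max_oov:
              yield sent

  train_vocab = {"US", "officials", "held", "a", "meeting",
                 "in", "January", "."}
  giga = ["US officials held a meeting in January .",
          "Quarterly earnings exceeded expectations ."]
  print(list(sample_overlapping(giga, train_vocab)))  # keeps only the first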
Data Augmentation

graph        text

Parse to AMR:      Encoder —> Attention —> Decoder (text —> graph)
Generate from AMR: Encoder —> Attention —> Decoder (graph —> text)

Re-train on the automatically produced (graph, text) pairs.
Semi-supervised Learning
‣ Self-training
  ‣ McClosky et al. 2006
‣ Co-training
  ‣ Yarowsky 1995, Blum and Mitchell 1998, Sarkar 2001
  ‣ Søgaard and Rishøj 2010
Paired Training

Train AMR Parser P on Original Dataset (graph, sentence) pairs
for i = 0 … N:
    S_i = Sample k·10^i sentences from Gigaword
    Parse S_i sentences with P                (self-train the parser)
    Re-train AMR Parser P on S_i
Train Generator G on S_N (graph, sentence) pairs
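A runnable sketch of the loop above. train_parser, parse, and train_generator are stand-ins for the actual training and inference routines (their init keyword for fine-tuning is an assumption), and gigaword is assumed to be an in-memory list of sentences.

  import random

  def paired_training(original_pairs, gigaword, train_parser, parse,
                      train_generator, k=200_000, N=3):
      P = train_parser(original_pairs)                  # supervised warm-up
      S = original_pairs
      for i in range(N):
          sample = random.sample(gigaword, k * 10 ** i)  # 200k, 2M, 20M
          S = [(parse(P, s), s) for s in sample]         # self-label S_i
          P = train_parser(S)                            # re-train on S_i
          P = train_parser(original_pairs, init=P)       # fine-tune on gold
      G = train_generator(S)                             # train G on S_N
      G = train_generator(original_pairs, init=G)        # fine-tune G on gold
      return P, G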
Training AMR Parser

Train P on the Original Dataset
Sample S1 = 200k sentences from Gigaword —> parse S1 with P —> train P on the 200k (graph, sentence) pairs —> fine-tune P on the Original Dataset
Repeat with S2 = 2M and S3 = 20M sentences

Fine-tune: init parameters from the previous step and train on the Original Dataset
Training AMR Generator

Sample S4 = 20M sentences from Gigaword —> parse S4 with the final parser P —> train G on the 20M (graph, sentence) pairs —> fine-tune G on the Original Dataset

Fine-tune: init parameters from the previous step and train on the Original Dataset
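A sketch of one fine-tune step, reusing the hypothetical Seq2SeqAttention class from the sequence-to-sequence sketch earlier; the checkpoint name, vocabulary sizes, and lower learning rate are assumptions, not from the slides.

  import torch

  model = Seq2SeqAttention(src_vocab=20_000, tgt_vocab=20_000)
  # Initialize from the previous step (hypothetical checkpoint file).
  model.load_state_dict(torch.load("generator_20M.pt"))
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
  # ...continue training on the ~16k gold (graph, sentence) pairs,
  # monitoring dev BLEU to decide when to stop.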
Final Results (Generation)

BLEU
  TreeToStr        23.0
  TSP              22.4
  PBMT             26.9
  NeuralAMR        22.0
  NeuralAMR-200k   27.4
  NeuralAMR-2M     32.3
  NeuralAMR-20M    33.8

TreeToStr: Flanigan et al., NAACL 2016
TSP: Song et al., EMNLP 2016
PBMT: Pourdamghani and Knight, INLG 2016
Final Results (Parsing)

SMATCH
  SBMT             67.1
  CharLSTM+CAMR    67.3
  Seq2Seq          52.0
  NeuralAMR-20M    62.1

SBMT: Pust et al., 2015
CharLSTM+CAMR: van Noord and Bos, 2017
Seq2Seq: Peng et al., 2017
How did we do? (Generation)

hold
  :ARG0 (person
    :ARG0-of (have-role
      :ARG1 loc_0
      :ARG2 official)
  )
  :ARG1 (meet
    :ARG0 (person
      :ARG1-of expert
      :ARG2-of group)
  )
  :time (date-entity year_0 month_0)
  :location loc_1

Reference:  US officials held an expert group meeting in January 2002 in New York .
Prediction: In January 2002 United States officials held a meeting of the group experts in New York .

A longer example:

Reference: The report stated British government must help to stabilize weak states and push for international regulations that would stop terrorists using freely available information to create and unleash new forms of biological warfare such as a modified version of the influenza virus.

Prediction: The report stated that the Britain government must help stabilize the weak states and push international regulations to stop the use of freely available information to create a form of new biological warfare such as the modified version of the influenza .

Remaining errors: Disfluency ("the Britain government", "the group experts") and Coverage ("terrorists" and "unleash" are dropped).
Summary
‣ Sequence-to-sequence models for Parsing and Generation
‣ Paired Training: scalable data augmentation algorithm
‣ Achieve state-of-the-art performance on generating from AMR
‣ Best-performing Neural AMR Parser
‣ Demo, Code and Pre-trained Models: http://ikonstas.net

Thank You

thank-01
  :ARG1 you
Bonus Slides
Encoding
Linearize —> RNN encoding

hold
  :ARG0 (person
    :ARG0-of (have-role
      :ARG1 United_States
      :ARG2 official)
  )
  :ARG1 (meet
    :ARG0 (person
      :ARG1-of expert
      :ARG2-of group)
  )
  :time (date-entity 2002 1)
  :location New_York

- Token embeddings: embed each token of the linearization (hold, :ARG0, (, person, :ARG0-of, …)
- Recurrent Neural Network (RNN): one hidden state h_i^(s) per input token
- Bi-directional RNN: concatenate the forward and backward states at each position
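A minimal sketch of this encoding pipeline in PyTorch (tokens —> embeddings —> bi-directional RNN states); the vocabulary and layer sizes are illustrative assumptions.

  import torch
  import torch.nn as nn

  tokens = "hold :ARG0 ( person :ARG0-of".split()
  vocab = {t: i for i, t in enumerate(sorted(set(tokens)))}
  ids = torch.tensor([[vocab[t] for t in tokens]])   # (1, 5)

  emb = nn.Embedding(len(vocab), 64)
  birnn = nn.LSTM(64, 32, bidirectional=True, batch_first=True)
  # One concatenated [forward; backward] state per input token.
  h_s, _ = birnn(emb(ids))
  print(h_s.shape)   # torch.Size([1, 5, 64])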
Decoding
RNN Encoding —> RNN Decoding (Beam search)

- init h^(s): the decoder starts from the final encoder state h_N^(s)
- softmax over the output vocabulary at every step
- p(w_i | w_<i, h^(s))

Beam search (beam of 4):
step 1: Holding | Held | Hold | US
step 2: Hold a | Hold the | Held a | Held the
…
step k: The US officials held | US officials held a | US officials hold the | US officials will hold a
next-word candidates: meeting | meetings | meet | …
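A minimal sketch of beam search over a step function that returns log-probabilities for the next word; toy_step is a fixed stand-in for the decoder softmax, so the output is only meant to show the mechanics.

  import math

  def beam_search(step, start, beam_size=4, max_len=10, eos="</s>"):
      beams = [([start], 0.0)]                    # (tokens, log-prob)
      for _ in range(max_len):
          candidates = []
          for tokens, score in beams:
              if tokens[-1] == eos:               # finished hypotheses pass through
                  candidates.append((tokens, score))
                  continue
              for word, logp in step(tokens):     # p(w_i | w_<i, h^(s))
                  candidates.append((tokens + [word], score + logp))
          beams = sorted(candidates, key=lambda b: -b[1])[:beam_size]
      return beams[0][0]

  def toy_step(tokens):                           # fixed toy distribution
      return [("held", math.log(0.6)), ("hold", math.log(0.3)),
              ("</s>", math.log(0.1))]

  print(beam_search(toy_step, "<s>", max_len=3))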
Attention

At each decoder step i (e.g., after emitting w2 = "held"), attend over the encoder states of the linearized input:

  a_i = softmax(f(h^(s), h_i))
  c_i = Σ_j a_ij · h_j^(s)

[Figure: attention heat-map aligning the linearized input
"hold ARG0 ( person role US official ) ARG1 ( meet expert group )"
with the output "US officials held an expert group meeting in January 2002"]
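A minimal numeric sketch of the two equations above, assuming a dot product for the score function f; the shapes and random values are illustrative.

  import numpy as np

  S, D = 6, 8                        # source length, hidden size
  h_src = np.random.randn(S, D)      # encoder states h_j^(s)
  h_i = np.random.randn(D)           # current decoder state

  scores = h_src @ h_i                           # f(h^(s), h_i), one per j
  a_i = np.exp(scores) / np.exp(scores).sum()    # softmax
  c_i = a_i @ h_src                              # c_i = sum_j a_ij h_j^(s)
  print(a_i.round(3), c_i.shape)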

More Related Content

More from Association for Computational Linguistics

Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopAssociation for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Association for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Association for Computational Linguistics
 
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...Association for Computational Linguistics
 
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...Association for Computational Linguistics
 
Toshiaki Nakazawa - 2015 - Overview of the 2nd Workshop on Asian Translation
Toshiaki Nakazawa  - 2015 - Overview of the 2nd Workshop on Asian TranslationToshiaki Nakazawa  - 2015 - Overview of the 2nd Workshop on Asian Translation
Toshiaki Nakazawa - 2015 - Overview of the 2nd Workshop on Asian TranslationAssociation for Computational Linguistics
 
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT SystemHua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT SystemAssociation for Computational Linguistics
 
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...Association for Computational Linguistics
 
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...Association for Computational Linguistics
 
Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...
Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...
Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...Association for Computational Linguistics
 

More from Association for Computational Linguistics (20)

Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
 
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
Terumasa Ehara - 2015 - System Combination of RBMT plus SPE and Preordering p...
 
Toshiaki Nakazawa - 2015 - Overview of the 2nd Workshop on Asian Translation
Toshiaki Nakazawa  - 2015 - Overview of the 2nd Workshop on Asian TranslationToshiaki Nakazawa  - 2015 - Overview of the 2nd Workshop on Asian Translation
Toshiaki Nakazawa - 2015 - Overview of the 2nd Workshop on Asian Translation
 
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT SystemHua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
Hua Shan - 2015 - A Dependency-to-String Model for Chinese-Japanese SMT System
 
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
 
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
Wei Yang - 2015 - Sampling-based Alignment and Hierarchical Sub-sentential Al...
 
Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...
Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...
Katsuhito Sudoh - 2015 Chinese-to-Japanese Patent Machine Translation based o...
 

Recently uploaded

Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...HetalPathak10
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxMadhavi Dharankar
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxDhatriParmar
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
DBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfDBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfChristalin Nelson
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxAnupam32727
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
An Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPAn Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPCeline George
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...Nguyen Thanh Tu Collection
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research DiscourseAnita GoswamiGiri
 
How to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command LineHow to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command LineCeline George
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfChristalin Nelson
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptxmary850239
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 

Recently uploaded (20)

Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,Spearman's correlation,Formula,Advantages,
Spearman's correlation,Formula,Advantages,
 
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
Satirical Depths - A Study of Gabriel Okara's Poem - 'You Laughed and Laughed...
 
Objectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptxObjectives n learning outcoms - MD 20240404.pptx
Objectives n learning outcoms - MD 20240404.pptx
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
CARNAVAL COM MAGIA E EUFORIA _
CARNAVAL COM MAGIA E EUFORIA            _CARNAVAL COM MAGIA E EUFORIA            _
CARNAVAL COM MAGIA E EUFORIA _
 
DBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdfDBMSArchitecture_QueryProcessingandOptimization.pdf
DBMSArchitecture_QueryProcessingandOptimization.pdf
 
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptxCLASSIFICATION OF ANTI - CANCER DRUGS.pptx
CLASSIFICATION OF ANTI - CANCER DRUGS.pptx
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
An Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERPAn Overview of the Calendar App in Odoo 17 ERP
An Overview of the Calendar App in Odoo 17 ERP
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
BÀI TẬP BỔ TRỢ TIẾNG ANH 11 THEO ĐƠN VỊ BÀI HỌC - CẢ NĂM - CÓ FILE NGHE (GLOB...
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Introduction to Research ,Need for research, Need for design of Experiments, ...
Introduction to Research ,Need for research, Need for design of Experiments, ...Introduction to Research ,Need for research, Need for design of Experiments, ...
Introduction to Research ,Need for research, Need for design of Experiments, ...
 
Scientific Writing :Research Discourse
Scientific  Writing :Research  DiscourseScientific  Writing :Research  Discourse
Scientific Writing :Research Discourse
 
How to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command LineHow to Uninstall a Module in Odoo 17 Using Command Line
How to Uninstall a Module in Odoo 17 Using Command Line
 
Indexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdfIndexing Structures in Database Management system.pdf
Indexing Structures in Database Management system.pdf
 
4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx4.9.24 Social Capital and Social Exclusion.pptx
4.9.24 Social Capital and Social Exclusion.pptx
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 

Neural AMR: Sequence-to-Sequence Models for Parsing and Generation

  • 1. Neural AMR: Sequence-to-Sequence Models for Parsing and Generation Ιοannis Konstas joint work with Srinivasan Iyer, Mark Yatskar, Yejin Choi, Luke Zettlemoyer
  • 2. AMR graph text Generate from AMR Attention Encoder Decodergraph text
  • 3. AMR graph text Generate from AMRParse to AMR Attention Encoder Decodergraph text Attention Encoder Decodertext graph
  • 4. AMR graph text Generate from AMRParse to AMR Attention Encoder Decodergraph text Attention Encoder Decodertext graph Paired Training
  • 5. AMR graph text Generate from AMRParse to AMR Attention Encoder Decodergraph text Attention Encoder Decodertext graph SOTA Paired Training
  • 6. Abstract Meaning Representation (Banarescu et al., 2013) know I planet lazy ARG0 ARG1 inhabit man ARG1-of ARG0 mod I have known a planet that was inhabited by a lazy man. ‣ Rooted Directed Acyclic Graph ‣ Nodes: concepts (nouns, verbs, named entities, etc) ‣ Edges: Semantic Role Labels
  • 7. Abstract Meaning Representation (Banarescu et al., 2013) know I planet lazy ARG0 ARG1 inhabit man ARG1-of ARG0 mod I have known a planet that was inhabited by a lazy man. know ‣ Rooted Directed Acyclic Graph ‣ Nodes: concepts (nouns, verbs, named entities, etc) ‣ Edges: Semantic Role Labels
  • 8. Abstract Meaning Representation (Banarescu et al., 2013) know I planet lazy ARG0 ARG1 inhabit man ARG1-of ARG0 mod I have known a planet that was inhabited by a lazy man. know I ‣ Rooted Directed Acyclic Graph ‣ Nodes: concepts (nouns, verbs, named entities, etc) ‣ Edges: Semantic Role Labels
  • 9. Abstract Meaning Representation (Banarescu et al., 2013) know I planet lazy ARG0 ARG1 inhabit man ARG1-of ARG0 mod I have known a planet that was inhabited by a lazy man. know I planet ‣ Rooted Directed Acyclic Graph ‣ Nodes: concepts (nouns, verbs, named entities, etc) ‣ Edges: Semantic Role Labels
  • 10. Abstract Meaning Representation (Banarescu et al., 2013) Input: AMR Graph know I planet lazy ARG0 ARG1 inhabit man ARG1-of ARG0 mod I have known a planet that was inhabited by a lazy man. I knew a planet that was inhabited by a lazy man. I know a planet. It is inhabited by a lazy man. know I planet Generate from AMR ‣ Rooted Directed Acyclic Graph ‣ Nodes: concepts (nouns, verbs, named entities, etc) ‣ Edges: Semantic Role Labels
  • 11. Abstract Meaning Representation (Banarescu et al., 2013) know I planet lazy ARG0 ARG1 inhabit man ARG1-of ARG0 mod I have known a planet that was inhabited by a lazy man. know I planet Parse to AMR Input: Text ‣ Rooted Directed Acyclic Graph ‣ Nodes: concepts (nouns, verbs, named entities, etc) ‣ Edges: Semantic Role Labels
  • 13. Applications ‣ Text Summarization (Liu et al., 2015) sentences:
  • 14. Applications ‣ Text Summarization (Liu et al., 2015) sentences: Parse sentence AMR graphs:
  • 15. Applications ‣ Text Summarization (Liu et al., 2015) summary AMR graph: sentences: Parse sentence AMR graphs:
  • 16. Applications ‣ Text Summarization (Liu et al., 2015) summary AMR graph: sentences: Parse sentence AMR graphs: Generate summary:
  • 17. Applications ‣ Text Summarization (Liu et al., 2015) summary AMR graph: sentences: Parse sentence AMR graphs: The children told that lie Source その うそ は ⼦子供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST Target Generate summary:
  • 18. Applications ‣ Text Summarization (Liu et al., 2015) summary AMR graph: sentences: Parse sentence AMR graphs: The children told that lie Source その うそ は ⼦子供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST Target tell child lie that ARG0 ARG1 ARG0-of ‣ Machine Translation (Jones et al., 2012) Generate summary: Parse AMR graph:
  • 19. Applications ‣ Text Summarization (Liu et al., 2015) summary AMR graph: sentences: Parse sentence AMR graphs: The children told that lie Source その うそ は ⼦子供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST Target Graph-to-graph transformation: tell child lie that ARG0 ARG1 ARG0-of tsuku kodomo tachi sono ARG1 ARG0 ARG0-of ‣ Machine Translation (Jones et al., 2012) Generate summary: Parse AMR graph:
  • 20. Applications ‣ Text Summarization (Liu et al., 2015) summary AMR graph: sentences: Parse sentence AMR graphs: The children told that lie Source その うそ は ⼦子供 たち が つい た sono uso-wa kodomo-tachi-ga tsui-ta that lie-TOP child-and others-NOM breathe out-PAST Target Graph-to-graph transformation: tell child lie that ARG0 ARG1 ARG0-of tsuku kodomo tachi sono ARG1 ARG0 ARG0-of ‣ Machine Translation (Jones et al., 2012) Generate summary: Parse AMR graph: Generate translation:
  • 21. Existing Approaches ‣ MT-based ‣ Flanigan et al. 2016, Pourdamaghani and Knight 2016, Song et al. 2016 ‣ Grammar-based ‣ Lampouras and Vlachos 2017, Mille et al. 2017 Generate from AMR
  • 22. Existing Approaches ‣ MT-based ‣ Flanigan et al. 2016, Pourdamaghani and Knight 2016, Song et al. 2016 ‣ Grammar-based ‣ Lampouras and Vlachos 2017, Mille et al. 2017 Parse to AMR ‣ Alignment-based ‣ Flanigan et al. 2014, 2017 (JAMR) ‣ Grammar-based ‣ Wang et al. 2016 (CAMR), Pust et al. 2015, Artzi et al. 2015, Damonte et al. 2017, Goodman et al. 2016, Puzikov et al. 2016, Brandt et al. 2017, Nguyen et al. 2017 ‣ Neural-based ‣ Barzdins and Gosko 2016, Peng et al. 2017, Noord and Bos 2017, Buys and Blunsom 2017 Generate from AMR
  • 23. Overview ‣ Sequence-to-sequence architecture ‣ End-to-end model w/o intermediate representations ‣ Linearisation of AMR graph to string ‣ Pre-process ‣ Paired Training ‣ Scalable data augmentation algorithm
  • 25. Sequence-to-sequence model Encoderinput know ARG0 I ARG1 ( h1 (s) h2 (s) h3 (s) h4 (s) h5 (s) [ ] [ ] [ ] [ ] [ ]h1 (s) h2 (s) h3 (s) h4 (s) h5 (s)
  • 26. Sequence-to-sequence model Encoder Decoderinput output I The A … know knew planet … a planet man … … inhabit inhabited was … ˆw = argmax w Y i p wi|w<i, h(s) know ARG0 I ARG1 ( h1 (s) h2 (s) h3 (s) h4 (s) h5 (s) [ ] [ ] [ ] [ ] [ ]h1 (s) h2 (s) h3 (s) h4 (s) h5 (s)
  • 27. Sequence-to-sequence model Attention Encoder Decoderinput output know ARG0 I ARG1 ( planet ARG1-of inhabit <s> I know the planet of I The A … know knew planet … a planet man … … inhabit inhabited was … ˆw = argmax w Y i p wi|w<i, h(s) know ARG0 I ARG1 ( h1 (s) h2 (s) h3 (s) h4 (s) h5 (s) [ ] [ ] [ ] [ ] [ ]h1 (s) h2 (s) h3 (s) h4 (s) h5 (s)
  • 28. Linearization Graph —> Depth First Search (Human-authored annotation) hold person meet group ARG0 ARG1 person expert ARG1-of have-role country “United States” official date-entity city “New York”2002 1 time location name ARG1 name ARG2-of ARG0-of ARG2 year month ARG0 US officials held an expert group meeting in January 2002 in New York .
  • 29. Linearization Graph —> Depth First Search (Human-authored annotation) hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity 2002 1) :location New_York hold person meet group ARG0 ARG1 person expert ARG1-of have-role country “United States” official date-entity city “New York”2002 1 time location name ARG1 name ARG2-of ARG0-of ARG2 year month ARG0 US officials held an expert group meeting in January 2002 in New York .
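Depth-first linearization is simple enough to sketch in code. The following is a minimal illustration, not the authors' released implementation: it assumes a hypothetical Node(concept, edges) structure and emits concepts, role labels, and brackets around non-leaf subtrees (the bracketing of date-entity values differs slightly from the hand-written annotation above).

```python
# Minimal sketch of depth-first AMR linearization (illustration only;
# Node and its edge layout are hypothetical).
from dataclasses import dataclass, field

@dataclass
class Node:
    concept: str                                # e.g. "hold", "United_States"
    edges: list = field(default_factory=list)   # list of (role, Node) pairs

def linearize(node: Node) -> list[str]:
    """Emit the concept, then each role followed by its (bracketed) subtree."""
    tokens = [node.concept]
    for role, child in node.edges:
        tokens.append(role)
        if child.edges:                         # internal node: wrap in brackets
            tokens += ["("] + linearize(child) + [")"]
        else:                                   # leaf: emit the bare concept
            tokens.append(child.concept)
    return tokens

# Usage: the running example from the slides.
graph = Node("hold", [
    (":ARG0", Node("person", [
        (":ARG0-of", Node("have-role", [
            (":ARG1", Node("United_States")),
            (":ARG2", Node("official"))]))])),
    (":ARG1", Node("meet", [
        (":ARG0", Node("person", [
            (":ARG1-of", Node("expert")),
            (":ARG2-of", Node("group"))]))])),
    (":time", Node("date-entity", [(":year", Node("2002")),
                                   (":month", Node("1"))])),
    (":location", Node("New_York")),
])
print(" ".join(linearize(graph)))
```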
  • 38. Pre-processing: Linearization → Anonymization
Named entities and values are replaced by indexed placeholder tokens: country "United States" → loc_0, city "New York" → loc_1, 2002 → year_0, 1 → month_0.
Anonymized linearization:
hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity year_0 month_0) :location loc_1
Anonymized sentence:
loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
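A minimal sketch of the anonymization step, assuming the entity spans and their types are already known (e.g. from the AMR graph's name and date-entity nodes); the anonymize helper and its interface are hypothetical. The recorded mapping is what lets us restore the original strings in the generated text afterwards.

```python
# Sketch: replace known entity spans with indexed placeholders (hypothetical helper).
def anonymize(tokens, entities):
    """entities: list of (surface_tokens, type), e.g. (["New", "York"], "loc").
    Returns the anonymized token list and a placeholder -> surface table."""
    counters, mapping, out = {}, {}, list(tokens)
    for surface, etype in entities:
        idx = counters.get(etype, 0)
        counters[etype] = idx + 1
        placeholder = f"{etype}_{idx}"                  # e.g. "loc_0", "year_0"
        mapping[placeholder] = " ".join(surface)
        for i in range(len(out) - len(surface) + 1):    # first occurrence only
            if out[i:i + len(surface)] == surface:
                out[i:i + len(surface)] = [placeholder]
                break
    return out, mapping

sent = "US officials held an expert group meeting in January 2002 in New York .".split()
ents = [(["US"], "loc"), (["New", "York"], "loc"),
        (["2002"], "year"), (["January"], "month")]
anon, table = anonymize(sent, ents)
print(" ".join(anon))   # loc_0 officials held ... in month_0 year_0 in loc_1 .
print(table)            # {'loc_0': 'US', 'loc_1': 'New York', ...}
```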
  • 39. Experimental Setup
Dataset: AMR LDC2015E86 (SemEval-2016 Task 8) ‣ Hand-annotated AMR graphs: newswire, forums ‣ ~16k training / 1k development / 1k test pairs
Training: optimize the cross-entropy loss
Evaluation: ‣ BLEU n-gram precision for Generation (Papineni et al., 2002) ‣ SMATCH score for Parsing (Cai and Knight, 2013)
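For concreteness, corpus-level BLEU can be computed with NLTK as below; this is a stand-in illustration, since the slide does not name the exact scoring script (SMATCH would be computed with the separate tool of Cai and Knight, 2013).

```python
# Sketch: corpus-level BLEU with NLTK (illustrative stand-in for the paper's script).
from nltk.translate.bleu_score import corpus_bleu

# corpus_bleu expects, per hypothesis, a list of reference token lists.
references = [["US officials held an expert group meeting in January 2002 in New York .".split()]]
hypotheses = ["US officials held an expert group meeting in New York .".split()]

print(f"BLEU: {100 * corpus_bleu(references, hypotheses):.1f}")
```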
  • 40. Experiments ‣ Vanilla experiment (reveals limited Language Model capacity) ‣ Paired Training (a data augmentation algorithm)
  • 44. First Attempt (Generation): BLEU scores
‣ TreeToStr (Flanigan et al., NAACL 2016): 23.0
‣ TSP (Song et al., EMNLP 2016): 22.4
‣ PBMT (Pourdamaghani and Knight, INLG 2016): 26.9
‣ NeuralAMR (ours): 22.0
All comparison systems use a Language Model trained on a very large corpus. We will emulate this via data augmentation (Sennrich et al., ACL 2016).
  • 49. What went wrong?
Input: hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity year_0 month_0) :location loc_1
Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: United States officials held held a meeting in January 2002 .
‣ Repetition ("held held")
‣ Coverage ("expert group" and loc_1 are dropped)
Causes: a) Sparsity [chart: total token types vs. OOV@1 and OOV@5 counts, out of ~18k] b) Average sentence length: 20 words c) Limited Language Modeling capacity
  • 52. Data Augmentation
‣ Original Dataset: ~16k graph-sentence pairs
‣ Gigaword: ~183M sentences *only* (no graphs)
‣ Sample sentences with vocabulary overlap
[Chart: OOV@1 and OOV@5 rates (%) for the Original data vs. Giga-200k / Giga-2M / Giga-20M samples]
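A minimal sketch of sampling by vocabulary overlap, assuming the criterion is "keep external sentences whose tokens are (mostly) covered by the original training vocabulary"; the slide does not spell out the exact filter, so the threshold and helper below are hypothetical.

```python
# Sketch: select external sentences whose tokens overlap the training vocabulary.
def sample_by_overlap(ext_sentences, train_vocab, k, min_coverage=1.0):
    """Return up to k external sentences with token coverage >= min_coverage."""
    picked = []
    for sent in ext_sentences:
        toks = sent.split()
        covered = sum(t in train_vocab for t in toks) / max(len(toks), 1)
        if covered >= min_coverage:
            picked.append(sent)
            if len(picked) == k:
                break
    return picked

train_vocab = {"us", "officials", "held", "a", "meeting", "in", "."}
gigaword = ["us officials held a meeting in geneva .",   # "geneva" is OOV: skipped
            "officials held a meeting ."]
print(sample_by_overlap(gigaword, train_vocab, k=200_000))
```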
  • 56. Data Augmentation
[Diagram: two attention encoder-decoders, Parse to AMR (text → graph) and Generate from AMR (graph → text); silver (graph, text) pairs produced by the parser are used to re-train the models]
  • 57. Semi-supervised Learning ‣ Self-training: McClosky et al. 2006 ‣ Co-training: Yarowsky 1995, Blum and Mitchell 1998, Sarkar 2001, Søgaard and Rishøj 2010
  • 65. Paired Training (self-train the Parser, then train the Generator)
Train AMR Parser P on the Original Dataset (graph, text)
for i = 0 … N:
  S_i = sample k·10^i sentences from Gigaword
  Parse S_i with P
  Re-train AMR Parser P on S_i
Train Generator G on S_N (graph, text)
A runnable sketch of this loop follows.
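This sketch mirrors the control flow of Paired Training; the three helpers are stubs standing in for real seq2seq training, fine-tuning, and beam-search parsing, so only the loop structure is meaningful.

```python
# Sketch of the Paired Training loop (stubs replace real model training).
import random

def train(data, init=None):          # stub: train (or re-train) a seq2seq model
    return {"trained_on": len(data), "init": init}

def fine_tune(model, data):          # stub: init from `model`, train on gold data
    return train(data, init=model)

def parse(parser, sentence):         # stub: decode a linearized AMR for `sentence`
    return f"(amr-for {sentence!r})"

def paired_training(original_pairs, gigaword, k=200_000, n_rounds=3):
    parser = train(original_pairs)                        # gold-only parser
    silver = []
    for i in range(n_rounds):                             # samples: 200k, 2M, 20M
        sample = random.sample(gigaword, min(k * 10**i, len(gigaword)))
        silver = [(parse(parser, s), s) for s in sample]  # silver (graph, text)
        parser = train(silver)                            # re-train on silver data
        parser = fine_tune(parser, original_pairs)        # recover gold distribution
    generator = train(silver)                             # train generator on S_N
    return parser, fine_tune(generator, original_pairs)

# Usage with toy data (real runs use the ~16k gold pairs and Gigaword).
print(paired_training([("(g)", "gold sent")],
                      [f"giga sent {i}" for i in range(10)], k=2, n_rounds=2))
```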
  • 76. Training the AMR Parser
1. Train P on the Original Dataset.
2. Sample S_1 = 200k sentences from Gigaword; parse S_1 with P; train P on S_1; fine-tune P on the Original Dataset.
3. Repeat with S_2 = 2M sentences, then S_3 = 20M sentences.
Fine-tune: initialize parameters from the previous step and train on the Original Dataset.
  • 79. Training the AMR Generator
Sample S_4 = 20M sentences from Gigaword; parse S_4 with the final parser P; train G on S_4; fine-tune G on the Original Dataset.
Fine-tune: initialize parameters from the previous step and train on the Original Dataset.
  • 86. Final Results (Generation): BLEU scores
‣ TreeToStr (Flanigan et al., NAACL 2016): 23.0
‣ TSP (Song et al., EMNLP 2016): 22.4
‣ PBMT (Pourdamaghani and Knight, INLG 2016): 26.9
‣ NeuralAMR (no augmentation): 22.0
‣ NeuralAMR-200k: 27.4
‣ NeuralAMR-2M: 32.3
‣ NeuralAMR-20M: 33.8
  • 91. Final Results (Parsing): SMATCH scores
‣ SBMT (Pust et al., 2015): 67.1
‣ CharLSTM+CAMR (van Noord and Bos, 2017): 67.3
‣ Seq2Seq (Peng et al., 2017): 52.0
‣ NeuralAMR-20M: 62.1
  • 93. How did we do? (Generation)
Input: hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity year_0 month_0) :location loc_1
Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: In January 2002 United States officials held a meeting of the group experts in New York .
Remaining errors: disfluency, coverage. A longer example:
Reference: The report stated British government must help to stabilize weak states and push for international regulations that would stop terrorists using freely available information to create and unleash new forms of biological warfare such as a modified version of the influenza virus.
Prediction: The report stated that the Britain government must help stabilize the weak states and push international regulations to stop the use of freely available information to create a form of new biological warfare such as the modified version of the influenza .
  • 95. Summary ‣ Sequence-to-sequence models for Parsing and Generation ‣ Paired Training: a scalable data augmentation algorithm ‣ State-of-the-art performance on generating from AMR ‣ Best-performing neural AMR Parser ‣ Demo, Code and Pre-trained Models: http://ikonstas.net
Thank You (thank-01 :ARG1 you)
  • 103. Encoding: Linearize → RNN encoding
Input tokens: hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity 2002 1) :location New_York
‣ Token embeddings
‣ Recurrent Neural Network (RNN)
‣ Bi-directional RNN: each input position i is represented by the concatenation of forward and backward states, $[\overrightarrow{h}_i^{(s)}; \overleftarrow{h}_i^{(s)}]$
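A minimal PyTorch sketch of this encoder: token embeddings followed by a bi-directional LSTM over the linearized AMR. The dimensions are illustrative, since the slide does not fix hyperparameters.

```python
# Sketch: embedding + BiLSTM encoder over a linearized-AMR token sequence.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                           bidirectional=True)   # concatenated fwd/bwd states

    def forward(self, token_ids):                # (batch, seq_len)
        h = self.embed(token_ids)                # (batch, seq_len, emb_dim)
        states, _ = self.rnn(h)                  # (batch, seq_len, 2*hid_dim)
        return states                            # one h_i^(s) per input token

# Usage: encode a toy 5-token id sequence.
enc = Encoder(vocab_size=1000)
print(enc(torch.randint(0, 1000, (1, 5))).shape)  # torch.Size([1, 5, 512])
```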
  • 109. Decoding: RNN encoding → RNN decoding (beam search)
‣ Initialize the decoder state from $h_N^{(s)}$
‣ A softmax at each step gives $p(w_i \mid w_{<i}, h^{(s)})$
‣ Beam search keeps the top-scoring prefixes at each step, e.g. {Holding, Held, Hold, US, …} → {Hold a, Hold the, Held a, Held the} → … → {The US officials held, US officials held a, US officials hold the, US officials will hold a} → {meeting, meetings, meet, …}
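A minimal sketch of the beam-search procedure, with a stub step function standing in for the real attention-RNN decoder; the toy vocabulary and log-probabilities below are invented for illustration.

```python
# Sketch: beam search over next-token candidates (stub decoder).
import heapq

def step(prefix):
    # Stub next-token distribution; a real model would run the decoder RNN.
    table = {(): [("US", -0.4), ("Held", -1.1)],
             ("US",): [("officials", -0.2), ("<eos>", -3.0)],
             ("US", "officials"): [("held", -0.3), ("<eos>", -2.5)],
             ("US", "officials", "held"): [("<eos>", -0.5)],
             ("Held",): [("<eos>", -0.9)]}
    return table.get(tuple(prefix), [("<eos>", 0.0)])

def beam_search(beam_size=2, max_len=10):
    beams = [(0.0, [])]                               # (log-prob, prefix)
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            if prefix and prefix[-1] == "<eos>":
                candidates.append((score, prefix))    # finished: keep as-is
                continue
            for tok, lp in step(prefix):
                candidates.append((score + lp, prefix + [tok]))
        beams = heapq.nlargest(beam_size, candidates, key=lambda b: b[0])
        if all(p and p[-1] == "<eos>" for _, p in beams):
            break
    return beams[0][1]

print(" ".join(beam_search()))   # US officials held <eos>
```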
  • 116. Attention
While emitting "held", the decoder state $h_3$ attends over the encoder states of the linearized input (hold :ARG0 ( person :ARG0-of …), and the alignment concentrates on the hold token:
$a_i = \mathrm{softmax}\big(f(h^{(s)}, h_i)\big) \qquad c_i = \sum_j a_{ij}\, h_j^{(s)}$
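A minimal PyTorch sketch of one attention step, using a dot product for the scoring function f, which the slide leaves unspecified.

```python
# Sketch: one dot-product attention step over encoder states.
import torch
import torch.nn.functional as F

def attend(enc_states, dec_state):
    """enc_states: (seq_len, d); dec_state: (d,). Returns (context, weights)."""
    scores = enc_states @ dec_state          # f(h_j^(s), h_i) as a dot product
    weights = F.softmax(scores, dim=0)       # a_i = softmax(...)
    context = weights @ enc_states           # c_i = sum_j a_ij * h_j^(s)
    return context, weights

enc = torch.randn(5, 512)                    # states for: hold :ARG0 ( person ...
dec = torch.randn(512)                       # decoder state while emitting "held"
ctx, attn = attend(enc, dec)
print(ctx.shape, attn)                       # torch.Size([512]), 5 weights
```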