Dependency tree-to-tree translation models are powerful because they can naturally handle long-range reorderings, which are important for distant language pairs. The translation process is easy if it can be accomplished only by replacing non-terminals in translation rules with other rules; however, it is sometimes necessary to adjoin translation rules. Flexible non-terminals have been proposed as a promising solution to this problem. A flexible non-terminal provides several candidate insertion positions for the rules to be adjoined, but it increases the computational cost of decoding. In this paper we propose a neural-network-based insertion position selection model that reduces the computational cost by selecting the appropriate insertion positions. The experimental results show that the proposed model selects the appropriate insertion position with high accuracy. It reduces the decoding time and improves the translation quality owing to the reduced search space.
1. Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree Machine Translation
Toshiaki Nakazawa (Japan Science and Technology Agency, JST)
John Richardson, Sadao Kurohashi (Kyoto University)
4/11/2016 @ EMNLP2016
2. Where to insert?
Example: inserting the floating word "yesterday" into "I found Pikachu by chance".
[Figure: the candidate insertion positions in the target tree with their probabilities 0.70, 0.25, 0.02, 0.01, 0.01, 0.01.]
3. Where to insert?
Example: inserting the floating phrase "in the park" into "I found Pikachu by chance yesterday".
[Figure: the candidate insertion positions with their probabilities 0.2, 0.1, 0.6, 0.01, 0.01, 0.01, 0.1; photo taken at the Texas State Capitol.]
5. Dependency Tree-to-Tree Translation
[Figure: the Japanese input dependency tree for 私 は 昨日 公園 で 偶然 ピカチュウ を 見つけた ("I found Pikachu in the park by chance yesterday"), the translation rules that cover it, and the English output dependency tree. Rules carrying a flexible non-terminal [X7] [Richardson+, 2016] provide several candidate insertion positions; floating subtrees such as "yesterday" and "in the park" are adjoined at one of them. A data-structure sketch follows.]
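To make the mechanism concrete, here is a minimal data-structure sketch, not the authors' implementation: a rule whose flexible non-terminal carries a list of candidate insertion positions for the floating subtree. All names, example tokens, and the slot encoding are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class InsertionPosition:
    parent_index: int   # index of the target-tree node the floating subtree would attach under
    slot: int           # position among that node's children (illustrative encoding)

@dataclass
class TranslationRule:
    source_tokens: List[str]
    target_tokens: List[str]
    # candidate positions offered by the rule's flexible non-terminal
    candidate_positions: List[InsertionPosition] = field(default_factory=list)

# Toy rule roughly corresponding to "[X7] ... found" in the figure:
rule = TranslationRule(
    source_tokens=["[X7]", "見つけた"],
    target_tokens=["[X7]", "found"],
    candidate_positions=[InsertionPosition(1, 0), InsertionPosition(1, 1)],
)
```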
6. Translation Quality and Decoding Speed
w/ and w/o Flexible Non-terminals
• Using ASPEC (Asian Scientific Paper Excerpt Corpus) JE and JC
• Time is relative decoding time (w/o Flex = 1.00)
            Ja->En        En->Ja        Ja->Zh        Zh->Ja
            BLEU   Time   BLEU   Time   BLEU   Time   BLEU   Time
w/o Flex    20.28  1.00   28.77  1.00   24.85  1.00   30.51  1.00
w/ Flex     21.61  6.28   30.57  3.30   28.79  5.16   34.32  5.28
7. Appropriate Insertion Position Selection
• Roughly half of all translation rules were augmented with flexible non-terminals [Richardson+, 2016]
• Flexible non-terminals make the search space much bigger -> slower decoding speed and increased search error
• Reduce the number of possible insertion positions in translation rules with a neural network model (a pruning sketch is shown below)
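As referenced above, a minimal sketch of the assumed pruning step: the selection model scores every candidate insertion position of a rule, and only the top-k positions are kept for decoding. The mapping of probabilities to positions is illustrative; the values are the six probabilities from the earlier "Where to insert?" slide.

```python
def prune_positions(positions, scores, k=2):
    """Keep the k highest-scoring insertion positions."""
    ranked = sorted(zip(positions, scores), key=lambda ps: ps[1], reverse=True)
    return [pos for pos, _ in ranked[:k]]

# Example with six candidates; only the two most probable survive.
positions = ["p0", "p1", "p2", "p3", "p4", "p5"]
scores = [0.02, 0.25, 0.70, 0.01, 0.01, 0.01]
print(prune_positions(positions, scores))  # -> ['p2', 'p1']
```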
8. Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree Machine Translation
10. Insertion Position Selection Model
• For each candidate insertion position, predict a score
• Given:
– source side: the floating word (I) and its parent word (Ps), with the distance (Ds)
– target side: the previous (Sp) and next (Sn) sibling words of the insertion position and its parent (Pt), with the distance (Dt)
(A feature-tuple sketch is shown below.)
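A small sketch of the per-candidate feature tuple just described; the field names follow the slide (I, Ps, Ds, Sp, Sn, Pt, Dt), while the concrete types and example values are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class PositionFeatures:
    I: str    # floating word to be inserted
    Ps: str   # parent of the floating word on the source side
    Ds: int   # distance between I and Ps
    Sp: str   # previous sibling word at the insertion position
    Sn: str   # next sibling word at the insertion position
    Pt: str   # parent word of the insertion position on the target side
    Dt: int   # distance between the insertion position and Pt

# Values loosely following slides 11-12 (the exact words are assumptions):
feats = PositionFeatures(I="yesterday", Ps="見つけた", Ds=4,
                         Sp="I", Sn="[X]", Pt="found", Dt=-2)
```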
11. Information for Selection Model
[Figure: the same Japanese input tree and the translation rule "[X7] found by chance I". For the floating word I = "yesterday" (昨日), the source-side parent is Ps = "found" (見つけた) with distance Ds = 4; a candidate insertion position [X] is characterised by its previous sibling Sp, next sibling Sn, target-side parent Pt = "found", and distance Dt = -2. Non-terminals are reverted to the original word in the parallel corpus.]
12. Information for Selection Model
[Figure: the same example with a different candidate insertion position; its distance Dt = -3 corresponds to the special [POST-BOTTOM] position.]
13. Neural Network Model
[Figure: a fully-connected feed-forward network takes the word to be inserted (I), its parent (Ps) and the distance from Ps (Ds) on the source side, and the previous sibling (Sp), next sibling (Sn), parent of the insertion position (Pt) and the distance from Pt (Dt) on the target side; the extracted figure shows layer widths of 100 and 220. The network outputs a score for each of the N insertion positions (e.g. 0.1, 0.6, ..., 0.1), the scores are normalised with softmax, and the model is trained with softmax cross-entropy loss against the gold one-hot labels (e.g. 0, 1, ..., 0). A hedged sketch of such a network follows.]
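A hedged PyTorch sketch of a network in this spirit: embed the five words and the two distances, run them through a fully-connected feed-forward network, and score each candidate position; scores across the positions of one rule are trained with softmax cross-entropy against the gold position. The embedding and hidden sizes, and the use of bucketed distance embeddings, are assumptions read off the figure, not the exact published model.

```python
import torch
import torch.nn as nn

class InsertionPositionScorer(nn.Module):
    def __init__(self, vocab_size, num_dist_buckets, emb_dim=100, hidden_dim=220):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)         # for I, Ps, Sp, Sn, Pt
        self.dist_emb = nn.Embedding(num_dist_buckets, emb_dim)   # for Ds, Dt (assumed bucketed)
        in_dim = 5 * emb_dim + 2 * emb_dim
        self.ff = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1),          # one scalar score per candidate position
        )

    def forward(self, words, dists):
        # words: (num_positions, 5) word ids, dists: (num_positions, 2) distance-bucket ids
        w = self.word_emb(words).flatten(1)
        d = self.dist_emb(dists).flatten(1)
        return self.ff(torch.cat([w, d], dim=1)).squeeze(-1)      # (num_positions,)

# One rule with 4 candidate insertion positions; gold position is index 1.
scorer = InsertionPositionScorer(vocab_size=50000, num_dist_buckets=21)
words = torch.randint(0, 50000, (4, 5))
dists = torch.randint(0, 21, (4, 2))
scores = scorer(words, dists)                                      # (4,)
loss = nn.functional.cross_entropy(scores.unsqueeze(0), torch.tensor([1]))
```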
14. Training Data Creation
• Training data for the NN model can be automatically created from the word-aligned parallel corpus
– consider each aligned target word as the floating word, remove it from the target tree, and label its original position as the gold insertion position (a small sketch follows the figure below)
[Figure: the word-aligned pair 私 は 偶然 ピカチュウ を 見つけた / "I found Pikachu by chance"; one aligned target word is removed from the target tree as the floating word, and of the four candidate insertion positions [X] the original attachment point is labelled 1, the others 0.]
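A simplified, self-contained sketch of this procedure under assumed data structures: each aligned target word is treated as the floating word, removed from its parent's child list, and the slot it occupied becomes the gold position. For brevity only the slots under the original parent are enumerated, whereas the actual model considers all candidate positions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    word: str
    parent: Optional[int]    # index of the parent node, None for the root
    children: List[int]      # indices of the child nodes, in order

def make_examples(tree: List[Node], aligned: List[int]):
    """For each aligned node: remove it from its parent's child list and mark
    the slot it occupied as the gold insertion position (label 1)."""
    examples = []
    for idx in aligned:
        node = tree[idx]
        if node.parent is None:
            continue                               # the root cannot float
        siblings = tree[node.parent].children
        gold_slot = siblings.index(idx)            # slot the word originally occupied
        remaining = [c for c in siblings if c != idx]
        labels = [1 if slot == gold_slot else 0 for slot in range(len(remaining) + 1)]
        examples.append((node.word, node.parent, remaining, labels))
    return examples

# Toy target tree for "I found Pikachu by chance"; treat "Pikachu" as floating.
tree = [Node("found", None, [1, 2, 3]), Node("I", 0, []),
        Node("Pikachu", 0, []), Node("by", 0, [4]), Node("chance", 3, [])]
print(make_examples(tree, [2]))   # gold label on the slot where "Pikachu" was
```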
20. Conclusion
• Proposed an insertion position selection model to reduce the number of insertion positions for flexible non-terminals in the translation rules
• Automatic evaluation scores and decoding speed are both improved
21. Future Work
• Use grandchildren's information
– Recursive NN [Liu et al., 2015] or Convolutional NN [Mou et al., 2015]
• Shift to NMT!!
– Actually, we have already shifted and participated in the WAT2016 shared tasks
• However, NMT is still far from perfect
23. Thank You!
Advertisement: I'm co-organizing
The 3rd Workshop on Asian Translation
(WAT2016)
in conjunction with COLING 2016
Invited talk by Google about GNMT!
Please come to the workshop!
http://lotus.kuee.kyoto-u.ac.jp/WAT/