Insertion Position Selection Model for Flexible Non-Terminals in Dependency Tree-to-Tree Machine Translation
Toshiaki Nakazawa (Japan Science and Technology Agency, JST)
John Richardson, Sadao Kurohashi (Kyoto University)
4/11/2016 @ EMNLP2016
Where to insert?
Sentence: I found Pikachu by chance
Word to insert: yesterday
[Figure: candidate insertion positions with probabilities 0.7, 0.25, 0.02, 0.01, 0.01, 0.01]
Where to insert?
Sentence: I found Pikachu by chance yesterday
Phrase to insert: in the park
[Figure: candidate insertion positions with probabilities 0.2, 0.1, 0.6, 0.01, 0.01, 0.01, 0.1; photo caption: @Texas State Capitol]
Dependency Tree-to-Tree Translation
[Figure: three columns, Input / Translation Rules / Output. Input: 私 は 昨日 公園 で ピカチュウ を 見つけた. The rules shown pair 私 は ... [X7] を 見つけた and 偶然 with "I found [X7] by chance", ピカチュウ with "Pikachu", 公園 with "the park", and 昨日 with "yesterday".]
Dependency Tree-to-Tree Translation
Flexible Non-terminals [Richardson+, 2016]
[Figure: the same Input / Translation Rules / Output columns, with flexible non-terminals [X] marking the candidate positions in the target side "I found [X7] by chance" where a floating subtree can be inserted. Two floating subtrees are marked, and several candidate output trees built from I, found, Pikachu, by, chance, yesterday, in, the, park are shown.]
Translation Quality and Decoding Speed w/ and w/o Flexible Non-terminals
• Using ASPEC (Asian Scientific Paper Excerpt Corpus) JE and JC
• Time is the relative decoding time

            Ja->En        En->Ja        Ja->Zh        Zh->Ja
            BLEU   Time   BLEU   Time   BLEU   Time   BLEU   Time
w/o Flex    20.28  1.00   28.77  1.00   24.85  1.00   30.51  1.00
w/ Flex     21.61  6.28   30.57  3.30   28.79  5.16   34.32  5.28
Appropriate Insertion Position Selection
• roughly half of all translation rules were
augmented with flexible non-terminals
[Richardson+, 2016]
• flexible non-terminals make the search space
much bigger -> slower decoding speed,
increased search error
• reduce the number of possible insertion
positions in translation rules with a neural
network model
INSERTION POSITION SELECTION MODEL
Insertion Position Selection Model
• For each insertion position, predict a score given:
– input side: the floating word (I) and its parent word (Ps), with the distance between them (Ds)
– target side: the previous (Sp) and next (Sn) sibling words of the insertion position and the parent (Pt), with the distance (Dt)
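As an illustration, the inputs listed above (I, Ps, Ds, Sp, Sn, Pt, Dt) could be bundled into one record per candidate insertion position. This is only a minimal sketch: the class name, the field names and the example values are assumptions made for illustration and do not come from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsertionFeatures:
    """Hypothetical record: one candidate insertion position for one floating word."""
    floating_word: str    # I: the floating word to be inserted
    source_parent: str    # Ps: parent of the floating word in the input tree
    source_distance: int  # Ds: distance between I and Ps
    prev_sibling: str     # Sp: previous sibling at the insertion position
    next_sibling: str     # Sn: next sibling at the insertion position
    target_parent: str    # Pt: parent of the insertion position
    target_distance: int  # Dt: distance between the insertion position and Pt


# Illustrative values only; the words and distances are chosen for this sketch.
example = InsertionFeatures(
    floating_word="昨日", source_parent="見つけた", source_distance=4,
    prev_sibling="<none>", next_sibling="I",
    target_parent="found", target_distance=-2,
)
```

One such record would be built for every candidate position of a floating word, and the model would score each record separately.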
Information for Selection Model
[Figure: input tree 私 は 昨日 公園 で ピカチュウ を 見つけた and a translation rule with target side "I found [X7] by chance", annotated with Ps, Pt, Sp and Sn for one candidate position [X]; Ds = 4, Dt = -2]
• Non-terminals: reverted to the original word in the parallel corpus (shown as [yesterday] and [found])
Information for Selection Model
[Figure: the same input and translation rule, annotated with Ps, Pt, Sp and Sn for another candidate position [X]; Ds = 4, Dt = -3 = [POST-BOTTOM]; non-terminals shown as [yesterday] and [found]]
Neural Network Model
• Inputs for each insertion position:
– I: word to be inserted
– Ps: parent of I
– Ds: distance from Ps
– Sp: previous sibling
– Sn: next sibling
– Pt: parent of the insertion position
– Dt: distance from Pt
• A fully-connected feed-forward network maps these inputs to one score per insertion position (insertion position 1, 2, ..., N); layer sizes shown in the figure: 220, 100, 1
• The softmax of the scores (e.g. 0.1, 0.6, ..., 0.1) is compared with the gold one-hot vector (0, 1, ..., 0)
• loss = softmax cross-entropy
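The softmax and cross-entropy step described on this slide can be sketched as follows. The feed-forward scorer itself is left out, and the raw scores below are hypothetical numbers standing in for its outputs; the function names are also assumptions for this sketch.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Normalize one raw score per insertion position into probabilities."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], gold_index: int) -> float:
    """Cross-entropy against a one-hot gold vector: -log p(gold position)."""
    return -math.log(probs[gold_index])


# Hypothetical raw network outputs for three candidate positions.
raw_scores = [0.5, 2.3, 0.5]
probs = softmax(raw_scores)
loss = cross_entropy(probs, gold_index=1)
```

Because the gold vector is one-hot, the loss reduces to the negative log probability of the correct insertion position.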
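Training Data Creation
• Training data for the NN model can be automatically created from the word-aligned parallel corpus
– treat each aligned word in turn as the floating word and remove it from the target tree
[Figure: word-aligned trees containing 私, は, ピカチュウ, を, 偶然, 見つけた and I, found, Pikachu, by, chance, with one word removed; the four candidate positions [X] are labeled 0, 0, 0, 1, where 1 marks the original position]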
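The data creation step could look roughly like the sketch below. It is a simplification under stated assumptions: the tree is encoded as a hypothetical mapping from a head word to its ordered children, words are assumed unique, only nodes that already have children offer gaps, and every gap between the remaining children counts as a candidate position. The real set of candidate positions for flexible non-terminals may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical tree encoding: head word -> ordered list of its child words.
Tree = Dict[str, List[str]]


def make_training_example(tree: Tree, floating: str) -> Tuple[List[Tuple[str, int]], List[int]]:
    """Remove `floating` from the tree and label every remaining gap.

    Returns (positions, labels): each position is (head word, gap index) and
    the label is 1 only for the gap the floating word was removed from.
    """
    positions: List[Tuple[str, int]] = []
    labels: List[int] = []
    for head, children in tree.items():
        # Index of the floating word among this head's children, if present.
        gold_gap = children.index(floating) if floating in children else None
        remaining = [c for c in children if c != floating]
        for gap in range(len(remaining) + 1):
            positions.append((head, gap))
            labels.append(1 if gap == gold_gap else 0)
    return positions, labels


# Target tree for "I found Pikachu by chance", with "chance" as the floating word.
target_tree: Tree = {"found": ["I", "Pikachu", "by"], "by": ["chance"]}
positions, labels = make_training_example(target_tree, "chance")
```

Repeating this for every aligned word yields one training example per word, with exactly one positive label each.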
EXPERIMENTS
Insertion Position Selection Experiment
• Parallel corpus: ASPEC-JE/JC (2M/680K sentences)
• Data size

               Ja->En   En->Ja   Ja->Zh   Zh->Ja
Training           15.7M              5.7M
Development        160K               58K
Test               160K               58K
Ave. # IP      3.39     3.15     3.72     3.41

• Comparison
– L2-regularized logistic regression (using Multi-core LIBLINEAR)
Experimental Results

                     Ja->En   En->Ja   Ja->Zh   Zh->Ja
Training                 15.7M              5.7M
Development              160K               58K
Test                     160K               58K
Ave. # IP            3.39     3.15     3.72     3.41
Mean loss            0.089    0.058    0.105    0.056
Top 1 Accuracy (%)   97.08    97.72    96.51    97.99
Top 2 Accuracy (%)   98.94    99.52    98.97    99.56
Logit Accuracy (%)   55.00    89.03    68.04    83.16
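For reference, the Top 1 and Top 2 accuracy figures can be computed as in the sketch below, assuming each test example provides one score per insertion position and the index of the gold position. The function name and the toy data are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Sequence


def top_k_accuracy(all_scores: Sequence[Sequence[float]], gold: Sequence[int], k: int) -> float:
    """Percentage of examples whose gold position is among the k best-scored positions."""
    hits = 0
    for scores, gold_index in zip(all_scores, gold):
        # Position indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if gold_index in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(gold)


# Toy data: three examples with three candidate positions each.
toy_scores = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.3, 0.5]]
toy_gold = [1, 2, 0]
top1 = top_k_accuracy(toy_scores, toy_gold, k=1)
top2 = top_k_accuracy(toy_scores, toy_gold, k=2)
```

Top 2 accuracy is never lower than Top 1 accuracy, since the two best positions always include the best one.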
Translation Experiment
• Parallel corpus: ASPEC-JE/JC (2M/680K
sentences)
• Decoder: KyotoEBMT [Richardson+, 2014]
• 5 settings
– PBSMT and HPBSMT: phrase-based and hierarchical phrase-based SMT
– w/o Flex: not using flexible non-terminals
– w/ Flex: baseline with flexible non-terminals
– Prop: using insertion position selection (only top 1)
• Metrics: BLEU and relative decoding time
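The "only top 1" setting of Prop amounts to keeping the best-scored insertion position and discarding the rest before decoding. A minimal sketch of that pruning step, assuming one model score per candidate position (the function name and the parameter `k` are hypothetical):

```python
from typing import List, Sequence


def prune_insertion_positions(scores: Sequence[float], k: int = 1) -> List[int]:
    """Return the indices of the k highest-scored insertion positions, in tree order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical model scores for four candidate positions of one floating word.
kept_top1 = prune_insertion_positions([0.1, 0.6, 0.2, 0.1])
kept_top2 = prune_insertion_positions([0.1, 0.6, 0.2, 0.1], k=2)
```

A larger `k` keeps more of the search space in exchange for slower decoding; the slides report results only for the top 1 setting.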
Translation Experimental Results

            Ja->En        En->Ja        Ja->Zh        Zh->Ja
            BLEU   Time   BLEU   Time   BLEU   Time   BLEU   Time
PBSMT       18.45  -      27.48  -      27.96  -      34.65  -
HPBSMT      18.72  -      30.19  -      27.71  -      35.43  -
w/o Flex    20.28  1.00   28.77  1.00   24.85  1.00   30.51  1.00
w/ Flex     21.61  6.28   30.57  3.30   28.79  5.16   34.32  5.28
Prop        22.07  2.25   30.50  1.27   29.83  2.21   34.71  1.89
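To read the table as a comparison between w/ Flex and Prop, the BLEU difference and the decoding-time ratio can be computed per language pair. The sketch below uses only the numbers from the table; the dictionary layout and variable names are assumptions for illustration.

```python
# (BLEU, relative decoding time) per language pair, copied from the results table.
results = {
    "Ja->En": {"flex": (21.61, 6.28), "prop": (22.07, 2.25)},
    "En->Ja": {"flex": (30.57, 3.30), "prop": (30.50, 1.27)},
    "Ja->Zh": {"flex": (28.79, 5.16), "prop": (29.83, 2.21)},
    "Zh->Ja": {"flex": (34.32, 5.28), "prop": (34.71, 1.89)},
}

summary = {}
for pair, r in results.items():
    bleu_gain = r["prop"][0] - r["flex"][0]  # positive means Prop scores higher
    speedup = r["flex"][1] / r["prop"][1]    # how many times faster Prop decodes
    summary[pair] = (round(bleu_gain, 2), round(speedup, 2))
```

For Ja->En, for example, this gives a BLEU gain of 0.46 with decoding about 2.79 times faster than w/ Flex, while En->Ja shows a small BLEU drop of 0.07.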
Conclusion
• Proposed an insertion position selection model
to reduce the number of insertion positions
for flexible non-terminals in the translation
rules
• Automatic evaluation scores and decoding
speed are improved
Future Work
• Use grand-children’s info
– Recursive NN [Liu et al., 2015] or Convolutional
NN [Mou et al., 2015]
• Shift to NMT!!
– Actually, we’ve already shifted and participated in the
WAT2016 shared tasks
• However, NMT is still far from perfect
J->E Adequacy in WAT2016
[Figure: stacked bar chart of the adequacy score distribution (scores 1 to 5, 0% to 100%) for three systems; average adequacy values shown: 3.76, 3.71, 3.83]

Team name                BLEU
Kyoto-U (NMT)            26.22
NAIST/CMU (NMT)          26.39
NAIST (2015 best, F2T)   25.41
Thank You!
AD: I’m co-organizing the 3rd Workshop on Asian Translation (WAT2016),
in conjunction with COLING 2016
Invited talk by Google about GNMT!
Please come to the workshop!
http://lotus.kuee.kyoto-u.ac.jp/WAT/

