2. Parts of these slides are adapted from Philipp Koehn's slides
(www.statmt.org)
3. Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
7. The history of machine translation
• 1629
- René Descartes proposes a universal language
- Different tongues share one set of symbols
• 1947
- The transistor is invented; it later replaces vacuum tubes in computers
• 1949 ~
- Rule-based machine translation
• 1954
- First public demo by IBM (the Georgetown-IBM experiment)
• 1993 ~
- Statistical machine translation
• 2013 ~
- Neural machine translation
8. Rule-based translation systems
• Translation rules created by linguistics experts
• Hard to maintain or update
• Performance is still at (or close to) the state of the art
10. Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
14. Evaluation of SMT
• BLEU
- n-gram matching (usually up to 4-grams)
• NIST
- Weights content words more heavily
• RIBES (Hideki Isozaki, 2010)
- Word order also matters
- Better suited for SVO-to-SOV language pairs
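To make the BLEU bullet concrete, here is a minimal single-reference BLEU sketch (the function name and the smoothing-free behavior are my own simplification, not a reference implementation):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        overlap = sum((cand & ref).values())      # clipped n-gram matches
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0                                # no smoothing in this sketch
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0, and any candidate shorter than `max_n` words scores 0.0 here, which is why real implementations add smoothing.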
16. A brief history of the development of SMT
• 1990 ~ 2000
- Word-based models (the IBM models)
- Brown, Och, Ney
• 2003
- Phrase-based models
- Philipp Koehn
• 2005 / 2007
- Hierarchical phrase-based models
- David Chiang
• 2010 ~
- Tree models, factored models
17. Language model
• Models p(the dog is barking)
- To determine which translation candidate is more natural
• Markov assumption
• 5-gram models are most commonly used in SMT
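Under the Markov assumption, the probability of a sentence factors into per-word probabilities conditioned only on the previous n-1 words. A bigram version (kept small for illustration; SMT systems use 5-grams as noted above) might look like:

```python
from collections import Counter, defaultdict

def train_bigram_lm(sentences):
    # Count bigrams, with <s> / </s> sentence-boundary markers.
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

def sentence_prob(counts, sent):
    # p(w_1..w_m) ~ prod_i p(w_i | w_{i-1})  (maximum likelihood, no smoothing)
    prob = 1.0
    tokens = ["<s>"] + sent + ["</s>"]
    for prev, cur in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        prob *= counts[prev][cur] / total if total else 0.0
    return prob
```

A real SMT language model would add smoothing (e.g. Kneser-Ney) so unseen n-grams do not zero out the whole sentence.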
22. How to get word alignments
• In short
- Run GIZA++ on a parallel corpus
- Wait for ~5 hours
• Technically
- The 5 IBM models, HMM alignment models, and the EM algorithm
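The "technically" bullet can be illustrated with the EM training loop of IBM Model 1, the simplest of the five models GIZA++ implements (a toy sketch, not GIZA++'s actual code):

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """EM for IBM Model 1: learn t(e|f), the probability that
    source word f translates to target word e."""
    t = defaultdict(lambda: defaultdict(lambda: 1.0))  # uniform init
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        for f_sent, e_sent in pairs:
            for e in e_sent:                 # E-step: expected alignment counts
                z = sum(t[f][e] for f in f_sent)
                for f in f_sent:
                    c = t[f][e] / z
                    count[f][e] += c
                    total[f] += c
        for f in count:                      # M-step: renormalize counts
            for e in count[f]:
                t[f][e] = count[f][e] / total[f]
    return t

# Two sentence pairs are enough for EM to discover that "das" <-> "the".
pairs = [(["das", "Haus"], ["the", "house"]),
         (["das", "Buch"], ["the", "book"])]
t = train_ibm_model1(pairs)
```

The co-occurrence of "das" with both "the house" and "the book" is what lets EM pull the probability mass of t(the|das) upward.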
27. Phrase-based translation model
Example: "He goes to the curry restaurant"
- Group into phrases: [He] [goes] [to] [the curry restaurant]
- Translate each phrase: [彼は] [行く] [に] [カレー屋]
- Reorder into target word order: 彼は カレー屋 に 行く
39. Resources for SMT
• Parallel corpora
- LDC data
- www.ldc.upenn.edu
- Europarl corpus
- Danish, Dutch, English, Finnish, French,
- German, Greek, Italian, Portuguese, Spanish, Swedish
- Japanese
- NTCIR-8 (3M), ASPEC (3M)
• Word alignment software
- GIZA++, Berkeley Aligner
• Language modelling
- SRILM, Berkeley LM, KenLM
• Decoders
- Moses (maintained by Koehn's group)
- Travatar (Graham Neubig)
40. Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
41. Recent developments in SMT
• Advances in decoders
• Super-large-scale language models
- Language model compression
• Margin Infused Relaxed Algorithm (MIRA)
- Tunes the feature weights of the log-linear model in an online, large-margin fashion
• Tree models
- Tree-to-tree translation
- String-to-tree translation
- Tree-to-string translation
- Forest-to-string translation *
- * Robust to parsing errors
• Factored models
• Pre-reordering
42. What is a parse tree?
(Figures: a context-free grammar parse tree and a dependency grammar parse tree)
44. Pre-reordering phrase-based translation model
Example: "He goes to the curry restaurant"
- Pre-reorder the source into target word order: "He the curry restaurant to goes"
- Group into phrases: [He] [the curry restaurant] [to] [goes]
- Translate monotonically: 彼は カレー屋 に 行く
45. Example of pre-reordering (Japanese → English)
- Original input: 寿命 の 向上 が 実用 化 の 大きな 課題 で あ る 。
- Reordered input (from the restructured parse tree): the life of the improvement va_nsubjpass the practical application of a large problem is .
- Reference: the improvement of the life is a large problem of the practical application.
47. Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
48. Problem of conventional SMT
• Under-fitting (non-parametric approach)
• Solution:
- Deep recurrent neural networks
57. Evaluation result: evaluation scores

System                                            BLEU   RIBES  HUMAN  JPO
Baseline phrase-based SMT                         29.80  0.691  -      -
Baseline hierarchical phrase-based SMT            32.56  0.746  -      -
Baseline tree-to-string SMT                       33.44  0.758  30.00  -
Submitted system 1 (NMT)                          34.19  0.802  43.50  -
Submitted system 2
  (NMT + system combination)                      36.21  0.809  53.75  3.81
Best competitor 1: NAIST
  (Travatar system with NeuralMT reranking)       38.17  0.813  62.25  4.04
Best competitor 2: naver
  (SMT t2s + spell correction + NMT reranking)    36.14  0.803  53.25  4.00
58. (Optional) Findings & insights
‣ Soft-attention models outperform multi-layer encoder-decoder models
‣ Training models on pre-reordered data hurts performance
‣ NMT models tend to produce grammatically valid but incomplete translations
59. Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
60. Can't use monolingual data
• Deep fusion (Gulcehre et al., 2015)
• Integrates a neural language model trained on a massive monolingual corpus
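A rough sketch of the deep-fusion idea: the output layer sees both the translation decoder's hidden state and a gated copy of the pre-trained LM's hidden state (all weight names below are illustrative, not from the paper's code):

```python
import numpy as np

def deep_fusion_logits(s_tm, s_lm, W_tm, W_lm, b, v_g, b_g):
    # A scalar controller gate decides how much the LM state contributes,
    # then both hidden states are projected into output-vocabulary logits.
    g = 1.0 / (1.0 + np.exp(-(v_g @ s_lm + b_g)))   # sigmoid gate in (0, 1)
    return W_tm @ s_tm + W_lm @ (g * s_lm) + b
```

The gate lets the model lean on the monolingual LM for fluent continuations while the translation state keeps it anchored to the source.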
61. The attention mechanism is not perfect
• Local attention (Minh-Thang Luong, 2015)
(Figures: global attention model vs. local attention model)
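For reference, a minimal global (soft) attention step, using dot-product scoring for simplicity:

```python
import numpy as np

def soft_attention(query, keys, values):
    # Score every source position against the decoder query, softmax the
    # scores into weights, and return the weighted sum as the context vector.
    scores = keys @ query
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ values
```

Luong's local variant restricts this softmax to a small window around a predicted source position instead of attending over all positions.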
63. Translation does not cover all the words
• Coverage-based NMT model (Zhaopeng Tu et al., 2016)
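The coverage idea can be caricatured as follows: keep a running sum of past attention weights and push new attention away from source words that are already covered (a deliberate simplification of Tu et al.'s learned coverage vector):

```python
import numpy as np

def attend_with_coverage(query, keys, coverage, penalty=1.0):
    # Subtract accumulated coverage from the attention scores so
    # already-translated source positions are down-weighted.
    scores = keys @ query - penalty * coverage
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights, coverage + weights        # also return updated coverage
```

Over a whole translation, the coverage vector approaches 1 for every source word, which is what discourages dropping words.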
64. Objective function is bad
• Cross-entropy is very different from BLEU
• Solutions:
- (1) Data as Demonstrator (Bengio et al., 2015)
65. Objective function is bad (cont.)
• Cross-entropy is very different from BLEU
• Solutions:
- (2) Mixed REINFORCE (Ranzato et al., 2016)
66. Objective function is bad (cont.)
• Cross-entropy is very different from BLEU
• Solutions:
- (3) Minimum Risk Training (Shen et al., 2015)
(Figure: objective of MRT)
- ~6 BLEU gain in a Chinese-English task
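The MRT objective shown on the slide minimizes the expected loss (risk) under the model distribution; following Shen et al.'s formulation, with a sampled candidate set and a sharpness parameter α, it can be written roughly as:

```latex
% Risk: expected loss \Delta (e.g. negative sentence-level BLEU)
% over a sampled candidate set \mathcal{S}(x^{(s)})
\tilde{\mathcal{R}}(\theta)
  = \sum_{s=1}^{S} \sum_{y \in \mathcal{S}(x^{(s)})}
      Q\left(y \mid x^{(s)}; \theta, \alpha\right)\,
      \Delta\left(y, y^{(s)}\right),
\qquad
Q\left(y \mid x; \theta, \alpha\right)
  = \frac{P(y \mid x; \theta)^{\alpha}}
         {\sum_{y' \in \mathcal{S}(x)} P(y' \mid x; \theta)^{\alpha}}
```

Unlike cross-entropy, this directly optimizes the evaluation metric through Δ, which is where the BLEU gain comes from.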
67. Large vocabulary problem
• The problem
- The English vocab. has 700K words
- So I set the size of the output layer to 700K
- Then I get a memory error
• Solutions
- I still want to use the 700K vocab.
- Noise-contrastive estimation (Gutmann and Hyvarinen, 2010)
- Clustering (Mikolov et al., 2013)
- Approximate learning approach (Jean et al., 2015)
- I give up, cut it to an 80K vocab. and recover <UNK> tokens
- Positional unknown model (Minh-Thang Luong et al., 2015)
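The memory error above is easy to reproduce on paper: the output softmax alone needs a hidden-size × vocabulary weight matrix (the hidden size below is illustrative):

```python
def softmax_layer_bytes(vocab_size, hidden_size, bytes_per_float=4):
    # Weight matrix (hidden_size x vocab_size) plus one bias per word,
    # stored as 32-bit floats.
    return (hidden_size * vocab_size + vocab_size) * bytes_per_float

full = softmax_layer_bytes(700_000, 1_000)   # ~2.8 GB for one layer's weights
cut = softmax_layer_bytes(80_000, 1_000)     # ~0.32 GB after cutting to 80K
```

And that is just the parameters; gradients, optimizer state, and the per-timestep softmax activations multiply the cost further.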
68. Agenda
• Overview of the history
• Statistical machine translation
• Recent developments in SMT
• Neural machine translation
• Some problems of NMT
• Future of MT
69. Future of MT
• Semantic preserving translation
• Character/sub-word level models
• Translation in context
• Low-resource translation
- Knowledge transfer
- Multilingual translation