5. ACL 2019 BEA Challenge
• Building Educational Applications 2019: Shared Task
• Restricted Track
  • Public data only
• Low Resource Track
  • WI+Locness dev (4K) only
6. Data
• Data sources for each track

               Lang8             NUCLE             FCE         WI+Locness
Description    Online English    College student   ESL exam    English essays
               learning site     essays            questions   (native & non-native)
Size           570K              21K               33K         33K (train)
(sentences)                                                    4K (dev)
                                                               4K (test)
Quality        Relatively poor   Good              Good        Good

• Data usage for each track

              Restricted Track              Low Resource Track
Train         Lang8, NUCLE, FCE, WI-train   WI-dev-3k
Template      WI-train                      WI-dev-3k
Fine-tuning   WI-train                      WI-dev-3k
Validation    WI-dev                        WI-dev-1k
7. ERRANT
• ERRor ANnotation Toolkit (Bryant et al., 2017)*
• Automatically annotates parallel English sentences with error type information
• Extracts the edits, then classifies them according to a rule-based error type framework
* Christopher Bryant, Mariano Felice, and Ted Briscoe. 2017. Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver, Canada.
8. ERRANT
Input
  Travel by bus is exspensive, bored and annoying.
Output
  [Travel→Travelling] by bus is [exspensive→expensive], [bored→boring] and annoying.
Error types
  • Travel→Travelling: R:VERB:FORM
  • exspensive→expensive: R:SPELL
  • bored→boring: R:VERB:FORM
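The extract-then-classify idea can be sketched with Python's `difflib`. This reproduces only the edit-extraction step; ERRANT's linguistically informed alignment and its rule-based type classifier are not shown here, and the tokenized sentences are just the slide's example:

```python
import difflib

def extract_edits(source, target):
    """Align two token sequences and return (before, after) edit spans."""
    src, tgt = source.split(), target.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # 'replace', 'insert', or 'delete'
            edits.append((" ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits

edits = extract_edits(
    "Travel by bus is exspensive , bored and annoying .",
    "Travelling by bus is expensive , boring and annoying .",
)
# Each pair is one extracted edit; ERRANT would then assign each a type
# such as R:SPELL or R:VERB:FORM using its rule-based framework.
```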
10. GEC as Low-Resource Machine Translation*
• Translating from erroneous to correct text
• Techniques proposed for low-resource MT are applicable to improving neural GEC
* M. Junczys-Dowmunt, R. Grundkiewicz, S. Guha, K. Heafield: Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task, NAACL 2018.
11. Denoising Autoencoder
• Learns to reconstruct the original input given its noisy version
• Minimize the reconstruction loss
  L(x, dec(enc(x̃)))
  given an input x and a noising function f(x) = x̃
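As a concrete illustration, here is a toy noising function f; the token-dropout rule, drop probability, and example sentence are illustrative choices, not the system's actual settings:

```python
import random

def noise(tokens, drop_prob=0.1, seed=0):
    """Noising function f: randomly drop tokens to produce the noisy x-tilde."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept or tokens  # never return an empty sentence

x = "he goes to school every day".split()
x_noisy = noise(x, drop_prob=0.3, seed=1)
# The denoising autoencoder is then trained to map x_noisy back to x:
#   loss = L(x, dec(enc(x_noisy)))   e.g. token-level cross-entropy
```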
12. Copy-augmented Transformer*
• Combines the Transformer with copy scores
• Copy score: softmax outputs of the encoder-decoder attention
• Pre-trained on a denoising autoencoding task
• Auxiliary losses
  • Token-level labeling
  • Sentence-level copying
* Zhao, Wei, et al. "Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data." NAACL (2019).
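A NumPy sketch of how such copy scores can be combined with the decoder's generation distribution. All values here (the mixing weight alpha, the toy distributions, the vocabulary size) are illustrative; in the actual model the copy distribution comes from the encoder-decoder attention and alpha is predicted per step:

```python
import numpy as np

def mix_distributions(p_vocab, attn, src_ids, alpha, vocab_size):
    """One decoder step of a copy-augmented model:
    (1 - alpha) * generation distribution + alpha * copy distribution,
    where the copy distribution scatters the attention weights onto the
    source tokens' vocabulary ids."""
    p_copy = np.zeros(vocab_size)
    np.add.at(p_copy, src_ids, attn)  # accumulate attention per vocab id
    return (1 - alpha) * p_vocab + alpha * p_copy

vocab_size = 6
p_vocab = np.full(vocab_size, 1 / vocab_size)  # toy generation distribution
attn = np.array([0.7, 0.2, 0.1])               # attention over 3 source tokens
src_ids = np.array([2, 4, 2])                  # the source tokens' vocab ids
p_final = mix_distributions(p_vocab, attn, src_ids,
                            alpha=0.5, vocab_size=vocab_size)
# Source tokens (especially id 2, attended twice) gain probability mass,
# which is why copying helps GEC: most tokens should be kept verbatim.
```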
14. Pipeline (sequential transfer learning)
Preprocessing → Pre-training → Training → Fine-tuning → Postprocessing
• Preprocessing: context-aware spell checker, BPE segmentation
• Pre-training: error extraction, perturbation
• Postprocessing: <unk> edit removal, re-ranking, error type control
15. Preprocessing
• Context-aware spellchecker
  • Example:
    • This is an esay about my favorite sport. (→ essay)
    • This is an esay question. (→ easy)
  • Incorporates context using a pre-trained neural language model (LM)
  • Fix casing errors with a list of proper nouns
• Byte pair encoding (BPE) segmentation
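A minimal sketch of context-aware candidate selection for the "esay" example. The `VOCAB` list and `BIGRAM_SCORE` table are invented stand-ins for the real dictionary and the pre-trained neural LM; only the idea (rank spelling candidates by how well they fit the surrounding words) is from the slides:

```python
import difflib

VOCAB = ["essay", "easy", "east", "sport", "question"]

# Stand-in for a pretrained neural LM: toy bigram scores (higher = more fluent).
BIGRAM_SCORE = {
    ("an", "essay"): 5.0, ("an", "easy"): 4.0,
    ("essay", "question"): 0.5, ("easy", "question"): 4.5,
    ("essay", "about"): 4.5, ("easy", "about"): 0.5,
}

def correct(prev_word, word, next_word):
    """Pick the candidate spelling that best fits the surrounding context."""
    candidates = difflib.get_close_matches(word, VOCAB, n=3, cutoff=0.6)
    if not candidates:
        return word

    def score(c):
        return (BIGRAM_SCORE.get((prev_word, c), 0.0)
                + BIGRAM_SCORE.get((c, next_word), 0.0))

    return max(candidates, key=score)

correct("an", "esay", "about")     # context favours "essay"
correct("an", "esay", "question")  # context favours "easy"
```

The same misspelling gets two different corrections depending on context, which a purely dictionary-based checker such as plain hunspell cannot do.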
16. Pre-training
• Pre-training a seq2seq model on a denoising task
• Realistic noising scenarios
• Token-based approach
  • Extract human edits from annotated GEC corpora
  • Missing punctuation (adding a comma), preposition errors (of→at), verb tense errors (has→have)
• Type-based approach
  • Use a priori knowledge
  • Replace prepositions with other prepositions, nouns with their singular/plural versions, verbs with one of their inflected versions
17. Pre-training
• Generating pre-training data
  • Generate erroneous sentences from high-quality English corpora
  • If a token exists in the dictionary of token edits, a token-based error is generated with probability 0.9
  • If a token is not processed, apply a type-based error

Source               Gutenberg   Tatoeba   WikiText-103
Size (# sentences)   11.6M       1.17M     3.93M
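The generation procedure above can be sketched as follows. `TOKEN_EDITS` is a toy stand-in for the dictionary of human edits extracted from annotated GEC corpora (real dictionaries are far larger), and the type-based fallback implements only a preposition-substitution rule:

```python
import random

# Toy dictionary of extracted human edits (correct form -> erroneous forms).
TOKEN_EDITS = {
    "have": ["has"],        # verb tense error
    "at": ["of", "in"],     # preposition error
    ",": [""],              # dropping a comma
}
PREPOSITIONS = {"of", "at", "in", "on", "for"}

def corrupt(tokens, p_token=0.9, seed=0):
    """Generate an erroneous sentence from a clean one."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if tok in TOKEN_EDITS and rng.random() < p_token:
            out.append(rng.choice(TOKEN_EDITS[tok]))          # token-based error
        elif tok in PREPOSITIONS:                             # type-based fallback
            out.append(rng.choice(sorted(PREPOSITIONS - {tok})))
        else:
            out.append(tok)                                   # leave unchanged
    return [t for t in out if t]  # drop empty tokens (deleted commas)

noisy = corrupt("they have lunch at noon".split(), seed=3)
```

Running this over clean corpora such as Gutenberg, Tatoeba, or WikiText-103 yields (noisy, clean) pairs for denoising pre-training.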
18. Training and Fine-tuning
• Model
  • Transformer*
  • Copy-augmented Transformer
• Fine-tuning
  • Both the development & test sets come from the same source (WI+Locness)
  • Use smaller learning rates
* Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
19. Postprocessing
• <unk> recovery
  • Infrequent tokens are changed to <unk> by BPE tokenization
• LM re-ranking
  • For each changed position, generate the corrected and uncorrected variants and compare their LM perplexities
• Error type control
  • Randomly choose error categories to drop and compute the ERRANT F0.5 score on the validation set
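The LM re-ranking step can be sketched as follows. `toy_score` is a stand-in for fluency under a real LM (with perplexity, lower is better; here higher score is better), and the edit positions are illustrative:

```python
from itertools import product

def candidates(source_tokens, edits):
    """Enumerate sentences with each edit either applied or skipped.
    `edits` maps token position -> corrected token."""
    positions = sorted(edits)
    for mask in product([False, True], repeat=len(positions)):
        toks = list(source_tokens)
        for pos, apply_edit in zip(positions, mask):
            if apply_edit:
                toks[pos] = edits[pos]
        yield " ".join(toks)

def rerank(source_tokens, edits, lm_score):
    """Keep the candidate the LM finds most fluent."""
    return max(candidates(source_tokens, edits), key=lm_score)

# Stand-in scorer: rewards the corrected word forms.
def toy_score(sentence):
    return sentence.count("boring") + sentence.count("expensive")

best = rerank("Travel by bus is exspensive and bored .".split(),
              {4: "expensive", 6: "boring"}, toy_score)
```

With n proposed edits this scores 2^n variants, so in practice each changed position is typically scored locally rather than enumerating all combinations.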
22. Context-Aware Spellchecking
• Our spellchecker adds context to hunspell using a pre-trained neural language model (LM)
• Figure: gains from adding the LM-based approach and from fixing casing issues
23. Comparison of Error Generation
• The performance gap decreases on the Restricted Track
• Our pre-training functions as a proxy for training
24. Results on Error Types
• Token-based error generation
• Type-based error generation
• Context-aware spellchecker
• Challenging to match the “naturalness” of human annotators’ edits