Diagnosis of Articulatory Errors of English Learners Using Articulatory Goodness-Of-Pronunciation Features
1. Introduction Articulatory features Method Quant. Analysis Experiments Conclusion
Mispronunciation Diagnosis of L2 English at Articulatory Level Using Articulatory Goodness-Of-Pronunciation Features
Naver Tech Talk
Hyuksu Ryu
Department of Linguistics, Seoul National University, Seoul, Korea
July 3, 2017
Table of Contents
1 Introduction
2 Articulatory features
3 Method
4 Quantitative analysis of salient mispronunciation
5 Experiments
6 Conclusion
Introduction
CALL/CAPT
• Computer-Assisted Language Learning
• Computer-Aided Pronunciation Training
Mispronunciation detection & diagnosis
• Necessary for conducting effective CALL/CAPT
Previous works regarding mispronunciation detection
• Extended recognition network (ERN) based approach
• Harrison et al. (2009)
• Confidence score based approach
• Franco et al. (1997), Witt & Young (2000)
Introduction
Mispronunciation detection - ERN
• Expands the pronunciation dictionary with learners' variants
• by predicting frequent erroneous pronunciation sequences
• When an erroneous pronunciation sequence is recognized
• the learner is considered to have made a pronunciation error
• Drawbacks
• difficult to identify the mispronunciation patterns that learners frequently show for each L1-L2 pair
• difficult to guarantee that the ERN covers most of the possible mispronunciations
Introduction
Mispronunciation detection - Confidence score
• Goodness-Of-Pronunciation (Witt & Young, 2000)
• Virtues
• easy to compute
• L1/L2 independence
• Drawbacks
• difficult to provide corrective feedback
• learners do not know how to interpret a confidence score alone
• no diagnosis is provided for the detected errors
Introduction
Previous works regarding diagnosis for mispronunciation
• Li et al. (2017)
• suggested multi-distribution DNN
• using acoustic features, grapheme, and canonical pronunciation
as input
• to predict the actual pronunciation of learners
• predicted pronunciation ≠ canonical pronunciation → mispronunciation
• Xie et al. (2016)
• extracted landmark features for nasal codas
• spoken by learners of Chinese
• detected pronunciation errors by applying SVM
• diagnosed mispronunciations from the recognition and detection results
Introduction
How is diagnosis performed?
Hierarchy of evaluation outcomes (diagram):
• Pronunciation segments
  • Mispronunciation
    • False Acceptance (FA)
    • True Rejection (TR)
      • Correct Diagnosis (CD)
      • Diagnostic Error (DE)
  • Correct pronunciation
    • True Acceptance (TA)
    • False Rejection (FR)
1 Pronunciation error detector
• distinguishes b/w mispronunciation & correct pronunciation
2 Mispronunciation diagnosis
• carried out for instances that are correctly detected as mispronunciations (True Rejection)
• diagnosis performance - DER (diagnosis error rate)
• defined as the % of incorrectly diagnosed instances among TR: DER = DE / (CD + DE)
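As a rough illustration of the DER definition, the sketch below computes it from raw counts of correct diagnoses (CD) and diagnostic errors (DE). The function name and the example counts are hypothetical and are not taken from the talk.

```python
def diagnosis_error_rate(correct_diagnosis: int, diagnostic_error: int) -> float:
    """DER: share of true rejections (TR = CD + DE) that received a wrong diagnosis."""
    true_rejection = correct_diagnosis + diagnostic_error
    if true_rejection == 0:
        raise ValueError("no true rejections to diagnose")
    return diagnostic_error / true_rejection


# Hypothetical counts, for illustration only.
cd, de = 80, 20
print(f"DER = {diagnosis_error_rate(cd, de):.2%}")  # DER = 20.00%
```

Because DER is computed only over TR, it says nothing about segments the detector missed (FA) or wrongly flagged (FR); those are measured at the detection step.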
Introduction
Limitation of hierarchical approaches for diagnosis
• Provide diagnosis at phone level only
• example: ‘give’ /gɪv/ realized as /gɪb/
• if the system detects the error & recognizes the phone as /b/
• it reports a diagnosis of /v/→/b/
• Better to provide diagnosis information at articulatory level
• for more effective feedback
• diagnosis of fricative → stop, rather than /v/→/b/
• 2-step diagnosis procedure: detection & recognition
• detection errors and recognition errors accumulate
• and lower diagnosis accuracy
Introduction
Previous studies using articulatory features
• Ryu & Chung (2016)
• proposed articulatory Goodness-Of-Pronunciation
• as novel features for pronunciation assessment in English
• Li et al. (2016a)
• extended GOP to speech attributes
• to detect mispronunciation of onset consonants in learners’ Chinese
Goal of this paper
• Propose a method to provide an articulatory diagnosis
• for English produced by Korean learners
• using articulatory Goodness-Of-Pronunciation features
• based on distinctive feature theory
Distinctive features
Phoneme
• The smallest unit that distinguishes meaning b/w words in a particular language
Distinctive features
• Chomsky and Halle (1968)
• The minimal unit that discriminates phonemes in a language
• Phonemes are differentiated by phonological features (Hayes 2008)
• a feature makes two phonemes ‘distinctively’ different
• /p/: [-voice] & /b/: [+voice]
Natural class
• A set of phonemes that share distinctive feature values
• Phoneme - represented as a bundle of distinctive features
• /p/: [-voice, -sonorant, -continuant, . . . , +labial]
Distinctive features
Characteristics of distinctive features
• Binary values
• present / absent
• Possible to distinguish phonemes by multiple distinctive features
• /p/: [-voice, -sonorant, -delayed release, . . . +labial]
• /d/: [+voice, -sonorant, -delayed release, . . . -labial]
• Articulatory properties
• articulatory features in this paper = distinctive features
• based on Hayes (2008)
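To make the idea of binary feature bundles concrete, here is a minimal sketch that stores two phonemes as dictionaries of binary features and lists the features on which they differ. Only four features are shown; the dictionary layout and the function name are illustrative assumptions, not the representation used in the talk.

```python
# Partial feature bundles for /p/ and /b/ (True = "+", False = "-").
# Only four of the features are listed; a full bundle has more.
P = {"voice": False, "sonorant": False, "continuant": False, "labial": True}
B = {"voice": True, "sonorant": False, "continuant": False, "labial": True}


def differing_features(a: dict, b: dict) -> list:
    """Return the names of the features whose binary values differ."""
    return sorted(name for name in a if a[name] != b[name])


print(differing_features(P, B))  # ['voice']
```

A single differing feature is enough to make two phonemes contrast, which is what makes a feature-level diagnosis more specific than a phone-level one.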
List of Distinctive features
24 Articulatory attributes (distinctive features)
• In terms of categories of Manner, Place, and Laryngeal
Category | Attribute | Phonemes
Manner | Consonantal | /p, b, m, f, v, θ, ð, t, d, s, z, n, l, tʃ, dʒ, ʃ, ʒ, ɹ, j, k, g, ŋ, h, w/
Manner | Sonorant | /m, n, l, ɹ, j, ŋ, w, i, u, ɪ, ʊ, ɛ, o, ʌ, ɔ, æ, ɑ, aʊ, aɪ, eɪ, ɔɪ, ɚ/
Manner | Continuant | /f, v, θ, ð, s, z, l, ʃ, ʒ, ɹ, j, h, w, i, u, ɪ, ʊ, ɛ, o, ʌ, ɔ, æ, ɑ, aʊ, aɪ, eɪ, ɔɪ, ɚ/
Manner | Approximant | /l, ɹ, j, w, i, u, ɪ, ʊ, ɛ, o, ʌ, ɔ, æ, ɑ, aʊ, aɪ, eɪ, ɔɪ, ɚ/
Manner | Delayed release | /f, v, θ, ð, s, z, tʃ, dʒ, ʃ, ʒ/
Manner | Nasal | /m, n, ŋ/
Manner | Stop | /p, b, t, d, k, g/
Manner | Fricative | /f, v, θ, ð, s, z, ʃ, ʒ/
Manner | Affricate | /tʃ, dʒ/
Place | Labial | /p, b, m, f, v, u, ʊ, o, ɔ, aʊ, ɔɪ/
Place | Round | /w, u, ʊ, o, ɔ, aʊ, ɔɪ/
Place | Labiodental | /f, v/
Place | Coronal | /θ, ð, t, d, s, z, n, l, tʃ, dʒ, ʃ, ʒ, ɹ, ɚ/
Place | Anterior | /θ, ð, s, z, n, l, ɚ/
Place | Distributed | /θ, ð, tʃ, dʒ, ʃ, ʒ, ɹ, ɚ/
Place | Strident | /s, z, tʃ, dʒ, ʃ, ʒ/
Place | Lateral | /l/
Place | Dorsal | /j, k, g, ŋ, w/
Place | High | /j, k, g, ŋ, w, i, u, ɪ, ʊ, aɪ, eɪ, ɔɪ/
Place | Low | /æ, ɑ, aʊ, aɪ/
Place | Front | /j, i, ɪ, ɛ, æ, aɪ, eɪ, ɔɪ/
Place | Back | /w, u, ʊ, o, ʌ, ɔ, ɑ, aʊ, aɪ, ɔɪ/
Place | Tense | /j, w, i, u, ɛ, o, ɔɪ, eɪ, ɚ/
Laryngeal | Voice | /b, m, v, ð, d, z, n, l, dʒ, ʒ, ɹ, j, g, ŋ, w, i, u, ɪ, ʊ, ɛ, o, ʌ, ɔ, æ, ɑ, aʊ, aɪ, eɪ, ɔɪ, ɚ/
Goodness-Of-Pronunciation (GOP)
Goodness-Of-Pronunciation (GOP)
• Suggested by Witt & Young (2000)
• To detect individual pronunciation errors
• Defined as the normalized posterior probability
• The distance b/w the phone of learners & native AM
\[
\mathrm{GOP}(p) \equiv \frac{\log P(o_p \mid p)}{N(p)} - \frac{\log \max_{i=1}^{I} P(o_p \mid q_i)}{N(p)}
\]
• N(p): # of frames composing the target phone p
• P(o_p | q_i): the prob. of observing o_p given the phone q_i
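The GOP formula can be sketched in a few lines once segment-level log-likelihoods are available. The sketch below assumes they have already been obtained from an acoustic model (for example via forced alignment and free phone recognition); the function name and the example values are hypothetical.

```python
def gop(loglik_target: float, logliks_all: list, n_frames: int) -> float:
    """Frame-normalized GOP of one segment.

    loglik_target: log P(o_p | p), log-likelihood under the canonical phone p
    logliks_all:   log P(o_p | q_i) for every candidate phone q_i
    n_frames:      N(p), number of frames in the segment
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    # log is monotonic, so log(max_i P) equals max_i(log P).
    return (loglik_target - max(logliks_all)) / n_frames


# Hypothetical log-likelihoods for a 10-frame segment.
print(gop(-52.0, [-52.0, -47.0, -60.0], 10))  # -0.5
```

A score of 0 means the canonical phone is also the best-scoring phone; more negative scores indicate that some other phone explains the segment better.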
Articulatory GOP (aGOP)
Articulatory GOP (aGOP)
• Suggested in this paper
• Used to compare articulatory characteristics b/w natives and
learners w.r.t articulatory attributes
• Also used for pronunciation assessment (Ryu & Chung 2016)
\[
\mathrm{aGOP}^{k}(p) \equiv \frac{\log P(o_p \mid q^{k})}{N(p)} - \frac{\log \max_{i} P(o_p \mid q^{k}_{i})}{N(p)}
\]
• k: the index of the articulatory attribute
• q^k: the canonical value of the k-th articulatory attribute at the position of the forced-aligned target segment p
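A minimal sketch of how a vector of aGOPs could be assembled, one score per attribute. It assumes each attribute is modeled with two values (present/absent) and that segment-level log-likelihoods for both values are already available; the data layout, names, and numbers are hypothetical.

```python
def agop_vector(logliks: dict, canonical: dict, n_frames: int) -> dict:
    """Compute one aGOP per articulatory attribute for a single segment.

    logliks:   {attribute: {True: loglik if present, False: loglik if absent}}
    canonical: {attribute: canonical binary value for the target phone}
    n_frames:  N(p), number of frames in the segment
    """
    scores = {}
    for attr, by_value in logliks.items():
        best = max(by_value.values())  # log max_i P(o_p | q_i^k)
        scores[attr] = (by_value[canonical[attr]] - best) / n_frames
    return scores


# Hypothetical values: a canonical /v/ whose voicing model prefers "absent".
logliks = {"voice": {True: -40.0, False: -30.0},
           "labial": {True: -25.0, False: -45.0}}
canonical = {"voice": True, "labial": True}
print(agop_vector(logliks, canonical, 10))  # {'voice': -1.0, 'labial': 0.0}
```

With all 24 attributes filled in, the resulting vector plus the phone-level GOP would give the 25 explanatory variables used later for diagnosis.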
Previous study using articulatory features
Li et al. (2016b)
• Mispronunciation detection for learners of Mandarin
• Focused on mispronunciation detection of onset consonants
• Articulatory modeling in terms of categories
• only 4 articulatory models: manner, place, voice, aspiration
• each category - multiple attributes
• limitation: low performance when a category has many attributes, such as place (Li et al. 2016a)
This study
• Articulatory modeling based on each attribute
• binary modeling: presence/absence
• Specify articulatory attributes in more detail based on phonological theory
• richer articulatory information
• use them for mispronunciation diagnosis
Corpus and Annotation
Corpus
• ETRI English speech corpus produced by Korean learners
• 21,110 sentences (21 hours)
• 151 learners
Annotation
• Phone-level transcription
• Ten Korean annotators
• expertise in phonetics/phonology
• experience in phone-level transcription
• 88.13% phone-level agreement (Ryu et al., 2012)
Acoustic model
Acoustic model
• AM for English native speech
• Using WSJ corpus of 37,000 sentences
• CD-DNN-HMM AM
• 39-Dim. MFCC+∆+∆∆
• using the default configurations of the Kaldi toolkit
• In addition to phone AM
• articulatory AM also trained in terms of articulatory attributes
• in order to compute aGOPs
Diagnosis modeling
Articulatory diagnosis framework (flowchart)
1 Forced alignment/Recognition
2 GOP/aGOPs extraction
3 Is the forced-aligned segment a consonant?
• Yes → Voicing/Place/Manner diagnosis
• No → Rounding/Height/Backness diagnosis
Diagnosis modeling
Articulatory diagnosis
• Based on forced-alignment, examine whether the
corresponding segment is a consonant or a vowel
• Articulatory diagnosis in the case of consonants
• voicing
• place of articulation
• manner of articulation
• Articulatory diagnosis in the case of vowels
• rounding
• height
• backness
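The consonant/vowel branching can be sketched as a simple lookup. The consonant inventory below is taken from the Consonantal row of the attribute table; the function name and the choice to key on the canonical phone of the forced-aligned segment are assumptions made for illustration.

```python
# Consonant inventory from the "Consonantal" attribute row.
CONSONANTS = set("p b m f v θ ð t d s z n l tʃ dʒ ʃ ʒ ɹ j k g ŋ h w".split())


def diagnosis_levels(canonical_phone: str) -> tuple:
    """Pick the three articulatory levels to diagnose for a forced-aligned segment."""
    if canonical_phone in CONSONANTS:
        return ("voicing", "place", "manner")
    return ("rounding", "height", "backness")


print(diagnosis_levels("ð"))  # ('voicing', 'place', 'manner')
print(diagnosis_levels("ɔ"))  # ('rounding', 'height', 'backness')
```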
Articulatory diagnosis framework
Articulatory Diagnosis for Consonants
• Explanatory variables: 24 aGOPs + GOP
• Response variable:
• Binary value - correct/incorrect at each articulatory level
• by comparing canonical pronunciation & the actual realization
• Example of /θ/→/s/
Row | Phone | Voice | Place | Manner
canonical | /θ/ | voiceless | dental | fricative
actual | /s/ | voiceless | alveolar | fricative
response | - | correct | incorrect | correct
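The response variables in the example can be derived mechanically by comparing the canonical and actual phones level by level. The sketch below covers only three phones; the lookup table and function name are illustrative assumptions, and a real system would need entries for the full inventory.

```python
# (voice, place, manner) for the phones needed in this illustration only.
ARTICULATION = {
    "θ": ("voiceless", "dental", "fricative"),
    "s": ("voiceless", "alveolar", "fricative"),
    "t": ("voiceless", "alveolar", "stop"),
}
LEVELS = ("voice", "place", "manner")


def response_labels(canonical: str, actual: str) -> dict:
    """Label each articulatory level as correct/incorrect for one realization."""
    pairs = zip(LEVELS, ARTICULATION[canonical], ARTICULATION[actual])
    return {level: "correct" if c == a else "incorrect" for level, c, a in pairs}


print(response_labels("θ", "s"))
# {'voice': 'correct', 'place': 'incorrect', 'manner': 'correct'}
```

Each level then becomes the target of its own binary classifier, with the 25 GOP/aGOP scores as inputs.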
Articulatory diagnosis framework
Articulatory diagnosis modeling
• Feed-Forward Neural Network (FFNN)
• for each articulatory-level diagnosis
• Implemented in TensorFlow (Abadi et al., 2015)
• Hyper-parameters & configurations
• # of hidden layers: [3, 4, 5, 6, 7]
• # of nodes per layer: [128, 256, 512, 1024]
• act. func.: Exponential Linear Unit (Clevert et al., 2016)
• dropout rate: 0.5
• weight initialization: He initialization (He et al., 2015)
• learning rate: 0.005
• up to 10,000 epochs, with early stopping based on the accuracy on the validation set
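The search space and a few of the listed components can be sketched without any deep-learning library. The block below enumerates the layer/node grid, defines ELU and the He-normal standard deviation in their standard forms, and shows one simple way to do early stopping on validation accuracy. The `patience` parameter and the sample accuracy curve are assumptions for illustration; the talk does not state the stopping criterion in detail.

```python
import itertools
import math

HIDDEN_LAYERS = [3, 4, 5, 6, 7]
NODES_PER_LAYER = [128, 256, 512, 1024]

# Every (layers, nodes) combination to be compared on the validation set.
grid = list(itertools.product(HIDDEN_LAYERS, NODES_PER_LAYER))


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential Linear Unit: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def he_stddev(fan_in: int) -> float:
    """Standard deviation of He-normal initialization for a layer with fan_in inputs."""
    return math.sqrt(2.0 / fan_in)


def best_epoch(val_accuracies: list, patience: int = 3) -> int:
    """Return the epoch with the best validation accuracy seen before stopping.

    Training stops once `patience` epochs pass without improvement
    (patience is a hypothetical setting, not taken from the talk).
    """
    best_idx, best_acc = 0, float("-inf")
    for epoch, acc in enumerate(val_accuracies):
        if acc > best_acc:
            best_idx, best_acc = epoch, acc
        elif epoch - best_idx >= patience:
            break
    return best_idx


print(len(grid))  # 20
print(best_epoch([0.60, 0.70, 0.72, 0.71, 0.70, 0.69, 0.90]))  # 2
```

In the second call, training stops at the sixth value, so the late jump to 0.90 is never seen; that is the usual trade-off of a small patience.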
Quantitative analysis
Determining the most noticeable variations
• Appear only in the learners’ speech
• Among salient phones, choose variations that are more frequent than in native speech (Hong et al., 2014)
• /d, t/
1 deletion in consonant clusters (‘just’ /dʒʌst/→/dʒʌs/)
2 flapping (‘body’ /bɑdi/→/bɑɾi/)
• such variations are also frequent in natives’ speech (Hong et al., 2014)
• not included in the list of the most noticeable variations
• Adopting the analysis of Hong et al. (2014)
• Treat the most noticeable variations as salient mispronunciation patterns
Quantitative analysis
Salient mispronunciations in consonants at articulatory level
Level | Canon. (count) | Act. | Example | Freq. | Ratio
Voicing | /z/ (3,425) | /s/ | does /dʌz/→/dʌs/ | 2,935 | 85.69%
Voicing | /v/ (1,434) | /f/ | love to /lʌv tʊ/→/lʌf tʊ/ | 305 | 21.27%
Place | /ð/ (3,488) | /d/ | this /ðɪs/→/dɪs/ | 3,235 | 92.75%
Place | /θ/ (613) | /s/ | thing /θɪŋ/→/sɪŋ/ | 213 | 34.75%
Place | /θ/ (613) | /t/ | thank /θæŋk/→/tæŋk/ | 331 | 54.00%
Manner | /ð/ (3,488) | /d/ | this /ðɪs/→/dɪs/ | 3,235 | 92.75%
Manner | /θ/ (613) | /s/ | thing /θɪŋ/→/sɪŋ/ | 213 | 34.75%
Manner | /v/ (1,434) | /b/ | give /gɪv/→/gɪb/ | 766 | 53.42%
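The Ratio column is consistent with dividing Freq. by the count shown in parentheses next to the canonical phone. A quick check, with a hypothetical helper name:

```python
def ratio_percent(freq: int, canonical_count: int) -> float:
    """Share of a canonical phone's counted instances realized as a given variant."""
    return round(100.0 * freq / canonical_count, 2)


print(ratio_percent(2935, 3425))  # 85.69  (/z/ -> /s/)
print(ratio_percent(766, 1434))   # 53.42  (/v/ -> /b/)
```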
Quantitative analysis
Salient mispronunciations in consonants at articulatory level
• Voicing
• devoicing
• /z/→/s/: mainly occurs word-finally
• /v/→/f/: mostly caused by regressive assimilation
• Place of articulation
• dental→alveolar
• dentals do not exist among L1 phonemes
• Manner of articulation
• fricative→stop
• learners fail to produce fricatives
• that do not exist in L1
• and substitute them w/ the corresponding stops
Quantitative analysis
Salient mispronunciations in vowels at articulatory level
Level | Canon. (count) | Act. | Example | Freq. | Ratio
Round | /ɑ/ (2,381) | /o/ | project /prɑdʒɛkt/→/prodʒɛkt/ | 295 | 12.39%
Height | /ɑ/ (2,381) | /o/ | project /prɑdʒɛkt/→/prodʒɛkt/ | 295 | 12.39%
Height | /ɔ/ (1,690) | /o/ | law /lɔː/→/lo/ | 735 | 43.49%
Height | /ʌ/ (6,317) | /ɑ/ | another /ənʌðɚ/→/ənɑðɚ/ | 1,106 | 17.51%
Height | /ʌ/ (6,317) | /æ/ | and /ʌnd/→/ænd/ | 1,030 | 16.31%
Backness | /ʌ/ (6,317) | /æ/ | and /ʌnd/→/ænd/ | 1,030 | 16.31%
Backness | /ʌ/ (6,317) | /ɛ/ | Helen /hɛlən/→/hɛlɛn/ | 654 | 10.35%
Quantitative analysis
Salient mispronunciations in vowels at articulatory level
• Rounding
• unrounded→rounded
• Height
• raising: low→mid
• lowering: mid→low
• Backness
• fronting: back→front
Reasons for variations
• The phoneme does not exist in L1 and is replaced w/ the most similar phoneme
• /ɔ/→/o/
• Orthographic interference (Hong et al. 2015)
• ‘project’ /prɑdʒɛkt/→/prodʒɛkt/
• influenced by the grapheme ‘o’ for /ɑ/
Experimental setup
Articulatory diagnosis experiment
• Based on the corpus analysis of salient mispronunciations
• 7 salient phones
Data balancing
• Correct ≫ incorrect → bias problem
• Use other phones’ correctly pronounced observations as mispronounced samples of the target segment (Li et al., 2016)
Data split
• Training : test = 8:2
• 1:1 balance of correct/incorrect in training & test set
• Augmented instances - only in training set
• Validation = 20% of training set
• to determine hyper-parameters of FFNN
Experimental setup
Details of training, validation, and test sets
Category | Phone | Training (Validation) | Test | Total
consonant | /z/ | 14,685 (2,937) | 3,671 | 18,356
consonant | /ð/ | 18,367 (3,673) | 4,591 | 22,958
consonant | /θ/ | 4,447 (889) | 1,111 | 5,558
consonant | /v/ | 12,893 (2,578) | 3,223 | 16,116
vowel | /ɑ/ | 14,314 (2,862) | 3,578 | 17,892
vowel | /ɔ/ | 13,624 (2,724) | 3,405 | 17,029
vowel | /ʌ/ | 61,077 (12,215) | 15,269 | 76,346
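The set sizes follow from the 8:2 split with 20% of the training set held out for validation. The sketch below reproduces them from the totals; the rounding directions (training rounded up, validation rounded down) are an assumption that happens to match the reported numbers, and the function name is hypothetical.

```python
def split_sizes(total: int) -> tuple:
    """Return (training, validation, test) sizes for one phone.

    Assumed rounding: training = ceil(0.8 * total), validation = floor(0.2 * training).
    The validation set is a subset of the training set, not an extra partition.
    """
    training = -(-total * 8 // 10)  # ceiling division in integer arithmetic
    test = total - training
    validation = training // 5
    return training, validation, test


print(split_sizes(18356))  # (14685, 2937, 3671)
print(split_sizes(76346))  # (61077, 12215, 15269)
```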
Experimental results
Performance of articulatory diagnosis in consonants
• On average: > 70% accuracy & > .75 F1 score
• The proposed method is effective for articulatory diagnosis
Phone | Level | Accuracy | Precision | Recall | F1
/z/ | voicing | 70.14% | 0.683 | 0.890 | 0.773
/z/ | place | 85.57% | 0.857 | 0.877 | 0.867
/z/ | manner | 79.38% | 0.821 | 0.825 | 0.823
/ð/ | voicing | 83.60% | 0.837 | 0.898 | 0.866
/ð/ | place | 60.50% | 0.623 | 0.670 | 0.646
/ð/ | manner | 62.13% | 0.632 | 0.852 | 0.726
/θ/ | voicing | 79.68% | 0.814 | 0.857 | 0.835
/θ/ | place | 65.83% | 0.672 | 0.697 | 0.684
/θ/ | manner | 71.76% | 0.761 | 0.830 | 0.794
/v/ | voicing | 80.18% | 0.821 | 0.859 | 0.840
/v/ | place | 75.43% | 0.795 | 0.842 | 0.818
/v/ | manner | 71.40% | 0.751 | 0.815 | 0.782
average | voicing | 78.40% | 0.789 | 0.876 | 0.828
average | place | 71.83% | 0.737 | 0.772 | 0.754
average | manner | 71.17% | 0.741 | 0.831 | 0.781
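The F1 column is the harmonic mean of precision and recall. A short check against two rows of the table, using a hypothetical helper:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.683, 0.890), 3))  # 0.773
print(round(f1_score(0.632, 0.852), 3))  # 0.726
```

The first call reproduces the voicing row for /z/ and the second the manner row for /ð/, where high recall offsets relatively low precision.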
Experimental results
Performance of articulatory diagnosis in consonants
• Performance of place for /ð, θ/
• slightly lower than average
• Why?
• inter-dental fricatives
• relatively low amplitude (low energy)
• mispronunciations are difficult to distinguish
• these factors affect the performance
Experimental results
Performance of articulatory diagnosis in vowels
• Low performance at certain articulatory levels
• Training sets contain variations to diphthongs
• /ɑ/→/aɪ/
• Diphthongs - drastic articulatory change within a segment
• ex. /ɔɪ/
• mid→high in height
• back→front in backness
Conclusion
In this paper,
• We proposed a method to provide an articulatory diagnosis
• for English spoken by Korean learners
• using articulatory Goodness-Of-Pronunciation (aGOP) features
• based on distinctive feature theory in phonology
Previous studies on diagnosis have a limitation
• They carry out diagnosis at phone level
• Diagnosis needs to be performed at articulatory level for corrective feedback
Conclusion
We performed
• Articulatory diagnosis modeling
• consonants: voicing, place, and manner of articulation
• vowels: rounding, height, and backness
• Corpus-based analysis of salient mispronunciation patterns
Results
• The proposed method for articulatory diagnosis achieves
• > 70% accuracy & > .75 F1-score for all articulatory levels
• except height in vowels
• The proposed method is effective for mispronunciation diagnosis at articulatory level
Conclusion
Limitations
• Only decides whether the pronunciation is correct or not at each articulatory level
• Does not provide corrective feedback on how to correct the pronunciation
In future work,
• Need to extend the experiment to provide corrective feedback at articulatory levels