Interspeech 2019 Survey Talk: When Attention Meets Speech Applications

ASAPP, One World Trade Center, 80th Floor, New York, 10007
asapp.com Confidential - Not for further distribution
Kyu J. Han, Ramon Prieto, Tao Ma
When Attention Meets
Speech Applications
September 16, 2019

Confidential - Not for further distribution
Intro
“ATTENTION” In Interspeech 2019
Very Deep Self-attention Networks for End-to-End Speech Recognition
Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention
Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention
Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile
Phonetically-aware embeddings - Wide Residual Networks with Time-Delay Neural
Networks and Self Attention models for the 2018 NIST Speaker Recognition Evaluation
A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews
RWTH ASR System for LibriSpeech: Hybrid vs Attention
Speaker Adaptation for Attention-Based End-to-End Speech Recognition
Large Margin Training for Attention Based End-to-End Speech Recognition
Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models
Attention model for articulatory features detection
Attention based Hybrid I-vector BLSTM Model for Language Recognition
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS
Self Attention in Variational Sequential Learning for Summarization
Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling
Conversational Emotion Analysis via Attention Mechanisms
An analysis of local monotonic attention variants
Lattice generation in attention-based speech recognition models
A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting
Individual differences in implicit attention to phonetic detail in speech perception
Learning how to listen: A temporal-frequential attention model for sound event detection
An Online Attention-based Model for Speech Recognition
Online Hybrid CTC/Attention Architecture for End-to-end Speech Recognition
The influence of distraction on speech processing: How selective is selective attention?
Environment-dependent Attention-driven Recurrent Convolutional Neural Network for Robust Speech Enhancement
Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation
Using Attention Networks and Adversarial Augmentation for Styrian Dialect Continuous Sleepiness and Baby Sound Recognition
Multi-task multi-resolution char-to-BPE cross-attention decoder for end-to-end speech recognition
Multi-Stride Self-Attention for Speech Recognition
Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning
Attention-based word vector prediction with LSTMs and its application to the OOV problem in ASR
Multi-stream Network With Temporal Attention For Environmental Sound Classification
Few-Shot Audio Classification with AttentionalGraph Neural Networks
Vectorized Beam Search for CTC-Attention-based Speech Recognition
Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition
Spatio-Temporal Attention Pooling for Audio Scene Classification
Multi-Scale Time-Frequency Attention for Rare Sound Event Detection
A new time-frequency attention mechanism for TDNN and CNN-LSTM-TDNN - with application to language identification
An Attention-Based Hybrid Network for Automatic Detection of Alzheimer’s Disease from Narrative Speech
Automatic Hierarchical Attention Neural Network for Detecting Alzheimer’s Disease
Neural Text Clustering with Document-level Attention based on Dynamic Soft
Labels
End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw
Waveform
Pyramid Memory Block and Timestep Attention for Speech Emotion Recognition
Cross-AttentionEnd-to-End ASR for Two-Party Conversations
Self-AttentionTransducers for End-to-End Speech Recognition
Variational Attentionusing Articulatory Priors for generating Code Mixed Speech using Monolingual Corpora

Intro
● Around 50 papers, with the titles including “ATTENTION”
● Diverse areas being applied
○ Speech recognition
○ Speaker recognition
○ Language recognition
○ Emotion recognition
○ Speech synthesis
○ Audio classification
○ Event detection
○ Semantic classification
“ATTENTION” In Interspeech 2019

TABLE OF CONTENTS
1. Attention
2. Attention in Speech Recognition
3. Attention in Speaker Recognition
4. Pay Attentions on Challenges!
5. Conclusions / Q&A

● Understands where to pay more
attention
ATTENTION
Attention
Source: commons.wikimedia.org

attention
● Common to humans
○ Visual attention
ATTENTION
Attention

Attention
attention
ATTENTION

Attention
Source: giphy.com
attention
○ Auditory attention
ATTENTION

Attention
Source: cbsnews.com
attention
○ Social attention
ATTENTION

Attention
Source: giphy.com
attention
● Common to human decision making
○ Family meeting
ATTENTION

Attention
Source: metroatlantahome.com
attention
○ Family meeting
○ House price
ATTENTION

Attention
ATTENTION
attention
○ Family meeting
○ House price
● In neural networks,
○ “Generating sequences with
RNNs”, by A. Graves (2013)

Attention
ATTENTION
attention
○ Family meeting
○ House price
A. Graves, "Generating sequences with
recurrent neural networks", 2013.

Attention
ATTENTION
attention
○ Family meeting
○ House price
■ Soft windowing

Attention
ATTENTION
attention
○ Family meeting
○ House price
■ Soft windowing
■ Gaussian convolution
■ Location-aware attention

Attention
ATTENTION
attention
○ Family meeting
○ House price
■ Soft windowing
○ “Neural machine translation by
jointly learning to align and
translate”, D. Bahdanau, K. Cho
and Y. Bengio (2014/2015)

Attention
ATTENTION
attention
○ Family meeting
○ House price
■ Soft windowing
■ Content-aware attention
D. Bahdanau, et al., ”Neural machine
translation by jointly learning to align
and translate", 2014/2015.
:

Attention
ATTENTION
attention
○ Family meeting
○ House price
■ Soft windowing
○ “Attention is all you need”, A.
Vaswani, et al. (2017)

Attention
ATTENTION
attention
○ Family meeting
○ House price
■ Soft windowing
○ “Attention is all you need”, A.
Vaswani, et al. (2017)
■ Multi-head attention
■ No-recurrence
A. Vaswani, et al., ”Attention is all you
need", 2017.

Attention in End-to-End ASR
CTC RNN Transducer Seq-to-Seq
R. Prabhavalkar, et al., ”A comparison of
sequence-to-sequence models for
speech recognition", 2017.

● CTC + attention (2018)
○ Hybrid attention
○ Implicit LM
○ Component attention
○ 20% (relative) impr. In WER
A. Das, et al., ”Advancing connectionist
temporal classification with attention
modeling", 2018.

● RNN-T + attention (2017)
○ Combines RNN-T w/ attention
○ Content-aware attention
○ Marginal impr. obtained
RNN-T RNN-T w/ Attention

Seq-to-Seq

● Same structure with Bahdanau’s
neural translation model (2014/15)
First Attention in Speech

● Same structure with Bahdanau’s
neural translation model (2014/15)
○ Encoder-decoder architecture w/ attention
First Attention in Speech
J. Chorowski, et al., “End-to-end continuous
speech recognition using attention-based
recurrent NN: First results", 2014/15.

● ARSG using hybrid attention (2015)
○ Addressed the limitation of content-aware
attention  hybrid attention
Attention-based Recurrent Sequence Generator

● ARSG using hybrid attention (2015)
○ Addressed the limitation of content-aware
attention  hybrid attention
Attention-based Recurrent Sequence Generator
(F: Convolving matrix)
J. Chorowski, et al., “Attention-based
models for speech recognition", 2014/15.

● Two improvements for LVCSR (2016)
○ Windowing on attention during training
○ Frame pooling
■ Similar with LAS’s pyramidal encoder
structure
Improved ARSG
D. Bahdanau, et al., “End-to-end attention-
based large vocabulary speech recognition",
2016.

● Combination w/ CTC objective (2017)
○ Joint CTC/attention decoding
○ Main model architecture in ESPnet
(https://github.com/espnet/espnet)
Multi-Objective Training
S. Watanabe, et al., “Hybrid CTC/attention
architecture for end-to-end speech
recognition", 2017.

● LAS (2015)
○ Pyramidal encoder structure from
downsampling
Listen, Attend and Spell

● LAS (2015)
○ Pyramidal encoder structure from
downsampling
Listen, Attend and Spell
W. Chan, et al., “Listen, attend and
spell", 2015.

● Multi-head attention (2018)
○ Inspired by Transformer (A. Vaswani, 2017)
○ Replacing single head attention
Further Development of LAS
C. Chiu, et al., ”State-of-the-art speech
recognition with sequence-to-sequence
models", 2018.

● Multi-head attention (2018)
○ Inspired by Transformer (A. Vaswani, 2017)
○ Replacing single head attention
● SpecAugment (2019)
○ Data augmentation to LAS
○ Achieved state-of-the-art results on LibriSpeech and
SWBD
Further Development of LAS
C. Chiu, et al., ”State-of-the-art speech
recognition with sequence-to-sequence
models", 2018.
D. Park, et al., “SpecAugment: A simple data
augmentation method for automatic

Performance of Seq-to-Seq w/ Attention
D. Park, et al., “SpecAugment: A simple data
augmentation method for automatic
LibriSpeech SWBD

● Non-recurrence structure
○ Inspired by FIR approximation on IIRs
○ Exploits memory blocks
○ Can model long-term dependency, even
without recurrence in its structure
Feedforward Sequential Memory Network
Recurrent Feedback in RNN as IIR
Memory Blocks in FSMN as FIR
S. Zhang, et al.,
”Feedforward
sequential memory
networks without
recurrent feedback",
2015.

● Non-recurrence structure
○ Inspired by FIR approximation on IIRs
○ Exploits memory blocks
○ Can model long-term dependency, even
without recurrence in its structure
Feedforward Sequential Memory Network
Deep FSMN
c-FSMNFSMN
S. Zhang, et al.,
”Deep-FSMN for
large vocabulary
continuous speech
recognition ",
2018.

● Speech-Transformer
○ Transformer applied to Mandarin Chinese
○ With convolution layers on inputs
Multi-Head Self-Attention
L. Dong, et al., “Speech-
Transformer: A no
recurrence sequence-to-
sequence model for speech
recognition", 2018.

● Transformer with convolutions
○ Convolutional contexts applied to inputs, similarly
A. Mohamed, et al.,
“Transformers with
convolutional context for
ASR", 2019.

D. Povey, et al., “A time-
restricted self-attention
layer for ASR", 2018.
● Time-restricted self-attention
○ Left & right contexts restricting the attention mechanism
○ Relative positional encoding
○ Encoder structure only
○ LF-MMI objective

K. Han, et al., “Multi-stride
self-attention for speech
recognition", 2018.

● Self-attention network (SAN) with CTC
○ CTC objective
J. Salazar, et al., “Self-
attention networks for
connectionist temporal
classification in speech
recognition", 2019.

● Attention in speaker verification
○ Just averaged embeddings for a given utterance to make them a
fixed-length representation in the past
○ Applied attention on such embeddings, instead
Deep Speaker Embedding w/ Attention
G. Bhattacharya, et al.,
“Deep speaker
embeddings for short-
duration speaker
verification", 2017.

● Attention in speaker verification
○ Just averaged embeddings for a given utterance to make them a
fixed-length representation in the past
○ Applied attention on such embeddings, instead
● Feedforward networks w/ attention
C. Raffel, et al., “Feed-
forward networks with
attention can solve some
long-term memory
problems", 2017.

● Attentive statistics pooling
○ Appends standard deviation to weighted mean after attention
K. Okabe, et al.,
“Attentive statistics
pooling for deep speaker
embedding", 2018.

S. Zhang, et al., “End-
to-end attention based
text-dependent
speaker verification",
2016.
● Multimodal attention in speaker verification
○ Attention on phonetic and speaker representation for the wake
word “Hey Cortana”
○ Combining keyword spotting with speaker verification

● D-vectors in LSTM
○ Generates embedding through LSTMs
G. Heigold, et al., “End-
to-end text dependent
speaker verification",
2016.

○ Attention applied to get normalized weights for hidden
embedding
F. Chowdhury, et al.,
“Attention-based models for
text-dependent speaker

○ Attention applied to get normalized weights for hidden
embedding
F. Chowdhury, et al.,
“Attention-based models for
text-dependent speaker
Cross-layer Attention Divided-layer Attention

● Self-attentive embedding
○ Extension of x-vector w/ structured self-attention from sentence embedding
Z. Lin, et al., “Structured
self-attentive sentence
embedding", 2017.

● Self-attentive embedding
○ Extension of x-vector w/ structured self-attention
○ Multi-heads
Y. Zhu, et al., “Self-
attentive speaker
embeddings for text-
independent speaker

Challenges: Attention in Online
● Can we attend monotonically?
Attention Monotonic Chunkwise Attention
C. Chui, et al.,
“Monotonic chunkwise
attention", 2018.

Challenges: Speech Frames
● Are they ideal as basic units?
http://jalammar.github.io/illustrated-bert/

● Are they ideal as basic units?
http://jalammar.github.io/illustrated-bert/
https://towardsdatascience.com/deconstructing-bert-part-2-
visualizing-the-inner-workings-of-attention-60a16d86b5c1

● Some effort exist…
○ Multi-resolution of
speech frames in multi-
stream self-attention
○ But, the question
remains…
K. Han, et al., “State-of-the-art speech
recognition using multi-stream self-attention
with dilated 1D convolutions", 2019.

Lots of Areas ATTENDED
● Example
○ Multimodal emotion recognition
J. Li, et al., “Attentive to individual: A
multimodal emotion recognition network
with personalized attention profile", 2019.

References
1. Alex Graves, “Generating sequence with recurrent neural networks,” arXiv:1308.0850 [cs], Aug. 2013.
2. Dzmitry Bahdanau, Kyunghyun Cho and Yoshua Bengio, “Neural machine translation by jointly learning to align and translate,” ICLR, May 2015,
arXiv:1409.0473 [cs], Sep. 2014.
3. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin, “Attention is all you need”,
arXiv:1706.03762 [cs], June 2017.
4. Rohit Prabhavalkar, Kanishka Rao, Tara N. Sainath, Bo Li, Leif Johnson and Navdeep Jaitly, “A comparison of sequence-to-sequence model for speech
recognition,” Interspeech, Aug. 2017.
5. Amit Das, Jinyu Li, Rui Zhao and Yifan Gong, “Advancing connectionist temporal classification with attention modeling,” ICASSP, April 2018.
6. Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho and Yoshua Bengio, “End-to-end continuous speech recognition using attention-based recurrent NN:
First results,” Deep Learning and Representation Learning Workshop @NIPS, Dec. 2014.
7. Jan Chorowski, Dzmitry Bahdanau, Dimitriy Serdyuk, Kyunghyun Cho and Yoshua Bengio, “Attention-based models for speech recognition,” NIPS, Dec. 2015.
8. Dzmitry Bahdanau, Jan Chorowski, Dimitriy Serdyuk, Philemon Brakel and Yoshua Bengio, “End-to-end attention-based large vocabulary speech
recognition,” ICASSP, March 2016.
9. Suyoun Kim, Takaaki Hori and Shinji Watanabe, “Joint CTC-attention based end-to-end speech recognition using multi-task learning,” ICASSP, March 2017.
10. Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey and Tomoki Hayashi, “Hybrid CTC/attention architecture for end-to-end speech recognition,”
Journal of Selected Topics in Signal Processing, vol. 11, no. 8, Dec. 2017.
11. Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin
Chen, Adithya Renduchintala and Tsubasa Ochiai, “ESPnet: End-to-End Speech Processing Toolkit,” Interspeech, Sept. 2018.
12. William Chan, Navdeep Jaitly, Quoc V. Le and Oriol Vinyals, “Listen, attend and spell,” arXiv:1508.01211 [cs], Aug. 2015.
13. Chung-Cheng Chiu, Tara N. Sainath, Yonghui Wu, Rohit Prabhavalkar, Patrick Nguyen, Zhifeng Chen, Anjuli Kannan, Ron J. Weiss, Kanishka Rao, Ekaterina
Gonina, Navdeep Jaitly, Bo Li, Jan Chorowski, Michiel Bacchiani, “State-of-the-art speech recognition with sequence-to-sequence models,” ICASSP, April
2018.
14. Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk and Quoc V. Le, “SpecAugment: A simple data augmentation
method for automatic speech recognition,” Interspeech, Sept. 2019.
REFERENCES

References
15. Albert Zeyer, Kazuki Irie, Ralf Schluter and Hermann Ney, “Improved training of end-to-end attention models for speech recognition,” Interspeech, Sept.
2018.
16. Kazuki Irie, Rohit Prabhavalkar, Anjuli Kannan, Antoine Bruguier, David Rybach and Patrick Nguyen, “On the Choice of modeling unit for sequence-to-
sequence speech recognition,” Interspeech, Sept. 2019.
17. Albert Zeyer, Andre Merboldt, Ralf Schluter and Hermann Ney, “A comprehensive analysis on attention models”, Interpretability and Robustness in Audio,
Speech, and Language Workshop @NIPS, Dec. 2018.
18. Liang Lu, Xingxing Zhang and Steve Renals, “On training the recurrent neural network encoder-decoder for large vocabulary end-to-end speech
recognition,” ICASSP, March 2016.
19. Shubham Toshniwal, Hao Tang, Liang Lu and Karen Livescu, “Multitask learning with low-level auxiliary tasks for encoder-decoder based speech
recognition,” Interspeech, Aug. 2017.
20. Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu, “Improving attention based sequence-to-sequence models for end-to-
end English conversational speech recognition,” Interspeech, Sept. 2018.
21. Shiliang Zhang, Hui Jiang, Si Wei and Lirong Dai, “Feed- forward sequential memory neural networks without recurrent feedback,” arXiv:1510.02693 [cs],
Oct. 2015.
22. Shiliang Zhang, Cong Liu, Hui Jiang, Si Wei, Lirong Dai and Yu Hu, “Feedforward sequential memory networks: A new structure to learn long-term
dependency,” arXiv:1512.08301 [cs], Dec. 2015.
23. Shiliang Zhang, Hui Jiang, Shifu Xiong, Si Wei and Li-Rong Dai, “Compact feedforward sequential memory networks for large vocabulary continuous
speech recognition,” Interspeech, Sept. 2016.
24. Shiliang Zhang, Ming Lei, Zhijie Yan and Lirong Dai, “Deep-FSMN for large vocabulary continuous speech recognition,” arXiv:1803.05030 [cs], March 2018.
25. Xuerui Yang, Jiwei Li and Xi Zhou, “A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition”, arXiv:1810.11352 [cs], Oct. 2018.
26. Linhao Dong, Shuang Xu and Bo Xu, “Speech-Transformer: A no-recurrence sequence-to-sequence model for speech recognition,” ICASSP, April 2018.
27. Shiiyu Zhou, Linhao Dong, Shuang Xu and Bo Xu, “A comparison of modeling units in sequence-to-sequence speech recognition with the Transformer on
Mandarin Chinese,” arXiv:1805.06239 [cs], May 2018.
28. Shiyu Zhou, Linhao Dong, Shuang Xu and Bo Xu, “Syllable-based sequence-to-sequence speech recognition with the Transformer in Mandarin Chinese,”
Interspeech, Sept. 2018.
REFERENCES

References
29. Abdelrahman Mohamed, Dmytro Okhonko and Lukr Zettlemoyer, “Transformers with convolutional context for ASR,” arXiv:1904.11660 [cs], April 2019.
30. Daniel Povey, Hossein Hadian, Pegah Ghahremani, Ke Li and Sanjeev Khudanpur, ”A time-restricted self-attention layer for ASR,” ICASSP, April 2018.
31. Kyu J. Han, Jing Huang, Yun Tang, Xiaodong He and Bowen Zhou, “Multi-stride self-attention for speech recognition,” Interspeech, Sept. 2019.
32. Julian Salazar, Katrin Kirchhoff and Zhiheng Huang, ”Self-attention networks for connectionist temporal classification in speech recognition,” ICASSP, May
2019.
33. Shaoshi Ling, Julian Salazar and Katrin Kirchhoff, “Contextual phonetic pretraining for end-to-end utterance-level language and speaker recognition,”
Interspeech, Sept. 2019.
34. Yuanyuan Zhao, Jie Li, Xiaorui Wang and Yan Li, “The Speechtransformer for large-scale Mandarin Chinese speech recognition,” ICASSP, May, 2019.
35. Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stuker and Alex Waibel, “Self-attention acoustic models,” Interspeech, Sept. 2018.
36. Ngoc-Quan Pham, Thai-Son Nguyen, Jan Niehues, Markus Muller, Sebastian Stuker and Alex Waibel, “Very deep self-attention networks for end-to-end
speech recognition,” Interspeech, Sept. 2019.
37. Dong Yu and Jinyu Li, “Recent progress in deep learning based acoustic models (updated),” IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 3, 2017.
38. Gautam Bhattacharya, Jahangir Alam and Patrick Kenny, “Deep speaker embeddings for short-duration speaker verification,” Interspeech, Aug. 2017.
39. Colin Raffel and Daniel P. W. Ellis, “Feed-forward networks with attention can solve some long-term memory problems,” ICLR, May 2015.
40. Koji Okabe, Takafumi Koshinaka and Koichi Shinoda, “Attentive statistics pooling for deep speaker embedding,” Interspeech, Sept. 2018.
41. Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li and Yifan Gong, “End-to-end attention based text-dependent speaker verification,” SLT, Dec. 2016.
42. Georg Heigold, Ignacio Moreno, Samy Bengio and Noam Sharzeer, “End-to-end text dependent speaker verification,” ICASSP, March 2016.
43. F. A. Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno and Li Wan, “Attention-based models for text-dependent speaker verification,”
arXiv:11710.10470 [cs], Oct. 2017.
44. Yann N. Dauphin, Angela Fan, Michael Auli and David Grangier, “Language modeling with gated convolutional networks,” arXiv:1612.08083 [cs], Dec. 2016.
45. Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou and Yoshua Bengio, “A structured self-attentive sentence
embedding,” ICLR, April 2017.
46. Yingke Zhu, Tom Ko, David Snyder, Brian Mak, Daniel Povey, “Self-attentive speaker embedding for text-independent speaker verification,” Interspeech,
Sept. 2018.
REFERENCES

References
47. Qiongqiong Wang, Koji Okabe, Kong Aik Lee, Hitoshi Yamamoto and Takafumi Koshinaka, “Attention mechanism in speaker recognition: What does it learn
in deep speaker embedding?,” SLT, Dec. 2018.
48. Chung-Cheng Chiu and Colin Raffel, “Monotonic chunkwise attention,” ICLR, May 2018.
49. Kyu J. Han, Ramon Prieto and Tao Ma, “State-of-the-art speech recognition using multi-stream self-attention with dilated 1D convolutions,” ASRU, Dec.
2019.
50. Jeng-Lin Li and Chi-Chun Lee, “Attentive to individual: A multimodal emotion recognition network with personalized attention profile,” Interspeech, Sept.
2019.
REFERENCES

Interspeech 2019 Survey Talk: When Attention Meets Speech Applications

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Interspeech 2019 Survey Talk: When Attention Meets Speech Applications

Similar to Interspeech 2019 Survey Talk: When Attention Meets Speech Applications (20)

Recently uploaded

Recently uploaded (20)

Interspeech 2019 Survey Talk: When Attention Meets Speech Applications