19. Small perturbations at the input layer tend to shrink as they pass through each layer [Yu+, 2013]
[2/2] The importance of depth
[Yu+, 2013] D. Yu, et al., “Feature Learning in Deep Neural Networks - A Study on Speech
Recognition Tasks,” ICLR 2013.
[Figure: adjacent layers $h^l$ and $h^{l+1}$]
20. Small perturbations at the input layer tend to shrink as they pass through each layer [Yu+, 2013]
[2/2] The importance of depth
When $h^l$ is perturbed to $h^l + \delta^l$, how does $h^{l+1}$ change?
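A small numerical sketch of the claim on this slide, assuming sigmoid hidden layers and zero-mean Gaussian weights with standard deviation 1/sqrt(n). These choices, the layer sizes, and the function names are illustrative assumptions, not the setup used in [Yu+, 2013]. Because the slope of the sigmoid never exceeds 1/4, each layer scales a perturbation by at most a quarter of the largest singular value of its weight matrix, so with small weights the gap between clean and perturbed activations shrinks layer by layer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(weights, biases, h):
    """One sigmoid layer: h_next = sigmoid(W h + b)."""
    return [sigmoid(sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(weights, biases)]


def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(x * x for x in v))


def perturbation_norms(n_units=32, n_layers=5, eps=1e-3, seed=0):
    """Return ||delta^l|| at the input and after each layer."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_units)  # assumed weight scale (illustrative)
    h = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    h_pert = [x + eps * rng.gauss(0.0, 1.0) for x in h]
    norms = [norm([p - c for p, c in zip(h_pert, h)])]
    for _ in range(n_layers):
        weights = [[rng.gauss(0.0, scale) for _ in range(n_units)]
                   for _ in range(n_units)]
        biases = [0.0] * n_units
        # Push the clean and the perturbed vector through the same layer.
        h = layer(weights, biases, h)
        h_pert = layer(weights, biases, h_pert)
        norms.append(norm([p - c for p, c in zip(h_pert, h)]))
    return norms


norms = perturbation_norms()
for depth, value in enumerate(norms):
    print(f"layer {depth}: ||delta|| = {value:.3e}")
```

The first printed value is the perturbation norm at the input and each following value is the norm after one more layer. With much larger weights the perturbation is no longer guaranteed to shrink.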
25. Robustness remains a major challenge in the deep learning acoustic model [Huang+, 2014]
[Huang+, 2014] Y. Huang, et al., “A Comparative Analytic Study on the Gaussian Mixture and Context Dependent Deep Neural Network Hidden Markov Models,” Interspeech 2014
31. [Yoshioka+, 2015]
HMM/Alignment   DNN input   %WER
table-top       table-top   43.1
headset         headset     26.4
[Yoshioka+, 2015] T. Yoshioka and M. J. F. Gales, “Environmentally robust ASR front-end for deep neural network acoustic models,” CSL, 2015
32. [Yoshioka+, 2015]
HMM/Alignment   DNN input   %WER
table-top       table-top   43.1
headset         headset     26.4
headset         table-top   41.3
This gap is the (lack of) robustness of the DNN itself
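To make the comparison concrete, the arithmetic below splits the table-top degradation into two parts. It assumes that "this gap" refers to the two rows that share headset alignments (26.4 vs 41.3); the variable names are illustrative.

```python
# %WER values from the slide, keyed by (HMM/alignment data, DNN input data).
wer = {
    ("table-top", "table-top"): 43.1,
    ("headset", "headset"): 26.4,
    ("headset", "table-top"): 41.3,
}

# Same headset alignments, only the DNN input changes:
# the cost attributed to the DNN itself (assumed reading of the slide).
dnn_gap = wer[("headset", "table-top")] - wer[("headset", "headset")]

# Same table-top DNN input, only the alignments change.
alignment_gap = wer[("table-top", "table-top")] - wer[("headset", "table-top")]

# Relative WER increase caused by the DNN input mismatch.
relative_increase = 100.0 * dnn_gap / wer[("headset", "headset")]

print(f"DNN input mismatch: +{dnn_gap:.1f} points ({relative_increase:.1f} % relative)")
print(f"Alignment mismatch: +{alignment_gap:.1f} points")
```

Under that reading, the DNN input mismatch accounts for about 14.9 points (roughly 56 % relative), while swapping the alignments accounts for only about 1.8 points.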
47. Linear-layer insertion approaches
SI model
[Figure: three variants labeled LIN/FDLR, LHN, LHUC]
LIN: Linear Input Network; FDLR: Feature-space Discriminative Linear Regression
LHN: Linear Hidden Network; LHUC: Linear Hidden Unit Contribution
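The three variants can be sketched on a toy one-hidden-layer network as follows. This assumes the usual formulations: LIN/FDLR as an affine transform of the input features, LHN as an affine transform of the last hidden layer's activations, and LHUC as a per-unit scale 2*sigmoid(r). The `forward` function, the parameter names, and the toy weights are hypothetical. In each case only the inserted speaker-dependent parameters would be trained, while the SI weights stay frozen.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, x):
    """Return W x + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def identity(n):
    """n x n identity matrix, used to initialize inserted linear layers."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def forward(model, x, lin=None, lhn=None, lhuc_r=None):
    """SI forward pass with optional speaker-dependent components."""
    if lin is not None:  # LIN/FDLR: affine transform of the input features
        x = affine(lin[0], lin[1], x)
    h = [sigmoid(a) for a in affine(model["W1"], model["b1"], x)]
    if lhuc_r is not None:  # LHUC: rescale each hidden unit by 2*sigmoid(r)
        h = [2.0 * sigmoid(r) * v for r, v in zip(lhuc_r, h)]
    if lhn is not None:  # LHN: affine transform of the hidden activations
        h = affine(lhn[0], lhn[1], h)
    return affine(model["W2"], model["b2"], h)  # pre-softmax outputs


# Toy SI model (hypothetical weights): 2 inputs, 3 hidden units, 2 outputs.
model = {
    "W1": [[0.5, -0.3], [0.8, 0.1], [-0.4, 0.9]], "b1": [0.0, 0.1, -0.1],
    "W2": [[1.0, -1.0, 0.5], [0.2, 0.3, -0.7]], "b2": [0.0, 0.0],
}
x = [0.2, -0.5]

si_out = forward(model, x)
# Identity / zero initialization leaves the SI model's output unchanged.
lin_out = forward(model, x, lin=(identity(2), [0.0, 0.0]))
lhn_out = forward(model, x, lhn=(identity(3), [0.0, 0.0, 0.0]))
lhuc_out = forward(model, x, lhuc_r=[0.0, 0.0, 0.0])
# A non-zero LHUC parameter changes the contribution of one hidden unit.
adapted_out = forward(model, x, lhuc_r=[1.0, 0.0, 0.0])
```

Starting each inserted component at the identity means adaptation begins from the SI model's behavior. LHUC adds only one parameter per hidden unit, whereas LIN and LHN add a full matrix and bias.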
59. [Figure: pipeline labeled waveform, MelFB, state likelihood]
[CHiME-1]
            Clean training   Noise adaptive training
w/o DAE     40.6 %           9.6 %
w/ DAE      23.2 %           10.7 %
Degraded with the DAE under noise adaptive training (9.6 % → 10.7 %). What now...
Results from [Araki+, 2015] S. Araki, et al., “Exploring multi-channel features for denoising-autoencoder-based speech enhancement,” ICASSP, 2015
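For readers unfamiliar with the DAE front-end on this slide, here is a minimal sketch of its training objective, assuming a single sigmoid hidden layer, a linear output layer, and a mean-squared-error loss between the enhanced frame and the clean MelFB frame. The layer sizes, parameter names, and toy values are illustrative assumptions, not the configuration used in [Araki+, 2015].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, x):
    """Return W x + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]


def dae_enhance(params, noisy):
    """Map one noisy MelFB frame to an estimate of the clean frame."""
    hidden = [sigmoid(a)
              for a in affine(params["W_enc"], params["b_enc"], noisy)]
    return affine(params["W_dec"], params["b_dec"], hidden)  # linear output


def mse(estimate, clean):
    """Training loss: mean squared error against the clean target."""
    return sum((e - c) ** 2 for e, c in zip(estimate, clean)) / len(clean)


# Toy 2-dimensional frames and weights (hypothetical values).
params = {
    "W_enc": [[1.0, -0.5], [0.3, 0.8]], "b_enc": [0.0, 0.0],
    "W_dec": [[0.6, -0.2], [0.1, 0.9]], "b_dec": [0.0, 0.0],
}
noisy_frame = [0.7, -0.2]
clean_frame = [0.5, -0.5]

enhanced = dae_enhance(params, noisy_frame)
loss = mse(enhanced, clean_frame)
print(f"MSE against the clean frame: {loss:.4f}")
```

Training would adjust `params` to reduce this loss over pairs of noisy and clean frames. Note that the objective minimizes feature distortion rather than recognition error, so a lower loss does not by itself guarantee a lower error rate in the acoustic model that follows.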
60. [Figure: pipeline labeled waveform, MelFB, state likelihood]
Integrated DAE [Narayanan+, 2014]
[Narayanan+, 2014] A. Narayanan and D. Wang, “Investigation of Speech Separation as a Front-End for Noise Robust Speech Recognition,” IEEE T. ASLP, 2014
61. [Figure: pipeline labeled waveform, MelFB, state likelihood]
Integrated DAE [Narayanan+, 2014]
Do something different from the acoustic model
• Use a different model [Weninger+, 2013]
• Use different features
[Weninger+, 2013] F. Weninger, et al., “The Munich feature enhancement approach to the 2nd CHiME challenge using BLSTM recurrent neural networks,” CHiME-2, 2013
62. Use different features
Phoneme features [Mimura+, 2015]
Spatial features [Araki+, 2015]
[Mimura+, 2015] M. Mimura, et al., “Deep autoencoders augmented with phone-class feature for reverberant speech recognition,” ICASSP, 2015