Presentation slide for AI seminar at Artificial Intelligence Research Center, The National Institute of Advanced Industrial Science and Technology, Japan.
URL (in Japanese): https://www.airc.aist.go.jp/seminar_detail/seminar_046.html
A Brief Introduction of Anomalous Sound Detection: Recent Studies and Future Prospects
1. Yuma Koizumi
Current Status and Prospects of Anomalous Sound Detection
A Brief Introduction of Anomalous Sound Detection:
Recent Studies and Future Prospects
AI seminar @ AIRC, AIST
15:00-17:00, Feb. 26th, 2021
2. Proprietary + Confidential
Special thanks
❏ Former colleagues at NTT Laboratories
❏ Dr. Kunio Kashino, Dr. Noboru Harada, Dr. Hisashi Uematsu, Akira Nakagawa,
Shoichiro Saito, Dr. Yasunori Ohishi, Daisuke Niizumi, Yuta Kawachi, Masataka
Yamaguchi, Masahiro Yasuda, Daiki Takeuchi, Luc Forget, Luca Mazzon, and
more...
❏ DCASE Challenge task co-organizers
❏ Dr. Yohei Kawaguchi, Dr. Harsh Purohit, Toshiki Nakamura, Yuki Nikaido, Ryo Tanabe, Kaori Suefusa, Takashi Endo (Hitachi, Ltd.) and Dr. Keisuke Imoto (Doshisha University)
3.
Self-introduction
❏ Name: Yuma Koizumi (小泉 悠馬)
❏ Nov. 2020 - present: Research Scientist at Google Research
❏ Apr. 2014 - Nov. 2020: Research Scientist at NTT Media Intelligence Laboratories
❏ Ph.D. degree, the University of Electro-Communications, Sept. 2017
❏ M.S. degree, Hosei University, Mar. 2014
❏ Research Topics
❏ Speech enhancement
❏ Anomalous sound detection (ASD)
❏ Audio captioning (1st place DCASE 2020 Challenge!)
9.
What is an anomaly??
❏ Anomaly
❏ Something that is noticeable because it is different from what is usual [1]
❏ Anomalies are patterns in data that do not conform to a well-defined
notion of normal behavior [2]
[1] Longman Dictionary of Contemporary English
[2] V. Chandola, et al., “Anomaly detection: A survey,” ACM Comput. Surv., 2009
anomaly = not normal
13.
Purpose of ASD
Anomalous sounds may be caused by dangerous events
Prompt detection of anomalous sounds helps prevent the worst case
14.
❏ DCASE 2020 Challenge Task 2 [Link]
❏ Upcoming task of the DCASE 2021 Challenge! [Link]
Research hot topic
16.
OK, I know deep learning!
I'll train a deep classifier for A(x) (the anomaly score)!
Calm down!
Let's figure out the problem first
17-21.
“Known” and “Unknown” anomalies
(Figure: environmental sound detection & classification tasks arranged by the number of training samples of target events)
❏ Massive: sound event detection (e.g., car, speech, trumpet, ...)
❏ Few: rare sound event detection (e.g., baby crying, gunshot)
  ❏ Detecting known anomalies using few anomalous samples
❏ Zero-resource: unsupervised anomalous sound detection (e.g., mechanical failure: gear failure, engine failure, pump failure, and more...)
  ❏ Detecting unknown anomalies without anomalous samples
  ❏ Difficult to collect target anomalies
  ❏ Impossible to collect exhaustive patterns of anomalies
  ❏ Today’s topic
❏ The few-shot and zero-resource settings are both often called “anomalous sound detection”
23.
Application example
❏ Anomalous sound detection for machine condition monitoring
❏ It is impossible to deliberately produce exhaustive patterns of mechanical failure
25.
Typical task setup
❏ Only normal samples are provided as training data!!
❏ DCASE 2020 Challenge Task 2: ToyADMOS [Koizumi+, 2019] & MIMII [Purohit+, 2019]
❏ 6 machine types, (4+3) machine IDs
❏ Training data: around 1,000 samples of 10-sec normal sounds
[Koizumi+, 2019]: Y. Koizumi, et al., “ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection,” Proc. of WASPAA, 2019.
[Purohit+, 2019]: H. Purohit, et al., “MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection,” Proc. of DCASE Workshop, 2019.
27.
No anomalous samples?!
(Figure: a DNN must estimate “Normal” or “Anomaly”, but the training labels contain only “Normal”)
How can anomalies be detected without anomalous training data?
→ Unsupervised ASD
30-32.
Outlier detection
❏ Normal: a subset of various sounds (the full set)
❏ Anomaly: the complement of normal
(Figure legend: various sounds = the full set; given normal sounds = a subset; unknown sounds outside that subset = anomalous sounds)
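To make “model normal, treat its complement as anomaly” concrete, here is a minimal sketch under loudly stated assumptions: each sound is summarized by one hypothetical scalar feature, and “normal” is modeled by a single Gaussian fitted only to normal training data. The names `fit_normal_model`, `anomaly_score`, and `is_anomalous` are illustrative, not from the talk.

```python
import math
import statistics


def fit_normal_model(train_features):
    """Fit a 1-D Gaussian to (hypothetical) features of normal training sounds only."""
    mu = statistics.fmean(train_features)
    sigma = statistics.pstdev(train_features)
    return mu, sigma


def anomaly_score(x, mu, sigma):
    """Negative log-likelihood under the normal model: higher = less likely to be normal."""
    return 0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma * math.sqrt(2.0 * math.pi))


def is_anomalous(x, mu, sigma, threshold):
    """Anything outside the high-density region of 'normal' is treated as an anomaly."""
    return anomaly_score(x, mu, sigma) > threshold


mu, sigma = fit_normal_model([0.9, 1.1, 1.0, 0.95, 1.05])
print(is_anomalous(1.0, mu, sigma, threshold=10.0))  # False: a typical normal value
print(is_anomalous(3.0, mu, sigma, threshold=10.0))  # True: far outside the normal set
```

No anomalous sample is used anywhere: the decision only asks whether an input falls outside the region covered by normal data.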
33.
How to model “normal”?
❏ Auto-encoder [Marchi+, 2015]
❏ The auto-encoder is trained to reconstruct normal samples
❏ Anomaly score = reconstruction error
(Figure: input spectrogram (time × frequency) → encoder (Enc) → decoder (Dec) → reconstruction; the reconstruction error is the anomaly score)
[Marchi+, 2015]: E. Marchi, et al., “A Novel Approach for Automatic Acoustic Novelty Detection using a Denoising Autoencoder with Bidirectional LSTM Neural Networks,” Proc. of ICASSP, 2015.
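The reconstruction-error score can be sketched with the simplest possible auto-encoder. This is an assumption-laden stand-in, not the Marchi et al. model: a 2-D → 1-D linear auto-encoder with tied weights, whose MSE-optimal code direction is the first principal component, so it can be fitted in closed form without a training loop. The 2-D points stand in for spectrogram features.

```python
import math


def fit_linear_autoencoder(normal_data):
    """Fit a 2-D -> 1-D linear auto-encoder (tied weights) to normal samples only.

    For this linear case the MSE-optimal code direction is the first principal
    component, computed here in closed form from the 2x2 covariance matrix.
    """
    n = len(normal_data)
    mx = sum(p[0] for p in normal_data) / n
    my = sum(p[1] for p in normal_data) / n
    cxx = sum((p[0] - mx) ** 2 for p in normal_data) / n
    cyy = sum((p[1] - my) ** 2 for p in normal_data) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in normal_data) / n
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)  # angle of the principal axis
    return (mx, my), (math.cos(theta), math.sin(theta))


def reconstruction_error(x, mean, w):
    """Anomaly score = squared error between x and Dec(Enc(x))."""
    z = (x[0] - mean[0]) * w[0] + (x[1] - mean[1]) * w[1]  # encoder: 1-D code
    x_hat = (mean[0] + z * w[0], mean[1] + z * w[1])        # decoder
    return (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2


normal = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.1), (1.0, 0.9), (2.0, 2.1)]
mean, w = fit_linear_autoencoder(normal)
print(reconstruction_error((1.5, 1.5), mean, w))   # small: lies along the normal data
print(reconstruction_error((2.0, -2.0), mean, w))  # large: off the normal subspace
```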
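35.
Problem with the auto-encoder
❏ Minimizing the reconstruction error of normal samples does not guarantee that anomalies are poorly reconstructed
(Figure: toy example; the reconstruction error of an auto-encoder before and after training on normal samples, visualized as the energy of a Boltzmann distribution)
❏ Anomalies that are reconstructed well get a low anomaly score → false negative (= overlooking)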
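A tiny illustration of the overlooking problem, assuming a hypothetical linear auto-encoder that training has already fitted to project onto the x-axis (`w = (1, 0)`). An input far from every normal sample, but lying on the learned subspace, is reconstructed perfectly, so its reconstruction-error score is zero.

```python
def reconstruct(x, mean=(0.0, 0.0), w=(1.0, 0.0)):
    """Hypothetical trained linear AE: project onto the learned direction w and back."""
    z = (x[0] - mean[0]) * w[0] + (x[1] - mean[1]) * w[1]
    return (mean[0] + z * w[0], mean[1] + z * w[1])


def anomaly_score(x):
    """Reconstruction error used as the anomaly score."""
    x_hat = reconstruct(x)
    return (x[0] - x_hat[0]) ** 2 + (x[1] - x_hat[1]) ** 2


normal_sample = (0.5, 0.1)   # close to the (hypothetical) normal training data
far_anomaly = (50.0, 0.0)    # never seen in training, but lies on the learned subspace
print(anomaly_score(normal_sample))  # ≈ 0.01
print(anomaly_score(far_anomaly))    # 0.0 -> overlooked (false negative)
```

Nothing in the MMSE objective on normal data penalizes this, which is why the solutions that follow try to explicitly raise A(x) outside the normal region.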
36.
Solutions
How to increase A(x) of anomalies?
❏ Simulating anomalous sounds
  ❏ Rejection sampling [Koizumi+, TASLP 2019]
  ❏ Batch uniformization + adding another small sound [Koizumi+, WASPAA 2019]
❏ Outlier exposure [Hendrycks+, 2019]-like approach
  ❏ Classification of the target machine and other individuals [many DCASE Challenge submissions]
[Koizumi+, TASLP 2019]: Y. Koizumi, et al., “Unsupervised Detection of Anomalous Sound based on Deep Learning and the Neyman-Pearson Lemma,” IEEE TASLP, 2019.
[Koizumi+, WASPAA 2019]: Y. Koizumi, et al., “Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds,” Proc. of WASPAA, 2019.
[Hendrycks+, 2019]: D. Hendrycks, et al., “Deep Anomaly Detection with Outlier Exposure,” Proc. of ICLR, 2019.
37.
Simulating anomalous sound
Cost =
1. Decreasing anomaly score for normal sounds &
2. Increasing anomaly score for simulated anomalous sounds
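The two-term cost can be sketched as follows. The hinge form with a `margin` is an assumption made here so that term 2 cannot be increased without bound; it is not claimed to be the exact objective of the cited papers. Inputs are anomaly scores A(x) already computed by some model.

```python
def simulated_anomaly_cost(scores_normal, scores_anomalous, margin=1.0):
    """Sketch of a two-term training cost (the hinge form is an assumption).

    Term 1 pushes A(x) down for normal sounds.
    Term 2 pushes A(x) up for simulated anomalies until it exceeds `margin`.
    """
    term_normal = sum(scores_normal) / len(scores_normal)
    term_anomalous = sum(max(0.0, margin - a) for a in scores_anomalous) / len(scores_anomalous)
    return term_normal + term_anomalous


print(simulated_anomaly_cost([0.1, 0.3], [0.5, 2.0]))  # ≈ 0.45
```

Simulated anomalies already scored above the margin (2.0 here) contribute nothing, so the gradient focuses on those that still look normal.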
39.
How to simulate anomalous sounds?
40.
Rejection sampling of anomalous sounds
❏ Remember that “anomaly” is the complement of normal
❏ Generate a sample from the PDF of various sounds, p(x)
❏ Accept it as an “anomaly” when p(x | state=normal) is low
(Figure legend: various sounds; given normal sounds)
[Koizumi+, TASLP 2019]: Y. Koizumi, et al., “Unsupervised Detection of Anomalous Sound based on Deep Learning and the Neyman-Pearson Lemma,” IEEE TASLP, 2019.
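The accept/reject rule above can be sketched on 1-D toy features. Assumptions (not from the paper): p(x) of “various sounds” is a broad Gaussian N(0, 5²), p(x | state=normal) is N(0, 1), and “low” means a density below a hypothetical `density_threshold`.

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def sample_pseudo_anomalies(n, normal_mu=0.0, normal_sigma=1.0,
                            broad_sigma=5.0, density_threshold=1e-3, seed=0):
    """Rejection sampling: draw from p(x), keep samples that are unlikely under the normal PDF."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < n:
        x = rng.gauss(0.0, broad_sigma)  # proposal: p(x) of "various sounds" (assumed)
        if gaussian_pdf(x, normal_mu, normal_sigma) < density_threshold:
            accepted.append(x)           # low p(x | normal) -> accept as "anomaly"
    return accepted


pseudo = sample_pseudo_anomalies(100)
print(min(abs(x) for x in pseudo))  # every accepted sample lies well outside the normal region
```

With these settings roughly half of the proposals are accepted, so the loop terminates quickly; a tighter threshold trades speed for samples that are more clearly outside the normal set.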
41.
Adding another small sound
❏ Remember that “anomaly” means “not normal”
Normal sound + a collision sound = Anomalous sound
Normal sound + some rubbing sounds = Anomalous sound
Normal sound + clicking noise = Anomalous sound
Normal sound + something-else sound = Anomalous sound
[Koizumi+, WASPAA 2019]: Y. Koizumi, et al., “Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds,” Proc. of
WASPAA, 2019.
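“Normal sound + something-else sound” can be sketched as mixing at a chosen signal-to-noise ratio, where a high SNR keeps the added sound small. The function name, the plain-list waveform representation, and any particular SNR value are assumptions for illustration; how the paper chooses the added sounds and their levels is not reproduced here.

```python
import math


def mix_at_snr(normal, other, snr_db):
    """Return normal + g * other, with g chosen so that 10*log10(P_normal / P_added) == snr_db."""
    if len(normal) != len(other):
        raise ValueError("signals must have the same length")
    p_normal = sum(s * s for s in normal) / len(normal)
    p_other = sum(s * s for s in other) / len(other)
    if p_other == 0.0:
        raise ValueError("the added sound must not be silent")
    gain = math.sqrt(p_normal / (p_other * 10.0 ** (snr_db / 10.0)))
    return [n + gain * o for n, o in zip(normal, other)]


normal = [1.0, -1.0] * 4       # hypothetical normal waveform
clicks = [0.5, 0.5] * 4        # hypothetical "something-else" sound
pseudo_anomaly = mix_at_snr(normal, clicks, snr_db=20.0)  # added sound is 20 dB below normal
```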
42.
❏ However…, this often turns into “The Boy Who Cried Wolf”
  ❏ Rare normal sounds are misidentified as anomalies
❏ Weighting A(x) of each normal sound by the reciprocal of its probability + batch uniformization
[Koizumi+, WASPAA 2019]: Y. Koizumi, et al., “Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds,” Proc. of WASPAA, 2019.
(Figure: decrease A(x) of normals, especially rare normals; increase A(x) of simulated anomalies)
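The reciprocal-probability weighting can be sketched by reweighting the normal term of a two-term cost. Assumptions: `density_normal` holds an estimate of p(x) for each normal sample from some density model not shown here, weights are normalized to sum to one within the batch, and the anomaly term uses the same hypothetical hinge as before. This covers only the weighting idea and does not attempt to reproduce batch uniformization itself.

```python
def rare_normal_weighted_cost(scores_normal, density_normal, scores_anomalous, margin=1.0):
    """Weight A(x) of each normal sample by 1/p(x) so rare normals are pushed down harder."""
    if any(p <= 0.0 for p in density_normal):
        raise ValueError("densities must be positive")
    inverse = [1.0 / p for p in density_normal]
    total = sum(inverse)
    weights = [v / total for v in inverse]  # rare normals (small p) get larger weights
    term_normal = sum(w * a for w, a in zip(weights, scores_normal))
    term_anomalous = sum(max(0.0, margin - a) for a in scores_anomalous) / len(scores_anomalous)
    return term_normal + term_anomalous


# The second normal sample is rare (p = 0.1) and still has a high score, so it dominates the cost.
print(rare_normal_weighted_cost([0.0, 1.0], [0.9, 0.1], [2.0]))
```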
43.
Toy example (cf. Problem with the auto-encoder)
(Figure: the same toy example on normal training samples; the anomaly score after training, visualized as the energy of a Boltzmann distribution)
❏ The trained model can distinguish rare normals from anomalies
44.
But… what a dense & ad-hoc method...
How should the “something-else sound” be selected?
What are the criteria for selecting it?
Is there a more computationally efficient way?
46.
Outlier exposure-like approach
❏ Outlier detection → Classification
❏ Classification of the target machine and other individuals
Recap: DCASE 2020 Challenge dataset
❏ 6 machine types, (4+3) machine IDs
❏ Around 1,000 samples of 10-sec normal sounds
47.
Basic idea
❏ The DNN solves machine ID identification instead of outlier detection
(Figure, training: the spectrogram (time × frequency) of a training sample with type: Valve, ID: 01 is fed to a DNN that classifies it into machine type/ID classes (Pump ID01, Pump ID02, Pump ID03, ..., Slide rail ID07, Valve ID01, Valve ID02, ...), trained with, e.g., cross-entropy)
48.
Basic idea (cont’d)
(Figure, test: the spectrogram of a test sample with type: Valve, ID: 01 is fed to the trained DNN; its class output is turned into an anomaly score, and thresholding the score decides Normal or Anomaly)
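One common way to turn the classifier output into an anomaly score is the negative log-probability of the machine type/ID the test clip is supposed to come from. This is an assumption for illustration: the submissions cited on the next slide use various scoring rules, and the class list, logits, and `threshold` below are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def classification_anomaly_score(logits, target_index):
    """A(x) = -log p(claimed machine type/ID | x)."""
    return -log_softmax(logits)[target_index]


def detect(logits, target_index, threshold):
    """Thresholding the anomaly score, as in the figure."""
    return "Anomaly" if classification_anomaly_score(logits, target_index) > threshold else "Normal"


classes = ["Pump ID01", "Pump ID02", "Valve ID01"]  # hypothetical class list
target = classes.index("Valve ID01")
print(detect([0.1, 0.2, 4.0], target, threshold=1.0))  # Normal: DNN confidently says Valve ID01
print(detect([3.0, 2.5, 0.1], target, threshold=1.0))  # Anomaly: sounds like another machine
```

The intuition is that a clip which no longer “sounds like” its own machine ID is suspicious, so no anomalous training data is required.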
49.
Basic idea (cont’d)
(Figure: comparison of the auto-encoder, anomaly simulation, and outlier-exposure-like approaches)
50.
Target labels for classification-based ASD
Which labels should be the classification targets?
❏ No answers yet, but many attempts have been made:
  ❏ Machine ID identification: [Giri+], [Primus+], [Zhou], [Lopez+]
  ❏ Machine type & other dataset identification: [Primus+]
  ❏ Data augmentation identification: [Giri+], [Inoue/Vinayavekhin+]
Top-performing teams developed their own methods independently
[Giri+]: R. Giri, et al., “Self-Supervised Classification for Detecting Anomalous Sounds,” Proc. of DCASE Workshop, 2020.
[Primus+]: P. Primus, et al., “Anomalous Sound Detection as a Simple Binary Classification Problem with Careful Selection of Proxy Outlier Examples,” Proc. of DCASE Workshop, 2020.
[Inoue/Vinayavekhin+]: T. Inoue, P. Vinayavekhin, et al., “Detection of Anomalous Sounds for Machine Condition Monitoring using Classification Confidence,” Proc. of DCASE Workshop, 2020.
[Zhou]: Q. Zhou, “ArcFace Based Sound MobileNets for DCASE 2020 Task 2,” Tech. Report, DCASE Challenge 2020.
[Lopez+]: J. A. Lopez, “A Speaker Recognition Approach to Anomaly Detection,” Tech. Report, DCASE Challenge 2020.
51.
Problems with classification-based ASD
❏ Training fails in extremely easy or extremely difficult classification cases
  ❏ e.g., when the normal sounds of two individuals are exactly the same or completely different
  ❏ It is then impossible to determine the boundary between the target normal sounds and other sounds
Due to this problem, some teams achieved high scores on several machine types but dropped in the rankings owing to relatively low ToyConveyor scores.
This problem can be a good starting point for answering the next research question:
"which labels should be classification targets?"
52.
Wanna try unsupervised ASD?
Try DCASE 2020 Challenge Task 2!!
The baseline system and dataset are available at:
http://dcase.community/challenge2020/task-unsupervised-detection-of-anomalous-sounds
54.
No system is perfect
❏ Two types of “mis-detection”
❏ False positive (Type I error): Normal → Anomaly
  ❏ Occurs frequently
  ❏ Often caused by changes in the normal condition
  ❏ Covered in this section
❏ False negative (Type II error): Anomaly → Normal
  ❏ Occurs rarely, but is a critical problem
  ❏ Covered in the next section
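The two error types can be measured on a labeled evaluation set for a given decision threshold. A minimal sketch; the function name and the toy scores/labels are illustrative (label 1 = anomaly, 0 = normal; “anomaly” is declared when the score exceeds the threshold).

```python
def error_rates(scores, labels, threshold):
    """Return (false-positive rate, false-negative rate) for 'anomaly if score > threshold'."""
    n_normal = labels.count(0)
    n_anomaly = labels.count(1)
    if n_normal == 0 or n_anomaly == 0:
        raise ValueError("need both normal and anomalous examples")
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)   # Type I
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)  # Type II
    return fp / n_normal, fn / n_anomaly


scores = [0.1, 0.4, 0.8, 0.3, 0.9]
labels = [0, 0, 0, 1, 1]
print(error_rates(scores, labels, threshold=0.5))  # (≈0.333, 0.5)
```

Raising the threshold lowers the false-positive rate but raises the false-negative rate, so the two types of mis-detection must be traded off against each other.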
55.
Domain shift problem
❏ In practice, “the normal state” is not always constant
  ❏ Changes in engine speed due to changes in the products being manufactured
  ❏ Seasonal variation (e.g., sound speed, noise, and more...)
  ❏ Accidental changes in microphone position
  ❏ and more…
❏ This results in “false alerts”
  = Normal sounds are mistakenly identified as anomalies
56.
Few-shot model adaptation
❏ The ASD system needs to be updated immediately
(Figure: old domain (source) with massive normal training data and a trained DNN → new domain (target) with only a few normal training samples)
57.
Model adaptation for ASD
❏ AdaFlow [Yamaguchi+, 2019]
  ❏ Normalizing flow + adaptive batch normalization
  ❏ Assumes low computational resources (e.g., edge devices)
  ❏ DNN update without backpropagation
(Figure, training: normal sounds of ID 01, ID 02, and ID 03 are modeled with batch normalization (BN) layers for each ID)
[Yamaguchi+, 2019]: M. Yamaguchi, et al., “AdaFlow: Domain-Adaptive Density Estimator with Application to Anomaly Detection and Unpaired Cross-Domain Translation,” Proc. of ICASSP, 2019.
58.
Model adaptation for ASD (cont’d)
(Figure, adaptation: given new normal data, the other parameters are frozen and only the BN mean & variance parameters are updated)
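The “freeze everything, update only BN mean & variance” step can be sketched without any backpropagation: the statistics are simply recomputed from a few target-domain normal samples. This shows only the batch-normalization part; the normalizing flow of AdaFlow is not reproduced, and the class name `AdaptiveBatchNorm` is hypothetical.

```python
import math


class AdaptiveBatchNorm:
    """Per-feature normalization with frozen scale/shift (gamma, beta).

    Only the mean/variance statistics change during adaptation.
    """

    def __init__(self, mean, var, gamma, beta, eps=1e-5):
        self.mean, self.var = list(mean), list(var)
        self.gamma, self.beta, self.eps = list(gamma), list(beta), eps

    def adapt(self, target_batch):
        """No backpropagation: recompute statistics from a few target-domain samples."""
        n = len(target_batch)
        dims = len(self.mean)
        self.mean = [sum(x[d] for x in target_batch) / n for d in range(dims)]
        self.var = [sum((x[d] - self.mean[d]) ** 2 for x in target_batch) / n for d in range(dims)]

    def __call__(self, x):
        return [self.gamma[d] * (x[d] - self.mean[d]) / math.sqrt(self.var[d] + self.eps) + self.beta[d]
                for d in range(len(x))]


bn = AdaptiveBatchNorm(mean=[0.0, 0.0], var=[1.0, 1.0], gamma=[1.0, 1.0], beta=[0.0, 0.0])
bn.adapt([[10.0, 20.0], [12.0, 22.0]])  # few normal samples from the shifted (target) domain
print(bn([11.0, 21.0]))                 # the shifted normal input maps back near the origin
```

Because adaptation is just averaging, it is cheap enough for the edge-device setting mentioned on the slide.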
62.
Sometimes we can get anomalies
❏ Overlooked anomalies
  ❏ Critical problem!!
  ❏ The system needs to be updated immediately
❏ Correctly detected anomalies
  ❏ Well done!!
  ❏ Is there room to improve the system using the obtained anomalies?
65.
Still not a two-class classification problem
(Figure: a two-class classifier trained on the obtained anomalies may overlook other types of anomalies)
Remember, we cannot collect exhaustive patterns of anomalies
66.
Why is discriminative training bad?
❏ E.g., density ratio-based classification
(Figure: normal samples, the given anomalies, and the region judged as “Anomaly > Normal”)
Remember, we cannot collect exhaustive patterns of anomalies
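A 1-D toy case shows how a density-ratio classifier overlooks an unseen anomaly type. Assumptions: normal ~ N(0, 1), the given anomalies ~ N(4, 1), and a new anomaly type appears at x = -4. The ratio only compares “given anomaly” with “normal”, so the new anomaly is judged normal even though the normal model alone finds it very unlikely.

```python
import math


def log_gauss(x, mu, sigma=1.0):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_density_ratio(x, mu_normal=0.0, mu_given_anomaly=4.0):
    """log p(x | given anomaly) - log p(x | normal); > 0 means 'anomaly'."""
    return log_gauss(x, mu_given_anomaly) - log_gauss(x, mu_normal)


unseen_anomaly = -4.0
print(log_density_ratio(unseen_anomaly))  # ≈ -24 -> judged "normal" (overlooked)
print(-log_gauss(unseen_anomaly, 0.0))    # ≈ 8.92, versus ≈ 0.92 at x = 0: the normal model flags it
```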
67.
Training strategies
❏ Few-shot anomalies
  ❏ + Memory-based few-shot detector [Koizumi+, 2019], [Koizumi+, 2020]
❏ A sufficient number of anomalies
  ❏ Complementary set VAE: estimating the PDF of the “complement of normal” [Kawachi+, 2018], [Kawachi+, 2019]
[Koizumi+, 2019]: Y. Koizumi, et al., “SNIPER: Few-shot Learning for Anomaly Detection to Minimize False-Negative Rate with Ensured True-Positive Rate,” Proc. of ICASSP, 2019.
[Koizumi+, 2020]: Y. Koizumi, et al., “SPIDERnet: Attention Network for One-shot Anomaly Detection in Sounds,” Proc. of ICASSP, 2020.
[Kawachi+, 2018]: Y. Kawachi, et al., “Complementary Set Variational AutoEncoder for Supervised Anomaly Detection,” Proc. of ICASSP, 2018.
[Kawachi+, 2019]: Y. Kawachi, et al., “A Two-Class Hyper-Spherical Autoencoder for Supervised Anomaly Detection,” Proc. of ICASSP, 2019.
68.
+ Few-shot learning
❏ Increase A(x) when the input is similar to memorized anomalies
[Koizumi+, 2019]: Y. Koizumi, et al., “SNIPER: Few-shot Learning for Anomaly Detection to Minimize False-Negative Rate with Ensured True-Positive Rate,” Proc. of ICASSP, 2019.
[Koizumi+, 2020]: Y. Koizumi, et al., “SPIDERnet: Attention Network for One-shot Anomaly Detection in Sounds,” Proc. of ICASSP, 2020.
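The “increase A(x) when the input resembles a memorized anomaly” idea can be sketched as adding a similarity bonus to an existing unsupervised score. This is a simplified illustration, not SNIPER or SPIDERnet: the embeddings, the cosine similarity, and the hypothetical `weight` parameter are all assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def memory_augmented_score(base_score, embedding, anomaly_memory, weight=1.0):
    """A(x) = base unsupervised score + weight * max similarity to memorized anomalies."""
    if not anomaly_memory:
        return base_score
    best = max(cosine(embedding, m) for m in anomaly_memory)
    return base_score + weight * max(0.0, best)


memory = [[1.0, 0.0]]  # embedding of one previously overlooked anomaly (hypothetical)
print(memory_augmented_score(0.2, [1.0, 0.0], memory))  # ≈ 1.2: same kind of anomaly recurs
print(memory_augmented_score(0.2, [0.0, 1.0], memory))  # 0.2: unrelated input is unaffected
```

Adding a single memorized example requires no retraining, which suits the “update the system immediately” requirement.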
69.
Normal vs. Complement
❏ Complementary set PDF [Kawachi+, 2018], [Kawachi+, 2019]
[Kawachi+, 2018]: Y. Kawachi, et al., “Complementary Set Variational AutoEncoder for Supervised Anomaly Detection,” Proc. of ICASSP, 2018.
[Kawachi+, 2019]: Y. Kawachi, et al., “A Two-Class Hyper-Spherical Autoencoder for Supervised Anomaly Detection,” Proc. of ICASSP, 2019.
(Figure: the PDF of normal and the PDF of its complement; the region labeled “Anomaly > Normal”)
70.
Complementary set VAE [Kawachi+, 2018]
[Kawachi+, 2018]: Y. Kawachi, et al., “Complementary Set Variational AutoEncoder for Supervised Anomaly Detection,” Proc. of ICASSP, 2018.
(Figure: normal and complement priors in the latent space; the region labeled “Anomaly > Normal”)
❏ Cost = reconstruction error + latent prior term (likelihood of the normal prior for normal samples, likelihood of the complement prior for anomalous samples)
❏ Switch the latent-space prior of the VAE according to the label
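Switching the latent prior by label can be sketched through the KL term of a VAE with a diagonal Gaussian posterior. The closed-form KL to an isotropic Gaussian N(0, s²I) is standard; using a wider Gaussian (`complement_std`) as the prior for anomalous samples is a hypothetical stand-in chosen for simplicity and is not claimed to be the complement prior of Kawachi et al.

```python
import math


def kl_to_isotropic_gaussian(mu, sigma, prior_std):
    """KL( N(mu, diag(sigma^2)) || N(0, prior_std^2 I) ), summed over latent dimensions."""
    if any(s <= 0.0 for s in sigma) or prior_std <= 0.0:
        raise ValueError("standard deviations must be positive")
    return sum(math.log(prior_std / s) + (s * s + m * m) / (2.0 * prior_std ** 2) - 0.5
               for m, s in zip(mu, sigma))


def label_switched_kl(mu, sigma, is_anomaly, normal_std=1.0, complement_std=3.0):
    """Pick the latent prior by label (complement_std is a hypothetical stand-in)."""
    prior_std = complement_std if is_anomaly else normal_std
    return kl_to_isotropic_gaussian(mu, sigma, prior_std)


# A code far from the origin is expensive under the normal prior but cheap under the wide prior.
print(label_switched_kl([3.0], [1.0], is_anomaly=False))  # 4.5
print(label_switched_kl([3.0], [1.0], is_anomaly=True))   # ≈ 1.15
```

The full cost would add the reconstruction error to this label-dependent prior term.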
71.
Toy example [Kawachi+, 2018]
[Kawachi+, 2018]: Y. Kawachi, et al., “Complementary Set Variational AutoEncoder for Supervised Anomaly Detection,” Proc. of ICASSP, 2018.
❏ MNIST example
  ❏ Normal: digits 0-8
  ❏ Anomaly: digit 9
❏ Visualizing the latent space
  ❏ Since the normal prior is Gaussian, digits 0-8 are placed around the center of the latent space
(Figure: latent-space visualization with Normal and Anomaly samples)
74.
Where is the anomaly??
Anomaly detected!!
...but where…?
Photo by Magda Ehlers from Pexels
+ Sound localization
Localization is also tackled in the DCASE Challenge [Link]
76.
How anomalous??
Anomaly detected!!
...but how…?
Photo by Andrea Piacquadio from Pexels
+ Audio captioning
Captioning is also tackled in the DCASE Challenge [Link]
Example caption: “High-frequency rubbing noise. It might be an anomaly in the bearing.”
77.
Conclusion
❏ Interesting and “tasty” problems
  ❏ Outlier detection? Classification?
  ❏ Even the definition of the problem is uncertain
❏ Blue ocean
  ❏ Many unsolved problems, plus the DCASE Challenge
  ❏ Domain adaptation, few-shot learning...
❏ Frontier
  ❏ A practically important but underdeveloped research field
  ❏ Combining with other DCASE tasks
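81.
Problem with the auto-encoder (supplement)
❏ Reconstruction error as the energy of a Boltzmann distribution:
$p_\theta(x) = \exp(-A_\theta(x)) / Z_\theta$, where $Z_\theta = \int \exp(-A_\theta(x))\,dx$ and $A_\theta(x)$ is the reconstruction error
❏ KL divergence between the PDF of normal sounds and the model:
$\mathrm{KL}\left(p(x \mid \text{normal}) \,\|\, p_\theta(x)\right) = \mathbb{E}_{x \sim p(x \mid \text{normal})}\left[A_\theta(x)\right] + \ln Z_\theta + \text{const.}$
❏ First term = MMSE; second term = a constraint for increasing the total anomaly score, i.e., increasing the anomaly score of unknown samples
❏ MMSE-based training ignores the normalizing constant $Z_\theta$, so this constraint is dropped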