A Brief Introduction of Anomalous Sound Detection: Recent Studies and Future Prospects

Yuma Koizumi
AI seminar @ AIRC, AIST
15:00-17:00, Feb. 26th, 2021

Special thanks
❏ Former colleagues at NTT Laboratories
❏ Dr. Kunio Kashino, Dr. Noboru Harada, Dr. Hisashi Uematsu, Akira Nakagawa,
Shoichiro Saito, Dr. Yasunori Ohishi, Daisuke Niizumi, Yuta Kawachi, Masataka
Yamaguchi, Masahiro Yasuda, Daiki Takeuchi, Luc Forget, Luca Mazzon, and
more...
❏ DCASE Challenge task co-organizers
❏ Dr. Yohei Kawaguchi, Dr. Harsh Purohit, Toshiki Nakamura, Yuki Nikaido, Ryo
Tanabe, Kaori Suefusa, Takashi Endo (Hitachi, Ltd.) and Dr. Keisuke Imoto
(Doshisha University)

Self-introduction
❏ Name: Yuma Koizumi (小泉悠馬)
❏ Nov. 2020 - Current: Research Scientist at Google Research
❏ Apr. 2014 - Nov. 2020: Research Scientist at NTT Media Intelligence Laboratories
❏ Ph.D. degree, the University of Electro-Communications, Sept. 2017
❏ M.S. degree, Hosei University, Mar. 2014
❏ Research Topics
❏ Speech enhancement
❏ Anomalous sound detection (ASD)
❏ Audio captioning (1st place DCASE 2020 Challenge!)

ASD & me
DCASE 2020 Challenge

ASD & me
DCASE 2021 Challenge

Agenda
01 Overview of ASD
02 Unsupervised ASD
03 +Domain shift
04 +Anomalous samples
05 Future Prospects

Anomaly detection
...what is “anomaly”?

What is anomaly??
❏ Anomaly
❏ Something that is noticeable because it is different from what is usual [1]
❏ Anomalies are patterns in data that do not conform to a well-defined
notion of normal behavior [2]
[1] Longman Dictionary of Contemporary English
[2] V. Chandola, et al., “Anomaly detection: A survey,” ACM Comput. Surv., 2009
anomaly = not normal

Anomalous sounds
Gun shot
Photo by Alejo Reinoso
on Unsplash

Anomalous sounds
Baby crying
Photo by Marcos Paulo Prado
on Unsplash

Anomalous sounds
Mechanical failure
Photo by Ant Rozetsky
on Unsplash
Normal
Anomaly

Purpose of ASD
Anomalous sounds may be caused by dangerous events
Prompt detection of anomalous sounds helps prevent the worst case

❏ DCASE 2020 Challenge Task [Link]
❏ Upcoming task of DCASE Challenge 2021! [Link]
Research hot topic

Implementation
[Figure: input features (e.g. mel-spectrogram) → anomaly score calculator (e.g. DNN) → anomaly score → thresholding: high score = anomaly, low score = normal]
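The thresholding step of this pipeline can be sketched as follows. This is a minimal illustration, not the talk's implementation: the function names, the example scores, and the rule of picking the threshold from a target false-positive rate on held-out normal scores are all assumptions.

```python
import numpy as np

def choose_threshold(normal_scores, false_positive_rate=0.1):
    """Pick a threshold that roughly `false_positive_rate` of normal scores exceed."""
    return float(np.quantile(normal_scores, 1.0 - false_positive_rate))

def detect(scores, threshold):
    """Return True (anomaly) where the anomaly score A(x) exceeds the threshold."""
    return np.asarray(scores) > threshold

# Hypothetical anomaly scores A(x) computed on held-out normal clips
normal_scores = np.array([0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.2, 0.95, 1.05, 1.15])
threshold = choose_threshold(normal_scores, false_positive_rate=0.1)

# Two hypothetical test clips: one with a low score, one with a high score
decisions = detect([0.9, 2.5], threshold)
```

Only normal scores are needed to set the threshold, which matches the setting where no anomalous samples are available.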

OK, I know deep learning!
I'll train a deep classifier for A(x)!
Calm down!
Let's figure out the problem

“Known” and “Unknown” anomalies
[Figure: environmental sound detection & classification tasks arranged by the number of training samples of target events]
❏ Massive: sound event detection (car, speech, trumpet, ...)
❏ Baby crying, gunshot, mechanical failure: often called anomalous sound detection

❏ Massive: sound event detection (car, speech, trumpet, ...)
❏ Baby crying, gunshot: difficult to collect target anomalies
❏ Mechanical failure (gear failure, engine failure, pump failure, and more...): impossible to collect exhaustive patterns of anomalies

❏ Massive: sound event detection (car, speech, trumpet, ...)
❏ Few (baby crying, gunshot): rare sound event detection = detecting known anomalies using few anomalous samples
❏ Zero-resource (mechanical failure and more...): unsupervised anomalous sound detection = detecting unknown anomalies without anomalous samples

Today’s topic: unsupervised anomalous sound detection (detecting unknown anomalies without anomalous samples)

❏ Anomalous sound detection for machine condition monitoring
Application example
Impossible to deliberately make exhaustive patterns of mechanical failure

DCASE 2020 Challenge Task 2

Typical task setup
❏ Only normal samples are provided as training data!!
❏ DCASE 2020 Challenge Task 2: ToyADMOS [Koizumi+, 2019] & MIMII [Purohit+, 2019]
[Koizumi+, 2019]: Y. Koizumi, et al., “ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection,” Proc. of WASPAA, 2019.
[Purohit+, 2019]: H. Purohit, et al., “MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection,” Proc. of DCASE Workshop, 2019.
❏ 6 machine types, (4+3) machine IDs
❏ Training data: around 1000 samples of 10-sec normal sounds

No anomalous samples?!
[Figure: supervised ASD. A DNN outputs an estimate (normal or anomaly) and is trained with cross-entropy against normal/anomaly labels]

No anomalous samples?!
[Figure: unsupervised ASD. The labels are only normal, so the DNN cannot be trained as in supervised ASD]
How can anomalies be detected without anomalous training data?

Before DCASE 2020
Outlier-detection
&
Autoencoder

Outlier detection
Learn what “normal” is
&
Detect “not normal”

Outlier detection
❏ Normal: a subset of various sounds (full set)
❏ Anomaly: complement of normal
[Figure: the full set of various sounds, with the given normal sounds marked as a subset; the unknown sounds outside that subset = anomalous sounds]
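The idea of learning what “normal” is and flagging everything else can be sketched with a simple density model. This is only an illustration under assumptions: a single Gaussian stands in for the model of normal sounds, the features are random stand-ins, and the function names are hypothetical.

```python
import numpy as np

def fit_normal_model(normal_features):
    """Fit a Gaussian to features of normal sounds: returns mean and inverse covariance."""
    mean = normal_features.mean(axis=0)
    cov = np.cov(normal_features, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # small ridge for numerical stability
    return mean, np.linalg.inv(cov)

def anomaly_score(x, mean, inv_cov):
    """Squared Mahalanobis distance: large when x is far from the normal data."""
    d = np.atleast_2d(x) - mean
    return np.einsum("ij,jk,ik->i", d, inv_cov, d)

rng = np.random.default_rng(0)
# Stand-in for feature vectors extracted from the given normal sounds
normal_features = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
mean, inv_cov = fit_normal_model(normal_features)

score_typical = anomaly_score(np.zeros(4), mean, inv_cov)[0]      # inside the normal set
score_unknown = anomaly_score(np.full(4, 6.0), mean, inv_cov)[0]  # far outside it
```

No anomalous sample is used for fitting; an unknown sound gets a high score simply because it does not conform to the normal data.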

How to model “normal”?
❏ Auto-encoder [Marchi+, 2015]
❏ Anomaly score = reconstruction error
❏ Auto-encoder is trained to reconstruct normal samples
[Figure: spectrogram (time × frequency) → encoder (Enc) → decoder (Dec); the reconstruction error is the anomaly score]
[Marchi+, 2015]: E. Marchi, et al., “A Novel Approach for Automatic Acoustic Novelty Detection using a Denoising Autoencoder with Bidirectional LSTM Neural Networks,” Proc. of ICASSP, 2015.
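Using the reconstruction error as the anomaly score can be sketched as follows. To keep the example short and dependency-free, a linear auto-encoder obtained by PCA stands in for the DNN auto-encoder; the data, dimensions, and function names are assumptions made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(normal_frames, n_hidden):
    """Fit a linear encoder/decoder pair on normal frames via PCA."""
    mean = normal_frames.mean(axis=0)
    # Right singular vectors span the best n_hidden-dimensional subspace
    _, _, vt = np.linalg.svd(normal_frames - mean, full_matrices=False)
    return mean, vt[:n_hidden]

def reconstruction_error(frames, mean, basis):
    """Anomaly score = squared error between input and its reconstruction."""
    centered = np.atleast_2d(frames) - mean
    code = centered @ basis.T   # "encoder"
    recon = code @ basis        # "decoder"
    return np.sum((centered - recon) ** 2, axis=1)

rng = np.random.default_rng(0)
# Hypothetical normal data: 8-dim frames lying close to a 2-dim subspace
mixing = rng.normal(size=(2, 8))
normal_frames = rng.normal(size=(400, 2)) @ mixing + 0.01 * rng.normal(size=(400, 8))
mean, basis = fit_linear_autoencoder(normal_frames, n_hidden=2)

normal_test = rng.normal(size=(1, 2)) @ mixing   # follows the normal structure
off_subspace = 3.0 * rng.normal(size=(1, 8))     # does not follow it
err_normal = reconstruction_error(normal_test, mean, basis)[0]
err_anomaly = reconstruction_error(off_subspace, mean, basis)[0]
```

The bottleneck forces the model to capture only the structure of normal data, so inputs that break that structure tend to be reconstructed poorly.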

Problem on auto-encoder
Anomalies cannot be reconstructed?

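Problem on auto-encoder
❏ Minimizing the cost function on normal samples does not guarantee that anomalies are not reconstructed
[Figure: toy example. An auto-encoder is trained on normal training samples with the squared reconstruction error, and the score is interpreted via a Boltzmann distribution; some anomalies are still reconstructed well]
False negative = overlooking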

Solutions
❏ Simulating anomalous sound
❏ Rejection sampling [Koizumi+, TASLP 2019]
❏ Batch uniformization + adding another small sound [Koizumi+, WASPAA 2019]
❏ Outlier exposure [Hendrycks+, 2019]-like approach
❏ Classification of target machine and other individuals [Many DCASE challenge submissions]
[Koizumi+, TASLP 2019]: Y. Koizumi, et al., “Unsupervised Detection of Anomalous Sound based on Deep Learning and the Neyman-Pearson Lemma,” IEEE TASLP, 2019.
[Koizumi+, WASPAA 2019]: Y. Koizumi, et al., “Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds,” Proc. of
WASPAA, 2019.
[Hendrycks+, 2019]: D. Hendrycks, et al., “Deep Anomaly Detection with Outlier Exposure,” Proc. of ICLR, 2019.
How to increase A(x) of anomalies?

Simulating anomalous sound
Cost =
1. Decreasing anomaly score for normal sounds &
2. Increasing anomaly score for simulated anomalous sounds
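The two-part cost above can be written down in its simplest form as a difference of mean scores. This is a hedged sketch, not the cost used in the cited papers: the function name and the example scores are hypothetical, and a practical cost would usually bound or regularize the second term.

```python
import numpy as np

def simulation_cost(scores_normal, scores_simulated):
    """Low when A(x) is small for normal sounds and large for simulated anomalies."""
    return float(np.mean(scores_normal) - np.mean(scores_simulated))

# Hypothetical scores from a model that separates the two groups well
good = simulation_cost(np.array([0.1, 0.2]), np.array([3.0, 4.0]))
# Hypothetical scores from a model that does not separate them
bad = simulation_cost(np.array([2.0, 2.5]), np.array([2.1, 2.4]))
```

Minimizing this quantity pushes the two groups of scores apart, which is what thresholding needs.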

[Figure: comparison of the auto-encoder and anomaly simulation approaches]
How to simulate anomalous sounds?

Rejection sampling of anomalous sound
❏ Remember that “anomaly” is the complement of normal
❏ Generate a sample from the PDF of various sounds p(x)
❏ Accept it as “anomaly” when p(x | state=normal) is low
[Koizumi+, TASLP 2019]: Y. Koizumi, et al., “Unsupervised Detection of Anomalous Sound based on Deep Learning and the Neyman-Pearson Lemma,” IEEE TASLP, 2019.
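The three steps above can be sketched in a toy feature space. This is an illustration under assumptions, not the procedure of [Koizumi+, TASLP 2019]: a broad Gaussian stands in for p(x), a unit Gaussian for p(x | state=normal), and the acceptance threshold is an arbitrary value.

```python
import numpy as np

def log_gauss(x, mean, std):
    """Log density of an isotropic Gaussian, summed over feature dimensions."""
    return np.sum(-0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi)), axis=-1)

def sample_simulated_anomalies(n, rng, log_p_threshold, dim=2):
    """Draw from a broad proposal and keep samples that are unlikely to be normal."""
    accepted = []
    while len(accepted) < n:
        # Proposal p(x): broad Gaussian standing in for "various sounds"
        x = rng.normal(loc=0.0, scale=5.0, size=dim)
        # Accept as "anomaly" only when p(x | state=normal) is low
        if log_gauss(x, mean=0.0, std=1.0) < log_p_threshold:
            accepted.append(x)
    return np.stack(accepted)

rng = np.random.default_rng(0)
anomalies = sample_simulated_anomalies(n=100, rng=rng, log_p_threshold=-6.0)
```

The accepted samples cover the region around the normal data rather than one specific failure type, which is the point of treating anomaly as a complement set.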

Adding another small sound
❏ Remember that “anomaly” is “not normal”
Normal sound + a collision sound = Anomalous sound
Normal sound + some rubbing sounds = Anomalous sound
Normal sound + clicking noise = Anomalous sound
Normal sound + something-else sound = Anomalous sound
[Koizumi+, WASPAA 2019]
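Mixing a small extra sound into a normal recording can be sketched as below. The signals are synthetic stand-ins (a sine for the machine sound, a noise burst for the click), and the function name and the 20 dB level are assumptions chosen only for illustration.

```python
import numpy as np

def mix_at_snr(normal, other, snr_db):
    """Add `other` to `normal` so that the normal-to-other power ratio is `snr_db` dB."""
    p_normal = np.mean(normal ** 2)
    p_other = np.mean(other ** 2)
    gain = np.sqrt(p_normal / (p_other * 10.0 ** (snr_db / 10.0)))
    return normal + gain * other

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
normal = np.sin(2 * np.pi * 120.0 * t)   # stand-in for a normal machine sound
click = np.zeros_like(t)
click[8000:8040] = rng.normal(size=40)    # stand-in for a short clicking noise

simulated_anomaly = mix_at_snr(normal, click, snr_db=20.0)

# Check the level of what was actually added
added = simulated_anomaly - normal
measured_snr = 10 * np.log10(np.mean(normal ** 2) / np.mean(added ** 2))
```

A high SNR keeps the added sound small, so the simulated anomaly stays close to the normal sound and the detector has to learn a tight boundary.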

❏ However, it often becomes “The Boy Who Cried Wolf”
❏ Rare normal sounds are identified as anomalies
❏ + Batch uniformization: weighting A(x) of each normal sound by the reciprocal of its probability
❏ Decrease A(x) of normals, especially rare normals
❏ Increase A(x) of simulated anomalies
[Koizumi+, WASPAA 2019]
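Weighting by the reciprocal of the probability can be sketched as follows. This is a simplified illustration and not the formulation of [Koizumi+, WASPAA 2019]: the probabilities are assumed to be given, and the function name and numbers are hypothetical.

```python
import numpy as np

def weighted_normal_cost(scores, probs):
    """Average A(x) over normal samples, weighting each by 1 / p(x) (normalized)."""
    w = 1.0 / np.asarray(probs)
    w = w / w.sum()
    return float(np.sum(w * np.asarray(scores)))

# Hypothetical batch: three frequent normals and one rare normal with a high score
scores = np.array([0.2, 0.2, 0.2, 2.0])
probs = np.array([0.3, 0.3, 0.3, 0.03])

plain = float(np.mean(scores))              # rare normal barely affects the cost
weighted = weighted_normal_cost(scores, probs)  # rare normal dominates the cost
```

With the weighting, a high score on a rare normal sound is penalized much more strongly, so training is pushed to lower it.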

Toy example (cf. Problem on auto-encoder)
[Figure: the same toy setting as in “Problem on auto-encoder”: normal training samples, training, and the resulting Boltzmann distribution]
❏ Able to distinguish rare normals and anomalies

But... what a heavy and ad-hoc method...
How to select the “something-else sound”?
What are the criteria for selecting it?
Is there a more computationally efficient way?

After DCASE 2020
Outlier-detection
&
Autoencoder
Classifier

Outlier-exposure-like approach
❏ Outlier detection → Classification
❏ Classification of target machine and other individuals
Recap: DCASE 2020 Challenge dataset
❏ 6 machine types, (4+3) machine IDs
❏ Around 1000 samples of 10-sec normal sounds

Basic idea
❏ DNN solves machine ID identification instead of outlier-detection
[Figure: training. A training sample (spectrogram, time × frequency; e.g. Type: Valve, ID: 01) is fed to a DNN that classifies it among Pump ID01, Pump ID02, Pump ID03, Slide rail ID07, Valve ID01, Valve ID02, ...; trained with e.g. cross-entropy]

Basic idea (cont’d)
[Figure: test. A test sample (spectrogram, time × frequency; Type: Valve, ID: 01) is fed to the trained DNN, which outputs scores for Pump ID01, Pump ID02, Pump ID03, Slide rail ID07, Valve ID01, Valve ID02, ...; the resulting anomaly score is thresholded into anomaly or normal]
Basic idea (cont’d)
[Figure: comparison of the auto-encoder, anomaly simulation, and outlier-exposure-like approaches]

Target labels for classification-ASD
Which labels should be the classification target?
❏ No answers yet, but many attempts have been made:
❏ Machine ID identification: [Giri+], [Primus+], [Zhou], [Lopez+]
❏ Machine Type & other datasets identification: [Primus+]
❏ Data augmentation identification: [Giri+], [Inoue/Vinayavekhin+]
Top-performing teams developed their own
methods independently
[Giri+]: R. Giri, et al., “Self-Supervised Classification for Detecting Anomalous Sounds,” Proc. of DCASE Workshop, 2020.
[Primus+]: P. Primus, et al., “Anomalous Sound Detection as a Simple Binary Classification Problem with Careful Selection of Proxy Outlier Examples,” Proc. of DCASE Workshop, 2020.
[Inoue/Vinayavekhin+]: T. Inoue, P. Vinayavekhin, et al., “Detection of Anomalous Sounds for Machine Condition Monitoring using Classification Confidence,” Proc. of DCASE Workshop, 2020.
[Zhou]: Q. Zhou, “ArcFace Based Sound MobileNets for DCASE 2020 Task 2,” Tech. Report, DCASE Challenge 2020.
[Lopez+]: J. A. Lopez, “A Speaker Recognition Approach to Anomaly Detection,” Tech. Report, DCASE Challenge 2020.

Problems on classification-ASD
❏ Training fails in extremely easy/difficult classification cases
❏ Normal sounds of two individuals are exactly the same or completely different
❏ Impossible to determine the boundary between target normal and other sounds
Due to this problem, although some teams achieved high scores on several machine
types, they dropped in rank owing to relatively low Toy-conveyor scores
This problem can be a good starting point for answering the next research question:
"Which labels should be the classification target?"

Wanna try unsupervised ASD?
Try DCASE 2020 Challenge Task 2!!
Baseline system and dataset are available
http://dcase.community/challenge2020/task-unsupervised-detection-of-anomalous-sounds

System is not perfect
❏ Two types of “mis-detection”
❏ False-positive (Type I error): Normal → Anomaly
❏ Frequently occurs
❏ Often caused by changes in normal condition
❏ Covered in this section
❏ False-negative (Type II error): Anomaly → Normal
❏ Rarely occurs, but critical problem
❏ Covered in the next section
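The two error types can be counted as follows. This is a small illustrative sketch; the labels and decisions are made up, and the function name is hypothetical.

```python
import numpy as np

def error_rates(is_anomaly_true, is_anomaly_pred):
    """Return (false-positive rate, false-negative rate) of anomaly decisions."""
    y = np.asarray(is_anomaly_true, dtype=bool)
    p = np.asarray(is_anomaly_pred, dtype=bool)
    fpr = np.sum(~y & p) / np.sum(~y)   # normal clips flagged as anomaly
    fnr = np.sum(y & ~p) / np.sum(y)    # anomalous clips overlooked
    return float(fpr), float(fnr)

# Hypothetical ground truth and detector decisions for six clips
truth = [False, False, False, False, True, True]
pred = [False, True, False, False, True, False]
fpr, fnr = error_rates(truth, pred)
```

Raising the threshold trades false positives for false negatives, so the two rates have to be considered together.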

Domain shift problem
❏ In practice, “the normal state” is not always constant
❏ Changes in engine speed due to changes in the products being manufactured
❏ Seasonal variation (e.g. sound speed, noise, and more...)
❏ Accidentally changed microphone position
❏ and more…
❏ This results in “false alerts”
= Normal is mistakenly identified as anomaly

Few-shot model adaptation
❏ Need to update ASD system immediately
[Figure: old domain (source): massive training data + trained model → new domain (target): few training samples; the DNN trained on source normal sounds is adapted to the target normal sounds]

Model adaptation for ASD
❏ AdaFlow [Yamaguchi+, 2019]
❏ Normalizing flow + adaptive batch normalization
❏ Assuming low computational resource (e.g. edge device)
❏ DNN update w/o backpropagation
[Figure: training. Normal sounds of ID: 01, ID: 02, and ID: 03 are passed through a network with batch normalization (BN) layers]
[Yamaguchi+, 2019]: M. Yamaguchi, et al., “AdaFlow: Domain-Adaptive Density Estimator with Application to Anomaly Detection and Unpaired Cross-Domain Translation,” Proc. of ICASSP, 2019.

Model adaptation for ASD (cont’d)
[Figure: adaptation. Given normal samples from the new domain, the network weights are frozen and only the mean & variance parameters of the BN layers are updated]
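Updating only the normalization statistics can be sketched with a single batch-normalization layer. This is an illustration of the adaptive batch normalization idea under assumptions, not AdaFlow itself: there is no normalizing flow here, and the class, its attributes, and the data are hypothetical.

```python
import numpy as np

class BatchNorm1d:
    """Minimal batch-normalization layer with stored statistics."""

    def __init__(self, dim):
        self.mean = np.zeros(dim)   # statistics of the old (source) domain
        self.var = np.ones(dim)
        self.gamma = np.ones(dim)   # learned scale, frozen during adaptation
        self.beta = np.zeros(dim)   # learned shift, frozen during adaptation

    def adapt(self, target_samples):
        """Re-estimate only the mean and variance; no gradients are involved."""
        self.mean = target_samples.mean(axis=0)
        self.var = target_samples.var(axis=0)

    def __call__(self, x, eps=1e-5):
        return self.gamma * (x - self.mean) / np.sqrt(self.var + eps) + self.beta

rng = np.random.default_rng(0)
bn = BatchNorm1d(dim=3)

# A few normal samples from the new domain, shifted and rescaled vs. the old one
target_normals = rng.normal(loc=5.0, scale=2.0, size=(32, 3))

before = bn(target_normals)   # looks abnormal to the layers that follow
bn.adapt(target_normals)
after = bn(target_normals)    # back to zero mean and unit variance
```

Because adaptation is just a forward pass that recomputes statistics, it fits devices with low computational resources; in a deeper network the statistics would be recomputed layer by layer.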

Important topic! But not much investigated...

DCASE 2021 Challenge Task 2
Stay tuned!!

Sometimes we can get anomalies
❏ Overlooked anomalies
❏ Critical problem!!
❏ Need to update system immediately
❏ Correctly detected anomalies
❏ Well done!!
❏ Is there room to improve the system using the obtained anomalies?

Still not two-class classification
“Alright! We got anomalous samples in addition to the normal training samples. Train a two-class classifier!!”
→ It overlooks other types of anomalies
Remember, we cannot collect exhaustive patterns of anomalies

Why is discriminative training bad?
❏ E.g. density ratio-based classification
[Figure: PDFs of normal and of the given anomaly; only the region where anomaly > normal is detected as anomalous]
Remember, we cannot collect exhaustive patterns of anomalies

Training strategies
❏ Few-shot anomalies
❏ +Memory-based few-shot detector [Koizumi+, 2019], [Koizumi+, 2020]
❏ A sufficient number of anomalies
❏ Complementary set VAE: estimating PDF of “complement of normal” [Kawachi+, 2018], [Kawachi+, 2019]
[Koizumi+, 2019]: Y. Koizumi, et al., “SNIPER: Few-shot Learning for Anomaly Detection to Minimize False-Negative Rate with Ensured True-Positive Rate,” Proc. of ICASSP, 2019.
[Koizumi+, 2020]: Y. Koizumi, et al., “SPIDERnet: Attention Network for One-shot Anomaly Detection in Sounds,” Proc. of ICASSP, 2020.
[Kawachi+, 2018]: Y. Kawachi, et al., “Complementary Set Variational AutoEncoder for Supervised Anomaly Detection,” Proc. of ICASSP, 2018.
[Kawachi+, 2019]: Y. Kawachi, et al., “A Two-Class Hyper-Spherical Autoencoder for Supervised Anomaly Detection,” Proc. of ICASSP, 2019.

+Few-shot learning
❏ Increase A(x) when input is similar to memorized anomalies [Koizumi+, 2019], [Koizumi+, 2020]
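The idea of raising A(x) near memorized anomalies can be sketched with a kernel similarity. This is a hedged illustration only, not the formulation of SNIPER or SPIDERnet: the Gaussian kernel, the `bandwidth` and `weight` parameters, and the feature vectors are assumptions.

```python
import numpy as np

def memory_boosted_score(x, base_score, memorized_anomalies, bandwidth=1.0, weight=5.0):
    """Base score plus a bonus that grows as x gets closer to any memorized anomaly."""
    d2 = np.sum((np.atleast_2d(memorized_anomalies) - x) ** 2, axis=1)
    similarity = np.exp(-d2 / (2.0 * bandwidth ** 2))   # Gaussian kernel in (0, 1]
    return float(base_score + weight * np.max(similarity))

# One overlooked anomaly, stored as a hypothetical feature vector
memory = np.array([[3.0, 3.0]])

# Same base score from the unsupervised detector, different distance to the memory
near = memory_boosted_score(np.array([2.9, 3.1]), base_score=0.4, memorized_anomalies=memory)
far = memory_boosted_score(np.array([0.0, 0.0]), base_score=0.4, memorized_anomalies=memory)
```

Inputs far from every memorized anomaly keep essentially their original score, so the unsupervised detector still handles unknown anomalies.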

Normal vs. Complement
❏ Complementary set PDF [Kawachi+, 2018], [Kawachi+, 2019]
[Figure: PDF of normal vs. PDF of its complement; the region labeled “Anomaly > Normal” is detected as anomalous]

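Complementary set VAE [Kawachi+, 2018]
❏ Switch hidden space prior in VAE according to label
❏ Cost = Reconstruction error + prior term selected by the label: likelihood of normal (for normal samples) or likelihood of complement (for anomalous samples)
[Figure: PDF of normal vs. PDF of its complement; the region labeled “Anomaly > Normal” is detected as anomalous]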
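Switching the latent prior by label can be sketched as a label-dependent KL term. This is an illustration under a strong assumption: a wide zero-mean Gaussian stands in for the complement prior, which is not the prior derived in [Kawachi+, 2018]; the function names and parameter values are hypothetical.

```python
import numpy as np

def kl_to_zero_mean_gaussian(mu, sigma, prior_std):
    """KL( N(mu, diag(sigma^2)) || N(0, prior_std^2 I) ), summed over latent dims."""
    return float(np.sum(
        np.log(prior_std / sigma) + (sigma ** 2 + mu ** 2) / (2.0 * prior_std ** 2) - 0.5
    ))

def prior_term(mu, sigma, is_anomaly, normal_std=1.0, complement_std=10.0):
    """Select the latent prior according to the label of the training sample."""
    prior_std = complement_std if is_anomaly else normal_std  # assumed stand-in priors
    return kl_to_zero_mean_gaussian(mu, sigma, prior_std)

mu_center, mu_far, sigma = np.zeros(2), np.full(2, 8.0), np.ones(2)

normal_at_center = prior_term(mu_center, sigma, is_anomaly=False)  # cheap
normal_far_away = prior_term(mu_far, sigma, is_anomaly=False)      # heavily penalized
anomaly_far_away = prior_term(mu_far, sigma, is_anomaly=True)      # tolerated
```

Under this stand-in, normal samples are pulled toward the center of the hidden space while labeled anomalies are allowed to sit far from it.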

Toy example [Kawachi+, 2018]
❏ MNIST example
❏ Normal: 0-8
❏ Anomaly: 9
❏ Visualizing hidden space
❏ Since normal prior is Gaussian, 0-8 are placed around the center of hidden space
[Figure: visualization of hidden space; legend: anomaly, normal]

Anomaly detected!!
...but where…?
Where is anomaly??
Photo by Magda Ehlers from Pexels

+sound localization
Localization is also tackled
in DCASE Challenge [Link]

How anomalous??
Anomaly detected!!
...but how…?
Photo by Andrea Piacquadio from Pexels

+audio captioning
Captioning is also tackled
in DCASE Challenge [Link]
High frequency rubbing noise.
It might be an anomaly in bearing.

Conclusion
❏ Interesting and “tasty” problems
❏ Outlier-detection? Classification?
❏ Even the definition of the problem is uncertain
❏ Blue ocean
❏ Many unsolved problems and DCASE Challenge
❏ Domain adaptation, few-shot learning...
❏ Frontier
❏ Practically important but undeveloped research field
❏ Combining with other DCASE tasks

Why not enjoy ASD?

Thank You
Yuma Koizumi
Research Scientist @ Google Research Tokyo

Problem on auto-encoder (supplement)
❏ Reconstruction error and energy of Boltzmann distribution
❏ MMSE-based training ignores normalizing constant
❏ KL-div. between PDF of normal and the model = MMSE term + constraint for increasing total anomaly score
= Increasing anomaly score of unknown samples
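The relation between the reconstruction error and the normalizing constant can be written out as follows. This is a reconstruction from the bullet points above rather than the slide's own equation; the symbols $A_\theta$, $p_\theta$, and $Z(\theta)$ are notation assumed here.

```latex
% Anomaly score A_\theta(x) (e.g. reconstruction error) as the energy of a Boltzmann distribution
p_\theta(x) = \frac{\exp\{-A_\theta(x)\}}{Z(\theta)}, \qquad
Z(\theta) = \int \exp\{-A_\theta(x)\}\, dx

% KL divergence from the PDF of normal sounds to the model
\mathrm{KL}\big(p(x \mid \text{normal}) \,\|\, p_\theta(x)\big)
  = \underbrace{\mathbb{E}_{x \sim p(x \mid \text{normal})}\big[\log p(x \mid \text{normal})\big]}_{\text{constant in } \theta}
  + \underbrace{\mathbb{E}_{x \sim p(x \mid \text{normal})}\big[A_\theta(x)\big]}_{\text{MMSE term}}
  + \underbrace{\log Z(\theta)}_{\text{normalizing constant}}
```

Training with the MMSE term alone drops $\log Z(\theta)$. Since $Z(\theta)$ shrinks only when $A_\theta(x)$ grows over the whole input space, keeping that term acts as a constraint that increases the anomaly score of samples never seen in training.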
