In this paper we propose a novel method for the detection of events of interest through audio analysis. The system that we propose is based on the representation of the audio streams through a Gammatone image, which describes the time-frequency distribution of the energy of the signal; this representation is inspired by the functioning of the human auditory system. A pool of AdaBoost cascade classifiers, one for each class of events of interest, is involved in the event detection stage. The performance of the proposed system has been evaluated on a large data set of audio events for surveillance applications and the achieved results, compared with two state-of-the-art approaches, confirm its effectiveness.
Download the paper at:
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6918643
Cascade classifiers trained on gammatonegrams for reliably detecting audio events
1. P. Foggia, A. Saggese, N. Strisciuglio, M. Vento
University of Salerno - Italy
"Cascade classifiers trained on gammatonegrams for reliably detecting audio events,"
Advanced Video and Signal Based Surveillance (AVSS), 2014 11th IEEE International Conference on,
pp. 50-55, 26-29 Aug. 2014 - doi: 10.1109/AVSS.2014.6918643
Machine Intelligence lab for Video, Image and Audio processing
2. State of the art
Single-layer representation or classification
Vacher et al. (2004), Clavel et al. (2005): GMM classifier
Valenzise et al. (2007): GMM for background modeling
Rabaoui et al. (2008): OC-SVM with a novel dissimilarity
measure.
Complex classification architecture or representation
Rouas et al. (2006): GMM + SVM
Ntalampiras et al. (2009): two-stage GMM classifier
Conte et al. (2012): two classifiers with different time
resolutions
Chin and Burred (2012): sub-sequence matching
through the Genetic Motif Discovery technique.
4. Audio representation
Biologically-inspired representation of audio streams as the
response of the cochlea membrane in the human auditory
system (Gammatone filter bank)
[Figure: example Gammatonegrams of a scream, a gun shot, and glass breaking]
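The filter-bank representation on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the filter order (4), the ERB bandwidth formula, the number of bands, the 10 ms frame length, and all function names are assumptions, since the slides do not state them.

```python
import math

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Unit-energy impulse response of a gammatone filter centred at fc (Hz)."""
    b = 1.019 * erb(fc)  # assumed bandwidth scaling
    ir = []
    for n in range(round(duration * fs)):
        t = n / fs
        ir.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                  * math.cos(2 * math.pi * fc * t))
    norm = math.sqrt(sum(v * v for v in ir)) or 1.0
    return [v / norm for v in ir]

def center_frequencies(f_min, f_max, n_bands):
    """Centre frequencies equally spaced on the ERB-rate scale (n_bands >= 2)."""
    to_erb = lambda f: 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)
    from_erb = lambda e: (10 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    lo, hi = to_erb(f_min), to_erb(f_max)
    return [from_erb(lo + (hi - lo) * i / (n_bands - 1)) for i in range(n_bands)]

def fir_filter(signal, ir):
    """Direct-form convolution, truncated to the length of the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(len(ir), n + 1)):
            acc += ir[k] * signal[n - k]
        out.append(acc)
    return out

def gammatonegram(signal, fs, n_bands=8, frame=0.010, f_min=100.0, f_max=None):
    """Return a list of rows (one per band) of per-frame log energies in dB."""
    f_max = f_max or 0.45 * fs
    hop = round(frame * fs)
    image = []
    for fc in center_frequencies(f_min, f_max, n_bands):
        y = fir_filter(signal, gammatone_ir(fc, fs))
        row = []
        for start in range(0, len(y) - hop + 1, hop):
            energy = sum(v * v for v in y[start:start + hop]) / hop
            row.append(10 * math.log10(energy + 1e-12))
        image.append(row)
    return image
```

Each row of the returned image is one frequency band and each column one time frame, so the result can be treated as a grey-level image by the later stages. The pure-Python convolution is slow and only meant for short signals.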
6. Haar features
Haar Wavelets to describe local variations of energy in
the Gammatonegram images
e.g. abrupt variations of the energy distribution along time are effectively
described by a vertical Haar basis function
Efficiently computed from the Integral Image of the
Gammatonegram
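The integral-image trick mentioned on this slide can be illustrated as follows. This is a sketch under assumptions: the image is a list of rows with time along the columns, and the function names and the specific two-rectangle layout are illustrative, not taken from the paper.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    rows, cols = len(img), len(img[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        running = 0.0
        for c in range(cols):
            running += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + running
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle using four lookups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def haar_vertical_edge(ii, top, left, height, width):
    """Two-rectangle feature split along time: later half minus earlier half."""
    half = width // 2
    earlier = rect_sum(ii, top, left, height, half)
    later = rect_sum(ii, top, left + half, height, half)
    return later - earlier
```

A sudden onset of energy, such as a gun shot, gives a large positive response, while a stationary background gives a response near zero. Once the integral image is built, every feature costs a constant number of lookups regardless of rectangle size.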
8. Cascade Classifiers
Events of interest can occur at every position in time
Classification through an n x m sliding window
Multi-stage cascade classifier learned with the AdaBoost
algorithm (inspired by the Viola-Jones face detector)
Smaller and simpler classifiers in the first stages of the
cascade
Speed-up from the early rejection of negative windows
[Figure: cascade of stages applied to the input image; windows are either rejected (no event) or reported as a detected event]
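The early-rejection behaviour of the cascade can be sketched as below. This is an illustrative evaluation loop only, with training omitted. The stage layout (a list of weak learners plus a stage threshold), the weak-learner tuple `(feature_fn, threshold, polarity, alpha)`, and the choice of a window spanning all frequency rows are assumptions made for the example.

```python
def cascade_accepts(window, stages):
    """Run a window through the stages; stop at the first stage that rejects it."""
    for weak_learners, stage_threshold in stages:
        score = 0.0
        for feature_fn, threshold, polarity, alpha in weak_learners:
            # Features are evaluated lazily, only when their stage is reached.
            if polarity * feature_fn(window) >= polarity * threshold:
                score += alpha
        if score < stage_threshold:
            return False  # early rejection: later stages are never computed
    return True

def detect(image, win_cols, step, stages):
    """Slide a window along the time axis; return start columns of detections."""
    n_cols = len(image[0])
    hits = []
    for start in range(0, n_cols - win_cols + 1, step):
        window = [row[start:start + win_cols] for row in image]
        if cascade_accepts(window, stages):
            hits.append(start)
    return hits
```

Because most windows contain only background, placing cheap weak learners in the first stages means the costlier features of later stages are evaluated on a small fraction of windows.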
9. Data Set (http://mivia.unisa.it)
4 classes of sounds
Glass breaking (GB), Gun shot (GS), Screams (S),
Background sound (BG)
2500 events for each class
1000 for training and 1500 for testing
The events are created by super-imposing
abnormal sounds on several background
sounds
Originally 173 background sounds + 278 sounds
from the classes of interest
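Superimposing an event on a background recording can be sketched as follows. The slide does not say how the two signals are scaled relative to each other, so the signal-to-noise-ratio parameter `snr_db`, the RMS-based gain, and the function names are assumptions made purely for illustration.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def superimpose(event, background, offset, snr_db):
    """Add `event` to `background` at sample `offset`, scaled to an assumed SNR.

    Requires offset + len(event) <= len(background) and non-silent inputs.
    """
    segment = background[offset:offset + len(event)]
    gain = rms(segment) * 10 ** (snr_db / 20.0) / rms(event)
    mixed = list(background)  # leave the original background untouched
    for i, value in enumerate(event):
        mixed[offset + i] += gain * value
    return mixed
```

For instance, `superimpose(event, background, offset=10, snr_db=0)` scales the event so that its RMS level matches that of the background segment it overlaps.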
10. Experimental Evaluation
Recognition Rate
Correct detection/classification of events of
interest
False Positive Rate (False alarms)
Detection of events of interest when only
background sound is present
Comparison with 2 other methods from the
literature based on an LVQ [1] and a Bag of
Aural Words (BoAW) classifier [2]
[1] Conte et al. - An ensemble of rejecting classifiers for anomaly detection of audio events, AVSS 2012
[2] Carletti et al. - Audio surveillance using a bag of aural words classifier, AVSS 2013
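The two measures defined on this slide can be computed from per-sample labels as sketched here. The label strings follow the class abbreviations of the data set slide. The function names and the per-sample (rather than per-window) counting are assumptions.

```python
def recognition_rate(true_labels, predicted_labels, background="BG"):
    """Fraction of events of interest that are assigned their correct class."""
    pairs = [(t, p) for t, p in zip(true_labels, predicted_labels)
             if t != background]
    return sum(t == p for t, p in pairs) / len(pairs)

def false_positive_rate(true_labels, predicted_labels, background="BG"):
    """Fraction of background-only samples reported as an event of interest."""
    preds = [p for t, p in zip(true_labels, predicted_labels)
             if t == background]
    return sum(p != background for p in preds) / len(preds)
```

Under this definition, an event of interest that is missed or assigned to the wrong class lowers the recognition rate, while the false positive rate depends only on the background samples.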
13. Qualitative analysis
Many false scream detections occur on
background sounds that contain loud cheering
crowds or whistles
[Figure: Gammatonegrams of a scream, a whistle, a cheering crowd, and a baby]
14. Conclusions
Innovative approach for audio analysis and
event detection based on Computer Vision
techniques
High detection capabilities
Low processing time: complex features are
computed only for windows that are more
likely to contain an event of interest
Detection of sounds of interest with low
energy
15. References
P. Foggia, A. Saggese, N. Strisciuglio, M. Vento
"Cascade classifiers trained on gammatonegrams for
reliably detecting audio events"
Advanced Video and Signal Based Surveillance (AVSS), 2014 11th IEEE
International Conference on, pp. 50-55, 26-29 Aug. 2014
doi: 10.1109/AVSS.2014.6918643
Web: http://mivia.unisa.it
Email: nstrisciuglio[at]unisa.it