Cascade classifiers trained on gammatonegrams for reliably detecting audio events

In this paper we propose a novel method for detecting events of interest through audio analysis. The proposed system represents audio streams as a Gammatone image, which describes the time-frequency distribution of the signal's energy; this representation is inspired by the functioning of the human auditory system. A pool of AdaBoost cascade classifiers, one for each class of events of interest, performs the event detection. The performance of the proposed system has been evaluated on a large data set of audio events for surveillance applications, and the achieved results, compared with two state-of-the-art approaches, confirm its effectiveness. Download the paper at: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6918643

P. Foggia, A. Saggese, N. Strisciuglio, M. Vento
University of Salerno - Italy
"Cascade classifiers trained on gammatonegrams for reliably detecting audio events,"
11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2014,
pp. 50-55, 26-29 Aug. 2014. doi: 10.1109/AVSS.2014.6918643
Machine Intelligence lab for Video, Image and Audio processing

State of the art
 Single-layer representation or classification
 Vacher et al. (2004), Clavel et al. (2005): GMM classifier
 Valenzise et al. (2007): GMM for background modeling
 Rabaoui et al. (2008): OC-SVM with a novel dissimilarity
measure.
 Complex classification architecture or representation
 Rouas et al. (2006): GMM + SVM
 Ntalampiras et al. (2009): two-stage GMM classifier
 Conte et al. (2012): two classifiers with different time
resolutions
 Chin and Burred (2012): sub-sequences matching
through Genetic Motif Discovery technique.

Proposed Architecture
Image Representation → Feature Extraction (Haar) → Cascade Classifiers

Audio representation
 Biologically-inspired representation of audio streams as the
response of the cochlea membrane in the human auditory
system (Gammatone filter bank)
[Figure panels: Scream, Gun shot, Glass breaking]
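
The representation above can be illustrated with a minimal sketch of a gammatone filter bank followed by frame-wise log energy. It is not the authors' implementation: the ERB formula, the fourth-order filters, the 25 ms impulse response, the centre frequencies, and the frame sizes are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Sampled impulse response of one gammatone filter centred at fc (Hz)."""
    b = 1.019 * erb(fc)  # bandwidth parameter (assumed scaling)
    n = int(duration * fs)
    return [
        (t / fs) ** (order - 1)
        * math.exp(-2.0 * math.pi * b * t / fs)
        * math.cos(2.0 * math.pi * fc * t / fs)
        for t in range(n)
    ]

def filter_signal(signal, ir):
    """Direct FIR convolution, truncated to the input length."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k in range(min(i + 1, len(ir))):
            acc += ir[k] * signal[i - k]
        out.append(acc)
    return out

def gammatonegram(signal, fs, centre_freqs, frame_len, hop):
    """Rows are frequency channels, columns are time frames, values are log energy (dB)."""
    image = []
    for fc in centre_freqs:
        y = filter_signal(signal, gammatone_ir(fc, fs))
        row = []
        for start in range(0, len(y) - frame_len + 1, hop):
            energy = sum(v * v for v in y[start:start + frame_len])
            row.append(10.0 * math.log10(energy + 1e-12))
        image.append(row)
    return image

# Toy input: 0.1 s of a 1 kHz tone sampled at 8 kHz (assumed values)
fs = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(800)]
freqs = [250.0, 500.0, 1000.0, 2000.0]
img = gammatonegram(tone, fs, freqs, frame_len=200, hop=100)

# Channel with the highest energy in the last frame
strongest = max(range(len(freqs)), key=lambda c: img[c][-1])
```

The plain-Python convolution keeps the sketch dependency-free; a practical system would use an optimized filtering routine and many more channels.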

Haar features
 Haar Wavelets to describe local variations of energy in
the Gammatonegram images
 e.g., abrupt variations of the energy distribution along time are effectively
described by a vertical Haar basis function
 Efficiently computed from the Integral Image of the
Gammatonegram
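
As a sketch of the idea above, the following shows how an integral image lets a two-rectangle Haar feature be evaluated with a handful of table lookups. The toy image, the feature layout, and the function names are illustrative assumptions, not taken from the paper.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    rows, cols = len(img), len(img[0])
    ii = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0.0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum over a rectangle of the original image using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def haar_vertical_edge(ii, top, left, height, width):
    """Two-rectangle feature: right half minus left half (even width assumed)."""
    half = width // 2
    return (rect_sum(ii, top, left + half, height, half)
            - rect_sum(ii, top, left, height, half))

# Toy gammatonegram: energy jumps from 0 to 1 at time frame 2
toy = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
ii = integral_image(toy)
onset = haar_vertical_edge(ii, top=0, left=0, height=3, width=4)
```

With rows as frequency channels and columns as time frames, this feature responds strongly when the energy changes abruptly along time, which is the case the slide describes.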

Cascade Classifiers
 Events of interest can occur at every position in time
 Classification through an n × m sliding window
 Multi-stage cascade classifier learned with the AdaBoost
algorithm (inspired by the Viola-Jones face detector)
 Smaller and simpler classifiers in the first stages of the
cascade
 Speed-up from the early rejection of negative windows
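[Figure: cascade diagram. The input image passes through successive stages; at each stage a window is either rejected (no event) or passed on, and a window that passes every stage is reported as a detected event.]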
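
The early-rejection behaviour can be sketched as follows. This is a simplified illustration, not the trained detector: each stage here is a single hand-written feature with a hand-set threshold, whereas in the paper each stage is learned with AdaBoost over Haar features. The window is assumed to span all frequency rows and slide along time only, and all names are hypothetical.

```python
def window_energy(img, start, width):
    """Cheap first-stage feature: total energy inside the window."""
    return sum(sum(row[start:start + width]) for row in img)

def window_onset(img, start, width):
    """Costlier second-stage feature: right-half energy minus left-half energy."""
    half = width // 2
    return (window_energy(img, start + half, width - half)
            - window_energy(img, start, half))

def run_cascade(img, start, width, stages):
    """Return (accepted, stages_evaluated); stop at the first failing stage."""
    for index, (feature, threshold) in enumerate(stages, start=1):
        if feature(img, start, width) < threshold:
            return False, index
    return True, len(stages)

def detect(img, width, stages, step=1):
    """Slide the window along the time axis and collect accepted start frames."""
    n_frames = len(img[0])
    hits = []
    for start in range(0, n_frames - width + 1, step):
        accepted, _ = run_cascade(img, start, width, stages)
        if accepted:
            hits.append(start)
    return hits

# Hand-set thresholds, for illustration only
stages = [(window_energy, 4.0), (window_onset, 4.0)]
toy = [[0, 0, 0, 0, 1, 1, 1, 1],
       [0, 0, 0, 0, 1, 1, 1, 1]]
hits = detect(toy, width=4, stages=stages)
```

Silent windows fail the cheap first stage and never reach the second one, which is where the speed-up on negative windows comes from.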

Data Set (http://mivia.unisa.it)
 4 classes of sounds
 Glass breaking (GB), Gun shot (GS), Screams (S),
Background sound (BG)
 2500 events for each class
 1000 for training and 1500 for testing
 The events are created by super-imposing
abnormal sounds on several background
sounds
 Originally 173 background sounds + 278 sounds
from the classes of interest
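
A minimal sketch of how an event clip might be superimposed on a background clip is shown below. The slides do not say how the mixing was done; the offset and gain parameters and the function name are assumptions made for illustration.

```python
def superimpose(background, event, offset, gain=1.0):
    """Add `event`, scaled by `gain`, onto a copy of `background` at `offset`."""
    if offset < 0 or offset + len(event) > len(background):
        raise ValueError("event must fit inside the background clip")
    mixed = list(background)
    for i, sample in enumerate(event):
        mixed[offset + i] += gain * sample
    return mixed

# Toy sample values, not real audio
background = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
event = [1.0, -1.0]
mixed = superimpose(background, event, offset=2, gain=0.5)
```

Varying the gain is one way to obtain the same event at different loudness relative to the background.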

Experimental Evaluation
 Recognition Rate
 Correct detection/classification of events of
interest
 False Positive Rate (False alarms)
 Detection of events of interest when only
background sound is present
 Comparison with 2 other methods from the
literature, based on an LVQ classifier [1] and a Bag of
Aural Words (BoAW) classifier [2]
[1] Conte et al. - An ensemble of rejecting classifiers for anomaly detection of audio events, AVSS 2012
[2] Carletti et al. - Audio surveillance using a bag of aural words classifier, AVSS 2013
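
The two measures can be sketched as below, under one plausible reading of the definitions above: the recognition rate is computed over samples that contain an event of interest, and the false positive rate over background-only samples. The label strings follow the data set abbreviations; the toy predictions and function names are made up for illustration.

```python
def recognition_rate(true_labels, predicted_labels, background="BG"):
    """Fraction of event-of-interest samples assigned their correct class."""
    events = [(t, p) for t, p in zip(true_labels, predicted_labels)
              if t != background]
    return sum(t == p for t, p in events) / len(events)

def false_positive_rate(true_labels, predicted_labels, background="BG"):
    """Fraction of background-only samples flagged as some event of interest."""
    preds_on_bg = [p for t, p in zip(true_labels, predicted_labels)
                   if t == background]
    return sum(p != background for p in preds_on_bg) / len(preds_on_bg)

# Toy labels, not results from the paper
truth = ["GB", "GS", "S", "S", "BG", "BG", "BG", "BG"]
pred = ["GB", "GS", "S", "BG", "BG", "S", "BG", "BG"]
rr = recognition_rate(truth, pred)
fpr = false_positive_rate(truth, pred)
```

In this toy example one scream is missed and one background clip is flagged as a scream, giving a recognition rate of 0.75 and a false positive rate of 0.25.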

Experimental Evaluation (2)
 Recognition Rate
Proposed method: Avg. Rec. Rate = 95.89%
[1]: Avg. Rec. Rate = 79.87%
[2]: Avg. Rec. Rate = 95.67%

Experimental Evaluation (3)
 False Positive Rate
[Figure: false positive rate of the proposed method compared with [1] and [2]]

Qualitative analysis
 Many false scream detections occur on
background sounds that contain loud cheering
crowds or whistles
[Figure panels: Scream, Whistle, Cheering, Baby]

Conclusions
 Innovative approach for audio analysis and
event detection based on Computer Vision
techniques
 High detection capabilities
 Low processing time: complex features are
computed only for windows that are more
likely to contain an event of interest
 Detection of sounds of interest with low
energy

References
P. Foggia, A. Saggese, N. Strisciuglio, M. Vento
"Cascade classifiers trained on gammatonegrams for
reliably detecting audio events"
11th IEEE International Conference on Advanced Video and Signal Based
Surveillance (AVSS), 2014, pp. 50-55, 26-29 Aug. 2014
doi: 10.1109/AVSS.2014.6918643
Web: http://mivia.unisa.it
Email: nstrisciuglio[at]unisa.it
