L3-Net Deep Audio Embeddings to Improve COVID-19 Detection from Smartphone Data
Mattia G. Campana (IIT-CNR), Andrea Rovati (UniMi), Franca Delmastro (IIT-CNR), Elena Pagani (UniMi)
IEEE SMARTCOMP 2022, June 20-24, Aalto University, Espoo, Finland
AI response to the COVID-19 pandemic
Help the healthcare system
• Machine Learning (ML) classifiers for blood test results
• Deep Learning (DL) models to analyze chest X-ray and lung Computed Tomography (CT) images
Track behaviours in public places
• Monitoring social distancing
• Face mask detection systems
m-health systems based on respiratory sounds
Diagnosis
• Pervasive & low-cost solution for fast screening
• Support the healthcare system in identifying new cases (prevention of new outbreaks)
• Track the disease evolution
COVID-19 Detection from respiratory sounds
Handcrafted acoustic features (HC)
Pipeline: Features Extraction -> Shallow classifier -> COVID-19 positive/negative
Time domain
• RMS Energy (how loud is the signal)
• Zero crossing rate (how fast the signal changes)
Frequency domain
• Spectral centroid
• Period (freq. with highest amplitude)
Time-frequency representations
• Spectrogram
• Mel-Frequency Cepstral Coefficients (MFCC)
Main drawbacks
• Difficult to find the best set of features
• Typically outperformed by Deep Learning models
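The two time-domain features can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not the authors' extraction code: the frame length, hop length, and the synthetic 440 Hz tone are assumptions chosen only for the example, and the slides do not say which audio library was used.

```python
import math


def frame_signal(samples, frame_len, hop_len):
    """Split a list of samples into overlapping frames (the ragged tail is dropped)."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]


def rms_energy(frame):
    """Root-mean-square amplitude of one frame: how loud the frame is."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ: how fast the signal changes."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


# Synthetic 440 Hz tone sampled at 16 kHz (hypothetical input, not data from the slides).
SAMPLE_RATE = 16000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

# Frame sizes are illustrative assumptions.
frames = frame_signal(tone, frame_len=1024, hop_len=512)
rms_values = [rms_energy(f) for f in frames]
zcr_values = [zero_crossing_rate(f) for f in frames]
```

Frame-level values like these are typically summarized with statistics (mean, standard deviation, and so on) before being passed to a shallow classifier.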
COVID-19 Detection from respiratory sounds
DL-based approach
Pipeline: Graphical representation (i.e., Spectrogram-like image) -> Convolutional Neural Network (CNN) -> COVID-19 positive/negative
Representative work:
E. A. Mohammed et al., “An ensemble learning approach to digital corona virus preliminary screening from cough sounds”, Scientific Reports, 2021.
Ensemble of CNNs with different audio representations
Main drawback: Requires large-scale datasets, especially for complex models
COVID-19 Detection from respiratory sounds
“Hybrid” approach
Pipeline: HC features + Deep audio embeddings (from a pre-trained DL model) -> Shallow classifier -> COVID-19 positive/negative
Representative work:
Brown, Chloë, et al. "Exploring automatic diagnosis of COVID-19 from crowdsourced respiratory sound data." In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020.
477 HC features + VGGish (trained with AudioSet ~ 2 million samples)
Improving the Hybrid approach
Investigation of an alternative embedding model: L3-Net
Pipeline: HC features + Deep audio embeddings (from a pre-trained DL model) -> Shallow classifier -> COVID-19 positive/negative
L3-Net: Look, Listen and Learn
Arandjelovic, Relja, and Andrew Zisserman. "Look, listen and learn." Proceedings of the IEEE International Conference on Computer Vision. 2017.
Architecture:
• Video sub-network: Video frame image -> Video embeddings
• Audio sub-network: Mel-Spectrogram (1s window) -> Audio embeddings
• Fusion layers (Fully-connected): Image and audio come from the same video?
L3-Net for COVID-19 Detection
Pipeline:
• Cough/Breath audio sample -> Audio frames -> Mel-Spectrogram -> Audio sub-network -> Audio embeddings
• Audio file embeddings: combination of the frames embeddings (Mean + std)
• Audio file embeddings + HC features, with dimensionality reduction (PCA) -> Shallow Classifier -> COVID-19 positive/negative
Cramer, Jason, et al. "Look, listen, and learn more: Design choices for deep audio embeddings." ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019.
OpenL3 model trained with AudioSet
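The "Mean + std" combination of frame embeddings into a single file-level vector can be sketched as follows. This is an illustration under assumptions, not the authors' code: the toy embeddings have 4 dimensions (real deep audio embeddings have hundreds or thousands), the HC values are hypothetical, and the use of the population standard deviation is a choice made here, since the slides do not specify it.

```python
import statistics


def pool_frame_embeddings(frame_embeddings):
    """Collapse per-frame embeddings into one file-level vector.

    The result holds the per-dimension means followed by the per-dimension
    standard deviations, so it is twice as long as a single frame embedding.
    """
    dims = list(zip(*frame_embeddings))  # transpose: one tuple per embedding dimension
    means = [statistics.fmean(d) for d in dims]
    stds = [statistics.pstdev(d) for d in dims]  # population std (an assumption)
    return means + stds


def build_feature_vector(frame_embeddings, hc_features):
    """Concatenate the pooled embeddings with handcrafted (HC) features."""
    return pool_frame_embeddings(frame_embeddings) + list(hc_features)


# Toy example: 3 frames with 4-dimensional embeddings (hypothetical values).
frame_embeddings = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [5.0, 0.0, 2.0, 2.0],
]
hc_features = [0.25, 120.0]  # hypothetical HC values, e.g. a duration and a tempo
file_vector = build_feature_vector(frame_embeddings, hc_features)
```

Pooling with mean and standard deviation gives every audio file a vector of the same length regardless of its duration, which is what a shallow classifier needs.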
Experimental Evaluation: Goals
1) Improve the classification performance with respect to:
- Brown et al. (2020): same approach but different embedding model (i.e., VGGish vs L3-Net)
- Mohammed et al. (2021): ensemble model (CNN trained from scratch vs pre-trained model)
2) Can we perform the classification task directly on the mobile device?
- Memory footprint evaluation
Datasets
Cambridge: crowdsourced breath and cough audio samples (data agreement), www.covid-19-sounds.org
COSWARA: crowdsourced cough samples, coswara.iisc.ac.in, https://github.com/iiscleap/Coswara-Data
Virufy: cough samples collected in hospital; labels based on COVID-19 PCR test results, https://github.com/virufy/virufy-covid
[Figure: number of Healthy and COVID-19 samples per dataset (Cambridge, COSWARA, Virufy); values as extracted: 62, 860, 282, 7, 2758, 752]
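[Figure: evaluation pipeline. Balanced Dataset (Under-sampling) is split into Dev set and Test set; the Dev set is split into Train set and Validation set for Training & Tuning; the Best model is evaluated on the Test set. Performances: AUC, Precision, Recall]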
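The balanced dataset is obtained by under-sampling. A minimal sketch of random under-sampling is given here, assuming each sample carries a class label; the function name, the seed, and the toy data are hypothetical, and the slides do not describe the exact procedure the authors used.

```python
import random
from collections import defaultdict


def undersample(samples, label_of, seed=0):
    """Randomly drop samples from larger classes so every class matches the smallest one."""
    by_label = defaultdict(list)
    for s in samples:
        by_label[label_of(s)].append(s)
    target = min(len(group) for group in by_label.values())

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Toy data: 10 healthy samples vs 3 COVID-19 samples (hypothetical counts).
samples = [("healthy", i) for i in range(10)] + [("covid", i) for i in range(3)]
balanced = undersample(samples, label_of=lambda s: s[0])
```

Under-sampling discards data from the majority class, which is simple but costly when the minority class is very small.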
Evaluation protocol
5-fold nested Cross Validation with stratified user-based splits
PCA explained variance: [0.7, 0.8, 0.9, 0.95, 0.99]
Shallow classifiers: Logistic Regression (LR), Support Vector Machines (SVM), AdaBoost (AB), Random Forest (RF)
Feature sets:
F1: deep audio embeddings
F2: embeddings + Period, Tempo, Duration
F3: embeddings + HC features, except Δ-MFCC, Δ2-MFCC
F4: embeddings + all HC features (i.e., 477 HC)
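A stratified user-based split keeps all recordings of one user in the same fold while spreading each class evenly across folds. The sketch below shows one way to do this in plain Python; it assumes every user has a single label, and the helper names and toy data are hypothetical rather than taken from the authors' code.

```python
import random
from collections import defaultdict


def stratified_user_folds(user_labels, n_folds=5, seed=0):
    """Assign each user to one fold, spreading every class evenly across folds.

    user_labels: dict mapping user id -> class label (one label per user, an assumption).
    Returns a dict mapping user id -> fold index in [0, n_folds).
    """
    users_by_label = defaultdict(list)
    for user, label in user_labels.items():
        users_by_label[label].append(user)

    rng = random.Random(seed)
    fold_of = {}
    for label in sorted(users_by_label):
        users = sorted(users_by_label[label])
        rng.shuffle(users)
        for position, user in enumerate(users):
            fold_of[user] = position % n_folds  # round-robin keeps class counts even
    return fold_of


def split_recordings(recordings, fold_of, test_fold):
    """Split (user, recording) pairs so that no user appears on both sides."""
    train = [r for r in recordings if fold_of[r[0]] != test_fold]
    test = [r for r in recordings if fold_of[r[0]] == test_fold]
    return train, test


# Toy data: 20 users (10 per class), 2 recordings each (hypothetical).
user_labels = {f"u{i}": ("covid" if i % 2 else "healthy") for i in range(20)}
recordings = [(user, f"rec{k}") for user in user_labels for k in range(2)]
fold_of = stratified_user_folds(user_labels)
train, test = split_recordings(recordings, fold_of, test_fold=0)
```

In a nested scheme, the same kind of split would be repeated inside each outer training portion to select the PCA variance and the classifier before scoring on the held-out fold.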
Classification Results vs Brown et al. (2020)
Dataset: Cambridge (cough & breath audio samples)
3 classification tasks
1: COVID-positive vs COVID-negative
2: COVID-positive with cough vs COVID-negative
3: COVID-positive with cough vs COVID-negative with asthma and cough
TABLE III: Classification results, Mean (± std)
Task | Method | Modality | Features | Classifier | PCA | AUC | Precision | Recall
1 | baseline | Cough + Breath | F2 | LR | .95 | .80 (.07) | .72 (.06) | .69 (.11)
1 | our (same) | Cough + Breath | F2 | LR | .95 | .76 (.092) | .69 (.095) | .68 (.158)
1 | our (best) | Cough + Breath | F2 | SVM | .70 | .80 (.068) | .77 (.096) | .68 (.139)
2 | baseline | Cough | F2 | SVM | .90 | .82 (.18) | .80 (.16) | .72 (.23)
2 | our (same) | Cough | F2 | SVM | .90 | .69 (.227) | .74 (.187) | .61 (.276)
2 | our (best) | Breath | F1 | LR | .80 | .84 (.168) | .92 (.106) | .60 (.237)
3 | baseline | Breath | F3 | SVM | .70 | .80 (.14) | .69 (.20) | .69 (.26)
3 | our (same) | Breath | F3 | SVM | .70 | .64 (.254) | .69 (.154) | .66 (.269)
3 | our (best) | Breath | F1 | AB | .70 | .88 (.066) | .82 (.152) | .79 (.192)
COSWARA + Virufy | baseline | | | Top 4 Ensemble CNN | - | .77 | .80 | .71
COSWARA + Virufy | our | | F3 | LR | .99 | .99 (.001) | .99 (.006) | .99 (.007)
Gain (%) of our (best) over the baseline
Metric | Task 1 | Task 2 | Task 3
AUC | 0 | 2 | 8
Precision | 5 | 12 | 13
Recall | -1 | -12 | 10
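The nine gain values can be reproduced from the Table III means if the gain is read as the difference, in percentage points, between "our (best)" and the baseline. That reading is an assumption, although it matches every value in the chart; the short check below uses only numbers copied from Table III.

```python
# (AUC, Precision, Recall) means copied from Table III.
baseline = {
    "Task 1": (0.80, 0.72, 0.69),
    "Task 2": (0.82, 0.80, 0.72),
    "Task 3": (0.80, 0.69, 0.69),
}
ours_best = {
    "Task 1": (0.80, 0.77, 0.68),
    "Task 2": (0.84, 0.92, 0.60),
    "Task 3": (0.88, 0.82, 0.79),
}
METRICS = ("AUC", "Precision", "Recall")


def gain_points(ours, base):
    """Difference between two scores in percentage points, rounded to an integer."""
    return round((ours - base) * 100)


gains = {
    task: {
        metric: gain_points(o, b)
        for metric, o, b in zip(METRICS, ours_best[task], baseline[task])
    }
    for task in baseline
}

for task, row in gains.items():
    print(task, row)
```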
Classification Results vs Mohammed et al. (2021)
Dataset: COSWARA + Virufy (cough audio samples)
TABLE III (COSWARA + Virufy rows): Classification results, Mean (± std)
Method | Features | Classifier | PCA | AUC | Precision | Recall
baseline | | Top 4 Ensemble CNN | - | .77 | .80 | .71
our | F3 | LR | .99 | .99 (.001) | .99 (.006) | .99 (.007)
4 CNN with different inputs:
- Power Spectrum
- MFCC
- Spectrogram
- Mel-spectrogram
Classifier | AUC | Precision | Recall
SVM | .99 (.001) | .99 (.002) | .98 (.01)
RF | .81 (.024) | .79 (.031) | .59 (.07)
AB | .85 (.011) | .77 (.021) | .75 (.03)
Gain (%): AUC 22, Precision 19, Recall 28
Classification Task: COVID-positive vs COVID-negative
Memory footprint
Cambridge Task 1: SVM with PCA 70%: 48 KB
Cambridge Task 2: LR with PCA 80%: 1.03 KB
Cambridge Task 3: AB with PCA 70%: 17 KB
COSWARA + Virufy: LR with PCA 99%: 7.19 KB
Low memory impact in all the experiments
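The slides do not say how the footprint was measured. As a rough illustration of why a linear classifier over PCA-reduced features is small, the sketch below packs a hypothetical logistic-regression model (one float32 weight per component plus a bias) into bytes and scores a feature vector from that blob. The component count of 250, the weights, and the byte layout are all assumptions made for the example.

```python
import struct


def serialize_linear_model(weights, bias):
    """Pack a linear classifier as little-endian values: uint32 count, float32 weights, float32 bias."""
    header = struct.pack("<I", len(weights))
    body = struct.pack(f"<{len(weights)}f", *weights)
    return header + body + struct.pack("<f", bias)


def predict(blob, features):
    """Rebuild the model from bytes and return the decision score w . x + b."""
    (n,) = struct.unpack_from("<I", blob, 0)
    weights = struct.unpack_from(f"<{n}f", blob, 4)
    (bias,) = struct.unpack_from("<f", blob, 4 + 4 * n)
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical model over 250 PCA components (not a figure from the slides).
weights = [0.01 * i for i in range(250)]
blob = serialize_linear_model(weights, bias=-0.5)
size_kb = len(blob) / 1024
```

The stored size grows linearly with the number of retained components, so a lower PCA explained-variance threshold directly shrinks a linear model.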
Contributions
• We investigated the use of a pre-trained instance of L3-Net (OpenL3) to improve COVID-19 detection from respiratory sound data
• Evaluation: subject-independent experiments with 3 datasets
• Results: +8% AUC vs VGGish, +22% AUC vs ensemble of end-to-end CNN
• Low memory footprint: we can perform the whole task on resource-constrained devices
Future Work
• Distinguish between COVID-19 and other respiratory diseases (e.g., asthma)
• Fine-tuning OpenL3, proposing a single model for both feature extraction and classification
[Figure: Fixed CNN layers, Train FC layers -> Diagnosis]
• Extensive comparison of different audio embedding models
Mattia G. Campana
Ubiquitous Internet Research Unit
Institute of Informatics and Telematics
National Research Council of Italy
mattiacampana.github.io
mattia.campana@iit.cnr.it
linkedin.com/in/mattiacampana
Handcrafted acoustic Features
• The audio sample is re-sampled to a standard value for audio tasks (e.g., 16kHz or 22kHz)
• Extraction of features related to both frame (i.e., audio chunks) and segment (whole sample) perspectives
• We used the same 477 HC features (including statistics) considered by Brown et al. (2020)
Feature | Description
Duration | Total length (in seconds) of the audio sample
Onset | Number of pitch onsets (i.e., “events”) in the audio signal
Tempo | Rate of beats that occur at regular intervals throughout the entire audio signal
Period | The frequency with the highest amplitude among those obtained from the Fast Fourier transform (FFT)
RMS Energy | Root-Mean-Square of the signal power (i.e., the magnitude of the short-time Fourier transform)
Spectral Centroid | The centroid value of the frame-wise magnitude spectrogram. Identifies percussive and sustained sounds.
Roll-off Frequency | The frequency below which 85% of the total energy of the frame-wise spectrum is contained
Zero-crossing rate | The number of times the signal value crosses the zero axis, computed for each frame
MFCC | Shape of the cosine transformation of the signal's logarithmic spectrum, expressed in Mel-bands
Δ-MFCC and Δ2-MFCC | The first and second order derivatives of MFCC along time
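Two of the frequency-domain features in the table, Period and Roll-off Frequency, can be sketched with a naive discrete Fourier transform in standard-library Python. This is an illustration, not the authors' implementation: the transform is O(N^2) and only suitable for short frames, the roll-off here accumulates squared magnitudes (an assumption about what "energy" means), and the two-tone test frame is synthetic.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) transform, fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def dominant_frequency(frame, sample_rate):
    """'Period' as defined in the table: the frequency whose amplitude is highest."""
    mags = magnitude_spectrum(frame)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate / len(frame)


def rolloff_frequency(frame, sample_rate, fraction=0.85):
    """Lowest bin frequency at which the cumulative spectral energy reaches `fraction` of the total."""
    energy = [m * m for m in magnitude_spectrum(frame)]  # squared magnitude (an assumption)
    threshold = fraction * sum(energy)
    cumulative = 0.0
    for k, e in enumerate(energy):
        cumulative += e
        if cumulative >= threshold:
            return k * sample_rate / len(frame)
    return (len(energy) - 1) * sample_rate / len(frame)


# Synthetic frame: a strong 1 kHz tone plus a weaker 3 kHz tone, 64 samples at 8 kHz.
SAMPLE_RATE = 8000
frame = [
    math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE)
    for t in range(64)
]
period_hz = dominant_frequency(frame, SAMPLE_RATE)
rolloff_hz = rolloff_frequency(frame, SAMPLE_RATE)
```

In this toy frame the 1 kHz tone carries 80% of the energy, so the 85% roll-off is only reached once the 3 kHz component is included.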
L3-Net vs VGGish
# parameters:
• L3-Net: 4.7M
• VGGish: 62M
Cramer, Jason, et al. "Look, listen, and learn more: Design choices for deep audio embeddings." ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019.

L3-Net Deep Audio Embeddings to Improve COVID-19 Detection from Smartphone Data

  • 1. L3-Net Deep Audio Embeddings to Improve COVID-19 Detection from Smartphone Data Mattia G. Campana (IIT-CNR) Andrea Rovati (UniMi) Franca Delmastro (IIT-CNR) Elena Pagani (UniMi) IEEE SMARTCOMP 2022, June 20-24 Aalto University, Espoo, Finland
  • 2. AI response to the COVID-19 pandemic
    Help the healthcare system
    • Machine Learning (ML) classifiers for blood test results
    • Deep Learning (DL) models to analyze chest X-ray and lung Computed Tomography (CT) images
    Track behaviours in public places
    • Monitoring social distancing
    • Face mask detection systems
  • 3. m-health systems based on respiratory sounds
    Diagnosis
    • Pervasive & low-cost solution for fast screening
    • Support the healthcare system in identifying new cases (prevention of new outbreaks)
    • Track the disease evolution
  • 4. COVID-19 Detection from respiratory sounds: Handcrafted acoustic features (HC)
    Pipeline: Features Extraction → Shallow classifier → COVID-19 positive/negative
    Time domain
    • RMS Energy (how loud is the signal)
    • Zero crossing rate (how fast the signal changes)
    Frequency domain
    • Spectral centroid
    • Period (freq. with highest amplitude)
    Time-frequency representations
    • Spectrogram
    • Mel-Frequency Cepstral Coefficients (MFCC)
    Main drawbacks
    • Difficult to find the best set of features
    • Typically outperformed by Deep Learning models
  • 5. COVID-19 Detection from respiratory sounds: DL-based approach
    Pipeline: Graphical representation (i.e., Spectrogram-like image) → Convolutional Neural Network (CNN) → COVID-19 positive/negative
    Representative work: E. A. Mohammed et al., “An ensemble learning approach to digital corona virus preliminary screening from cough sounds”, Scientific Reports, 2021. Ensemble of CNN with different audio representations.
    Main drawback: Requires large-scale datasets, especially for complex models
  • 6. COVID-19 Detection from respiratory sounds: “Hybrid” approach
    Pipeline: HC features + Deep audio embeddings (Pre-trained DL model) → Shallow classifier → COVID-19 positive/negative
    Representative work: Brown, Chloë, et al. "Exploring automatic diagnosis of COVID-19 from crowdsourced respiratory sound data." In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020.
    477 HC features + VGGish (trained with AudioSet, ~2 million samples)
  • 7. Improving the Hybrid approach
    Investigation of an alternative embedding model: L3-Net
    Pipeline: HC features + Deep audio embeddings (Pre-trained DL model) → Shallow classifier → COVID-19 positive/negative
  • 8. L3-Net: Look, Listen and Learn
    Arandjelovic, Relja, and Andrew Zisserman. "Look, listen and learn." Proceedings of the IEEE International Conference on Computer Vision. 2017.
    Video branch: Video frame image → Video sub-network → Video embeddings
    Audio branch: Mel-Spectrogram (1s window) → Audio sub-network → Audio embeddings
    Both embeddings → Fusion layers (Fully-connected) → Image and audio come from the same video?
  • 9. L3-Net for COVID-19 Detection
    Embedding path: Cough/Breath audio sample → Audio frames → Mel-Spectrogram → Audio sub-network → Audio embeddings → Combination of the frames embeddings (Mean + std) → Audio file embeddings → Dimensionality reduction (PCA)
    Classification: reduced embeddings + HC features → Shallow Classifier → COVID-19 positive/negative
    Audio sub-network: OpenL3 model trained with AudioSet. Cramer, Jason, et al. "Look, listen, and learn more: Design choices for deep audio embeddings." ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019.
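The "Mean + std" combination step on this slide can be illustrated with a small sketch. It assumes the per-frame embeddings are already available as equal-length lists of floats (in the talk they come from the OpenL3 audio sub-network, which is not called here), and the helper names `pool_frame_embeddings` and `build_feature_vector` are hypothetical. The PCA step is omitted, and the use of the population standard deviation is an assumption.

```python
import math


def pool_frame_embeddings(frames):
    """Collapse per-frame embeddings into one file-level vector (mean + std)."""
    if not frames:
        raise ValueError("at least one frame embedding is required")
    n, dim = len(frames), len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    # Assumption: population standard deviation (divide by n, not n - 1).
    stds = [
        math.sqrt(sum((frame[d] - means[d]) ** 2 for frame in frames) / n)
        for d in range(dim)
    ]
    # The file embedding has twice the dimensionality of a frame embedding.
    return means + stds


def build_feature_vector(frames, hc_features=()):
    """Concatenate the pooled embedding with optional handcrafted features."""
    return pool_frame_embeddings(frames) + list(hc_features)


# Two toy 2-dimensional frame embeddings standing in for real model output.
frames = [[1.0, 2.0], [3.0, 6.0]]
file_embedding = pool_frame_embeddings(frames)
with_hc = build_feature_vector(frames, hc_features=[0.5])
```

Pooling this way yields a fixed-length vector regardless of how long the recording is, which is what a shallow classifier needs as input.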
  • 10. Experimental Evaluation: Goals
    1) Improve the classification performance with respect to:
    - Brown et al. (2020): same approach but different embedding model (i.e., VGGish vs L3-Net)
    - Mohammed et al. (2021): ensemble model (CNN trained from scratch vs pre-trained model)
    2) Can we perform the classification task directly on the mobile device?
    - Memory footprint evaluation
  • 11. Datasets
    Cambridge: crowdsourced breath and cough audio samples (data agreement). www.covid-19-sounds.org
    COSWARA: crowdsourced cough samples. coswara.iisc.ac.in, https://github.com/iiscleap/Coswara-Data
    Virufy: cough samples collected in hospital; labels based on COVID-19 PCR test results. https://github.com/virufy/virufy-covid
    Chart values (Healthy vs COVID-19 for Cambridge, COSWARA, Virufy): 62 860 282 7 2758 752
  • 12. Evaluation protocol
    5-fold nested Cross Validation with stratified user-based splits
    Flow: Balanced Dataset (Under-sampling) → Dev set and Test set; Dev set → Train set and Validation set → Training & Tuning → Best model → Performances on the Test set (AUC, Precision, Recall)
    PCA explained variance: [0.7, 0.8, 0.9, 0.95, 0.99]
    Shallow classifiers: Logistic Regression (LR), Support Vector Machines (SVM), AdaBoost (AB), Random Forest (RF)
    Features sets:
    F1: deep audio embeddings
    F2: embeddings + Period, Tempo, Duration
    F3: embeddings + HC features, except Δ-MFCC, Δ2-MFCC
    F4: embeddings + all HC features (i.e., 477 HC)
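A user-based split keeps every recording of a given person inside a single fold, so the test set never contains a voice seen during training. The sketch below illustrates that idea together with random under-sampling. It is not the authors' code: the function names, the `(user_id, label)` input format, and the assumption that each user has a single label are all hypothetical, and only the outer level of the nested cross validation is shown.

```python
import random
from collections import defaultdict


def user_based_folds(samples, n_folds=5, seed=0):
    """Assign sample indices to folds so that no user spans two folds.

    `samples` is a list of (user_id, label) pairs. Users are dealt to folds
    round-robin within each label, which keeps the class mix similar per fold.
    """
    rng = random.Random(seed)
    users_by_label = defaultdict(list)
    seen_users = set()
    for user, label in samples:
        if user not in seen_users:  # assumption: one label per user
            seen_users.add(user)
            users_by_label[label].append(user)
    fold_of_user = {}
    for label in sorted(users_by_label):
        users = users_by_label[label]
        rng.shuffle(users)
        for position, user in enumerate(users):
            fold_of_user[user] = position % n_folds
    folds = [[] for _ in range(n_folds)]
    for index, (user, _) in enumerate(samples):
        folds[fold_of_user[user]].append(index)
    return folds


def undersample(indices, labels, seed=0):
    """Randomly drop samples until every class matches the smallest one."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index in indices:
        by_label[labels[index]].append(index)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    return sorted(balanced)


# Toy data: 20 users with 3 recordings each; the label depends on the user.
samples = [(f"user{u}", u % 2) for u in range(20) for _ in range(3)]
labels = [label for _, label in samples]
folds = user_based_folds(samples)

# Toy imbalanced label list: 7 negatives and 3 positives.
skewed_labels = [0] * 7 + [1] * 3
balanced = undersample(range(10), skewed_labels)
```

In a nested setup, each fold would in turn serve as the test set while the remaining folds form the dev set, which is split again in the same way for tuning.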
  • 13. Classification Results vs Brown et al. (2020)
    Dataset: Cambridge (cough & breath audio samples), 3 classification tasks
    1: COVID-positive vs COVID-negative
    2: COVID-positive with cough vs COVID-negative
    3: COVID-positive with cough vs COVID-negative with asthma and cough
    TABLE III: Classification results, mean (± std)
    Task  Method      Modality        Features  Classifier  PCA  AUC         Precision   Recall
    1     baseline    Cough + Breath  F2        LR          .95  .80 (.07)   .72 (.06)   .69 (.11)
    1     our (same)  Cough + Breath  F2        LR          .95  .76 (.092)  .69 (.095)  .68 (.158)
    1     our (best)  Cough + Breath  F2        SVM         .70  .80 (.068)  .77 (.096)  .68 (.139)
    2     baseline    Cough           F2        SVM         .90  .82 (.18)   .80 (.16)   .72 (.23)
    2     our (same)  Cough           F2        SVM         .90  .69 (.227)  .74 (.187)  .61 (.276)
    2     our (best)  Breath          F1        LR          .80  .84 (.168)  .92 (.106)  .60 (.237)
    3     baseline    Breath          F3        SVM         .70  .80 (.14)   .69 (.20)   .69 (.26)
    3     our (same)  Breath          F3        SVM         .70  .64 (.254)  .69 (.154)  .66 (.269)
    3     our (best)  Breath          F1        AB          .70  .88 (.066)  .82 (.152)  .79 (.192)
    Gain (%) of our best configuration over the baseline:
    Task    AUC  Precision  Recall
    Task 1  0    5          -1
    Task 2  2    12         -12
    Task 3  8    13         10
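The gain figures on this slide appear to be percentage-point differences between the "our (best)" and "baseline" rows of Table III. The short check below recomputes them from the mean values in the table; the dictionary layout and the function name `gain_points` are illustrative assumptions.

```python
# Mean (AUC, Precision, Recall) per task, copied from Table III.
baseline = {1: (0.80, 0.72, 0.69), 2: (0.82, 0.80, 0.72), 3: (0.80, 0.69, 0.69)}
our_best = {1: (0.80, 0.77, 0.68), 2: (0.84, 0.92, 0.60), 3: (0.88, 0.82, 0.79)}


def gain_points(ours, reference):
    """Difference between two metric tuples, in rounded percentage points."""
    return tuple(round((o - r) * 100) for o, r in zip(ours, reference))


gains = {task: gain_points(our_best[task], baseline[task]) for task in baseline}
for task, (auc, precision, recall) in sorted(gains.items()):
    print(f"Task {task}: AUC {auc:+d}, Precision {precision:+d}, Recall {recall:+d}")
```

Read this way, the recall drop in Task 2 comes with the largest precision gain there, and Task 3 is the only task where all three metrics improve.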
  • 14. Classification Results vs Mohammed et al. (2021)
    Dataset: COSWARA + Virufy (cough audio samples)
    Classification Task: COVID-positive vs COVID-negative
    TABLE III: Classification results, mean (± std)
    Method    Features / Classifier  PCA  AUC         Precision   Recall
    baseline  Top 4 Ensemble CNN     -    .77         .80         .71
    our       F3 / LR                .99  .99 (.001)  .99 (.006)  .99 (.007)
    Baseline: 4 CNN with different inputs (Power Spectrum, MFCC, Spectrogram, Mel-spectrogram)
    Our results with other shallow classifiers (AUC, Precision, Recall): SVM .99 (.001) .99 (.002) .98 (.01); RF .81 (.024) .79 (.031) .59 (.07); AB .85 (.011) .77 (.021) .75 (.03)
    Gain (%): AUC 22, Precision 19, Recall 28
  • 15. Memory footprint
    Cambridge Task 1: SVM with PCA 70%: 48 KB
    Cambridge Task 2: LR with PCA 80%: 1.03 KB
    Cambridge Task 3: AB with PCA 70%: 17 KB
    COSWARA + Virufy: LR with PCA 99%: 7.19 KB
    Low memory impact in all the experiments
  • 16. Contributions
    • We investigated the use of a pre-trained instance of L3-Net (OpenL3) to improve the COVID-19 detection from respiratory sound data
    • Evaluation: subject-independent experiments with 3 datasets
    • Results: +8% AUC vs VGGish, +22% AUC vs ensemble of end-to-end CNN
    • Low memory footprint: we can perform the whole task on resource-constrained devices
  • 17. Future Work
    • Distinguish between COVID-19 and other respiratory diseases (e.g., asthma)
    • Fine-tuning OpenL3, proposing a single model for both features extraction and classification (Fixed CNN layers → Train FC layers → Diagnosis)
    • Extensive comparison of different audio embedding models
  • 18. Mattia G. Campana, Ubiquitous Internet Research Unit, Institute of Informatics and Telematics, National Research Council of Italy
    mattiacampana.github.io
    mattia.campana@iit.cnr.it
    linkedin.com/in/mattiacampana
  • 19. Handcrafted acoustic Features
    • The audio sample is re-sampled to a standard value for audio tasks (e.g., 16 kHz or 22 kHz)
    • Extraction of features related to both frame (i.e., audio chunks) and segment (whole sample) perspectives
    • We used the same 477 HC features (including statistics) considered by Brown et al. (2020)
    Feature: Description
    Duration: Total length (in seconds) of the audio sample
    Onset: Number of pitch onsets (i.e., “events”) in the audio signal
    Tempo: Rate of beats that occur at regular intervals throughout the entire audio signal
    Period: The frequency with the highest amplitude among those obtained from the Fast Fourier transform (FFT)
    RMS Energy: Root-Mean-Square of the signal power (i.e., the magnitude of the short-time Fourier transform)
    Spectral Centroid: The centroid value of the frame-wise magnitude spectrogram. Identifies percussive and sustained sounds.
    Roll-off Frequency: The frequency below which 85% of the total energy of the frame-wise spectrum is contained
    Zero-crossing rate: The number of times the signal value crosses the zero axis, computed for each frame
    MFCC: Shape of the cosine transformation of the signal's logarithmic spectrum, expressed in Mel-bands
    Δ-MFCC and Δ2-MFCC: The first and second order derivatives of MFCC along time
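A few of the features in this table are simple enough to compute by hand, which may help make the descriptions concrete. The sketch below uses only the Python standard library and a naive DFT on a synthetic 50 Hz tone; the function names, the 800 Hz sample rate, and the 64-sample frame are assumptions chosen for illustration, and the RMS here is computed in the time domain rather than from a spectrogram. It is not the feature extractor used in the talk.

```python
import cmath
import math


def rms_energy(frame):
    """Root-mean-square amplitude of one frame (time-domain version)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency bins of a naive O(n^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def bin_frequencies(n, sample_rate):
    """Centre frequency (Hz) of each bin returned by magnitude_spectrum."""
    return [k * sample_rate / n for k in range(n // 2 + 1)]


def spectral_centroid(mags, freqs):
    """Magnitude-weighted mean frequency of the spectrum."""
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0


def dominant_frequency(mags, freqs):
    """Frequency with the highest amplitude ("Period" in the table)."""
    return freqs[max(range(len(mags)), key=mags.__getitem__)]


SAMPLE_RATE = 800  # Hz, illustrative value
N = 64             # samples per frame, illustrative value
# 50 Hz tone; the small phase offset keeps samples from landing exactly on zero.
frame = [math.sin(2 * math.pi * 50 * t / SAMPLE_RATE + 0.1) for t in range(N)]
mags = magnitude_spectrum(frame)
freqs = bin_frequencies(N, SAMPLE_RATE)
features = {
    "rms": rms_energy(frame),
    "zcr": zero_crossing_rate(frame),
    "centroid": spectral_centroid(mags, freqs),
    "period": dominant_frequency(mags, freqs),
}
```

For a pure tone the spectral centroid and the dominant frequency coincide; on real cough or breath recordings they diverge, which is part of why both are kept as separate features.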
  • 20. L3-Net vs VGGish
    # parameters:
    • L3-Net: 4.7M
    • VGGish: 62M
    Cramer, Jason, et al. "Look, listen, and learn more: Design choices for deep audio embeddings." ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019.
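To give a rough sense of what these parameter counts mean on a phone, the sketch below converts them into approximate weight storage. The 4 bytes per parameter (32-bit floats) is an assumption that the slides do not state, and real model files also carry some overhead beyond the raw weights.

```python
# Parameter counts as reported on the slide.
PARAMS = {"L3-Net": 4.7e6, "VGGish": 62e6}
BYTES_PER_PARAM = 4  # assumption: weights stored as 32-bit floats


def weights_megabytes(n_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate size of the raw weights in megabytes (10^6 bytes)."""
    return n_params * bytes_per_param / 1e6


sizes = {name: weights_megabytes(count) for name, count in PARAMS.items()}
ratio = PARAMS["VGGish"] / PARAMS["L3-Net"]
for name, megabytes in sizes.items():
    print(f"{name}: about {megabytes:.1f} MB of weights")
print(f"VGGish has about {ratio:.1f}x as many parameters as L3-Net")
```

Under that assumption the embedding model, not the shallow classifier, dominates the on-device footprint, so the choice of embedding network matters for mobile deployment.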