Speaker Recognition at a Distance
Kerly Urbano1,4, Kevin P Cave1,3, Joey Skufca1,5, Mike Fowler1,2, Joseph D Skufca1,2, Stephanie Schuckers1,3
1CITER, 2Department of Mathematics, 3Department of Electrical Engineering, 4Department of Mechanical Engineering - Clarkson University
5Department of Computer Science - Stony Brook University
Acknowledgements: We thank Kevin Chapman and Jonathon Bramsen for their help during the spring semester, and Prof. JJ Remus for starting this project and developing the hardware we used. We thank CUPO for funding support for this work.
[Figure 1 panels: Iris; Cardio-Respiratory (ECG amplitude in V, ×10^-4, vs. time in s, for "Subject #1395 Baseline ECG" and "Subject #1395 Arrhythmia ECG")]
We consider the impact of speaker-to-microphone distance on speaker identification, focusing on the mismatch between the distance at which a speaker was enrolled in the system and the distance at which the system is tested. Previous research by others indicates that distance mismatch can significantly degrade system performance. We are creating a filter that makes a recording made at five feet sound as if it had been recorded at thirty-four feet, and we will use this filtered data to improve identification when probes are recorded at thirty-four feet. If successful, we can create additional filters, each tailored to a different span of distances, to build a general model of filters that improve identification.
Biometrics: the identification of human beings based on their unique characteristics and traits.
Motivation for research (National Biometrics Challenge [1])
• Robust biometrics at distance
• Multi-modal
• Non-cooperative subjects
• Speaker Identification impacted by “… performance issues associated with
the lack of comparable recording environments between the enrollment and
test sample.”
Speaker recognition is the identification of a person based on their voice signal. This research is essential for improving current identification systems that use voice as their identifier. In most cases, a voice has characteristics unique to one person, which makes voice a good marker for identification systems.
If a 5 ft signal is enrolled in the database and we want to test a probe signal recorded at 34 ft, we can create a filter that makes the 5 ft data mimic 34 ft data, improving the performance of the identification system.
• Initial experiments determined optimal placement of
microphone array with respect to speaker (maximum
response) and primary noise source (minimum response).
(See Fig 2.)
• Controlled speaker experiments were recorded in Room CAMP194 at multiple distances.
• Fourier analysis was used to assess attenuation vs. distance in that environment.
• Multiple “trial” filters were developed to preprocess audio from an initial 5 ft data collection so that it simulates the effect of recording at 34 ft.
• Biometric matching performance comparing filtered and unfiltered data is in progress.
[Figure 3 diagram: train (enrollment) and test (probe) recordings at 5 ft and 34 ft, with noise sources; 29 ft mismatch between training and testing]
[Figure 6 plot: Fourier amplitude (0 to 500) vs. frequency (0 to 4000 Hz); two spectra at 34 ft and two at 5 ft]
[Figure 8 diagram: genuine and impostor score distributions separated by a decision threshold]
$$\mathrm{FRR} = \frac{20}{180 + 20} = 0.1$$
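The FRR arithmetic in Fig 8 can be reproduced by counting scores on each side of the threshold. A minimal sketch, assuming higher scores mean a better match; the function name `error_rates` and the impostor scores are hypothetical, while the 180/20 genuine split mirrors the figure.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FRR, FAR) at a threshold, assuming higher score = better match.

    FRR: fraction of genuine comparisons rejected (score below threshold).
    FAR: fraction of impostor comparisons accepted (score at or above it).
    """
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return false_rejects / len(genuine_scores), false_accepts / len(impostor_scores)


# 180 genuine scores above the threshold and 20 below it: FRR = 20 / (180 + 20).
genuine = [0.9] * 180 + [0.2] * 20
impostor = [0.1] * 95 + [0.7] * 5  # hypothetical impostor scores
frr, far = error_rates(genuine, impostor, threshold=0.5)
print(frr, far)  # 0.1 0.05
```

Sweeping the threshold trades FRR against FAR, which is why matching results such as Fig 10 are usually reported as error rates.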
Measuring Performance
MATLAB Procedure
• Write MATLAB scripts to analyze the recorded waveforms.
• Plot signal amplitude vs. frequency.
• Analyze the trend in the 5 ft data and pick points that best fit the trend. Those points are used as coordinates to filter the 34 ft signals.
• Run creatfilter.m and applyFiltter.m on the data collected last summer so we can test the filter's performance on a larger database.
• Retrain the UBM (universal background model) on last year's data to obtain better supervectors.
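The poster retrains the UBM to obtain better supervectors without spelling out how a supervector is formed. A minimal sketch, assuming the widely used GMM-UBM recipe surveyed in [3] (mean-only MAP adaptation with a relevance factor) and scalar features for brevity; `map_supervector`, `relevance=16.0`, and the example values are illustrative assumptions, not the project's code.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def map_supervector(features, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a 1-D GMM-UBM; returns the adapted means.

    With scalar features each adapted mean is one supervector entry; with
    vector features (e.g. MFCCs) the adapted mean vectors are concatenated.
    """
    k = len(weights)
    counts = [0.0] * k  # soft occupancy n_i of each component
    sums = [0.0] * k    # responsibility-weighted feature sums
    for x in features:
        likes = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        for i in range(k):
            gamma = likes[i] / total  # posterior probability of component i
            counts[i] += gamma
            sums[i] += gamma * x
    adapted = []
    for i in range(k):
        alpha = counts[i] / (counts[i] + relevance)  # more data, larger shift
        data_mean = sums[i] / counts[i] if counts[i] > 0 else means[i]
        adapted.append(alpha * data_mean + (1 - alpha) * means[i])
    return adapted


# Usage: a two-component UBM and hypothetical speaker features near +2.
ubm_weights, ubm_means, ubm_vars = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
speaker_features = [1.5, 2.5, 3.0, 2.0]
supervector = map_supervector(speaker_features, ubm_weights, ubm_means, ubm_vars)
print(supervector)
```

Components that see little of the speaker's data stay close to the UBM means, so a better-trained UBM directly shapes every supervector.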
References
1. National Science and Technology Council, The National Biometrics Challenge (2011).
2. M. Fowler, M. McCurry, J. Barmsen, K. Dunsin, J. Remus, ICASSP 2011 Conference Proceedings.
3. F. Bimbot et al., EURASIP Journal on Applied Signal Processing, 2004:4, pp. 430-451.
Fig 1. Multiple modes of biometric signature.
Fig 2. Measured directionality of microphone channel 17. The array was positioned to maximize the speaker signal while minimizing interfering noise.
[Figure 3 panel labels: Good, OK, Poor]
Fig 3. A mismatch in distance between training (enrollment) data and testing (probe) data results in significantly degraded performance.
Fig 4. Fowler et al. [2] showed that distance mismatch degrades performance.
Fig 5. Experiments were conducted in CAMP194 to provide a controlled acoustic environment. The microphone array (constructed by Mark McCurry and Prof. JJ Remus) allowed recording on 18 audio channels. Experiments recorded speech at 5 ft, 8 ft, 13 ft, 21 ft, and 34 ft. Choosing distances from the Fibonacci sequence allows multiple measurements at the same distance mismatch.
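The claim that Fibonacci-spaced distances repeat mismatches can be checked by enumerating every enrollment/test pair. This sketch uses only the five distances listed in Fig 5; grouping pairs by their difference is an illustrative choice.

```python
from collections import defaultdict
from itertools import combinations

distances_ft = [5, 8, 13, 21, 34]

# Group every (near, far) pair of recording distances by their mismatch.
pairs_by_mismatch = defaultdict(list)
for near, far in combinations(distances_ft, 2):
    pairs_by_mismatch[far - near].append((near, far))

for mismatch in sorted(pairs_by_mismatch):
    print(mismatch, pairs_by_mismatch[mismatch])
```

Mismatches of 8 ft (5 to 13, 13 to 21) and 13 ft (8 to 21, 21 to 34) each occur twice, and the 5 ft to 34 ft pair gives the 29 ft mismatch shown in Fig 3.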
Fig 6. Fourier transforms of signals measured at 5 ft and 34 ft, with two spectra for each distance. The two spectra at 34 ft are consistent with each other, as are the two spectra at 5 ft, but the spectra at the two distances differ markedly.
Fig 7. Relative strength of attenuation $AF(f)$, comparing the signal at 34 ft to the signal at 5 ft. The attenuation ratio varies with frequency. The red line indicates the “filter” that was fitted to the spectral comparison.
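Fig 7's fitted “filter”, together with the MATLAB step of picking points along the trend, suggests a curve drawn through hand-picked coordinates. A minimal sketch, assuming piecewise-linear interpolation between picked (frequency, AF) points sorted by frequency, with clamping outside their range; `filter_curve` and the breakpoint values are hypothetical and are not the project's `creatfilter.m`.

```python
from bisect import bisect_right


def filter_curve(breakpoints, freq_hz):
    """Piecewise-linear attenuation curve through (freq_hz, AF) breakpoints.

    Breakpoints must be sorted by frequency; values outside the picked
    range are clamped to the first or last AF value.
    """
    xs = [f for f, _ in breakpoints]
    ys = [a for _, a in breakpoints]
    if freq_hz <= xs[0]:
        return ys[0]
    if freq_hz >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, freq_hz)  # xs[j - 1] <= freq_hz < xs[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (freq_hz - x0) / (x1 - x0)


# Hypothetical breakpoints, not values read from Fig 7.
points = [(0, 0.0), (1000, -0.5), (4000, -1.5)]
print(filter_curve(points, 500))   # -0.25
print(filter_curve(points, 2500))  # -1.0
```

Evaluating the curve at every FFT bin frequency yields a per-bin attenuation that can then be applied to a spectrum.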
Attenuation filter equation based on Fourier amplitudes at 34 ft and 5 ft:
$$AF(f) = \log_k \frac{A_{34}(f)}{A_5(f)}$$
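Rearranging the attenuation equation gives $A_{34}(f) = A_5(f)\,k^{AF(f)}$, which is how an attenuation estimated from paired recordings could turn a 5 ft spectrum into a simulated 34 ft spectrum. A sketch, assuming base $k = 10$ (the poster does not state $k$) and strictly positive amplitudes; `attenuation`, `simulate_far`, and the amplitude values are hypothetical.

```python
import math


def attenuation(a34, a5, base=10.0):
    """AF(f) = log_base(A34(f) / A5(f)) for each frequency bin."""
    return [math.log(x34 / x5, base) for x34, x5 in zip(a34, a5)]


def simulate_far(a5, af, base=10.0):
    """Invert AF: predicted A34(f) = A5(f) * base ** AF(f)."""
    return [x5 * base ** g for x5, g in zip(a5, af)]


a5 = [400.0, 200.0, 100.0]  # hypothetical 5 ft amplitudes
a34 = [40.0, 50.0, 50.0]    # hypothetical 34 ft amplitudes
af = attenuation(a34, a5)   # approximately [-1.0, -0.602, -0.301]
print(simulate_far(a5, af))  # recovers a34 up to floating-point rounding
```

Applying this to audio would additionally require transforming each 5 ft recording to the frequency domain, scaling each bin, and transforming back.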
Fig 8. How do we assess the performance of a classifier?
Fig 10. Preliminary results comparing matching performance using unfiltered data and using our attenuation filter. Smaller errors indicate improved performance.
Fig 9. (Extracted from [3].) General structure of speaker identification. The blue circle indicates where we preprocess with our filter.

SURE research poster-1-1 Cave-Urbano
