SURE research poster-1-1 Cave-Urbano

Speaker Recognition at a Distance
Kerly Urbano1,4, Kevin P. Cave1,3, Joey Skufca1,5, Mike Fowler1,2, Joseph D. Skufca1,2, Stephanie Schuckers1,3
1CITER, 2Department of Mathematics, 3Department of Electrical Engineering, 4Department of Mechanical Engineering - Clarkson University
5Department of Computer Science - Stony Brook University
Acknowledgements: We thank Kevin Chapman and Jonathon Bramsen for their help during the spring semester, Prof. JJ Remus for starting this project and developing the hardware we used, and CUPO for funding support for this work.
[Fig 1 panels: Iris; Cardio-Respiratory; Subject #1395 Baseline ECG; Subject #1395 Arrhythmia ECG (Amplitude, V, ×10^-4, vs. Time, s)]
We consider the impact of speaker-to-microphone distance on Speaker Identification, focusing in particular on the mismatch in distance between the condition under which a speaker was enrolled into the system and the conditions under which the system is tested. Previous research by others indicates that distance mismatch can significantly diminish system performance. We are developing a filter that makes a recording made at five feet sound as if it were recorded at thirty-four feet, and we use this filtered data to improve the identification process when comparing against recordings made at thirty-four feet. If successful, we can create more filters, each tailored to a different span of distances, to build a general model for filters that improve identification.
Biometrics: The identification of human beings based on their unique characteristics and traits.
Motivation for research (National Biometrics Challenge [1])
• Robust biometrics at distance
• Multi-modal
• Non-cooperative subjects
• Speaker Identification impacted by “… performance issues associated with
the lack of comparable recording environments between the enrollment and
test sample.”
Speaker recognition is the identification of a person based on their voice signal. This research is essential for improving current identification systems that use voice as their identifier. In most cases, a voice has characteristics unique to a single person, which makes voice a good marker for identification systems.
If a 5 ft signal is enrolled in the database and we want to test a probe signal recorded at 34 ft, we can create a filter that mimics 34 ft data to improve the performance of the identification system.
• Initial experiments determined optimal placement of
microphone array with respect to speaker (maximum
response) and primary noise source (minimum response).
(See Fig 2.)
• Controlled speaker experiments were conducted in Room CAMP194 at multiple distances.
• Fourier analysis was used to assess attenuation vs. distance in that environment.
• Multiple “trial” filters were developed for preprocessing audio to simulate the effect of recording at 34 ft, based on an initial data collection at 5 ft.
• Evaluation of biometric matching performance on filtered vs. unfiltered data is in progress.
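The Fourier analysis step above can be sketched in Python. This is a minimal illustration, not the authors' MATLAB code: the helper name `magnitude_spectrum` is hypothetical, it uses a naive O(n^2) DFT from the standard library instead of an FFT, and the two-tone frames are synthetic stand-ins for the 5 ft and 34 ft recordings.

```python
import cmath
import math


def magnitude_spectrum(signal, sample_rate):
    """Return (frequencies_hz, amplitudes) for a real-valued frame.

    Hypothetical helper: a naive DFT, O(n^2), suitable only for short frames.
    Only bins from 0 Hz up to the Nyquist frequency are returned.
    """
    n = len(signal)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * sample_rate / n)
        amps.append(abs(acc))
    return freqs, amps


# Synthetic example (assumed data): a "5 ft" frame with two equal tones and a
# "34 ft" frame in which the higher tone is attenuated more strongly.
SR, N = 8000, 80  # 100 Hz bin spacing, so both tones fall exactly on a bin
near = [math.cos(2 * math.pi * 500 * t / SR) + math.cos(2 * math.pi * 1500 * t / SR)
        for t in range(N)]
far = [0.5 * math.cos(2 * math.pi * 500 * t / SR) + 0.1 * math.cos(2 * math.pi * 1500 * t / SR)
       for t in range(N)]

freqs, a_near = magnitude_spectrum(near, SR)
_, a_far = magnitude_spectrum(far, SR)
for bin_index in (5, 15):  # the 500 Hz and 1500 Hz bins
    print(freqs[bin_index], a_far[bin_index] / a_near[bin_index])
```

Comparing the two spectra bin by bin exposes frequency-dependent attenuation (here roughly 0.5 at 500 Hz and 0.1 at 1500 Hz), which is the kind of comparison made between real 5 ft and 34 ft recordings.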
[Fig 3 diagram: 5 ft train, 5 ft test, 34 ft train, 34 ft test; noise sources; 29 ft mismatch]
[Fig 6 plot: Amplitude vs. frequency (Hz), 0 to 4000 Hz; two spectra at 34 ft and two at 5 ft]
Measuring Performance
[Fig 8 diagram: Genuine and Imposter score distributions separated by a threshold]
False rejection rate (FRR):
$$FRR = \frac{20}{180 + 20} = 0.1$$
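The FRR arithmetic can be reproduced with a short Python sketch. Because the poster does not spell out the scoring convention, it assumes that a higher score means a better match, that a comparison is accepted when its score is at or above the threshold, and that 20 of 200 genuine comparisons fall below the threshold. The helper `error_rates` and the score lists are hypothetical; the false acceptance rate (FAR) on imposter scores is computed the same way.

```python
def error_rates(genuine_scores, imposter_scores, threshold):
    """Return (FRR, FAR) at a decision threshold.

    Assumes higher scores mean better matches and that a comparison is
    accepted when score >= threshold.
    """
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in imposter_scores if s >= threshold)
    frr = false_rejects / len(genuine_scores)   # rejected genuine / all genuine
    far = false_accepts / len(imposter_scores)  # accepted imposters / all imposters
    return frr, far


# Hypothetical scores: 20 genuine comparisons below the threshold, 180 above it.
genuine = [0.2] * 20 + [0.9] * 180
imposter = [0.1] * 95 + [0.8] * 5
frr, far = error_rates(genuine, imposter, threshold=0.5)
print(frr, far)  # 0.1 0.05
```

Moving the threshold trades FRR against FAR, which is why both the genuine and the imposter distributions matter when judging a classifier.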
MATLAB Procedure
• Write MATLAB scripts to analyze our waveform data.
• Plot each signal as amplitude vs. frequency.
• Analyze the trend in the 5 ft data and pick the points that best fit it. Those points are used as coordinates to filter the 34 ft signals.
• We run creatfilter.m and applyFiltter.m on the data collected last summer so that we can test our filter's performance on a larger database.
• We retrain the UBM (universal background model) on last year's data to obtain better supervectors.
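The pick-points, create-filter, and apply-filter steps can be sketched as follows. This is a Python analogue written for illustration only, not the contents of creatfilter.m or applyFiltter.m: `create_filter` and `apply_filter` are hypothetical names, and turning the picked (frequency, gain) points into a filter by piecewise-linear interpolation is an assumption.

```python
from bisect import bisect_right


def create_filter(points):
    """Build a gain curve from picked (frequency_hz, gain) points.

    Assumption: piecewise-linear interpolation between points, with the gain
    held constant below the first point and above the last one.
    """
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    def gain(f):
        if f <= xs[0]:
            return ys[0]
        if f >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, f)  # guarantees xs[i - 1] <= f < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (f - x0) / (x1 - x0)

    return gain


def apply_filter(gain, freqs, amps):
    """Scale each spectral amplitude by the filter gain at its frequency."""
    return [a * gain(f) for f, a in zip(freqs, amps)]


# Hypothetical picked points: little loss at low frequencies, more at high ones.
gain = create_filter([(0, 1.0), (1000, 0.5), (4000, 0.1)])
print(apply_filter(gain, [0, 500, 2500], [2.0, 4.0, 1.0]))  # approximately [2.0, 3.0, 0.3]
```

Returning a closure keeps the "create" and "apply" stages separate, mirroring the two-script split described above, so one fitted filter can be reused across a larger database.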
References
1. National Science and Technology Council, The National Biometrics Challenge (2011).
2. M. Fowler, M. McCurry, J. Bramsen, K. Dunsin, J. Remus, ICASSP 2011 Conference Proceedings.
3. F. Bimbot et al., EURASIP Journal on Applied Signal Processing, 2004:4,
430-451.
Fig 1. Multiple modes of biometric signature.
Fig 2. Measured directionality of microphone channel 17. The array was positioned to maximize the speaker signal while minimizing interfering noise.
[Figure labels: Good, OK, Poor]
Fig 3. A mismatch in distance between training (enrollment) data and testing (probe) data results in significantly degraded performance.
Fig 4. Fowler et al. [2] showed that distance mismatch degrades performance.
Fig 5. Experiments were conducted in CAMP194 for a controlled acoustic environment. The microphone array (constructed by Mark McCurry and Prof. JJ Remus) allowed recording on 18 audio channels. Experiments recorded speech at 5 ft, 8 ft, 13 ft, 21 ft, and 34 ft. Distances chosen from the Fibonacci sequence allow multiple measurements at the same distance mismatch.
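One way to see the point about Fibonacci spacing is to enumerate every (enrollment, probe) pair of the five recording distances and group the pairs by their mismatch. The Python sketch below does that; `mismatch_pairs` is a hypothetical helper, and the distances are the five listed in the caption.

```python
from collections import defaultdict
from itertools import combinations

DISTANCES_FT = [5, 8, 13, 21, 34]  # recording distances from Fig 5


def mismatch_pairs(distances):
    """Map each distance mismatch (ft) to the (near, far) pairs that produce it."""
    pairs = defaultdict(list)
    for near, far in combinations(sorted(distances), 2):
        pairs[far - near].append((near, far))
    return dict(pairs)


for mismatch, pairs in sorted(mismatch_pairs(DISTANCES_FT).items()):
    if len(pairs) > 1:  # mismatches measured by more than one distance pair
        print(mismatch, pairs)
# 8 [(5, 13), (13, 21)]
# 13 [(8, 21), (21, 34)]
```

Mismatches of 8 ft and 13 ft each arise from two different distance pairs, and the 29 ft mismatch is the 5 ft vs. 34 ft case targeted by the filter.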
Fig 6. Fourier transforms of signals measured at 5 ft and 34 ft, with two spectra for each distance. Note that the two spectra at 34 ft are consistent with each other, as are the two spectra at 5 ft, but the spectra at the two distances differ markedly.
Fig 7. Relative strength of attenuation AF(f), comparing the signal at 34 ft to the signal at 5 ft. The attenuation ratio varies with frequency. The red line indicates the “filter” that was fitted to the spectral data comparison.
Attenuation filter equation based on the Fourier amplitudes A_34(f) at 34 ft and A_5(f) at 5 ft:
$$AF(f) = \log_k \frac{A_{34}(f)}{A_{5}(f)}$$
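The attenuation filter equation translates directly into code. This Python sketch reads k as the logarithm base, which the poster does not specify, so base 10 is an assumed default; `attenuation_filter` is a hypothetical name, and the small `eps` is an added guard against zero amplitudes.

```python
import math


def attenuation_filter(a34, a5, k=10.0, eps=1e-12):
    """Evaluate AF(f) = log_k(A34(f) / A5(f)) bin by bin.

    a34, a5: Fourier amplitudes at 34 ft and 5 ft on the same frequency grid.
    k: logarithm base (assumed; not stated on the poster).
    eps: guard against division by zero and log(0) (an added assumption).
    """
    if len(a34) != len(a5):
        raise ValueError("amplitude lists must share a frequency grid")
    return [math.log((x34 + eps) / (x5 + eps), k) for x34, x5 in zip(a34, a5)]


# Hypothetical amplitudes at three frequency bins.
print(attenuation_filter([50.0, 10.0, 1.0], [100.0, 100.0, 100.0]))
# roughly [-0.301, -1.0, -2.0]; negative values mean a weaker signal at 34 ft
```

Working in a log ratio makes equal attenuation factors equally spaced, which suits fitting a smooth curve such as the red line in Fig 7.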
Fig 8. How we assess the performance of a classifier.
Fig 10. Preliminary results comparing matching performance using unfiltered data and using our attenuation filter. Smaller errors indicate improved performance.
Fig 9. General structure of speaker identification (extracted from [3]). The blue circle indicates where we preprocess using our filter.
