Assessing Virtual Assistants for Italian Dysarthric Speech
1. Fabio Ballati, Fulvio Corno, Luigi De Russis
Politecnico di Torino, Italy
Assessing Virtual Assistant
Capabilities with Italian
Dysarthric Speech
ASSETS 2018 - October 22-24, 2018 - Galway
2. 2
Usage of smartphone-based virtual assistants is growing
worldwide
Such assistants generally have a positive impact on device
accessibility
People with speech impairments like dysarthria may be
unable to use those virtual assistants with proficiency
Background and Motivation
3. 3
We focused on ALS-induced dysarthria and the Italian language
Propose a methodology for the collection of dysarthric speech
samples to evaluate smartphone-based virtual assistants
Investigate which assistant provides the most coherent answer
when the recognized speech is at least partially correct
Investigate whether and how people with moderate dysarthria could
be understood by three virtual assistants
• Siri, Google Assistant, Cortana
Goal
4. 4
We played the collected speech samples to
assess (i) the accuracy in transcription and (ii) the
coherence of the answers
ASSESSMENT
To collect dysarthric speech samples, we designed
a specific methodology and we recorded the 34
sentences from 8 people with ALS
DATA COLLECTION
Selection of 34 suitable sentences for virtual
assistants
SENTENCE SELECTION
Work Phases
5. 5
Sample sentences
(translated into English)
Do I need to take an umbrella, today?
How much protein is in two eggs?
Add onion and tomatoes to my shopping
list
Who is the president of the Italian
republic?
Set the home temperature to 22 degrees.
Set an alarm at 8am.
…
• Goal: to have a set of sentences
to record, suitable for
smartphone-based virtual
assistants
• We extracted 34 sentences from
the recommended questions for
virtual assistants
• We then slightly modified them
to include all the phonemes of
the Italian language
Sentence Selection
SENTENCE SELECTION
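A minimal sketch of how the phoneme-coverage requirement could be checked. It assumes each sentence has already been transcribed into a list of phoneme symbols; the function name `missing_phonemes`, the toy inventory, and the toy transcriptions are all hypothetical and are not the study's sentences or the real Italian phoneme inventory.

```python
def missing_phonemes(transcriptions, inventory):
    """Return the phonemes in `inventory` that occur in no sentence.

    transcriptions: dict mapping a sentence label to its list of phonemes.
    inventory: iterable with every phoneme that must be covered.
    """
    covered = set()
    for phonemes in transcriptions.values():
        covered.update(phonemes)
    return set(inventory) - covered


# Toy, made-up data for illustration only.
toy_inventory = {"a", "e", "i", "o", "u", "k", "t", "s"}
toy_transcriptions = {
    "sentence 1": ["k", "a", "s", "a"],
    "sentence 2": ["t", "u", "t", "i"],
}

# Phonemes still uncovered, i.e. a hint for which sentences to modify.
print(sorted(missing_phonemes(toy_transcriptions, toy_inventory)))  # ['e', 'o']
```

An empty result would mean the sentence set covers the whole inventory.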
6. 6
Goal: to have a dataset of dysarthric speech samples that may allow us
to assess the behavior of virtual assistants
Participants
• 8 native Italian speakers with ALS-induced dysarthria (4M, 4F),
aged 64-83
• Three types of dysarthria, within two speech intelligibility
categories
• Flaccid, Spastic, or Unilateral Upper Motor Neuron (Duffy classification)
• "Intelligible with repeating" and "Detectable speech disturbance" (ALS Functional
Rating Scale)
Data Collection
DATA COLLECTION
7. 7
• Simple process, to be easily reproduced
• The participant read each of the 34 sentences from an A4 sheet of
paper (one sheet per sentence), located in front of the reader, while we
recorded them
• The recordings were taken with a smartphone placed at a distance of
30-40 centimeters from the participant
Procedure
DATA COLLECTION
8. 8
Goal: To investigate the accuracy in transcription and the coherence of the
answers of the virtual assistants
• The assessment took place in a quiet room of our university
• The recorded speech samples were played on a laptop connected to an
external high-quality speaker
• Each of the 272 recordings (34 sentences × 8 participants) was played
for Siri, Google Assistant, and Cortana, separately, on three different
smartphones
• iPhone 7 (iOS 11.2), Samsung A5 (Android 8.1), and Lumia 910 (Windows 10 Mobile)
• The results of the operation (recognized request and related response)
were noted down
Assessment
ASSESSMENT
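The playback design implies 34 × 8 = 272 recordings, each played to three assistants, for 816 trials in total. The sketch below enumerates that trial sheet; the record layout and the name `build_trials` are assumptions made for illustration, since the slides only state that the recognized request and the response were noted down.

```python
from itertools import product

N_SENTENCES = 34      # sentences read by each participant
N_PARTICIPANTS = 8    # speakers with ALS-induced dysarthria
ASSISTANTS = ("Siri", "Google Assistant", "Cortana")


def build_trials():
    """Enumerate one record per (participant, sentence, assistant) playback."""
    recordings = product(range(1, N_PARTICIPANTS + 1),
                         range(1, N_SENTENCES + 1))
    return [
        {
            "participant": participant,
            "sentence": sentence,
            "assistant": assistant,
            "transcription": None,  # recognized request, filled in by hand
            "response": None,       # assistant's answer, filled in by hand
        }
        for (participant, sentence), assistant in product(recordings, ASSISTANTS)
    ]


trials = build_trials()
print(len(trials))  # 816
```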
9. 9
Qualitative QC
Classification of each provided
transcription into:
• Correct
• Same semantic meaning
• Incomplete
• Wrong
• Not recognized
Quantitative QC
Word Error Rate (WER)
WER = (S + I + D) / N,
where S = number of substitutions, I = number of insertions,
D = number of deletions, and N = number of words in the
original sentence
Given by the similarity between the original sentence
and the provided transcription
Measures: Question Comprehension (QC)
ASSESSMENT
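The WER formula can be sketched as a word-level edit distance divided by the length of the original sentence. The normalization used here (lowercasing and splitting on whitespace) is an assumption; the slides do not say how sentences were tokenized or whether punctuation was stripped.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + I + D) / N, computed with word-level edit distance.

    reference: the original sentence; hypothesis: the assistant's transcription.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference sentence must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of `ref`
    # and the first j words of `hyp`.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One deletion ("an") and one substitution ("8am" -> "8") over 5 words.
print(word_error_rate("Set an alarm at 8am", "set alarm at 8"))  # 0.4
```

Because insertions count as errors, WER can exceed 1 when the transcription is much longer than the original sentence.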
10. 10
• An indicator of the appropriateness of the assistants' responses
• Computed for sentences that were correct or with the same semantic
meaning, only
• Given as the number and percentage of times that a virtual assistant
provided a certain type of answer:
• Coherent answers, i.e., correct or logically consistent responses
• Incoherent answers, i.e., logically incoherent responses
• Default answers, i.e., responses that an assistant provides by default when it is
not able to fully understand or extract any context
Measures: Consistency in Answers
ASSESSMENT
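The consistency measure can be sketched as a tally restricted to transcriptions classified as correct or as having the same semantic meaning. The field names `transcription_class` and `answer_type` and the toy records are hypothetical; in the study both labels were assigned by the evaluators.

```python
from collections import Counter

# Only these transcription classes enter the consistency measure.
ELIGIBLE = {"correct", "same semantic meaning"}
ANSWER_TYPES = ("coherent", "incoherent", "default")


def answer_consistency(trials):
    """Return {answer type: (count, percentage)} over eligible trials only."""
    eligible = [t for t in trials if t["transcription_class"] in ELIGIBLE]
    counts = Counter(t["answer_type"] for t in eligible)
    total = len(eligible)
    return {
        kind: (counts[kind], 100.0 * counts[kind] / total if total else 0.0)
        for kind in ANSWER_TYPES
    }


# Toy, made-up labels for illustration only.
toy_trials = [
    {"transcription_class": "correct", "answer_type": "coherent"},
    {"transcription_class": "correct", "answer_type": "coherent"},
    {"transcription_class": "correct", "answer_type": "incoherent"},
    {"transcription_class": "same semantic meaning", "answer_type": "default"},
    {"transcription_class": "wrong", "answer_type": "incoherent"},  # excluded
]
print(answer_consistency(toy_trials))
# {'coherent': (2, 50.0), 'incoherent': (1, 25.0), 'default': (1, 25.0)}
```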
11. 11
• WER was highly dependent upon the
participant
• The average WER for Google Assistant
was lower than for Cortana
• Siri performed the worst
• Looking at the results of individual
participants, the same trend appeared
Results: Quantitative QC
ASSESSMENT
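Comparing assistants overall and per participant amounts to averaging WER over different groupings. The sketch below shows one way to do that; the record fields (`assistant`, `participant`, `wer`) and the helper name `mean_wer` are assumptions, and no study data is included.

```python
from collections import defaultdict
from statistics import mean


def mean_wer(records, keys):
    """Average the `wer` field of `records`, grouped by the given keys.

    records: iterable of dicts with a numeric "wer" field.
    keys: tuple of field names to group by, e.g. ("assistant",).
    """
    groups = defaultdict(list)
    for record in records:
        group = tuple(record[key] for key in keys)
        groups[group].append(record["wer"])
    return {group: mean(values) for group, values in groups.items()}
```

Calling `mean_wer(records, ("assistant",))` would give the per-assistant averages, and `mean_wer(records, ("participant", "assistant"))` the per-participant breakdown.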
14. 14
We plan to publicly release the collected dataset
Google Assistant was the best in recognizing dysarthric speech
and in providing suitable answers
• Each virtual assistant behaves differently
• The accuracy of transcription depends strongly on the speaker
• Some participants can use Google Assistant without any problems
• Siri performed the worst in transcription accuracy but
provided a good number of suitable answers when it properly
understood the request
Key Takeaways