2. I’ve been designing for voice
and multimodal interfaces
since 2006.
AT AMAZON:
First designer on Echo Look
and Alexa Notifications
AT MICROSOFT:
Designer for voice and
multimodal interfaces on
Windows Automotive and
Cortana
WEBDAGENE 2017
COMPUTER, WHO IS CHERYL?
CHERYL PLATZ //
@MUPPETAPHRODITE
4. The accessibility benefits are vast, and not just
limited to those with permanent accessibility
challenges.
Voice user
interfaces leverage
this experience to
improve lives.
5. “
”
My wife passed away 4 years ago leaving
me, not only a widow, but a widowed
quadriplegic trying to survive on his own…
Alexa has been a blessing beyond my
imagination. She has given me an opportunity
that I never thought would be possible.
AMAZON ECHO REVIEW FROM MICHAEL DAVIS, FEB
2017
DESCRIBING ECHO’S AID IN HIS LIFE AS A
QUADRIPLEGIC
6. 35.6 MILLION AMERICANS USE A VOICE-ACTIVATED
ASSISTANT DEVICE AT LEAST ONCE A MONTH.
SOURCE: eMarketer
7. VOICE UI IS NOW
MAINSTREAM, BUT IT’S FAR
FROM MATURE.
IN TODAY’S WEAKNESSES
LIE THREE KEY
OPPORTUNITIES FOR THE
FUTURE OF VOICE UI.
8. Limited training data and an affluent user
base are excluding underrepresented groups
through inaccuracy.
Today’s voice
interfaces are
inherently biased.
OPPORTUNITY 1
9. “
”
“…looking at race, I found that
Caucasian speakers had by far the
lowest error rate. African-American
speakers and speakers with a mixed
racial background had higher error rates.”
DR. RACHEL TATMAN, LINGUISTICS, UNIVERSITY OF
WASHINGTON
ON ACCURACY OF SIRI FOR VARIOUS DEMOGRAPHIC
GROUPS
KUOW, SEPTEMBER 19 2017
10. GENDER
Systems were initially
trained with internal
data collection – at
companies where
engineering teams
are still largely male.
ETHNICITY
Training data
expands to include
early adopters, often
affluent.
This may exclude
underrepresented
ethnicities due to
wage gaps.
ACCENT
The North American
focus of most of
today’s products
means we have yet to
attain critical mass of
training data for
second-language
speakers.
DECONSTRUCTING VOICE UI BIAS
11. BIAS SPIRAL
Biased Training Data
Poor Accuracy for Excluded Groups
High Attrition by Excluded Groups
12. WE MUST FIND A WAY TO BREAK THE BIAS
SPIRAL,
AND MAKE THE FUTURE OF VOICE UI
13. We are wasting time re-implementing the same
basic tasks on multiple systems. Most systems
emphasize a single modality at a time.
Today’s voice
interfaces are
simple and siloed.
OPPORTUNITY 2
14. We currently have an ecosystem of voice
assistants chasing each other’s tails.
What could we accomplish if we relied on each
other’s expertise?
TIME LOST TO TIMERS
17. DO WE NEED ONE ASSISTANT TO
RULE THEM ALL?
18. “
”
Through its collaboration with
Microsoft, Amazon said, Alexa
users will get answers to some
of the same questions that
Cortana can now answer – for
instance, when is the next
budget review with the boss?
NICK WINGFIELD, NEW YORK TIMES
AUGUST 30, 2017
ILLUSTRATION: MENGXIN LI
19. LET’S BUILD A CHOIR OF
HARMONIOUS VOICE
INTERFACES TOGETHER.
20. Alexa, Google Home and Cortana essentially
allow only command-and-control scenarios.
Today’s voice UIs
aren’t
conversational –
yet.
OPPORTUNITY 3
21. IT LOOKS LIKE YOU MIGHT BE
IN THE AWKWARD EARLY
STAGES OF
CONVERSATIONAL UI. CAN I
HELP?
PLEASE
NO
RUN
AWAY
22. AUDIBLE CUES
Tone
Speed
Volume
PHYSICAL CUES
Eye contact & gaze
Heart rate
Posture
Gesture
SPOKEN CONVERSATION IS MORE
THAN WORDS
25. WHAT BENEFIT CAN
HUMANS GAIN FROM
TRUSTING THESE
ASSISTANTS?
26. “
”
The other night, I found Gary playing his
own version of a memory game with
Alexa. He was trying to come up with
songs he remembered and hadn't heard
for awhile and would ask her to play
them.
AMAZON ECHO REVIEW FROM ALEX S.
DESCRIBING ECHO’S AID IN HUSBAND’S STRUGGLE WITH
PARKINSON’S
28. “
”
People have serious conversations with Siri.
People talk to Siri about all kinds of things,
including when they’re having a stressful day
or have something serious on their mind.
They turn to Siri in emergencies or when they
want guidance on living a healthier life.
APPLE JOB POSTING, SIRI SOFTWARE ENGINEER, HEALTH
AND WELLNESS
APRIL 4, 2017
30. How can we (ethically) model a relationship over
time?
What information is saved, and what is discarded?
What level of transparency and control is required?
Does the assistant’s personality adapt, or remain
fixed?
WHAT DOES A RELATIONSHIP
LOOK LIKE?
32. Inclusive and unbiased speech
recognition
Harmonious cross-product partnerships
Semantic web represents common
knowledge
Trust built over time with shared context
Conversation informed by non-verbal
cues
THE FUTURE OF VOICE
33. THESE ADVANCES WILL
COMBINE TO OPEN NEW
OPPORTUNITIES AND A
NEW ERA IN HUMAN
EMPOWERMENT.
38. LET’S BUILD A FUTURE
OF INTERFACES WHERE
OUR HUMANITY IS
AMPLIFIED,
NOT ATROPHIED.
39. May the voice be with you.
http://ideaplatz.com
WEBDAGENE 2017
CHERYL PLATZ
Owner, IDEAPLATZ -- Senior Designer, MICROSOFT
Twitter & Medium: @MuppetAphrodite
We must lead the call for a more representative speech user experience across all platforms.
This may take the form of new products, improvement of existing products, or open-source speech models.
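One way to make "more representative" measurable is to track recognition error separately for each speaker group rather than as a single average. The sketch below is only an illustration: it computes word error rate (WER) per group using a word-level edit distance. The function names, the group labels, and the two transcripts are all made up for the example and do not come from any real product or study.

```python
from collections import defaultdict


def word_errors(reference: str, hypothesis: str) -> int:
    """Word-level edit distance (substitutions, insertions, deletions)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) -> {group: WER}."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        errors[group] += word_errors(reference, hypothesis)
        words[group] += len(reference.split())
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Hypothetical transcripts, for illustration only.
samples = [
    ("group_a", "set a timer for ten minutes", "set a timer for ten minutes"),
    ("group_b", "set a timer for ten minutes", "set a time for two minutes"),
]
rates = wer_by_group(samples)
```

A gap between groups in a report like this is the kind of signal that would prompt collecting more training data from the group with the higher error rate.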
A semantic web, or third-party ontologies for specific subject matter like healthcare or IT, could allow each voice assistant to understand similar concepts and innovate on the response.
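As a rough sketch of that idea, assume a small shared ontology that maps spoken phrases to concept identifiers. Two assistants resolve an utterance to the same concept but each supplies its own response. Everything here is hypothetical: the `Concept` type, the concept identifiers, the phrase lists, and both assistants are invented for illustration, and the substring matching stands in for real language understanding.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Concept:
    concept_id: str
    labels: Tuple[str, ...]


# Hypothetical third-party ontology shared by every assistant.
SHARED_ONTOLOGY = (
    Concept("health.blood_pressure", ("blood pressure",)),
    Concept("it.password_reset", ("reset my password", "forgot my password")),
)


def resolve(utterance: str) -> Optional[Concept]:
    """Return the first shared concept whose label appears in the utterance."""
    text = utterance.lower()
    for concept in SHARED_ONTOLOGY:
        if any(label in text for label in concept.labels):
            return concept
    return None


# Each (hypothetical) assistant innovates on the response, not the concept.
ASSISTANT_A = {"it.password_reset": "I've sent a reset link to your email."}
ASSISTANT_B = {"it.password_reset": "I can walk you through the reset now."}


def respond(responses: dict, utterance: str) -> str:
    concept = resolve(utterance)
    if concept is None or concept.concept_id not in responses:
        return "Sorry, I can't help with that yet."
    return responses[concept.concept_id]


reply_a = respond(ASSISTANT_A, "I forgot my password again")
reply_b = respond(ASSISTANT_B, "I forgot my password again")
```

Both assistants agree on what was asked, which is the shared part, while the replies differ, which is where each product can compete.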
“What if You Had an intelligent agent for voice editing?” – Adobe - https://www.youtube.com/watch?v=e6TccXFBY5g
Powerful and yet simplistic. Could this semantic structure be exposed to multiple assistants?
(Is it a coincidence that circles factor so prominently in the branding of this generation of assistants?)
Love letters: a time-honored human tradition that builds shared context. Today’s voice systems don’t build shared context.
Today’s popular voice assistants only maintain conversational context for a matter of seconds.
Human relationships require a shared understanding built over time. Trust without memory is difficult.
Screen capture of Apple job posting for “health and wellness” domain
Screen capture from BMW/Alexa announcement:
https://www.theverge.com/2017/9/27/16372566/bmw-alexa-integration-2018
We are an aging population, and many of us need companionship at times when we are alone.
As our voice interfaces become more sophisticated, when is it appropriate for our digital voice assistants to fill this gap?
Star Trek IV: The Voyage Home. Copyright Paramount Pictures. http://www.youtube.com/watch?v=LkqiDu1BQXY