For UX practitioners looking for an introduction to multimodal interface design tips and best practices. Given at the Dallas BigDesign Conference in September 2018.
3. multimodal interface design is evolving at such a rapid pace,
it was almost impossible to keep this presentation current.
#multimodalinterfaces #bigd18
4. it was also almost impossible to find stock images referencing
future interface designs that did not look like this…
6. what is a multimodal interface?
Multimodal interfaces allow users to seamlessly integrate two or more of their senses when interacting with a system, so they can engage with that system in much the same way they engage with the physical world.
7.
some multimodal interfaces are already common…
You receive a phone call on your smartphone:
• You hear the ringtone (hearing)
• You feel the vibration (touch)
• You see the name and image of the caller (sight)
8.
most multimodal interfaces involve these human senses and modalities
• Sight. Sense Organ: Eyes. Modality: Visual
• Hearing. Sense Organ: Ears. Modality: Auditory
• Touch. Sense Organ: Skin. Modality: Haptic/Tactile
While taste, smell, and balance are probably on the horizon, we’ll save those for braver souls.
9.
equation from “designing for senses”
available device modes + sensory perceptions = interfaces
• Available device modes: capabilities of the device itself, such as input sensors (GPS positioning, facial recognition) and output methods (text notifications, voice alerts, etc.)
• Sensory perceptions: human capabilities such as sight, hearing, touch, and movement.
Mapping and combining senses and device modes effectively will lead to cohesive and valuable interfaces. BUT if done badly, the interface will be disjointed and unusable.
10. in this session, I will assume that you are already familiar
with the basics of human-computer interaction.
instead, we will focus on why you need to start thinking about
multimodal interfaces, and look at processes you can begin using today
to effectively design your company’s next generation
of products and services.
14.
ordering a tasty beverage using a GUI
By looking at this screen you can:
• See categories of beverages
• See beverage options within each of
the categories
• See photos to help you identify
beverages quickly
• See the prices
• See which beverages you’ve already
ordered
• See the total beverages in your cart
15.
On one hand…
• Discoverable
  • You can explore all the options available to you
• Learnable
  • Click in the left nav, and the options on the right side change
• Visual Cues
  • You can quickly differentiate items by the photo
But on the other…
• It’s SLOW…
  • To place an order and check out, you may have to visit 5+ screens
• Requires personal info to check out
  • Email, phone, address, age, your high school mascot, your first dog’s name…
• Did we mention S……L……O……W…?
  • In person, I’d simply say, “I’d like to order two Session IPAs and put it on my tab.”
16. “I know! Let’s make this a voice app!
It will be so much faster!”
17.
“Hey Google,
I’d like to order a beer.”
Okay Marti. We have…
Weekend Warrior Pale Ale…
Big and Bright IPA…
Hoppalight IPA…
Music Session IPA…
Dallas Blonde Lager…
19.
On one hand…
• VUI is faster… IF…
  • …you know what you want and know precisely what to say.
• Hands and eyes are not required
  • Great in situations where your hands and eyes must be doing something else
But on the other…
• It’s terrible at presenting choices
  • Any more than 3 options? Users WILL forget.
• Gives the user no time to study their options
  • Voice interfaces expect immediate responses
• Has highly limited navigation options
  • Voice interfaces are linear.
20.
but my task has lots of options!
• Have 5+ options to present? We’ve already seen there will be no recall for early items, as the user will be focused on items presented later in the list.
• A common workaround is to put options in groups of 3-4, ending with “would you like to hear more?” after each grouping
  • This will lower the information density, but does not solve the problem of “successive attention peaks,” which will impair usability (more later)
  • The user will no doubt ask to go back, repeat previous options, etc.
  • In short, not a great workaround at all
• Better solution: Consider a multimodal interface that combines voice and a screen
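The grouping workaround above can be sketched in a few lines. This is a minimal illustration, not production VUI code; the function name, group size, and prompt wording are my own.

```python
def chunk_prompts(options, group_size=3,
                  more_prompt="Would you like to hear more?"):
    """Split a long option list into short voice prompts of a few items,
    appending a continuation question after every group but the last."""
    prompts = []
    for i in range(0, len(options), group_size):
        group = options[i:i + group_size]
        text = ", ".join(group) + "."
        if i + group_size < len(options):
            # Not the final group: invite the user to continue the list.
            text += " " + more_prompt
        prompts.append(text)
    return prompts

# The beer list from the earlier slide (illustrative data).
beers = ["Weekend Warrior Pale Ale", "Big and Bright IPA", "Hoppalight IPA",
         "Music Session IPA", "Dallas Blonde Lager"]
```

Five beers become two prompts of three and two items; note the code lowers density but, as the slide warns, does nothing about successive attention peaks.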
23.
VUI problem #1: cognitive load
• The immediate and transient nature of voice interfaces requires the user to be fully alert when the system responds
  • They cannot control the speed of the information flow
  • They cannot re-read to gain a better understanding
  • They cannot scan multiple choices
  • They cannot click away
  • They cannot ignore the voice prompt without risking the cancellation of the entire interaction
• Therefore, they pay close attention
• This cognitive load requires that all VUI responses be kept short and limited in succession
  • “Peaks of Attention”
24.
peaks of attention: VUIs vs GUIs
From “How to Go from Screens to Voice without Overwhelming the User” by Daniel Westerlund
25.
conversation guidelines: grice’s maxims
• The maxim of QUANTITY
Give as much information as needed, but NO MORE.
• The maxim of QUALITY
Be truthful. Information must be supported by evidence.
• The maxim of RELATION
Be relevant, saying only things that are pertinent to the discussion.
• The maxim of MANNER
Be clear, brief, and orderly to avoid obscurity and ambiguity.
26.
grice’s maxim example: “what time is it?”
• “It’s morning”
• “It’s 10:17 AM”
• “It’s 10:17 and 46 seconds AM, Central Daylight Saving Time, on September 21, 2018”
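The maxim of quantity can be made concrete with a toy function for the slide’s “what time is it?” example: give as much information as needed, and no more. The function name and phrasing are illustrative, not from the talk.

```python
from datetime import datetime

def speak_time(now: datetime) -> str:
    """Answer "what time is it?" per Grice's maxim of quantity:
    hour, minute, and AM/PM are useful; seconds, time zone,
    and date are more than the listener asked for."""
    hour = now.strftime("%I").lstrip("0")  # drop leading zero for natural speech
    return f"It's {hour}:{now.strftime('%M %p')}"
```

For example, `speak_time(datetime(2018, 9, 21, 10, 17, 46))` returns `"It's 10:17 AM"`: the 46 seconds and the date are deliberately omitted.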
27. By being aware of the user’s Peaks of Attention
and Grice’s Maxims, we can take the first steps toward
designing truly effective multimodal and voice experiences.
28.
VUI problem #2: request and response interruptions
• Voice interfaces respond to requests, but interruptions cause problems
• You cannot count on user input being an isolated utterance. Requests often come embedded within other conversations
• These complex, multi-activity settings are the NORM for most people using your app
• Real conversations are NOT scripted exchanges based on decision trees and flow charts
• In April 2018, Alexa was still experiencing a 50% failure rate
  • Wake word not heard, “I didn’t understand the question,” etc.
• You must always be asking, “How can my app work through interruptions?”
• A multimodal interface gives you the opportunity to provide quick visual cues to help the user recover and achieve their task.
29.
VUI problem #3: learnability
• Sadly, most users have fairly unrealistic expectations about how to communicate using only voice commands
  • Remember the 50% fail rate?
• Users must be told what they can ask, as well as reminded what they did ask.
  • You can ask: “Hello Marti. You can ask for current weather or a weekly forecast.”
  • You did ask: “Today’s weather is 82 and sunny” vs. “82 and sunny”
• Users must still be taught how to express intentions fully, which does not come naturally
  • “Ask Astrology Daily for horoscope for Leo” vs. “Read my horoscope.”
• Consequence: Only 3% of Alexa skills still have users after 2 weeks. (February 2018)
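The “tell users what they can ask” pattern maps directly to the reprompt field in an Alexa-style response payload. Below is a minimal sketch following the Alexa Skills Kit custom-skill JSON response shape; the skill wording is illustrative.

```python
def build_welcome_response():
    """Build an Alexa-style JSON response that pairs a welcome message
    with a reprompt, so new users learn what they can ask."""
    prompt = "Hello Marti. You can ask for current weather or a weekly forecast."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": prompt},
            # The reprompt plays if the user stays silent, reminding them
            # what to say instead of silently ending the session.
            "reprompt": {
                "outputSpeech": {
                    "type": "PlainText",
                    "text": "You can say: current weather, or weekly forecast.",
                }
            },
            "shouldEndSession": False,
        },
    }
```

Keeping `shouldEndSession` false with a short reprompt gives hesitant first-time users a second chance rather than a dead end.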
30.
newbie vs experienced users? unexpected findings
“We were unable to prove the hypothesis that people would be less satisfied with the skill with a detailed tutorial after using it for an extended period of time. Quite the opposite was observed, as the participant group with the tutorial version was significantly more satisfied over the course of the entire experiment.”
Christian Bopp
Facit User Research. January 2018
Two-week Alexa Diary study
31. A real challenge you are going to face will be
reconciling these important principles
with your org’s “brand voice”
34. marti’s top ten multimodal design
tips and takeaways
(as of september 2018)
35.
#10: remember “what is good at what”
VOICE: Efficient Input
• Lets the user give commands quickly.
• Permits multi-tasking because it’s hands-free
• Allows users to bypass multiple navigation levels for familiar tasks and known commands.
SCREEN: Efficient Output
• Permits the display of large amounts of information at one time to reduce memory burden
• Visual hints and affordances suggest new options and commands
• Can show system status
36.
#9: think “voice first”
“Voice-first represents a new approach to the problem of integrating voice
commands into an existing graphical user interface.
First, the GUI is completely eliminated (as exemplified by the original
voice-only Echo); then a screen is re-introduced … as part of a holistic
system.”
Kathryn Whitenton
Nielsen Norman Group
37.
#8: practice being “uni-modal”
• Learn to isolate your thoughts (become
“temporarily uni-modal”) and pay attention
to things you are actually “programmed”
to ignore
• You are wearing shoes, but you don’t feel
them unless you focus on them.
• All humans focus on important sensory
and cognitive functions and filter out the
irrelevant.
• Being able to identify discrete inputs and
outputs will improve your awareness as
you design.
38.
#7: don’t forget about accessibility
• Even though you are designing a
multimodal application, the user should
be able to complete the entire task
using either voice or vision.
• As you work, be sure to include accessibility acceptance criteria in your User Stories.
40.
#6: when scoping, put in 3x your normal time for testing
"80% of the effort that goes into building
these skills is probably going into testing
and refining the user experience, and the
things that users can say and how they can
say them and the different ways they can
say them.”
Tingiris,
founder and managing director of Dabblelab
41.
#6B: contextual and field testing is important
Voice and multimodal interactions are
difficult for researchers to observe.
• They are heavily dependent on context
and current activities
• They often happen in private spaces
• They only last a few seconds
Try to find some way to include
contextual testing as you build your app.
42.
#5: find a linguistics major and hire them
• Content strategists, copywriters, and brand
marketing will assume SME status on voice
projects. But…
• Words are not Speech
• Personality and tone
• Translations and Localization
• Remember: the perceived “attitude” of the voice will
make or break the app
• Someone with formal training in linguistics may
prove invaluable. (“I need short, but not rude, AND on brand.”)
43.
#4: remember that “voice tone” is critical
• A personable tone helps users forgive
those moments when your product is
unable to complete a task or answer a
question that an actual human would
have no problem with.
• More importantly, “tone” will be perceived as “attitude” by users. Therefore, spend time testing different voices.
44.
#3: put new emphasis on journey maps
• Remember to think “temporarily
uni-modal” and expand your user
journey maps
• what do they see? what should they hear?
what are they doing concurrently? what
can they touch or reach?
• You may also want to create detailed
storyboards and task flow charts to be
sure you’ve addressed each step.
45.
#2: prepare your org for an update right after launch
• Designing multimodal interfaces is the Wild
Wild West
• There are lots of “how-tos” but not much on proven best practices
• No matter how carefully you test, there’s a
99.99% probability that users will need
something you didn’t include
• Most orgs do not actively plan to release a
feature update right after launch, so prepare
them.
50.
“wizard of oz” prototyping options
• Presents the test subject with wireframes or another “silent” GUI (usually a tablet)
• A human moderator supplies the “system voice”
• The moderator must click through the GUI prototype in response to the user’s voice commands.
• This works for very early “sanity check” prototypes (market feasibility, base feature set, etc.)
• Unreliable for more refined results
51.
prototypes with pre-recorded voice
• Created with a more advanced tool that allows you to add sound to triggering events
  • proto.io, Framer, Axure, Webflow, etc.
• Still requires either the user or the moderator to click the GUI prototype to trigger voice interactions.
• Better simulation than Wizard of Oz, but should still be reserved for concept prototyping, as results may be unreliable.
52.
prototypes with voice only
• Allows you to prototype voice interfaces without writing code
  • Storyline (getstoryline.com), Sayspring
• Best current option for voice testing your app’s vocabulary, alternate paths, tone and context, etc.
• BUT no way to add images
• Use to refine and test core voice interactions, but without images your tests will be incomplete
53.
amazon echo show
A more robust prototyping option
• True multimodal interaction with voice and touch
• May provide the most valid test results
BUT
• You must code a real script (real Amazon Developer account, real Python or JSON, etc.)
  • Note: if you have some coding experience, this is quite doable
• Visuals cannot be completely customized
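For a sense of the scale of “real Python” involved, here is a minimal sketch of a custom-skill Lambda handler of the kind an Echo Show prototype requires. The intent name is hypothetical; the request and response shapes follow the Alexa Skills Kit custom-skill JSON interface.

```python
def lambda_handler(event, context):
    """Route an Alexa request by type and return a spoken response."""
    request_type = event["request"]["type"]
    if request_type == "LaunchRequest":
        text = "Welcome. What would you like to order?"
    elif request_type == "IntentRequest":
        intent = event["request"]["intent"]["name"]
        if intent == "OrderBeerIntent":  # hypothetical intent name
            text = "Okay, one Session IPA added to your tab."
        else:
            text = "Sorry, I didn't understand that."
    else:
        text = "Goodbye."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            # Keep the session open while the user is still mid-task.
            "shouldEndSession": request_type not in ("LaunchRequest", "IntentRequest"),
        },
    }
```

Even this toy handler shows why the slide calls it “quite doable” with some coding experience: the work is mostly routing request types and intents to short response payloads.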
54.
real code prototypes
• Sadly, this is the only current option if you need
accurate simulations and test results
• Unless you are a real unicorn, this requires developers to
plug in real APIs to your simulated front-end to trigger voice
and display images.
• The upside is that if you plan properly and work
closely with your dev team, a great deal of the code
used in these prototypes can be used for production
development.
55.
a promising newcomer
Flutter (flutter.io)
• Fast development of mobile app prototypes with NO
throwaway code (designer and developers work in
the same application)
• Completely flexible UI design
• Provides access to all native features, APIs, and
existing code base
• Developed by Google and released as open source
57.
challenge the obvious: do you really NEED a multimodal application?
My most retweeted comment: “We work in a fashion industry”
Don’t just jump on the voice/multimodal bandwagon because it’s hot and business owners are clamoring to include it in your products
• Often, a GUI-only or VUI-only approach will permit users to happily solve their tasks in a far less complicated way.
Carefully re-examine your journey maps and test results, and continually push for the simplest and most elegant solution
• “Are we trying to kill an ant with a hydrogen bomb?”
• “Are we doing this just because [competitor] did it?”
58.
but if you truly need multimodal, don’t skip steps
That doesn’t mean work slowly. It means prototype and test like a demon
• Iterate rapidly, testing just one thing at a time if you have to.
• Document everything so you can refer back to it (more below).
• Build the most accurate prototypes that you can. The validity of your test results will repay the extra effort.
Keep your journey maps and other planning documentation up to date
• Multimodality exponentially increases the app’s complexity. You will need to crosscheck against your planning deliverables and documentation regularly.
Whatever you learn, please share with the community
be aware of the Peaks of Attention that they are creating, and the cognitive load associated with them.
With this concept of attention peaks in mind, designers should consider carefully both the duration and information density of responses.
From Wikipedia
In social science generally and linguistics specifically, the Cooperative Principle describes how effective communication in conversation is achieved in common social situations, that is, how listeners and speakers must act cooperatively and mutually accept one another to be understood in a particular way. As phrased by Paul Grice, who introduced it, "Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged."[1] Though phrased as a prescriptive command, the principle is intended as a description of how people normally behave in conversation. Jeffries and McIntyre describe Grice's maxims as "encapsulating the assumptions that we prototypically hold when we engage in conversation”.[2]
This flies directly in the face of “cut scenes” in video games. But I wanted to include it. For multimodal interfaces that combine VUI with GUI, I think having an “experienced user path” may be critical. IMO, more testing is required.
We touched on this on the previous slide, but this particular area is new to most of us. What makes a voice tone pleasant?