Dan Berlin, Jon Strohl, David Hawkins and I presented this at UXPA 2013. Eye tracking is well known and accepted in the UX community. Here we present preliminary evidence for the usefulness of adding electrodermal activity (EDA), continuous dial ratings, etc. to user experience research.
Beyond Eye Tracking: Using User Temperature, Rating Dials, and Facial Analysis to Understand the User Experience
1. Beyond Eye Tracking
Using user temperature, rating dials, and facial
analysis to understand the user experience
Jen Romano Bergstrom, Jon Strohl, David Hawkins
Dan Berlin
UXPA2013 | Washington, DC
@romanocog @forsmarshgroup @banderlin
3. Client’s needs
• Traditionally…
– What works well
– What needs help
• Measure the UX
Observations
Selection/click behavior
Contextual observations
Time to complete task
Reaction time
Accuracy
Ability to complete tasks
4. Task efficiency and accuracy
Group | Accuracy | Steps to Complete Task* | Time to Complete Task*
Users | 10% | 8 | 170 seconds
Admins | 21% | 8.3 | 32 seconds
All Participants | 15% | 8.2 | 101 seconds
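Summary rows like the ones above can be derived from per-attempt records. The following is a minimal sketch, assuming a hypothetical record layout (`group`, `success`, `steps`, `seconds`) and made-up values; it is not the study's data or tooling.

```python
from statistics import mean

# Hypothetical task records (illustrative values only, not the study's data).
attempts = [
    {"group": "Users", "success": False, "steps": 9, "seconds": 180.0},
    {"group": "Users", "success": True, "steps": 7, "seconds": 150.0},
    {"group": "Admins", "success": True, "steps": 8, "seconds": 30.0},
    {"group": "Admins", "success": False, "steps": 10, "seconds": 40.0},
]


def summarize(records):
    """Return accuracy (share of successful attempts), mean steps, and mean time."""
    return {
        "accuracy": mean(1.0 if r["success"] else 0.0 for r in records),
        "mean_steps": mean(r["steps"] for r in records),
        "mean_seconds": mean(r["seconds"] for r in records),
    }


# Group the attempts, then summarize each group and the full sample.
by_group = {}
for record in attempts:
    by_group.setdefault(record["group"], []).append(record)

summary = {group: summarize(records) for group, records in by_group.items()}
summary["All Participants"] = summarize(attempts)
```

Computing the "All Participants" row from the raw attempts, rather than averaging the group rows, keeps it correct when the groups differ in size.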
5. Session observations
• Observational click behavior
• Facial expressions of frustration
• Fidgeting and other observations of emotion
Areas of the website that participants explored first.
7. Think aloud protocol
• Rooted in cognitive psychology and the study of thinking
• Makes explicit what is implicitly present to participants
• Concurrent vs. retrospective
“This is really confusing!”
8. Satisfaction questionnaires & difficulty ratings
• Assess users’ subjective satisfaction
• Consistent questionnaire used across interfaces or
customized for its features and capabilities
• Structured vs. unstructured
Satisfaction Questionnaire
Please circle the numbers that most appropriately reflect your impressions about using this Web-based instrument.
1. Overall reaction to the Web site (terrible to wonderful): 1 2 3 4 5 6 7 8 9 not applicable
2. Screen layouts (confusing to clear): 1 2 3 4 5 6 7 8 9 not applicable
3. Use of terminology throughout the Web site (inconsistent to consistent): 1 2 3 4 5 6 7 8 9 not applicable
4. Information displayed on the screens (inadequate to adequate): 1 2 3 4 5 6 7 8 9 not applicable
5. Arrangement of information on the screen (illogical to logical): 1 2 3 4 5 6 7 8 9 not applicable
6. Tasks can be performed in a straight-forward manner (never to always): 1 2 3 4 5 6 7 8 9 not applicable
7. Organization of information on the site (confusing to clear): 1 2 3 4 5 6 7 8 9 not applicable
8. Forward navigation (impossible to easy): 1 2 3 4 5 6 7 8 9 not applicable
9. 9
Client’s needs
• For this project…
– What grabs attention?
– What is engaging?
– What is a turn off?
– What about the videos?
– Good parts? Bad?
– Is green better than…?
Explicit
Post-task satisfaction questionnaires
Moderator follow up
In-session difficulty ratings
Verbal responses
Real-time +/- dial
Observations
Selection/click behavior
Contextual observations
Time to complete task
Reaction time
Accuracy
Ability to complete tasks
12. Implicit measures
• Physiological responses are difficult to control
• Implicit responses are unfiltered
• Responses occur before explicit measures
Definition: Underlying reactions (e.g., eye movements, arousal) that people are unaware of, cannot control, or cannot express at a granular level
Stimulus → Implicit Responses → Thought Processes → Explicit Responses
13. Why don’t we measure the implicit?
• Very difficult, if even possible, to
communicate the subconscious.
• Responses occur in a very short time
interval.
• A lot of noise in the signal
• Unfamiliar lexicon used in the
literature.
• The technology is just beginning to
become usable by a wider audience.
• Analyses appear overwhelmingly time
consuming and complicated.
• It’s difficult to justify the ROI.
14. Why should we measure the implicit?
• Evaluates thought processes and emotions (not what the
participant tells you)
• Quantifiable data that goes beyond task performance
• Moment by moment interaction
• Cause and effect triggers
• Deeper insights
Traditional research is good at explaining what
people say and do, not what they think and feel.
16. The Complete UX
Observations
Selection/click behavior
Ethnography
Time to complete task
Reaction time
Accuracy
Ability to complete tasks
Explicit
Post-task satisfaction questionnaires
Moderator follow up
In-session difficulty ratings
Verbal responses
Real-time +/- dial
Implicit
Eye tracking
Electrodermal activity (EDA)
Behavioral analysis
Pupil dilation
Facial expression coding
Implicit associations
Linguistic analysis of verbalizations
Heart rate variability
18. Neuroimaging metrics
• Indirectly or directly
measures activity in the
brain.
• Typically measures the
hemodynamic response
or brain electrical
activity.
• Examine what “people
are thinking”
23. What is eye tracking?
• Observing and recording eye movements
as a participant interacts with a product
– Allows us to gain deeper insight into how users
perform tasks
• Allows UX researchers to collect objective
behavioral data
• Doesn’t include observing pupil dilation,
blink rate, or facial recognition
Yesterday
25. Qualitative heat maps
• Aggregate of fixation count or duration across participants
Example:
• Participants have similar fixation counts across links
• Displays uncertainty of where to click to get started
26. Qualitative gaze plots
• Plot of fixations for a single participant
Example:
• Participant fixates
back and forth
between two
different sections
• Displays
uncertainty on how
to use the sections
• The instructional
paragraph did not
facilitate web
reading
27. Qualitative gaze plots
Example:
• Participant has repeated fixations in the upper right-hand corner
• Participant said that he/she was looking for a search tool on the page
• The search tool was contained within a disappearing banner on the page
28. Quantitative eye-tracking data
• Quantitative data
– Attention
• Time to first fixation
– Are users finding the important content quickly?
• Total number of fixations in an area of interest
• Percentages of fixations in an AOI compared to the total page
– Are users spending an inordinate amount of time looking at a
single area?
– Processing
• Fixation duration
– Are users spending a long period of time in this area?
– Efficiency
• Repeat fixations
– Is information clear and presented efficiently?
29. Quantitative eye tracking
• Break the page up into
separate “areas of interest” or
AOIs
• Compare the fixation data
between important areas and
less important ones
– Or compare data between
designs
Areas of Interest
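The AOI metrics listed on the previous two slides can be computed directly from a list of fixations. The sketch below is illustrative only: the `Fixation` and `AOI` types, the rectangular AOI shape, and the sample values are assumptions, and "repeat visits" (re-entries into an AOI after looking elsewhere) is one possible reading of "repeat fixations".

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_ms: float      # fixation onset, relative to page load
    duration_ms: float   # how long the fixation lasted
    x: float             # gaze position in page pixels
    y: float


@dataclass
class AOI:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, f: Fixation) -> bool:
        return self.left <= f.x <= self.right and self.top <= f.y <= self.bottom


def aoi_metrics(fixations, aoi):
    """Summarize one participant's fixations for a single rectangular AOI."""
    ordered = sorted(fixations, key=lambda f: f.start_ms)
    inside = [aoi.contains(f) for f in ordered]
    hits = [f for f, is_in in zip(ordered, inside) if is_in]
    # A visit starts whenever gaze enters the AOI from outside (or at the start).
    visits = sum(
        1 for i, is_in in enumerate(inside) if is_in and (i == 0 or not inside[i - 1])
    )
    return {
        "time_to_first_fixation_ms": hits[0].start_ms if hits else None,
        "fixation_count": len(hits),
        "share_of_fixations": len(hits) / len(ordered) if ordered else 0.0,
        "mean_fixation_duration_ms": (
            sum(f.duration_ms for f in hits) / len(hits) if hits else None
        ),
        "repeat_visits": max(visits - 1, 0),
    }


# Hypothetical fixations and a hypothetical AOI around a "LAUNCH" button.
fixations = [
    Fixation(0, 200, 50, 50),       # outside the AOI
    Fixation(250, 300, 420, 120),   # first fixation inside
    Fixation(600, 250, 430, 130),   # still inside
    Fixation(900, 200, 60, 300),    # looks away
    Fixation(1150, 400, 410, 110),  # returns to the AOI
]
launch = AOI("LAUNCH", left=400, top=100, right=500, bottom=150)
metrics = aoi_metrics(fixations, launch)
```

Running the same function over each AOI, or over the same AOI in two designs, gives the comparison described in the bullets above.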
30. Combining quantitative and qualitative data
• Using multiple sources of data makes the evidence more
compelling
• Example: “LAUNCH” was expected to be the most clicked
• Heat map supports the quantitative eye-tracking data
31. Beyond eye tracking
• Eye tracking is just one type of biometric measure
• It tells us where participants are looking
• It does not tell us
– Emotional state
– Level of arousal
– Level of mental workload
38. What is it?
• Electrodermal activity (EDA)
encompasses skin conductance
responses and body
temperature.
• Nerve fibers trigger sweat glands to release
sweat in response to a stimulus.
• Sweat facilitates the travel of an
electrical signal.
• After the response to a stimulus, the glands
return to a baseline state.
• Sweat secretion is related to
sympathetic nervous system
activity.
39. Who cares?
• Skin conductance is an established measure of arousal
• Arousal can indicate engagement, fear, frustration, or other
emotional changes
• Continuously measure changes in arousal throughout a test
• Establish benchmarks and use them to compare against previous
iterations
• Determine if the design facilitated typical levels of arousal
or if there were specific triggers
40. EDA in UX research
• EDA can indicate usability problems
• Assess “good” and “bad” interfaces and compare biometrics (Ward
& Marsden, 2002)
• “Bad” interface causes higher skin conductivity, lower blood
volume, and increased pulse rate
• Assess frustration while playing a game (Lin and Hu, 2005)
41. How do I do it?
• The electrodes on an EDA sensor measure the resistance electricity faces
when traveling across the skin.
• Electrodes can be placed on three locations
– Best option - Palm
– Good option - Finger
– Acceptable option – Wrist
• Wired and wireless available
EDA recording device & analysis software
44. Dial Rating
FMG Rating Dial
• Continuous real-time feedback on videos and
commercials
• Researcher can choose anchors for the ratings
• Teardrop-shaped knob allows participant to remain
focused on the video
• Time sensitive
Position of dial
Max position of dial
Min position of dial
Dial Recorder Software
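Because the recorder reports a position between a minimum and a maximum, raw readings can be rescaled to a signed range before participants are averaged. This is a minimal sketch, assuming a hypothetical dial that reports 0 to 100 once per second and a target range of -1 to 1; the function names and values are illustrative and do not describe the FMG software.

```python
def normalize_dial(position, min_position, max_position):
    """Map a raw dial position onto [-1, 1], with the midpoint of travel at 0."""
    if max_position <= min_position:
        raise ValueError("max_position must be greater than min_position")
    clamped = min(max(position, min_position), max_position)
    return 2.0 * (clamped - min_position) / (max_position - min_position) - 1.0


def mean_trace(traces):
    """Average several equally long, time-aligned traces sample by sample."""
    lengths = {len(t) for t in traces}
    if len(lengths) != 1:
        raise ValueError("traces must be non-empty and equally long")
    return [sum(samples) / len(samples) for samples in zip(*traces)]


# Hypothetical raw readings, one per second, from a dial that reports 0..100.
raw = {"P1": [50, 75, 100], "P2": [50, 25, 50]}
normalized = {
    p: [normalize_dial(v, 0, 100) for v in values] for p, values in raw.items()
}
group_mean = mean_trace(list(normalized.values()))
```

Normalizing per device before averaging keeps the mean line comparable even if dials are calibrated with different end stops.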
47. Processing the EDA signal
• Tonic and phasic activity
– Tonic activity is a slow, state-based level of arousal
– Phasic activity is a rapid, stimulus-based change in arousal
• An EDA recording shows long periods of gradual change punctuated by a series of
peaks in activity.
[Figure: example EDA trace; x-axis: seconds (0 to 30); y-axis: µS (2.6 to 3.0)]
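One simple way to approximate the tonic/phasic split is to treat a slow moving average as the tonic level and the remainder as phasic activity. This is a rough sketch under that assumption, not the analysis used in the study; the 4-second window and the function names are hypothetical, and dedicated EDA tools use more principled decompositions.

```python
def moving_average(signal, window):
    """Centered moving average; the window shrinks at the edges of the signal."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def split_tonic_phasic(signal, sample_rate_hz, window_s=4.0):
    """Return (tonic, phasic), where tonic is the smoothed signal and phasic the residual."""
    window = max(1, int(window_s * sample_rate_hz))
    tonic = moving_average(signal, window)
    phasic = [s - t for s, t in zip(signal, tonic)]
    return tonic, phasic


# Hypothetical 1 Hz recording in µS: a flat level with one brief peak.
signal = [2.7] * 10 + [2.9] + [2.7] * 10
tonic, phasic = split_tonic_phasic(signal, sample_rate_hz=1.0)
peak_index = phasic.index(max(phasic))
```

A longer window makes the tonic estimate smoother but lets more of a slow response leak into the phasic residual.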
48. Analyzing EDA data
• The phasic response begins 1-4 seconds after onset of stimulus
• The signal is analyzed in discrete time intervals
• The area under the curve is analyzed to determine changes
[Figure: example EDA trace; x-axis: seconds (0 to 30); y-axis: µS (2.6 to 3.0); annotations: response onset, returning to baseline, response onset, peak is delayed]
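The interval-based area-under-the-curve analysis can be sketched with trapezoidal integration. The code below is an illustration under stated assumptions: the baseline is supplied by the caller, window boundaries coincide with sample times, the recording is non-empty, and the sample values are made up.

```python
def area_under_curve(times_s, values_us, baseline_us):
    """Trapezoidal area of the signal above baseline, in µS·s.

    Samples below baseline are clipped to zero, so baseline crossings
    between two samples are only approximated.
    """
    area = 0.0
    for i in range(len(times_s) - 1):
        h0 = max(values_us[i] - baseline_us, 0.0)
        h1 = max(values_us[i + 1] - baseline_us, 0.0)
        area += 0.5 * (h0 + h1) * (times_s[i + 1] - times_s[i])
    return area


def windowed_auc(times_s, values_us, baseline_us, window_s):
    """Split the recording into consecutive windows; return (start, stop, area) per window."""
    results = []
    start, end = times_s[0], times_s[-1]
    while start < end:
        stop = min(start + window_s, end)
        idx = [i for i, t in enumerate(times_s) if start <= t <= stop]
        area = area_under_curve(
            [times_s[i] for i in idx], [values_us[i] for i in idx], baseline_us
        )
        results.append((start, stop, area))
        start = stop
    return results


# Hypothetical 1 Hz trace in µS with one response peaking at 3 seconds.
times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
values = [2.6, 2.6, 2.8, 3.0, 2.8, 2.6, 2.6, 2.6, 2.6]
auc = windowed_auc(times, values, baseline_us=2.6, window_s=4)
```

Given the 1-4 second onset delay noted above, a window would normally be shifted or widened so that the response to a stimulus is not attributed to the preceding interval.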
50. Explicit rating of attention
Please indicate how much you agree with the following statements.
Response options: 1 (Not at all) | 2 | 3 | 4 | 5 | 6 | 7 (Extremely)
P | I found my mind wandering while the advertisement was on | While the advertisement was on, I found myself thinking about other things | I had a hard time keeping my mind on the advertisement | Average
P1 | 1 | 1 | 1 | 1.0
P2 | 1 | 2 | 1 | 1.3
P3 | 1 | 1 | 1 | 1.0
P4 | 3 | 3 | 3 | 3.0
P5 | 2 | 2 | 2 | 2.0
P6 | 2 | 2 | 2 | 2.0
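The Average column is the mean of each participant's three item ratings, rounded to one decimal place. A small sketch using the ratings from this slide (the variable names are illustrative):

```python
# Item ratings per participant, in the order the statements appear on the slide.
ratings = {
    "P1": [1, 1, 1],
    "P2": [1, 2, 1],
    "P3": [1, 1, 1],
    "P4": [3, 3, 3],
    "P5": [2, 2, 2],
    "P6": [2, 2, 2],
}

# Mean of the three items, rounded to one decimal as on the slide.
averages = {p: round(sum(items) / len(items), 1) for p, items in ratings.items()}
```

Because all three statements describe a wandering mind, a higher average indicates less attention to the advertisement.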
51. Explicit rating of emotion
Please indicate how much you experienced each of the following while viewing the advertisement.
Response options: 1 (Not at all) | 2 | 3 | 4 | 5 | 6 | 7 (Extremely)
P | Amused, fun-loving, silly | angry, irritated, or annoyed | disgust, distaste, or revulsion | guilty, repentant, or blameworthy | inspired, uplifted, or elevated | interested, alert, or curious | joyful, glad, or happy | sad, downhearted, or unhappy | scared, fearful, or afraid | sympathy, concern, or compassion | surprised, amazed, or astonished
P1 | 2 | 1 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 1
P2 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
P3 | 4 | 1 | 1 | 1 | 2 | 3 | 3 | 1 | 1 | 1 | 2
P4 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
P5 | 4 | 1 | 1 | 1 | 3 | 4 | 4 | 1 | 1 | 1 | 1
P6 | 5 | 1 | 1 | 1 | 3 | 4 | 4 | 1 | 1 | 1 | 2
52. Unanswered Questions
• When?
– When did minds start to wander?
– When were people engaged?
• What?
– What did people focus on?
– What did people miss?
– What caused the negative/positive emotions?
• Was it something specific or overall?
54. Visa Video Ad Example
Traditional Likert-Scale Overall Rating
Question: Please indicate how much you experienced each of the following while viewing the advertisement.
Response options: Not At All | A little bit | Moderately | Quite a bit | Extremely
P | amused, fun-loving, or silly | angry, irritated, or annoyed | disgust, distaste, or revulsion | guilty, repentant, or blameworthy | inspired, uplifted, or elevated | interested, alert, or curious | joyful, glad, or happy | sad, downhearted, or unhappy | scared, fearful, or afraid | sympathy, concern, or compassion | surprised, amazed, or astonished
P1 | 2 | 1 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 1
P2 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
P3 | 4 | 1 | 1 | 1 | 2 | 3 | 3 | 1 | 1 | 1 | 2
P4 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
P5 | 4 | 1 | 1 | 1 | 3 | 4 | 4 | 1 | 1 | 1 | 1
P6 | 5 | 1 | 1 | 1 | 3 | 4 | 4 | 1 | 1 | 1 | 2
New Continuous Dial Rating
[Figure: continuous dial ratings during the Visa video ad; one line per participant (P1 to P6) plus the mean; y-axis ticks from -1.1 to 1.1]
55. Electrodermal Activity: Visa Video Ad
[Figure: EDA trace during the Visa video ad; x-axis ticks from -5 to 32; y-axis ticks from 1.6 to 2.1]
Events annotated on the chart:
• [music only, screen change from bright to dark]
• [dramatic screen change to black with white words, "without the worry of currency exchange"; music consistent]
• [almost falls in water]
• [tail end of previous screen which appeared for several seconds and then change to first mention of brand]
• [middle of second screen change; MUSIC changes]
• [music change]
• [scene bright and beachy]
56. Visa Video Ad Example
Traditional Likert-Scale Overall Rating (question, response options, and P1 to P6 ratings as on slide 54)
New Physiological Measure of Arousal
[Figure: EDA trace during the Visa video ad; x-axis ticks from -5 to 32; y-axis ticks from 1.6 to 2.1]
58. Artery Video Ad Example: Traditional Measures
Traditional Likert-Scale Overall Rating
Question: Please indicate how much you experienced each of the following while viewing the advertisement.
Response options: Not At All | A little bit | Moderately | Quite a bit | Extremely
P | amused, fun-loving, or silly | angry, irritated, or annoyed | disgust, distaste, or revulsion | guilty, repentant, or blameworthy | inspired, uplifted, or elevated | interested, alert, or curious | joyful, glad, or happy | sad, downhearted, or unhappy | scared, fearful, or afraid | sympathy, concern, or compassion | surprised, amazed, or astonished
P1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
P2 | 1 | 1 | 5 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 4
P3 | 3 | 1 | 3 | 1 | 1 | 2 | 1 | 1 | 1 | 3 | 3
P4 | 1 | 3 | 5 | 1 | 1 | 3 | 1 | 3 | 1 | 1 | 5
P5 | 1 | 1 | 3 | 1 | 1 | 3 | 1 | 2 | 1 | 1 | 1
P6 | 1 | 1 | 5 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3
59. Artery video example
Traditional Likert-Scale Overall Rating (question, response options, and P1 to P6 ratings as on slide 58)
New Continuous Dial Rating
[Figure: continuous dial ratings during the Artery video; series P2 to P6 (video 1) plus the mean; x-axis ticks from 0 to 30; y-axis ticks from -1.2 to 0.2]
60. Continuous dial rating: Artery video
[Figure: continuous dial ratings during the Artery video; series P2 to P6 (video 1) plus the mean; x-axis ticks from 0 to 30; y-axis ticks from -1.2 to 0.2]
Events annotated on the chart:
• [sound of rushing air]
• "this much was found stuck to the aorta..."
• "every cigarette is doing you damage"
61. Electrodermal activity: Artery video
Traditional Likert-Scale Overall Rating (question, response options, and P1 to P6 ratings as on slide 58)
New Physiological Measure of Arousal
[Figure: EDA during the Artery video; one series per participant (P1 to P6) plus the mean; axis ticks from 0.0 to 7.0]
62. Electrodermal activity: Artery video
Events annotated on the chart:
• "...the main artery from the heart"
• "every cigarette is doing you damage"
• [voice, pace change]
• "authorized by the Australian government"
• "this much was found stuck to the aorta..."
• [sound of rushing air]
• [first fatty deposits emerge]
• "every cigarette is doing you damage"
• [sound effect; no text]
• "age 32"
• [heartbeats]
• [sound of crackling embers]
63. EDA does not capture valence
[Figure: EDA traces for P1, labeled "P1: Artery ad (Negative emotion)" and "P1: Visa ad (Positive emotion)"; x-axis ticks from -5 to 32; y-axis ticks from 1.6 to 2.1]
64. Continuous Dial Rating: Artery vs. Visa
[Figure: two dial-rating charts. One shows series P1 to P6 plus the mean, with y-axis ticks from -1.1 to 1.1. The other shows series P2 to P6 (video 1) plus the mean, with x-axis ticks from 0 to 30 and y-axis ticks from -1.2 to 0.2]
65. EDA advantages and disadvantages
• Advantages
– Continuous measure of
automatic physiological
response
– Sensitive to minor changes in
arousal
– Informs order of magnitude
• Disadvantages
– Does not inform valence
– Peak of physiological response
is slow
– Sometimes difficult to collect
[Figure: mean intrusiveness rating (y-axis ticks from 0 to 2.5) for the dial, eye tracker, and EDA]
Debriefing question: On a scale of 1 to 5, how intrusive was ____ while you were trying to complete the tasks and watch
videos?
Dial: Two participants rated the dial as very intrusive (4): “I was having to concentrate on what my reaction was, not just
have it.”
“It’s not something I normally do, or something I do consciously.”
EDA: Three participants rated the wrist band as moderately intrusive (3): “It was itchy.” “I had to remember not to move it.”
“I didn’t know where to put it.”
67. We need to be taking a collaborative approach
• Disparate measures of physiological response can tell a cohesive story!
• By analyzing different streams of data we can uncover a very rich level
of analysis.
69. Combining implicit measures for meaningful insights
[Figure: three simulated data streams plotted together; y-axis ticks from -1.100 to 1.100]
• Simulated pupil diameter data
• Simulated heart rate variability data
• Simulated EDA data
70. EDA: promising future
• Promising results
– When data is good, EDA provides continuous, “objective” arousal
measure
– There is consistency between:
• The Likert scale and the continuous dial data
• Self-reported emotion overall and EDA data
– EDA provides additional data above and beyond self-report
measures
– Most complete story can be told with a combination of measures.
71. Lessons learned
• Data Analyses
– Compare to baseline: use a different baseline per person and per stimulus
– How does pupil dilation data compare with EDA?
– Reduce the intrusiveness ratings for all metrics
• Dial
– If ET is not used, allow participants to look at the dial when making
responses
– Include simple practice task to increase familiarity
• Eye Tracker
– Instruct participants to visually search as if they were at home on their own
computer
• EDA
– Improve quality of EDA data; explore equipment
– Provide a cushion/pad to rest arm
– Over-recruit
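The "different baseline per person and per stimulus" lesson can be sketched as subtracting the mean of a pre-stimulus window from each recording separately. This is a minimal illustration, assuming a baseline window of the 5 seconds before stimulus onset (time 0); the function name, window, and sample values are hypothetical.

```python
def baseline_correct(times_s, values, baseline_start_s=-5.0, baseline_end_s=0.0):
    """Subtract the mean of the pre-stimulus window from every sample."""
    baseline = [
        v for t, v in zip(times_s, values) if baseline_start_s <= t < baseline_end_s
    ]
    if not baseline:
        raise ValueError("no samples fall inside the baseline window")
    level = sum(baseline) / len(baseline)
    return [v - level for v in values]


# Hypothetical recordings keyed by (participant, stimulus); stimulus onset is t = 0.
recordings = {
    ("P1", "Visa ad"): ([-2, -1, 0, 1, 2], [1.8, 1.8, 1.8, 1.9, 2.0]),
    ("P1", "Artery ad"): ([-2, -1, 0, 1, 2], [3.0, 3.0, 3.0, 3.5, 3.2]),
}

# Each recording gets its own baseline, so levels are comparable across stimuli.
corrected = {
    key: baseline_correct(times, values) for key, (times, values) in recordings.items()
}
```

Correcting each recording against its own pre-stimulus level removes differences in resting skin conductance between people and between sessions, leaving the change attributable to the stimulus.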
72. Select your measure carefully
• Where are participants dwelling on instructions and tasks?
– Eye tracking
• Which specific elements on a page are particularly
stressful?
– Eye tracking, EDA
• Which content is very engaging for the user?
– Eye tracking, EDA, satisfaction questions, debriefing interview
• Which design causes more stress on the user?
– EDA, debriefing interview
78. Pushing our research further
• There are lessons to be learned from neuromarketing
– Neuromarketing researchers have used EDA, heart rate
variability and even fMRI and EEG in an attempt to
determine how users experience an advertisement.
• UX has a different set of requirements
– To become more usable for practitioners, we need:
• Portable technology that can be taken when traveling
• Software that has a short learning curve
• Customizations that allow for sensors to be wrist mounted and
more literature to substantiate the use of this sensor location
• Analysis protocols that can be completed in a short period of
time.
79. Issues to keep in mind
• We want to mimic real-world experiences during a usability
study
• Complex setup will confound our experimental design
• Participant comfort is paramount
• Concurrent think-aloud vs. Retrospective think-aloud
• A talking participant is a distracted participant
• We always need to provide support for a ROI
80. Where do we go from here?
• We need to:
– Collaborate to move our
field forward
– Share methods and
analysis protocols
– Empirically test our
hypotheses
– Continually provide proof
for ROI
81. Thank you!
Jennifer Romano Bergstrom
jbergstrom@forsmarshgroup.com | @romanocog
Dan Berlin
dberlin@madpow.net | @banderlin
Jon Strohl
jstrohl@forsmarshgroup.com | @jonstrohl
David Hawkins
dhawkins@forsmarshgroup.com | @dHawk87