2. Human-Centered Multimedia
Founded: April 2001
Chair: Elisabeth André
Research Topics:
Human-Computer Interaction
Social Signal Processing
Affective Computing
Embodied Conversational Agents
Social Robotics
3. Motivation
There is another level in human communication which is just as important as the spoken message: nonverbal communication.
How can we enrich the precise and useful functions of computers with the human ability to shape the meaning of a message through nonverbal signals?
4. Observation
Social signal processing has developed from a side issue into a major area of research.
Yet the effort undertaken has not translated well into applications. Why is this?
[Timeline of milestones at ACM MM, 1998-2015: Special Session on Face and Gesture Recognition; Brave New Topic: Affective Multimodal HCI; keynote "Honest Signals"; 1st HCM Workshop; 3 workshops on "Social Cues"; 1/3 of Grand Challenge papers on Affective Computing]
5. Challenge: Real-Life Applications
Of a total of 434 publications on SSPNet, 10% include the term "real(-)time" and are related to detection.
Only 2% address multi-modal detection.
[Pie chart "Social Signal Processing in the Wild": modality focus of the real-time detection papers: face (15), gesture (9), speech (9), interaction (8), physiological (2), multimodal (13); meta-analysis by J. Wagner]
6. Organization of the Talk
Analysis of Emotional and Social Signals
Generation of Expressive Behaviors in Virtual Agents and Robots
Applications of Social Signal Processing and Embodied Agents:
Socially Sensitive Robots
Training of Presentation Skills in
• Job Interviews
• Public Speaking
Providing Information on Social Context to Blind People
7. Challenge: Noisy and Corrupted Data
We can only rely on previously seen data.
We have to deal with noisy and corrupted data.
[Diagram: a signal stream over time with noisy and missing segments leading up to "now"]
8. Challenge: Non-Prototypical Behaviors
Previous research focused on the analysis of prototypical samples in as pure a form as possible.
In daily life, we also observe subtle, blended and suppressed emotions, i.e. non-prototypical emotional displays.
Pictures from Ekman and Friesen’s database of emotional faces
9. Accuracy Drops with Naturalness
Systems developed under laboratory conditions often perform poorly in real-world scenarios.
[Chart: recognition accuracy drops with the naturalness of the data, from roughly 100% for acted to 80% for read and 70% for Wizard-of-Oz speech]
10. Contextualized Analysis
Improvement by context-sensitive analysis (a minimal sketch follows the list):
Gender-specific information (Vogt & André 2006)
Success/failure of the student in tutoring applications (Conati & McLaren 2009)
Dialogue behavior of the virtual agent/robot (Baur et al. 2014)
Learning context using (B)LSTMs (Metallinou et al. 2014)
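To make the idea concrete, here is a minimal sketch of context-sensitive model selection, using gender as the context variable; the data layout and the scikit-learn classifier are assumptions for illustration, not the setup of the cited papers:

```python
# Minimal sketch: one emotion classifier per context value (here: gender),
# with the matching model selected at recognition time.
# Data layout and classifier choice are illustrative assumptions.
from sklearn.svm import SVC

def train_context_models(samples):
    """samples: iterable of (feature_vector, emotion_label, gender) tuples."""
    models = {}
    for gender in ("female", "male"):
        X = [f for f, _, g in samples if g == gender]
        y = [e for _, e, g in samples if g == gender]
        models[gender] = SVC().fit(X, y)
    return models

def classify(models, features, gender):
    # The context (gender) picks the specialized model.
    return models[gender].predict([features])[0]
```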
11. Challenge: Multimodal Fusion
A meta-study by D'Mello and Kory on multimodal affect detection shows that the improvement over unimodal detection correlates with the naturalness of the corpus: >10% for acted but only <5% for natural data.
In natural interaction, people draw on a mixture of strategies to express emotion, leading to a complementary rather than consistent display of social behaviour.
S.K. D'Mello, J.M. Kory: Consistent but modest: a
meta-analysis on unimodal and multimodal affect
detection accuracies from 30 studies. ICMI 2012: 31-38
12. Event-Based Fusion
In the case of contradictory cues, fusion methods trust the "right" modality just as often as the "wrong" one.
[Diagram: per-sample correct and incorrect classifications for single modalities vs. fusion techniques]
J. Wagner, E. André, F. Lingenfelser, J. Kim: Exploring Fusion Methods for Multimodal Emotion Recognition with Missing Data. T. Affective Computing 2(4): 206-218 (2011)
13. Event-Based Fusion
The amount of misclassified samples is significantly higher when the annotations of the modalities mismatch.
[Chart: modality annotations agree ("Yes") for 71% of samples and disagree ("No") for 29%, with misclassification rates of 36% vs. 62%]
15. Synchronous Fusion
Synchronous fusion approaches consider multiple modalities within the same time frame (a minimal sketch follows).
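For illustration, a minimal feature-level sketch of synchronous fusion; the frame format and classifier interface are assumptions. Per-frame features of all modalities are concatenated within the shared time frame before a single classifier decides:

```python
import numpy as np

def synchronous_fusion(audio_frames, video_frames, classifier):
    """Feature-level fusion of streams aligned to a common frame rate.

    audio_frames, video_frames: arrays of shape (n_frames, n_features_*),
    where row t of both arrays covers the same time frame.
    """
    fused = np.concatenate([audio_frames, video_frames], axis=1)
    return classifier.predict(fused)  # one decision per shared time frame
```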
16. Asynchronous Fusion
Asynchronous fusion algorithms refer back to past time frames with the help of a memory mechanism and are therefore able to capture the asynchronous nature of the observed modalities (a minimal sketch follows).
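A minimal sketch of the memory idea, with a decaying trace per modality standing in for the recurrent memory (e.g. an LSTM) that would be used in practice; the score format is an assumption:

```python
import numpy as np

def asynchronous_fusion(streams, decay=0.9):
    """streams: dict modality -> array (n_frames, n_classes) of frame scores.

    Each modality keeps a decaying memory of its past evidence, so a cue in
    one modality can still be combined with a slightly later cue in another.
    """
    n_frames = next(iter(streams.values())).shape[0]
    memory = {m: np.zeros(s.shape[1]) for m, s in streams.items()}
    decisions = []
    for t in range(n_frames):
        for m, s in streams.items():
            memory[m] = decay * memory[m] + (1 - decay) * s[t]
        combined = np.mean(list(memory.values()), axis=0)
        decisions.append(int(np.argmax(combined)))
    return decisions
```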
18. Event-Based Fusion
Take into account temporal relationships between
channels and learn when to combine information
Move from segmentation-based processing to
asynchronous event-driven approaches
More robust in the case of missing or noisy data
[Diagram: laughter events ("haha", "hehe") detected on different channels at different times and combined by event-driven fusion]
F. Lingenfelser, J. Wagner, E. André, G. McKeown, W. Curran: An Event Driven Fusion Approach
for Enjoyment Recognition in Real-time. ACM Multimedia 2014: 377-386
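The event-driven idea can be sketched as follows; this is a strong simplification of the cited approach, and the event format, decay constant and threshold are assumptions:

```python
from dataclasses import dataclass

@dataclass
class Event:
    modality: str      # e.g. "audio" (a detected "haha") or "video" (a smile)
    confidence: float  # detector confidence in [0, 1]
    timestamp: float   # seconds

def enjoyment_score(events, now, half_life=2.0):
    """Combine asynchronous events; the influence of an event decays with age."""
    score = 0.0
    for e in events:
        age = now - e.timestamp
        if age >= 0:
            score += e.confidence * 0.5 ** (age / half_life)
    return score  # compare against a threshold to decide "enjoyment present"

events = [Event("audio", 0.8, 10.0), Event("video", 0.6, 10.7)]
print(enjoyment_score(events, now=11.0))  # both events still contribute
```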
19. SSI Framework
The Social Signal Interpretation (SSI) framework is an attempt to provide a general architecture that tackles the challenges we have discussed:
collection of large and rich multi-modal corpora
investigation of advanced fusion techniques
simplifying the development of online systems
Johannes Wagner, Florian Lingenfelser, Tobias
Baur, Ionut Damian, Felix Kistler, Elisabeth André:
The social signal interpretation (SSI) framework:
multimodal signal processing and recognition in
real-time. ACM Multimedia 2013: 831-834
SSI is freely available under:
http://www.openssi.net
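SSI itself is a C++ framework whose pipelines chain sensors, transformers and consumers over streaming data. As a purely conceptual illustration of that architecture (the class and component names below are mine, not SSI's API), such a pipeline boils down to:

```python
# Conceptual sensor -> transformer -> consumer pipeline; names are
# illustrative stand-ins, not SSI's actual API.
class Pipeline:
    def __init__(self, sensor, transformers, consumer):
        self.sensor = sensor              # produces raw signal chunks
        self.transformers = transformers  # e.g. filtering, feature extraction
        self.consumer = consumer          # e.g. classifier, logger, GUI

    def run(self):
        for chunk in self.sensor():
            for transform in self.transformers:
                chunk = transform(chunk)
            self.consumer(chunk)

# Toy wiring: a fake audio sensor, a mean-energy feature, and printing.
Pipeline(
    sensor=lambda: iter([[0.1, 0.2], [0.3, 0.1]]),
    transformers=[lambda c: sum(abs(x) for x in c) / len(c)],
    consumer=print,
).run()
```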
24. Generation of Facial Expressions
FACS (Facial Action Coding System) can be used to generate and recognize facial expressions. Action Units are used to describe emotional expressions.
Seven Action Units were identified for the robotic face (out of 40 Action Units for the human face):
Lower face: lip corner puller (AU 12), lip corner depressor (AU 15) and lip opening (AU 25)
Upper face: inner brow raiser (AU 1), brow lowerer (AU 4), upper lid raiser (AU 5) and eye closure (AU 43)
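As an illustration, a plausible emotion-to-AU mapping restricted to the seven available units; the talk does not spell out the mapping actually used on the robot, so the combinations below are standard FACS-based approximations:

```python
# Hypothetical mapping from basic emotions to the seven Action Units
# available on the robotic face (standard FACS combinations, approximated).
ROBOT_AUS = {
    1: "inner brow raiser", 4: "brow lowerer", 5: "upper lid raiser",
    12: "lip corner puller", 15: "lip corner depressor",
    25: "lip opening", 43: "eye closure",
}

EMOTION_TO_AUS = {
    "joy":      [12, 25],   # full FACS joy (AU 6+12) lacks AU 6 on the robot
    "sadness":  [1, 4, 15],
    "surprise": [1, 5, 25], # AU 2 and AU 26 unavailable, approximated
    "anger":    [4, 5],
}

def activate(emotion):
    for au in EMOTION_TO_AUS[emotion]:
        print(f"AU {au}: {ROBOT_AUS[au]}")

activate("joy")
```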
26. Realization of Social Lies for the Hanson Robokind
Social lies make up a considerable part of human conversation.
Social lies, as used for politeness reasons, are generally accepted.
Humans often show deceptive cues in their nonverbal behavior while lying.
Humanoid robots should show deceptive cues when telling social lies as well.
27. Deceptive Cues
Deceptive cues in human faces, according to Ekman and colleagues (a parameter sketch follows the list):
Micro-expressions: A false emotion is displayed, but the felt emotion is unconsciously expressed for a fraction of a second.
Masks: The felt emotion is intentionally masked by a non-corresponding facial expression.
Timing: The longer an expression is shown, the more likely it is to accompany a lie.
Asymmetry: Voluntarily shown facial expressions tend to be displayed in an asymmetrical way.
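A minimal sketch of how the timing and asymmetry cues could be parameterized for a robot smile; the keyframe interface and the numeric values are hypothetical and need not match the realization by Endrass et al.:

```python
# Hypothetical keyframes for a genuine vs. deceptive smile (AU 12 intensity
# per lip corner). Values illustrate the cues, not the robot's real API.
def smile_keyframes(deceptive=False):
    left = right = 1.0
    hold = 1.5                 # seconds at the apex of the expression
    if deceptive:
        right *= 0.6           # asymmetry cue: one lip corner pulled less
        hold = 4.0             # timing cue: expression held unnaturally long
    return [
        {"t": 0.0,        "au12_left": 0.0,  "au12_right": 0.0},    # onset
        {"t": 0.5,        "au12_left": left, "au12_right": right},  # apex
        {"t": 0.5 + hold, "au12_left": left, "au12_right": right},  # hold
        {"t": 1.0 + hold, "au12_left": 0.0,  "au12_right": 0.0},    # offset
    ]
```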
30. Real versus Faked Smile
[Images: a smile with blended anger (in the eye region) vs. a real smile]
31. Results of a Study
Faked smiles were easier to detect from the mouth region.
Robots with an asymmetrical smile were rated as significantly less happy than robots with a genuine smile.
Results are in line with research on virtual agents:
Rehm & André, AAMAS 2005:
• Agents that fake emotions are perceived as less trustworthy
and less convincing
• Subjects were not able to name reasons for their uneasiness
with the deceptive agent
B. Endrass, M. Häring, G. Akila, E. André: Simulating
Deceptive Cues of Joy in Humanoid Robots. IVA 2014:
174-177
33. Social Feedback Loop
[Diagram: social feedback loop: sensors capture the user's social behavior, behavior analysis feeds feedback generation, and an explicit hint on social behavior, together with the implicit social response, helps the user improve their social skills]
34. Behavior Analysis
Real-time multimodal analysis and classification of social signals (a rough sketch of the expressivity features follows):
Expressivity features (energy, openness, fluidity)
Facial expressions (smiles, lip biting)
Speech quality (speech rate, loudness, pitch)
Engagement, nervousness
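For illustration, rough versions of the three expressivity features over a window of skeleton frames; the exact definitions used in the system are not given in the talk, so the formulas and joint indices below are assumptions:

```python
import numpy as np

def expressivity(joints, fps=30):
    """joints: array (n_frames, n_joints, 3) of 3D joint positions.

    Assumed feature definitions:
    energy   - mean joint speed,
    openness - mean distance between the hand joints (indices assumed),
    fluidity - inverse of acceleration variability (smoother = higher).
    """
    vel = np.diff(joints, axis=0) * fps          # joint velocities
    acc = np.diff(vel, axis=0) * fps             # joint accelerations
    energy = np.linalg.norm(vel, axis=2).mean()
    openness = np.linalg.norm(joints[:, 7] - joints[:, 11], axis=1).mean()
    fluidity = 1.0 / (1.0 + np.linalg.norm(acc, axis=2).std())
    return {"energy": energy, "openness": openness, "fluidity": fluidity}
```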
35. Evaluation
Location: Parkschule in Stadtbergen, Germany
Participants: 20 pupils (10m/10f), 13-16 years old, job seeking; two practitioners
I. Damian, T. Baur, B. Lugrin, P. Gebhard, G. Mehlmann, E. André: Games are Better than Books: In-Situ Comparison of an Interactive Job Interview Game with Conventional Training. AIED 2015: 84-94
37. Experimental Setting
Day 1 - Pre-Interviews: 20 pupils, 2 practitioners; task: mock interviews; duration: ~10 min; 2x performance questionnaires (user + practitioner)
Day 2 - Training (Control): 10 pupils; task: reading a job interview guide; duration: ~10 min; user experience questionnaires
Day 2 - Training (TARDIS): 10 pupils; task: interaction with TARDIS + NovA; duration: ~10 min; user experience questionnaires
Day 3 - Post-Interviews: 20 pupils, 2 practitioners; task: mock interviews; duration: ~10 min; 2x performance questionnaires (user + practitioner)
38. Results
The overall behavior of the pupils who had interacted with TARDIS was rated significantly better by the job trainers than that of the pupils who had prepared for the job interview using books.
Only for the pupils who trained with TARDIS were we able to measure statistically significant improvements:
Their use of smiles appeared more appropriate.
Their use of eye contact appeared more appropriate.
They appeared significantly less nervous.
39. "[...] using the system, pupils seem to be highly motivated and able to learn how to improve their behaviour [...] they usually lack such motivation during class"
"[...] transports the experience into the youngster's own world"
"[...] makes the feedback be much more believable"
40. Augmenting Social Interactions
I. Damian, C.S. Tan, T. Baur, J. Schöning,
K. Luyten, E. André: Augmenting Social
Interactions: Realtime Behavioural
Feedback using Social Signal Processing
Techniques. CHI 2015: 565-574
42. Social Feedback Loop
[Diagram: variant of the social feedback loop: sensors capture social behavior, behavior analysis drives explicit feedback generation, and haptic feedback helps the user improve their social skills]
43. Study 1: Quantitative study in a controlled environment
15 speakers, 2 observers
Task: hold a 5-min presentation
2 conditions: system on, system off (within subjects; randomized order, 2 weeks apart)
Data acquisition: social signal recordings, questionnaires (speaker/observers)
44. Objective analysis of the recordings: the amount of inappropriate behaviour decreased when the system was on.
[Chart: % inappropriate behaviour (lower is better), system off vs. system on]
46. Study 2: Qualitative study in a real presentation setting
3 speakers, 13 observers
Task: present PhD progress
Data acquisition: semi-structured interviews
47. "[...] once I saw the feedback that I was talking too fast, I tried to adapt"
48. "[...] most of the time I did not perceive the system, only when I consciously looked at the feedback"
49. "It was a good feeling seeing everything [the icons] green ... it's like applause, or as if someone looks at you and nods. However, the green lasts longer than a nod [laughs]"
53. User Study
Users: 7 blind and visually impaired participants
Criteria: no nystagmus, unrestricted eye movements
Age  Gender  Visual impairment              Control method
68   male    Cataract                       center point
49   female  Cataract (early stage)         eye gaze
43   female  Optic atrophy                  eye gaze
73   male    Congenital blindness           center point
68   male    Optic nerve damage (accident)  center point
87   female  Macular degeneration           eye gaze
70   male    Retinal degeneration           eye gaze
54. Experiment
Scenario: Two videos of a speaker giving a monologue are shown.
Task: Rate the emotional state of the speaker.
Results: The videos were rated more accurately with the system on.
56. Overall Conclusions
Social and emotional sensitivity are key elements of human intelligence.
Social signals are particularly difficult to interpret, requiring us to understand and model their causes and consequences.
Offline applications start from overly optimistic recognition rates.
More work needs to be devoted to interactive online applications.
More information and software is available under:
http://www.hcm-lab.de
57. Current Work: Mobile Social Signal Processing
SSJ: Realtime Social Signal Processing for Java/Android
SSI – Unix/Android build compatibility