Slides for my keynote “Social Media and AI: Don’t forget the users” at the WWW 2017 workshop “International Workshop on Modeling Social Media: Machine Learning and AI for Modeling and Analyzing Social Media”. I argue that we need to consider two things: the sources of the data we use to build good algorithms, and whether our algorithms affect users the way we intend. The talk is based on two use cases around providing diversity (something many of us believe is good) to users:
1. Engaging through diversity: serendipity (same algorithm, different sources)
2. Engaging through diversity: awareness (effective algorithm, perception)
My point is that we may have the best AI and still get it wrong if we forget the users. I don't have the answers, but it is important that we ask the right questions in today's world.
2. Social Media and AI: Don’t forget the users
[Diagram: AI (Algorithms), Social Media, User Engagement, USERS]
3. Social Media and AI: Don’t forget the users
[Diagram: AI (Algorithms), Social Media, User Engagement, USERS, Source of information, User perception]
4. Outline
• User engagement
• Engaging through diversity: serendipity
• Engaging through diversity: awareness
We develop and deploy algorithms to “engage” users.
Use cases: Diversity
5. Outline
• User engagement
• Engaging through diversity: serendipity
• Engaging through diversity: awareness
We develop and deploy algorithms to “engage” users.
Source of information
User perception
Use cases: Diversity
6. Outline
• User engagement
• Engaging through diversity: serendipity
• Engaging through diversity: awareness
We develop and deploy algorithms to “engage” users.
7. Why is it important to engage users?
▪ In today’s wired world, users have enhanced expectations about their interactions with technology … resulting in increased competition amongst the purveyors and designers of interactive systems.
▪ In addition to utilitarian factors, such as usability, we must consider the hedonic and experiential factors of interacting with technology, such as fun, fulfillment, play, and user engagement.
(Lalmas, O’Brien & Yom-Tov, 2014)
8. What is user engagement?
User engagement is a quality of the user experience that emphasizes the positive aspects of interaction – in particular the fact of being captivated by the technology (Attfield et al., 2011).
User engagement is the emotional, cognitive and behavioural connection that exists, at any point in time and over time, between a user and a technological resource (Lalmas, O’Brien & Yom-Tov, 2014):
• user feelings: happy, sad, excited, …
• user interactions: click, read, comment, buy, …
• user mental states: flow, presence, immersion, …
9. Patterns of user engagement
Online sites differ in their engagement patterns!
• Games: users spend much time per visit
• Search: users come frequently and do not stay long
• Social media: users come frequently and stay long
• Niche: users come on average once a week, e.g. for a weekly post
• News: users come periodically, e.g. morning and evening
• Service: users visit the site when needed, e.g. to renew a subscription
(Lehmann et al., 2012)
10. Why is it important to measure and interpret user engagement well?
[Chart: CTR, new ranking algorithm]
11. Characteristics of user engagement
Focused attention (Webster & Ho, 1997; O’Brien, 2008)
• Users must be focused to be engaged
• Distortions in the subjective perception of time are used to measure it
Positive affect (O’Brien & Toms, 2008)
• Emotions experienced by the user are intrinsically motivating
• An initial affective “hook” can induce a desire for exploration, active discovery or participation
Aesthetics (Jacques et al., 1995; O’Brien, 2008)
• The sensory, visual appeal of the interface stimulates the user and promotes focused attention
• Linked to design principles (e.g. symmetry, balance, saliency)
Endurability (Read, MacFarlane & Casey, 2002; O’Brien, 2008)
• People remember enjoyable, useful, engaging experiences and want to repeat them
• Reflected in, e.g., the propensity of users to recommend an experience, a site or a product
12. Characteristics of user engagement
Novelty (Webster & Ho, 1997; O’Brien, 2008)
• Novelty, surprise, unfamiliarity and the unexpected
• Appeals to users’ curiosity; encourages inquisitive behavior and promotes repeated engagement
Richness and control (Jacques et al., 1995; Webster & Ho, 1997)
• Richness captures the growth potential of an activity
• Control captures the extent to which a person is able to achieve this growth potential
Reputation, trust and expectation (Attfield et al., 2011)
• Trust is a necessary condition for user engagement
• An implicit contract among people and entities, which is more than technological
Motivation, interests, incentives, and benefits (Jacques et al., 1995; O’Brien & Toms, 2008)
• Difficulties in setting up “laboratory” style experiments
• Why should users engage?
13. Outline
• User engagement
• Engaging through diversity: serendipity (same algorithm, different sources)
• Engaging through diversity: awareness
We develop and deploy algorithms to “engage” users.
Source of information
(Bordino, Mejova & Lalmas, 2013)
15. Engaging through serendipity: which source?
Yahoo! Answers: community-driven question & answer portal
• 67 336 144 questions & 261 770 047 answers
• January 1, 2010 – December 31, 2011
• English-language
Wikipedia: community-driven encyclopedia
• 3 795 865 articles
• as of end of December 2011
• English Wikipedia
Entity Search: build an entity-driven serendipitous search system based on entity networks extracted from Wikipedia and Yahoo! Answers
Serendipity: finding something good or useful while not specifically looking for it; serendipitous search systems provide relevant and interesting results
16. Engaging through serendipity: which source?
Wikipedia: curated; high-quality knowledge; variety of niche topics
Yahoo! Answers: minimally curated; opinions, gossip, personal info; variety of points of view
18. Serendipity: “making fortunate discoveries by accident”
Serendipity = unexpectedness + relevance
• | relevant & unexpected | / | unexpected |: number of serendipitous results out of all of the unexpected results retrieved
• | relevant & unexpected | / | retrieved |: serendipitous results out of all results retrieved
“Expected” result baselines from web search (data: WP = Wikipedia, YA = Yahoo! Answers)
• Top: the 5 entities that occur most frequently in the top 5 search results from Bing and Google. WP 0.63 (0.58), YA 0.69 (0.63)
• Top – WP: same as above, but excluding the Wikipedia page from the results. WP 0.63 (0.58), YA 0.70 (0.64)
• Rel: the top 5 entities in the related query suggestions provided by Bing and Google. WP 0.64 (0.61), YA 0.70 (0.65)
• Rel + Top: union of Top and Rel. WP 0.61 (0.54), YA 0.68 (0.57)
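Both ratios can be computed directly from sets of results. The sketch below assumes, hypothetically, that a result counts as “unexpected” when it does not appear in the output of an “expected” baseline (such as Top or Rel), and that relevance comes from human judgments; the function name and the toy entities are illustrative and are not taken from the study.

```python
def serendipity_scores(retrieved, expected, relevant):
    """Return (|relevant & unexpected| / |unexpected|, |relevant & unexpected| / |retrieved|).

    retrieved: results returned by the system
    expected:  results produced by an "expected" baseline (e.g. web-search entities)
    relevant:  results judged relevant (hypothetical labels)
    """
    retrieved = set(retrieved)
    unexpected = retrieved - set(expected)      # retrieved but not in the baseline
    serendipitous = unexpected & set(relevant)  # relevant AND unexpected
    share_of_unexpected = len(serendipitous) / len(unexpected) if unexpected else 0.0
    share_of_retrieved = len(serendipitous) / len(retrieved) if retrieved else 0.0
    return share_of_unexpected, share_of_retrieved


# Toy example with made-up entities (not data from the study)
print(serendipity_scores(["A", "B", "C", "D"], expected=["A", "C"], relevant=["A", "B"]))
# (0.5, 0.25)
```

Here only “B” is both relevant and unexpected, so it is 1 of the 2 unexpected results and 1 of the 4 retrieved ones. The first ratio rewards a system whose surprises are useful; the second also penalizes it for returning many unsurprising results.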
19. Serendipitous ≠ Relevant
Serendipitous > Relevant
Relevant > Serendipitous
Oil Spill → Penguins in Sweaters WP
Robert Pattinson → Water for Elephants WP
Lady Gaga → Britney Spears WP
Egypt → Cairo WP
Netflix → Blu-ray Disc YA
Egypt → Ptolemaic Kingdom WP & YA
Novelty
(see “common” knowledge in Weikum, 2017 WWW TempWeb Keynote)
20. Engaging through serendipity: which source?
• Engagement in search means viewing search activities as part of the user’s current overall task, including tasks of a leisurely or explorative nature
• Not all social media sources provide a serendipitous search experience
(slides based on Bordino presentation @ CIKM 2013)
Source of information
21. Outline
• User engagement
• Engaging through diversity: serendipity
• Engaging through diversity: awareness (effective algorithm, perception)
We develop and deploy algorithms to “engage” users.
User perception
(Graells-Garrido, Lalmas & Baeza-Yates, 2016)
22. Twitter is global
• But there are cognitive and systemic biases that shape user behaviour
• Can we do something about it?
Leetaru et al., 2013.
Use case II: Geographical bias on social media
23. Context: Chile, a centralized country
• Economic, political and media powers are concentrated in Santiago (the capital); Región Metropolitana (RM) is the capital region
• Twitter activity is centralized – RM receives more tweets from other locations than expected given the population distribution
24. Context: Chile, a centralized country
[Chart: flow of tweet activity between administrative regions]
25. Algorithms to overcome geographical bias
Create a geographically diverse timeline
• Proposed Method “PM”: information entropy + sidelines (enforce location)
• Baseline “DIV”: information entropy only
• Baseline “POP”: most popular tweets (mostly tweets from Santiago/RM)
After reading the timelines side by side, which one is more:
- diverse?
- interesting?
- informative?
Participants answered using a Likert scale from -3 to 3.
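The slide names information entropy as the diversity criterion behind DIV and PM. A minimal sketch, assuming entropy is measured over the origin locations of the tweets in a timeline and that the timeline is assembled greedily from a popularity-ranked candidate list; the greedy strategy, the function names and the toy data are hypothetical, and the sidelines component of PM is not modeled.

```python
import math
from collections import Counter


def location_entropy(locations):
    """Shannon entropy (in bits) of the location distribution of a timeline."""
    counts = Counter(locations)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def greedy_diverse_timeline(candidates, k):
    """Hypothetical greedy selection over (tweet_id, location) pairs.

    At each step, add the candidate that yields the highest location entropy;
    max() keeps the first maximum, so ties go to the earlier (more popular) tweet.
    """
    chosen, pool = [], list(candidates)
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda t: location_entropy([loc for _, loc in chosen] + [t[1]]))
        chosen.append(best)
        pool.remove(best)
    return chosen


# Toy candidates, already sorted by popularity (invented data)
cands = [("t1", "RM"), ("t2", "RM"), ("t3", "RM"), ("t4", "Biobio"), ("t5", "Valparaiso")]
print(greedy_diverse_timeline(cands, 3))
# [('t1', 'RM'), ('t4', 'Biobio'), ('t5', 'Valparaiso')]
print(location_entropy(["RM", "RM", "RM"]))  # a POP-like top 3: 0.0
```

A pure popularity ranking of these toy candidates would fill the timeline with RM tweets (entropy 0), whereas the entropy criterion pulls in one tweet per region.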
26. Being from a central or peripheral location makes a difference
Main Result
• Statistical interaction between location and algorithm (POP/PM)
• RM participants find PM more diverse than POP; NOT-RM participants do not
• Peripheral (NOT-RM) users did not perceive the diversity that both algorithms (DIV and PM) include by design!
27. Algorithms are not enough to overcome bias
• Users do not see the diversity in the timelines because they cannot identify themselves in them (in the location sense), even though diversity was present
• There is a diversity and representation awareness problem
• How can we make users aware of their representation in the timeline, as well as of the diversity inherent in it?
28. Previous work on news aggregators indicates that clustered representations help users become aware of diversity
[Figure panels: Clustered Tweets by Location; Standalone Tweets]
29. • Inspired by newsmap.jp, use treemaps to depict differences in the geographical origin of tweets, while giving every location a balanced amount of exposure
• Allow users to filter locations by selecting a specific region
30. “In the wild” study
Purpose: evaluate user involvement with the application as a proxy for diversity and representation awareness.
• Diversity: do users click on content related to different locations?
• Representation: do users choose to see only their location using the filters?
• Interestingness: how many interactions with content do users make?
A social bot (@todocl) generates timelines every hour and broadcasts them, mentioning featured users and retweeting their tweets.
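The three questions can be turned into per-user measures. A minimal sketch, assuming a hypothetical interaction log of (user, kind, location) tuples in which kind "filter" marks use of the location filter and every other kind counts as an interaction with content; the event names, the log layout and the example users are invented, and the study’s actual logging and statistical analysis are not reproduced.

```python
from collections import defaultdict


def engagement_measures(events):
    """Per-user measures from a hypothetical log of (user, kind, location) tuples."""
    interactions = defaultdict(int)
    locations = defaultdict(set)
    filtered = defaultdict(bool)
    users = []
    for user, kind, location in events:
        if user not in users:
            users.append(user)
        if kind == "filter":
            filtered[user] = True          # representation: did the user filter?
        else:
            interactions[user] += 1        # interestingness: number of interactions
            locations[user].add(location)  # diversity: distinct locations touched
    return {
        u: {"interactions": interactions[u],
            "locations": len(locations[u]),
            "filtered": filtered[u]}
        for u in users
    }


def filter_rate(measures):
    """Share of users who used the location filter at least once."""
    if not measures:
        return 0.0
    return sum(m["filtered"] for m in measures.values()) / len(measures)


log = [
    ("ana", "click", "RM"),
    ("ana", "click", "Biobio"),
    ("ana", "retweet", "Biobio"),
    ("luis", "filter", "Los Lagos"),
    ("luis", "click", "Los Lagos"),
]
m = engagement_measures(log)
print(m["ana"])        # {'interactions': 3, 'locations': 2, 'filtered': False}
print(filter_rate(m))  # 0.5
```

In a between-subjects comparison, these per-user values would then be aggregated per condition (e.g. treemap vs. baseline) and per location group (RM vs. NOT-RM).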
31. Experimental Setup
Between-subjects design.
N = 321 (RM = 193, NOT-RM = 128)
Main Results
The treemap increases:
- the number of interaction events
- the number of locations interacted with
- filter likelihood
Users interacted with more content, from more locations, and also filtered locations! (diversity)
Being from RM:
- increases the number of locations interacted with
- decreases filter likelihood
* For NOT-RM users, this means increased representation awareness: they find themselves!
32. Engaging through awareness: perception
• Bias (centralization) affects information perception and user behavior
• Algorithms are not enough! We need to find ways of “showing” information to users
• Not necessarily new algorithms, but new “I don’t know what to call it”
(slides based on Graells-Garrido presentation @ IUI 2016)
User perception
33. Final message
Not every culture has the same notion of relevance and importance. Even within a country there are differences!
We need algorithms that do not forget about these differences.
• E. Graells-Garrido, M. Lalmas and R. Baeza-Yates. Encouraging Diversity- and Representation-Awareness in Geographically Centralized Content. ACM IUI, 2016.
• I. Bordino, Y. Mejova and M. Lalmas. Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content. ACM CIKM, 2013.
• M. Lalmas, H. O'Brien and E. Yom-Tov. Measuring User Engagement. Synthesis Lectures on Information Concepts, Retrieval, and Services, Morgan & Claypool Publishers, 2014.