Network flows, social networking, smart devices, the Internet-of-Things. These innovations carry no deep value in themselves. Value invariably comes from understanding, obtaining accurate measurements, predicting, and controlling. This vision rests not only on the application of machine intelligence and data mining techniques that can tackle large and diverse data collections, but also on our capacity to operationalise interdisciplinary research at the intersection of many domains, such as statistics, signal processing, neuroscience, privacy and security, to name a few. In this talk, through the narration of my involvement in past and recent projects, I share my experience in the domain of user behaviour analysis and predictive modelling. I discuss offline and online experimental methods (and how they can be brought together), present current practices in measuring human behaviour in the online world, and highlight research challenges and opportunities that I have encountered.
User Behaviour Modelling - Online and Offline Methods, Metrics, and Challenges
1. User Behaviour Modelling
Online and Offline Methods, Metrics and Challenges
System and User Centered Evaluation Approaches in Interactive
Information Retrieval (SAUCE 2016)
PRESENTED BY Ioannis Arapakis (Sr Data Scientist, Eurecat)⎪ March 17, 2016
2. Contents
1. Short Biography
2. User Engagement in Web Search
3. User Modelling Using Mouse Cursor Interactions
4. On Human Information Processing in Information
Retrieval
4. Education & Research Experience
§Ph.D. in Computer Science, University of Glasgow (2010)
• Supervisor: Prof. Joemon M. Jose
§M.Sc. in Information Technology, Royal Institute of Technology (KTH),
Sweden (2007)
§2015 – 2016 Senior Data Scientist, Eurecat, Barcelona
• Data Mining Group
§2011 – 2015 Researcher, Yahoo Labs, Barcelona
• User Engagement, Web Search Group, Ad Processing and Retrieval Group
5. Research Interests
§Data Mining
• Pattern recognition, predictive modelling, statistical inference, time series
analysis
§Information Retrieval
• Multimedia mining and search, user modelling, personalised search
systems, recommender systems, evaluation and applications
§Human-Computer Interaction
• Experimental methods, user engagement, neuro-physiological signal
processing, sentiment analysis
6. Internal Projects
§User Engagement
§Ad Retrieval
§Modelling News Article Quality
§Mouse Tracking Analysis for Inferring User Behaviour
§Discovery and Localisation of Points of Interest
8. Trade-off between the speed of a search system and
the quality of its results
Too slow or too fast may result in financial consequences
for the search engine
9. Web Search Economics
§Web users
• are impatient,
• have limited time
• expect sub-second response times
§High response latency
• can distract users
• decreases user engagement over time
• results in fewer query submissions
§Sophisticated and costly solutions
• More information stored in the inverted
index
• Machine-learned ranking strategies
• Fusing results from multiple resources
10. Research Methodology
Controlled Experimentation
• Small samples
• Controlled conditions
• High internal validity
• Behavioural observations
• Questionnaires
• Neurophysiological measures with high temporal and spatial
resolution
Log Analysis
• Large datasets / samples
• High external validity
• Flexible parameter exploration
• A/B testing
• Bucket testing
• Real-life conditions
11. Research Questions
§What are the main components in the response latency of a search
engine?
§How sensitive are users to response latency?
§How much does response latency affect user behaviour?
14. Yahoo Labs
Impact of Search Latency on User Engagement in Web Search
Controlled Study (1)
15. Tasks
§Task 1: Investigates users’ perception of the search site
response (slow or fast?)
§Task 2: Users’ ability to estimate the experienced search
site latency (what was the latency in milliseconds?)
§Task 3: How brand bias affects perceived search site
usability and UX
16. Experimental Methodology (Task 1)
§Controlled study (12 participants) with two independent variables
• Search latency (0 – 2750ms)
• Search site speed (slow, fast)
§Participants submitted 40 navigational queries
§For each query we increased latency by a fixed amount (0 – 1750ms)
using a step of 250ms
§Each latency value (e.g., 0, 250, 500) was introduced five times, in a
random order
§After submitting each query, they were asked to report if the response
of the search site was “slow” or “normal”
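The query schedule described above can be sketched as follows. This is a minimal illustration, not the study's actual tooling: the function name `latency_schedule` and its parameters are hypothetical, and the only facts taken from the slide are the latency range, the 250ms step, and the five repetitions per value in random order.

```python
import random


def latency_schedule(min_ms=0, max_ms=1750, step_ms=250, repetitions=5, seed=None):
    """Return one added-latency value per query, in random order.

    Hypothetical helper: every latency level between min_ms and max_ms
    (inclusive) appears exactly `repetitions` times.
    """
    levels = list(range(min_ms, max_ms + 1, step_ms))
    schedule = levels * repetitions
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    rng.shuffle(schedule)
    return schedule


schedule = latency_schedule(seed=42)
print(len(schedule))  # 8 levels x 5 repetitions = 40
```

With `min_ms=500` and `max_ms=2750` the same helper yields 10 levels x 5 repetitions = 50 queries, which matches the query count reported for Task 2.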
17. Was it Too Slow or Too Fast?
§Up to a point (500ms) added response
time delays are not noticeable by the
users
§Beyond a certain threshold (1000ms)
the users can feel the added delay with
very high likelihood
[Figure, left panel: likelihood of feeling added latency against added latency (ms), 250 to 1750; series: Slow SE (base), Slow SE, Fast SE (base), Fast SE]
[Figure, right panel: increase relative to base likelihood against added latency (ms), 250 to 1750; series: Slow SE, Fast SE]
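The likelihood plotted for Task 1 is, in essence, the share of queries at each added-latency level that participants reported as slow. A minimal sketch of that computation follows. The function name and the response data are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def slow_report_likelihood(responses):
    """Fraction of 'slow' reports per added-latency level.

    responses: iterable of (added_latency_ms, reported_slow) pairs,
    where reported_slow is True when the participant answered "slow".
    """
    counts = defaultdict(lambda: [0, 0])  # latency -> [slow reports, total reports]
    for latency_ms, reported_slow in responses:
        counts[latency_ms][0] += int(reported_slow)
        counts[latency_ms][1] += 1
    return {lat: slow / total for lat, (slow, total) in sorted(counts.items())}


# Made-up responses, for illustration only.
responses = [(0, False), (0, False), (1000, True), (1000, False)]
print(slow_report_likelihood(responses))  # {0: 0.0, 1000: 0.5}
```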
18. Experimental Methodology (Task 2)
§Controlled study (12 participants) with two independent variables
• Search latency (0 – 2750ms)
• Search site speed (slow, fast)
§Participants submitted 50 navigational queries
§For each query we increased latency by a fixed amount (500 –
2750ms) using a step of 250ms
§Each latency value (e.g., 0, 250, 500) was introduced five times,
in a random order
§After each query submission they provided an estimation of the
search latency in milliseconds
19. Counting the Seconds
[Figure, first panel: predicted latency (ms) against actual latency (ms), 1750 to 2750; series: Actual, Males, Females, Average]
[Figure, second panel: predicted latency (ms) against actual latency (ms), 750 to 2750; series: Actual, Males, Females, Average]
Perception of search latency varies considerably across the
population
20. Experimental Methodology (Task 3)
§Controlled study (20 participants) with two independent variables
• Search latency (0, 750, 1250, 1750)
• Search site speed (slow, fast)
§Participants submitted 50 navigational queries
§Participants performed four search tasks
• Asked to evaluate the performance of four different backend search systems
• Submit as many navigational queries from a list of 200 randomly sampled web
domains
• For each query they were asked to locate the target URL among the first ten
results of the SERP
21. Reported User Engagement and System Usability
§The tendency to overestimate or underestimate system
performance biases users’ perception of system usability
• Positive bias towards SEfast
• SEfast participants were more deeply engaged
System                      SEslow  SEslow  SEslow  SEslow  SEfast  SEfast  SEfast  SEfast
Latency                     0ms     750ms   1250ms  1750ms  0ms     750ms   1250ms  1750ms
Post-Task Positive Affect   16.20   14.50   15.50   15.20   20.50   19.00   20.80   19.30
Post-Task Negative Affect    7.00    6.80    7.60    6.90    6.80    7.40    7.40    7.20
Frustration                  3.20    3.10    2.90    3.30    2.80    3.00    3.50    2.60
Focused Attention           22.80   22.90   19.90   22.20   27.90   26.60   23.90   29.50
SYSUS                       32.80   28.90   29.80   27.90   35.20   31.30   29.80   33.20
22. Yahoo Labs
Impact of Search Latency on User Engagement in Web Search
Large-scale Log Analysis (1)
23. Query Log Data
§Random sample of 30M web search queries obtained from Yahoo
§End-to-end (user perceived) latency values
§We select queries issued:
• Within the US
• To a particular search data centre
• From desktop computers
§ Compare presence of clicks for two given query instances qfast & qslow
• submitted by the same user
• having the same query string
• matching the same search results
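The pairing of query instances can be sketched as below. This is an illustrative reading of the slide, not the original pipeline: the record fields (`user`, `query`, `results`, `latency_ms`, `clicks`) are hypothetical names, and the assumption that a pair counts as click-on-fast only when the faster instance received a click and the slower one did not is mine.

```python
from collections import defaultdict
from itertools import combinations


def click_presence_counts(log):
    """Count click-on-fast and click-on-slow pairs in a query log.

    Two records form a pair when they share user, query string and
    result list but were served with different latencies.
    """
    groups = defaultdict(list)
    for rec in log:
        key = (rec["user"], rec["query"], tuple(rec["results"]))
        groups[key].append(rec)

    on_fast = on_slow = 0
    for records in groups.values():
        for a, b in combinations(records, 2):
            if a["latency_ms"] == b["latency_ms"]:
                continue  # no fast/slow distinction possible
            fast, slow = (a, b) if a["latency_ms"] < b["latency_ms"] else (b, a)
            # Assumed definition: exactly one of the two instances was clicked.
            if fast["clicks"] > 0 and slow["clicks"] == 0:
                on_fast += 1
            elif slow["clicks"] > 0 and fast["clicks"] == 0:
                on_slow += 1
    return on_fast, on_slow


# Made-up log records, for illustration only.
log = [
    {"user": "u1", "query": "mail", "results": ["a", "b"], "latency_ms": 300, "clicks": 1},
    {"user": "u1", "query": "mail", "results": ["a", "b"], "latency_ms": 900, "clicks": 0},
]
print(click_presence_counts(log))  # (1, 0)
```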
24. Variation of Clicked Page Ratio Metric
[Figure: clicked page ratio (normalized by the max) against latency (normalized by the mean)]
25. Click Presence
§ Given two content-wise identical result pages, users are more
likely to click on the result page that is served with lower latency
§ 500ms of latency difference is the critical point beyond which
users are more likely to click on a result retrieved with lower latency
[Figure: fraction of query pairs (Click-on-fast, Click-on-slow) and their ratio (Click-on-fast/Click-on-slow) against latency difference (in milliseconds), 0 to 2000]
26. Click Count
§ Clicking on more results becomes preferable to submitting new
queries when the latency difference exceeds a certain threshold
(1250ms)
[Figure: fraction of query pairs (Click-more-on-fast, Click-more-on-slow) and their ratio (Click-more-on-fast/Click-more-on-slow) against latency difference (in milliseconds), 0 to 2000]
27. Yahoo Labs
Impact of Search Latency on User Engagement in Web Search
Controlled Study (2)
28. Do Small Latency Increases Affect User Engagement?
§We are not consciously aware of the mental
processes determining our behaviour
§Such unconscious influences reach
from basic or low-level mental
processes to high-level psychological
processes
§Conclusions based on self-report
methods are inherently limited
§Users cannot provide information that is
not consciously available to them
31. EDA Signal
§Applied 200ms smoothing filter & artifact removal
§A temporal series was constructed from each physiological signal
§Averaged the data every 1-second period (480 points == ~ 8 minutes)
§Each 10-second period following a query submission was visually
inspected for SCRs (skin conductance responses)
§Data sample: 132 SCRs; 10 points (seconds) per SCR
[Figure: EDA signal (µS, 15.0 to 17.0) against time after stimulus onset (in seconds), 0 to 12]
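The smoothing and per-second averaging steps listed for the EDA signal can be sketched as follows. This is a minimal illustration under stated assumptions: the slide does not name the filter type or the sampling rate, so a centred moving average and a 10 Hz rate are assumed here, and both function names are hypothetical.

```python
def moving_average(signal, window):
    """Centred moving average over 2 * (window // 2) + 1 samples.

    Near the edges, only the samples that exist are averaged.
    """
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def per_second_means(signal, sample_rate_hz):
    """Average the signal over consecutive, non-overlapping 1-second periods."""
    n_seconds = len(signal) // sample_rate_hz
    return [
        sum(signal[s * sample_rate_hz:(s + 1) * sample_rate_hz]) / sample_rate_hz
        for s in range(n_seconds)
    ]


SAMPLE_RATE_HZ = 10  # assumed; not stated on the slide
window = max(1, round(0.2 * SAMPLE_RATE_HZ))  # about 200ms worth of samples

raw = [15.0 + 0.01 * (i % 7) for i in range(480 * SAMPLE_RATE_HZ)]  # synthetic data
series = per_second_means(moving_average(raw, window), SAMPLE_RATE_HZ)
print(len(series))  # 480 one-second points, i.e. 8 minutes
```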
32. EMG-CS Signal
§ Band-pass filter 30-500Hz & artifact removal
§ A temporal series was constructed from each physiological signal
§ Averaged the data every 1-second period (480 points == ~ 8 minutes)
§ Included the data for the entire 3-second period after each query
submission
§ Outliers excluded. Data sample: 7256 samples (4 seconds by query)
33. Physiological Data
§Mixed multilevel models (a regression-based approach)
• Allows comparison of data at different levels
• Level 1: conditions within-subjects
• Level 2: subjects
• allows including random terms in the model for random factors
• random intercepts for between-subject variability; accounts for the difference in means
between subjects
• useful for physiological data, since between subject variability can be much larger than
variability due to experimental conditions, and, therefore, can mask it
• random slopes for the effects of time and order of presentation
• Deals with autocorrelated data (e.g. physiological data)
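The point about between-subject variability masking condition effects can be illustrated with a much cruder device than a mixed model: removing each subject's own mean before comparing conditions. This is only a rough stand-in for what a random intercept absorbs, not the model used in the study, and the data and function names below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def centre_within_subject(rows):
    """Subtract each subject's mean from that subject's values.

    rows: list of (subject, condition, value) tuples.
    """
    by_subject = defaultdict(list)
    for subject, _, value in rows:
        by_subject[subject].append(value)
    subject_mean = {s: mean(values) for s, values in by_subject.items()}
    return [(s, c, v - subject_mean[s]) for s, c, v in rows]


def condition_means(rows):
    """Mean value per condition, ignoring which subject a row came from."""
    by_condition = defaultdict(list)
    for _, condition, value in rows:
        by_condition[condition].append(value)
    return {c: mean(values) for c, values in sorted(by_condition.items())}


# Made-up readings: subject B's baseline is far higher than subject A's,
# yet both react to the 500ms condition by the same small amount.
rows = [
    ("A", "0ms", 2.0), ("A", "500ms", 2.5),
    ("B", "0ms", 10.0), ("B", "500ms", 10.5),
]
print(condition_means(centre_within_subject(rows)))  # {'0ms': -0.25, '500ms': 0.25}
```

In the raw values the gap between subjects (8.0) dwarfs the condition effect (0.5). After centring, only the condition effect remains.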
34. Mixed multilevel models (a regression-based approach)
EDA Model
Fixed factors     Coefficients
Intercept         -.31*
Latency 500ms     .50***
Latency 750ms     .42**
Latency 1000ms    .60***
Seg 2             .11***
Seg 3             .36***
Seg 4             .68***
Seg 5             .88***
Seg 6             .90***
Seg 7             .80***
Seg 8             .74***
Seg 9             .72***
Seg 10            .69***
EMG-CS Model
Fixed factors     Coefficients
Intercept         .0188***
Latency 500ms     .0019***
Latency 750ms     .0034***
Latency 1000ms    .0010*
Seg 1             .0000393
Seg 2             .0002397***
Seg 3             .0003163***
§ Higher EMG values → more negative experience
§ Higher EDA values → more intense experience
§ Even short latency increases (>500ms) that are not
consciously perceived have sizeable physiological
effects
35. Yahoo Labs
Impact of Search Latency on User Engagement in Web Search
Large-scale Log Analysis (2)
36. Query Log Data
§Random sample of 30M web search queries obtained from Yahoo
§We select queries issued:
• Within the US
• To a particular search data centre
• From desktop computers
§ Compare presence of clicks for two given query instances qfast & qslow
• submitted by the same user
• having the same query string
• matching the same search results
§ Click presence (click-on-fast, click-on-slow)
§ Click count (click-more-on-fast, click-more-on-slow)
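Once query pairs have been formed, the click count metric can be summarised per latency-difference bin, as in the plots of this section. The sketch below is illustrative only: the input layout `(latency_gap_ms, clicks_fast, clicks_slow)`, the 250ms bin width and the function name are assumptions, not details from the study.

```python
from collections import defaultdict


def click_count_by_latency_gap(pairs, bin_ms=250):
    """Summarise click-more-on-fast / click-more-on-slow per latency-gap bin.

    pairs: iterable of (latency_gap_ms, clicks_fast, clicks_slow).
    Returns {bin_start_ms: (fraction_more_on_fast, fraction_more_on_slow, ratio)};
    ratio is None when no pair in the bin has more clicks on the slow instance.
    """
    bins = defaultdict(lambda: {"pairs": 0, "fast": 0, "slow": 0})
    for gap_ms, clicks_fast, clicks_slow in pairs:
        b = bins[(gap_ms // bin_ms) * bin_ms]
        b["pairs"] += 1
        if clicks_fast > clicks_slow:
            b["fast"] += 1
        elif clicks_slow > clicks_fast:
            b["slow"] += 1

    summary = {}
    for start, b in sorted(bins.items()):
        frac_fast = b["fast"] / b["pairs"]
        frac_slow = b["slow"] / b["pairs"]
        ratio = frac_fast / frac_slow if frac_slow else None
        summary[start] = (frac_fast, frac_slow, ratio)
    return summary


# Made-up pairs, for illustration only.
pairs = [(100, 2, 1), (200, 1, 2), (600, 3, 1), (700, 2, 0), (700, 0, 1), (710, 1, 1)]
print(click_count_by_latency_gap(pairs))
# {0: (0.5, 0.5, 1.0), 500: (0.5, 0.25, 2.0)}
```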
37. Fast or slow query response preference according to click
presence metric
[Figure: fraction of query pairs (Click-on-fast, Click-on-slow) and their ratio (Click-on-fast/Click-on-slow) against latency difference (in milliseconds): 0, 500, 750, 1000]
38. Fast or slow query response preference according to click
count metric
[Figure: fraction of query pairs (Click-more-on-fast, Click-more-on-slow) and their ratio (Click-more-on-fast/Click-more-on-slow) against latency difference (in milliseconds): 0, 500, 750, 1000]
41. Background Information
§ Abundance of multimedia content
§ Availability of large volumes of interaction data
§ Scalable data mining techniques
42. Part of the effort has focused on understanding how users
interact and engage with web content
Measurement of within-content engagement remains a difficult
and unsolved task
[Diagram labels: personalisation, service quality, ad quality, recommender algorithms]
43. Challenges
§ Lack of standardised methodologies
§ Absence of well-validated measures
§ Users often don’t provide explicit feedback about
their QoE
§ Existing methods don’t form scalable solutions
§ Traditional web analytics (e.g., clicks, dwell time,
pageviews) vs. users’ true intentions and motivations
44. Why Mouse Tracking?
§ Navigation & interaction with a digital
environment usually involves the use of a
mouse (i.e., selecting, hovering, clicking)
§ Can be easily performed in a non-invasive
manner, without removing users from their
natural setting
§ Several works have shown that the mouse
cursor is a proxy of gaze (attention)
§ Low-cost, scalable alternative to eye-tracking
45. Motivation
§Develop techniques for measuring
within-content engagement with
online news articles
§Quantify user engagement with
Direct Displays in web search, e.g.,
Knowledge Graph
46. Methodology
§Large scale analysis
• ~15GB of mouse cursor data (e.g., <x,y,t>, clicks) of users
interacting with online news (bucket test)
• Learn mouse cursor patterns (unsupervised approach)
§Controlled study
• A small sample (~50 participants) of users interacting with
engaging and non-engaging news articles
• Create ground truth for our prediction task
§Apply learned patterns to smaller set and test on ground truth
47. Feature Engineering
§ Time
§ Coverage
§ Type (e.g., vertical scroll)
§ Distance
§ Speed
§ Acceleration
§ Direction
§ Spectral Analysis
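Several of the listed features (time, distance, speed, acceleration, direction) can be derived directly from a sequence of <x,y,t> cursor samples. The sketch below shows one plausible way to do so. The exact feature definitions used in the study are not given on the slide, so the finite-difference formulas, the units and the function name are assumptions.

```python
import math


def cursor_features(samples):
    """Basic kinematic features from cursor samples.

    samples: list of (x, y, t) with x, y in pixels and t in seconds,
    ordered by strictly increasing t (assumed).
    """
    distance = 0.0
    speeds, directions = [], []
    for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:]):
        dx, dy = x1 - x0, y1 - y0
        step = math.hypot(dx, dy)
        distance += step
        speeds.append(step / (t1 - t0))                      # pixels per second
        directions.append(math.degrees(math.atan2(dy, dx)))  # movement angle in degrees

    # Change in speed between consecutive segments, divided by the
    # duration of the later segment (a simple finite difference).
    accelerations = [
        (speeds[i + 1] - speeds[i]) / (samples[i + 2][2] - samples[i + 1][2])
        for i in range(len(speeds) - 1)
    ]

    duration = samples[-1][2] - samples[0][2] if samples else 0.0
    return {
        "time": duration,
        "distance": distance,
        "mean_speed": distance / duration if duration > 0 else 0.0,
        "speeds": speeds,
        "accelerations": accelerations,
        "directions": directions,
    }


features = cursor_features([(0, 0, 0.0), (3, 4, 1.0), (3, 4, 2.0)])
print(features["distance"], features["time"])  # 5.0 2.0
```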
48. Learning Mouse Cursor Motifs
§ Perform the clustering for k = 1..40
• Agglomerative Hierarchical Clustering
• K-Means
• Spectral Clustering
§ Compute cluster validity using a large
number of internal criteria; each criterion
results in a ranking
§ Perform Rank Aggregation to derive a single
ranked list L' that has the minimum distance
from a given set of ranked input lists L = {L1,
L2, …, Lm}
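The rank aggregation step can be illustrated with a Borda count, a simple positional heuristic swapped in here for clarity. It is not the minimum-distance aggregation described on the slide, whose distance measure is not specified. The candidate labels below are hypothetical clustering configurations.

```python
def borda_aggregate(rankings):
    """Combine several best-first rankings of the same candidates.

    Each ranking awards n points to its top candidate, n - 1 to the
    next, and so on. Candidates are returned by decreasing total score,
    with ties broken alphabetically.
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (n - position)
    return sorted(scores, key=lambda c: (-scores[c], str(c)))


# One ranking per internal validity criterion (made-up orderings).
rankings = [
    ["kmeans-k5", "spectral-k7", "hier-k4"],
    ["spectral-k7", "kmeans-k5", "hier-k4"],
    ["kmeans-k5", "hier-k4", "spectral-k7"],
]
print(borda_aggregate(rankings))  # ['kmeans-k5', 'spectral-k7', 'hier-k4']
```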
49. Prediction Task
§ The frequency distribution of mouse gestures varies
per user and content (interesting vs. uninteresting)
Performance metrics by classifier
Classifier            Precision  Recall  F-Measure  Accuracy
Baseline              .273       .523    .359       .522
1NN                   .664       .659    .659       .659
SMO                   .700       .682    .678       .681
Random Forest         .727       .727    .727       .727
Stacking (1NN + SMO)  .751       .750    .750       .750
51. Human Information Processing (HIP)
§ We are not consciously aware of the
mental processes determining our
behaviour
§ Such unconscious influences reach
from basic or low-level mental
processes to high-level psychological
processes like motivations,
preferences, or complex behaviours
52. Human Information Processing (HIP)
§ The search for information is often led by a human brain
§ HIP is the field of study of experimental psychology and cognitive
neuroscience
53. Psychological Variables
§ The most interesting psychological variables and processes for
the study of IR are those related to attentional and emotional
phenomena
Selective attention
Cognitive effort
/ arousal
Emotional reactions
54. Psychophysiological Measures of HIP
§ Standardised questionnaires for measuring
perceptual aspects, perceived usability, cognitive
workload, or affect
§ Online measures of user behavior and cognitive
states that are often unavailable for conscious
reports:
§ Behavioral
§ Psychophysiological
55. Characteristics of Psychological Methods
§ Helpful in unveiling attentional and emotional reactions not
consciously available to us
§ Offer high temporal and spatial resolution
§ Robust against cognitive biases (e.g., social desirability bias*)
§ Always provide “honest” responses
§ No direct question to the subject, no direct answer
§ The information on the research questions has to be inferred
from the variations on the physiological signals and the way they
are related to psychological constructs
* The tendency of survey respondents to answer questions in a manner that will be viewed favorably by others.
56. Electrodermal Activity (EDA)
§ Changes in conductivity of the skin due to
activation of sweat glands by the
autonomic nervous system (sympathetic
division)
§ Reflects general activation both for attentional
and emotional measures (in fact, it is calibrated
by having participants perform complex math
calculations)
§ It’s the basis of the “lie detector” (polygraph), though not
as effective as fiction has led us to believe…
58. Electrodermal Activity (EDA)
§ Unconscious Physiological Effects of Search
Latency on Users and Their Click Behaviour
(SIGIR 2015)
• Although the latency effects did not produce
changes on the self-reported data, their
impact on users’ physiological responses is
evident
• Even when short latency increases of under
500ms are not consciously perceived, they
have sizeable physiological effects that can
contribute to the overall user experience
[Figure: EDA signal (µS) against time after query onset (in seconds), 1 to 10; series: 0ms, 500ms, 750ms, 1000ms]
[Figure: EDA signal (µS, 15.0 to 17.0) against time after stimulus onset (in seconds), 0 to 12]
59. Electrodermal Activity (EDA)
§ A large-scale query log analysis ascertained the effect on the
clicking behaviour of users and revealed a significant decrease in
users’ engagement with the search result page, even at small
increases in latency
[Figure: fraction of query pairs (Click-more-on-fast, Click-more-on-slow) and their ratio (Click-more-on-fast/Click-more-on-slow) against latency difference (in milliseconds): 0, 500, 750, 1000]
[Figure: fraction of query pairs (Click-on-fast, Click-on-slow) and their ratio (Click-on-fast/Click-on-slow) against latency difference (in milliseconds): 0, 500, 750, 1000]
60. HIP Dynamics
§ Human information processing is both serial and parallel
§ Cognitive science has provided large amounts of evidence that
conscious information processing is mainly serial
§ Processing information in situations that require shifting the
focus of attention between different tasks and/or stimuli
increases the effort required to process that
information
§ Simon effect
62. HIP Dynamics (Serial Processing)
§ Switching tasks
§ Try to read the word in odd trials
and name the color on even
trials!
Green
Red
Blue
Red
Green
Yellow
63. HIP Dynamics (Parallel Processing)
§ Simon effect: Hit the left key if there is an A on screen and the
right if there is a B
65. Multimodal Behaviour Modelling
§ Behaviour measurements in ecological conditions
§ Behaviour understanding through cameras and microphones
§ Aggregating various online measures gives an accurate picture
of the user’s experience
§ Robust real-time behaviour analyses, information that can be
used for the purpose of research on human behaviour and user
experience
§ The opportunity is ripe to move beyond experimental laboratory
settings into real large-scale controlled studies
66. Take-away Messages
§ The use of neuro-physiological methods in IR research is
essential in order to obtain a complete picture of the mental
processes underlying user search behaviour
§ The collaboration between psychological and IR research can go
far beyond the application of sophisticated measuring
methodologies
§ Introduce actual knowledge on the dynamics of human
information processing into a real-world testing ground
§ The use of multimodal signals holds the promise of allowing
large-scale, controlled studies that will undoubtedly foster the
progress of both research fields
67. Thank you for your attention!
iarapakis
arapakis.ioannis@gmail.com
https://es.linkedin.com/in/ioannisarapakis