Tracking mouse cursor movements can be used to predict user attention on heterogeneous page layouts like SERPs. Previous work has relied heavily on handcrafted features, a time-consuming approach that often requires domain expertise. We investigate different representations of mouse cursor movements, including time series, heatmaps, and trajectory-based images, to build and contrast both recurrent and convolutional neural networks that can predict user attention to direct displays, such as SERP advertisements. Our models are trained over raw mouse cursor data and achieve competitive performance. We conclude that neural network models should be adopted for downstream tasks involving mouse cursor movements, since they can provide an invaluable implicit feedback signal for re-ranking and evaluation.
3. §The construct of attention has become the common
currency on the Web
§SERPs have become sophisticated UIs that include
heterogeneous modules
§Multiple elements compete for users’ attention;
understanding which actually attract it is key to search
engines
4. §Past studies have shown that certain cognitive and
motor control mechanisms are embodied in our mouse
cursor movements
§There is evidence of the utility of mouse cursor data for
predicting users’ emotional state and demographic
attributes like gender and age
5. §In the context of IR, a large body of research has
further established the cognitive grounding for the
hand-eye relationship and has demonstrated
the utility of mouse cursor analysis as a low-cost
and scalable proxy of visual attention
§Tracking mouse cursor movements can be
used to predict attention on heterogeneous web
page layouts
6. Experimental Design
§Brief transactional search task where participants were presented
with a predefined search query and the corresponding SERP, and were
asked to click on any element of the page that answered it best
§Between-subjects design with two independent variables:
• ad format (organic¹ and direct display ads)
• ad position (top-left and top-right position)
§Each participant was exposed to a unique combination of query, ad
format, and ad position
¹ Organic ads are only shown in the left part of Google
7. Experimental Design
§We used EvTrack¹, an open-source JavaScript library that allows event
tracking via event listeners or via event polling
§Collected data from 3,206 participants, aged 18 to 66
§We collected ground-truth labels post-task by asking the users to
what extent they paid attention to the ad, using a 5-point Likert-type
scale: “Not at all” (1), “Not much” (2), “I can’t decide” (3), “Somewhat”
(4), and “Very much” (5)
¹ https://github.com/luileito/evtrack
9. Experimental Design
§After excluding logs with incomplete mouse cursor data, we ended up
with 2,289 search sessions (45,082 mouse cursor positions):
• 763 correspond to the organic ad condition
• 793 correspond to the left-aligned direct display ad
• 733 correspond to the right-aligned direct display ad
§Ground-truth labels were converted to a binary scale (66% positive
cases)
§Used 60-10-30 (%) disjoint stratified splits for training, validation, and test
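The label conversion and the 60-10-30 stratified split described above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not give the cut-off used to binarize the Likert ratings, so the threshold of 4 ("Somewhat" or higher counts as positive) is a guess, and `binarize`, `stratified_split`, and the toy ratings are hypothetical names and data, not the authors' code or dataset.

```python
import random
from collections import defaultdict


def binarize(rating, threshold=4):
    """Map a 5-point Likert rating to 1 (attended) or 0 (not attended).

    The threshold is an assumption; the slides do not state the cut-off.
    """
    return 1 if rating >= threshold else 0


def stratified_split(labels, fractions=(0.6, 0.1, 0.3), seed=0):
    """Return disjoint index lists (train, val, test), stratified by label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy ratings for illustration only (not taken from the study).
ratings = [5, 4, 4, 2, 1, 3, 5, 4, 2, 4] * 10
labels = [binarize(r) for r in ratings]
train, val, test = stratified_split(labels)
```

Splitting each class separately keeps the proportion of positive cases roughly equal across the three partitions, which matters when the classes are imbalanced.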
12. Results: Effect of Model Type
§CNN (top) improves over RNN (bottom) by 3.24% (F1) and 9.35% (AUC)
§CNN (top) improves over RNN (bottom) by 13.91% (F1) and 26.42% (AUC)
§CNN (top) improves over RNN (bottom) by 18.65% (F1) and 20.35% (AUC)
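For readers less familiar with the two metrics reported here, the following is a minimal sketch of how F1 and AUC can be computed for a binary attention classifier. It uses the standard definitions (F1 as the harmonic mean of precision and recall, AUC as the probability that a random positive is scored above a random negative); the function names, the 0.5 decision threshold, and the toy data are assumptions for illustration and are not taken from the study.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and model scores).
y_true = [1, 1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

F1 depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two can move by different amounts.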
13. Results: Effect of Ad Placeholder
§Often, the presence of the ad placeholder in the representations improves
performance although the contribution is not statistically significant
§In the left-aligned direct display condition, F1 scores were significantly higher for
the models trained on the representations without the ad placeholder
14. Results: Effect of Ad Format
SqueezeNet (AUC=0.690)
ResNet50 (AUC=0.690)
AlexNet (AUC=0.708)
VGG19 (AUC=0.694)
ResNet50 (AUC=0.739)
15. Results: Effect of Ad Format (AUC)
(Mdn=0.610) < (Mdn=0.634)
(Mdn=0.594) < (Mdn=0.634)
16. Results: Effect of Ad Format (F1)
(Mdn=0.616) < (Mdn=0.708)
(Mdn=0.629) < (Mdn=0.708)
17. Results: Effect of Representations
§The visual representations based on trajectories and coloured
trajectories with variable line thickness are consistently amongst the
top-ranked performers
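To make the idea of a trajectory-based image concrete, here is a rough sketch that rasterizes a sequence of cursor positions onto a pixel grid, with a brush that widens as the movement progresses. The slides do not say what the variable line thickness encodes, so tying it to temporal order is purely an assumption, and `rasterize_trajectory`, `max_radius`, and the toy points are hypothetical. A real pipeline would more likely use an imaging library and the page's actual viewport size.

```python
def rasterize_trajectory(points, width, height, max_radius=1):
    """Draw a cursor trajectory on a height x width grid of 0/1 pixels.

    points: (x, y) integer pixel coordinates in temporal order.
    The square brush radius grows linearly from 0 to max_radius along
    the path (an illustrative assumption, not the study's encoding).
    """
    grid = [[0] * width for _ in range(height)]

    def stamp(cx, cy, radius):
        # Paint a (2*radius+1)-sided square, clipped to the grid bounds.
        for y in range(cy - radius, cy + radius + 1):
            for x in range(cx - radius, cx + radius + 1):
                if 0 <= x < width and 0 <= y < height:
                    grid[y][x] = 1

    if len(points) == 1:
        stamp(points[0][0], points[0][1], 0)
        return grid

    n_segments = len(points) - 1
    for i in range(n_segments):
        (x0, y0), (x1, y1) = points[i], points[i + 1]
        radius = round(max_radius * i / (n_segments - 1)) if n_segments > 1 else 0
        # Sample enough points along the segment to leave no gaps.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for s in range(steps + 1):
            x = round(x0 + (x1 - x0) * s / steps)
            y = round(y0 + (y1 - y0) * s / steps)
            stamp(x, y, radius)
    return grid


# Toy trajectory: move right along the top edge, then down.
image = rasterize_trajectory([(0, 0), (4, 0), (4, 4)], width=6, height=6)
```

The resulting grid can be fed to a convolutional network like any other single-channel image; a coloured variant would store one grid per channel.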
18. Main Findings
§Our findings reveal that the CNN models outperform the (rather shallow) RNN
models in the context of our study
§None of the models use hand-crafted features, which are time-consuming to
engineer and require domain expertise
§The presence of the ad placeholder in the visual representation seems to improve
the models’ performance in most cases
§The visual representations based on trajectories and coloured trajectories with
variable line thickness are consistently amongst the top-ranked performers
§The application of transfer learning proved to be effective
19. Thank you for your attention!
Dataset Available Here
iarapakis
ioannis.arapakis@telefonica.com
http://iarapakis.github.io
luileito
luis.leiva@aalto.fi
https://luis.leiva.name/web/