How are mobile technology and search engine tuning evolving to meet the needs of users? Here we look at recent developments in research, implementations by search engines, and how those who want to reach users can adapt their strategies to take these next-level changes into account.
34. 'The Vocabulary Problem'
Furnas, G.W., Landauer, T.K., Gomez, L.M. and Dumais, S.T., 1987. The vocabulary problem in human-system communication. Communications of the ACM, 30(11), pp.964-971.
36. One of the inventors of 'Latent Semantic Indexing', created to solve 'The Vocabulary Problem' whilst researching at Bellcore (1990)
48. Like RPGs (role-playing games), where the choices made determine the next choices given
49. 'People also ask' / related queries are a special kind of 'query refinement'
50. A kind of 'probability-driven fork in the road' – Sadikov et al., 2010, Clustering Query Refinements by User Intent: http://delivery.acm.org/10.1145/1780000/1772776/p841-sadikov.pdf
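As a rough illustration of that 'probability-driven fork', the likelihood of each follow-up query can be estimated from transition counts in a query log. This is a toy sketch, not the clustering method from the Sadikov et al. paper; the log and query strings are invented:

```python
from collections import Counter, defaultdict

# Toy query log of (query, follow-up refinement) pairs. Entirely invented.
log = [
    ("jaguar", "jaguar car"),
    ("jaguar", "jaguar animal"),
    ("jaguar", "jaguar car"),
    ("jaguar", "jaguar price"),
]

# Count how often each refinement follows each query.
transitions = defaultdict(Counter)
for query, refinement in log:
    transitions[query][refinement] += 1

def refinement_probs(query):
    """Estimate the probability of each 'fork in the road' from the counts."""
    counts = transitions[query]
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items()}

print(refinement_probs("jaguar"))
# → {'jaguar car': 0.5, 'jaguar animal': 0.25, 'jaguar price': 0.25}
```

The real paper goes much further, clustering refinements by inferred intent rather than just counting them, but the forking structure is the same.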
51. BUT a word's meaning & the user's intent/context combined are still very hard for search engines to understand
52. Despite huge leaps forward in query classification & natural language understanding
92. “Easier if we can model: who is asking, what they have done in the past, where they are, when it is, etc.” (Susan Dumais, CIKM, 2016)
93. “Queries Are Difficult To Understand in Isolation” (Susan Dumais, Microsoft Research, 2016)
94. AKA contextual search = User + Time + Location + Device + Task
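A minimal sketch of what that signal bundle could look like in code. All names and the toy disambiguation rule below are illustrative assumptions, not any search engine's actual model:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical bundle of the contextual-search signals listed above.
@dataclass
class SearchContext:
    user_id: str       # who is asking
    time: datetime     # when it is
    location: str      # where they are
    device: str        # phone, desktop, smart speaker...
    task: str          # the wider task the query serves

def interpret(query: str, ctx: SearchContext) -> str:
    # Toy disambiguation rule (an assumption, not a real engine's logic):
    # around springtime, a bare 'easter' query mostly means the date.
    if query == "easter" and ctx.time.month in (3, 4):
        return "when is easter"
    return query

ctx = SearchContext("u1", datetime(2019, 3, 30), "London", "phone", "holiday planning")
print(interpret("easter", ctx))  # → "when is easter"
```

The point is simply that the same query string resolves differently once the context fields are available alongside it.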
95. Better still… what about predicting the user's informational needs in order to make suggestions proactively?
96. “Nevertheless, as the world is becoming more mobile-centric, this old-fashioned query-driven search scenario and click-based evaluation mechanism can no longer catch up with the rapid evolution of user demand on mobile devices.” (Song and Guo, 2016, Microsoft Research)
99. Personalising Search via Interests & Activities
2005 paper awarded the 2017 SIGIR Test of Time Award. Cited 1,029 times to date.
Teevan, J., Dumais, S.T. and Horvitz, E., 2005, August. Personalizing search via automated analysis of interests and activities. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 449-456). ACM.
143. Understand your customers to assist with AI
[Diagram: a perceived information need breaks down into tasks, each composed of micro-tasks]
We can identify the user's probable top tasks & subtasks, and identify their needs & what info they need along the way.
144. Tell us about the tasks, order and steps involved in booking a hotel
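One way to note down those tasks and steps is as a simple task → micro-task structure. The micro-tasks below are illustrative guesses at a hotel-booking journey, not research data:

```python
# A task modelled as an ordered list of micro-tasks. The micro-tasks are
# illustrative guesses at a hotel-booking journey.
hotel_booking = {
    "task": "book a hotel",
    "micro_tasks": [
        "choose destination",
        "pick dates",
        "compare prices",
        "read reviews",
        "reserve room",
        "receive confirmation",
    ],
}

def info_needs(task):
    """Each micro-task implies an informational need we could serve with content."""
    return [f"content for: {m}" for m in task["micro_tasks"]]

for need in info_needs(hotel_booking):
    print(need)
```

Mapping content to each micro-task in order is what turns a task timeline into a content plan.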
176.
• Go big on evergreen content & keep it updated
• Optimise images well – think curation / collections
• Map user journeys to content plans
• Optimise video well – enhance with markup / transcription
• Get personal – keep refining segments / personas
• Identify & cluster content around task timelines
• Use relatedness across content, tasks & temporality
180. You may need a dual- or multi-armed content strategy
181. Programme your own expected questions and answers
[Flowchart – a 'book hotel' intent branches through expected questions: When do you want to stay? (dates) → How many nights? (overnight / 2 nights / 3 nights / a week) → Single or double room? (single / double)]
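A hand-programmed question tree like the slide's 'book hotel' example can be represented as nested dictionaries. Node wording and answers here are illustrative:

```python
# Nested-dictionary question tree for a 'book hotel' intent.
# Node wording and the answer keys are illustrative.
tree = {
    "question": "When do you want to stay?",
    "next": {
        "dates": {
            "question": "How many nights?",
            "next": {
                "overnight": {"question": "Single or double room?", "next": {}},
                "2 nights": {"question": "Single or double room?", "next": {}},
            },
        },
    },
}

def ask(node, answers):
    """Walk the tree with the user's answers, collecting the questions asked."""
    asked = []
    while True:
        asked.append(node["question"])
        if not answers or answers[0] not in node["next"]:
            break
        node = node["next"][answers.pop(0)]
    return asked

print(ask(tree, ["dates", "overnight"]))
# → ['When do you want to stay?', 'How many nights?', 'Single or double room?']
```

Each answer determines the next question given, which is exactly the RPG-style 'choices determine the next choices' structure from earlier in the deck.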
182. References
• Broder, A., 2002, September. A taxonomy of web search. In ACM SIGIR Forum (Vol. 36, No. 2, pp. 3-10). ACM.
• Chuklin, A., Severyn, A., Trippas, J., Alfonseca, E., Silen, H. and Spina, D., 2018. Prosody Modifications for Question-Answering in Voice-Only Settings. arXiv preprint arXiv:1806.03957.
• HigherVisibility. 2018. How Popular is Voice Search? [ONLINE] Available at: https://www.highervisibility.com/blog/how-popular-is-voice-search/
• Filippova, K., Alfonseca, E., Colmenares, C.A., Kaiser, L. and Vinyals, O., 2015. Sentence compression by deletion with LSTMs. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 360-368).
• Filippova, K. and Alfonseca, E., 2015. Fast k-best sentence compression. arXiv preprint arXiv:1510.08418.
• Google Developers. 2018. Content-based Actions | Actions on Google. [ONLINE] Available at: https://developers.google.com/actions/content-actions/. [Accessed 18 June 2018].
183. References
• Mitkov, R., 2014. Anaphora resolution. Routledge.
• Sayed, I.Q. (NLP Department, Stanford University). 2018. Issues in Anaphora Resolution. [ONLINE] Available at: https://nlp.stanford.edu/courses/cs224n/2003/fp/iqsayed/project_report.pdf. [Accessed 28 June 2018].
• Radlinski, F. and Craswell, N., 2017, March. A theoretical framework for conversational search. In Proceedings of the 2017 Conference on Human Information Interaction and Retrieval (pp. 117-126). ACM.
• Schalkwyk, J., Beeferman, D., Beaufays, F., Byrne, B., Chelba, C., Cohen, M., Kamvar, M. and Strope, B., 2010. “Your word is my command”: Google search by voice: a case study. In Advances in speech recognition (pp. 61-90). Springer, Boston, MA.
• SISTRIX. 2018. Stepping out of the SEO Bubble. [ONLINE] Available at: https://www.sistrix.com/blog/stepping-out-of-the-seo-bubble/. [Accessed 16 June 2018].
• Presentation at ESSIR 2017 on work by Radinsky, K., Svore, K.M., Dumais, S.T., Shokouhi, M., Teevan, J., Bocharov, A. and Horvitz, E., 2013. Behavioral dynamics on the web: Learning, modeling, and prediction. ACM Transactions on Information Systems (TOIS), 31(3), p.16.
184. References
• The Stanford Question Answering Dataset. 2018. [ONLINE] Available at: https://rajpurkar.github.io/SQuAD-explorer/.
• Trippas, J.R., Spina, D., Cavedon, L., Joho, H. and Sanderson, M., 2018. Informing the Design of Spoken Conversational Search.
• https://medium.com/@ashishgupta031/sequence-aware-reinforcement-learning-over-knowledge-graphs-a8af155e716c
• Jansen, B.J., Booth, D.L. and Spink, A., 2008. Determining the informational, navigational, and transactional intent of Web queries. Information Processing & Management, 44(3), pp.1251-1266.
185. References
• Radinsky, K., Svore, K.M., Dumais, S.T., Shokouhi, M., Teevan, J., Bocharov, A. and Horvitz, E., 2013. Behavioral dynamics on the web: Learning, modeling, and prediction. ACM Transactions on Information Systems (TOIS), 31(3), p.16.
• Sadikov, E., Madhavan, J. and Halevy, A., Google LLC, 2013. Clustering query refinements by inferred user intent. U.S. Patent 8,423,538.
• Official Google Webmaster Central Blog. 2019. Rolling out mobile-first indexing. [ONLINE] Available at: https://webmasters.googleblog.com/2018/03/rolling-out-mobile-first-indexing.html. [Accessed 25 September 2019].
• Zhou, S., Cheng, K. and Men, L., 2017, April. The survey of large-scale query classification. In AIP Conference Proceedings (Vol. 1834, No. 1, p. 040045). AIP Publishing.
186. References
• Search Engine Land. 2019. Starting July 1, all new sites will be indexed using Google's mobile-first indexing. [ONLINE] Available at: https://searchengineland.com/july-1-new-sites-will-be-indexed-using-googles-mobile-first-indexing-317490. [Accessed 25 September 2019].
• Teevan, J., Dumais, S.T. and Horvitz, E., 2005, August. Personalizing search via automated analysis of interests and activities. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 449-456). ACM.
• Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R. and Deng, L., 2016. MS MARCO: A Human-Generated MAchine Reading COmprehension Dataset.
198. MS MARCO Paper
• Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R. and Deng, L., 2016. MS MARCO: A Human-Generated MAchine Reading COmprehension Dataset.
202. There are also several types of queries (Krisztian Balog, ECIR, 2019):
• Keyword queries (normal keyword queries)
• Keyword++ queries (faceted / filtered queries)
• Zero-query queries (the user is the query)
• Natural language queries
• Structured queries (e.g. SQL)
203. 80% of all queries are informational in nature (Jansen et al, 2008)
[Pie chart – query intent split: informational 80%, transactional 10%, navigational 10%]
213. What did you really mean when you searched for 'Easter'?
• Radinsky, K., Svore, K.M., Dumais, S.T., Shokouhi, M., Teevan, J., Bocharov, A. and Horvitz, E., 2013. Behavioral dynamics on the web: Learning, modeling, and prediction. ACM Transactions on Information Systems (TOIS), 31(3), p.16.
[Table – when you searched for 'Easter' vs. what you mostly meant: a few weeks before Easter → 'When is Easter?'; a few days before Easter → 'Things to do at Easter'; during Easter → 'What is the meaning of Easter?']
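The temporal-intent idea can be caricatured in a few lines: the same query string maps to a different dominant intent depending on when it is asked. The anchor date and the one-week cut-off below are illustrative assumptions, not figures from the paper:

```python
from datetime import date

# Caricature of temporal intent shifting for the query 'easter'.
# Anchor date (Easter Sunday 2019) and cut-offs are illustrative.
EASTER = date(2019, 4, 21)

def dominant_intent(query_date: date) -> str:
    days_until = (EASTER - query_date).days
    if days_until > 7:
        return "When is Easter?"
    if days_until > 0:
        return "Things to do at Easter"
    return "What is the meaning of Easter?"

print(dominant_intent(date(2019, 3, 15)))  # → "When is Easter?"
```

The Radinsky et al. work learns these behavioural dynamics from data rather than hard-coding them, but the output is the same kind of time-conditioned intent mapping.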
214. The problem is that consistently high precision is nowhere in sight
215. And if we are to move into multi-device, ubiquitous search then…