5. Introduction Query Recommender Systems
Introduction
Query recommendation/suggestion consists of:
making expert users help non-expert ones.
It is a long-standing problem.
Fabrizio Silvestri (ISTI - CNR, Pisa, Italy) Query Recommendations in the Long Tail. December 5, 2012 4 / 56
I know... search engines already have very good query suggestion
methods!
8. Introduction Query Recommender Systems
Disclaimer: I am not going to talk about...
... autocomplete-like systems
9. Introduction Query Recommender Systems
Disclaimer: I am not going to talk about...
... query spelling correction
10. Introduction Query Recommender Systems
Instead, I will cover...
... Related Search-like systems
11. Introduction Query Distribution
Outline
1 Introduction
    Query Recommender Systems
    Query Distribution
2 (Recent) Related Work
3 Query Suggestion in the Long Tail
    Search Shortcuts: an IR approach
    A Graph-based approach
4 Conclusion and Future Work
    Conclusion
    Applications of Search Shortcuts
    Future Directions
14. Introduction Query Distribution
The (in)famous Power Law
The Main Ingredient: Query Logs
One of the best kept industrial secrets... Do you remember AOL?
[Figure: query frequency distribution; the y-axis runs from 0 to 0.16 and the x-axis from 0 to 1000, with the curve divided into head queries, torso queries, and tail queries.]
15. Introduction Query Distribution
The (in)famous Power Law
More/Real Power Laws
17. Introduction Query Distribution
Ordinary People with Extraordinary Tastes
S. Goel, A. Broder, E. Gabrilovich, and B. Pang. Anatomy of the long tail: ordinary people with extraordinary tastes. In Proceedings of the third ACM international conference on Web search and data mining (WSDM 2010). ACM, New York, NY, USA, 201-210.
Main Characteristics
A relatively small number of items account for a disproportionately large fraction of total consumption; and
The tail, in aggregate, is nevertheless relatively heavy.
The bottom line
Satisfying requests for the head of the distribution is good in the majority
of the cases but only corresponds to a partial satisfaction for the vast
majority of users.
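To make the head/tail trade-off concrete, here is a minimal sketch that assumes query frequencies follow an idealized Zipf law with exponent 1 over 100,000 distinct queries. The exponent, the number of queries, and the cut-offs between head and tail are illustrative assumptions, not values from the talk.

```python
def zipf_shares(n_queries, exponent=1.0):
    """Return the fraction of traffic for each rank under an idealized Zipf law."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_queries + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def traffic_share(shares, first_rank, last_rank):
    """Sum the traffic of ranks first_rank..last_rank (1-based, inclusive)."""
    return sum(shares[first_rank - 1:last_rank])


# Hypothetical setup: 100,000 distinct queries, exponent 1.
shares = zipf_shares(100_000)
head = traffic_share(shares, 1, 100)            # the 100 most frequent queries
tail = traffic_share(shares, 10_001, 100_000)   # the 90,000 rarest queries

print(f"head share: {head:.2f}, tail share: {tail:.2f}")
```

Under these assumptions a handful of head queries dominates, yet the tail still carries a share of the traffic that is too large to ignore.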
19. (Recent) Related Work
(Some) Recent Works on Query Suggestion
S. Bhatia, D. Majumdar, and P. Mitra. Query suggestions in the absence of query logs. In SIGIR 2011. 795-804.
Y. Song, D. Zhou, and L. He. Query suggestion by constructing term-transition graphs. In WSDM 2012. 353-362.
U. Ozertem, O. Chapelle, P. Donmez, and E. Velipasaoglu. Learning to suggest: a machine learning framework for ranking query suggestions. In SIGIR 2012. 25-34.
R. L. T. Santos, C. Macdonald, I. Ounis. Learning to rank query suggestions for adhoc and diversity search. In Information Retrieval. September 2012.
R. Baeza-Yates and A. Tiberi. Extracting semantic relations from query logs. In KDD 2007. 76-85.
20. (Recent) Related Work
Query Flow Graph
P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna. The query-flow graph: model and applications. In Proceedings of the 17th ACM conference on Information and knowledge management (CIKM 2008). 609-618.
Query Log as a Graph
Edge weights are learned from
query log based features
Recommendations are computed
using Random Walk with
Restart from the submitted
query
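As a rough illustration of the last step, the sketch below runs a Random Walk with Restart by power iteration on a tiny hand-made graph. The graph, its weights, the restart probability, and the function names are all hypothetical; in the query-flow graph the edge weights would be learned from query-log features.

```python
def random_walk_with_restart(graph, start, restart_prob=0.15, iterations=100):
    """Approximate RWR scores by power iteration.

    graph maps each node to a dict {neighbour: edge weight}.
    """
    nodes = set(graph)
    for neighbours in graph.values():
        nodes.update(neighbours)
    nodes.add(start)

    scores = {node: 0.0 for node in nodes}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node, mass in scores.items():
            out_edges = graph.get(node, {})
            total_weight = sum(out_edges.values())
            if total_weight == 0:
                # Dangling node: send its walking mass back to the start.
                nxt[start] += (1.0 - restart_prob) * mass
                continue
            for neighbour, weight in out_edges.items():
                nxt[neighbour] += (1.0 - restart_prob) * mass * weight / total_weight
        # The total mass is 1, so the restart step adds restart_prob to the start.
        nxt[start] += restart_prob
        scores = nxt
    return scores


def suggest(graph, query, k=5):
    """Rank all nodes other than the submitted query by their RWR score."""
    scores = random_walk_with_restart(graph, query)
    ranked = sorted((n for n in scores if n != query), key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical toy graph; weights stand in for learned transition scores.
toy_graph = {
    "apple": {"apple store": 3.0, "iphone 5": 1.0},
    "apple store": {"iphone 5": 1.0},
    "iphone 5": {"apple store": 1.0},
}
suggestions = suggest(toy_graph, "apple")
print(suggestions)
```

A query that never appears in the log has no node to start from, which is why this family of methods struggles in the long tail.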
24. (Recent) Related Work
Head Queries are Easy!
The number of head queries is limited ⇒ Precompute recommendations!
Query       Suggestions
google      “google images”, “google email”, “google books”, . . .
facebook    “facebook home page”, “facebook search”, . . .
apple       “apple store”, “best buy”, “att”, “iphone 5”, . . .
...         ...
Roughly, a sort of Static Cache for recommendations on frequent queries.
What about other queries?
If Google does it... we can do it as well!
27. Query Suggestion in the Long Tail Search Shortcuts: an IR approach
The Original Idea
Problem Definition: Search Shortcuts
Let $\sigma = \langle q_1, \ldots, q_n \rangle$ be a satisfactory session. The similarity of a k-way shortcut $h$ on a head $\sigma_{t|}$ and a tail $\sigma_{|t}$ is defined as
$$ s\big(h(\sigma_{t|}), \sigma_{|t}\big) = \frac{\sum_{q \in h(\sigma_{t|})} \sum_{m=1}^{n-t} \big[q = (\sigma_{|t})_m\big]\, f(m)}{|h(\sigma_{t|})|} $$
where $f(\cdot)$ is a monotonically increasing function, and $\big[q = (\sigma_{|t})_m\big] = 1$ if and only if the query $q$ is equal to the $m$-th query of the tail $\sigma_{|t}$.
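The definition can be read as a short function. The following is a minimal sketch, assuming f(m) = m as the monotonically increasing function and a made-up session; the function name and the data are hypothetical.

```python
def shortcut_similarity(suggested, tail, f=lambda m: m):
    """Similarity between the suggested queries h(head) and the session tail.

    Every suggested query that equals the m-th tail query contributes f(m);
    the sum is normalised by the number of suggested queries.
    """
    if not suggested:
        return 0.0
    total = 0.0
    for query in suggested:
        for m, tail_query in enumerate(tail, start=1):
            if query == tail_query:
                total += f(m)
    return total / len(suggested)


# Hypothetical satisfactory session, split into a head of length t = 1 and its tail.
session = ["hotel rome", "cheap hotel rome", "hostel rome center", "hostel trastevere"]
t = 1
head, tail = session[:t], session[t:]

# Pretend a recommender returned these two shortcuts after seeing the head.
suggested = ["hostel trastevere", "rome flights"]
similarity = shortcut_similarity(suggested, tail)
print(similarity)
```

Because f grows with m, a shortcut that jumps straight to the final query of the session is rewarded more than one that only anticipates the next query.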
28. Query Suggestion in the Long Tail Search Shortcuts: an IR approach
The Original Idea: Almost Without Equations
Problem Definition: Search Shortcuts
[Figure: illustration of search shortcuts; only the labels "Shortcuts" and "Gen" are recoverable.]
29. Query Suggestion in the Long Tail Search Shortcuts: an IR approach
Take 1: Shortcuts via Collaborative Filtering
R. Baraglia, F. Cacheda, V. Carneiro, D. Fernandez, V. Formoso, R. Perego, and F. Silvestri. Search shortcuts: a new approach to the recommendation of queries. In Proc. of the third ACM conference on Recommender systems (RecSys 2009). 77-84.
Main features:
Experiments on the AOL and MSN query logs.
We map shortcuts into a collaborative filtering problem:
Sessions are users
Queries are items
We only focus on the last query of a session:
last query in a session is clicked → rating = 10
not clicked → rating = 0
all the other queries in the session have neutral rating = 5.
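The mapping from sessions to ratings can be sketched as follows. The rating values come from the list above; the function name, the input format, and the example session are assumptions made for illustration.

```python
CLICKED_RATING = 10      # last query of the session received a click
NOT_CLICKED_RATING = 0   # last query of the session received no click
NEUTRAL_RATING = 5       # every other query in the session


def session_to_ratings(queries, last_query_clicked):
    """Turn one session (the "user") into a dict of query ("item") ratings."""
    if not queries:
        return {}
    ratings = {query: NEUTRAL_RATING for query in queries[:-1]}
    # If the last query also occurs earlier in the session, its final rating wins.
    ratings[queries[-1]] = CLICKED_RATING if last_query_clicked else NOT_CLICKED_RATING
    return ratings


# Hypothetical session ending with a clicked query.
ratings = session_to_ratings(
    ["cheap flights", "cheap flights rome", "ryanair rome"],
    last_query_clicked=True,
)
print(ratings)
```

The resulting session-by-query rating matrix can then be handed to any standard collaborative filtering algorithm.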
30. Query Suggestion in the Long Tail Search Shortcuts: an IR approach
Take 2: Indexing Sessions
D. Broccolo, L. Marcon, F. M. Nardini, R. Perego, F. Silvestri. Generating suggestions for queries in the long tail with an inverted index. Inf. Process. Manage. 48(2). 326-339 (2012)
Sessions as Virtual Documents
<DOC>
<DOCNO> google images </DOCNO>
<BODY>
google
image search
multimedia search engine
image search engine
picture search
photo search
</BODY>
</DOC>
<DOC>
<DOCNO> fabrizio silvestri </DOCNO>
<BODY>
silvestri isti
silvestri cnr
ir cnr pisa
fabrizio caching pisa
fabrizio query log
</BODY>
</DOC>
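One plausible way to build such virtual documents is sketched below, assuming that each document is titled with the final query of one or more sessions and that its body collects the queries preceding that final query. The function names and the sample sessions are hypothetical.

```python
from collections import defaultdict


def build_virtual_documents(sessions):
    """Group sessions by their last query; the body holds the preceding queries."""
    bodies = defaultdict(list)
    for session in sessions:
        if not session:
            continue
        *preceding, last_query = session
        bodies[last_query].extend(preceding)
    return dict(bodies)


def to_trec_markup(title, body):
    """Serialise one virtual document in the DOC/DOCNO/BODY layout shown above."""
    lines = ["<DOC>", f"<DOCNO> {title} </DOCNO>", "<BODY>", *body, "</BODY>", "</DOC>"]
    return "\n".join(lines)


# Hypothetical sessions, each ending with the query that satisfied the user.
sessions = [
    ["google", "image search", "google images"],
    ["picture search", "photo search", "google images"],
    ["silvestri isti", "fabrizio silvestri"],
]
docs = build_virtual_documents(sessions)
print(to_trec_markup("google images", docs["google images"]))
```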
31. Query Suggestion in the Long Tail Search Shortcuts: an IR approach
Take 2: Indexing Sessions
Index virtual documents using your favourite IR system
Rank virtual documents, i.e., queries, according to
$$ \delta(\tau, \sigma^s) = \alpha \cdot \mathrm{BM25}(\tau, \sigma^s) + \beta \cdot \mathrm{freq}(\sigma^s_n) $$
where:
$\sigma'$ is the current session performed by the user,
$\sigma^s_n$ is the last query in session $\sigma^s$,
the sequence $\tau$ is the concatenation of all terms, with possible repetitions, appearing in $\sigma'_{t|}$, i.e., the head of length $t$ of session $\sigma'$.
Intuitively, $\delta(\tau, \sigma^s)$ measures how much a previously seen session overlaps with the user need expressed so far (the concatenation of terms $\tau$ serves as a bag-of-words model of user need).
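The scoring function can be sketched in a few lines. This is only an illustration under several assumptions: a textbook Okapi BM25 with a non-negative IDF variant and default parameters k1 = 1.2 and b = 0.75, a raw frequency count for freq, and arbitrary values for α and β. None of these choices, nor the function names and toy data, are taken from the talk.

```python
import math
from collections import Counter


def bm25(query_terms, doc_terms, doc_freq, n_docs, avg_len, k1=1.2, b=0.75):
    """Okapi BM25; repeated query terms contribute once per occurrence."""
    term_freq = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = term_freq[term]
        if tf == 0:
            continue
        idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        length_norm = 1.0 - b + b * len(doc_terms) / avg_len
        score += idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
    return score


def rank_virtual_documents(tau, docs, last_query_freq, alpha=1.0, beta=0.01):
    """Score every virtual document with alpha * BM25 + beta * freq and sort."""
    n_docs = len(docs)
    avg_len = sum(len(terms) for terms in docs.values()) / n_docs
    doc_freq = Counter(term for terms in docs.values() for term in set(terms))
    scored = []
    for title, terms in docs.items():
        delta = alpha * bm25(tau, terms, doc_freq, n_docs, avg_len)
        delta += beta * last_query_freq.get(title, 0)
        scored.append((title, delta))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy index: titles are final queries, bodies are the terms of the virtual documents.
docs = {
    "google images": "google image search multimedia search engine picture search".split(),
    "fabrizio silvestri": "silvestri isti silvestri cnr ir cnr pisa".split(),
}
last_query_freq = {"google images": 120, "fabrizio silvestri": 3}  # hypothetical counts

tau = ["image", "search"]  # terms typed so far in the current session
ranking = rank_virtual_documents(tau, docs, last_query_freq)
print(ranking[0][0])
```

The titles of the top-ranked virtual documents are the suggestions, so the approach works for any query whose terms occur somewhere in the index, even if the query itself was never seen.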
32. Query Suggestion in the Long Tail Search Shortcuts: an IR approach
Experimental Setting
Dataset
Microsoft RFP 2006 query log as the dataset.
Sessions are consecutive queries by the same user submitted within 30 minutes.
The Terrier search engine was used to index the resulting 1,191,143 virtual documents.
We compare Search Shortcuts (SS) with Cover Graph (CG) and Query Flow Graph (QFG).
We exploited the query topics provided by NIST for running the TREC 2009 Web Track’s Diversity Task
TREC query (n. 8): appraisal
S1: What companies can give an appraisal of my home’s value?
S2: I’m looking for companies that appraise jewelry
S3: Find examples of employee performance appraisals
S4: I’m looking for web sites that do antique appraisals
36. Query Suggestion in the Long Tail Search Shortcuts: an IR approach
Metrics Used
Coverage
Given a query topic A with subtopics {a1, a2, . . . , an}, and a query suggestion technique T, we say that T has coverage equal to c if n · c subtopics match suggestions generated by T.
A coverage of 0.8 for the top-10 suggestions generated for a query q having 5 subtopics means that 4 subtopics of q are covered by at least one suggestion.
Effectiveness
Given a query topic A with subtopics {a1, a2, . . . , an}, and a query suggestion technique T generating k suggestions, we say that T has effectiveness equal to e if k · e suggestions cover at least one subtopic.
An effectiveness of 0.1 on the top-10 suggestions generated for a query q means that only one suggestion is relevant for one of the subtopics of q.
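Both metrics are easy to compute once each suggestion has been matched against the subtopics. The sketch below assumes that this matching is already available as a mapping from suggestion to the set of subtopics it covers; the function names and the toy data are hypothetical.

```python
def coverage(subtopics, matches):
    """Fraction of subtopics covered by at least one suggestion."""
    covered = set()
    for matched_subtopics in matches.values():
        covered |= matched_subtopics
    return len(covered & set(subtopics)) / len(subtopics)


def effectiveness(matches):
    """Fraction of suggestions that cover at least one subtopic."""
    relevant = sum(1 for matched_subtopics in matches.values() if matched_subtopics)
    return relevant / len(matches)


# Toy example: 5 subtopics, 10 suggestions, only the first suggestion is relevant
# and it happens to match 4 of the 5 subtopics.
subtopics = ["a1", "a2", "a3", "a4", "a5"]
matches = {f"suggestion {i}": set() for i in range(1, 11)}
matches["suggestion 1"] = {"a1", "a2", "a3", "a4"}

print(coverage(subtopics, matches), effectiveness(matches))
```

The toy data shows that the two metrics are independent: a single broad suggestion yields high coverage but low effectiveness.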
37. Query Suggestion in the Long Tail Search Shortcuts: an IR approach
Anecdotal Results
Query and its subtopics SS QFG CG
TREC query (n. 8): appraisal performance appraisal (S3) online appraisals (S4) appraisersdotcom (S4)
S1: What companies can give an hernando county property employee appraisals (S3)
appraisal of my home’s value? appraiser (S1) real estate appraisals (S1)
S2: I’m looking for companies antique appraisal (S4) appraisers (S1)
that appraise jewelry. appraisers in employee appraisals
S3: Find examples of employee colorado (S1) forms (S3)
performance appraisals. appraisals etc (S1) appraisers.com (S4)
S4: I’m looking for web sites appraisers.com (S4) gmac
that do antique appraisals. find appraiser (S1) appraisers
beverly wv (S1)
wachovia bank picket fence
appraisals (S1) appraisal (S1)
appraisersdotcom (S4) fossillo creek san antonio
41. Query Suggestion in the Long Tail A Graph-based approach
Take 3: TQ-Graph
F. Bonchi, R. Perego, F. Silvestri, H. Vahabi, and R. Venturini. Efficient query recommendations in the long tail via center-piece subgraphs. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2012). 345-354.
42. Query Suggestion in the Long Tail A Graph-based approach
RWR vs. CePS
Given a query Q = {t1, t2, . . . , tk} there are two different alternatives:
Random Walk with Restart (RWR) from nodes corresponding to terms in Q;
Center-Piece Subgraph (CePS) (Tong and Faloutsos, KDD 2006) induced by nodes of terms in Q.
Query: lower heart rate (not occurring in the Query Log!)
TQ-Graph Suggestions                      RWR Suggestions
things to lower heart rate                broken heart
lower heart rate through exercise         prime rate
accelerated heart rate and pregnant       exchange rate
web md                                    bank rate
heart problems                            currency exchange rate
44. Query Suggestion in the Long Tail A Graph-based approach
CePS: a Primer
Given an edge-weighted undirected graph G, a set of vertices Q from G, and an integer budget b,
find a connected subgraph H containing the vertices in Q and at most b other vertices that maximizes a “goodness” function g(H):
$$ g(H) = \sum_{j \in H} r(Q, j), \qquad r(Q, j) = \prod_{i \in Q} r(i, j) $$
r(i, j) is the stationary probability of node j in a RWR from i. Restart probability is α.
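The product in r(Q, j) is what separates CePS from a plain RWR: a node scores well only if it is close to every query term. A minimal sketch follows, assuming the per-term RWR scores have already been computed; all numbers and names below are made up for illustration.

```python
import math


def combined_score(per_term_scores, node):
    """r(Q, j): product over the query terms i of r(i, j)."""
    return math.prod(scores.get(node, 0.0) for scores in per_term_scores)


def goodness(per_term_scores, subgraph_nodes):
    """g(H): sum of r(Q, j) over the nodes j of the candidate subgraph H."""
    return sum(combined_score(per_term_scores, node) for node in subgraph_nodes)


# Hypothetical stationary probabilities from two RWRs, one per query term.
from_heart = {"lower heart rate through exercise": 0.2, "broken heart": 0.5}
from_rate = {"lower heart rate through exercise": 0.3, "exchange rate": 0.6}
per_term_scores = [from_heart, from_rate]

candidates = ["lower heart rate through exercise", "broken heart", "exchange rate"]
ranked = sorted(candidates, key=lambda n: combined_score(per_term_scores, n), reverse=True)
print(ranked[0])
```

Nodes reachable from only one of the terms get a zero product, so they drop out even if their score for that single term is high.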
45. Query Suggestion in the Long Tail A Graph-based approach
Assessing Effectiveness
Datasets
Two TQ-Graphs built from MSN and Yahoo! query logs
Two different query sets for evaluations:
50 queries of the standard TREC Web diversification track testbed.
100 queries randomly chosen from the Yahoo! query log.
Statistics
                       MSN     Yahoo!
#queries               15M     581M
#terms                 36M     1,344M
#query nodes           7M      29M
#term nodes            2M      6M
#dangling nodes        15%     35%
#queries (freq = 1)    5M      162M
#terms (freq = 1)      5M      2M
46. Query Suggestion in the Long Tail A Graph-based approach
Assessing Effectiveness
User Study
Top-5 recommendations for each test query and technique.
The suggestions were shuffled and presented to 10 non-CS assessors, who were asked to rate them as useful, somewhat useful, or not useful.
Frequency in the corresponding log of all the queries in the two testbeds.
[Figure: two log-scale plots of query frequency. One panel shows the frequency on MSN of the 50 TREC queries; the other shows the frequency on Yahoo! of the 100 random queries.]
47. Query Suggestion in the Long Tail A Graph-based approach
Setting the parameter α
TREC on MSN              useful   somewhat   not useful
α = 0.9                  57%      16%        27%
α = 0.5                  32%      13%        55%
α = 0.1                  22%      12%        66%
100 queries on Yahoo!    useful   somewhat   not useful
α = 0.9                  48%      11%        41%
α = 0.5                  41%      20%        39%
α = 0.1                  37%      20%        43%
48. Query Suggestion in the Long Tail A Graph-based approach
Effectiveness on Tail Queries
TREC on MSN (unseen)      useful   somewhat   not useful
TQ-Graph α = 0.9          46%      10%        44%
QFG                       0%       0%         100%
TREC on MSN (dangling)    useful   somewhat   not useful
TQ-Graph α = 0.9          60%      30%        10%
QFG                       0%       0%         100%
TREC on MSN (others)      useful   somewhat   not useful
TQ-Graph α = 0.9          59%      17%        24%
QFG                       61%      13%        26%
49. Query Suggestion in the Long Tail A Graph-based approach
Effectiveness on Tail Queries
For popular queries effectiveness is comparable with that of QFG-based models.
50. Query Suggestion in the Long Tail A Graph-based approach
CePS Efficiency
CePS is not NP-hard, but solving it requires Ω(|Q| × (|E| + |V|)) operations, where:
|Q|: number of query terms
|E|: number of graph edges
|V|: number of queries
Alternatively, |Q| random walks with restart.
Yet, it remains infeasible for computing query suggestions online.
51. Query Suggestion in the Long Tail A Graph-based approach
Stationary Probabilities as Inverted Lists
TQGraph
Inverted index representation of the RWRs computed on the TQGraph. The lexicon is made up of term nodes; postings are the stationary distribution values.
Stationary distribution of query nodes in the TQGraph as obtained by a RWR from Term 1:
Term 1: [1, ε), [ε, ε^2), [ε^2, ε^3), ..., [ε^i, ε^(i+1))
Within buckets, queries are sorted by their IDs. Scores are ...
52. Query Suggestion in the Long Tail A Graph-based approach
Stationary Probabilities as Inverted Lists
For each term we have to store a vector of
⟨queryId_1, pr_1⟩, ⟨queryId_2, pr_2⟩, . . . , ⟨queryId_|V|, pr_|V|⟩
29 million queries × 6 million terms = 174 trillion entries!!!
Conjecture: Most of the entries are useless.
Solution: remove the entries with lowest probability, i.e., apply pruning.
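The size estimate and the pruning step can both be checked in a few lines. In this sketch the node counts are the Yahoo! figures quoted above, while the posting list, the value of p, and the function name are hypothetical.

```python
import heapq

# Unpruned index size for the Yahoo! TQ-Graph: one entry per (term, query) pair.
n_query_nodes = 29_000_000
n_term_nodes = 6_000_000
n_entries = n_query_nodes * n_term_nodes
print(f"{n_entries:,} entries")


def prune_posting_list(postings, p):
    """Keep the p (queryId, probability) entries with the highest probability."""
    return heapq.nlargest(p, postings, key=lambda entry: entry[1])


# Hypothetical posting list for a single term.
postings = [(17, 0.02), (4, 0.40), (99, 0.001), (23, 0.15)]
pruned = prune_posting_list(postings, p=2)
print(pruned)
```

After pruning, the index holds at most p entries per term, so its size grows with the number of terms rather than with the product of terms and queries.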
53. Query Suggestion in the Long Tail A Graph-based approach
Index Pruning: Effectiveness
[Figure: MSN query log. Percentage of dissimilarity (y-axis, 0 to 70) versus pruning threshold p (x-axis, 0 to 20000) for RWR with α = 0.1, α = 0.5, and α = 0.9.]
56. Query Suggestion in the Long Tail A Graph-based approach
Compressing the Index (I)
After pruning we have #terms vectors with p entries of ⟨queryId, pr⟩ (inverted lists).
Is there any efficient way to store the p entries?
Sort by queryId, δ encoded (pr stored as it is): ≥ 32 bits per entry.
Sort by pr, δ encoded (queryId stored as it is): ≥ log(#queries) bits per entry.
Could this be improved?
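For readers unfamiliar with the idea, the sketch below shows the first option on a toy list, assuming that "δ encoded" means storing the gaps between consecutive sorted queryIds (a bit-level code such as Elias-δ could then be applied to the gaps, which is not shown). The function names and the data are hypothetical.

```python
def delta_encode(sorted_ids):
    """Replace each sorted ID by its gap from the previous one."""
    gaps = []
    previous = 0
    for query_id in sorted_ids:
        gaps.append(query_id - previous)
        previous = query_id
    return gaps


def delta_decode(gaps):
    """Rebuild the original IDs with a running sum of the gaps."""
    ids = []
    current = 0
    for gap in gaps:
        current += gap
        ids.append(current)
    return ids


# Hypothetical posting list sorted by queryId; probabilities are stored as they are.
postings = [(1003, 0.12), (1010, 0.40), (1011, 0.05), (1500, 0.33)]
gaps = delta_encode([query_id for query_id, _ in postings])
print(gaps)
```

The gaps are much smaller than the IDs and therefore cheaper to encode, but the probabilities still cost a full floating-point value per entry, which motivates the question above.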
57. Query Suggestion in the Long Tail A Graph-based approach
Compressing the Index (II)
Lossy Compression
For each term, we sort its p entries by their probability values.
We create groups of queryIds with similar probability values: $\epsilon^{i+1} < pr \le \epsilon^{i}$ where $\epsilon \le 1$ (bucketing).
QueryIds in a bucket are δ encoded.
Given a queryId in bucket i, its probability is approximated by $\epsilon^{i+1}$.
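A small sketch of the bucketing scheme follows. It assumes that bucket i holds the probabilities pr with ε^(i+1) < pr ≤ ε^i for some ε < 1, that every probability in the bucket is replaced by the lower bound ε^(i+1), and that δ encoding means gap encoding of the sorted queryIds. The function names, ε = 0.5, and the posting list are chosen only for illustration.

```python
import math
from collections import defaultdict


def bucket_index(pr, eps):
    """Index i of the bucket with eps**(i+1) < pr <= eps**i (requires 0 < pr <= 1)."""
    return math.floor(math.log(pr) / math.log(eps))


def compress(postings, eps):
    """Group queryIds by bucket and gap-encode the sorted IDs of each bucket."""
    buckets = defaultdict(list)
    for query_id, pr in postings:
        buckets[bucket_index(pr, eps)].append(query_id)
    encoded = {}
    for i, ids in buckets.items():
        ids.sort()
        encoded[i] = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
    return encoded


def decompress(encoded, eps):
    """Recover (queryId, approximate probability) pairs; exact values are lost."""
    entries = []
    for i, gaps in encoded.items():
        query_id = 0
        for gap in gaps:
            query_id += gap
            entries.append((query_id, eps ** (i + 1)))
    return entries


# Hypothetical posting list for one term.
postings = [(7, 0.3), (3, 0.4), (12, 0.9), (20, 0.1)]
encoded = compress(postings, eps=0.5)
approximate = dict(decompress(encoded, eps=0.5))
print(encoded, approximate)
```

Only the bucket boundaries and the gaps need to be stored, so no probability value is kept explicitly; the price is that each probability is known only up to a factor of ε.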
58. Query Suggestion in the Long Tail A Graph-based approach
Bits per entry vs. ϵ
[Figure: MSN query log. Bits per entry (y-axis, 10 to 20) versus ϵ (x-axis, 0 to 1) for RWR with α = 0.1, α = 0.5, and α = 0.9.]
Smaller ϵ values → fewer buckets and fewer bits per entry.
ϵ = 1 → naïve approach.
59. Query Suggestion in the Long Tail A Graph-based approach
Error guarantees
Our bucketing scheme introduces approximation errors.
→ Error is bounded
The approximated value for a probability in the list is at most a factor $\epsilon^{-1}$ smaller than the real value.
The error might introduce inversions in the real ranking.
For two queries q, q′ of m terms with q preceding q′ in a suggestion ranking, the inversion cannot happen if $r_q > \epsilon^{-m} r_{q'}$.
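The inversion bound can be derived in a few lines. The derivation assumes that the score of a suggestion is the product of m per-term probabilities, as in the CePS goodness function, and that each stored value is the lower end of its bucket. The tilde notation for approximated values is introduced here for illustration only.

```latex
% Each stored probability underestimates the real one by at most a factor \epsilon:
\epsilon\, r_{i,q} \;\le\; \tilde r_{i,q} \;\le\; r_{i,q}
\qquad (i = 1, \ldots, m)

% Multiplying the m per-term factors gives a bound on the approximated score:
\tilde r_q = \prod_{i=1}^{m} \tilde r_{i,q}
\quad\Longrightarrow\quad
\epsilon^{m}\, r_q \;\le\; \tilde r_q \;\le\; r_q

% Hence, if r_q > \epsilon^{-m} r_{q'}, the approximated order agrees with the real one:
\tilde r_q \;\ge\; \epsilon^{m}\, r_q \;>\; r_{q'} \;\ge\; \tilde r_{q'}
```

In other words, two suggestions can swap places only when their real scores are within a factor of ε^(-m) of each other.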
60. Query Suggestion in the Long Tail A Graph-based approach
Precision Loss after Pruning and Bucketing: avg. dissimilarity

MSN query log
p         RWR α = 0.1     RWR α = 0.5     RWR α = 0.9
5,000     18.48 (8.47)    22.58 (14.11)   42.04 (40.32)
10,000    17.39 (8.47)    20.97 (12.50)   39.49 (30.65)
15,000    17.39 (8.47)    20.16 (12.10)   36.31 (25.40)
20,000    17.39 (8.06)    18.55 (11.29)   33.12 (22.18)
100,000   17.39 (8.06)    18.55 (11.29)   32.48 (21.77)
200,000   17.39 (8.06)    18.55 (11.29)   32.48 (21.77)

Yahoo! query log
p         RWR α = 0.1     RWR α = 0.5     RWR α = 0.9
5,000     33.75 (40.30)   37.87 (42.22)   45.11 (47.12)
10,000    27.76 (34.12)   32.84 (36.89)   40.23 (42.22)
15,000    26.18 (31.13)   31.36 (34.33)   38.22 (39.23)
20,000    23.97 (27.72)   28.70 (31.56)   37.07 (38.38)
100,000   19.24 (17.48)   21.89 (20.26)   31.32 (28.78)
200,000   19.24 (17.70)   21.30 (19.62)   31.90 (28.14)

With ϵ = 0.95 and p = 20k we have about a 37% difference in results, with 14 bits per entry instead of 71.
We save 1.1PB in the case of the Yahoo! query log.
61. Query Suggestion in the Long Tail A Graph-based approach
Effectiveness after Pruning and Bucketing: User Study
MSN query log
p         useful   somewhat   not useful
5,000     56%      17%        27%
20,000    55%      15%        30%
200,000   55%      15%        30%

Yahoo! query log
p         useful   somewhat   not useful
5,000     46%      29%        25%
20,000    47%      29%        24%
200,000   46%      28%        26%

Effectiveness of the suggestions provided with pruning and bucketing, as a function of p, for ϵ = 0.95 and α = 0.9.
In the case of MSN the effectiveness for useful suggestions was 57%; in the case of Yahoo!, 48%.
63. Query Suggestion in the Long Tail A Graph-based approach
Scaling-up Suggestion Building
Does it scale?
We pre-compute and store the inverted index.
Maintaining the entire inverted index in main memory is still not feasible.
Caching!
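As an illustration of the caching idea, here is a minimal sketch of a size-bounded cache of per-term posting lists that also tracks its miss rate. The slides do not say which replacement policy is used, so LRU is an assumption, and the class name `PostingListCache` and the `load_from_disk` callback are hypothetical.

```python
from collections import OrderedDict

class PostingListCache:
    """LRU cache of per-term posting lists, bounded by total size in bytes."""

    def __init__(self, capacity_bytes, load_from_disk):
        self.capacity = capacity_bytes
        self.load = load_from_disk     # hypothetical callback: term -> bytes
        self.entries = OrderedDict()   # term -> bytes, least recently used first
        self.used = 0
        self.hits = 0
        self.misses = 0

    def get(self, term):
        if term in self.entries:
            self.hits += 1
            self.entries.move_to_end(term)  # mark as most recently used
            return self.entries[term]
        self.misses += 1
        data = self.load(term)
        if len(data) <= self.capacity:      # lists larger than the cache are not stored
            self.entries[term] = data
            self.used += len(data)
            while self.used > self.capacity:
                _, evicted = self.entries.popitem(last=False)  # evict LRU entry
                self.used -= len(evicted)
        return data

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0

# Toy "disk": three posting lists of 4 bytes each, cache of 8 bytes.
disk = {"a": b"xxxx", "b": b"yyyy", "c": b"zzzz"}
cache = PostingListCache(8, disk.__getitem__)
for term in ["a", "b", "a", "c", "b"]:
    cache.get(term)
```

Since compressed lists take fewer bytes, more of them fit in the same cache, which is consistent with the lower miss rates reported for the compressed index.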
64. Query Suggestion in the Long Tail A Graph-based approach
MSN Cache Miss (%)
[Figure: MSN query log. Percentage of cache misses (y-axis, 0 to 70) vs. cache size in GB (x-axis: 1, 2, 4, 8, 16, 32). Series: 5,000 entries (15.81 bits), 20,000 entries (16.33 bits), 200,000 entries (15.21 bits), 5,000 entries (73.71 bits), 20,000 entries (73.21 bits), 200,000 entries (70.05 bits).]
With 8GB of main memory we have a cache miss rate < 10%.
65. Query Suggestion in the Long Tail A Graph-based approach
Yahoo! Cache Miss (%)
[Figure: Yahoo! query log. Percentage of cache misses (y-axis, 0 to 100) vs. cache size in GB (x-axis: 1, 2, 4, 8, 16, 32). Series: 5,000 entries (13.67 bits), 20,000 entries (14.27 bits), 200,000 entries (16.09 bits), 5,000 entries (72.33 bits), 20,000 entries (71.32 bits), 200,000 entries (70.30 bits).]
Even in the case of the big Yahoo! log, with 8GB of main memory we have a cache miss rate < 10%.
66. Conclusion and Future Work Conclusion
Outline
1 Introduction
  Query Recommender Systems
  Query Distribution
2 (Recent) Related Work
3 Query Suggestion in the Long Tail
  Search Shortcuts: an IR approach
  A Graph-based approach
4 Conclusion and Future Work
  Conclusion
  Applications of Search Shortcuts
  Future Directions
67. Conclusion and Future Work Conclusion
Conclusions
Query recommendation is a tough problem.
We have proposed an overall idea (i.e., reducing the length of users’ querying sessions) and three diverse techniques addressing it.
The graph-based one best balances requirements and “theoretical” justification.
Query recommendation based on TQ-Graph gets up to 99% coverage.
An index for speeding up online TQ-Graph computation:
  Reduces the space occupancy by an average of 80%.
  95% hit ratio (with a few gigabytes of main memory).
69. Conclusion and Future Work Applications of Search Shortcuts
Other Applications of Shortcuts
Europeana.eu
Shortcuts will be used as the suggestion mechanism for queries submitted to the Europeana portal.

Search Results Diversity
G. Capannini, F. M. Nardini, R. Perego, F. Silvestri. Efficient Diversification of Web Search Results. PVLDB 4(7): 451-459 (2011).
70. Conclusion and Future Work Applications of Search Shortcuts
Other Applications of Shortcuts
Session Retrieval
I. A. Adeyanju, F. M. Nardini, M-D. Albakour, D. S., U. Kruschwitz. RGU-ISTI-Essex at TREC 2011 Session Track. Proceedings of the International Text REtrieval Conference (TREC) 2011. November 2011.

Intranet Query Recommendations
I. A. Adeyanju, D. Song, M-D. Albakour, U. Kruschwitz, A. De Roeck, and M. Fasli. Adaptation of the concept hierarchy model with search logs for query recommendation on intranets. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2012). 5-14.
72. Conclusion and Future Work Future Directions
Beyond Query Suggestion
Task Recommendation
Discover which user search tasks are composed together, and how, to accomplish even more complex missions.
Suggestions should also refer to other tasks of a bigger mission (i.e., task recommendation).