Query Recommendations in the Long Tail:
                      Efficient and Effective Techniques.

                                                  Fabrizio Silvestri

                                                 ISTI - CNR, Pisa, Italy


                                                December 5, 2012




Fabrizio Silvestri (ISTI - CNR, Pisa, Italy) Query Recommendations in the Long Tail.                            December 5, 2012                      1 / 56
Outline

   1   Introduction
          Query Recommender Systems
          Query Distribution
   2   (Recent) Related Work
   3   Query Suggestion in the Long Tail
          Search Shortcuts: an IR approach
          A Graph-based approach
   4   Conclusion and Future Work
          Conclusion
          Applications of Search Shortcuts
          Future Directions
Introduction    Query Recommender Systems


  Introduction




          Query recommendation/suggestion consists of:
                  making expert users help non-expert ones.
          It is a long-standing problem.
          I know... search engines already have very good query suggestion
          methods!
  Disclaimer: I am not going to talk about...

   ... autocomplete-like systems
   ... query spelling correction

  Instead, I will cover...

   ... Related Search-like systems
Introduction     Query Distribution


  The (in)famous Power Law

   The Main Ingredient: Query Logs
          One of the best kept industrial secrets... Do you remember AOL?

   [Figure: query frequency distribution, with the vertical axis running from 0 to 0.16 and the horizontal axis from 0 to 1000; the curve is divided into Head queries, Torso queries, and Tail queries.]
   More/Real Power Laws
  Ordinary People with Extraordinary Tastes

   S. Goel, A. Broder, E. Gabrilovich, and B. Pang. Anatomy of the long tail: ordinary people with extraordinary tastes. In
   Proceedings of the third ACM international conference on Web search and data mining (WSDM 2010). ACM, New York, NY,
   USA, 201-210.

   Main Characteristics
       A relatively small number of items account for a disproportionately
       large fraction of total consumption; and
       The tail, in aggregate, is nevertheless relatively heavy.

   The bottom line
   Satisfying only the head of the distribution serves the majority of
   requests, but leaves the vast majority of users only partially satisfied.
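   The two characteristics above can be illustrated numerically. The following is a minimal sketch, assuming query frequencies follow an idealized Zipf law with exponent 1; the function name, the exponent, and the sizes are assumptions chosen for illustration and are not figures from the cited paper.

```python
def zipf_shares(num_queries: int, head_size: int, exponent: float = 1.0):
    """Return (head_share, tail_share) of total volume under a Zipf-like law.

    The frequency of the query at rank r is assumed proportional to 1 / r**exponent.
    """
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_queries + 1)]
    total = sum(weights)
    head = sum(weights[:head_size])
    return head / total, (total - head) / total


# One million distinct queries; the 1,000 most frequent ones form the "head".
head_share, tail_share = zipf_shares(num_queries=1_000_000, head_size=1_000)
print(f"head: {head_share:.2%} of volume, everything else: {tail_share:.2%}")
```

   Under these assumptions, 0.1% of the distinct queries account for roughly half of the volume, while the remaining queries, each individually rare, still add up to a comparable share.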
(Recent) Related Work


  (Some) Recent Works on Query Suggestion

   S. Bhatia, D. Majumdar, and P. Mitra. Query suggestions in the absence of
   query logs. In SIGIR 2011. 795-804.

   Y. Song, D. Zhou, and L. He. Query suggestion by constructing
   term-transition graphs. In WSDM 2012. 353-362.

   U. Ozertem, O. Chapelle, P. Donmez, and E. Velipasaoglu. Learning to
   suggest: a machine learning framework for ranking query suggestions. In
   SIGIR 2012. 25-34.

   R. L. T. Santos, C. Macdonald, and I. Ounis. Learning to rank query suggestions
   for adhoc and diversity search. In Information Retrieval. September 2012.

   R. Baeza-Yates and A. Tiberi. Extracting semantic relations from query logs.
   In KDD 2007. 76-85.
  Query Flow Graph

   P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna. The query-flow graph: model and applications. In
   Proceedings of the 17th ACM conference on Information and knowledge management (CIKM 2008). 609-618.

         Query Log as a Graph
         Edge weights are learned from query log based features
         Recommendations are computed using Random Walk with Restart
         from the submitted query
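   A Random Walk with Restart over a weighted query graph can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the cited paper: the toy graph, its edge weights, the restart probability of 0.15, and the function names are all hypothetical.

```python
def random_walk_with_restart(graph, start, restart=0.15, iterations=50):
    """Approximate the stationary scores of a walk that jumps back to `start`.

    graph: dict mapping a query to {next_query: edge_weight}.
    """
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    scores = {node: 0.0 for node in nodes}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node, mass in scores.items():
            out = graph.get(node, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: send its walking mass back to the start query.
                nxt[start] += (1 - restart) * mass
                continue
            for nbr, weight in out.items():
                nxt[nbr] += (1 - restart) * mass * weight / total
        # With probability `restart` the walker returns to the submitted query.
        nxt[start] += restart * sum(scores.values())
        scores = nxt
    return scores


def recommend(graph, query, k=5):
    """Return the k highest-scoring queries other than the submitted one."""
    scores = random_walk_with_restart(graph, query)
    ranked = sorted((q for q in scores if q != query), key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical query graph: weights stand in for learned transition strengths.
graph = {
    "apple": {"apple store": 3.0, "iphone 5": 1.0},
    "apple store": {"iphone 5": 1.0},
    "iphone 5": {"apple store": 1.0},
}
scores = random_walk_with_restart(graph, "apple")
print(recommend(graph, "apple", k=2))
```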
  Head Queries are Easy!

   The number of head queries is limited ⇒ Precompute recommendations!

        Query                Suggestions
        google               “google images”, “google email”, “google books”, . . .
        facebook             “facebook home page”, “facebook search”, . . .
        apple                “apple store”, “best buy”, “att”, “iphone 5”, . . .
        ...                  ...

   If Google does it... we can do it as well!

          Roughly, a sort of Static Cache for recommendations on frequent
          queries.
          What about other queries?
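   The static-cache idea can be sketched in a few lines. This is a minimal sketch, assuming a query log given as a flat list of query strings; `expensive_suggest` is a hypothetical placeholder for whatever offline recommender is available, and the fallback for non-head queries is left to the caller.

```python
from collections import Counter


def build_static_cache(query_log, suggest, head_size):
    """Precompute suggestions for the `head_size` most frequent queries."""
    counts = Counter(query_log)
    return {query: suggest(query) for query, _ in counts.most_common(head_size)}


def recommend(query, cache, fallback):
    """Serve head queries from the cache; defer everything else to `fallback`."""
    if query in cache:
        return cache[query]
    return fallback(query)


# Hypothetical offline recommender used only to fill the cache.
def expensive_suggest(query):
    return [f"{query} images", f"{query} login"]


log = ["google", "facebook", "google", "apple", "google", "facebook", "rare query"]
cache = build_static_cache(log, expensive_suggest, head_size=2)

print(recommend("google", cache, fallback=lambda q: []))
print(recommend("rare query", cache, fallback=lambda q: []))
```

   The cache answers head queries with a dictionary lookup; the open question raised on the slide is what `fallback` should do for everything else.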
Query Suggestion in the Long Tail    Search Shortcuts: an IR approach


  The Original Idea

   Problem Definition: Search Shortcuts
   Let $\sigma = \langle q_1, \ldots, q_n \rangle$ be a satisfactory session. The similarity of a k-way
   shortcut $h$ on a head $\sigma_{t|}$ and a tail $\sigma_{|t}$ is defined as

   \[
     s\big(h(\sigma_{t|}), \sigma_{|t}\big) =
       \frac{\sum_{q \in h(\sigma_{t|})} \sum_{m=1}^{n-t} \big[\, q = (\sigma_{|t})_m \big]\, f(m)}
            {|h(\sigma_{t|})|}
   \]

   where $f(\cdot)$ is a monotonic increasing function and $[q = \sigma_m] = 1$
   if and only if the query $q$ is equal to the query $\sigma_m$.
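   The definition can be checked on a small example. The sketch below assumes $f(m) = m$, which is one possible monotonic increasing function and not necessarily the one used by the authors; the session and the shortcut are made up.

```python
def shortcut_similarity(shortcut, tail, f=lambda m: m):
    """Similarity between the queries returned by a shortcut and a session tail.

    shortcut: the k queries h(head) suggested from the head of the session.
    tail:     the remaining queries of the session, in order (positions 1..n-t).
    f:        monotonic increasing weight; later tail positions count more.
    """
    if not shortcut:
        return 0.0
    total = 0.0
    for q in shortcut:
        for m, tail_query in enumerate(tail, start=1):
            if q == tail_query:  # the indicator [q = (tail)_m]
                total += f(m)
    return total / len(shortcut)


# Hypothetical session of n = 5 queries, split after t = 2.
session = ["a", "b", "c", "d", "e"]
t = 2
head, tail = session[:t], session[t:]

# A 3-way shortcut that hits tail positions 1 ("c") and 3 ("e") and misses once.
similarity = shortcut_similarity(["c", "e", "x"], tail)
print(similarity)
```

   Here the numerator is f(1) + f(3) = 4 and the shortcut has 3 queries, so the similarity is 4/3. Suggestions matching queries near the end of the session are rewarded most, which is what makes them "shortcuts".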
  The Original Idea: Almost Without Equations

   Problem Definition: Search Shortcuts

   [Figure: diagram of the shortcut generator ("Shortcuts Gen").]
  Take 1: Shortcuts via Collaborative Filtering

   R. Baraglia, F. Cacheda, V. Carneiro, D. Fernandez, V. Formoso, R. Perego, and F. Silvestri. Search shortcuts: a new approach
   to the recommendation of queries. In Proc. of the third ACM conference on Recommender systems (RecSys 2009). 77-84.

   Main features:
       Experiments on the AOL and MSN query logs.
       We map shortcuts into a collaborative filtering problem:
                  Sessions are users
                  Queries are items
                  We only focus on the last query of a session:
                          last query in a session is clicked → rating = 10
                          not clicked → rating = 0
                          all the other queries in the session have neutral rating = 5.
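   The mapping above can be sketched as the construction of a sparse session-by-query rating matrix. This is a minimal sketch, assuming each session is given as an ordered list of queries plus a flag saying whether the last query received a click; the data layout and function names are assumptions.

```python
def session_ratings(queries, last_clicked):
    """Rate the queries of one session: 10/0 for the last one, 5 for the others."""
    if not queries:
        return {}
    ratings = {q: 5 for q in queries[:-1]}              # neutral rating
    ratings[queries[-1]] = 10 if last_clicked else 0    # outcome of the session
    return ratings


def rating_matrix(sessions):
    """Sessions play the role of users, queries the role of items."""
    return {
        session_id: session_ratings(queries, clicked)
        for session_id, (queries, clicked) in sessions.items()
    }


# Hypothetical sessions: (ordered queries, was the last query clicked?)
sessions = {
    "s1": (["image search", "picture search", "google images"], True),
    "s2": (["silvestri isti", "fabrizio silvestri"], False),
}
matrix = rating_matrix(sessions)
print(matrix["s1"])
```

   Any off-the-shelf collaborative filtering algorithm can then be run on this matrix to predict which query a new session is likely to rate highly.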
  Take 2: Indexing Sessions

   D. Broccolo, L. Marcon, F. M. Nardini, R. Perego, F. Silvestri. Generating suggestions for queries in the long tail with an
   inverted index. Inf. Process. Manage. 48(2). 326-339 (2012)

   Sessions as Virtual Documents
       <DOC>
       <DOCNO> google images </DOCNO>
       <BODY>
           google
           image search
           multimedia search engine
           image search engine
           picture search
           photo search
       </BODY>
       </DOC>

       <DOC>
       <DOCNO> fabrizio silvestri </DOCNO>
       <BODY>
           silvestri isti
           silvestri cnr
           ir cnr pisa
           fabrizio caching pisa
           fabrizio query log
       </BODY>
       </DOC>
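   Building such virtual documents from sessions can be sketched as follows. The sketch assumes that a virtual document is identified by the final query of a session and that its body collects the earlier queries of every session ending with that query; this reading is inferred from the two examples above, and the function names are hypothetical.

```python
from collections import defaultdict


def build_virtual_documents(sessions):
    """Group the non-final queries of each session under its final query."""
    bodies = defaultdict(list)
    for session in sessions:
        if len(session) < 2:
            continue  # a one-query session contributes no body text
        *earlier, last = session
        bodies[last].extend(earlier)
    return dict(bodies)


def to_trec(docno, body):
    """Serialize one virtual document in the DOC/DOCNO/BODY layout shown above."""
    lines = ["<DOC>", f"<DOCNO> {docno} </DOCNO>", "<BODY>"]
    lines += [f"    {query}" for query in body]
    lines += ["</BODY>", "</DOC>"]
    return "\n".join(lines)


# Hypothetical sessions, each an ordered list of queries.
sessions = [
    ["google", "image search", "google images"],
    ["picture search", "photo search", "google images"],
    ["silvestri isti", "silvestri cnr", "fabrizio silvestri"],
]
documents = build_virtual_documents(sessions)
print(to_trec("google images", documents["google images"]))
```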
  Take 2: Indexing Sessions

          Index virtual documents using your favourite IR system
          Rank virtual documents, i.e., queries, according to

          \[
            \delta(\tau, \sigma^s) = \alpha \cdot \mathrm{BM25}(\tau, \sigma^s) + \beta \cdot \mathrm{freq}(\sigma^s_n)
          \]

          where:
                  $\sigma'$ is the current session performed by the user,
                  $\sigma^s_n$ is the last query in session $\sigma^s$,
                  the sequence $\tau$ is the concatenation of all terms, with possible
                  repetitions, appearing in $\sigma'_{t|}$, i.e., the head of length $t$ of session $\sigma'$.

          Intuitively, $\delta(\tau, \sigma^s)$ measures how much a previously seen session
          overlaps with the user need expressed so far (the concatenation of
          terms $\tau$ serves as a bag-of-words model of user need).
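   The ranking function can be sketched without an IR system by scoring the virtual documents in memory. This is a rough sketch under stated assumptions: the BM25 variant, its parameters k1 and b, the values of alpha and beta, and the raw counts used for freq are all illustrative choices, since the slide does not fix them.

```python
import math
from collections import Counter


def bm25(query_terms, doc_terms, doc_freq, num_docs, avg_len, k1=1.2, b=0.75):
    """A common BM25 formulation; repeated query terms are counted repeatedly."""
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if tf[term] == 0:
            continue
        idf = math.log(1 + (num_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def rank_shortcuts(head_queries, docs, freq, alpha=1.0, beta=0.1, k=3):
    """Rank final queries by alpha * BM25(tau, doc) + beta * freq(final query)."""
    # tau: all terms of the queries submitted so far, repetitions included.
    tau = [term for query in head_queries for term in query.split()]
    tokenized = {
        docno: [term for query in body for term in query.split()]
        for docno, body in docs.items()
    }
    num_docs = len(tokenized)
    avg_len = sum(len(terms) for terms in tokenized.values()) / num_docs or 1.0
    doc_freq = Counter(term for terms in tokenized.values() for term in set(terms))
    scored = {
        docno: alpha * bm25(tau, terms, doc_freq, num_docs, avg_len)
        + beta * freq.get(docno, 0)
        for docno, terms in tokenized.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Virtual documents keyed by final query; freq values are hypothetical counts.
docs = {
    "google images": ["google", "image search", "picture search", "photo search"],
    "fabrizio silvestri": ["silvestri isti", "silvestri cnr", "ir cnr pisa"],
}
ranked = rank_shortcuts(["image search"], docs, freq={"google images": 2, "fabrizio silvestri": 1})
print(ranked)
```

   In a real deployment the BM25 part would come from the inverted index of the IR system, which is what makes the approach cheap enough for queries never seen before.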
  Experimental Setting

   Dataset
          We use the Microsoft RFP 2006 query log as the dataset.
          Sessions are consecutive queries by the same user submitted within 30 minutes.
          The Terrier search engine is used to index the resulting 1,191,143 virtual documents.
          We compare Search Shortcuts (SS) with Cover Graph (CG) and Query Flow Graph (QFG).

          We use the query topics provided by NIST for the
          TREC 2009 Web Track’s Diversity Task
          TREC query (n. 8): appraisal
                  S1:    What companies can give an appraisal of my home’s value?
                  S2:    I’m looking for companies that appraise jewelry
                  S3:    Find examples of employee performance appraisals
                  S4:    I’m looking for web sites that do antique appraisals
Query Suggestion in the Long Tail    Search Shortcuts: an IR approach


  Metrics Used

   .
   Coverage
   .
   Given a query topic A with subtopics {a1 , a2 , . . . , an }, and a query
   suggestion technique T , we say that T has coverage equal to c if n · c
   subtopics match suggestions generated by T .

   .




                                                                                      .    .    .      . . . . . . . . . . . .               .    .        .    .    .
                                                                                 ..   ..   ..       .. .. .. .. .. .. .. .. .. .. .. .. ..   ..       ..   ..   ..
Fabrizio Silvestri (ISTI - CNR, Pisa, Italy) Query Recommendations in the Long Tail.                          December 5, 2012                    23 / 56
Query Suggestion in the Long Tail    Search Shortcuts: an IR approach


  Metrics Used

   .
   Coverage
   .
   Given a query topic A with subtopics {a1 , a2 , . . . , an }, and a query
   suggestion technique T , we say that T has coverage equal to c if n · c
   subtopics match suggestions generated by T .
   A coverage of 0.8 for the top-10 suggestions generated for a query q having 5 subtopics
   means that 4 subtopics of q are covered by at least one suggestion.
   .




                                                                                      .    .    .      . . . . . . . . . . . .               .    .        .    .    .
                                                                                 ..   ..   ..       .. .. .. .. .. .. .. .. .. .. .. .. ..   ..       ..   ..   ..
Fabrizio Silvestri (ISTI - CNR, Pisa, Italy) Query Recommendations in the Long Tail.                          December 5, 2012                    23 / 56
Query Suggestion in the Long Tail    Search Shortcuts: an IR approach


  Metrics Used

   .
   Coverage
   .
   Given a query topic A with subtopics {a1 , a2 , . . . , an }, and a query
   suggestion technique T , we say that T has coverage equal to c if n · c
   subtopics match suggestions generated by T .
   A coverage of 0.8 for the top-10 suggestions generated for a query q having 5 subtopics
   means that 4 subtopics of q are covered by at least one suggestion.
   .
   .
   Effectiveness
   .
   Given a query topic A with subtopics {a1 , a2 , . . . , an }, and a query
   suggestion technique T generating k suggestions, we say that T has
   effectiveness equal to e if k · e suggestions cover at least one subtopic.

   .
                                                                                      .    .    .      . . . . . . . . . . . .               .    .        .    .    .
                                                                                 ..   ..   ..       .. .. .. .. .. .. .. .. .. .. .. .. ..   ..       ..   ..   ..
Fabrizio Silvestri (ISTI - CNR, Pisa, Italy) Query Recommendations in the Long Tail.                          December 5, 2012                    23 / 56
Query Suggestion in the Long Tail    Search Shortcuts: an IR approach


  Metrics Used

   Coverage
   Given a query topic A with subtopics {a1, a2, ..., an} and a query
   suggestion technique T, we say that T has coverage equal to c if n · c
   subtopics match suggestions generated by T.
   A coverage of 0.8 for the top-10 suggestions generated for a query q having 5 subtopics
   means that 4 subtopics of q are covered by at least one suggestion.

   Effectiveness
   Given a query topic A with subtopics {a1, a2, ..., an} and a query
   suggestion technique T generating k suggestions, we say that T has
   effectiveness equal to e if k · e suggestions cover at least one subtopic.
   An effectiveness of 0.1 on the top-10 suggestions generated for a query q means that
   only one suggestion is relevant for one of the subtopics of q.
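As a minimal sketch of the two metrics, assume each suggestion has already been judged and is represented by the set of subtopic labels it matches (an empty set if it matches none). The function names and this input layout are illustrative assumptions, not part of the original evaluation code.

```python
def coverage(matches, n_subtopics):
    """Fraction of the n subtopics matched by at least one suggestion.

    `matches` holds one set of matched subtopic labels per suggestion.
    """
    covered = set().union(*matches)
    return len(covered) / n_subtopics

def effectiveness(matches):
    """Fraction of the k suggestions that cover at least one subtopic."""
    if not matches:
        return 0.0
    return sum(1 for m in matches if m) / len(matches)

# Ten suggestions for a query with 5 subtopics: four suggestions are relevant
# and together they touch four distinct subtopics.
judged = [{"a1"}, {"a2"}, {"a1", "a3"}, {"a4"}] + [set()] * 6
cov = coverage(judged, n_subtopics=5)   # 4 of 5 subtopics covered
eff = effectiveness(judged)             # 4 of 10 suggestions relevant

# The second example from the slide: one relevant suggestion out of ten.
eff_single = effectiveness([{"a1"}] + [set()] * 9)
```

The two numbers answer different questions: coverage asks how much of the topic the suggestion list spans, while effectiveness asks how much of the list is useful.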


  Anecdotal Results



   TREC query (n. 8): appraisal
          S1: What companies can give an appraisal of my home's value?
          S2: I'm looking for companies that appraise jewelry.
          S3: Find examples of employee performance appraisals.
          S4: I'm looking for web sites that do antique appraisals.

   Suggestions per technique (matched subtopic in parentheses)
          SS:   performance appraisal (S3); hernando county property appraiser (S1);
                antique appraisal (S4); appraisers in colorado (S1); appraisals etc (S1);
                appraisers.com (S4); find appraiser (S1); wachovia bank appraisals (S1);
                appraisersdotcom (S4)
          QFG:  online appraisals (S4)
          CG:   appraisersdotcom (S4); employee appraisals (S3); real estate appraisals (S1);
                appraisers (S1); employee appraisals forms (S3); appraisers.com (S4); gmac;
                appraisers beverly wv (S1); picket fence appraisal (S1);
                fossillo creek san antonio




  Results

   Coverage
   [Figure: coverage (y-axis, 0 to 1) of CG, SS, and QFG; x-axis ticks 1 to 50.]






  Results

   Effectiveness
   [Figure: effectiveness (y-axis, 0 to 1) of CG, SS, and QFG; x-axis ticks 1 to 50.]






  Take 3: TQ-Graph

   F. Bonchi, R. Perego, F. Silvestri, H. Vahabi, and R. Venturini. Efficient query recommendations in the long tail via
   center-piece subgraphs. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in
   Information Retrieval (SIGIR 2012), pp. 345-354.


  RWR vs. CePS
   Given a query Q = {t1, t2, ..., tk}, there are two alternatives:
          Random Walk with Restart (RWR) from the nodes corresponding to the
          terms in Q;
          Center-Piece Subgraph (CePS) (Tong and Faloutsos, KDD 2006)
          induced by the nodes of the terms in Q.

   Query: lower heart rate (not occurring in the query log!)
          TQ-Graph Suggestions                   RWR Suggestions
          things to lower heart rate             broken heart
          lower heart rate through exercise      prime rate
          accelerated heart rate and pregnant    exchange rate
          web md                                 bank rate
          heart problems                         currency exchange rate


  CePS: a Primer

          Given an edge-weighted undirected graph G, a set of vertices Q from G,
          and an integer budget b,
          find a connected subgraph H containing the vertices in Q and at most b
          other vertices that maximizes a "goodness" function g(H):

                    g(H) = ∑_{j∈H} r(Q, j)
                    r(Q, j) = ∏_{i∈Q} r(i, j)

          r(i, j) is the stationary probability of term j in a RWR from i.
          The restart probability is α.
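The scoring part of the definition can be sketched as follows. This is an illustrative approximation, not the authors' implementation: `rwr` uses plain power iteration, α is used as the restart probability as stated on the slide, the toy graph and its node labels are made up, and the search for the best connected subgraph H under the budget b is not implemented here.

```python
def rwr(graph, start, alpha=0.9, iters=100):
    """Approximate stationary probabilities of a random walk with restart.

    graph: dict node -> dict neighbor -> edge weight (each undirected edge
    is stored in both directions). At every step the walker returns to
    `start` with probability alpha, otherwise it moves to a neighbor
    chosen proportionally to the edge weight.
    """
    p = {v: 0.0 for v in graph}
    p[start] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        nxt[start] += alpha  # restart mass (p always sums to 1)
        for u, nbrs in graph.items():
            total = sum(nbrs.values())
            if total == 0:
                # Assumption: mass on an isolated node goes back to the start.
                nxt[start] += (1 - alpha) * p[u]
                continue
            for v, w in nbrs.items():
                nxt[v] += (1 - alpha) * p[u] * w / total
        p = nxt
    return p

def joint_score(graph, query_nodes, alpha=0.9):
    """r(Q, j): product over i in Q of r(i, j), for every node j."""
    scores = {v: 1.0 for v in graph}
    for i in query_nodes:
        r_i = rwr(graph, i, alpha)
        for v in graph:
            scores[v] *= r_i[v]
    return scores

def goodness(scores, subgraph_nodes):
    """g(H): sum of r(Q, j) over the nodes j of a candidate subgraph H."""
    return sum(scores[j] for j in subgraph_nodes)

# Hypothetical toy graph: "t:" nodes are terms, "q:" nodes are queries.
toy = {
    "t:heart": {"q:heart rate": 1.0, "q:broken heart": 1.0},
    "t:rate": {"q:heart rate": 1.0, "q:prime rate": 1.0},
    "q:heart rate": {"t:heart": 1.0, "t:rate": 1.0},
    "q:broken heart": {"t:heart": 1.0},
    "q:prime rate": {"t:rate": 1.0},
}
toy_scores = joint_score(toy, ["t:heart", "t:rate"])
best_query = max((v for v in toy_scores if v.startswith("q:")), key=toy_scores.get)
```

Because the score is a product, a node ranks high only if it is close to every query term, which is why a query that is near just one of the terms is pushed down.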


  Assessing Effectiveness
   Datasets
          Two TQ-Graphs built from the MSN and Yahoo! query logs.
          Two different query sets for evaluation:
                  50 queries of the standard TREC Web diversification track testbed.
                  100 queries randomly chosen from the Yahoo! query log.

   Statistics
                                            MSN         Yahoo!
          #queries                          15M         581M
          #terms                            36M         1,344M
          #query nodes                      7M          29M
          #term nodes                       2M          6M
          #dangling nodes                   15%         35%
          #queries (freq = 1)               5M          162M
          #terms (freq = 1)                 5M          2M


  Assessing Effectiveness
   User Study
          Top-5 recommendations were generated for each test query and technique.
          The suggestions were shuffled and presented to 10 non-CS assessors, who
          were asked to rate them as useful, somewhat useful, or not useful.

   Frequency in the corresponding log of all the queries in the two testbeds.
   [Figure: two panels with logarithmic frequency axes. Left: frequency on MSN (1 to 100000)
   of the TREC queries (x-axis, 0 to 50). Right: frequency on Yahoo (1 to 100) of the random
   queries (x-axis, 0 to 100).]


  Setting the parameter α


          TREC on MSN                   useful     somewhat     not useful
          α = 0.9                        57%         16%           27%
          α = 0.5                        32%         13%           55%
          α = 0.1                        22%         12%           66%

          100 queries on Yahoo!         useful     somewhat     not useful
          α = 0.9                        48%         11%           41%
          α = 0.5                        41%         20%           39%
          α = 0.1                        37%         20%           43%





  Effectiveness on Tail Queries

          TREC on MSN (unseen)          useful     somewhat     not useful
          TQ-Graph α = 0.9               46%         10%           44%
          QFG                             0%          0%          100%

          TREC on MSN (dangling)        useful     somewhat     not useful
          TQ-Graph α = 0.9               60%         30%           10%
          QFG                             0%          0%          100%

          TREC on MSN (others)          useful     somewhat     not useful
          TQ-Graph α = 0.9               59%         17%           24%
          QFG                            61%         13%           26%

   For popular queries, effectiveness is comparable with that of QFG-based models.


  CePS Efficiency



          CePS is not NP-hard, but solving it requires
          Ω(|Q| × (|E| + |V|)) time, where:
                  |Q|: number of query terms
                  |E|: number of graph edges
                  |V|: number of queries
          Alternatively, it requires |Q| random walks with restart.
          Either way, it remains infeasible for computing query suggestions online.






  Stationary Probabilities as Inverted Lists



   [Figure: the TQGraph and the inverted index built from it.]
          Inverted index representation of the RWRs computed on the TQGraph. The lexicon is made
          up of term nodes; postings are the stationary distribution values.
          Each posting list holds the stationary distribution of query nodes in the TQGraph as
          obtained by a RWR from the term (Term 1 in the figure).
          Term 1 buckets: [1, ε), [ε, ε^2), [ε^2, ε^3), ..., [ε^i, ε^(i+1))
          Within buckets, queries are sorted by their IDs. Scores are ...


  Stationary Probabilities as Inverted Lists



          For each term we have to store a vector of
          ⟨queryId_1, pr_1⟩, ⟨queryId_2, pr_2⟩, ..., ⟨queryId_|V|, pr_|V|⟩.
          29 million queries × 6 million terms = 174 trillion entries!
          Conjecture: most of the entries are useless.
          Solution: remove the entries with the lowest probability, i.e., apply
          pruning.
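A minimal sketch of the pruning step, assuming each inverted list is a Python list of `(queryId, pr)` pairs and that pruning keeps the `p` highest-probability entries per term. The function name `prune` and the toy values are illustrative only.

```python
import heapq

def prune(postings, p):
    """Keep the p entries with the highest probability, sorted by queryId."""
    top = heapq.nlargest(p, postings, key=lambda entry: entry[1])
    return sorted(top)

# Size of the unpruned index, using the figures from the slide.
n_queries = 29_000_000
n_terms = 6_000_000
full_entries = n_queries * n_terms   # one entry per (term, query) pair

# After pruning, each term keeps at most p entries.
def pruned_entries(n_terms, p):
    return n_terms * p

toy_list = [(1, 0.50), (2, 0.10), (3, 0.30), (4, 0.05)]
kept = prune(toy_list, 2)
```

The index size then grows with the number of terms times `p` instead of with the number of terms times the number of queries.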






  Index Pruning: Effectiveness
   MSN
   [Figure: MSN query log. Percentage of dissimilarity (y-axis, 0 to 70) versus pruning
   threshold p (x-axis, 0 to 20000) for RWR with α = 0.1, α = 0.5, and α = 0.9.]


  Index Pruning: Effectiveness
   Yahoo!
   [Figure: Yahoo! query log. Percentage of dissimilarity (y-axis, 0 to 70) versus pruning
   threshold p × 10^3 (x-axis, 20 to 200) for RWR with α = 0.1, α = 0.5, and α = 0.9.]

  Compressing the Index (I)


          After pruning we have #terms vectors (inverted lists), each with p
          entries of the form ⟨queryId, pr⟩.
          Is there an efficient way to store the p entries?
                  Sort by queryId and δ-encode the queryIds (pr stored as it is):
                          ≥ 32 bits per entry.
                  Sort by pr and δ-encode the pr values (queryId stored as it is):
                          ≥ log₂(#queries) bits per entry.

          Could this be improved?
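A minimal sketch of the first option (sort by queryId, δ-encode the gaps, store pr as it is). The variable-byte gap code and the 32-bit float for pr are assumptions made for illustration; the slides do not say which integer code or float width is used, and the function names are hypothetical.

```python
import struct

def encode_varint(n: int) -> bytes:
    """Variable-byte code: 7 payload bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def decode_varint(buf: bytes, pos: int):
    """Return (value, next position) for the varint starting at buf[pos]."""
    shift = result = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7

def encode_list(entries) -> bytes:
    """entries: iterable of (query_id, pr). Gaps between sorted ids are varint-coded;
    pr is stored uncompressed as a float32 (assumed width)."""
    out = bytearray()
    prev = 0
    for qid, pr in sorted(entries):
        out += encode_varint(qid - prev)
        out += struct.pack("<f", pr)
        prev = qid
    return bytes(out)

def decode_list(buf: bytes):
    """Inverse of encode_list: rebuild the (query_id, pr) pairs in id order."""
    entries, pos, prev = [], 0, 0
    while pos < len(buf):
        gap, pos = decode_varint(buf, pos)
        (pr,) = struct.unpack_from("<f", buf, pos)
        pos += 4
        prev += gap
        entries.append((prev, pr))
    return entries
```

Under the float32 assumption the uncompressed pr alone costs 32 bits, so no gap code can bring an entry below that, which is the limitation the slide points at.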




  Compressing the Index (II)



   Lossy Compression
          For each term, we sort its p entries by their probability values.
          We create groups (buckets) of queryIds with similar probability values:
          bucket i holds the entries with ϵ^{-i} < pr ≤ ϵ^{-(i+1)}, where ϵ ≤ 1.
          QueryIds in a bucket are δ encoded.
          Given a queryId in bucket i, its probability is approximated
          by ϵ^{-(i+1)}.
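The bucketing step can be sketched as follows. The slide prints the bucket boundaries with negative exponents; this sketch instead assumes the reading ϵ^(i+1) < pr ≤ ϵ^i with 0 < ϵ < 1 and approximates every entry of bucket i by the lower boundary ϵ^(i+1), because that reading gives an approximation at most a factor 1/ϵ smaller than the true value, as the error-guarantee slide states. All function names are hypothetical, and gaps are kept as plain integers rather than bit-packed.

```python
import math
from collections import defaultdict

def bucket_index(pr: float, eps: float) -> int:
    """Index i such that eps**(i+1) < pr <= eps**i, for 0 < pr <= 1 and 0 < eps < 1."""
    return math.floor(math.log(pr) / math.log(eps))

def compress(entries, eps):
    """entries: iterable of (query_id, pr).
    Returns {bucket index: gap-coded list of the query ids in that bucket}."""
    buckets = defaultdict(list)
    for qid, pr in entries:
        buckets[bucket_index(pr, eps)].append(qid)
    compressed = {}
    for i, qids in buckets.items():
        qids.sort()
        # First id is stored as-is, the rest as differences from the previous id.
        compressed[i] = [qids[0]] + [b - a for a, b in zip(qids, qids[1:])]
    return compressed

def decompress(compressed, eps):
    """Rebuild (query_id, approximate pr) pairs; every id in bucket i gets eps**(i+1)."""
    out = []
    for i, gaps in compressed.items():
        qid = 0
        for g in gaps:
            qid += g
            out.append((qid, eps ** (i + 1)))
    return sorted(out)
```

Only the bucket index is stored per group, so the per-entry cost is dominated by the gap-coded queryIds; the probabilities themselves are no longer stored.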





  Bits per entry vs. ϵ


   [Figure: MSN query log. Bits per entry (y-axis, 10 to 20) versus ϵ (x-axis, 0 to 1), with curves for RWR α = 0.1, α = 0.5, and α = 0.9.]

          Smaller ϵ values → fewer buckets and fewer bits per entry.
          ϵ = 1 → naïve approach.


  Error guarantees



          Our bucketing scheme introduces approximation errors.
          → The error is bounded.
          The approximated probability value stored in the list is smaller than
          the real value by a factor of at most ϵ^{-1}.
          The error might introduce inversions with respect to the real ranking.
                  For two queries q, q′ of m terms, with q preceding q′ in a
                  suggestion ranking, the inversion cannot happen if
                  r_q > ϵ^{-m} r_{q′}.
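One way to see where the no-inversion condition comes from. This is a sketch under two assumptions that are not stated on the slide: the score r_q combines m per-term probabilities p_j(q) by multiplication, and each stored value \tilde{p}_j(q) satisfies ϵ p_j(q) ≤ \tilde{p}_j(q) ≤ p_j(q).

```latex
\begin{align*}
r_q = \prod_{j=1}^{m} p_j(q), \qquad
\tilde r_q = \prod_{j=1}^{m} \tilde p_j(q)
\quad\Longrightarrow\quad
\epsilon^{m}\, r_q \;\le\; \tilde r_q \;\le\; r_q .
\end{align*}
An inversion means $\tilde r_q < \tilde r_{q'}$ although $r_q > r_{q'}$. In that case
\begin{align*}
\epsilon^{m}\, r_q \;\le\; \tilde r_q \;<\; \tilde r_{q'} \;\le\; r_{q'}
\quad\Longrightarrow\quad
r_q < \epsilon^{-m}\, r_{q'} ,
\end{align*}
so no inversion is possible when $r_q > \epsilon^{-m}\, r_{q'}$.
```

For example, with ϵ = 0.95 and m = 3 we get ϵ^{-3} ≈ 1.17, so under these assumptions two suggestions can only swap places if their true scores are within about 17% of each other.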





  Precision Loss after Pruning and Bucketing: Average Dissimilarity

                              MSN query log
                p     RWR α = 0.1      RWR α = 0.5      RWR α = 0.9
            5,000     18.48  (8.47)    22.58 (14.11)    42.04 (40.32)
           10,000     17.39  (8.47)    20.97 (12.50)    39.49 (30.65)
           15,000     17.39  (8.47)    20.16 (12.10)    36.31 (25.40)
           20,000     17.39  (8.06)    18.55 (11.29)    33.12 (22.18)
          100,000     17.39  (8.06)    18.55 (11.29)    32.48 (21.77)
          200,000     17.39  (8.06)    18.55 (11.29)    32.48 (21.77)

                             Yahoo! query log
                p     RWR α = 0.1      RWR α = 0.5      RWR α = 0.9
            5,000     33.75 (40.30)    37.87 (42.22)    45.11 (47.12)
           10,000     27.76 (34.12)    32.84 (36.89)    40.23 (42.22)
           15,000     26.18 (31.13)    31.36 (34.33)    38.22 (39.23)
           20,000     23.97 (27.72)    28.70 (31.56)    37.07 (38.38)
          100,000     19.24 (17.48)    21.89 (20.26)    31.32 (28.78)
          200,000     19.24 (17.70)    21.30 (19.62)    31.90 (28.14)

          With ϵ = 0.95 and p = 20k we have about a 37% difference in results,
          with 14 bits per entry instead of 71.
          We save 1.1PB in the case of the Yahoo! query log.



  Effectiveness after Pruning and Bucketing: User Study

                              MSN query log
                p     useful     somewhat     not useful
            5,000      56%         17%           27%
           20,000      55%         15%           30%
          200,000      55%         15%           30%

                             Yahoo! query log
                p     useful     somewhat     not useful
            5,000      46%         29%           25%
           20,000      47%         29%           24%
          200,000      46%         28%           26%

          Effectiveness of the suggestions provided with pruning and bucketing
          as a function of p, for ϵ = 0.95 and α = 0.9.
          In the case of MSN, the effectiveness for useful suggestions was 57%;
          in the case of Yahoo!, 48%.


  Scaling-up Suggestion Building




   .
   Does it scale?
   .
       We pre-compute and store the inverted index.
       Maintaining the entire inverted index in main memory is still not
       feasible.
                  Caching!
   .
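A minimal sketch of the caching idea: keep only the most recently used inverted lists in main memory and fetch the rest from disk on a miss. The LRU replacement policy, the capacity counted in lists rather than bytes, and the `load_from_disk` callback are assumptions made for illustration; the slides do not name the policy that was used.

```python
from collections import OrderedDict

class InvertedListCache:
    """LRU cache mapping a term to its (compressed) inverted list."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity
        self.load_from_disk = load_from_disk  # hypothetical callback: term -> list
        self.lists = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, term):
        if term in self.lists:
            self.hits += 1
            self.lists.move_to_end(term)      # mark as most recently used
            return self.lists[term]
        self.misses += 1
        value = self.load_from_disk(term)
        self.lists[term] = value
        if len(self.lists) > self.capacity:
            self.lists.popitem(last=False)    # evict the least recently used list
        return value

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0
```

Replaying the term stream of a query log through `get` and reading `miss_rate()` for several capacities would give a curve of cache miss percentage against cache size.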





  MSN Cache Miss (%)


   [Figure: MSN query log. Percentage of cache miss (y-axis, 0 to 70) versus cache size in GB (x-axis: 1, 2, 4, 8, 16, 32). Curves: 5,000 entries (15.81 bits), 20,000 entries (16.33 bits), 200,000 entries (15.21 bits), 5,000 entries (73.71 bits), 20,000 entries (73.21 bits), 200,000 entries (70.05 bits).]

          With 8GB of main memory we have a cache miss rate < 10%.

  Yahoo! Cache Miss (%)


   [Figure: Yahoo! query log. Percentage of cache miss (y-axis, 0 to 100) versus cache size in GB (x-axis: 1, 2, 4, 8, 16, 32). Curves: 5,000 entries (13.67 bits), 20,000 entries (14.27 bits), 200,000 entries (16.09 bits), 5,000 entries (72.33 bits), 20,000 entries (71.32 bits), 200,000 entries (70.30 bits).]

          Even in the case of the big Yahoo! log, with 8GB of main memory
          we have a cache miss rate < 10%.

  Conclusions

          Query recommendation is a tough problem.
          We have proposed an overall idea (i.e., reducing the length of users’
          querying sessions) and three diverse techniques addressing it.
          The graph-based technique is the one that best balances requirements
          and “theoretical” justification.
          Query recommendation based on TQ-Graph reaches up to 99% coverage.
          An index for speeding up online TQ-Graph computation:
                  Reduces space occupancy by 80% on average.
                  Achieves a 95% hit ratio with a few gigabytes of main memory.


  Other Applications of Shortcuts



   Europeana.eu
   Shortcuts will be used as the suggestion mechanism for queries submitted
   to the Europeana portal.

   Search Results Diversity
   G. Capannini, F. M. Nardini, R. Perego, F. Silvestri. Efficient
   Diversification of Web Search Results. PVLDB 4(7): 451-459 (2011).




   Session Retrieval
   I. A. Adeyanju, F. M. Nardini, M-D. Albakour, D. S., U. Kruschwitz.
   RGU-ISTI-Essex at TREC 2011 Session Track. Proceedings of the
   International Text REtrieval Conference (TREC) 2011. November 2011.

   Intranet Query Recommendations
   I. A. Adeyanju, D. Song, M-D. Albakour, U. Kruschwitz, A. De Roeck, and
   M. Fasli. Adaptation of the concept hierarchy model with search
   logs for query recommendation on intranets. In Proceedings of the
   35th International ACM SIGIR Conference on Research and Development in
   Information Retrieval (SIGIR 2012). 5-14.


  Beyond Query Suggestion

   Task Recommendation
          Discover which user search tasks are composed together, and how, to
          accomplish more complex missions.
          Suggestions should also refer to other tasks of a bigger mission
          (i.e., task recommendation).
Thanks




Romano, G. (2019) Dancing Trainer: A System For Humans To Learn Dancing Using...
 
Master Thesis - Algorithm for pattern recognition
Master Thesis - Algorithm for pattern recognitionMaster Thesis - Algorithm for pattern recognition
Master Thesis - Algorithm for pattern recognition
 
LTMR
LTMRLTMR
LTMR
 
Triangulating our professional development
Triangulating our professional developmentTriangulating our professional development
Triangulating our professional development
 
NATO 2020: Assured Security; Dynamic Engagement
 NATO 2020: Assured Security; Dynamic Engagement NATO 2020: Assured Security; Dynamic Engagement
NATO 2020: Assured Security; Dynamic Engagement
 
From collection to reflection philip mendels
From collection to reflection philip mendelsFrom collection to reflection philip mendels
From collection to reflection philip mendels
 
main
mainmain
main
 
Life cycle assessment (LCA) - from analysing methodology development to intro...
Life cycle assessment (LCA) - from analysing methodology development to intro...Life cycle assessment (LCA) - from analysing methodology development to intro...
Life cycle assessment (LCA) - from analysing methodology development to intro...
 
Thesis
ThesisThesis
Thesis
 

Recently uploaded

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 

Recently uploaded (20)

Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 

Query Recommendations in the Long Tail

  • 1. Query Recommendations in the Long Tail: Efficient and Effective Techniques. Fabrizio Silvestri, ISTI - CNR, Pisa, Italy. December 5, 2012.
  • 2. Outline
       1. Introduction
            Query Recommender Systems
            Query Distribution
       2. (Recent) Related Work
       3. Query Suggestion in the Long Tail
            Search Shortcuts: an IR approach
            A Graph-based approach
       4. Conclusion and Future Work
            Conclusion
            Applications of Search Shortcuts
            Future Directions
  • 3. Outline. Section 1, Introduction: Query Recommender Systems.
  • 7. Introduction. Query recommendation (or suggestion) consists of making expert users help non-expert ones. It is a long-standing problem. I know... search engines already have very good query suggestion methods!
  • 8. Disclaimer: I am not going to talk about autocomplete-like systems.
  • 9. Disclaimer: I am not going to talk about query spelling correction.
  • 10. Instead, I will cover "Related Search"-like systems.
  • 11. Outline. Section 1, Introduction: Query Distribution.
  • 14. The (in)famous Power Law. The main ingredient: query logs, one of the best-kept industrial secrets... Do you remember AOL? [Figure: plot with regions labeled Head queries, Torso queries, and Tail queries; the horizontal axis runs from 0 to 1000 and the vertical axis from 0 to 0.16.]
  • 15. The (in)famous Power Law. More/Real Power Laws.
  • 17. Ordinary People with Extraordinary Tastes.
       S. Goel, A. Broder, E. Gabrilovich, and B. Pang. Anatomy of the long tail: ordinary people with extraordinary tastes. In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010). ACM, New York, NY, USA, 201-210.
       Main characteristics:
         A relatively small number of items accounts for a disproportionately large fraction of total consumption.
         The tail, in aggregate, is nevertheless relatively heavy.
       The bottom line: satisfying requests for the head of the distribution works in the majority of cases, but it only partially satisfies the vast majority of users.
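The two characteristics above can be illustrated with a small calculation. This is a minimal sketch that assumes query popularity follows an idealized Zipf law, p(rank) ∝ 1/rank^s; the function name `zipf_shares` and the parameter values are illustrative and do not come from the talk or from the cited paper.

```python
# Assumption: query popularity follows an idealized Zipf law, p(rank) ∝ 1 / rank**s.
def zipf_shares(num_queries: int, head_size: int, s: float = 1.0) -> tuple[float, float]:
    """Return (head_share, tail_share) of the total query volume."""
    weights = [1.0 / (rank ** s) for rank in range(1, num_queries + 1)]
    total = sum(weights)
    head_share = sum(weights[:head_size]) / total
    return head_share, 1.0 - head_share


if __name__ == "__main__":
    head, tail = zipf_shares(num_queries=1_000_000, head_size=1_000)
    print(f"top 0.1% of distinct queries: {head:.1%} of volume")
    print(f"remaining 99.9% of distinct queries: {tail:.1%} of volume")
```

Under this assumed distribution a tiny head carries a large share of the volume, yet the tail as a whole still accounts for a comparable share, which is the situation the slide describes.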
  • 18. Outline. Section 2: (Recent) Related Work.
  • 19. (Some) Recent Works on Query Suggestion.
       S. Bhatia, D. Majumdar, and P. Mitra. Query suggestions in the absence of query logs. In SIGIR 2011. 795-804.
       Y. Song, D. Zhou, and L. He. Query suggestion by constructing term-transition graphs. In WSDM 2012. 353-362.
       U. Ozertem, O. Chapelle, P. Donmez, and E. Velipasaoglu. Learning to suggest: a machine learning framework for ranking query suggestions. In SIGIR 2012. 25-34.
       R. L. T. Santos, C. Macdonald, and I. Ounis. Learning to rank query suggestions for adhoc and diversity search. In Information Retrieval. September 2012.
       R. Baeza-Yates and A. Tiberi. Extracting semantic relations from query logs. In KDD 2007. 76-85.
  • 20. Query Flow Graph.
       P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna. The query-flow graph: model and applications. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008). 609-618.
       The query log is modeled as a graph.
       Edge weights are learned from query-log-based features.
       Recommendations are computed using a Random Walk with Restart from the submitted query.
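As an illustration of the last point, here is a minimal sketch of a random walk with restart on a tiny hand-made graph. The edge weights, the restart probability of 0.15, the iteration count, and the function names are assumptions made for the example; in the query-flow graph the weights are learned from the log, which this sketch does not attempt.

```python
def random_walk_with_restart(graph, start, restart=0.15, iterations=50):
    """graph: {query: {next_query: weight}}. Returns {query: score}."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    scores = {n: 0.0 for n in nodes}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            out = graph.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: send its walking mass back to the start query.
                nxt[start] += (1 - restart) * scores[u]
                continue
            for v, w in out.items():
                nxt[v] += (1 - restart) * scores[u] * w / total
        # With probability `restart` the walker jumps back to the start query.
        nxt[start] += restart * sum(scores.values())
        scores = nxt
    return scores


def recommend(graph, query, k=3):
    """Top-k queries by walk score, excluding the submitted query itself."""
    scores = random_walk_with_restart(graph, query)
    ranked = sorted((n for n in scores if n != query), key=scores.get, reverse=True)
    return ranked[:k]


# Toy graph with made-up weights, for illustration only.
toy_graph = {
    "appraisal": {"home appraisal": 3, "jewelry appraisal": 1},
    "home appraisal": {"home value estimate": 2},
    "jewelry appraisal": {"antique appraisal": 1},
}
```

Calling `recommend(toy_graph, "appraisal")` ranks the queries that are most reachable from the submitted one; queries the walk never reaches get a score of zero.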
  • 24. Head Queries are Easy! The number of head queries is limited ⇒ precompute recommendations!
       Query      Suggestions
       google     "google images", "google email", "google books", ...
       facebook   "facebook home page", "facebook search", ...
       apple      "apple store", "best buy", "att", "iphone 5", ...
       ...        ...
       If Google does it... we can do it as well! Roughly, a sort of static cache for recommendations on frequent queries.
       What about other queries?
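The static-cache idea can be sketched in a few lines. Everything named here (`build_static_cache`, `suggest`, and the `recommend` and `recommend_tail` callbacks) is hypothetical; the sketch only shows precomputing suggestions for the most frequent queries and falling back to some other technique for the rest.

```python
from collections import Counter


def build_static_cache(query_log, recommend, head_size):
    """Precompute suggestions for the `head_size` most frequent queries."""
    counts = Counter(query_log)
    return {query: recommend(query) for query, _ in counts.most_common(head_size)}


def suggest(query, cache, recommend_tail):
    """Serve head queries from the cache; defer everything else to a tail technique."""
    if query in cache:
        return cache[query]
    return recommend_tail(query)
```

The cache stays small because the head is small; the open question raised on the slide is what `recommend_tail` should be.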
  • 25. Outline. Section 3, Query Suggestion in the Long Tail: Search Shortcuts: an IR approach.
  • 27. The Original Idea. Problem Definition: Search Shortcuts.
       Let \sigma = \langle q_1, \ldots, q_n \rangle be a satisfactory session. The similarity of a k-way shortcut h on a head \sigma_{t|} and a tail \sigma_{|t} is defined as
       \[
       s\bigl(h(\sigma_{t|}), \sigma_{|t}\bigr) = \frac{\sum_{q \in h(\sigma_{t|})} \sum_{m=1}^{n-t} [\,q = (\sigma_{|t})_m\,]\, f(m)}{|h(\sigma_{t|})|}
       \]
       where f(\cdot) is a monotonic increasing function. The function [\,q = (\sigma_{|t})_m\,] equals 1 if and only if the query q is equal to the query (\sigma_{|t})_m.
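A small worked instance may help read the definition. This sketch assumes the tail is the list of the n − t queries that follow the head, and it picks f(m) = m as one possible monotonic increasing function; neither the function name nor that choice of f comes from the slides.

```python
def shortcut_similarity(shortcuts, tail, f=lambda m: m):
    """Average, over the suggested shortcuts, of f(m) for each tail position m
    (1-based) whose query equals the shortcut."""
    if not shortcuts:
        return 0.0
    total = 0.0
    for q in shortcuts:
        for m, tail_query in enumerate(tail, start=1):
            if q == tail_query:  # the indicator [q = (sigma_|t)_m]
                total += f(m)
    return total / len(shortcuts)


# Session <q1, ..., q5> with head length t = 2, so the tail is <q3, q4, q5>.
tail = ["q3", "q4", "q5"]
print(shortcut_similarity(["q5", "unrelated"], tail))  # (3 + 0) / 2 = 1.5
print(shortcut_similarity(["q3"], tail))               # 1 / 1 = 1.0
```

Because f grows with m, a shortcut that jumps straight to the last query of the session is rewarded more than one that only anticipates the next query.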
  • 28. The Original Idea: Almost Without Equations. Problem Definition: Search Shortcuts. [Figure: diagram of the search shortcuts idea; only the labels "Shortcuts" and "Gen" survive.]
  • 29. Take 1: Shortcuts via Collaborative Filtering.
       R. Baraglia, F. Cacheda, V. Carneiro, D. Fernandez, V. Formoso, R. Perego, and F. Silvestri. Search shortcuts: a new approach to the recommendation of queries. In Proceedings of the Third ACM Conference on Recommender Systems (RecSys 2009). 77-84.
       Main features:
         Experiments on the AOL and MSN query logs.
         We map shortcuts into a collaborative filtering problem:
           Sessions are users.
           Queries are items.
         We only focus on the last query of a session:
           the last query in a session is clicked → rating = 10
           not clicked → rating = 0
           all the other queries in the session have a neutral rating = 5.
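The rating scheme above can be sketched as follows. The function name and the input shape (a list of query strings plus a flag saying whether the last query received a click) are assumptions made for illustration, not the data format used in the cited paper.

```python
def session_ratings(session, last_clicked):
    """Map one session (the 'user') to ratings over its queries (the 'items')."""
    if not session:
        return {}
    # Every query except the last gets the neutral rating.
    ratings = {query: 5 for query in session[:-1]}
    # The last query is rated by whether it led to a click; if the same query
    # also appeared earlier in the session, this rating overrides the neutral one.
    ratings[session[-1]] = 10 if last_clicked else 0
    return ratings
```

For example, `session_ratings(["appraisal", "home appraisal"], last_clicked=True)` yields a rating of 5 for the first query and 10 for the second, one row of the session-by-query rating matrix.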
  • 30. Take 2: Indexing Sessions.
       D. Broccolo, L. Marcon, F. M. Nardini, R. Perego, and F. Silvestri. Generating suggestions for queries in the long tail with an inverted index. Inf. Process. Manage. 48(2). 326-339 (2012).
       Sessions as Virtual Documents:
         <DOC>
         <DOCNO> google images </DOCNO>
         <BODY>
         google
         image search
         multimedia search engine
         image search engine
         picture search
         photo search
         </BODY>
         </DOC>

         <DOC>
         <DOCNO> fabrizio silvestri </DOCNO>
         <BODY>
         silvestri isti
         silvestri cnr
         ir cnr pisa
         fabrizio caching pisa
         fabrizio query log
         </BODY>
         </DOC>
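One way such virtual documents could be assembled is sketched below. It assumes that a document is identified by the final query of a session and that its body collects the terms of the queries preceding that final query, across all sessions ending with it. This reading is inferred from the example on the slide and is not confirmed by it; the function name is hypothetical.

```python
from collections import defaultdict


def build_virtual_documents(sessions):
    """sessions: list of sessions, each a list of query strings.
    Returns {final_query: list of terms from the queries that preceded it}."""
    docs = defaultdict(list)
    for session in sessions:
        if not session:
            continue
        *earlier, final = session
        for query in earlier:
            docs[final].extend(query.split())
    return dict(docs)
```

Sessions made of a single query contribute no terms under this assumption, so they produce no document.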
  • 31. Take 2: Indexing Sessions.
       Index the virtual documents using your favourite IR system.
       Rank the virtual documents, i.e., queries, according to
       \[
       \delta(\tau, \sigma^{s}) = \alpha \cdot \mathrm{BM25}(\tau, \sigma^{s}) + \beta \cdot \mathrm{freq}(\sigma^{s}_{n})
       \]
       where:
         \sigma' is the current session performed by the user;
         \sigma^{s}_{n} is the last query in session \sigma^{s};
         the sequence \tau is the concatenation of all terms, with possible repetitions, appearing in \sigma'_{t|}, i.e., the head of length t of session \sigma'.
       Intuitively, \delta(\tau, \sigma^{s}) measures how much a previously seen session overlaps with the user need expressed so far (the concatenation of terms \tau serves as a bag-of-words model of the user need).
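The ranking function can be sketched without an IR system by computing BM25 directly. This is an illustration under several assumptions: a common BM25 variant with the `log(1 + ...)` IDF and default parameters k1 = 1.2 and b = 0.75, raw counts for freq, and arbitrary values for α and β. The talk does not state which BM25 settings or weights were used, and all function names here are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """docs: {doc_id: list of terms}. Returns {doc_id: BM25 score}."""
    n_docs = len(docs)
    avg_len = (sum(len(terms) for terms in docs.values()) / n_docs) or 1.0
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in docs.values() for term in set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for term in query_terms:  # repeated terms in tau contribute repeatedly
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


def rank_shortcuts(tau, docs, freq, alpha=1.0, beta=0.1, k=10):
    """Rank final queries by delta = alpha * BM25(tau, doc) + beta * freq(final query)."""
    bm25 = bm25_scores(tau, docs)
    delta = {d: alpha * bm25[d] + beta * freq.get(d, 0) for d in docs}
    return sorted(delta, key=delta.get, reverse=True)[:k]
```

Here `tau` is the list of terms typed so far in the current session, `docs` maps each candidate final query to its virtual document, and `freq` maps each final query to how often it was observed.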
  • 32. Experimental Setting.
       Dataset:
         The Microsoft RFP 2006 query log is used as the dataset.
         Sessions are consecutive queries by the same user submitted within 30 minutes.
         The Terrier search engine is used to index the resulting 1,191,143 virtual documents.
         We compare Search Shortcuts (SS) with Cover Graph (CG) and Query Flow Graph (QFG).
       We exploited the query topics provided by NIST for running the TREC 2009 Web Track's Diversity Task.
       TREC query (n. 8): appraisal
         S1: What companies can give an appraisal of my home's value?
         S2: I'm looking for companies that appraise jewelry
         S3: Find examples of employee performance appraisals
         S4: I'm looking for web sites that do antique appraisals
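The session definition above can be sketched as a simple log-splitting routine. The slide does not say exactly how the 30-minute rule is applied; this sketch assumes that a gap of more than 30 minutes between two consecutive queries of the same user starts a new session. The record layout `(user_id, timestamp, query)` and the function name are also assumptions.

```python
from datetime import datetime, timedelta


def split_sessions(log, gap=timedelta(minutes=30)):
    """log: iterable of (user_id, timestamp, query). Returns a list of sessions,
    each a list of queries, ordered by user and then by time."""
    sessions = []
    last_seen = {}  # user_id -> (timestamp of previous query, that user's open session)
    for user, ts, query in sorted(log, key=lambda record: (record[0], record[1])):
        previous = last_seen.get(user)
        if previous is None or ts - previous[0] > gap:
            current = []  # first query of this user, or the gap was too long
            sessions.append(current)
        else:
            current = previous[1]
        current.append(query)
        last_seen[user] = (ts, current)
    return sessions
```

A sliding 30-minute window measured from the first query of the session would be an equally plausible reading and would only change the comparison inside the loop.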
  • 36. Metrics Used
      Coverage
      - Given a query topic A with subtopics {a1, a2, ..., an} and a query suggestion technique T, we say that T has coverage equal to c if n · c subtopics match suggestions generated by T.
      - A coverage of 0.8 for the top-10 suggestions generated for a query q having 5 subtopics means that 4 subtopics of q are covered by at least one suggestion.
      Effectiveness
      - Given a query topic A with subtopics {a1, a2, ..., an} and a query suggestion technique T generating k suggestions, we say that T has effectiveness equal to e if k · e suggestions cover at least one subtopic.
      - An effectiveness of 0.1 on the top-10 suggestions generated for a query q means that only one suggestion is relevant for one of the subtopics of q.
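Both metrics reduce to simple counting once each suggestion has been mapped to the subtopics it matches. The sketch below assumes that mapping is already available (for example from human judgments) as one set of subtopic labels per suggestion; the function names are hypothetical.

```python
def coverage(subtopics, suggestion_matches):
    """Fraction of subtopics matched by at least one suggestion.

    subtopics: list of subtopic labels, e.g. ["S1", "S2", ...]
    suggestion_matches: one set of matched subtopic labels per suggestion
    """
    covered = set().union(*suggestion_matches) & set(subtopics)
    return len(covered) / len(subtopics)


def effectiveness(suggestion_matches):
    """Fraction of suggestions that match at least one subtopic."""
    relevant = sum(1 for matched in suggestion_matches if matched)
    return relevant / len(suggestion_matches)
```

For a query with 5 subtopics and 10 suggestions, of which three are relevant and together touch 4 subtopics, `coverage` gives 0.8 and `effectiveness` gives 0.3.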
  • 37. Anecdotal Results
      TREC query (n. 8): appraisal
      - S1: What companies can give an appraisal of my home's value?
      - S2: I'm looking for companies that appraise jewelry.
      - S3: Find examples of employee performance appraisals.
      - S4: I'm looking for web sites that do antique appraisals.
      Suggestions (matched subtopic in parentheses):
      - SS: performance appraisal (S3); hernando county property appraiser (S1); antique appraisal (S4); appraisers in colorado (S1); appraisals etc (S1); appraisers.com (S4); find appraiser (S1); wachovia bank appraisal (S1); appraisersdotcom (S4)
      - QFG: online appraisals (S4); employee appraisals (S3); real estate appraisals (S1); appraisers (S1); employee appraisals forms (S3); appraisers.com (S4); gmac; appraisers beverly wv (S1); picket fence appraisals (S1); fossillo creek san antonio
      - CG: appraisersdotcom (S4)
  • 38. Results: Coverage
      [Figure: coverage (y-axis, 0 to 1) of CG, SS, and QFG; x-axis from 1 to 50.]
  • 39. Results: Effectiveness
      [Figure: effectiveness (y-axis, 0 to 1) of CG, SS, and QFG; x-axis from 1 to 50.]
  • 40. Outline (current section: 3. Query Suggestion in the Long Tail, A Graph-based approach)
  • 41. Take 3: TQ-Graph
      F. Bonchi, R. Perego, F. Silvestri, H. Vahabi, and R. Venturini. Efficient query recommendations in the long tail via center-piece subgraphs. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2012). 345-354.
  • 42. RWR vs. CePS
      Given a query Q = {t1, t2, ..., tk} there are two different alternatives:
      - Random Walk with Restart (RWR) from nodes corresponding to terms in Q;
      - Center-Piece Subgraph (CePS) (Tong and Faloutsos, KDD 2006) induced by nodes of terms in Q.
      Query: lower heart rate (not occurring in the query log!)
      TQ-Graph Suggestions                     RWR Suggestions
      things to lower heart rate               broken heart
      lower heart rate through exercise        prime rate
      accelerated heart rate and pregnant      exchange rate
      web md                                   bank rate
      heart problems                           currency exchange rate
  • 44. CePS: a Primer
      Given an edge-weighted undirected graph $G$, a set of vertices $Q$ from $G$, and an integer budget $b$, find a connected subgraph $H$ containing the vertices in $Q$ and at most $b$ other vertices that maximizes a "goodness" function $g(H)$:
      $$g(H) = \sum_{j \in H} r(Q, j), \qquad r(Q, j) = \prod_{i \in Q} r(i, j)$$
      where $r(i, j)$ is the stationary probability of term $j$ in a RWR from $i$. The restart probability is $\alpha$.
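The quantities in the primer can be computed on a toy graph with plain power iteration. The sketch below is an illustration, not the CePS extraction algorithm itself: it only computes $r(i, j)$, the combined score $r(Q, j)$, and $g(H)$ for a given node set, and leaves out the search for the best connected subgraph. The adjacency-dict layout, the fixed iteration count, and the handling of isolated nodes are assumptions.

```python
def rwr(graph, start, alpha=0.9, iters=100):
    """Stationary distribution of a random walk with restart.

    graph: dict node -> dict neighbour -> edge weight (symmetric for an
           undirected graph); alpha is the restart probability.
    """
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        for node, p in prob.items():
            total = sum(graph[node].values())
            if total == 0:
                nxt[start] += (1 - alpha) * p  # isolated node: jump back to start
                continue
            for neighbour, weight in graph[node].items():
                nxt[neighbour] += (1 - alpha) * p * weight / total
        nxt[start] += alpha  # restart mass
        prob = nxt
    return prob


def center_piece_scores(graph, query_nodes, alpha=0.9):
    """r(Q, j) = product over i in Q of r(i, j), for every node j."""
    scores = {node: 1.0 for node in graph}
    for q in query_nodes:
        r = rwr(graph, q, alpha)
        for node in graph:
            scores[node] *= r[node]
    return scores


def goodness(scores, subgraph_nodes):
    """g(H) = sum over j in H of r(Q, j)."""
    return sum(scores[node] for node in subgraph_nodes)
```

Because the scores are multiplied across the terms in Q, a node that is close to only one term gets a low combined score, which is what separates the CePS-style suggestions from those of a single RWR.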
  • 45. Assessing Effectiveness
      Datasets
      - Two TQ-Graphs built from MSN and Yahoo! query logs.
      - Two different query sets for evaluation:
          - 50 queries of the standard TREC Web diversification track testbed;
          - 100 queries randomly chosen from the Yahoo! query log.
      Statistics
                              MSN     Yahoo!
      #queries                15M     581M
      #terms                  36M     1,344M
      #query nodes            7M      29M
      #term nodes             2M      6M
      #dangling nodes         15%     35%
      #queries (freq = 1)     5M      162M
      #terms (freq = 1)       5M      2M
  • 46. Assessing Effectiveness
      User Study
      - Top-5 recommendations for each test query and technique.
      - The suggestions were shuffled and presented to 10 non-CS assessors, who were asked to rate them as useful, somewhat useful, or not useful.
      Frequency in the corresponding log of all the queries in the two testbeds.
      [Figure, two panels: frequency on MSN of the 50 TREC queries; frequency on Yahoo! of the 100 random queries.]
  • 47. Setting the parameter α
      TREC on MSN
                   useful  somewhat  not useful
      α = 0.9      57%     16%       27%
      α = 0.5      32%     13%       55%
      α = 0.1      22%     12%       66%
      100 queries on Yahoo!
                   useful  somewhat  not useful
      α = 0.9      48%     11%       41%
      α = 0.5      41%     20%       39%
      α = 0.1      37%     20%       43%
  • 48. Effectiveness on Tail Queries
      TREC on MSN (unseen)
                           useful  somewhat  not useful
      TQ-Graph α = 0.9     46%     10%       44%
      QFG                  0%      0%        100%
      TREC on MSN (dangling)
                           useful  somewhat  not useful
      TQ-Graph α = 0.9     60%     30%       10%
      QFG                  0%      0%        100%
      TREC on MSN (others)
                           useful  somewhat  not useful
      TQ-Graph α = 0.9     59%     17%       24%
      QFG                  61%     13%       26%
  • 49. Effectiveness on Tail Queries
      For popular queries, effectiveness is comparable with that of QFG-based models: on TREC on MSN (others), TQ-Graph with α = 0.9 is rated 59% useful, 17% somewhat, and 24% not useful, against 61%, 13%, and 26% for QFG.
  • 50. CePS Efficiency
      CePS is not NP-hard, but solving it still requires Ω(|Q| × (|E| + |V|)), where:
      - |Q|: number of query terms
      - |E|: number of graph edges
      - |V|: number of queries
      Alternatively: |Q| random walks with restart.
      Yet, it remains infeasible for computing query suggestions online.
  • 51. Stationary Probabilities as Inverted Lists
      Inverted index representation of the RWRs computed on the TQGraph. The lexicon is made up of term nodes; postings are the stationary distribution values.
      [Figure: stationary distribution of query nodes in the TQGraph as obtained by a RWR from Term 1, stored as the posting list of Term 1 and split into buckets [1, ε), [ε, ε²), [ε², ε³), ..., [ε^i, ε^(i+1)).]
      Within buckets, queries are sorted by their IDs. Scores are
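Once the stationary probabilities are stored as inverted lists, answering a query no longer needs a walk on the graph. The sketch below shows one possible lookup, assuming each posting list is a dict from query ID to probability, that scores are combined by the CePS product, and that a candidate missing from any list is dropped. The slides do not specify how terms absent from the lexicon are handled; skipping them is a further assumption, and `suggest` is a hypothetical name.

```python
import heapq
from math import prod


def suggest(query_terms, index, k=5):
    """Top-k query IDs by product of per-term stationary probabilities.

    index: dict term -> dict query_id -> probability (one inverted list per term)
    """
    lists = [index[term] for term in query_terms if term in index]
    if not lists:
        return []
    # Candidates must appear in every list, otherwise their product is unknown.
    common = set(lists[0]).intersection(*lists[1:])
    scored = ((prod(postings[q] for postings in lists), q) for q in common)
    return [q for _, q in heapq.nlargest(k, scored)]
```

The cost of a lookup then depends on the length of the posting lists, which is what pruning and compression aim to reduce.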
  • 52. Stationary Probabilities as Inverted Lists
      For each term we have to store a vector of
      $$\langle queryId_1, pr_1 \rangle, \langle queryId_2, pr_2 \rangle, \ldots, \langle queryId_{|V|}, pr_{|V|} \rangle$$
      - 29 million queries × 6 million terms = 174 trillion entries!
      - Conjecture: most of the entries are useless.
      - Solution: remove the entries with the lowest probability, i.e., apply pruning.
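The size estimate and the pruning step can be checked with a short sketch. It assumes pruning keeps the `p` highest-probability entries of each list, in line with the pruning threshold `p` used in the following slides; the function name `prune` is hypothetical.

```python
import heapq

# Size of the unpruned index on the Yahoo! graph: one entry per (term, query) pair.
full_entries = 29_000_000 * 6_000_000      # 174 trillion
pruned_entries = 6_000_000 * 20_000        # with p = 20,000 entries per term


def prune(posting_list, p):
    """Keep the p (query_id, probability) pairs with the highest probability."""
    return heapq.nlargest(p, posting_list, key=lambda entry: entry[1])
```

With p = 20,000 the number of stored entries drops from 174 trillion to 120 billion, a reduction by a factor of 1,450.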
  • 53. Index Pruning: Effectiveness (MSN)
      [Figure: MSN query log. Percentage of dissimilarity (y-axis, 0 to 70) vs. pruning threshold p (x-axis, 0 to 20,000) for RWR with α = 0.1, α = 0.5, and α = 0.9.]
  • 54. Index Pruning: Effectiveness (Yahoo!)
      [Figure: Yahoo! query log. Percentage of dissimilarity (y-axis, 0 to 70) vs. pruning threshold p × 10³ (x-axis, 20 to 200) for RWR with α = 0.1, α = 0.5, and α = 0.9.]
  • 56. Compressing the Index (I)
      After pruning we have #terms vectors with p entries of ⟨queryId, pr⟩ (inverted lists).
      Is there any efficient way to store the p entries?
      - Sort by queryId, δ encoded (pr stored as it is): ≥ 32 bits per entry.
      - Sort by pr, δ encoded (queryId stored as it is): ≥ log(#query) bits per entry.
      Could this be improved?
  • 57. Compressing the Index (II): Lossy Compression
      - For each term, we sort its p entries by their probability values.
      - We create groups of queryIds with similar probability values (bucketing): bucket $i$ holds the entries with $\epsilon^{i+1} < pr \le \epsilon^{i}$, where $\epsilon \le 1$.
      - QueryIds in a bucket are δ encoded.
      - Given a queryId in bucket $i$, its probability is approximated by $\epsilon^{i+1}$.
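The bucketing scheme can be sketched as follows. The code assumes buckets of the form $\epsilon^{i+1} < pr \le \epsilon^{i}$ with $0 < \epsilon < 1$, so that the stored value $\epsilon^{i+1}$ never exceeds the true probability, and it represents "δ encoded" IDs simply as gaps between sorted query IDs; the bit-level code applied to those gaps is not shown. All function names are hypothetical.

```python
import math


def bucket_index(pr, eps):
    """Index i with eps**(i + 1) < pr <= eps**i, for 0 < eps < 1 and 0 < pr <= 1."""
    i = math.floor(math.log(pr) / math.log(eps))
    # Guard against floating-point error at bucket boundaries.
    while eps ** i < pr:
        i -= 1
    while eps ** (i + 1) >= pr:
        i += 1
    return i


def compress(posting_list, eps):
    """Map bucket index -> gap-encoded, sorted query IDs (probabilities dropped)."""
    buckets = {}
    for query_id, pr in posting_list:
        buckets.setdefault(bucket_index(pr, eps), []).append(query_id)
    encoded = {}
    for i, ids in buckets.items():
        ids.sort()
        encoded[i] = [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]
    return encoded


def decompress(encoded, eps):
    """Rebuild (query_id, approximate probability) pairs from the buckets."""
    entries = []
    for i, gaps in encoded.items():
        query_id = 0
        for gap in gaps:
            query_id += gap
            entries.append((query_id, eps ** (i + 1)))
    return entries
```

Only the bucket boundaries and small ID gaps are stored, which is where the saving in bits per entry comes from; the price is that every probability is rounded down to its bucket's lower bound.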
  • 58. Bits per entry vs. ϵ
      [Figure: MSN query log. Bits per entry (y-axis, 10 to 20) vs. ε (x-axis, 0 to 1) for RWR with α = 0.1, α = 0.5, and α = 0.9.]
      - Smaller ϵ values → fewer buckets and bits per entry.
      - ϵ = 1 → naïve approach.
  • 59. Error guarantees
      Our bucketing scheme introduces approximation errors, but the error is bounded:
      - The approximated value of a probability in the list is at most a factor $\epsilon^{-1}$ smaller than the real value.
      - The error might introduce inversions in the real ranking.
      - For two queries $q$, $q'$ of $m$ terms, with $q$ preceding $q'$ in a suggestion ranking, the inversion cannot happen if $r_q > \epsilon^{-m} r_{q'}$.
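The inversion condition follows in two steps from the per-entry bound. The derivation below is a sketch that assumes a suggestion's score for an $m$-term query is the product of $m$ per-term probabilities, as in the CePS goodness function; $\tilde{r}$ denotes a score computed from the bucketed values and is notation introduced here, not taken from the slides.

```latex
% Each stored probability is underestimated by at most a factor 1/\epsilon,
% so a product of m stored values is underestimated by at most 1/\epsilon^m.
\epsilon \, r(i, q) \le \tilde{r}(i, q) \le r(i, q)
\quad\Longrightarrow\quad
\epsilon^{m} \, r_q \le \tilde{r}_q \le r_q

% An inversion would need \tilde{r}_q < \tilde{r}_{q'}, which is impossible when
\tilde{r}_q \;\ge\; \epsilon^{m} \, r_q \;>\; r_{q'} \;\ge\; \tilde{r}_{q'}
\quad\text{i.e., whenever}\quad r_q > \epsilon^{-m} \, r_{q'}
```

For example, with $\epsilon = 0.95$ and $m = 3$, two suggestions can only swap places if their true scores differ by a factor of less than $0.95^{-3} \approx 1.17$.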
  • 60. Precision Loss after Pruning and Bucketing: avg. dissimilarity
      MSN query log
      p          RWR α = 0.1      RWR α = 0.5      RWR α = 0.9
      5,000      18.48 (8.47)     22.58 (14.11)    42.04 (40.32)
      10,000     17.39 (8.47)     20.97 (12.50)    39.49 (30.65)
      15,000     17.39 (8.47)     20.16 (12.10)    36.31 (25.40)
      20,000     17.39 (8.06)     18.55 (11.29)    33.12 (22.18)
      100,000    17.39 (8.06)     18.55 (11.29)    32.48 (21.77)
      200,000    17.39 (8.06)     18.55 (11.29)    32.48 (21.77)
      Yahoo! query log
      p          RWR α = 0.1      RWR α = 0.5      RWR α = 0.9
      5,000      33.75 (40.30)    37.87 (42.22)    45.11 (47.12)
      10,000     27.76 (34.12)    32.84 (36.89)    40.23 (42.22)
      15,000     26.18 (31.13)    31.36 (34.33)    38.22 (39.23)
      20,000     23.97 (27.72)    28.70 (31.56)    37.07 (38.38)
      100,000    19.24 (17.48)    21.89 (20.26)    31.32 (28.78)
      200,000    19.24 (17.70)    21.30 (19.62)    31.90 (28.14)
      With ϵ = 0.95 and p = 20k we have about a 37% difference in results, with 14 bits per entry instead of 71. We save 1.1 PB in the case of the Yahoo! query log.
  • 61. Effectiveness after Pruning and Bucketing: User Study
      Effectiveness of the suggestions provided with pruning and bucketing as a function of p, for ϵ = 0.95 and α = 0.9.
      MSN query log
      p          useful  somewhat  not useful
      5,000      56%     17%       27%
      20,000     55%     15%       30%
      200,000    55%     15%       30%
      Yahoo! query log
      p          useful  somewhat  not useful
      5,000      46%     29%       25%
      20,000     47%     29%       24%
      200,000    46%     28%       26%
      For comparison, the share of useful suggestions reported earlier for α = 0.9 was 57% in the case of MSN and 48% in the case of Yahoo!.
  • 63. Scaling-up Suggestion Building
      Does it scale?
      - We pre-compute and store the inverted index.
      - Maintaining the entire inverted index in main memory is still not feasible.
      - Caching!
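A cache of posting lists in front of the on-disk index might look like the sketch below. The slides do not name a replacement policy, so least-recently-used eviction is an assumption here, as is measuring capacity in number of lists rather than bytes; the class name and the `load_from_disk` callback are hypothetical.

```python
from collections import OrderedDict


class PostingListCache:
    """LRU cache mapping a term to its (compressed) posting list."""

    def __init__(self, capacity, load_from_disk):
        self._capacity = capacity
        self._load = load_from_disk  # callable: term -> posting list
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, term):
        if term in self._cache:
            self._cache.move_to_end(term)  # mark as most recently used
            self.hits += 1
            return self._cache[term]
        self.misses += 1
        postings = self._load(term)
        self._cache[term] = postings
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used list
        return postings

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0
```

Replaying the terms of a query log through `get` and reading `miss_rate()` gives the kind of cache-miss percentage plotted against cache size in the next slides.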
  • 64. MSN Cache Miss (%)
      [Figure: MSN query log. Percentage of cache misses (y-axis, 0 to 70) vs. cache size (x-axis: 1, 2, 4, 8, 16, 32 GB). Curves: 5,000 entries (15.81 bits), 20,000 entries (16.33 bits), 200,000 entries (15.21 bits), 5,000 entries (73.71 bits), 20,000 entries (73.21 bits), 200,000 entries (70.05 bits).]
      With 8 GB of main memory we have a cache miss rate < 10%.
  • 65. Yahoo! Cache Miss (%)
      [Figure: Yahoo! query log. Percentage of cache misses (y-axis, 0 to 100) vs. cache size (x-axis: 1, 2, 4, 8, 16, 32 GB). Curves: 5,000 entries (13.67 bits), 20,000 entries (14.27 bits), 200,000 entries (16.09 bits), 5,000 entries (72.33 bits), 20,000 entries (71.32 bits), 200,000 entries (70.30 bits).]
      Even in the case of the big Yahoo! log, with 8 GB of main memory we have a cache miss rate < 10%.
  • 66. Outline (current section: 4. Conclusion and Future Work, Conclusion)
  • 67. Conclusions
      - Query recommendation is a tough problem.
      - We have proposed an overall idea (i.e., reducing the length of users' querying sessions) and three diverse techniques addressing it.
      - The graph-based technique is the one that best balances requirements and "theoretical" justification.
      - Query recommendation based on TQ-Graph reaches up to 99% coverage.
      - An index for speeding up online TQ-Graph computation:
          - reduces space occupancy by 80% on average;
          - achieves a 95% hit ratio with a few gigabytes of main memory.
  • 68. Outline (current section: 4. Conclusion and Future Work, Applications of Search Shortcuts)
  • 69. Other Applications of Shortcuts
      Europeana.eu
      - Shortcuts will be used as the suggestion mechanism for queries submitted to the Europeana portal.
      Search Results Diversity
      - G. Capannini, F. M. Nardini, R. Perego, F. Silvestri. Efficient Diversification of Web Search Results. PVLDB 4(7). 451-459 (2011).
  • 70. Other Applications of Shortcuts
      Session Retrieval
      - I. A. Adeyanju, F. M. Nardini, M-D. Albakour, D. S., U. Kruschwitz. RGU-ISTI-Essex at TREC 2011 Session Track. Proceedings of the International Text REtrieval Conference (TREC) 2011. November 2011.
      Intranet Query Recommendations
      - I. A. Adeyanju, D. Song, M-D. Albakour, U. Kruschwitz, A. De Roeck, and M. Fasli. Adaptation of the concept hierarchy model with search logs for query recommendation on intranets. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (SIGIR 2012). 5-14.
  • 71. Outline (current section: 4. Conclusion and Future Work, Future Directions)
  • 72. Beyond Query Suggestion
      Task Recommendation
      - Discover which user search tasks are composed together, and how, to accomplish even more complex missions.
      - Suggestions should also refer to other tasks of a bigger mission (i.e., task recommendation).
  • 73. Thanks