23. Federated search [architecture diagram: the broker forwards the query to collections A–E; each collection returns its result list (Sum A … Sum E), and the broker merges these into a single list of merged results]
30. Data fusion [diagram: a query is run against one document collection (GOV2) using different document representations (anchor only, title only) and different retrieval models (BM25, KL, Inquery); the resulting ranked lists are merged into one ranked result list (e.g. Voorhees et al., 1995)]
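The merging step above can be sketched with CombSUM-style data fusion: each run's scores are normalized so they are comparable, then a document's fused score is the sum of its normalized scores across runs. The run names and scores below are hypothetical, for illustration only.

```python
# CombSUM-style data fusion sketch (illustrative scores, not real run data).

def min_max_normalize(run):
    """Map one run's scores to [0, 1] so different runs are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {d: 1.0 for d in run}
    return {d: (s - lo) / (hi - lo) for d, s in run.items()}

def comb_sum(runs):
    """A document's fused score is the sum of its normalized scores."""
    fused = {}
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical runs over the same collection (e.g. BM25 vs KL scores):
bm25 = {"d1": 12.0, "d2": 9.5, "d3": 4.0}
kl   = {"d2": 0.8, "d1": 0.7, "d4": 0.2}
print(comb_sum([bm25, kl]))  # ['d1', 'd2', 'd3', 'd4']
```

Documents retrieved by several runs tend to rise in the fused ranking, which is the intuition behind this family of fusion methods.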
76. Resource selection: how to select the resource(s) to be searched for relevant documents.
77. Resource selection for federated search [diagram: the broker forwards the query only to a selected subset of collections A–E rather than to all of them; the selected collections return their result lists (Sum A … Sum E)]
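A much-simplified sketch of how a broker might rank collections (not the exact CORI formula): score each collection by how often the query terms occur in its term statistics, discounting terms that appear in many collections. The collection names and statistics are made up for illustration.

```python
import math

# Simplified resource-selection sketch (illustrative, not the CORI formula):
# rank collections by query-term document frequencies, with an IDF-like
# discount for terms that occur in many collections.

def select_resources(query_terms, coll_stats, k=2):
    """coll_stats: {collection: {term: document_frequency}}; returns top-k."""
    n_coll = len(coll_stats)
    scores = {}
    for coll, stats in coll_stats.items():
        score = 0.0
        for t in query_terms:
            df = stats.get(t, 0)
            if df:
                # number of collections containing the term
                cf = sum(1 for s in coll_stats.values() if t in s)
                score += math.log(1 + df) * math.log(1 + n_coll / cf)
        scores[coll] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical per-collection term statistics:
stats = {
    "A": {"jaguar": 40, "car": 5},
    "B": {"jaguar": 2, "cat": 90},
    "C": {"car": 80, "engine": 60},
}
print(select_resources(["jaguar", "car"], stats))  # ['A', 'C']
```

The broker then forwards the query only to the selected collections, saving the cost of searching resources unlikely to hold relevant documents.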
115. Result presentation: how to present results retrieved from several resources to users.
118. Result merging in federated search [diagram: the broker forwards the user's query to collections A–E; each returns its result list (Sum A … Sum E), and the broker merges these into a single list of merged results returned to the user]
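A minimal sketch of the broker's merging step: per-collection scores are not directly comparable, so each result list is min-max normalized before being sorted into one merged ranking. Collection names, document ids, and scores below are hypothetical.

```python
# Result-merging sketch at the broker (illustrative data):
# normalize each collection's scores to [0, 1], then sort globally.

def merge_results(result_lists):
    """result_lists: {collection: [(doc_id, raw_score), ...] best-first}."""
    merged = []
    for coll, results in result_lists.items():
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        for doc, s in results:
            norm = 1.0 if hi == lo else (s - lo) / (hi - lo)
            merged.append((norm, coll, doc))
    # stable sort by normalized score; ties keep original order
    merged.sort(key=lambda t: t[0], reverse=True)
    return [(coll, doc) for _, coll, doc in merged]

lists = {
    "A": [("a1", 20.0), ("a2", 15.0)],   # e.g. BM25-like scores
    "B": [("b1", 0.9), ("b2", 0.1)],     # e.g. probability-like scores
}
print(merge_results(lists))
# [('A', 'a1'), ('B', 'b1'), ('A', 'a2'), ('B', 'b2')]
```

Without normalization, collection A's raw scores would dominate the merged list simply because they are on a larger scale.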
123. Slotted vs tiled result presentation: 3 verticals × 3 positions × 3 degrees of vertical intent (Sushmita et al., 2010) [layouts examined: images on top, in the middle, at the bottom, at top-right, on the left, at bottom-right]
125. Recap – Result presentation

                  | federated search              | aggregated search
  Content type    | homogeneous (text documents)  | heterogeneous
  Document scores | depends on environment        | heterogeneous
  Oracle          | centralized index             | none
127. Evaluation: how to measure the effectiveness of federated and aggregated search systems.
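For TREC-style evaluation with per-document relevance judgments (R/N), a basic effectiveness measure is precision at rank k over the merged result list. The document ids and judgments below are made up for illustration.

```python
# TREC-style precision@k sketch over a merged result list
# (illustrative documents and judgments).

def precision_at_k(ranking, judgments, k):
    """Fraction of the top-k documents judged relevant ('R')."""
    top = ranking[:k]
    return sum(1 for d in top if judgments.get(d) == "R") / k

judgments = {"d1": "R", "d2": "N", "d3": "R", "d4": "R"}
ranking = ["d1", "d2", "d3", "d5"]   # d5 is unjudged, counted non-relevant
print(precision_at_k(ranking, judgments, 4))  # 2 relevant in top 4 -> 0.5
```

Measures of this kind rely on editorial relevance judgments; as the recap slide notes, aggregated search additionally needs query-level labels (which verticals a query intends) and behavioral data.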
134. Test collections (à la TREC) (Zhou & Lalmas, 10)

Statistics on topics:
  number of topics: 150
  average rel docs per topic: 110.3
  average rel verticals per topic: 1.75
  ratio of "General Web" topics: 29.3%
  ratio of topics with two vertical intents: 66.7%
  ratio of topics with more than two vertical intents: 4.0%

Quantity by media type:
                        | text       | image   | video  | total
  size (GB)             | 2,125      | 41.1    | 445.5  | 2,611.6
  number of documents   | 86,186,315 | 670,439 | 1,253* | 86,858,007

  * On average, each video clip (document) contains more than 100 events/shots.
135. Test collections (à la TREC): simulated verticals built from existing test collections. Existing collections (ImageCLEF photo retrieval track, TREC web track, INEX ad-hoc track, TREC blog track, …) each provide, for a topic t1, documents d1, d2, …, dn with relevance judgments (R/N). These are mapped to simulated verticals (Blog, Reference/Encyclopedia, Image, General Web, Shopping, …), so that each topic t1 carries, for every vertical V1 … Vk, a judged document list d1 … dV with R/N labels.
136. Recap – Evaluation

                  | federated search              | aggregated search
  Editorial data  | document relevance judgments  | query labels
  Behavioral data | none                          | critical