23. Federated search [architecture diagram: the broker forwards the query to collections A–E; each collection returns its result list (Sum A … Sum E), and the broker merges these into a single list of merged results]
30. Data fusion [diagram: a query is run against one document collection (GOV2) using different document representations (anchor only, title only) and different retrieval models (BM25, KL, Inquery); the resulting ranked lists are merged into one ranked result list (e.g. Voorhees et al., 1995)]
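The merging step above can be sketched with CombSUM-style data fusion: each run's scores are normalized so they are comparable, then a document's fused score is the sum of its normalized scores across runs. The run names and scores below are hypothetical, for illustration only.

```python
# CombSUM-style data fusion sketch (illustrative scores, not real run data).

def min_max_normalize(run):
    """Map one run's scores to [0, 1] so different runs are comparable."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {d: 1.0 for d in run}
    return {d: (s - lo) / (hi - lo) for d, s in run.items()}

def comb_sum(runs):
    """A document's fused score is the sum of its normalized scores."""
    fused = {}
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical runs over the same collection (e.g. BM25 vs KL scores):
bm25 = {"d1": 12.0, "d2": 9.5, "d3": 4.0}
kl   = {"d2": 0.8, "d1": 0.7, "d4": 0.2}
print(comb_sum([bm25, kl]))  # ['d1', 'd2', 'd3', 'd4']
```

Documents retrieved by several runs tend to rise in the fused ranking, which is the intuition behind this family of fusion methods.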
76. Resource selection: how to select the resource(s) to be searched for relevant documents.
77. Resource selection for federated search [diagram: the broker forwards the query only to a selected subset of collections A–E rather than to all of them; the selected collections return their result lists (Sum A … Sum E)]
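A much-simplified sketch of how a broker might rank collections (not the exact CORI formula): score each collection by how often the query terms occur in its term statistics, discounting terms that appear in many collections. The collection names and statistics are made up for illustration.

```python
import math

# Simplified resource-selection sketch (illustrative, not the CORI formula):
# rank collections by query-term document frequencies, with an IDF-like
# discount for terms that occur in many collections.

def select_resources(query_terms, coll_stats, k=2):
    """coll_stats: {collection: {term: document_frequency}}; returns top-k."""
    n_coll = len(coll_stats)
    scores = {}
    for coll, stats in coll_stats.items():
        score = 0.0
        for t in query_terms:
            df = stats.get(t, 0)
            if df:
                # number of collections containing the term
                cf = sum(1 for s in coll_stats.values() if t in s)
                score += math.log(1 + df) * math.log(1 + n_coll / cf)
        scores[coll] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical per-collection term statistics:
stats = {
    "A": {"jaguar": 40, "car": 5},
    "B": {"jaguar": 2, "cat": 90},
    "C": {"car": 80, "engine": 60},
}
print(select_resources(["jaguar", "car"], stats))  # ['A', 'C']
```

The broker then forwards the query only to the selected collections, saving the cost of searching resources unlikely to hold relevant documents.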
115. Result presentation: how to present results retrieved from several resources to users.
118. Result merging in federated search [diagram: the broker forwards the user's query to collections A–E; each returns its result list (Sum A … Sum E), and the broker merges these into a single list of merged results returned to the user]
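A minimal sketch of the broker's merging step: per-collection scores are not directly comparable, so each result list is min-max normalized before being sorted into one merged ranking. Collection names, document ids, and scores below are hypothetical.

```python
# Result-merging sketch at the broker (illustrative data):
# normalize each collection's scores to [0, 1], then sort globally.

def merge_results(result_lists):
    """result_lists: {collection: [(doc_id, raw_score), ...] best-first}."""
    merged = []
    for coll, results in result_lists.items():
        scores = [s for _, s in results]
        lo, hi = min(scores), max(scores)
        for doc, s in results:
            norm = 1.0 if hi == lo else (s - lo) / (hi - lo)
            merged.append((norm, coll, doc))
    # stable sort by normalized score; ties keep original order
    merged.sort(key=lambda t: t[0], reverse=True)
    return [(coll, doc) for _, coll, doc in merged]

lists = {
    "A": [("a1", 20.0), ("a2", 15.0)],   # e.g. BM25-like scores
    "B": [("b1", 0.9), ("b2", 0.1)],     # e.g. probability-like scores
}
print(merge_results(lists))
# [('A', 'a1'), ('B', 'b1'), ('A', 'a2'), ('B', 'b2')]
```

Without normalization, collection A's raw scores would dominate the merged list simply because they are on a larger scale.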
123. Slotted vs tiled result presentation: 3 verticals × 3 positions × 3 degrees of vertical intent (Sushmita et al., 2010) [layouts examined: images on top, in the middle, at the bottom, at top-right, on the left, at bottom-right]
125. Recap – Result presentation

                  | federated search              | aggregated search
  Content type    | homogeneous (text documents)  | heterogeneous
  Document scores | depends on environment        | heterogeneous
  Oracle          | centralized index             | none
127. Evaluation: how to measure the effectiveness of federated and aggregated search systems.
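For TREC-style evaluation with per-document relevance judgments (R/N), a basic effectiveness measure is precision at rank k over the merged result list. The document ids and judgments below are made up for illustration.

```python
# TREC-style precision@k sketch over a merged result list
# (illustrative documents and judgments).

def precision_at_k(ranking, judgments, k):
    """Fraction of the top-k documents judged relevant ('R')."""
    top = ranking[:k]
    return sum(1 for d in top if judgments.get(d) == "R") / k

judgments = {"d1": "R", "d2": "N", "d3": "R", "d4": "R"}
ranking = ["d1", "d2", "d3", "d5"]   # d5 is unjudged, counted non-relevant
print(precision_at_k(ranking, judgments, 4))  # 2 relevant in top 4 -> 0.5
```

Measures of this kind rely on editorial relevance judgments; as the recap slide notes, aggregated search additionally needs query-level labels (which verticals a query intends) and behavioral data.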
134. Test collections (à la TREC) (Zhou & Lalmas, 10)

Statistics on topics:
  number of topics: 150
  average rel docs per topic: 110.3
  average rel verticals per topic: 1.75
  ratio of "General Web" topics: 29.3%
  ratio of topics with two vertical intents: 66.7%
  ratio of topics with more than two vertical intents: 4.0%

Quantity by media type:
                        | text       | image   | video  | total
  size (GB)             | 2,125      | 41.1    | 445.5  | 2,611.6
  number of documents   | 86,186,315 | 670,439 | 1,253* | 86,858,007

  * On average, each video clip (document) contains more than 100 events/shots.
135. Test collections (à la TREC): simulated verticals built from existing test collections. Existing collections (ImageCLEF photo retrieval track, TREC web track, INEX ad-hoc track, TREC blog track, …) each provide, for a topic t1, documents d1, d2, …, dn with relevance judgments (R/N). These are mapped to simulated verticals (Blog, Reference/Encyclopedia, Image, General Web, Shopping, …), so that each topic t1 carries, for every vertical V1 … Vk, a judged document list d1 … dV with R/N labels.
136. Recap – Evaluation

                  | federated search              | aggregated search
  Editorial data  | document relevance judgments  | query labels
  Behavioral data | none                          | critical