Overview of the Living Labs for
IR Evaluation (LL4IR) CLEF Lab
http://living-labs.net
@livinglabsnet
“Give us your ranking, we’ll have it clicked!”
Krisztian Balog

University of Stavanger
Liadh Kelly

Trinity College Dublin
Anne Schuth

Blendle
7th International Conference of the CLEF Association (CLEF 2016) | Évora, Portugal, 2016
Living Labs for IR Evaluation
Motivation
- Overall goal: make information retrieval
evaluation more realistic
[Diagram: new retrieval method, users, live site, interaction data]
#1: How to test a new method with real users in their natural task environment (i.e., on the live site)?
#2: How to make interaction data available for method development?
Key idea
[Diagram: new retrieval methods, API, live site with users, data (docs/products, logs, etc.)]
K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM'14
#1: An API orchestrates all data exchange between the live site and experimental systems.
#2: Focus on frequent (head) queries
- Ranked result lists can be generated offline
- There is enough traffic on them (historical and live)
#3: Medium-to-large organizations with a fair amount of search volume
- Typically lack their own R&D department
Methodology
1. Queries, candidate documents, and historical search and click data are made available through the API
Queries:
{
"queries": [
{
"creation_time": "Wed, 22 Apr 2015 09:15:41 -0000",
"qid": "R-q1",
"qstr": "monster high",
"type": "train"
},
{
"creation_time": "Wed, 22 Apr 2015 09:15:41 -0000",
"qid": "R-q51",
Candidate documents for a query (doclist):
{
"doclist": [
{
"docid": "R-d1291",
"site_id": "R",
"title": "LEGO DUPLO Hamupipőke hintója 6153"
},
{
"docid": "R-d1306",
"site_id": "R",
"title": "LEGO Rendőrkapitányság 5681"
Document content:
{
"content": {
"age_max": 3,
"age_min": 1,
"arrived": "2014-08-28",
"available": 0,
"brand": "Lego",
"category": "LEGO",
"category_id": "38",
"characters": [],
"description": "Lego Duplo - Építő-és j
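A participant's first step is to turn these payloads into something it can rank. The sketch below is a minimal illustration that assumes responses shaped like the fragments above (a `queries` list whose records carry `qid`, `qstr`, and `type`, and a `doclist` list whose records carry `docid`). The payloads are abridged, the second query record (`R-q-hyp`) is hypothetical, and the function names are our own rather than part of the living labs API.

```python
import json
from typing import Dict, List

# Abridged payloads shaped like the API responses above.
# The second query record (R-q-hyp) is hypothetical, for illustration only.
QUERIES_JSON = """
{"queries": [
  {"creation_time": "Wed, 22 Apr 2015 09:15:41 -0000",
   "qid": "R-q1", "qstr": "monster high", "type": "train"},
  {"creation_time": "Wed, 22 Apr 2015 09:15:41 -0000",
   "qid": "R-q-hyp", "qstr": "lego duplo", "type": "test"}
]}
"""

DOCLIST_JSON = """
{"doclist": [
  {"docid": "R-d1291", "site_id": "R", "title": "LEGO DUPLO Hamupipőke hintója 6153"},
  {"docid": "R-d1306", "site_id": "R", "title": "LEGO Rendőrkapitányság 5681"}
]}
"""


def split_queries_by_type(payload: Dict) -> Dict[str, List[Dict]]:
    """Group query records by their "type" field (e.g. "train" or "test")."""
    groups: Dict[str, List[Dict]] = {}
    for query in payload["queries"]:
        groups.setdefault(query["type"], []).append(query)
    return groups


def candidate_docids(payload: Dict) -> List[str]:
    """Return the candidate docids of a doclist payload, in the order given."""
    return [doc["docid"] for doc in payload["doclist"]]


if __name__ == "__main__":
    groups = split_queries_by_type(json.loads(QUERIES_JSON))
    for qtype, queries in groups.items():
        print(qtype, [q["qstr"] for q in queries])
    print(candidate_docids(json.loads(DOCLIST_JSON)))
```

Keeping train and test queries apart matters because, as the benchmark organization below describes, the two query types receive different kinds of feedback.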
Methodology
2. Rankings are generated for each query and uploaded through the API
{
"qid": "U-q22",
"runid": "82",
"creation_time": "Wed, 04 Jun 2014 15:03:56 -0000",
"doclist": [
{
"docid": "U-d4"
},
{
"docid": "U-d2"
}, ...
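Because the site only re-ranks items from its own candidate set (the safety mechanism noted under Interleaving), it makes sense to validate a ranking before uploading it. A minimal sketch, assuming the run format shown above (`qid`, `runid`, and an ordered `doclist` of `docid`s). `build_run` is a hypothetical helper, and it omits `creation_time` on the assumption that the participant does not have to supply it.

```python
import json
from typing import Dict, List


def build_run(qid: str, runid: str, ranking: List[str], candidates: List[str]) -> Dict:
    """Build a run payload shaped like the example above (hypothetical helper).

    Rejects docids outside the site's candidate set and duplicate docids,
    since the site can only re-rank the candidates it provided.
    creation_time is omitted (assumption: not required from the participant).
    """
    allowed = set(candidates)
    unknown = [docid for docid in ranking if docid not in allowed]
    if unknown:
        raise ValueError(f"docids not in the candidate set: {unknown}")
    if len(set(ranking)) != len(ranking):
        raise ValueError("ranking contains duplicate docids")
    return {
        "qid": qid,
        "runid": runid,
        # Order of the list encodes the ranking, best document first.
        "doclist": [{"docid": docid} for docid in ranking],
    }


if __name__ == "__main__":
    run = build_run("U-q22", "82", ["U-d4", "U-d2"], candidates=["U-d2", "U-d4"])
    print(json.dumps(run, indent=2))
```

Failing early on an unknown docid is cheaper than discovering later that part of a run could not be shown to users.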
Methodology
3. When any of the test queries is issued, the live site requests rankings from the API and interleaves them with those of the production system
Interleaving
- Site provides the set of candidate items that can be
re-ranked (safety mechanism)

- Experimental ranking is interleaved with the
production ranking

- Needs 1-2 orders of magnitude less data than A/B testing (also, it is a within-subject rather than a between-subject design)
system A | system B | interleaved list
doc 1 | doc 2 | doc 1
doc 2 | doc 4 | doc 2
doc 3 | doc 7 | doc 4
doc 4 | doc 1 | doc 3
doc 5 | doc 3 | doc 7
Inference: A > B
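The feedback in step 4 below is labelled with type `tdi`, which we read as team-draft interleaving; the following is a minimal sketch of that technique under this assumption, not the platform's implementation. In each round a coin flip decides which system picks first, each system then adds its highest-ranked document not yet shown, and every shown document is credited to the system that picked it. The `coin` parameter is our own addition so that the draft can be made deterministic; with flips "A first, B first, B first" it reproduces the interleaved list in the example above. Which documents were clicked in that example is not recoverable, so the clicks below are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def team_draft_interleave(
    ranking_a: List[str],
    ranking_b: List[str],
    length: int,
    coin: Callable[[], bool] = lambda: random.random() < 0.5,
) -> Tuple[List[str], Dict[str, str]]:
    """Team-draft interleaving of two rankings.

    In every round, coin() decides which team picks first (True means A);
    each team then adds its highest-ranked document not yet in the list.
    Returns the interleaved list and a docid -> team ("A" or "B") map.
    """
    interleaved: List[str] = []
    team_of: Dict[str, str] = {}

    def next_unused(ranking: List[str]) -> Optional[str]:
        return next((doc for doc in ranking if doc not in team_of), None)

    while len(interleaved) < length:
        teams = [("A", ranking_a), ("B", ranking_b)]
        if not coin():
            teams.reverse()  # B picks first in this round
        picked = False
        for name, ranking in teams:
            if len(interleaved) >= length:
                break
            doc = next_unused(ranking)
            if doc is not None:
                interleaved.append(doc)
                team_of[doc] = name
                picked = True
        if not picked:  # both rankings are exhausted
            break
    return interleaved, team_of


def infer_preference(team_of: Dict[str, str], clicked: List[str]) -> str:
    """Credit each click to the team that contributed the clicked document."""
    credit = Counter(team_of[doc] for doc in clicked if doc in team_of)
    if credit["A"] > credit["B"]:
        return "A>B"
    if credit["B"] > credit["A"]:
        return "B>A"
    return "tie"


if __name__ == "__main__":
    system_a = ["doc 1", "doc 2", "doc 3", "doc 4", "doc 5"]
    system_b = ["doc 2", "doc 4", "doc 7", "doc 1", "doc 3"]
    flips = iter([True, False, False])  # A first, then B first twice
    merged, team_of = team_draft_interleave(system_a, system_b, 5, coin=lambda: next(flips))
    print(merged)  # ['doc 1', 'doc 2', 'doc 4', 'doc 3', 'doc 7']
    # Hypothetical clicks on doc 1 and doc 3, both contributed by A:
    print(infer_preference(team_of, ["doc 1", "doc 3"]))  # A>B
```

Because both systems contribute to the same result list shown to the same user, each impression yields a direct comparison, which is the within-subject property that lets interleaving get by with less data than A/B testing.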
Methodology
4. Participants get detailed feedback on user interactions (clicks)
{
"feedback": [
{
"qid": "S-q1",
"runid": "baseline",
"type": "tdi",
"doclist": [
{
"docid": "S-d1",
"clicked": true,
"team": "site",
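Each feedback record lists the documents that were shown, with a `clicked` flag and the `team` that contributed each one. A minimal sketch of scoring a single impression from the participant's point of view, assuming `team` takes the values `"site"` and `"participant"` (only `"site"` is visible in the fragment above) and that the team with more clicked documents wins. `impression_outcome` is a hypothetical helper, and docids `S-d2` and `S-d3` in the usage example are hypothetical.

```python
from collections import Counter
from typing import Dict


def impression_outcome(feedback: Dict) -> str:
    """Score one feedback record for the participant: "win", "loss", or "tie".

    Assumes doclist entries carry "clicked" (bool) and "team"
    ("site" or "participant"); the team with more clicks wins.
    """
    clicks = Counter(
        entry["team"] for entry in feedback["doclist"] if entry.get("clicked")
    )
    # Counter returns 0 for a team that received no clicks.
    if clicks["participant"] > clicks["site"]:
        return "win"
    if clicks["participant"] < clicks["site"]:
        return "loss"
    return "tie"


if __name__ == "__main__":
    # Hypothetical record in the shape of the fragment above.
    record = {
        "qid": "S-q1",
        "runid": "baseline",
        "type": "tdi",
        "doclist": [
            {"docid": "S-d1", "clicked": True, "team": "site"},
            {"docid": "S-d2", "clicked": True, "team": "participant"},
            {"docid": "S-d3", "clicked": True, "team": "participant"},
        ],
    }
    print(impression_outcome(record))  # win
```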
Methodology
5. The ultimate measure is the number of “wins” against the production system (aggregated over a period of time):
$$\text{Outcome} = \frac{\#\text{Wins}}{\#\text{Wins} + \#\text{Losses}}$$
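Ties appear in neither the numerator nor the denominator of the outcome measure, so they count toward neither side. A minimal sketch of the aggregation, assuming each impression has already been labelled `"win"`, `"loss"`, or `"tie"` for the experimental system; the function name is illustrative.

```python
from typing import Iterable, Optional


def outcome(results: Iterable[str]) -> Optional[float]:
    """Outcome = #wins / (#wins + #losses) against the production system.

    `results` holds one "win", "loss", or "tie" per impression; ties are
    ignored. Returns None while there are no wins or losses to compare.
    """
    results = list(results)
    wins = results.count("win")
    losses = results.count("loss")
    if wins + losses == 0:
        return None
    return wins / (wins + losses)


if __name__ == "__main__":
    print(outcome(["win", "loss", "tie", "win"]))  # 0.6666666666666666
```

An outcome above 0.5 means the experimental ranking won more comparisons than it lost against the production system.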
What is in it for
participants?
- Access to privileged commercial data 

- (Search and click-through data)
- Opportunity to test IR systems with real,
unsuspecting users in a live setting

- (Not the same as crowdsourcing!)
- (Continuous evaluation is possible, not limited to
yearly evaluation cycle)
The Living Labs Platform
Source code

https://bitbucket.org/living-labs/ll-api
Documentation

http://doc.living-labs.net/
Dashboard

http://dashboard.living-labs.net/
CLEF LL4IR
Use-cases
• Product search (REGIO Játék)
• Web search (Seznam)
Benchmark organization
Query type | Training period | Test period
train | feedback available; individual feedback; update possible | feedback available; individual feedback; update possible
test | feedback available; no individual feedback; update possible | no feedback available; no individual feedback; update not possible
Product search
- Ad-hoc retrieval over a product catalog

- Several thousand products

- Limited amount of text, lots of structure

- Categories, characters, brands, etc.
Product data
- Product name
- Price / bonus price
- Short description
- Recommended age from/to
- Gender recommendation
- Categories
- Brands
- Long description
- (Links to) photos
{
"content": {
"age_max": 10,
"age_min": 6,
"arrived": "2014-08-28",
"available": 1,
"brand": "Mattel",
"category": "Babák, kellékek",
"category_id": "25",
"characters": [],
"description": "A Monster High® iskola szörnycsemetéi […]",
"gender": 2,
"main_category": "Baba, babakocsi",
"main_category_id": "3",
"photos": [
"http://regiojatek.hu/data/regio_images/normal/20777_0.jpg",
"http://regiojatek.hu/data/regio_images/normal/20777_1.jpg",
[…]
],
"price": 8675.0,
"product_name": "Monster High Scaris Paravárosi baba többféle",
"queries": {
"clawdeen": "0.037",
"monster": "0.222",
"monster high": "0.741"
},
"short_description": "A Monster High® iskola szörnycsemetéi első külföldi útjukra indulnak..."
},
"creation_time": "Mon, 11 May 2015 04:52:59 -0000",
"docid": "R-d43",
"site_id": "R",
"title": "Monster High Scaris Paravárosi baba többféle"
}
The "queries" field lists frequent queries that led to the product.
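The `queries` field suggests an obvious ranking feature. The sketch below is a naive, hypothetical baseline (not any participant's actual method) that orders candidate products by the weight of the exact query string in their `queries` field and places products with `"available": 0` last. Treating those weights as comparable relevance scores is our assumption; `HYP-1` and `HYP-2` are hypothetical products placed next to `R-d43` from the example above.

```python
from typing import Dict, List


def rank_by_query_weight(query: str, products: List[Dict]) -> List[str]:
    """Order candidate products for `query` (naive illustrative baseline).

    Sort key: available products first, then the weight of `query` in the
    product's "queries" field (stored as a string; 0 when absent).
    Python's sort is stable, so ties keep their original candidate order.
    """
    def sort_key(product: Dict):
        content = product.get("content", {})
        weight = float(content.get("queries", {}).get(query, 0))
        return (content.get("available", 0), weight)

    return [p["docid"] for p in sorted(products, key=sort_key, reverse=True)]


if __name__ == "__main__":
    products = [
        # HYP-1 and HYP-2 are hypothetical; R-d43 mirrors the example above.
        {"docid": "HYP-1", "content": {"available": 0, "queries": {"monster high": "0.9"}}},
        {"docid": "R-d43", "content": {"available": 1, "queries": {
            "clawdeen": "0.037", "monster": "0.222", "monster high": "0.741"}}},
        {"docid": "HYP-2", "content": {"available": 1, "queries": {}}},
    ]
    print(rank_by_query_weight("monster high", products))  # ['R-d43', 'HYP-2', 'HYP-1']
```

Since queries are typically very short and head queries recur, an exact-match lookup like this is cheap to precompute offline for every test query.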
Queries
- Typically very short
monster high
magnetiz
duplo
lego friends
geomag
trash+pack
barbie
monopoly
lego duplo
transformers
star wars
nerf
carrera
baba
Results (2015)
[Figure: Outcome (y-axis, 0 to 0.6) per evaluation round (x-axis, rounds 0 to 5) for Baseline, UiS, GESIS, and IRIT]
Inventory changes
[Figure: number of products per day (x-axis, 05-01 to 05-15) that were new arrivals, became available, or became unavailable]
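Inventory changes mean that the candidate set for a query can shift from one day to the next. One way such daily counts could be derived, sketched under our own assumptions: compare two daily snapshots mapping each docid to its `available` flag, treat a docid that appears only in the later snapshot as a new arrival, and treat flips from 0 to 1 and from 1 to 0 as becoming available or unavailable. The snapshot format, the function name, and the docids in the example are hypothetical; products dropped from the catalog are ignored.

```python
from typing import Dict, List


def inventory_changes(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, List[str]]:
    """Compare two daily snapshots of docid -> "available" flag (0 or 1).

    Docids missing from `after` (removed products) are not reported.
    """
    return {
        "new_arrival": sorted(d for d in after if d not in before),
        "became_available": sorted(
            d for d in after if d in before and before[d] == 0 and after[d] == 1
        ),
        "became_unavailable": sorted(
            d for d in after if d in before and before[d] == 1 and after[d] == 0
        ),
    }


if __name__ == "__main__":
    # Hypothetical snapshots for two consecutive days.
    day_1 = {"R-d1": 1, "R-d2": 0, "R-d3": 1}
    day_2 = {"R-d1": 0, "R-d2": 1, "R-d3": 1, "R-d4": 1}
    changes = inventory_changes(day_1, day_2)
    print({kind: len(docids) for kind, docids in changes.items()})
```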
Summary and Outlook
Summary
- Successes
  - Experimental methodology
    - Many interesting opportunities to address current limitations (come to the NewsREEL & LL4IR session tomorrow)
  - The living labs platform
    - Open source, can be used for a variety of tasks
  - Some interesting work for product search
    - See the Best of the Labs session
- Lack of success
  - Raising sufficient interest in the use-cases at CLEF
Limitations / Open issues
- Head queries only: they account for a considerable portion of traffic, but cover only popular information needs

- Lack of context: No knowledge of the searcher’s
location, previous searches, etc.

- No real-time feedback: API provides detailed
feedback, but it’s not immediate

- Limited control: Experimentation is limited to single
searches, where results are interleaved with those of
the production system; no control over the entire
result list

- Ultimate measure of success: search is only a means to an end, not the ultimate goal
TREC Open Search

http://trec-open-search.org/
- Use-case: academic search

- Ad-hoc document search
- Sites

- CiteSeerX
- SSOAR — German Social Sciences
- Microsoft Academic Search
- Round #3 runs from Oct 1 to Nov 15
We ♥ you!
living-labs.net
Thanks to

More Related Content

Similar to Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab

DevSecCon London 2018: Open DevSecOps
DevSecCon London 2018: Open DevSecOpsDevSecCon London 2018: Open DevSecOps
DevSecCon London 2018: Open DevSecOpsDevSecCon
 
Tracking and visualizing COVID-19 with Elastic stack
Tracking and visualizing COVID-19 with Elastic stackTracking and visualizing COVID-19 with Elastic stack
Tracking and visualizing COVID-19 with Elastic stackAnna Ossowski
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookKeiichiro Ono
 
Koshy june27 140pm_room210_c_v4
Koshy june27 140pm_room210_c_v4Koshy june27 140pm_room210_c_v4
Koshy june27 140pm_room210_c_v4DataWorks Summit
 
Building a Real-time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-time Data Pipeline: Apache Kafka at LinkedInBuilding a Real-time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-time Data Pipeline: Apache Kafka at LinkedInDataWorks Summit
 
Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInBuilding a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInAmy W. Tang
 
Retail referencearchitecture productcatalog
Retail referencearchitecture productcatalogRetail referencearchitecture productcatalog
Retail referencearchitecture productcatalogMongoDB
 
Recommender Systems at Scale
Recommender Systems at ScaleRecommender Systems at Scale
Recommender Systems at ScaleEoin Hurrell, PhD
 
OREChem Services and Workflows
OREChem Services and WorkflowsOREChem Services and Workflows
OREChem Services and Workflowsmarpierc
 
BBC Linked Data Platform (SemTechBiz San Fran 2013)
BBC Linked Data Platform (SemTechBiz San Fran 2013)BBC Linked Data Platform (SemTechBiz San Fran 2013)
BBC Linked Data Platform (SemTechBiz San Fran 2013)Dave Rogers
 
Swathi.V_BE(CSE)_Resume_2016
Swathi.V_BE(CSE)_Resume_2016Swathi.V_BE(CSE)_Resume_2016
Swathi.V_BE(CSE)_Resume_2016Swathi V
 
Detection of REST Patterns and Antipatterns: A Heuristics-based Approach
Detection of REST Patterns and Antipatterns: A Heuristics-based ApproachDetection of REST Patterns and Antipatterns: A Heuristics-based Approach
Detection of REST Patterns and Antipatterns: A Heuristics-based ApproachFrancis Palma
 
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...NETWAYS
 
Introduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologiesIntroduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologiesChris Schalk
 
44rd CEN WS/LT meeting PT social data
44rd CEN WS/LT meeting PT social data44rd CEN WS/LT meeting PT social data
44rd CEN WS/LT meeting PT social dataJoris Klerkx
 
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...Big Data Spain
 
Test trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely testsTest trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely testsHugh McCamphill
 
Big Data Expo 2015 - MapR Impacting Business As It Happens
Big Data Expo 2015 - MapR Impacting Business As It HappensBig Data Expo 2015 - MapR Impacting Business As It Happens
Big Data Expo 2015 - MapR Impacting Business As It HappensBigDataExpo
 
Xerte Conference, June 2018
Xerte Conference, June 2018Xerte Conference, June 2018
Xerte Conference, June 2018Ian Dolphin
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Databricks
 

Similar to Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab (20)

DevSecCon London 2018: Open DevSecOps
DevSecCon London 2018: Open DevSecOpsDevSecCon London 2018: Open DevSecOps
DevSecCon London 2018: Open DevSecOps
 
Tracking and visualizing COVID-19 with Elastic stack
Tracking and visualizing COVID-19 with Elastic stackTracking and visualizing COVID-19 with Elastic stack
Tracking and visualizing COVID-19 with Elastic stack
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter Notebook
 
Koshy june27 140pm_room210_c_v4
Koshy june27 140pm_room210_c_v4Koshy june27 140pm_room210_c_v4
Koshy june27 140pm_room210_c_v4
 
Building a Real-time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-time Data Pipeline: Apache Kafka at LinkedInBuilding a Real-time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-time Data Pipeline: Apache Kafka at LinkedIn
 
Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedInBuilding a Real-Time Data Pipeline: Apache Kafka at LinkedIn
Building a Real-Time Data Pipeline: Apache Kafka at LinkedIn
 
Retail referencearchitecture productcatalog
Retail referencearchitecture productcatalogRetail referencearchitecture productcatalog
Retail referencearchitecture productcatalog
 
Recommender Systems at Scale
Recommender Systems at ScaleRecommender Systems at Scale
Recommender Systems at Scale
 
OREChem Services and Workflows
OREChem Services and WorkflowsOREChem Services and Workflows
OREChem Services and Workflows
 
BBC Linked Data Platform (SemTechBiz San Fran 2013)
BBC Linked Data Platform (SemTechBiz San Fran 2013)BBC Linked Data Platform (SemTechBiz San Fran 2013)
BBC Linked Data Platform (SemTechBiz San Fran 2013)
 
Swathi.V_BE(CSE)_Resume_2016
Swathi.V_BE(CSE)_Resume_2016Swathi.V_BE(CSE)_Resume_2016
Swathi.V_BE(CSE)_Resume_2016
 
Detection of REST Patterns and Antipatterns: A Heuristics-based Approach
Detection of REST Patterns and Antipatterns: A Heuristics-based ApproachDetection of REST Patterns and Antipatterns: A Heuristics-based Approach
Detection of REST Patterns and Antipatterns: A Heuristics-based Approach
 
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin &  Leanne La...
OSMC 2023 | Experiments with OpenSearch and AI by Jochen Kressin & Leanne La...
 
Introduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologiesIntroduction to Google Cloud platform technologies
Introduction to Google Cloud platform technologies
 
44rd CEN WS/LT meeting PT social data
44rd CEN WS/LT meeting PT social data44rd CEN WS/LT meeting PT social data
44rd CEN WS/LT meeting PT social data
 
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
 
Test trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely testsTest trend analysis: Towards robust reliable and timely tests
Test trend analysis: Towards robust reliable and timely tests
 
Big Data Expo 2015 - MapR Impacting Business As It Happens
Big Data Expo 2015 - MapR Impacting Business As It HappensBig Data Expo 2015 - MapR Impacting Business As It Happens
Big Data Expo 2015 - MapR Impacting Business As It Happens
 
Xerte Conference, June 2018
Xerte Conference, June 2018Xerte Conference, June 2018
Xerte Conference, June 2018
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
 

More from krisztianbalog

Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...krisztianbalog
 
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...krisztianbalog
 
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?krisztianbalog
 
Personal Knowledge Graphs
Personal Knowledge GraphsPersonal Knowledge Graphs
Personal Knowledge Graphskrisztianbalog
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligencekrisztianbalog
 
On Entities and Evaluation
On Entities and EvaluationOn Entities and Evaluation
On Entities and Evaluationkrisztianbalog
 
Table Retrieval and Generation
Table Retrieval and GenerationTable Retrieval and Generation
Table Retrieval and Generationkrisztianbalog
 
Entity Search: The Last Decade and the Next
Entity Search: The Last Decade and the NextEntity Search: The Last Decade and the Next
Entity Search: The Last Decade and the Nextkrisztianbalog
 
Overview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search EditionOverview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search Editionkrisztianbalog
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Searchkrisztianbalog
 
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
Entity Retrieval (tutorial organized by Radialpoint in Montreal)Entity Retrieval (tutorial organized by Radialpoint in Montreal)
Entity Retrieval (tutorial organized by Radialpoint in Montreal)krisztianbalog
 
Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)krisztianbalog
 
Time-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation SystemsTime-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation Systemskrisztianbalog
 
Entity Retrieval (SIGIR 2013 tutorial)
Entity Retrieval (SIGIR 2013 tutorial)Entity Retrieval (SIGIR 2013 tutorial)
Entity Retrieval (SIGIR 2013 tutorial)krisztianbalog
 
Multi-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation RecommendationMulti-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation Recommendationkrisztianbalog
 
Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)krisztianbalog
 
Semistructured Data Seach
Semistructured Data SeachSemistructured Data Seach
Semistructured Data Seachkrisztianbalog
 
Collection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity SearchCollection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity Searchkrisztianbalog
 

More from krisztianbalog (19)

Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
Towards Filling the Gap in Conversational Search: From Passage Retrieval to C...
 
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...Conversational AI from an Information Retrieval Perspective: Remaining Challe...
Conversational AI from an Information Retrieval Perspective: Remaining Challe...
 
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?What Does Conversational Information Access Exactly Mean and How to Evaluate It?
What Does Conversational Information Access Exactly Mean and How to Evaluate It?
 
Personal Knowledge Graphs
Personal Knowledge GraphsPersonal Knowledge Graphs
Personal Knowledge Graphs
 
Entities for Augmented Intelligence
Entities for Augmented IntelligenceEntities for Augmented Intelligence
Entities for Augmented Intelligence
 
On Entities and Evaluation
On Entities and EvaluationOn Entities and Evaluation
On Entities and Evaluation
 
Table Retrieval and Generation
Table Retrieval and GenerationTable Retrieval and Generation
Table Retrieval and Generation
 
Entity Search: The Last Decade and the Next
Entity Search: The Last Decade and the NextEntity Search: The Last Decade and the Next
Entity Search: The Last Decade and the Next
 
Overview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search EditionOverview of the TREC 2016 Open Search track: Academic Search Edition
Overview of the TREC 2016 Open Search track: Academic Search Edition
 
Entity Linking
Entity LinkingEntity Linking
Entity Linking
 
Evaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented SearchEvaluation Initiatives for Entity-oriented Search
Evaluation Initiatives for Entity-oriented Search
 
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
Entity Retrieval (tutorial organized by Radialpoint in Montreal)Entity Retrieval (tutorial organized by Radialpoint in Montreal)
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
 
Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)Entity Retrieval (WSDM 2014 tutorial)
Entity Retrieval (WSDM 2014 tutorial)
 
Time-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation SystemsTime-aware Evaluation of Cumulative Citation Recommendation Systems
Time-aware Evaluation of Cumulative Citation Recommendation Systems
 
Entity Retrieval (SIGIR 2013 tutorial)
Entity Retrieval (SIGIR 2013 tutorial)Entity Retrieval (SIGIR 2013 tutorial)
Entity Retrieval (SIGIR 2013 tutorial)
 
Multi-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation RecommendationMulti-step Classification Approaches to Cumulative Citation Recommendation
Multi-step Classification Approaches to Cumulative Citation Recommendation
 
Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)
 
Semistructured Data Seach
Semistructured Data SeachSemistructured Data Seach
Semistructured Data Seach
 
Collection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity SearchCollection Ranking and Selection for Federated Entity Search
Collection Ranking and Selection for Federated Entity Search
 

Recently uploaded

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)

What is Artificial Intelligence?????????

A Journey Into the Emotions of Software Developers

TrustArc Webinar - How to Build Consumer Trust Through Data Privacy

Nell’iperspazio con Rocket: il Framework Web di Rust!

The State of Passkeys with FIDO Alliance.pptx

TeamStation AI System Report LATAM IT Salaries 2024

What is DBT - The Ultimate Data Build Tool.pdf

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx

Ensuring Technical Readiness For Copilot in Microsoft 365

Digital Identity is Under Attack: FIDO Paris Seminar.pptx

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024

Anypoint Exchange: It’s Not Just a Repo!

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx


Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab

  • 1. Overview of the Living Labs for IR Evaluation (LL4IR) CLEF Lab. http://living-labs.net, @livinglabsnet. “Give us your ranking, we’ll have it clicked!” Krisztian Balog (University of Stavanger), Liadh Kelly (Trinity College Dublin), Anne Schuth (Blendle). 7th International Conference of the CLEF Association (CLEF 2016) | Évora, Portugal, 2016
  • 2. Living Labs for IR Evaluation
  • 3. Motivation - Overall goal: make information retrieval evaluation more realistic. (Diagram: a new retrieval method, the users of a live site, and their interaction data.) #1 How to test a new method with real users in their natural task environment (i.e., on the live site)? #2 How to make interaction data available for method development?
  • 4. Key idea (diagram: new retrieval methods, an API, the live site and its users, and data such as docs/products and logs). K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM'14
  • 5. Key idea #1: An API orchestrates all data exchange between the live site and experimental systems. (Same diagram as slide 4; K. Balog, L. Kelly, and A. Schuth, CIKM'14.)
  • 6. Key idea #2: Focus on frequent (head) queries. (Same diagram as slide 4; K. Balog, L. Kelly, and A. Schuth, CIKM'14.)
 - Ranked result lists can be generated offline
 - There is enough traffic on them (historical & live)
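Key idea #2 rests on selecting frequent queries. A minimal sketch of how a site might pick its head queries from its own query log, assuming the log is a plain list of query strings; the helper name `head_queries` and the lowercase normalization are illustrative choices, not part of the living labs platform:

```python
from collections import Counter


def head_queries(query_log, k):
    """Return the k most frequent queries and the share of traffic they cover.

    Hypothetical helper: the log is assumed to be an iterable of raw query strings.
    """
    # Light normalization so "Lego" and "lego " count as the same query.
    counts = Counter(q.strip().lower() for q in query_log)
    top = counts.most_common(k)
    total = sum(counts.values())
    covered = sum(count for _, count in top)
    share = covered / total if total else 0.0
    return [query for query, _ in top], share


log = ["lego", "Lego", "barbie", "nerf", "lego ", "barbie"]
queries, share = head_queries(log, 2)
print(queries, round(share, 3))  # ['lego', 'barbie'] 0.833
```

The coverage share makes the trade-off on the slide explicit: a handful of head queries can carry a considerable fraction of the traffic, which is what makes offline-generated rankings viable.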
  • 7. Key idea #3: Medium to large organizations with a fair amount of search volume, which typically lack their own R&D department. (Same diagram as slide 4; K. Balog, L. Kelly, and A. Schuth, CIKM'14.)
  • 8. Methodology, step 1: Queries, candidate documents, and historical search and click data are made available through the API. { "queries": [ { "creation_time": "Wed, 22 Apr 2015 09:15:41 -0000", "qid": "R-q1", "qstr": "monster high", "type": "train" }, { "creation_time": "Wed, 22 Apr 2015 09:15:41 -0000", "qid": "R-q51",
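The query list in step 1 can be consumed with nothing more than a JSON parser. A minimal sketch, assuming the full response is a JSON object with a `queries` array shaped like the excerpt (only `qid`, `qstr` and `type` are used; the helper name `split_queries` is hypothetical):

```python
import json


def split_queries(payload):
    """Group (qid, qstr) pairs by query type, e.g. "train" vs "test"."""
    groups = {}
    for query in payload["queries"]:
        groups.setdefault(query["type"], []).append((query["qid"], query["qstr"]))
    return groups


# Sample built from the slide's first entry; a real response holds many more queries.
raw = ('{"queries": [{"creation_time": "Wed, 22 Apr 2015 09:15:41 -0000", '
       '"qid": "R-q1", "qstr": "monster high", "type": "train"}]}')
print(split_queries(json.loads(raw)))  # {'train': [('R-q1', 'monster high')]}
```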
  • 9. Methodology, step 1: Queries, candidate documents, and historical search and click data are made available through the API. { "doclist": [ { "docid": "R-d1291", "site_id": "R", "title": "LEGO DUPLO Hamupipőke hintója 6153" }, { "docid": "R-d1306", "site_id": "R", "title": "LEGO Rendőrkapitányság 5681"
  • 10. Methodology, step 1: Queries, candidate documents, and historical search and click data are made available through the API. { "content": { "age_max": 3, "age_min": 1, "arrived": "2014-08-28", "available": 0, "brand": "Lego", "category": "LEGO", "category_id": "38", "characters": [], "description": "Lego Duplo - Építő-és j
  • 11. Methodology, step 2: Rankings are generated for each query and uploaded through the API. { "qid": "U-q22", "runid": "82", "creation_time": "Wed, 04 Jun 2014 15:03:56 -0000", "doclist": [ { "docid": "U-d4" }, { "docid": "U-d2" }, ...
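Step 2 shows the shape of an uploaded run. The sketch below builds such a payload and checks it against the candidate set supplied by the site (the interleaving slide notes that only those items may be re-ranked). The helper `make_run`, its `candidates` argument and the validation rules are assumptions for illustration; the upload call itself is omitted because the slides do not give the endpoint.

```python
import json
from datetime import datetime, timezone
from email.utils import format_datetime


def make_run(qid, runid, ranking, candidates, created=None):
    """Build a run payload shaped like the slide's excerpt (hypothetical helper)."""
    # Each docid may appear only once in a ranking.
    if len(set(ranking)) != len(ranking):
        raise ValueError("ranking contains duplicate docids")
    # Only items from the site's candidate set may be re-ranked.
    unknown = set(ranking) - set(candidates)
    if unknown:
        raise ValueError(f"docids not in the candidate set: {sorted(unknown)}")
    if created is None:
        # Current UTC time as a naive datetime.
        created = datetime.now(timezone.utc).replace(tzinfo=None)
    return {
        "qid": qid,
        "runid": runid,
        # A naive datetime is formatted with the "-0000" zone,
        # e.g. "Wed, 04 Jun 2014 15:03:56 -0000".
        "creation_time": format_datetime(created),
        "doclist": [{"docid": docid} for docid in ranking],
    }


run = make_run("U-q22", "82", ["U-d4", "U-d2"], candidates=["U-d2", "U-d4"],
               created=datetime(2014, 6, 4, 15, 3, 56))
print(json.dumps(run, indent=2))
```

`email.utils.format_datetime` renders a naive datetime with the `-0000` zone, which matches the timestamp format in the excerpts.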
  • 12. Methodology, step 3: When any of the test queries is fired, the live site requests rankings from the API and interleaves them with those of the production system.
  • 13. Interleaving - The site provides the set of candidate items that can be re-ranked (a safety mechanism) - The experimental ranking is interleaved with the production ranking - Needs 1-2 orders of magnitude less data than A/B testing (also, it is a within-subject rather than a between-subject design) - Example: system A ranks doc 1, doc 2, doc 3, doc 4, doc 5; system B ranks doc 2, doc 4, doc 7, doc 1, doc 3; the interleaved list is doc 1, doc 2, doc 4, doc 3, doc 7. Inference: A>B
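The `"type": "tdi"` field in the feedback excerpt of step 4 suggests team-draft interleaving. Below is a minimal sketch of that general technique together with a simple click-count inference, written from the textbook algorithm rather than from the platform's own code; the function names and the injectable `coin` tie-breaker are hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, max_length, coin=None):
    """Team-draft interleaving: return the interleaved list and each doc's team."""
    if coin is None:
        def coin():
            return random.random() < 0.5  # True means A picks first on a tie
    interleaved, team = [], {}
    picks_a = picks_b = 0

    def next_unused(ranking):
        return next((d for d in ranking if d not in team), None)

    while len(interleaved) < max_length:
        cand_a, cand_b = next_unused(ranking_a), next_unused(ranking_b)
        if cand_a is None or cand_b is None:
            break  # one ranking is exhausted
        # The team with fewer picks goes next; a coin flip breaks ties.
        if picks_a < picks_b or (picks_a == picks_b and coin()):
            doc, label = cand_a, "A"
            picks_a += 1
        else:
            doc, label = cand_b, "B"
            picks_b += 1
        interleaved.append(doc)
        team[doc] = label
    return interleaved, team


def infer_winner(team, clicked):
    """The team whose documents received more clicks wins the impression."""
    clicks_a = sum(1 for d in clicked if team.get(d) == "A")
    clicks_b = sum(1 for d in clicked if team.get(d) == "B")
    if clicks_a > clicks_b:
        return "A"
    if clicks_b > clicks_a:
        return "B"
    return "tie"


system_a = ["doc 1", "doc 2", "doc 3", "doc 4", "doc 5"]
system_b = ["doc 2", "doc 4", "doc 7", "doc 1", "doc 3"]
flips = iter([True, False, False])  # tie-breaks: A first, then B first, then B first
interleaved, team = team_draft_interleave(system_a, system_b, 5, coin=lambda: next(flips))
print(interleaved)  # ['doc 1', 'doc 2', 'doc 4', 'doc 3', 'doc 7']
print(infer_winner(team, {"doc 1", "doc 3"}))  # A
```

With those three tie-breaking flips the sketch reproduces the slide's interleaved list; clicks on doc 1 and doc 3 (both contributed by A) then yield the inference A>B.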
  • 14. Methodology, step 4: Participants get detailed feedback on user interactions (clicks) through the API. { "feedback": [ { "qid": "S-q1", "runid": "baseline", "type": "tdi", "doclist": [ { "docid": "S-d1", "clicked": true, "team": "site",
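A feedback entry like the excerpt above can be reduced to a win, loss, or tie for the participant by counting clicks per team. A sketch, assuming each `doclist` entry carries `clicked` and `team`, and that the participant's team label is `"participant"`; only `"site"` is visible in the excerpt, so that second label is an assumption and is exposed as a parameter.

```python
def judge_impression(entry, participant_team="participant"):
    """Return 'win', 'loss' or 'tie' for the participant on one impression."""
    clicks = {"site": 0, participant_team: 0}
    for doc in entry["doclist"]:
        team = doc.get("team")
        if doc.get("clicked") and team in clicks:
            clicks[team] += 1
    if clicks[participant_team] > clicks["site"]:
        return "win"
    if clicks[participant_team] < clicks["site"]:
        return "loss"
    return "tie"


# Toy entry: only S-d1 comes from the slide; the other documents are made up.
entry = {"qid": "S-q1", "runid": "baseline", "type": "tdi", "doclist": [
    {"docid": "S-d1", "clicked": True, "team": "site"},
    {"docid": "S-d2", "clicked": True, "team": "participant"},
    {"docid": "S-d3", "clicked": True, "team": "participant"},
]}
print(judge_impression(entry))  # win
```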
  • 15. Methodology, step 5: The ultimate measure is the number of “wins” against the production system (aggregated over a period of time): $\text{Outcome} = \frac{\#\text{Wins}}{\#\text{Wins} + \#\text{Losses}}$
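The outcome measure of step 5 translates directly into code; ties do not enter the formula, so they are skipped. A sketch, assuming per-impression results are already labeled `"win"`, `"loss"` or `"tie"` from the experimental system's point of view; returning `None` when nothing has been decided is a choice made here, since the formula is undefined in that case.

```python
def outcome(results):
    """Outcome = #Wins / (#Wins + #Losses); ties are ignored."""
    wins = sum(1 for r in results if r == "win")
    losses = sum(1 for r in results if r == "loss")
    decided = wins + losses
    if decided == 0:
        return None  # no wins or losses yet: outcome undefined
    return wins / decided


print(outcome(["win", "loss", "tie", "win"]))  # 0.6666666666666666
```

A value above 0.5 means the experimental ranking won more often than it lost against the production system over the aggregation period.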
  • 16. What is in it for participants? - Access to privileged commercial data (search and click-through data) - The opportunity to test IR systems with real, unsuspecting users in a live setting (not the same as crowdsourcing!) - Continuous evaluation is possible, not limited to a yearly evaluation cycle
  • 17. The Living Labs Platform
  • 22. Use-cases - Product search (REGIO Játék) - Web search (Seznam)
  • 23. Benchmark organization (by query type, over the training and test periods)
 - Train queries: feedback available; individual feedback; update possible
 - Test queries, training period: feedback available; no individual feedback; update possible
 - Test queries, test period: no feedback available; no individual feedback; update not possible
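The benchmark rules on slide 23 amount to a small lookup table. A sketch that encodes exactly the combinations listed on the slide, assuming the train-query rule applies to the training period; the slide does not spell out the train-query rule for the test period, so that combination is deliberately left out and raises an error. The names `Access`, `ACCESS_RULES` and `access_for` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    feedback: bool             # is any feedback available?
    individual_feedback: bool  # is per-impression feedback available?
    can_update: bool           # may rankings be re-uploaded?


# Only the combinations listed on the slide; anything else is left undefined.
ACCESS_RULES = {
    ("train", "training"): Access(feedback=True, individual_feedback=True, can_update=True),
    ("test", "training"): Access(feedback=True, individual_feedback=False, can_update=True),
    ("test", "test"): Access(feedback=False, individual_feedback=False, can_update=False),
}


def access_for(query_type, period):
    try:
        return ACCESS_RULES[(query_type, period)]
    except KeyError:
        raise ValueError(f"no rule for {query_type!r} queries in the {period!r} period") from None


print(access_for("test", "test").can_update)  # False
```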
  • 24. Product search - Ad-hoc retrieval over a product catalog - Several thousand products - Limited amount of text, lots of structure - Categories, characters, brands, etc.
  • 26. Product data - Product name - Price / bonus price - Short description - Recommended age from/to - Gender recommendation - Categories - Brands - Long description - (Links to) photos
  • 27. { "content": { "age_max": 10, "age_min": 6, "arrived": "2014-08-28", "available": 1, "brand": "Mattel", "category": "Babák, kellékek", "category_id": "25", "characters": [], "description": "A Monster High® iskola szörnycsemetéi […]", "gender": 2, "main_category": "Baba, babakocsi", "main_category_id": "3", "photos": [ "http://regiojatek.hu/data/regio_images/normal/20777_0.jpg", "http://regiojatek.hu/data/regio_images/normal/20777_1.jpg", […] ], "price": 8675.0, "product_name": "Monster High Scaris Paravárosi baba többféle", "queries": { "clawdeen": "0.037", "monster": "0.222", "monster high": "0.741" }, "short_description": "A Monster High® iskola szörnycsemetéi első külföldi útjukra indulnak..." }, "creation_time": "Mon, 11 May 2015 04:52:59 -0000", "docid": "R-d43", "site_id": "R", "title": "Monster High Scaris Paravárosi baba többféle" } (Annotation on "queries": frequent queries that led to the product.)
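The `queries` field above records which frequent queries led to a product, with string-valued weights that sum to 1 in this example. A deliberately naive baseline, sketched as an illustration and not as a method from the lab or its participants, ranks a candidate list by that weight for the issued query and keeps the incoming order for ties. The helper name is hypothetical, and apart from R-d43 the product entries are toy data.

```python
def rank_by_query_history(qstr, products):
    """Order candidate products by the historical weight of qstr in their "queries" field."""
    key = qstr.strip().lower()

    def score(product):
        weights = product.get("content", {}).get("queries", {})
        return float(weights.get(key, 0.0))  # weights are stored as strings

    # sorted() is stable even with reverse=True, so ties keep their incoming order.
    return [p["docid"] for p in sorted(products, key=score, reverse=True)]


products = [
    {"docid": "toy-1", "content": {"queries": {"lego": "0.5"}}},
    {"docid": "R-d43", "content": {"queries": {"monster": "0.222", "monster high": "0.741"}}},
    {"docid": "toy-2", "content": {}},
]
print(rank_by_query_history("Monster High", products))  # ['R-d43', 'toy-1', 'toy-2']
```

Because it only uses historical query-product associations, this baseline has nothing to say about newly arrived products, which is one reason the inventory changes on slide 30 matter.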
  • 28. Queries - Typically very short, e.g.: monster high, magnetiz, duplo, lego friends, geomag, trash+pack, barbie, monopoly, lego duplo, transformers, star wars, nerf, carrera, baba
  • 30. Inventory changes (figure: number of products per day that were new arrivals, became available, or became unavailable, for days 05-01 to 05-15)
  • 32. Summary - Successes: the experimental methodology, with many interesting opportunities to address current limitations (come to the NewsREEL & LL4IR session tomorrow); the living labs platform, which is open source and can be used for a variety of tasks; some interesting work on product search (see the Best of the Labs session) - Lack of success: raising sufficient interest in the use-cases at CLEF
  • 33. Limitations / Open issues - Head queries only: a considerable portion of traffic, but only popular information needs - Lack of context: no knowledge of the searcher’s location, previous searches, etc. - No real-time feedback: the API provides detailed feedback, but it is not immediate - Limited control: experimentation is limited to single searches, where results are interleaved with those of the production system; no control over the entire result list - Ultimate measure of success: search is only a means to an end; it is not the ultimate goal
  • 34. TREC Open Search http://trec-open-search.org/ - Use-case: academic search - Ad-hoc document search - Sites: CiteSeerX; SSOAR (German Social Sciences); Microsoft Academic Search - Round #3 runs from Oct 1 to Nov 15