Rated Ranking Evaluator Enterprise is an enterprise version of the open source Rated Ranking Evaluator search quality evaluation tool. It features query discovery to automatically extract queries from a search API, rating generation using both explicit ratings and implicit feedback, and an interactive UI for exploring and comparing evaluation results. The UI provides overview, exploration, and comparison views of evaluation data to meet the needs of business stakeholders and software engineers. Future work aims to improve the tool's capabilities around configuration, multimedia support, insights generation, and click modeling.
Rated Ranking Evaluator Enterprise: the next generation of free Search Quality Evaluation Tools
1. Rated Ranking Evaluator Enterprise: the Next Generation of Free Search Quality Evaluation Tools
Alessandro Benedetti, Director
Andrea Gazzarini, Co-Founder
15th September 2021
‣ Born in Tarquinia (ancient Etruscan city)
‣ R&D Software Engineer
‣ Director
‣ Master in Computer Science
‣ Apache Lucene/Solr PMC member/committer
‣ Elasticsearch expert
‣ Passionate about semantic, NLP, and machine learning technologies
‣ Beach Volleyball player and Snowboarder
Who We Are
Alessandro Benedetti
3. ‣ Born in Viterbo
‣ Hermit Software Engineer
‣ Master's in Economics
‣ Passionate about programming
‣ RRE Creator
‣ Apache Lucene/Solr Expert
‣ Elasticsearch Expert
‣ Apache Qpid Committer
‣ Father, Husband
‣ Bass player, aspiring (still frustrated at the moment) Chapman Stick player
Who We Are
Andrea Gazzarini
‣ Headquartered in London / distributed team
‣ Open Source Enthusiasts
‣ Apache Lucene/Solr experts
‣ Elasticsearch experts
‣ Community Contributors
‣ Active Researchers
‣ Hot Trends: Learning to Rank, Document Similarity, Search Quality Evaluation, Relevancy Tuning
Search Services
www.sease.io
8. ‣ Open Source library for Search Quality Evaluation
‣ Ratings are expected as input (JSON supported)
‣ Many offline metrics available out-of-the-box
(Precision@k, NDCG@k, F-Measure…)
‣ Apache Solr and Elasticsearch support
‣ Development-centric approach
‣ Evaluation on the fly and results in various formats
‣ Community building up!
‣ RRE-User mailing list:
https://groups.google.com/g/rre-user
‣ https://github.com/SeaseLtd/rated-ranking-evaluator
Rated Ranking Evaluator: RRE Open Source
What is it?
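To make the metrics mentioned above concrete, here is a simplified sketch of two of them, Precision@k and NDCG@k. These are the textbook definitions, written for illustration only; they are not RRE's actual implementation, and the document ids and judgments are invented.

```python
# Illustrative implementations of two offline metrics RRE offers
# out-of-the-box: Precision@k and NDCG@k.
import math

def precision_at_k(results, judgments, k=10):
    """Fraction of the top-k results that carry a positive rating."""
    relevant = sum(1 for doc in results[:k] if judgments.get(doc, 0) > 0)
    return relevant / k

def ndcg_at_k(results, judgments, k=10):
    """Normalized Discounted Cumulative Gain over the top-k results."""
    def dcg(ratings):
        return sum(r / math.log2(i + 2) for i, r in enumerate(ratings))
    gains = [judgments.get(doc, 0) for doc in results[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judgments: document ids mapped to relevance ratings.
judgments = {"d1": 3, "d2": 0, "d3": 1}
p3 = precision_at_k(["d1", "d2", "d3"], judgments, k=3)
n3 = ndcg_at_k(["d1", "d2", "d3"], judgments, k=3)
```

Unrated documents are treated as non-relevant (rating 0), a common convention in offline evaluation.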
9. Rated Ranking Evaluator: The Genesis
2018: Search Consultancy Project. A customer explicitly asked for a rudimentary search quality evaluation tool while we were working on their search infrastructure.
Jun: "Search Quality Evaluation: A Developer Perspective" (RRE 0.9)
Oct: "Search Quality Evaluation: Tools and Techniques" (RRE 1.0)
to be continued...
10. Rated Ranking Evaluator: The Idea
RRE was conceived as a development tool that executes search quality evaluations as part of a project build process.
It is like a read–eval–print loop (REPL) built on top of an Information Retrieval subsystem, encouraging an incremental / iterative approach.
The underlying idea: whether you are building a new system (v0.1, …, v1.0) or evolving an existing one through change requests and bug fixes (v1.1, v1.2, v1.3, …, v2.0), sooner or later users complain about junk in search results. In terms of retrieval effectiveness, how can we know the system performance across the various versions?
11. Rated Ranking Evaluator: Domain Model
The RRE Domain Model is organized into a composite, tree-like structure where the relationships between entities are always one-to-many.
The top-level entity is a placeholder representing an evaluation execution.
Versioned metrics are computed at query level and then reported, using an aggregation function, at the upper levels.
The benefit of having a composite structure is clear: we can see a metric value at different levels (e.g. a single query, all queries belonging to a query group, all queries belonging to a topic, or the whole corpus).
Domain Model
Evaluation (top-level domain entity)
  1..* Corpus (test dataset / collection)
    1..* Topic (information need)
      1..* Query Group (query variants)
        1..* Query (queries)
For each version (v1.0, v1.1, v1.2, …, v1.n) every entity carries its metrics: P@10, NDCG, AP, F-MEASURE, …
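This Corpus → Topic → Query Group → Query hierarchy is what a JSON ratings file feeds into. The fragment below is an illustrative sketch only: the index, topic, and query group names are invented, and the exact field names should be double-checked against the RRE documentation.

```json
{
  "index": "products",
  "topics": [
    {
      "description": "Laptop search",
      "query_groups": [
        {
          "name": "brand queries",
          "queries": [
            { "template": "only_q.json", "placeholders": { "$query": "lenovo laptop" } }
          ],
          "relevant_documents": {
            "3": ["doc1"],
            "2": ["doc4", "doc9"]
          }
        }
      ]
    }
  ]
}
```

Here the rated documents are grouped by gain: documents under "3" are highly relevant for every query variant in the group, documents under "2" somewhat less so.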
12. Rated Ranking Evaluator: How it works
INPUT LAYER: data, configuration, ratings.
EVALUATION LAYER: uses a search platform and produces evaluation data.
OUTPUT LAYER: the evaluation data is used for generating JSON output, the RRE Console, …
13. Explainability: why is it important in Information Retrieval?
Dev, tune & build on one monitor; check evaluation results on the other (we are thinking about how to fill a third monitor).
15. RRE Enterprise: The Genesis
2018: Search Consultancy Project. A customer explicitly asked for a rudimentary search quality evaluation tool while we were working on their search infrastructure.
2019: "Rated Ranking Evaluator: An Open Source Approach". The first sketches depicting an idea for an enterprise-level version of RRE; development starts a few months later.
to be continued...
16. Rated Ranking Evaluator: The Genesis
2018: Search Consultancy Project. A customer explicitly asked for a rudimentary search quality evaluation tool while we were working on their search infrastructure.
2019, 2020, 2021: JBCP, ID Discovery, Query Discovery
to be continued...
18. RRE Open Source Recap: How it works (1/2)
INPUT LAYER: data, configuration, ratings.
EVALUATION LAYER: targets Apache Solr OR Elasticsearch and produces evaluation data.
OUTPUT LAYER: used for generating JSON output, the RRE Console, …
19. RRE Open Source Recap: How it works (2/2)
The same pipeline: the ratings and configuration target Apache Solr OR Elasticsearch directly, the evaluation produces evaluation data, and the output is used for generating JSON output, the RRE Console, …
20. RRE Enterprise: Query Discovery?
Problem: I have an intermediate Search API that builds complex Apache Solr/Elasticsearch queries.
21. RRE Open Source: No Query Discovery
(1) The ratings target Apache Solr OR Elasticsearch directly; (2) the evaluation produces evaluation data.
22. RRE Enterprise: Query Discovery & Evaluation
(1) Query discovery correlates on the requests hitting the {SEARCH API}; (2) the discovered queries target Apache Solr OR Elasticsearch; (3) the evaluation produces evaluation data.
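One way to picture query discovery is as an interception layer around the Search API's engine client, so that every Solr/Elasticsearch query the API builds is recorded and can later be replayed during evaluation. The sketch below is purely conceptual; it is not RRE Enterprise's internals, and all names in it are hypothetical.

```python
# Conceptual sketch of query discovery: record every engine query
# an intermediate Search API generates, correlated with the user
# request that produced it.

discovered = []

def instrument(engine_search):
    """Wrap the function the Search API uses to hit the engine."""
    def wrapper(engine_query):
        discovered.append(engine_query)   # record the generated query
        return engine_search(engine_query)
    return wrapper

# A toy "engine" call; a stand-in for a real Solr/Elasticsearch client.
def engine_search(engine_query):
    return ["doc1", "doc2"]

engine_search = instrument(engine_search)

# A toy Search API that builds a complex engine query from user input.
def search_api(user_query):
    engine_query = {"q": user_query, "defType": "edismax", "qf": "title^2 body"}
    return engine_search(engine_query)

search_api("laptop")
# 'discovered' now holds the engine query the API generated.
```

Once the engine query is captured, ratings collected against the Search API request can be evaluated against the actual Solr/Elasticsearch query it produced.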
23. Input Ratings: RRE Open Source vs RRE Enterprise
RRE Open Source: Topic (information need) → Query Group (query variant) → Query ((search engine) query) → Rated Document (with a rating, e.g. 3)
RRE Enterprise: Topic (information need) → Query Group (query variant) → API Request ((Search API) request) → Query ((search engine) query) → Rated Document (with a rating, e.g. 3)
25. RRE Enterprise: Rating Generation
A fundamental requirement of offline search quality evaluation is to gather <query, document, rating> triples that represent the relevance (rating) of a document given a user information need (query).
Before assessing the retrieval effectiveness of a system, it is necessary to associate a relevance rating to each <query, document> pair involved in our evaluation.
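The triples described above are typically collected into a judgment lookup before any metric can be computed. A minimal sketch, with invented queries and document ids:

```python
# Illustrative sketch: turning <query, document, rating> triples into
# a judgment lookup, the structure every offline metric consumes.
from collections import defaultdict

triples = [
    ("laptop", "doc1", 3),
    ("laptop", "doc7", 1),
    ("usb cable", "doc2", 2),
]

judgments = defaultdict(dict)
for query, doc, rating in triples:
    judgments[query][doc] = rating

def rating_of(query, doc):
    """A document missing from the judgments is treated as unrated (0)."""
    return judgments[query].get(doc, 0)
```

With this lookup in place, any ranked result list returned for "laptop" can be scored by looking up each document's rating.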
26. RRE Enterprise: Explicit Ratings (1/2)
Explicit Ratings
• Explicitly provided by domain experts
• High accuracy
• High effort / time / resources
• RRE Open Source accepts only explicit ratings
RRE Rating Structure: Topic (information need) → Query Group (query variant) → Query → Rated Document (with a rating, e.g. 3)
Question: how can we minimize the effort required to provide explicit ratings?
27. RRE Enterprise: Explicit Ratings (2/2)
Judgment Collector
• A Chrome plugin which applies an evaluation layer on top of an arbitrary website
• Ratings are generated directly on the customer website
• Lowest learning curve for users
• Generated ratings are sent to RREE through a specific endpoint
• An "ID Discovery" component translates the received data (rated web items) into RRE ratings (rated Solr/Elasticsearch documents)
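The ID Discovery step can be pictured as mapping a rated web item (a page URL judged in the browser plugin) to the search-engine document behind it. The sketch below is hypothetical: the URL scheme, field names, and the idea that an id is recoverable from the URL are all assumptions for illustration.

```python
# Hypothetical sketch of ID Discovery: a rated web item becomes a
# rated search-engine document once its id is recovered.
import re

def web_item_to_doc_id(url):
    """Extract a document id from URLs like https://shop.example.com/p/12345."""
    match = re.search(r"/p/(\d+)", url)
    return match.group(1) if match else None

# A rated web item arriving from the Judgment Collector plugin...
web_rating = {"url": "https://shop.example.com/p/12345", "query": "laptop", "rating": 3}

# ...translated into an RRE-style rated document.
doc_id = web_item_to_doc_id(web_rating["url"])
rre_rating = {"query": web_rating["query"], "document": doc_id, "rating": web_rating["rating"]}
```

In practice the mapping may instead require querying the engine on a unique field (e.g. the page URL stored in the document), but the outcome is the same: a rating on a web page becomes a rating on a Solr/Elasticsearch document.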
47. RRE Enterprise: Future Work
‣ Release with a free usage plan
‣ Configuration support
‣ Support for multimedia document properties
‣ Intelligent insights on weakly performing queries, query groups, and topics
‣ Improvements on click modelling for implicit relevance estimation