Interleaving: from Evaluation to Self-learning Search @904Labs
Presented at Open Source Connections' Haystack relevance conference: 904Labs' "Interleaving: from Evaluation to Self-Learning". 904Labs is the first company to commercialize online learning to rank, a state-of-the-art technique for self-learning search ranking that automatically takes customers' behavior into account to personalize search results.
In search, one of the core problems is determining the relevance of items for a given search query. Research in this field started in the 1950s and led to the introduction of the popular TF.IDF formula in the 70s. TF.IDF is simple word counting, yet it has been the most popular ranking function for many decades. BM25, a parameterized version of it grounded in the probabilistic retrieval models of the same era, was formalized in the 1990s. It is important to note that state-of-the-art open source search technology like Lucene used TF.IDF until 2015, and only then replaced it with BM25, roughly 40 years after TF.IDF's introduction! No wonder the out-of-the-box ranking performance of systems like Apache Solr and Elasticsearch can be improved; a new paradigm is needed.
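To make the word counting concrete, here is a minimal sketch of the classic BM25 scoring formula in Python. The toy in-memory index, the parameter defaults (k1=1.2, b=0.75), and the Lucene-style smoothed IDF are illustrative assumptions, not any particular engine's implementation.

```python
import math

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_len,
               k1=1.2, b=0.75):
    """Score one document for a query with the classic BM25 formula."""
    score = 0.0
    dl = len(doc_terms)
    for term in query_terms:
        tf = doc_terms.count(term)   # raw term frequency in this document
        df = doc_freq.get(term, 0)   # number of documents containing the term
        if tf == 0 or df == 0:
            continue
        # smoothed IDF: rare terms contribute more to the score
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1)
        # saturating TF component, normalized by document length
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl / avg_len))
    return score
```

Note how the length normalization (the `b` term) penalizes long documents, and how `k1` caps the benefit of repeating a query term; these two knobs are exactly what "parameterized" means here.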
But things are changing!
As early as 1992, researchers started thinking about learning the parameters of ranking functions from data. From the 2000s onwards, research in this direction really took off, and between 2000 and 2010 most of the currently used learning to rank approaches were introduced. The idea behind learning to rank is rather simple: estimating relevance is too complex a task to solve with a naive model like BM25. Many more features play a role than word counts alone. To capture all these relevance features and their relative weights, we apply machine learning to learn them from usage data and/or relevance assessments.
The paradigm shift from trying to model relevance to learning it from actual data led to the introduction of learning to rank plugins for Apache Solr and Elasticsearch, making these methods available to a large audience. The popularity of these plugins is apparent from the number of talks here at Haystack about learning to rank.
A quick reminder on what learning to rank is all about, although everyone should know by now… We give a machine training data (queries and clicked documents, for example), features that describe the query and the documents, and an objective function. The machine learns a ranking model, which is put into production to generate rankings.
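As a toy illustration of that recipe (not any production algorithm), the sketch below fits a pointwise linear relevance score to hypothetical (feature vector, clicked) examples with plain gradient descent on squared error; the features and labels are made up for the example.

```python
def train_pointwise(examples, epochs=50, lr=0.1):
    """Batch learning to rank, pointwise flavor: fit a linear relevance
    score to (feature_vector, clicked) training examples using gradient
    descent on squared error."""
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for features, clicked in examples:
            pred = sum(wi * fi for wi, fi in zip(w, features))
            err = clicked - pred
            w = [wi + lr * err * fi for wi, fi in zip(w, features)]
    return w

def rank(w, candidates):
    """Order candidate documents by the learned linear score."""
    return sorted(candidates, key=lambda f: -sum(wi * fi for wi, fi in zip(w, f)))
```

The features here could be anything: a BM25 score, freshness, popularity, price. The point is that their relative weights are learned from data rather than hand-tuned.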
Learning to rank is a batch process. The pipeline of collecting training data, extracting features, and training a model is repeated every couple of hours, days, or weeks, depending on the organization. This batch processing often requires quite an impressive data processing pipeline. The question is whether we really need this batch process...
We would not ask the question if the answer were not "no". So, what's the next step? Rather than periodically retraining and producing a completely new model, we can apply reinforcement learning to update the existing model in real time based on feedback. There are numerous advantages. The most important one is that we move away from periodic retraining: instead of feeding training data through large processing pipelines every couple of hours to train a new model, we take real-time feedback on the quality of the current model, interpret it, and update the model slightly. Secondly, since the system learns continuously, it is easy to introduce new features on the fly: assign a new feature some initial weight, and the system quickly adjusts that weight according to the feature's importance. Thirdly, the system can adapt to changes in user behavior quickly, instead of having to wait for the next iteration of batch training.
What does online learning look like at a high level? It starts out much like regular learning to rank. A model is learned from training data, features, and an objective function. The initial model is used to generate a first ranking, which is shown to the user issuing the query. This user then interacts with the ranking, providing (implicit) feedback on its quality. This feedback is immediately taken into account by the system, updating the model slightly to match the feedback. The updated model becomes the active model, which is used to generate the next ranking, and so on. This probably all sounds very nice, but how can we build such an online learning to rank, or self-learning, search system? One of the methods that can power such a system is interleaving, though theory and practice differ in the details.
Interleaving started out as an evaluation method for comparing ranking algorithms. Currently, however, it is also used to power self-learning search engines. I’ll explain how interleaving is used for evaluation and how this can be translated into an online learning to rank setting. A practical offline example uses multiple Google search sessions for the same search terms: Screen 1 illustrates search engine A in red and B in blue.
Let’s say we’re running a web search engine and we want to find out whether a new ranking algorithm works better than our current one. We can run an interleaving experiment to find out. Our current algorithm is search engine A; the new version is search engine B.
Every query that is issued on our site is fired to both versions of the search engine. In this example, the query “online learning to rank” is issued to both A and B, and both engines return a list of results.
The next step is to actually interleave the results from A and B into one final result list. For the interleaving of results we can use a variety of methods, but to simplify things, we simply assume that we pick the first result from A, followed by the first result from B, the second result from A, etc. The final interleaved result list is the ranking that is shown to the user who issued the query. To this user there is no difference between results from A or B, they all look identical.
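The alternating scheme described above can be sketched as follows. Real systems typically use randomized variants such as Team Draft or Balanced Interleave, so treat this strict A-then-B alternation with de-duplication as a simplification.

```python
def interleave(results_a, results_b, k=10):
    """Alternate results from engines A and B into one list, skipping
    duplicates, and record which engine contributed each position so
    clicks can be credited later."""
    merged, credit, seen = [], [], set()
    sources = [iter(results_a), iter(results_b)]
    turn = 0
    while len(merged) < k:
        progressed = False
        for offset in range(2):           # preferred team first, then fallback
            team = (turn + offset) % 2
            for doc in sources[team]:     # skip docs the other engine already placed
                if doc not in seen:
                    seen.add(doc)
                    merged.append(doc)
                    credit.append("A" if team == 0 else "B")
                    progressed = True
                    break
            if progressed:
                break
        if not progressed:
            break                         # both engines exhausted
        turn += 1
    return merged, credit
```

The returned `credit` list is the backend bookkeeping mentioned above: the user sees only `merged`, while the system remembers which engine produced each position.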
The user interacts with the final ranking, and clicks a result. In the backend we know that this result came from A, and thus A is the winner for this mini-competition between the two engines.
So, to summarize, if we want to compare two search algorithms, A and B, we turn it into a competition. Both engines generate results for the same query, these results are interleaved into one final result list. This list is shown to the user, who clicks the results she wants. These clicks are mapped to the engine that produced the particular result, and in the end the search engine with most clicks is the winner.
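Given the per-position credit from the interleaving step, determining the winner of the mini-competition is a simple count. This helper and its "tie" outcome are an illustrative assumption rather than a fixed rule.

```python
def winner(credit, clicked_positions):
    """Credit each click to the engine that produced that result and
    declare the engine with more clicks the winner ('tie' if equal)."""
    clicks = {"A": 0, "B": 0}
    for pos in clicked_positions:
        clicks[credit[pos]] += 1
    if clicks["A"] == clicks["B"]:
        return "tie"
    return "A" if clicks["A"] > clicks["B"] else "B"
```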
Why would we want to use interleaving to compare two algorithms, and not use the more common A/B test? There’s nothing wrong with doing an A/B test, but an interleaving experiment is faster to run and is low-risk.
Interleaving is faster than an A/B test because each user evaluates both engines at the same time, unlike with an A/B test, in which a user is assigned to only one version. At the same time, because each user is shown at least some results from the current search engine (A), the experiment has a low risk, especially compared to an A/B test in which a certain percentage of users get to see only results from the new engine (B), which might lead to a bad user experience.
Now that we know how we can use interleaving to evaluate search algorithms, we can turn it into a method for self-learning search engines.
The trick to get an online learning to rank system using interleaving is to continuously run a competition between the current best model and a slight adaptation of that model, and to immediately update the current model when it is beaten by the adapted version.
In other words, we have our current model A, and take a slight adaptation of that model to be version B. When a query comes in, we run an interleaving experiment with these two versions, just like before. When clicks come in for that query, we determine the winner right away (instead of waiting for more queries to come in). In case version B wins, we update the current model A in such a way that we move into the direction of B. This updated version of A then becomes the new current model (A), and we generate a new adaptation from that model to be the new B. When a new query comes in, the process repeats itself.
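One published method in this family is Dueling Bandit Gradient Descent (Yue & Joachims). A minimal sketch of the perturb-and-update idea, assuming a linear ranking model, might look like this; the step sizes `delta` and `alpha` are arbitrary illustrative values, not recommended settings.

```python
import random

def perturb(w, delta=0.1, rng=random):
    """Create candidate model B: the current weights plus a small random
    step in a uniformly drawn direction, scaled by delta."""
    direction = [rng.gauss(0.0, 1.0) for _ in w]
    norm = sum(d * d for d in direction) ** 0.5 or 1.0
    return [wi + delta * d / norm for wi, d in zip(w, direction)], direction

def update(w, direction, b_won, alpha=0.05):
    """If the perturbed model B won the interleaved comparison, move the
    current model A a small step in B's direction; otherwise keep A."""
    if not b_won:
        return w
    norm = sum(d * d for d in direction) ** 0.5 or 1.0
    return [wi + alpha * d / norm for wi, d in zip(w, direction)]
```

Each incoming query triggers one perturb/interleave/update cycle, so the model drifts toward whatever users click on, one small step at a time.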
As always, the devil is in the details, but these are too complex to discuss here. Examples of such details are how exactly to do the updating of the model A. There’s a lot of scientific literature on this available.
So, what does online learning to rank look like in practice? Unfortunately, we do not yet have a demo of what is happening. We do have a couple of screenshots from one of our customers and some general numbers to report. When our demo becomes available, we’ll send it around on social media etc.
On the left are the results from Apache Solr with some manual tuning for the query “case”. Results show mainly tool cases at the top.
After adding (online) learning to rank on top of Apache Solr, results become like the right hand side: suitcases have moved to the top, with a slight preference for those on sale.
Another example comparing Apache Solr with manual tuning to (online) learning to rank, now for the query “kitchen”. Results from Solr show toy kitchens and some kitchen equipment at the top.
After learning from what users really want, we see actual kitchens “naturally” moving to the top. Again, we observe a preference for kitchens that are on sale.
We have compared our online learning to rank system to manually tuned Apache Solr instances for three of our customers using A/B tests. In general, we observe increases in revenue of about 30% when using online learning to rank on top of an Apache Solr index. We observe similar improvements for conversion rates. Note that these numbers could be achieved using “batch” learning to rank as well, but without the aforementioned benefits of online learning.
Orange denotes the customer’s network, purple 904Labs’. (Explain the procedure.) This type of architecture is great for shortcutting the integration process; however, it is sensitive to network latency.
904Labs’ core ranking algorithms are based on scientific publications, which are available to everyone. The software, however, is not open source. 904Labs does encourage the use of open source by its clients. In fact, the implementation assumes that the client is running her own Apache Solr or Elasticsearch index for search, but would like to improve its ranking quality. 904Labs acts as middleware between the web app and the existing search index, and uses this index for its online learning to rank algorithms. There are plenty of advantages for clients, including easy implementation, data remaining at the client, and no vendor lock-in. If a client removes 904Labs’ services, she’s back at her original setup, including her original search index.
This makes 904Labs unique among search-as-a-service providers that build on open source search engines. With most other such providers, when a client wants to move away from the solution, she also loses all data stored in the search index, as the index is part of the provider’s solution. It is an open-source-powered vendor lock-in :-)
There are still a lot of open issues when it comes to online learning to rank. Feature engineering is one, although it also applies to batch learning to rank. Which features are available, and which ones could actually add something? The benefit of an online learning system is that you can easily add new features to analyze their impact. Delayed feedback is a technical issue related to the order in which feedback comes back into the system. What to do when feedback for a particular query comes in after feedback for a more recent query has already led to a model update? Should we ignore the feedback or still take it into account? Feel free to volunteer and work with 904Labs on these open issues.
Efficiency is a potential problem when trying to achieve maximum effectiveness. Learning to rank assumes that there is some initial seed set of documents to rerank. Ideally, this set contains all relevant documents, but we can only select a limited seed set. How to optimize this efficiency-effectiveness trade-off?
Finally, when we have learned a model, we would like to make use of it, or exploit it. But if we would only exploit, we can not learn anything anymore. For that, we need to explore as well. How do we balance between exploiting the current best model and exploring to allow for learning?
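A common way to picture this balance is an epsilon-greedy rule: exploit the current best model most of the time, and explore a candidate with some small probability. The sketch below is a deliberately simple illustration, not 904Labs' actual strategy.

```python
import random

def choose_model(current, candidate, epsilon=0.1, rng=random):
    """Epsilon-greedy balance: mostly exploit the current best model,
    occasionally explore the candidate so learning can continue."""
    return candidate if rng.random() < epsilon else current
```

With `epsilon=0.1`, roughly one query in ten is served by the exploratory candidate; tuning that fraction is exactly the open question raised above.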
To summarize this talk: we observe that a paradigm shift in search is happening, moving from old-fashioned ranking functions to learning from user behavior data. The current state of the art uses learning to rank, but online learning to rank is on its way and offers additional advantages on top of it. As with learning to rank, there are still quite a few open issues for online learning to rank, so keep an eye on the research community to come up with cool new methods!
If you want to know (much) more about this technology, feel free to contact Manos or Wouter directly.
If you want to know (much) more about these details, you could for example check out this tutorial on interleaving.
Two pointers for online learning to rank.
There are three approaches to learning to rank. Pointwise tries to predict the relevance of each single document. Pairwise looks at two documents and tries to determine which one is more relevant. Listwise, finally, tries to optimize the full ranking in one go, using existing IR metrics like nDCG. Going from pointwise to listwise should lead to better effectiveness, but also to a decrease in efficiency; the usual efficiency-effectiveness trade-off.
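To make the pairwise flavor concrete, here is a tiny perceptron-style hinge update for a linear scorer; the margin and learning rate are illustrative assumptions.

```python
def pairwise_update(w, feats_pos, feats_neg, lr=0.1, margin=1.0):
    """Pairwise learning to rank: if the more-relevant document does not
    outscore the less-relevant one by the margin, nudge the weights to
    widen the gap (perceptron-style hinge update)."""
    score = lambda f: sum(wi * fi for wi, fi in zip(w, f))
    if score(feats_pos) - score(feats_neg) < margin:
        w = [wi + lr * (p - n) for wi, p, n in zip(w, feats_pos, feats_neg)]
    return w
```

Repeated over many preference pairs (e.g., clicked vs. skipped documents), the weights converge to a model that orders each pair correctly, without ever predicting an absolute relevance value, which is exactly what distinguishes pairwise from pointwise methods.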
Interleaving: from Evaluation to Self-learning Search @904Labs
John T. Kane – representing 904Labs in the USA
Solution Architect / Product Manager @ Voyager Search
About myself and 904Labs
I’ve been in the search field for 15+ years, starting with SQL Server Full-text
Search (FTS) in 1998, with roles in Tech Support and Sales Engineering (FAST),
and Product Manager roles at HP, Lucidworks (Fusion 1.0), and recently HPE.
While I currently work for Voyager Search, I’m at Haystack representing 904Labs.
904Labs is a Dutch search company founded by Manos Tsagkias and Wouter
Weerkamp, two former academic researchers in the field of Information
Retrieval. The company offers Online Learning to Rank as-a-Service.
For decades people tried to come up with clever ways to model
“relevance”. In the early 70s, TF-IDF was introduced, relying on counting
word overlap between queries and documents. (main use case: early digital
library / card catalog)
In the 1990s, researchers formalized BM25 (used in early SharePoint
Search 2001), a parameterized version of TF-IDF rooted in 1970s
probabilistic models. It wasn’t until 2015 that Lucene/Solr changed its
default ranking function to BM25.
So, today’s standard search relevance still builds on ranking functions rooted in the 1970s.
How to determine relevance?
Enter machine learning
In recent years, people have started to realize that search,
or modeling relevance, has become too complex to fit into BM25. A
paradigm shift is taking place, moving in the direction of
learning the ranking function from training data.
This paradigm shift is reflected in learning to rank plugins for
Apache Solr and Elasticsearch, and is also apparent from the
many talks at Haystack about learning to rank.
Learning to rank is a batch process. Training data is collected,
features are extracted, and a model is trained using an objective
function. Every couple of hours/days/weeks, this process is repeated
and a new model is trained. This requires heavy data processing
infrastructure, plus software and expert personnel to run it.
So, what’s next?
Reinforcement learning: don’t retrain, but update the existing
model in real time using feedback on the ranking produced by the
current model. Think of this as stage 2 of the paradigm shift.
No need to retrain, no need for batch data processing. This
allows us to easily launch new features, with weights learned
on the fly (online), and lets the system adapt to changing user
behavior almost immediately (in real time).
Online learning to rank uses a pre-trained model to generate an initial
ranking. The user interacts with the ranking, giving (implicit) feedback
on its quality. This feedback is used to update the current model, and
the updated model then becomes the active model. And repeat...
Interleaving for evaluation (recap 1)
Two competing search engines, A and B:
1) Both generate results for the same query
2) Results are then interleaved into one final result list
3) The final result list is shown to the user
4) Clicks on results are mapped to the originating search engine
5) Winner is the search engine that receives most clicks
Interleaving for evaluation (recap 2)
Fast and low-risk evaluation method for algorithmic changes,
especially compared to an A/B test. It is always ongoing, and it is:
...faster, because every user evaluates both search engines at the
same time;
...low-risk, because every user always sees several results from
the current search engine, which has a known quality.
Interleaving for online learning
Interleaving is about identifying the winning search engine in a
competition. We can run a competition with every query to get a
continuous learning cycle (think of two ranking models within one search engine).
Model B is always a slight adaptation of the current model. In
case B wins the competition, the original model (A) is updated into the
direction of B. The updated model becomes search engine A for the
next query, and competes with a new B.
Online learning to rank
in practice (demo offline)
Increased revenue and conversion rates for three eCommerce
search customers using online learning to rank on top of Apache Solr.
Blog posts with improvements in revenue:
Is 904Labs Open Source?
904Labs’ online learning to rank system is SaaS. It is implemented
on top of a client’s (or customer’s) own Apache Solr or
Elasticsearch. The data remains at the client side, and if the client
wants to move away from 904Labs, they can do so without
vendor lock-in!
Many other (SaaS) search solutions provide Solr/Elasticsearch as
core part of their solution. Moving away from these solutions
leaves clients without any search infrastructure.
Feature engineering. Which features are readily available?
Delayed feedback. How to update the model when feedback is
delayed until after another update has already happened?
Efficiency vs. effectiveness. How to balance the number of
queries to Solr and the extent of the candidate document set?
Exploration vs. exploitation. We want to exploit the current best
model, but need to explore to keep learning. What is the best balance?
Some open issues (as time allows)
Take home message for 904Labs
Search has moved from modeling relevance to learning from
user behavior data. The next Paradigm Shift is to learn these
models in real time, allowing immediate adaptation to changes in
user behavior and removing the necessity of large-scale data
(pre)processing for batch learning.
Many open issues remain, so expect lots of cool research on this topic!
(and you all know the LtR plugins for Apache Solr and Elasticsearch)
Pointwise: Try to predict the relevance of one document at a time.
Pairwise: For a pair of documents, predict which is more relevant.
Listwise: Try to optimize the full ranking using existing IR metrics like nDCG.
Approaches to learning to rank