
Interleaving, Evaluation to Self-learning Search @904Labs

Presented at the Open Source Connections Haystack relevance conference: 904Labs' "Interleaving: from Evaluation to Self-Learning". 904Labs is the first to commercialize online learning to rank as a state-of-the-art approach to self-learning search ranking that automatically takes customers' behavior into account to personalize search results.



  1. 1. Interleaving. From evaluation to self-learning. John T. Kane – representing 904Labs in the USA; also Solution Architect / Product Manager @ Voyager Search.
  2. 2. About myself and 904Labs. I've been in the search field for 15+ years, starting with SQL Server Full-Text Search (FTS) in 1998, with roles in technical support, sales engineering (FAST), and product management at HP, Lucidworks (Fusion 1.0), and most recently HPE. While I currently work for Voyager Search, I'm at Haystack representing 904Labs. 904Labs is a Dutch search company founded by Manos Tsagkias and Wouter Weerkamp, two former academic researchers in the field of Information Retrieval. The company offers Online Learning to Rank as a Service (OLtRaaS).
  3. 3. How to determine relevance? For decades people have tried to come up with clever ways to model "relevance". In the early 1970s, TF-IDF was introduced, relying on counting word overlap between queries and documents (main use case: early digital libraries and card catalogs). In the 1980s and early 1990s, researchers developed BM25 (used in early SharePoint Search 2001), a parameterized version of TF-IDF. It wasn't until 2015 that Lucene/Solr changed its default ranking function to BM25. So today's standard search relevance still rests on ranking functions that are roughly 40 years old.
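
To make the comparison on this slide concrete, here is a small toy scoring sketch (not from the talk): BM25 can be read as TF-IDF with two extra parameters, k1 for term-frequency saturation and b for document-length normalization. The corpus, query, and smoothing choice below are made up for illustration.

```python
import math

def idf(term, docs):
    """Smoothed inverse document frequency over a tiny in-memory corpus."""
    df = sum(1 for d in docs if term in d)
    return math.log((len(docs) + 1) / (df + 1)) + 1

def tfidf_score(query, doc, docs):
    """Plain TF-IDF: term frequency times inverse document frequency."""
    return sum(doc.count(t) * idf(t, docs) for t in query)

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25: TF-IDF with saturation (k1) and document-length normalization (b)."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for t in query:
        tf = doc.count(t)
        score += idf(t, docs) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [["red", "kitchen", "mixer"], ["phone", "case", "red"], ["kitchen", "scale"]]
query = ["red", "kitchen"]
for doc in docs:
    print(doc, round(tfidf_score(query, doc, docs), 3), round(bm25_score(query, doc, docs), 3))
```
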
  4. 4. Paradigm shift
  5. 5. Enter machine learning. Over the last few years, people have started to realize that search, or modeling relevance, has become too complex to fit into BM25. A paradigm shift is taking place, moving in the direction of learning the ranking function from training data. This shift is reflected in the learning-to-rank plugins for Apache Solr and Elasticsearch, and is also apparent from the many talks at Haystack about learning to rank.
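
As one concrete example of the plugins mentioned on this slide, the sketch below shows roughly how a previously uploaded model might be applied at query time through Apache Solr's LTR rerank query. The host, the products collection, and the model name myModel are placeholders, not anything from the talk or 904Labs' setup.

```python
import requests

# Placeholders: a local Solr instance, a "products" collection, and a model
# named "myModel" that was uploaded to the LTR model store beforehand.
params = {
    "q": "kitchen mixer",
    "rq": "{!ltr model=myModel reRankDocs=100 efi.user_query='kitchen mixer'}",
    "fl": "id,score,[features]",  # also return the feature values the model used
}
resp = requests.get("http://localhost:8983/solr/products/select", params=params)
print(resp.json()["response"]["docs"][:3])
```
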
  6. 6. Learning to rank is a batch process. Training data is collected, features are extracted, and a model is trained using an objective function. Every couple of hours, days, or weeks this process is repeated and a new model is trained. This requires heavy data-processing infrastructure, plus the software and expert personnel to run it.
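
A minimal sketch of that batch cycle, assuming a hand-rolled linear model, toy features, and a pointwise squared-error objective; a real pipeline would use a proper learning-to-rank library and far more data.

```python
def extract_features(doc):
    """Toy feature vector: a text-match score, price, and recency (all made up)."""
    return [doc["text_score"], doc["price"], doc["days_since_added"]]

def train(training_data, epochs=50, lr=0.01):
    """Fit linear weights by gradient descent on a pointwise squared-error objective."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in training_data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

# One batch run; in production this whole block is re-run every few hours, days, or weeks.
judgments = [
    ({"text_score": 2.1, "price": 0.3, "days_since_added": 0.1}, 1.0),  # relevant
    ({"text_score": 0.4, "price": 0.9, "days_since_added": 0.8}, 0.0),  # not relevant
]
training_data = [(extract_features(doc), label) for doc, label in judgments]
print("new model to deploy:", [round(w, 3) for w in train(training_data)])
```
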
  7. 7. So, what's next? Reinforcement learning: don't retrain, but update the existing model in real time using feedback on the ranking produced by the current model. Think of this as stage 2 of the paradigm shift. There is no need to retrain and no need for batch data processing. This lets us launch new features easily, weights are learned on the fly (online), and the model adapts to changing user behavior almost immediately (in real time).
  8. 8. Online learning to rank uses a pre-trained model to generate an initial ranking. The user interacts with the ranking, giving (implicit) feedback on its quality. This feedback is used to update the current model, and the updated model then becomes the active model. And repeat...
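
A sketch of that loop under simple assumptions: a linear model, clicks as implicit feedback, and an illustrative pairwise update rule (not 904Labs' actual update).

```python
def score(model, features):
    return sum(w * f for w, f in zip(model, features))

def rank(model, candidates):
    return sorted(candidates, key=lambda x: score(model, x), reverse=True)

def update(model, clicked, skipped, lr=0.05):
    """Nudge weights so the clicked document outscores a skipped one (pairwise step)."""
    if score(model, clicked) <= score(model, skipped):
        model = [w + lr * (c - s) for w, c, s in zip(model, clicked, skipped)]
    return model

model = [0.5, 0.5]  # the active model
# A tiny stream of interactions: candidate feature vectors plus the index the user clicked.
stream = [([[0.9, 0.1], [0.2, 0.8]], 1), ([[0.7, 0.3], [0.1, 0.6]], 1)]
for candidates, clicked_idx in stream:
    ranking = rank(model, candidates)        # serve results with the current model
    clicked = candidates[clicked_idx]        # implicit feedback from the user
    skipped = next(c for c in ranking if c != clicked)
    model = update(model, clicked, skipped)  # the updated model becomes the active one
    print("active model:", [round(w, 3) for w in model])
```
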
  9. 9. Interleaving. From evaluation to self-learning
  10. 10. [Diagram: Search Engine A and Search Engine B]
  11. 11. [Diagram: Search Engine A and Search Engine B, with their results labeled A and B combined into one interleaved list]
  12. 12. [Diagram: the same interleaved list of A- and B-labeled results, continued]
  13. 13. Interleaving for evaluation (recap 1). Two competing search engines, A and B: 1) both generate results for the same query; 2) the results are interleaved into one final result list; 3) the final result list is shown to the user; 4) clicks on results are mapped back to the originating search engine; 5) the winner is the search engine that receives the most clicks.
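
The recap above does not say which interleaving variant is used; team-draft interleaving is one common choice and is sketched here with made-up document IDs and clicks.

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Interleave two rankings; remember which 'team' contributed each document."""
    interleaved, team_of = [], {}
    total = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total:
        # Each round, A and B (in random order) pick their best document not yet used.
        for team, ranking in random.sample([("A", ranking_a), ("B", ranking_b)], 2):
            pick = next((d for d in ranking if d not in team_of), None)
            if pick is not None:
                interleaved.append(pick)
                team_of[pick] = team
    return interleaved, team_of

def winner(clicked_docs, team_of):
    """Credit each click to the engine that contributed the clicked result."""
    credit = {"A": 0, "B": 0}
    for doc in clicked_docs:
        credit[team_of[doc]] += 1
    if credit["A"] == credit["B"]:
        return "tie"
    return "A" if credit["A"] > credit["B"] else "B"

ranking_a = ["d1", "d2", "d3", "d4"]
ranking_b = ["d3", "d1", "d5", "d2"]
interleaved, team_of = team_draft_interleave(ranking_a, ranking_b)
print(interleaved, winner(["d3", "d5"], team_of))
```
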
  14. 14. Interleaving for evaluation (recap 2). A fast and low-risk evaluation method for algorithmic changes, especially compared to an A/B test. It is always ongoing; it is faster because every user evaluates both search engines at the same time; and it is low-risk because every user always sees several results from the current search engine, which has a known quality.
  15. 15. Interleaving for online learning. Interleaving is about identifying the winning search engine in a competition. We can run a competition with every query to get a continuous learning cycle (think of two ranking models inside one search engine). Model B is always a slight adaptation of the current model. If B wins the competition, the original model (A) is updated in the direction of B. The updated model becomes search engine A for the next query and competes with a new B.
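
This learning scheme resembles dueling bandit gradient descent, so here is a sketch under that assumption; the Gaussian perturbation, the step toward B, and the random stand-in for the interleaving outcome are all illustrative.

```python
import random

def perturb(model_a, delta=0.1):
    """Candidate model B: the current model plus a small random perturbation."""
    return [w + random.gauss(0.0, delta) for w in model_a]

def update_toward(model_a, model_b, alpha=0.3):
    """If B wins the interleaved comparison, move A a step toward B."""
    return [wa + alpha * (wb - wa) for wa, wb in zip(model_a, model_b)]

model_a = [0.5, 0.5, 0.5]              # current production ranker
for query in ["case", "kitchen", "mixer"]:
    model_b = perturb(model_a)         # challenger for this query
    b_wins = random.random() < 0.5     # stand-in for the interleaved click outcome
    if b_wins:
        model_a = update_toward(model_a, model_b)
    print(query, [round(w, 3) for w in model_a])
```
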
  16. 16. Online learning to rank in practice (demo offline)
  17. 17. Example query: case
  18. 18. Example query: kitchen
  19. 19. ~30% increase in revenue and conversion rate for three of 904Labs' e-commerce search customers using online learning to rank on top of Apache Solr. Blog posts with the revenue improvements: https://www.904labs.com/en/blog-eci-increases-revenue-substantially-with-ai-for-search.html and https://www.904labs.com/en/blog-self-learning-search-improves-revenue-for-e-commerce.html
  20. 20. Is 904Labs open source? 904Labs' online learning to rank system is SaaS. It is implemented on top of a client's own Apache Solr or Elasticsearch. The data remains on the client's side, and if the client wants to move away from 904Labs, they can do so without extensive vendor lock-in. Many other (SaaS) search solutions provide Solr/Elasticsearch as a core part of their offering; moving away from those solutions leaves clients without any search infrastructure.
  21. 21. Some open issues (as time allows). Feature engineering: which features are readily available? Delayed feedback: how do we update the model when feedback arrives only after another update has already happened? Efficiency vs. effectiveness: how do we balance the number of queries to Solr against the size of the candidate document set? Exploration vs. exploitation: we want to exploit the current best model, but we need to explore to keep learning; what is the best way?
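
The exploration-vs-exploitation question is left open on the slide; one textbook baseline is an epsilon-greedy policy, sketched below purely for illustration.

```python
import random

def choose_model(best_model, challenger, epsilon=0.1):
    """With probability epsilon explore the challenger, otherwise exploit the best model."""
    return challenger if random.random() < epsilon else best_model

served = [choose_model("A", "B") for _ in range(1000)]
print("exploration rate:", served.count("B") / len(served))
```
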
  22. 22. Take-home message for 904Labs: search has moved from modeling relevance to learning from user behavior data. The next paradigm shift is to learn these models in real time, allowing immediate adaptation to changes in user behavior and removing the need for large-scale data (pre)processing for batch learning. Many open issues remain, so expect lots of cool research in this area.
  23. 23. 904Labs Contacts & Resources Manos (CEO) - manos@904labs.com Wouter (COO) - wouter@904labs.com https://www.904labs.com/ Blog: https://www.904labs.com/en/blog.html Academic tutorial on interleaving (technical) http://studylib.net/doc/9453448/slides---yisong-yue
  24. 24. Backup slides & Resources Academic tutorial on interleaving (technical) http://studylib.net/doc/9453448/slides---yisong-yue
  25. 25. Some pointers sofia-ml https://code.google.com/archive/p/sofia-ml/ Lerot https://bitbucket.org/ilps/lerot (and you all know the LtR plugins for Apache Solr and Elasticsearch)
  26. 26. Approaches to learning to rank. Pointwise: try to predict the relevance of one document at a time. Pairwise: for a pair of documents, predict which one is more relevant. Listwise: try to optimize the full ranking using existing IR metrics.
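
A toy illustration of the three objective styles, using a made-up linear scorer, three labeled documents, and DCG as the listwise example metric.

```python
import math

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

docs = [([1.0, 0.2], 2), ([0.4, 0.9], 1), ([0.1, 0.1], 0)]  # (features, relevance label)
w = [0.6, 0.4]

# Pointwise: squared error between each document's predicted score and its label.
pointwise = sum((score(w, x) - y) ** 2 for x, y in docs)

# Pairwise: hinge penalty whenever a more relevant document fails to outscore a
# less relevant one by a margin.
pairwise = sum(max(0.0, 1.0 - (score(w, xi) - score(w, xj)))
               for xi, yi in docs for xj, yj in docs if yi > yj)

# Listwise: a metric over the induced ranking of the whole list, here DCG.
ranked = sorted(docs, key=lambda d: score(w, d[0]), reverse=True)
listwise_dcg = sum((2 ** y - 1) / math.log2(i + 2) for i, (_, y) in enumerate(ranked))

print(round(pointwise, 3), round(pairwise, 3), round(listwise_dcg, 3))
```
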
