Search relevance is about how well search results answer the user's question. In practice, it is the process of tuning how results are ranked for a query so that users get what they want. A search for 'iPhone XS' should rank documents highly when the product name matches. But a different query, 'smartphone with two cameras', requires a completely different strategy for ranking candidate results. What gives teams a headache is that a single ranking algorithm must handle all of these diverse use cases.
This is where Learning to Rank comes in. Ramzi Alqrainy will discuss how search can be treated as a machine learning problem. Learning to Rank goes a step further by returning optimized results to users based on patterns in usage behavior. He will talk through where Learning to Rank has shined, as well as the limitations of a machine-learning-based approach to improving search relevance.
2. OpenSooq Technology
● Chief Technology Officer – OpenSooq
● MSc in Artificial Intelligence and Information Retrieval, University of Jordan
● Contributor to Apache Solr
● Contributor to Slack
● Technical reviewer for the books “Scaling Apache Solr”, “Scaling Big Data with Hadoop and Solr”, and “Apache Solr Search Patterns”
Ramzi Alqrainy
3. OpenSooq Technology
Enabling MENA to Buy, CHAT and Sell
OpenSooq is the #1 mobile-first classifieds marketplace connecting
buyers and sellers in MENA.
4. OpenSooq Technology
But we have some Arabic search challenges
Search for “OBAMA” in Iraq: what do you get?
7. OpenSooq Technology
Arabic today is the fourth most spoken language
globally
The number of Arabic-speaking Internet users grew
from 135 million in 2017 to 225 million in 2019
8. OpenSooq Technology
Yet the Arabic language is very complex, with
5 core challenges:
• Orthography
• Diacritics
• Morphology
• Dialects
• Arabizi
9. OpenSooq Technology
1. Arabic Orthography and Print
Arabic has a right-to-left connected script that uses
28 basic letters, which change shape depending on
their positions in words
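One practical consequence of positional shapes is that text extracted from PDFs or older systems sometimes stores each shape as its own Unicode "presentation form" code point, which then fails to match normally typed text. The sketch below shows one way to fold those forms back to base letters with Python's standard `unicodedata` module. The function name `fold_presentation_forms` is hypothetical, and NFKC also rewrites other compatibility characters, so treat it as an illustration and not a complete normalizer.

```python
import unicodedata

def fold_presentation_forms(text: str) -> str:
    """Map positional Arabic presentation forms back to base letters."""
    # NFKC applies compatibility decompositions, which cover the
    # initial/medial/final/isolated forms of Arabic letters.
    return unicodedata.normalize("NFKC", text)

# KAF initial form + TEH medial form + BEH final form, as an extractor might emit them
shaped = "\uFEDB\uFE98\uFE90"
# The same word typed normally: KAF + TEH + BEH
plain = "\u0643\u062A\u0628"

print(shaped == plain)                           # compares the raw code points
print(fold_presentation_forms(shaped) == plain)  # compares after folding
```

Applying the same folding at index time and at query time keeps both sides comparable.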
10. OpenSooq Technology
2. Arabic Diacritics
The two words
عَلَم (Ealam - meaning “flag”) and
عِلْم (Eilm - meaning “knowledge”)
share the same letters علم (Elm) but differ in diacritics.
Diacritics help disambiguate the meaning of words.
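Since most users type queries without diacritics, a search pipeline commonly strips them on both the indexed text and the query, accepting the loss of disambiguation. The following is a minimal sketch of that step using only the standard library. The function name `strip_diacritics` is an assumption for illustration, and the two words are written as escapes so the code points are explicit.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks such as fatha, kasra, and sukun."""
    # The text is deliberately not decomposed first, so letters like
    # alef-with-hamza keep their hamza; only standalone marks are removed.
    return "".join(ch for ch in text if not unicodedata.combining(ch))

flag = "\u0639\u064E\u0644\u064E\u0645"       # Ealam, "flag"
knowledge = "\u0639\u0650\u0644\u0652\u0645"  # Eilm, "knowledge"

print(flag == knowledge)
print(strip_diacritics(flag) == strip_diacritics(knowledge))
```

After stripping, both words reduce to the same three letters, which is exactly why the ranking stage needs other evidence to tell them apart.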
11. OpenSooq Technology
3. Arabic Morphology
Arabic words are divided into three main types:
nouns, verbs, and particles.
Arabic nouns (which include adjectives and adverbs)
and verbs are derived from a closed set of around
10,000 roots. For example, from the root كتب (ktb):
Write: كتب
Book: كتاب
Writer: كاتب
Written: مكتوب
Small book: كتيب
12. OpenSooq Technology
4. Arabic Dialects
There are 6 dominant dialects with many variations,
and a dozen more that are less widely spoken.
The concept corresponding to “I want” is expressed
as عاوز (EAwz) in Egyptian, أبغى (Abgy) in Gulf, أبي (Aby)
in Iraqi, and بدي (bdy) in Levantine.
“I want” by dialect:
Egyptian: عاوز
Gulf: أبغى
Iraqi: أبي
Levantine: بدي
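One common way to cope with dialect variation is a synonym group that is expanded at query time. The sketch below uses only the four "I want" variants from the slide. The names `WANT_VARIANTS` and `expand_query` are hypothetical, and this is not presented as how OpenSooq handles dialects.

```python
# Hypothetical synonym group: dialect variants of "I want" from the slide.
WANT_VARIANTS = {"عاوز", "أبغى", "أبي", "بدي"}

def expand_query(tokens: list[str]) -> list[list[str]]:
    """Replace each token with the list of alternatives that may match it."""
    expanded = []
    for token in tokens:
        if token in WANT_VARIANTS:
            # Any dialect variant matches all variants in the group.
            expanded.append(sorted(WANT_VARIANTS))
        else:
            expanded.append([token])
    return expanded

for alternatives in expand_query(["بدي", "سيارة"]):
    print(alternatives)
```

Blanket expansion can hurt precision: أبي, for instance, also means "my father", so a real system would likely weight or restrict such groups by country or context.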
13. OpenSooq Technology
5. Arabizi
Arabic is sometimes written using Latin characters in
transliterated form along with Arabic characters!
Arabizi uses numerals to represent Arabic letters.
“2” and “3” represent the letters أ (which sounds like “a”
as in apple) and ع (E) (a guttural “aa”),
respectively.
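The digit convention can be sketched as a small character map. Only the two mappings named on the slide are included; the names `ARABIZI_DIGITS` and `replace_arabizi_digits` are hypothetical, and full Arabizi transliteration needs many more rules than this.

```python
# Only the two digit mappings named on the slide; real Arabizi uses more.
ARABIZI_DIGITS = {
    "2": "\u0623",  # alef with hamza above
    "3": "\u0639",  # ain
}

def replace_arabizi_digits(text: str) -> str:
    """Swap Arabizi digits for the Arabic letters they stand for."""
    return text.translate(str.maketrans(ARABIZI_DIGITS))

print(replace_arabizi_digits("3arabi"))
```

A naive map like this would also rewrite genuine numbers (model numbers, prices, years), so it should only run on tokens already identified as Arabizi.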
15. OpenSooq Technology
Arabic Light Stemmer
The light stemmer, light10, outperformed the other approaches. It is
becoming widely used in Arabic information retrieval.
Source: http://www.mtholyoke.edu/~lballest/Pubs/arab_stem05.pdf
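To make "light stemming" concrete, here is a simplified sketch in the style of light10: strip a small set of prefixes and suffixes without attempting to find the root. The affix lists and length thresholds follow the commonly cited description of light10 and should be checked against the paper linked above. The letter normalization that normally precedes stemming is omitted, and `light_stem` is a hypothetical name.

```python
# Affix lists follow the commonly cited description of light10; verify
# against the linked paper before relying on them.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

def light_stem(word: str) -> str:
    """Strip at most one prefix, then any listed suffixes, from a word."""
    # Keep at least 2 letters after a prefix (3 for the bare conjunction
    # "و", which is often part of the word itself).
    for prefix in PREFIXES:
        min_rest = 3 if prefix == "و" else 2
        if word.startswith(prefix) and len(word) - len(prefix) >= min_rest:
            word = word[len(prefix):]
            break
    # Go through the suffix list once, removing each match that leaves
    # at least 2 letters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word

for form in ["الكتاب", "والكتاب", "كتابها"]:
    print(form, light_stem(form))
```

Apache Lucene, and therefore Solr, ships an Arabic stem filter built on the same light-stemming idea, so in practice this logic usually lives in the analyzer chain, not in application code.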
17. OpenSooq Technology
The Problem
• Improving search relevance is hard
• TF-IDF and BM25 are good for keyword text matching,
but what about other models of relevance?
• Text matching is sometimes not the best solution
• Users don’t always say what they mean
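To see why text scoring alone is limited, it helps to look at what BM25 actually consumes. The sketch below implements a common textbook form of the per-term BM25 score; Lucene's implementation differs in small details. The parameter defaults `k1=1.2` and `b=0.75` are the usual conventions, and the numbers in the usage lines are made up.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1=1.2, b=0.75):
    """Score one query term against one document with a textbook BM25 formula."""
    # Rare terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Longer-than-average documents are penalised, controlled by b.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term frequency saturates: each extra occurrence adds less, controlled by k1.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

# Made-up statistics: a term in 50 of 10,000 documents, average length 100 tokens.
print(bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, doc_freq=50, num_docs=10_000))
print(bm25_term_score(tf=3, doc_len=100, avg_doc_len=100, doc_freq=50, num_docs=10_000))
```

Every input is a term or length statistic. Nothing in the formula knows about price, freshness, location, or what users clicked, which is the gap Learning to Rank is meant to fill.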
18. OpenSooq Technology
The Solution : Learning to Rank Overview
• Learning to rank lets you pick “features” of a
document that “matter” and teach the machine
how to rank a set of items.
• One possible source of ordering is user
behavior (e.g., the only clicks were on the
speaker shaped like a rock)
• Solr provides a Learning to Rank implementation.
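As an illustration of using behavior as the source of ordering, the sketch below aggregates a click log into a click-through rate per (query, document) pair, which can then serve as a rough relevance label. The event format, the document IDs, and the function name `click_through_rates` are all assumptions; the log itself is made up, loosely following the rock-shaped speaker example.

```python
from collections import defaultdict

def click_through_rates(events):
    """Aggregate (query, doc_id, clicked) events into a CTR per (query, doc_id)."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, doc_id, was_clicked in events:
        shown[(query, doc_id)] += 1
        if was_clicked:
            clicked[(query, doc_id)] += 1
    return {key: clicked[key] / shown[key] for key in shown}

# Made-up impressions for a single query.
events = [
    ("rock speaker", "doc_rock_shaped_speaker", True),
    ("rock speaker", "doc_rock_shaped_speaker", True),
    ("rock speaker", "doc_rock_music_speaker", False),
    ("rock speaker", "doc_rock_music_speaker", False),
]
labels = click_through_rates(events)
for key, ctr in labels.items():
    print(key, ctr)
```

Raw CTR is biased toward results shown at the top of the page, so real pipelines usually correct for position before treating clicks as judgments.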
The Solution
20. OpenSooq Technology
The Solution : Learning to Rank Overview
• Define features (relevancy factors)
• Derive relevance labels from user clicks
• Use Solr’s Learning to Rank implementation
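The three steps above map onto Solr's Learning to Rank module roughly as follows: features are uploaded to a feature store, a trained model to a model store, and the model is applied through a rerank query. The sketch below only builds and prints the JSON payloads without contacting a server. The feature and model class names come from the Solr LTR module, while the field names (`title`, `click_score`), the model name, and the weights are placeholders invented for illustration.

```python
import json

# Step 1: feature definitions (field names here are assumptions).
features = [
    {
        "name": "originalScore",
        "class": "org.apache.solr.ltr.feature.OriginalScoreFeature",
        "params": {},
    },
    {
        "name": "titleMatch",
        "class": "org.apache.solr.ltr.feature.SolrFeature",
        "params": {"q": "{!field f=title}${user_query}"},
    },
    {
        "name": "clickSignal",
        "class": "org.apache.solr.ltr.feature.FieldValueFeature",
        "params": {"field": "click_score"},
    },
]

# Steps 2 and 3: a linear model whose weights would come from training on
# click-derived labels. The weights below are placeholders, not results.
model = {
    "class": "org.apache.solr.ltr.model.LinearModel",
    "name": "exampleLinearModel",
    "features": [{"name": f["name"]} for f in features],
    "params": {"weights": {"originalScore": 1.0, "titleMatch": 0.5, "clickSignal": 2.0}},
}

# Rerank the top 100 hits, passing the user's text as external feature info.
rerank_param = "{!ltr model=exampleLinearModel reRankDocs=100 efi.user_query='iphone xs'}"

print(json.dumps(features, indent=2))
print(json.dumps(model, indent=2))
print("rq=" + rerank_param)
```

In a running cluster the two payloads would be sent to the collection's `schema/feature-store` and `schema/model-store` endpoints, and the last string passed as the `rq` parameter of a normal query.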
30. OpenSooq Technology
...but is it better?
• Models compared:
  • Solr out-of-the-box BM25 ranking using textual features only
  • Logistic Regression using all features except the signals feature
  • Logistic Regression using all features
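To show what "Logistic Regression using all features" versus "all features except the signals feature" can look like mechanically, here is a dependency-free sketch that fits the same model on two feature subsets. The training rows are made up, the feature names `text_score` and `click_signal` are assumptions, and nothing here reproduces the talk's actual features or results.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias with plain batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this row drives the gradient.
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

# Made-up rows: [text_score, click_signal]; label 1 means the result was clicked.
rows = [[0.9, 0.8], [0.8, 0.1], [0.2, 0.9], [0.1, 0.1]]
labels = [1, 0, 1, 0]

without_signals = train_logistic_regression([[r[0]] for r in rows], labels)
all_features = train_logistic_regression(rows, labels)
print("without signals:", without_signals)
print("all features:", all_features)
```

The learned weights of such a model can be copied directly into a Solr linear model, which is one reason logistic regression is a convenient first choice for Learning to Rank.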
31. OpenSooq Technology
And we thought it was limited to “Obama” :-)
“Ghost” - Kuwait Mercedes S-Class W140 (1991-1998)
32. OpenSooq Technology
And we thought it was limited to “Obama” :-)
“Whale” - Kuwait Mercedes S-Class W220 (1998-2005)
33. OpenSooq Technology
And we thought it was limited to “Obama” :-)
“Submarine” - Jordan Mercedes S-Class W220 (1998-2005)