Online Learning to Rank for Recommender Systems
by Daan Odijk (Blendle)
https://recsys.acm.org/recsys17/industry-session-3/
Every morning at Blendle, we face a huge cold-start problem when over 8,000 new articles from the latest editions of newspapers arrive in our system. At that moment, these articles have been read by virtually no one, and we are tasked with sending out personalised newsletters to over 1 million users. We thus cannot rely on collaborative-filtering-style recommendations, nor can we use the popularity of the articles as a clue to what our users might want to read. We overcome our cold-start problem with a mix of curation by our editorial team and an automated analysis of the content of these articles. We extract named entities, semantic links, authors, the language and plenty of stylometrics. For each of our users, we build a very fine-grained profile based on the attributes of the articles that they read. The combination of enriched articles and user profiles is fed into our machine learning pipeline. We are currently experimenting with an online learning to rank setup, where each of our users is exposed to a slightly perturbed version of our ranking model. We observe the interactions of our users to infer in which direction we should update the model.
Our editorial team gets up at around 5am every morning to read what was published overnight. They are done reading and recommending their selection of articles around 8am, which is also the time we would ideally send out the newsletter, so that our users can read it on their commute to work. These timing restrictions pose yet another challenge: our content analysis and machine learning pipeline needs to be really fast. We solve this by using a streaming infrastructure built on Kafka. In this infrastructure, an article is analysed and scored for relevance towards each of our users as soon as it arrives. This has the advantage that at 8am, when our editorial team is done reading, personalisation is much more lightweight. We use the precomputed relevance scores and balance them with diversity to arrive at a unique ranking for each of our users. In this talk, I will detail how we enrich articles in a streaming fashion and how we use online learning methods to learn a ranking model. I will also talk about how we deal with the time constraints of the problem we are trying to solve.
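The last step above, balancing precomputed relevance scores with diversity, could look roughly like the following MMR-style greedy re-ranker. This is a minimal sketch under stated assumptions: the function name, the lambda trade-off and the similarity callback are illustrative, not Blendle's actual code.

```python
# Hypothetical sketch: re-rank precomputed relevance scores with an
# MMR-style diversity penalty. All names and the lambda trade-off
# are illustrative assumptions.

def diversify(candidates, similarity, k=10, lam=0.7):
    """Greedily pick k articles, trading off relevance against
    similarity to already-selected articles.

    candidates: dict article_id -> precomputed relevance score
    similarity: f(a, b) -> similarity in [0, 1]
    """
    selected = []
    remaining = dict(candidates)
    while remaining and len(selected) < k:
        def mmr(a):
            max_sim = max((similarity(a, s) for s in selected), default=0.0)
            return lam * remaining[a] - (1 - lam) * max_sim
        best = max(remaining, key=mmr)
        selected.append(best)
        del remaining[best]
    return selected
```

With lam close to 1 the ranking follows pure relevance; lowering it pushes near-duplicate articles down the list.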
ABOUT THE SPEAKER
Daan Odijk is lead data scientist at Blendle, a New York Times backed startup that builds a platform where users can explore and support the world’s best journalism and only pay for what they read. Daan heads a team of eight data scientists and engineers who work on personalised recommendations. Daan has a PhD in information retrieval and has worked on leveraging context when searching for news.
2. @dodijk
Mission
Help you discover and support
the world’s best journalism
International
May 2014: The Netherlands
Sept 2015: Germany
March 2016: United States
Publisher-backed
among others NY Times, Nikkei, Axel Springer
70 employees
10 journalists & 50 developers
Blendle
5. Scale at Blendle
Articles
> 6M in total
> 7K new every day
> 30% is read
Users
> 1M users
~ 1 in 5 converts to a
paying user
Events
~ 2B in total
> 2M new every day
6. Our editors select the best articles for our email newsletter every day. Our personalisation algorithms create a personal bundle from this.
12. Prioritised selection: a Random Forest classifier trained on a year of editorial picks; articles clustered based on cosine similarity of TF.IDF vectors.
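The selection step on this slide could be sketched with scikit-learn as below. The toy articles, the labels and the 0.2 similarity threshold are illustrative assumptions; only the two techniques named on the slide (Random Forest on editorial picks, TF.IDF cosine clustering) come from the source.

```python
# Sketch of the slide's selection step: a Random Forest scores
# articles against past editorial picks, and TF.IDF cosine
# similarity groups articles about the same story. Toy data and
# the threshold are illustrative assumptions.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts = [
    "budget vote in parliament",
    "parliament passes the budget bill",
    "local football derby ends in a draw",
]
picked = [1, 1, 0]  # did an editor pick this article? (toy labels)

X = TfidfVectorizer().fit_transform(texts)

# Random Forest trained on (a year of) editorial picks.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X.toarray(), picked)
scores = clf.predict_proba(X.toarray())[:, 1]

# Group articles whose TF.IDF cosine similarity exceeds a threshold,
# so the newsletter does not show several takes on the same story.
sim = cosine_similarity(X)
same_story = sim > 0.2  # threshold is an illustrative assumption
```

In practice the classifier's probability can serve as a prioritisation score, while the similarity clusters keep the selection diverse.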
14. Daily cold start
• >7K new articles every night
• Our newsletter is an important traffic driver
• No usage info to rank the newsletter before we send it
17. Learning to rank: preference learning
[Diagram: Enrich + Profile → Extract ML Features → Model (learning to predict)]
18. Learning to rank: preference learning
[Diagram: training — Enrich + Profile → Extract ML Features → Model (learning to predict); serving — Enrich + Profile → Extract ML Features → Rank → Ranking]
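The "learning to predict" step can be illustrated with a toy pairwise preference learner: from each observation that a user read article A but skipped article B, nudge the weights so that A scores higher. The feature vectors, loss and learning rate here are assumptions, not the production model.

```python
# Toy sketch of pairwise preference learning: learn weights w such
# that w @ x_preferred > w @ x_other for each observed preference.
# Loss and hyperparameters are illustrative assumptions.
import numpy as np

def train_pairwise(pairs, n_features, lr=0.1, epochs=50):
    """pairs: list of (x_preferred, x_other) feature-vector tuples."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        for x_pos, x_neg in pairs:
            diff = x_pos - x_neg
            # logistic loss on the score difference
            p = 1.0 / (1.0 + np.exp(-(w @ diff)))
            w += lr * (1.0 - p) * diff
    return w
```

Ranking at serving time is then just sorting articles by `w @ x`.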
19. Online Learning to Rank
• Learning with a user in the loop
• Daily updates to our model
20. Dueling Bandit Gradient Descent
[Yue et al., 2009; Hofmann et al., 2011]
[Diagram: an exploitative ranker and an explorative ranker, each with weights w_Author and w_Topic, applied to a query]
For Blendle, the user is the query
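A minimal sketch of the Dueling Bandit Gradient Descent loop on this slide: perturb the current (exploitative) weights into an explorative ranker, let interleaved user clicks decide the winner, and step towards the perturbation if it wins. The `compare()` oracle and the step sizes `delta`/`gamma` are illustrative assumptions.

```python
# Minimal Dueling Bandit Gradient Descent step (Yue et al., 2009):
# perturb, compare via user feedback, move towards the winner.
import numpy as np

rng = np.random.default_rng(0)

def dbgd_step(w, compare, delta=1.0, gamma=0.1):
    """One DBGD update.

    w:       current weight vector (exploitative ranker)
    compare: f(w, w_prime) -> True if users preferred w_prime
             (in practice inferred from interleaved clicks)
    """
    u = rng.normal(size=w.shape)
    u /= np.linalg.norm(u)      # random unit direction
    w_prime = w + delta * u     # explorative ranker
    if compare(w, w_prime):
        w = w + gamma * u       # small step towards the winner
    return w
```

Run daily (as on slide 19), this gives incremental model updates driven only by relative user preferences, never absolute relevance labels.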
21. Interleaved Ranking
[Figure: TeamDraft Interleave of an exploitative ranking and an explorative ranking for a query — A, B, C, D, E, F and C, G, D, A, B, E — drafted step by step into one list]
Radlinski, F., Kurup, M., & Joachims, T. (2008). How does clickthrough data reflect retrieval quality? In CIKM '08.
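TeamDraft interleaving, as cited above, can be sketched as follows: the two rankings alternately draft their highest-ranked not-yet-picked article (coin flip on ties), and each drafted article is credited to its team so that clicks can later be compared. A toy sketch, not Blendle's implementation; the tie-breaking seed is an assumption.

```python
# Sketch of TeamDraft interleaving (Radlinski et al., CIKM '08).
import random

def team_draft(ranking_a, ranking_b, rng=None):
    """Interleave two rankings; returns (interleaved list,
    dict mapping doc -> drafting team 'A' or 'B')."""
    rng = rng or random.Random(42)
    interleaved, team, seen = [], {}, set()
    count = {'A': 0, 'B': 0}

    def next_unseen(ranking):
        for doc in ranking:
            if doc not in seen:
                return doc
        return None

    while next_unseen(ranking_a) is not None or next_unseen(ranking_b) is not None:
        # the team with fewer picks drafts next; coin flip on ties
        if count['A'] < count['B']:
            turn = 'A'
        elif count['B'] < count['A']:
            turn = 'B'
        else:
            turn = rng.choice(['A', 'B'])
        doc = next_unseen(ranking_a if turn == 'A' else ranking_b)
        if doc is None:          # this team's list is exhausted
            turn = 'B' if turn == 'A' else 'A'
            doc = next_unseen(ranking_a if turn == 'A' else ranking_b)
        seen.add(doc)
        interleaved.append(doc)
        team[doc] = turn
        count[turn] += 1
    return interleaved, team
```

Clicks on the interleaved list are credited to the team that drafted the clicked article; the ranker whose team collects more clicks wins the duel.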
22. Interleaved Ranking (TeamDraft Interleave animation continues)
23. Interleaved Ranking (TeamDraft Interleave animation continues)
39. Timing problem
• Our editors wake up at 5am and are done reading at 7am
• Which is also when we want to send our newsletter
• We simply can't wait for a batch process