Alexander Konduforov: Building modern recommender systems
AI & BigData Online Day 2021
Website - http://aiconf.com.ua
Youtube - https://www.youtube.com/startuplviv
FB - https://www.facebook.com/aiconf
6. • Choice overload – infinite "shelves"
• Surprising suggestions
• Higher conversion, more purchases
• Retaining customer attention
• Competitive advantage across industries
Why do we need them?
9. Uses similarity between items to recommend items similar to those the user likes.
2 approaches:
Item-to-item:
1) Build a vector for each item
2) Find similar item vectors
(cosine, dot product, Euclidean distance, etc.)
User-to-item:
1) Build vectors for the user's items
2) Derive a user vector from them
3) Find similar item vectors
Content-based filtering
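The second approach above (derive a user vector from the items the user liked, then rank items by similarity) can be sketched with toy numbers; the item vectors and the averaging rule are illustrative, not from the talk:

```python
# Sketch of content-based filtering with hypothetical item vectors.
# Items are represented by content feature vectors (e.g. TF-IDF of
# descriptions); recommendations come from cosine similarity.
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical item vectors (rows = items, columns = content features)
items = np.array([
    [1.0, 0.0, 0.5],   # item 0
    [0.9, 0.1, 0.4],   # item 1 (close to item 0)
    [0.0, 1.0, 0.0],   # item 2 (very different)
])

# User vector derived from the items the user liked (here: mean of items 0 and 1)
user = items[[0, 1]].mean(axis=0)

# Rank all items by similarity to the user vector
scores = [cosine_sim(user, items[i]) for i in range(len(items))]
ranking = sorted(range(len(items)), key=lambda i: -scores[i])
```

With these made-up vectors the user's liked items rank first and the dissimilar item last.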
15. User-user:
• Users typically have few interactions
• KNN is sensitive to single interactions (high variance)
• More personalized results (low bias)
Neighborhood-based CF
Item-item:
• Items typically have many users interacting with them
• KNN is less sensitive to single interactions (lower variance)
• Less personalized results (higher bias)
• Works better for new users (not enough history yet)
• More likely to converge
General issues:
• KNN is time-consuming and doesn't scale well
• Stronger "rich-get-richer" effect for popular items
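The item-item variant above can be sketched on a toy rating matrix; the matrix, the weighting scheme, and k are all made up for illustration:

```python
# Minimal item-item neighborhood CF sketch.
# Score for (user, item) = similarity-weighted average of the user's
# ratings on the k most similar items the user has already rated.
import numpy as np

R = np.array([            # rows = users, cols = items, 0 = no rating
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def item_cosine(R):
    # Cosine similarity between item columns
    norms = np.linalg.norm(R, axis=0)
    sim = (R.T @ R) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)        # ignore self-similarity
    return sim

def predict(R, sim, u, i, k=2):
    rated = np.where(R[u] > 0)[0]                      # items user u rated
    neighbors = rated[np.argsort(-sim[i, rated])][:k]  # k most similar of them
    w = sim[i, neighbors]
    return float(w @ R[u, neighbors] / w.sum())

sim = item_cosine(R)
score = predict(R, sim, u=0, i=2)     # predict user 0's rating for item 2
```

Here the prediction is low: the item most similar to item 2 is item 3, which user 0 rated 1.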
16. Also referred to as latent factor methods
Approaches:
1. SVD
2. Matrix factorization
3. Neural Networks
Model-based CF
18. Training:
• Initialize the P (user factors) and Q (item factors) matrices with small random numbers
• Learn P and Q via:
• Alternating Least Squares
• Stochastic Gradient Descent
Predictions: the predicted user-item score is the dot product of the corresponding user and item vectors
MF algorithm
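The SGD variant of the training loop above can be sketched as follows; the toy rating matrix, learning rate, regularization, and epoch count are illustrative choices, not values from the talk:

```python
# Sketch: matrix factorization trained with stochastic gradient descent.
# P (users x k) and Q (items x k) start as small random matrices; each
# observed rating r_ui pulls the dot product p_u . q_i toward r_ui.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 3, 4, 2
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)                      # 0 = unobserved

P = 0.1 * rng.standard_normal((n_users, k))
Q = 0.1 * rng.standard_normal((n_items, k))
lr, reg = 0.02, 0.01
observed = [(u, i) for u in range(n_users) for i in range(n_items) if R[u, i] > 0]

for epoch in range(500):
    for u, i in observed:
        err = R[u, i] - P[u] @ Q[i]           # prediction error on one rating
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

rmse = np.sqrt(np.mean([(R[u, i] - P[u] @ Q[i]) ** 2 for u, i in observed]))
```

ALS differs only in the update step: it alternates between solving for P with Q fixed and vice versa, each step being a least-squares problem.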
19. MF example
Latent features are calculated via MF.
The user-item score is the dot product of the user and item latent vectors.
Item-item similarity is the cosine similarity between item latent vectors.
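In code, the two formulas above reduce to a dot product and a cosine; the factor matrices here are made-up illustrations of what MF might learn:

```python
# Given learned factor matrices P (users x k) and Q (items x k):
import numpy as np

P = np.array([[1.2, 0.3],    # user 0
              [0.2, 1.1]])   # user 1
Q = np.array([[1.0, 0.1],    # item 0
              [0.9, 0.2],    # item 1 (shares latent factors with item 0)
              [0.1, 1.0]])   # item 2

# User-item score: dot product of the latent vectors
score = P[0] @ Q[2]

# Item-item similarity: cosine between item latent vectors
def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_01 = cos(Q[0], Q[1])
sim_02 = cos(Q[0], Q[2])
```

Items 0 and 1 come out far more similar than items 0 and 2, matching their latent factors.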
20. • MF with Biases: handles the bias of some users giving higher ratings than others
• MF with Side Features: adds extra data to mitigate the cold-start problem (e.g. user occupation)
• MF with Temporal Features: handles temporal changes in the data (e.g. occupation change)
• Factorization Machine: extra item features + higher-order interactions
• MF with Mixture of Tastes: gives each user several taste vectors
• Variational MF
MF improvements
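The first improvement on the list, MF with biases, changes the prediction from a plain dot product to a sum of a global mean, per-user and per-item offsets, and the dot product; all values below are made up:

```python
# MF with biases: r_hat = mu + b_u + b_i + p_u . q_i
# mu = global mean rating; b_u, b_i = learned user/item offsets.
import numpy as np

mu = 3.5                      # global average rating (illustrative)
b_u = 0.4                     # this user rates 0.4 above average
b_i = -0.2                    # this item is rated 0.2 below average
p_u = np.array([1.0, 0.2])    # learned user latent vector
q_i = np.array([0.5, 0.8])    # learned item latent vector

r_hat = mu + b_u + b_i + p_u @ q_i
```

The bias terms absorb the "harsh grader / generous grader" effect, so the latent vectors only have to model the residual taste signal.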
21. Why?
• Collaborative filtering uses a robust approach based on similarities between customer tastes and can make cross-genre (cross-category) recommendations
• Pure CF lacks information about items and suffers from data sparsity
• Content-based filtering may find deep similarities between items and suggest novel, surprising items
• Only CBF can extract value from images, sounds, and texts
• All modern implementations are hybrid
Hybrid
More info: https://www.researchgate.net/publication/263377228_Hybrid_Recommender_Systems_Survey_and_Experiments
24. • Modeling the non-linear interactions in the data
• Feature extraction directly from the content:
• Image, audio, text
• Ability to use heterogeneous data (interactions, content) in the same model
• Powerful for sequential modeling tasks (next item prediction, session-based recommendations)
• Better representation learning of users and items for CF
Deep Learning for RecSys
25. Bringing NNs to CF
Neural Collaborative Filtering Deep Factorization Machine
Several more NN-based improvements can be found here: https://towardsdatascience.com/recsys-series-part-5-neural-matrix-factorization-for-collaborative-filtering-a0aebfe15883
26. • Prior to the Deep Learning approach, used Matrix Factorization
• 2 Neural Networks:
• Candidate generation: broad personalization via CF (retrieves only a couple hundred videos)
• Ranking: assigns a score to each video using a rich set of features from the item and the user
• Uses implicit signals (watching a video to the end is positive), not explicit ones (thumbs up/down)
• Uses video "age" as a feature to recommend fresher content
• Relies mostly on A/B testing results
https://research.google/pubs/pub45530/
Youtube recommendations
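The two-stage flow described above can be sketched as follows; every scoring function here is a simplified stand-in, not YouTube's actual models:

```python
# Two-stage recommender sketch: a cheap candidate generator narrows
# many items down to a few hundred, then a richer ranker scores those.
import numpy as np

rng = np.random.default_rng(1)
n_items = 10_000
item_vecs = rng.standard_normal((n_items, 8))
user_vec = rng.standard_normal(8)

# Stage 1: candidate generation (CF-style retrieval stand-in)
cf_scores = item_vecs @ user_vec
candidates = np.argsort(-cf_scores)[:200]   # keep a couple hundred

# Stage 2: ranking with a richer feature set (stand-in: add an "age"
# feature so fresher content gets a boost, as on the slide)
age_days = rng.integers(0, 365, size=n_items)
rank_scores = cf_scores[candidates] + 0.5 * (1 - age_days[candidates] / 365)
top10 = candidates[np.argsort(-rank_scores)[:10]]
```

The split lets the expensive ranker run on hundreds of items instead of millions.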
27. • Uses implicit customer feedback
• Back in 2014 used Weighted MF
• Used MFCCs and CNNs to extract features from audio – helps with cold start and with surfacing unpopular tracks
• Used NLP to extract features from lyrics and other textual information
• Essentially a hybrid recommender system
https://benanne.github.io/2014/08/05/spotify-cnns.html
https://www.oreilly.com/radar/personalization-of-spotify-home-and-tensorflow/
Spotify recommendations
28. • Autoencoder-based recommendations:
• to learn the lower-dimensional feature representations at the bottleneck layer
• to fill in the blanks of the user-item interaction matrix directly in the reconstruction layer
• CNN-based recommendations:
• to extract features from images
• to extract features from audio and video
• to extract features from texts
• RNN-based recommendations
• to extract sequential patterns in session-based tasks
• and many others: https://towardsdatascience.com/recommendation-system-series-part-2-the-10-categories-of-deep-recommendation-systems-that-189d60287b58
Other DL approaches
29. Metrics:
• When predicting rating: MSE, RMSE, etc.
• When predicting binary output: Accuracy, Precision, Recall, F1
• When predicting top N: MAP@N, MAR@N
• Coverage: % of items that ever get recommended (a typical CF model covers only ~8-10%)
Real-world testing:
• CTR or CR (product, ads, etc.)
• Avg Time spent or Customer retention (content)
• A/B testing
• Serendipity: pleasant surprise, unintended discovery
Evaluation
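MAP@N from the metrics list can be computed as below, using a common definition of average precision at N (the data is a toy example):

```python
# Average precision at N for one user, then the mean over users (MAP@N).
def average_precision_at_n(recommended, relevant, n):
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(recommended[:n], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank   # precision at each hit position
    return precision_sum / min(len(relevant), n) if relevant else 0.0

def map_at_n(all_recs, all_relevant, n):
    aps = [average_precision_at_n(recs, rel, n)
           for recs, rel in zip(all_recs, all_relevant)]
    return sum(aps) / len(aps)

# Toy example: two users, top-3 recommendations each
recs = [["a", "b", "c"], ["x", "y", "z"]]
relevant = [{"a", "c"}, {"y"}]
score = map_at_n(recs, relevant, n=3)
```

Because it rewards relevant items appearing early in the list, MAP@N is order-sensitive, unlike plain precision.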
30. • Novelty
• Diversity
• Serendipity
• Interpretability
• Adaptation
Qualities of a good recommender