- 1. October 17, 2015 Data Pipelines for Music Recommendations @ Spotify Vidhya Murali @vid052
- 2. Who Am I? Vidhya Murali •Areas of Interest: Data & Machine Learning •Data Engineer @Spotify •Master's student from the University of Wisconsin-Madison, aka Happy Badger for life!
- 3. “Torture the data, and it will confess!” – Ronald Coase, Nobel Prize Laureate
- 4. Spotify’s Big Data •Started in 2006, now available in 58 countries •70+ million active users, 20+ million paid subscribers •30+ million songs in our catalog, ~20K added every day •1.5 billion playlists so far and counting •1 TB of user data logged every day •Hadoop cluster with 1500 nodes •~20,000 Hadoop jobs per day
- 5. Music Recommendations at Spotify. Features: Discover, Discover Weekly, Moments, Radio, Related Artists
- 6. 30 million tracks… What to recommend?
- 7. Approaches •Manual curation by Experts •Editorial Tagging •Metadata (e.g. label-provided data, NLP over news, blogs) •Audio Signals •Collaborative Filtering Model
- 9. Collaborative Filtering Model •Find patterns in users’ past behavior to generate recommendations •Domain independent •Scalable •Accuracy (Collaborative Model) >= Accuracy (Content-Based Model)
- 10. Definition of CF Hey, I like tracks P, Q, R, S! Well, I like tracks Q, R, S, T! Then you should check out track P! Nice! Btw try track T! Legacy slide of Erik Bernhardsson
- 14. The YoLo Problem •YoLo Problem: “You Only Listen Once” to judge recommendations •Goal: Predict if users will listen to new music (new to the user) •Challenges: •Scale of catalog (30M songs + ~20K added every day) •Repeated consumption of music is fairly common •Music is niche •Music consumption is heavily influenced by the user’s lifestyle •Input: Feedback is implicit through streaming behavior, collection adds, browse history, search history, etc.
- 19. User Plays to Track Recs 1. Weighted play counts from logs 2. Train model using the input signals 3. Generate recs from the trained model 4. Post-process the recommendations
- 20. Step 1: ETL of Logs •Extract and transform the anonymized logs into a training data set •Case: Logs -> (user, track, weighted count)
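Step 1 can be sketched roughly as follows; the play contexts and their weights here are hypothetical, since the slides don't specify the actual weighting scheme:

```python
from collections import defaultdict

# Hypothetical weights per play context; the real weighting
# scheme is not given in the slides.
CONTEXT_WEIGHT = {"complete": 1.0, "partial": 0.5, "skip": 0.0}

def logs_to_weighted_counts(log_entries):
    """Aggregate raw play logs into (user, track) -> weighted count."""
    counts = defaultdict(float)
    for user, track, context in log_entries:
        counts[(user, track)] += CONTEXT_WEIGHT.get(context, 0.0)
    return dict(counts)

logs = [
    ("alice", "burn", "complete"),
    ("alice", "burn", "complete"),
    ("alice", "burn", "skip"),
    ("bob", "burn", "partial"),
]
print(logs_to_weighted_counts(logs))
# {('alice', 'burn'): 2.0, ('bob', 'burn'): 0.5}
```

The output tuples are exactly the (user, track, weighted count) training rows the slide describes.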
- 22. Step 2: Construct the Big Matrix! Users (m) x Tracks (n); e.g. row: Vidhya, column: “Burn” by Ellie Goulding. Order of 70M x 30M!
- 27. Latent Factor Models •Use a “small” representation for each user and item (track): f-dimensional vectors •Approximate the User-Track Matrix (m x n) by the product of a User Vector Matrix X (m x f) and a Track Vector Matrix Y (n x f), i.e. X·Yᵀ (here, f = 2)
- 31. Matrix Factorization using Implicit Feedback •User-Track Play Count Matrix •User-Track Preference Matrix: binary label, 1 => played, 0 => not played •Weights Matrix: weights based on play count and smoothing
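A small sketch of the count -> (preference, weight) split. The exact smoothing function is an assumption (the log scheme from Hu, Koren & Volinsky's implicit-feedback paper); the slides only say the weights are based on play count and smoothing:

```python
import math

def preference(count):
    """Binary label: 1 if the user played the track, else 0."""
    return 1 if count > 0 else 0

def confidence(count, alpha=40.0, eps=1.0):
    """Weight from play count with log smoothing (assumed scheme):
    never-played cells still get weight 1, heavy plays get more."""
    return 1.0 + alpha * math.log(1.0 + count / eps)

print(preference(0), confidence(0))              # 0 1.0
print(preference(7), round(confidence(7), 2))    # 1 84.18
```

Applying these two functions elementwise to the play-count matrix yields the preference matrix and the weights matrix from the slide.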
- 32. Equation(s) Alert!
- 33. Implicit Matrix Factorization •Aggregate all (user, track) streams into a large matrix •Goal: Approximate the binary preference matrix by the inner product of 2 smaller matrices X and Y, minimizing the weighted RMSE (root mean squared error) with a function of total plays as the weight •Objective: min over X, Y of Σ_{u,i} c_ui (p_ui − x_u·y_i − b_u − b_i)² + λ(Σ_u ‖x_u‖² + Σ_i ‖y_i‖²), where •b_u = bias for user •b_i = bias for item •λ = regularization parameter •p_ui = 1 if user streamed track else 0 •c_ui = weight from play counts •x_u = user latent factor vector •y_i = item latent factor vector •Why?: Once learned, the top recommendations for a user are the top inner products between their latent factor vector in X and the track latent factor vectors in Y.
- 34. Alternating Least Squares •Fixing one side’s vectors makes the weighted RMSE objective a least squares problem in the other side’s vectors •Fix tracks, solve for users •Fix users, solve for tracks •Repeat until convergence…
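The ALS loop on these slides can be sketched in a few lines of NumPy. This is a toy, bias-free version under an assumed log confidence scheme, not Spotify's production code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy play-count matrix: 4 users x 5 tracks, 0 = never played.
counts = np.array([[3, 0, 0, 1, 0],
                   [0, 5, 2, 0, 0],
                   [1, 0, 0, 4, 0],
                   [0, 2, 0, 0, 6]], dtype=float)

P = (counts > 0).astype(float)      # binary preference matrix
C = 1.0 + 40.0 * np.log1p(counts)   # confidence weights (assumed log scheme)

f, lam = 2, 0.1                     # latent dimension, regularization
X = 0.1 * rng.standard_normal((counts.shape[0], f))  # user vectors
Y = 0.1 * rng.standard_normal((counts.shape[1], f))  # track vectors

def half_step(fixed, Cmat, Pmat):
    """One ALS half-step: with `fixed` held constant, each row of the
    result solves its own weighted ridge regression in closed form."""
    out = np.zeros((Cmat.shape[0], fixed.shape[1]))
    for r in range(Cmat.shape[0]):
        Cr = np.diag(Cmat[r])
        A = fixed.T @ Cr @ fixed + lam * np.eye(fixed.shape[1])
        b = fixed.T @ Cr @ Pmat[r]
        out[r] = np.linalg.solve(A, b)
    return out

loss_before = np.sum(C * (P - X @ Y.T) ** 2)
for _ in range(10):
    X = half_step(Y, C, P)       # fix tracks, solve for users
    Y = half_step(X, C.T, P.T)   # fix users, solve for tracks
loss_after = np.sum(C * (P - X @ Y.T) ** 2)
print(loss_before, "->", loss_after)
```

Each half-step is embarrassingly parallel across rows, which is what makes the MapReduce formulation later in the deck possible.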
- 40. Vectors •“Compact” representation for users and items (tracks) in the same space
- 41. Why Vectors? •Vectors encode higher-order dependencies •Users and items in the same vector space! •Use vector similarity to compute: •Item-item similarities •User-item recommendations •Linear complexity: order of the number of latent factors •Easy to scale up
- 43. Step 3: Compute Recs! •Compute track-track similarities and user-track recommendations using a vector similarity measure: •Euclidean Distance •Cosine Similarity •Pearson Correlation
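For example, cosine similarity over toy 2-d track vectors (the vectors and track names here are made up for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two latent factor vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

track_vecs = {            # hypothetical 2-d track vectors
    "burn": (0.9, 0.1),
    "stay": (0.8, 0.2),
    "cello suite no. 1": (0.1, 0.95),
}

query = track_vecs["burn"]
ranked = sorted(track_vecs, key=lambda t: cosine(query, track_vecs[t]),
                reverse=True)
print(ranked)  # ['burn', 'stay', 'cello suite no. 1']
```

The two pop-leaning vectors rank next to each other; the classical one falls to the bottom, which is the behavior the latent space is trained to produce.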
- 45. Recommendations via Cosine Similarity
- 46. Annoy •70 million users, at least 4 million candidate tracks per user •Brute force approach: O(70M x 4M x 10) ~= O(3 peta-operations)! •Approximate Nearest Neighbors, oh yeah! •Uses Locality-Sensitive Hashing •Clone: https://github.com/spotify/annoy
- 47. Step 4: Post-Processing •Apply filters: •Interacted music •Holiday music, anyone? •Factor for: •Diversity •Freshness •Popularity •Demographics •Seasonality
- 48. 70 million users x 30 million tracks. How to scale?
- 49. Matrix Factorization with MapReduce •Split the matrix up into K x L blocks by (user % K, item % L) •Each mapper gets a different block, sums up intermediate terms, then keys by user (or item) to reduce the final user (or item) vector [Diagram: map step partitions all log entries into K x L blocks; reduce step combines them with cached user vectors (u % K) and item vectors (i % L)]
- 50. Matrix Factorization with MapReduce •Input to each mapper is a list of (user, item, count) tuples: user modulo K is the same for all users in the block, and item modulo L is the same for all items in the block •Distributed cache holds all user vectors where u % K = x and all item vectors where i % L = y •Mapper aggregates intermediate contributions for each user (or item); e.g. with K = 4, mapper #1 gets users 1, 5, 9, 13, etc. •Reducer keys by user (or item), aggregates the intermediate mapper sums, and solves the closed form for the final user (or item) vector
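The blocking scheme on these two slides can be sketched in plain Python (toy K, L, and plays; the real pipeline does this with Hadoop mappers and reducers):

```python
from collections import defaultdict

K, L = 2, 3   # number of user and item shards (toy values)

# (user, item, count) tuples, the mapper input from the slide.
plays = [(0, 0, 3), (1, 4, 1), (2, 2, 5), (5, 1, 2), (4, 3, 1)]

# "Map" step: route each tuple to block (u % K, i % L).
blocks = defaultdict(list)
for u, i, c in plays:
    blocks[(u % K, i % L)].append((u, i, c))

# All tuples for a given user share the same u % K, so a "reduce"
# keyed by user sees every one of that user's plays.
by_user = defaultdict(list)
for (bu, bi), tuples in sorted(blocks.items()):
    for u, i, c in tuples:
        by_user[u].append((i, c))

print(dict(by_user))
```

Each block is small enough for one mapper, and the per-user reduce is where the closed-form solve from the ALS half-step would run.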
- 51. Music Recommendations Data Flow
- 53. Revisiting YoLo! The “You Only Listen Once to judge recommendations” problem
- 54. Optimizing for the YoLo Problem •OFFLINE TESTING: •Experts’ inputs •Measure accuracy •A/B TESTS: control vs. test group. Some useful metrics we consider: •DAU / WAU / MAU •Retention •Session Length •Skip Rate
- 55. Challenge Accepted! •Cold-start problem for both users and new music/upcoming artists: •Content-based signals, real-time recommendation •Measuring recommendation quality: •A/B test metrics •Active forums for getting user feedback •Scam attacks: •Rule-based model to detect scammers •Human choices are not always predictable: •Faith in humanity
- 56. What Next? •Personalize user experience on Spotify for every moment: •Right Now •Recommend other media formats: •Podcasts •Video •Power music recommendations on other platforms: •Google Now 38
- 57. Join the Band! We are hiring! 39
- 58. Thank You! You can reach me @ Email: vidhya@spotify.com Twitter: @vid052
