DataEngConf: Building a Music Recommender System from Scratch with Spotify Data Team

November 14, 2015
Building a Music Recommender from Scratch
Vidhya Murali
@vid052

Who Am I?
•Areas of Interest: Data & Machine Learning
•Data Science Engineer @Spotify
•Master's student from the University of Wisconsin-Madison, aka Happy Badger for life!

“Torture the data, and it will confess!”
– Ronald Coase, Nobel Prize Laureate

Music Recommendations at Spotify
Features:
•Discover
•Discover Weekly
•Moments
•Radio
•Related Artists

30 million tracks…
What to recommend?

Approaches
•Manual curation by experts
•Editorial tagging
•Metadata (e.g. label-provided data, NLP over news and blogs)
•Audio signals
•Collaborative filtering model

Definition of CF
User A: “Hey, I like tracks P, Q, R, S!”
User B: “Well, I like tracks Q, R, S, T!”
User A: “Then you should check out track P!”
User B: “Nice! Btw, try track T!”
(Legacy slide of Erik Bernhardsson)
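The dialogue can be reproduced with a toy user-based sketch: find the most similar other listener and suggest what they like that you have not liked yet. The user names and the Jaccard overlap measure are illustrative assumptions; the rest of the talk uses latent factor models rather than this neighbourhood approach.

```python
# Toy user-based collaborative filtering, mirroring the P/Q/R/S/T dialogue.
# "alice" and "bob" are hypothetical users; Jaccard overlap is an assumed similarity.

def jaccard(a: set, b: set) -> float:
    """Share of tracks two users have in common (0.0 for two empty histories)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(likes: dict, user: str) -> set:
    """Return tracks liked by the most similar other user that `user` hasn't liked."""
    others = [u for u in likes if u != user]
    if not others:
        return set()
    nearest = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return likes[nearest] - likes[user]


likes = {
    "alice": {"P", "Q", "R", "S"},
    "bob": {"Q", "R", "S", "T"},
}
print(recommend(likes, "alice"))  # {'T'}
print(recommend(likes, "bob"))    # {'P'}
```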

Collaborative Filtering Model
•Find patterns in users' past behavior to generate recommendations
•Domain independent
•Scalable
•Accuracy (collaborative model) >= accuracy (content-based model)

Construct Big Matrix!
Rows: users (m), e.g. Vidhya; columns: artists (n), e.g. Ellie Goulding.
Order of millions!
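A minimal sketch of building that matrix from raw streams, assuming a hypothetical log of (user, artist) play events. Because the matrix is mostly empty, the sketch stores only non-zero cells in nested dicts; a production system would use a distributed sparse format instead.

```python
from collections import defaultdict

# Hypothetical stream log: one (user, artist) pair per play.
streams = [
    ("vidhya", "Ellie Goulding"),
    ("vidhya", "Ellie Goulding"),
    ("vidhya", "artist_b"),
    ("user_2", "Ellie Goulding"),
]


def build_play_counts(log):
    """Aggregate plays into a sparse m x n matrix stored as {user: {artist: count}}.

    Only non-zero cells are kept, which is what makes millions of rows and
    columns tractable: most users never play most artists.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user, artist in log:
        counts[user][artist] += 1
    return counts


def index_axes(counts):
    """Map users to row ids (0..m-1) and artists to column ids (0..n-1)."""
    users = sorted(counts)
    artists = sorted({artist for row in counts.values() for artist in row})
    return {u: r for r, u in enumerate(users)}, {a: c for c, a in enumerate(artists)}


counts = build_play_counts(streams)
user_ids, artist_ids = index_axes(counts)
print(counts["vidhya"]["Ellie Goulding"])  # 2
print(len(user_ids), len(artist_ids))      # 2 2
```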

Latent Factor Models
•Use a “small” representation for each user and item (artist): f-dimensional vectors
[Figure: User Artist Matrix (m x n) ≈ User Vector Matrix X (m x f) × Artist Vector Matrix Y (n x f), here f = 2; example user row: Vidhya, example artist column: Ellie]
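To make the shapes concrete, here is a small sketch with f = 2 and made-up numbers: the predicted score for user u and artist i is the inner product of row u of X and row i of Y, so the whole m x n matrix is approximated by X times Y-transpose.

```python
# Latent factor sketch with f = 2. All numbers are made up for illustration.
X = [            # user vectors, m x f (m = 2)
    [1.0, 0.0],  # e.g. "Vidhya"
    [0.5, 0.5],
]
Y = [            # artist vectors, n x f (n = 3)
    [0.9, 0.1],  # e.g. "Ellie"
    [0.0, 1.0],
    [0.4, 0.4],
]


def dot(a, b):
    """Inner product of two f-dimensional vectors: O(f) work."""
    return sum(x * y for x, y in zip(a, b))


def reconstruct(X, Y):
    """Approximate the m x n user-artist matrix as X times Y-transpose."""
    return [[dot(x_u, y_i) for y_i in Y] for x_u in X]


R_hat = reconstruct(X, Y)
print(len(R_hat), len(R_hat[0]))  # 2 3
```

The payoff is storage and compute: the factors hold (m + n) × f numbers instead of m × n, a huge saving once m and n are in the millions and f is small.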

Why Vectors?
•Vectors encode higher-order dependencies
•Users and items live in the same vector space!
•Use vector similarity to compute:
  •Item-item similarities
  •User-item recommendations
•Linear complexity: each similarity costs O(f), linear in the number of latent factors
•Easy to scale up
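The vector-similarity bullets can be sketched with cosine similarity, assuming hypothetical 2-factor artist vectors. Each comparison touches f numbers, which is the linear cost the slide mentions, and because users share the space, the same function also scores a user against an artist.

```python
import math


def cosine(a, b):
    """Cosine similarity of two latent vectors; costs O(f) for f factors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def similar_items(item_vecs, query, k=2):
    """Item-item similarity: the k items whose vectors point most nearly the same way."""
    q = item_vecs[query]
    scored = [(name, cosine(q, v)) for name, v in item_vecs.items() if name != query]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 2-factor artist vectors.
artist_vecs = {
    "artist_a": [0.9, 0.1],
    "artist_b": [0.8, 0.2],
    "artist_c": [0.0, 1.0],
}
print([name for name, _ in similar_items(artist_vecs, "artist_a")])  # ['artist_b', 'artist_c']

# Users live in the same space, so the same function scores user-item pairs.
user_vec = [1.0, 0.0]
print(max(artist_vecs, key=lambda a: cosine(user_vec, artist_vecs[a])))  # artist_a
```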

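Explicit Matrix Factorization
•User explicitly rates a subset of the music catalog
•Goal: Predict how users will rate new music
•How: Approximate the ratings matrix by the product of 2 smaller matrices, X (users) and Y (artists), by minimizing the RMSE (root mean squared error) over the rated cells, plus regularization:
$$\min_{x,y,\beta}\sum_{(u,i)\,\text{rated}}\left(r_{ui}-x_u^{\top}y_i-\beta_u-\beta_i\right)^2+\lambda\left(\sum_u\lVert x_u\rVert^2+\sum_i\lVert y_i\rVert^2\right)$$
•$\beta_u$ = bias for user
•$\beta_i$ = bias for item
•$\lambda$ = regularization parameter
•$r_{ui}$ = user rating for item
•$x_u$ = user latent factor vector
•$y_i$ = item latent factor vector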
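The objective can be evaluated directly on a toy example. This sketch assumes made-up factors, biases and ratings; it only computes the regularized loss and the RMSE, leaving the optimizer (for example stochastic gradient descent or alternating least squares) out.

```python
import math


def predict(x_u, y_i, b_u, b_i):
    """Predicted rating: x_u . y_i + beta_u + beta_i."""
    return sum(a * b for a, b in zip(x_u, y_i)) + b_u + b_i


def explicit_loss(ratings, X, Y, b_user, b_item, lam):
    """Regularized squared error over the *observed* ratings only.

    ratings: list of (u, i, r_ui) triples, one per cell the user actually rated.
    Unrated cells are simply absent; that is what makes this the explicit setting.
    """
    err = sum((r - predict(X[u], Y[i], b_user[u], b_item[i])) ** 2 for u, i, r in ratings)
    reg = lam * (sum(v * v for x in X for v in x) + sum(v * v for y in Y for v in y))
    return err + reg


def rmse(ratings, X, Y, b_user, b_item):
    """Root mean squared error of the predictions on the observed ratings."""
    sq = [(r - predict(X[u], Y[i], b_user[u], b_item[i])) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical toy data: 2 users, 2 items, f = 2.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0)]
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[2.0, 0.0], [0.0, 1.0]]
b_user, b_item = [1.0, 1.0], [2.0, 0.0]
print(round(explicit_loss(ratings, X, Y, b_user, b_item, lam=0.1), 3))  # 1.7
print(round(rmse(ratings, X, Y, b_user, b_item), 3))                    # 0.577
```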

Matrix Factorization using Implicit Feedback
•User Artist Play Count Matrix
•User Artist Preference Matrix, binary label: 1 => played, 0 => not played
•Weights Matrix: weights based on play count and smoothing
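One way to derive the preference and weight matrices from play counts is sketched below. The binary label follows the slide; the exact weighting function is an assumption, since the slide only says the weights are based on play count and smoothing. Here we borrow the log-scaled confidence 1 + α log(1 + r/ε) from Hu, Koren & Volinsky's implicit-feedback paper, with placeholder α and ε.

```python
import math


def preference(play_count):
    """Binary label p_ui: 1 if the user played the artist at all, else 0."""
    return 1 if play_count > 0 else 0


def confidence(play_count, alpha=1.0, eps=1.0):
    """Weight c_ui that grows with play count, smoothed by a logarithm.

    ASSUMPTION: the slide only says "weights based on play count and smoothing".
    The form 1 + alpha * log(1 + r / eps) is the log-scaled confidence from
    Hu, Koren & Volinsky (2008); alpha and eps here are placeholder values.
    """
    return 1.0 + alpha * math.log1p(play_count / eps)


def to_preference_and_weights(play_counts, alpha=1.0, eps=1.0):
    """Turn a dense play-count matrix into the (P, C) pair used by implicit MF."""
    P = [[preference(r) for r in row] for row in play_counts]
    C = [[confidence(r, alpha, eps) for r in row] for row in play_counts]
    return P, C


# Hypothetical play counts: 2 users x 3 artists.
play_counts = [[120, 0, 3], [0, 7, 0]]
P, C = to_preference_and_weights(play_counts)
print(P)  # [[1, 0, 1], [0, 1, 0]]
```

The logarithm is the smoothing: an artist played 120 times gets a weight of about 5.8, not 40 times the weight of one played 3 times (about 2.4).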

Implicit Matrix Factorization
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
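•Aggregate all (user, artist) streams into a large matrix
•Goal: Approximate the binary preference matrix by the product of 2 smaller matrices, X (users) and Y (artists), by minimizing the weighted RMSE (root mean squared error), using a function of total plays as the weight:
$$\min_{x,y,\beta}\sum_{u,i}c_{ui}\left(p_{ui}-x_u^{\top}y_i-\beta_u-\beta_i\right)^2+\lambda\left(\sum_u\lVert x_u\rVert^2+\sum_i\lVert y_i\rVert^2\right)$$
•Why?: Once learned, the top recommendations for a user are the top inner products between their latent factor vector in X and the artist latent factor vectors in Y.
•$\beta_u$ = bias for user
•$\beta_i$ = bias for item
•$p_{ui}$ = 1 if user streamed artist else 0
•$c_{ui}$ = weight, a function of the user's total plays of the artist
•$\lambda$ = regularization parameter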
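The key difference from the explicit case is that the sum runs over every cell, including unplayed ones. A minimal sketch of the weighted objective, assuming small dense Python lists for P, C, X and Y and made-up values:

```python
def implicit_loss(P, C, X, Y, b_user, b_item, lam):
    """Weighted squared error over *every* (user, artist) cell plus L2 regularization.

    P: m x n binary preferences, C: m x n weights. Unlike the explicit case,
    unplayed cells (p_ui = 0) also count, just with a small weight.
    """
    err = 0.0
    for u, (p_row, c_row) in enumerate(zip(P, C)):
        for i, (p_ui, c_ui) in enumerate(zip(p_row, c_row)):
            pred = sum(a * b for a, b in zip(X[u], Y[i])) + b_user[u] + b_item[i]
            err += c_ui * (p_ui - pred) ** 2
    reg = lam * (sum(v * v for x in X for v in x) + sum(v * v for y in Y for v in y))
    return err + reg


# Hypothetical 2 x 2 example with f = 2 and zero biases.
P = [[1, 0], [0, 1]]
C = [[3.0, 1.0], [1.0, 2.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(implicit_loss(P, C, zeros, zeros, [0.0, 0.0], [0.0, 0.0], lam=0.5))        # 5.0
print(implicit_loss(P, C, identity, identity, [0.0, 0.0], [0.0, 0.0], lam=0.5))  # 2.0
```

With all-zero factors, only the played cells contribute, each weighted by its c_ui; with a perfect fit, only the regularization term remains.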

Alternating Least Squares
[Figure: the same 6 x 8 user-artist preference matrix, factored into X (users) and Y (artists), with the same weighted objective: user bias, item bias, and p = 1 if user streamed artist else 0]
•Fix artists (Y), solve for users (X)
•Fix users (X), solve for artists (Y)
•Repeat until convergence…
With one side fixed, the objective is a weighted least-squares problem in the other side, so each step has a closed-form solution and cannot increase the loss.
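Below is a small, dependency-free sketch of ALS for the implicit objective. To keep it short it drops the bias terms (an assumption, not what the slides show) and uses hypothetical weights C = 1 + 5·P on the 6 x 8 example matrix. Each half-step solves the weighted ridge-regression normal equations (Y^T C^u Y + λI) x_u = Y^T C^u p_u, following Hu, Koren & Volinsky, and the mirror-image system for each artist.

```python
import random


def solve(A, b):
    """Solve the small f x f system A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b_r] for row, b_r in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v


def weighted_ridge(F, p, c, lam):
    """Closed form of min_v sum_j c_j (p_j - v . F_j)^2 + lam |v|^2.

    Normal equations: (F^T C F + lam I) v = F^T C p, with C = diag(c).
    """
    f = len(F[0])
    A = [[lam if a == b else 0.0 for b in range(f)] for a in range(f)]
    rhs = [0.0] * f
    for F_j, p_j, c_j in zip(F, p, c):
        for a in range(f):
            rhs[a] += c_j * p_j * F_j[a]
            for b in range(f):
                A[a][b] += c_j * F_j[a] * F_j[b]
    return solve(A, rhs)


def weighted_loss(P, C, X, Y, lam):
    """Implicit-feedback objective without bias terms."""
    err = sum(
        C[u][i] * (P[u][i] - sum(a * b for a, b in zip(X[u], Y[i]))) ** 2
        for u in range(len(P))
        for i in range(len(P[0]))
    )
    return err + lam * (sum(v * v for x in X for v in x) + sum(v * v for y in Y for v in y))


def als(P, C, f=2, lam=0.1, iters=10, seed=0):
    """Alternate exact least-squares solves for users and artists."""
    rng = random.Random(seed)
    m, n = len(P), len(P[0])
    X = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(m)]
    Y = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n)]
    for _ in range(iters):
        # Fix artists (Y), solve for every user vector independently.
        X = [weighted_ridge(Y, P[u], C[u], lam) for u in range(m)]
        # Fix users (X), solve for every artist vector using column i of P and C.
        Y = [
            weighted_ridge(X, [P[u][i] for u in range(m)], [C[u][i] for u in range(m)], lam)
            for i in range(n)
        ]
    return X, Y


# The 6 x 8 preference matrix from the slides; the weights C are hypothetical.
P = [
    [1, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
]
C = [[1.0 + 5.0 * p for p in row] for row in P]
X, Y = als(P, C)
print(round(weighted_loss(P, C, X, Y, lam=0.1), 3))
```

Every per-user (and per-artist) solve is independent of the others, which is why ALS parallelizes well across machines.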

Vectors
•“Compact” representation for users and items (artists) in the same space

Recommendations via Cosine Similarity
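A minimal sketch of turning the learned vectors into recommendations, assuming hypothetical artist vectors and a set of artists the user already streams: score every unseen artist by cosine similarity to the user's vector and keep the top n with `heapq.nlargest`.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between a user vector and an artist vector."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def recommend(user_vec, artist_vecs, already_streamed, n=2):
    """Top-n artists by cosine similarity, skipping artists the user already streams."""
    candidates = (a for a in artist_vecs if a not in already_streamed)
    return heapq.nlargest(n, candidates, key=lambda a: cosine(user_vec, artist_vecs[a]))


# Hypothetical 2-factor vectors.
artist_vecs = {
    "artist_a": [0.9, 0.1],
    "artist_b": [0.8, 0.2],
    "artist_c": [0.0, 1.0],
    "artist_d": [0.7, 0.7],
}
print(recommend([1.0, 0.0], artist_vecs, already_streamed={"artist_a"}))  # ['artist_b', 'artist_d']
```

Scoring every artist for every user this way is exactly the brute-force cost that motivates approximate nearest neighbour search.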

Annoy
•70 million users, at least 4 million candidate tracks per user
•Brute-force approach:
  •O(70M users × 4M tracks × 10 latent factors) ≈ 3 peta-operations!
•Approximate Nearest Neighbors Oh Yeah!
•Uses Locality-Sensitive Hashing (random projections)
•Clone: https://github.com/spotify/annoy
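The idea behind random-projection hashing can be sketched in a few lines: hash each vector by which side of a few random hyperplanes it falls on, then compare the query only with items in its bucket. This is an illustrative single-table scheme on hypothetical random vectors, not Annoy's implementation (Annoy builds a forest of random-projection trees and has its own API); the last line just checks the slide's brute-force arithmetic.

```python
import math
import random
from collections import defaultdict


def signature(vec, planes):
    """One bit per random hyperplane: which side of the plane the vector lies on."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0.0) for plane in planes)


def build_index(vectors, n_planes=4, seed=0):
    """Hash every item vector into a bucket keyed by its signature."""
    rng = random.Random(seed)
    f = len(next(iter(vectors.values())))
    planes = [[rng.gauss(0.0, 1.0) for _ in range(f)] for _ in range(n_planes)]
    buckets = defaultdict(list)
    for key, vec in vectors.items():
        buckets[signature(vec, planes)].append(key)
    return planes, buckets


def cosine(a, b):
    """Cosine similarity used to re-rank the bucket's candidates exactly."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def query(vec, vectors, planes, buckets, k=3):
    """Approximate neighbours: rank only the items sharing the query's bucket."""
    candidates = buckets.get(signature(vec, planes), [])
    return sorted(candidates, key=lambda key: cosine(vec, vectors[key]), reverse=True)[:k]


# Hypothetical catalogue of 1,000 random 10-factor track vectors.
rng = random.Random(42)
tracks = {f"track_{j}": [rng.gauss(0.0, 1.0) for _ in range(10)] for j in range(1000)}
planes, buckets = build_index(tracks, n_planes=4)
print(query(tracks["track_0"], tracks, planes, buckets))

# Brute-force cost quoted on the slide: users x candidate tracks x factors.
print(70_000_000 * 4_000_000 * 10)  # 2800000000000000, i.e. about 3 peta-operations
```

A single table with few planes can miss true neighbours that land in adjacent buckets; real indexes trade memory for recall by using many tables or trees.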

Thank You!
You can reach me @
Email: vidhya@spotify.com
Twitter: @vid052
