Matrix Factorization Technique for Recommender Systems

Introduction
Matrix Factorization Methods
Netflix Prize Competition
Conclusion
MATRIX FACTORIZATION TECHNIQUE FOR
RECOMMENDER SYSTEMS
Oluwashina Aladejubelo
Université Joseph Fourier,
Grenoble, France
June 6, 2015

About Me
Bachelor of Science, Ambrose Alli University, Nigeria
(2004-2008)
IT Business Analyst, Virgin Nigeria Airlines (2009-2011)
Team Lead/Software Architect, Speckless Innovations
Limited (2011-2014)
Master of Informatics (M2 MOSIG), Université Joseph
Fourier, Grenoble (2014-2015)
Master's thesis on "Distributed Large-Scale Learning" with
Prof. Massih-Reza Amini.

Overview
1 Introduction
2 Matrix Factorization Methods
3 Netflix Prize Competition
4 Conclusion

1 Introduction
Recommender Systems
Content Filtering Approach
Collaborative Filtering Approach
Content vs Collaborative Filtering
2 Matrix Factorization Methods
Matrix Factorization Model (MFM)
Stochastic Gradient Descent
Alternating Least Squares
Adding Biases
Additional Input Sources
Temporal Dynamics
Varying Confidence Levels
3 Netflix Prize Competition
4 Conclusion

Recommender Systems
Recommender systems analyze patterns of user interest in
products to provide personalized recommendations.
They seek to predict the rating or preference that a user would
give to an item.

Recommender Systems
Such systems are very useful for entertainment products such
as movies, music, and TV shows.
Many customers will view the same movie, and each customer
is likely to view numerous different movies.
Huge volumes of data arise from customer feedback, which can
be analyzed to provide recommendations.

Content Filtering Approach
creates a profile for each user or product to characterize its
nature
programs associate users with matching products
requires gathering external information that may not be
available

Collaborative Filtering Approach
depends on past user behaviour, e.g. previous transactions or
product ratings
does not rely on the creation of explicit profiles

the two primary areas of collaborative filtering are neighborhood
methods and latent factor models
neighborhood methods are based on computing the relationships
between items or between users
latent factor models try to explain the ratings by characterizing both
items and users on, say, 20 to 100 factors inferred from the
rating patterns
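A neighborhood method needs some measure of how related two items are. The sketch below uses cosine similarity over the users who rated both items; this particular measure, the function name and the toy ratings are illustrative assumptions, not something the slides prescribe.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two items over the users who rated both.

    ratings_a, ratings_b: dicts mapping user id -> rating (hypothetical layout).
    Returns 0.0 when the items share no raters.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    num = sum(ratings_a[u] * ratings_b[u] for u in common)
    norm_a = math.sqrt(sum(ratings_a[u] ** 2 for u in common))
    norm_b = math.sqrt(sum(ratings_b[u] ** 2 for u in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return num / (norm_a * norm_b)

# Toy data: two items rated by overlapping sets of users.
item_a = {"u1": 4.0, "u2": 2.0, "u3": 5.0}
item_b = {"u1": 2.0, "u2": 1.0}
item_c = {"u9": 3.0}

sim_ab = cosine_similarity(item_a, item_b)  # shared raters: u1, u2
sim_ac = cosine_similarity(item_a, item_c)  # no shared raters
```

Items with the highest similarity to those a user already liked would then be candidates for recommendation.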

Content vs Collaborative Filtering
collaborative filtering addresses data aspects that are difficult to
profile
it is generally more accurate than content filtering
it suffers from the cold-start problem (new product or new user), in
which case content filtering is better

Matrix Factorization Model (MFM)
some of the most successful realizations of latent factor
models are based on matrix factorization
it characterizes both items and users by vectors of factors
inferred from item rating patterns
high correspondence between item and user factors leads to a
recommendation

MFM maps both users and items to a joint latent factor space
of dimensionality $f$
the user-item interactions are modeled as inner products in
that space
each item $i$ is associated with a vector $q_i \in \mathbb{R}^f$
each user $u$ is associated with a vector $p_u \in \mathbb{R}^f$

the approximate rating of item $i$ by user $u$ is given by
$$\hat{r}_{ui} = q_i^T p_u \quad (1)$$
carelessly addressing only the relatively few known entries is
highly prone to overfitting
observed ratings can be modeled directly, with regularization,
as follows
$$\min_{q^*, p^*} \sum_{(u,i) \in \kappa} (r_{ui} - q_i^T p_u)^2 + \lambda (\|q_i\|^2 + \|p_u\|^2) \quad (2)$$
$\kappa$ is the set of $(u, i)$ pairs for which $r_{ui}$ is known, and
$\lambda$ controls the strength of the regularization
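Equations (1) and (2) can be written out directly. The following is a minimal sketch in plain Python; the function names, the dictionary layout for the factors and the toy values are assumptions made for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def predict(q_i, p_u):
    """Equation (1): estimated rating as the inner product q_i^T p_u."""
    return dot(q_i, p_u)

def regularized_loss(ratings, P, Q, lam):
    """Equation (2): squared error over the known ratings plus an L2 penalty.

    ratings: dict mapping (u, i) -> r_ui, i.e. the set kappa
    P, Q:    dicts mapping user / item id -> factor vector
    """
    total = 0.0
    for (u, i), r in ratings.items():
        err = r - predict(Q[i], P[u])
        total += err ** 2 + lam * (dot(Q[i], Q[i]) + dot(P[u], P[u]))
    return total

# Toy factors with f = 2 (hypothetical values).
P = {"alice": [1.0, 0.5], "bob": [0.2, 1.0]}
Q = {"m1": [2.0, 1.0], "m2": [0.5, 3.0]}
ratings = {("alice", "m1"): 3.0, ("bob", "m2"): 3.0}

estimate = predict(Q["m1"], P["alice"])
unpenalized = regularized_loss(ratings, P, Q, lam=0.0)
penalized = regularized_loss(ratings, P, Q, lam=0.1)
```

With `lam=0.0` only the fit to the known ratings counts; a positive `lam` adds a cost for large factor vectors, which is what discourages overfitting.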

Stochastic Gradient Descent (SGD) - Simon Funk, 2006
the SGD approach can be used to solve equation (2)
for each given training case, the system predicts $r_{ui}$ and
computes the prediction error
$$e_{ui} = r_{ui} - q_i^T p_u$$
it modifies the parameters by a magnitude proportional to $\gamma$
in the opposite direction of the gradient, yielding
$$q_i \leftarrow q_i + \gamma (e_{ui} p_u - \lambda q_i)$$
$$p_u \leftarrow p_u + \gamma (e_{ui} q_i - \lambda p_u)$$
combines implementation ease with a relatively fast running time
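The update rules above translate into a short training loop. This is a sketch under stated assumptions: the function names, the random initialization range, the hyperparameter values and the toy ratings are all hypothetical choices, not values from the slides.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rmse(ratings, P, Q):
    """Root-mean-square error of q_i^T p_u over the known ratings."""
    sq = [(r - dot(Q[i], P[u])) ** 2 for (u, i), r in ratings.items()]
    return math.sqrt(sum(sq) / len(sq))

def sgd_factorize(ratings, f=2, gamma=0.05, lam=0.02, epochs=1000, seed=0):
    """Fit user factors P and item factors Q by stochastic gradient descent."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    P = {u: [rng.uniform(0.1, 1.0) for _ in range(f)] for u in users}
    Q = {i: [rng.uniform(0.1, 1.0) for _ in range(f)] for i in items}
    samples = sorted(ratings.items())
    for _ in range(epochs):
        rng.shuffle(samples)  # visit training cases in random order
        for (u, i), r in samples:
            q, p = Q[i], P[u]
            e = r - dot(q, p)  # prediction error e_ui
            for k in range(f):
                qk, pk = q[k], p[k]  # keep old values so both updates use them
                q[k] = qk + gamma * (e * pk - lam * qk)
                p[k] = pk + gamma * (e * qk - lam * pk)
    return P, Q

# Toy training set: (user, item) -> rating on a 1 to 5 scale.
toy = {("u1", "a"): 5.0, ("u1", "b"): 3.0, ("u2", "a"): 4.0,
       ("u2", "c"): 1.0, ("u3", "b"): 2.0, ("u3", "c"): 5.0}

P0, Q0 = sgd_factorize(toy, epochs=0)  # untrained, for comparison
P1, Q1 = sgd_factorize(toy)
before, after = rmse(toy, P0, Q0), rmse(toy, P1, Q1)
```

Comparing `before` and `after` shows how far the training error dropped; a held-out set would be needed to judge generalization.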

Alternating Least Squares (ALS)
because both $q_i$ and $p_u$ are unknown, equation (2) is not
convex
if we fix one of the unknowns, the optimization problem becomes
quadratic and can be solved optimally
when all $p_u$ are fixed, the system recomputes the $q_i$ by solving
a least-squares problem, and vice versa
each step decreases the objective in equation (2) until
convergence
massively parallelizable, since each $q_i$ (or $p_u$) is computed
independently of the others
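The alternation can be sketched in plain Python. With one side fixed, each factor vector is the solution of a small regularized least-squares system; because equation (2) places the penalty inside the sum, the regularizer is scaled here by the number of ratings involved. The helper names, the tiny Gaussian-elimination solver and the toy data are assumptions for illustration.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(A, b):
    """Solve the small dense system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x

def ridge_update(fixed, observed, lam, f):
    """Best factor vector given the fixed side; observed = [(key, rating), ...]."""
    A = [[lam * len(observed) if a == b else 0.0 for b in range(f)]
         for a in range(f)]
    rhs = [0.0] * f
    for key, r in observed:
        v = fixed[key]
        for a in range(f):
            rhs[a] += r * v[a]
            for b in range(f):
                A[a][b] += v[a] * v[b]
    return solve(A, rhs)

def objective(ratings, P, Q, lam):
    """Equation (2) evaluated at the current factors."""
    return sum((r - dot(Q[i], P[u])) ** 2
               + lam * (dot(Q[i], Q[i]) + dot(P[u], P[u]))
               for (u, i), r in ratings.items())

def als(ratings, f=2, lam=0.1, sweeps=10, seed=0):
    rng = random.Random(seed)
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    P = {u: [rng.uniform(0.1, 1.0) for _ in range(f)] for u in users}
    Q = {}
    by_item = {i: [(u, r) for (u, j), r in ratings.items() if j == i] for i in items}
    by_user = {u: [(i, r) for (v, i), r in ratings.items() if v == u] for u in users}
    history = []
    for _ in range(sweeps):
        for i in items:  # users fixed: each q_i is an independent solve
            Q[i] = ridge_update(P, by_item[i], lam, f)
        for u in users:  # items fixed: each p_u is an independent solve
            P[u] = ridge_update(Q, by_user[u], lam, f)
        history.append(objective(ratings, P, Q, lam))
    return P, Q, history

toy = {("u1", "a"): 5.0, ("u1", "b"): 3.0, ("u2", "a"): 4.0,
       ("u2", "c"): 1.0, ("u3", "b"): 2.0, ("u3", "c"): 5.0}
P, Q, history = als(toy)
```

The two inner loops contain no dependencies between iterations, which is where the parallelism comes from; `history` records the objective after each sweep and should never increase.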

Adding Biases
rating values are also affected by biases independent of any
interaction
a first-order approximation of the bias involved in rating $r_{ui}$ is
$$b_{ui} = \mu + b_i + b_u \quad (3)$$
$\mu$ denotes the overall average rating; $b_u$ and $b_i$ are the observed
deviations of user $u$ and item $i$, respectively, from the average
therefore,
$$\hat{r}_{ui} = \mu + b_i + b_u + q_i^T p_u \quad (4)$$
equation (2) also becomes
$$\min_{q^*, p^*, b^*} \sum_{(u,i) \in \kappa} (r_{ui} - \mu - b_u - b_i - q_i^T p_u)^2 + \lambda (\|q_i\|^2 + \|p_u\|^2 + b_u^2 + b_i^2) \quad (5)$$
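A small sketch of equation (4), together with one gradient step for the bias terms. The bias update is not given on the slides; it is derived here from equation (5) in the same way the factor updates follow from equation (2), so treat it as an assumption. The numeric values are purely illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_with_biases(mu, b_i, b_u, q_i, p_u):
    """Equation (4): global mean + item bias + user bias + interaction."""
    return mu + b_i + b_u + dot(q_i, p_u)

def sgd_bias_step(r_ui, mu, b_i, b_u, q_i, p_u, gamma=0.005, lam=0.02):
    """One gradient step on the two bias terms of equation (5).

    Assumed update (by analogy with the factor updates):
        b <- b + gamma * (e_ui - lam * b)
    Returns the new (b_i, b_u).
    """
    e = r_ui - predict_with_biases(mu, b_i, b_u, q_i, p_u)
    return b_i + gamma * (e - lam * b_i), b_u + gamma * (e - lam * b_u)

# Illustrative numbers: average rating 3.7, an item rated 0.5 above average,
# a critical user who rates 0.3 below average, and no interaction term yet.
mu, b_i, b_u = 3.7, 0.5, -0.3
q_i, p_u = [0.0, 0.0], [0.0, 0.0]

baseline = predict_with_biases(mu, b_i, b_u, q_i, p_u)  # 3.7 + 0.5 - 0.3
new_b_i, new_b_u = sgd_bias_step(5.0, mu, b_i, b_u, q_i, p_u)
```

Because the observed rating (5.0) is above the baseline estimate, the step nudges both biases upward.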

Additional Input Sources
the cold-start problem can result from a user supplying very
few ratings, which makes it difficult to conclude on their taste
behavioural information such as purchase and browsing history
can be used as implicit feedback
let $N(u)$ denote the set of items for which user $u$
expressed an implicit preference
a new set of item factors is introduced, where item $i$ is
associated with $x_i \in \mathbb{R}^f$

a user who showed a preference for items in $N(u)$ is
characterized by the vector
$$\sum_{i \in N(u)} x_i$$
normalizing the sum, we have
$$|N(u)|^{-0.5} \sum_{i \in N(u)} x_i$$
another information source is user attributes, e.g.
demographics such as gender, age, income level and so on
let $A(u)$ denote the set of attributes of user $u$

a distinct factor vector $y_a \in \mathbb{R}^f$ corresponds to each attribute,
describing a user through the set of user-associated
attributes:
$$\sum_{a \in A(u)} y_a$$
the matrix factorization model should integrate all signal
sources, with the enhanced user representation:
$$\hat{r}_{ui} = \mu + b_i + b_u + q_i^T \Big[ p_u + |N(u)|^{-0.5} \sum_{i \in N(u)} x_i + \sum_{a \in A(u)} y_a \Big] \quad (6)$$
items can get a similar treatment
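Equation (6) can be read as building an enhanced user vector and then taking the usual inner product. A minimal sketch follows; the function name, argument layout and toy values are assumptions chosen so the arithmetic is easy to follow.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_enhanced(mu, b_i, b_u, q_i, p_u, implicit_items, attributes, X, Y):
    """Equation (6): biases plus q_i^T [p_u + normalized implicit sum + attribute sum].

    implicit_items: N(u), ids of items with an implicit preference
    attributes:     A(u), ids of the user's attributes
    X, Y:           dicts mapping item id -> x_i and attribute id -> y_a
    """
    f = len(q_i)
    profile = list(p_u)
    if implicit_items:
        norm = len(implicit_items) ** -0.5  # |N(u)|^(-0.5)
        for j in implicit_items:
            for k in range(f):
                profile[k] += norm * X[j][k]
    for a in attributes:
        for k in range(f):
            profile[k] += Y[a][k]
    return mu + b_i + b_u + dot(q_i, profile)

# A cold-start user: no explicit factors yet (p_u = 0), but four implicitly
# preferred items and one known attribute (all values hypothetical).
X = {j: [1.0, 0.0] for j in ("i1", "i2", "i3", "i4")}
Y = {"age_25_34": [0.0, 1.0]}
estimate = predict_enhanced(
    mu=3.0, b_i=0.2, b_u=0.0,
    q_i=[0.5, 0.25], p_u=[0.0, 0.0],
    implicit_items=["i1", "i2", "i3", "i4"],
    attributes=["age_25_34"],
    X=X, Y=Y,
)
```

Here the implicit sum is [4, 0] scaled by 4^(-0.5) = 0.5, the attribute term adds [0, 1], and the resulting profile [2, 1] gives an interaction of 1.25 on top of the biases, even though the user has supplied no explicit ratings.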

Temporal Dynamics
in reality, customers' inclinations evolve, leading them to
redefine their taste
it is therefore important to accommodate these temporal effects,
reflecting the dynamic, time-drifting nature of user-item
interactions
the following terms vary over time: item biases, $b_i(t)$; user
biases, $b_u(t)$; and user preferences, $p_u(t)$
equation (4) therefore becomes
$$\hat{r}_{ui}(t) = \mu + b_i(t) + b_u(t) + q_i^T p_u(t) \quad (7)$$
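The slides do not say how the time-dependent terms are parameterized. One simple possibility, assumed here purely for illustration, is to keep a separate value per time bin and look it up from the rating's timestamp; the function name, bin size and values are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_at(t, mu, item_bias, user_bias, q_i, user_factors, bin_size=30):
    """Equation (7) with piecewise-constant parameters (an assumed scheme).

    t is a day index; day t falls into bin t // bin_size, and each of
    item_bias, user_bias and user_factors holds one entry per bin.
    The item factors q_i stay static, as in equation (7).
    """
    k = t // bin_size
    return mu + item_bias[k] + user_bias[k] + dot(q_i, user_factors[k])

mu = 3.5
item_bias = [0.2, -0.1]                   # b_i(t) for bins 0 and 1
user_bias = [0.0, 0.3]                    # b_u(t) for bins 0 and 1
user_factors = [[0.4, 0.0], [0.0, 0.4]]   # p_u(t) for bins 0 and 1
q_i = [1.0, 0.5]

early = predict_at(10, mu, item_bias, user_bias, q_i, user_factors)  # bin 0
late = predict_at(45, mu, item_bias, user_bias, q_i, user_factors)   # bin 1
```

The same user and item receive different estimates at the two dates because the biases and the user's preferences have drifted between bins.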

Varying Confidence Levels
other factors, such as massive advertising, can influence
observed ratings in ways that do not reflect long-term characteristics
hence the need for a weighting scheme, or confidence
confidence can stem from available numerical values that
describe the frequency of actions, e.g. how much time the
user watched a show
in matrix factorization, less weight is given to less meaningful
actions

Varying Confidence Levels
if the confidence in observing $r_{ui}$ is denoted $c_{ui}$, then the model
enhances equation (5) to account for confidence as follows
$$\min_{q^*, p^*, b^*} \sum_{(u,i) \in \kappa} c_{ui} (r_{ui} - \mu - b_u - b_i - q_i^T p_u)^2 + \lambda (\|q_i\|^2 + \|p_u\|^2 + b_u^2 + b_i^2) \quad (8)$$
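Equation (8) differs from equation (5) only in the per-observation weight. The sketch below evaluates it on two toy observations; the function name, the default weight of 1.0 for unlisted pairs and all values are assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_objective(ratings, confidence, mu, b_user, b_item, P, Q, lam):
    """Equation (8): each squared error is scaled by its confidence c_ui."""
    total = 0.0
    for (u, i), r in ratings.items():
        err = r - mu - b_user[u] - b_item[i] - dot(Q[i], P[u])
        penalty = (dot(Q[i], Q[i]) + dot(P[u], P[u])
                   + b_user[u] ** 2 + b_item[i] ** 2)
        c = confidence.get((u, i), 1.0)  # assumed default: full confidence
        total += c * err ** 2 + lam * penalty
    return total

ratings = {("u", "a"): 5.0, ("u", "b"): 1.0}
P, Q = {"u": [1.0]}, {"a": [1.0], "b": [1.0]}
b_user, b_item = {"u": 0.0}, {"a": 0.0, "b": 0.0}

# Suppose the second observation is considered unreliable.
confidence = {("u", "a"): 1.0, ("u", "b"): 0.1}

uniform = weighted_objective(ratings, {}, 3.0, b_user, b_item, P, Q, lam=0.0)
weighted = weighted_objective(ratings, confidence, 3.0, b_user, b_item, P, Q, lam=0.0)
```

The errors are 1 and -3, so the unweighted sum is 1 + 9 = 10, while down-weighting the second observation to 0.1 gives 1 + 0.9 = 1.9: the low-confidence rating pulls on the fit far less.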

Netflix Prize Competition
in 2006, Netflix announced a contest to improve the state of
its recommender system
the training data comprised 100 million ratings spanning
500,000 anonymous customers and their ratings of 17,000 movies
each movie was rated on a scale of 1 to 5 stars
the test data comprised 3 million ratings
the target was a root-mean-square error (RMSE) at least
10 percent better than that of the Netflix algorithm
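For reference, RMSE and the relative improvement over a baseline can be computed as follows. The function names and the example figures are hypothetical; they are not the actual contest scores.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse improves on baseline_rmse (0.10 = 10 percent)."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical predictions against true ratings on a 1 to 5 scale.
error = rmse([3.0, 4.0], [4.0, 2.0])         # sqrt((1 + 4) / 2)
gain = relative_improvement(1.0, 0.9)         # a 10 percent improvement
meets_target = gain >= 0.10 - 1e-12           # tolerance for float rounding
```

Because errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones.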


Conclusion
matrix factorization techniques have become a dominant
methodology within collaborative filtering recommenders
experience with the Netflix competition has shown that they
deliver accuracy superior to classical nearest-neighbor
techniques
they integrate many crucial aspects of the data, such as
multiple forms of feedback, temporal dynamics and confidence
levels.

Reference
Y. Koren, R. Bell and C. Volinsky: Matrix Factorization Techniques
for Recommender Systems, AT&T Labs-Research, 2009

THANK YOU!
