This document discusses matrix factorization techniques for recommendation systems. It explains that user-item interaction data can be represented as a matrix and decomposed into two lower-rank matrices that capture latent features. One matrix represents users and the other represents items. The document outlines an alternating least squares algorithm to compute the decomposed matrices and discusses how the technique can be implemented in Apache Mahout and Myrrix for scalable recommendations.
3. Matrix = Associations
Things are associated, like people to colors
Associations have strengths, like preferences and dislikes
Can quantify associations: Alice loves navy = +4, Carol dislikes olive = -2
We don’t know all associations; many implicit zeroes

        Rose  Navy  Olive
Alice     0    +4      0
Bob       0     0     +2
Carol    -1     0     -2
Dave     +3     0      0
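As a sketch (the deck itself shows no code), the same matrix in NumPy; the library choice and variable names are mine:

    import numpy as np

    # Rows are users (Alice, Bob, Carol, Dave), columns are items
    # (Rose, Navy, Olive). 0 means "no known association", not
    # "dislikes" -- most entries are implicit zeroes.
    R = np.array([
        [ 0, +4,  0],   # Alice loves navy (+4)
        [ 0,  0, +2],   # Bob
        [-1,  0, -2],   # Carol dislikes olive (-2)
        [+3,  0,  0],   # Dave
    ], dtype=float)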
4. From One Matrix, Two
Like numbers, matrices can be factored
m•n matrix = m•k times k•n
Associations can decompose into others:
Alice likes navy = Alice loves blues, and blues includes navy

[Diagram: P (m•n) = X (m•k) • Y’ (k•n)]
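A minimal shape-check of that factorization, assuming NumPy; the random factor values are purely illustrative:

    import numpy as np

    m, n, k = 4, 3, 2            # users, items, latent features
    X = np.random.rand(m, k)     # user-to-feature matrix (m x k)
    Y = np.random.rand(n, k)     # item-to-feature matrix (n x k)
    P = X @ Y.T                  # user-to-item associations (m x n)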
5. In Terms of Few Features
Can explain associations by appealing to underlying intermediate features (e.g. “blue-ness”)
Relatively few (one “blue-ness”, but many shades)

[Diagram: (Alice) → (Blue) → (Navy)]
6. Losing Information is Helpful
When k (= features) is small, information is lost
Factorization is approximate
(Alice appears to like blue-ish periwinkle too)
[Diagram: (Alice) → (Blue) → (Navy), (Periwinkle)]
8. Skip the Singular Value Decomposition for now …
[Diagram: A (m•n) = S (m•k) • Σ (k•k) • T’ (k•n)]
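Although the deck skips SVD, a rank-k truncation is easy to sketch with NumPy, and it also illustrates slide 6’s point that the lossy reconstruction fills in unobserved entries; the matrix reuses the Alice/Bob/Carol/Dave data above:

    import numpy as np

    A = np.array([[ 0, 4,  0],
                  [ 0, 0,  2],
                  [-1, 0, -2],
                  [ 3, 0,  0]], dtype=float)

    k = 2
    S, sigma, Tt = np.linalg.svd(A, full_matrices=False)  # A = S • Σ • T’
    A_k = S[:, :k] @ np.diag(sigma[:k]) @ Tt[:k, :]       # rank-k approximation
    # A_k no longer matches A exactly; the "lost" information
    # shows up as nonzero values in previously-unobserved cells.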
9. Alternating Least Squares
“Collaborative Filtering for Implicit Feedback Datasets”, www2.research.att.com/~yifanhu/PUB/cf.pdf
R = matrix of user-item interaction “strengths”
P = R reduced to 0 and 1
Factor approximately: P ≈ X•Y’
Start with random Y
Compute X such that X•Y’ best approximates P (Frobenius / L2 norm) (least squares)
Repeat for Y (Alternating)
Iterate, Iterate, Iterate
Large values in X•Y’ are good recommendations
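A minimal ALS sketch in NumPy, under simplifying assumptions: it fits the 0/1 matrix P directly with L2 regularization, whereas the Hu/Koren paper cited above additionally weights each entry by a confidence derived from R. Function and variable names are mine:

    import numpy as np

    def als(P, k=2, lam=0.1, iters=20, seed=0):
        m, n = P.shape
        rng = np.random.default_rng(seed)
        X = rng.random((m, k))
        Y = rng.random((n, k))                # start with random Y
        I = lam * np.eye(k)
        for _ in range(iters):                # iterate, iterate, iterate
            # Fix Y; least-squares solve for X so X•Y’ best approximates P
            X = np.linalg.solve(Y.T @ Y + I, Y.T @ P.T).T
            # Fix X; repeat for Y (alternating)
            Y = np.linalg.solve(X.T @ X + I, X.T @ P).T
        return X, Y

    R = np.array([[0, 4, 0], [0, 0, 2], [-1, 0, -2], [3, 0, 0]], dtype=float)
    P = (R > 0).astype(float)                 # R reduced to 0 and 1
    X, Y = als(P)
    scores = X @ Y.T                          # large values = good recommendations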
15. BONUS: Folding in New Data
Model building takes time
Sometimes need immediate, if approximate, updates for new data
For new user u, need a new row Xu with Xu•Y’ = Qu, but we only have Pu
What is Xu?

Apply some right inverse: X•Y’•(Y’)⁻¹ = Q•(Y’)⁻¹, so X = Q•(Y’)⁻¹
OK, what is (Y’)⁻¹? Of course (Y’•Y)•(Y’•Y)⁻¹ = I
So Y’•(Y•(Y’•Y)⁻¹) = I, and the right inverse is Y•(Y’•Y)⁻¹
Xu = Qu•Y•(Y’•Y)⁻¹, and so Xu ≈ Pu•Y•(Y’•Y)⁻¹
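The fold-in formula as a NumPy sketch; Y would come from an already-built model (e.g. the ALS sketch above), and the random stand-in here just keeps the snippet self-contained:

    import numpy as np

    n, k = 3, 2
    Y = np.random.default_rng(0).random((n, k))  # stand-in for learned item factors

    Pu = np.array([0.0, 1.0, 0.0])               # new user's 0/1 interaction row
    Xu = Pu @ Y @ np.linalg.inv(Y.T @ Y)         # Xu ≈ Pu•Y•(Y’•Y)⁻¹
    recs = Xu @ Y.T                              # approximate scores, no refactorization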
16. In Mahout
org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob
Alternating least squares
Distributed, Hadoop-based

org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender
SVD-based
Non-distributed, not Hadoop-based

MAHOUT-737: alternate implementation of alternating least squares
And more… DistributedLanczosSolver, SequentialOutOfCoreSvd, …
17. Complete product
Myrrix: a real-time Serving Layer plus a Hadoop-based Computation Layer
Tuned, documented
Free / open: Serving Layer, for small data
Commercial: add the Computation Layer for big data; hosting
Matrix factorization-based, with attractive properties
http://myrrix.com