Recommendation System
— Theory and Practice
IMI Colloquium @ Kyushu Univ.
February 18, 2015
Kimikazu Kato
Silver Egg Technology
1 / 27
About myself
Kimikazu Kato
Ph.D. in computer science, master's degree in mathematics
Experience in numerical computation, especially ...
Geometric computation, computer graphics
Partial differential equations, parallel computation, GPGPU
Currently specializing in
Machine learning, especially recommendation systems
2 / 27
About our Company
Silver Egg Technology
Established: 1998
CEO: Tom Foley
Main Service: Recommendation System, Online Advertisement
Major Clients: QVC, Senshukai (Bell Maison), Tsutaya
We provide a recommendation system to Japan's leading web sites.
3 / 27
Today's Story
Introduction to recommendation systems
Rating prediction
Shopping behavior prediction
Practical viewpoint
Conclusion
4 / 27
Recommendation System
Recommender systems or recommendation systems (sometimes
replacing "system" with a synonym such as platform or engine) are a
subclass of information filtering system that seek to predict the
'rating' or 'preference' that user would give to an item. — Wikipedia
In this talk, we focus on collaborative filtering methods, which use only
users' behavior, activity, and preferences.
Other methods include:
Content-based methods
Methods using demographic data
Hybrid methods
5 / 27
Our Service and Mechanism
ASP service named "Aigent Recommender"
Works as an add-on to the existing web site.
6 / 27
Netflix Prize
The Netflix Prize was an open competition for the best collaborative
filtering algorithm to predict user ratings for films, based on previous
ratings without any other information about the users or films, i.e.
without the users or the films being identified except by numbers
assigned for the contest. — Wikipedia
In short, an open competition for preference prediction.
Closed in 2009.
7 / 27
Description of the Problem
user\movie W X Y Z
A 5 4 1 4
B 4
C 2 3
D 1 4 ?
Given rating information for some user/movie pairs,
is it possible to predict a rating for an unknown user/movie pair?
8 / 27
Notations
Number of users: $n$
Set of users: $U = \{1, 2, \ldots, n\}$
Number of items (movies): $m$
Set of items (movies): $I = \{1, 2, \ldots, m\}$
Input matrix: $A$ ($n \times m$ matrix)
9 / 27
Matrix Factorization
Based on the assumption that each item is described by a small number of
latent factors
Each rating is expressed as a linear combination of the latent factors
Achieved good performance in the Netflix Prize
Find matrices $X \in \mathrm{Mat}(f, n)$ and $Y \in \mathrm{Mat}(f, m)$, where $f \ll n, m$, such that
$$A \approx X^T Y$$
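As a minimal sketch (assuming NumPy; the sizes and the random factor values are hypothetical), the model stores one length-$f$ column per user in $X$ and one per item in $Y$, and a predicted rating is the inner product of a user column and an item column:

```python
import numpy as np

# Hypothetical sizes: n users, m items, f latent factors (f much smaller in practice)
n, m, f = 4, 4, 2
rng = np.random.default_rng(0)

X = rng.normal(size=(f, n))  # X in Mat(f, n); column u is the user factor X_u
Y = rng.normal(size=(f, m))  # Y in Mat(f, m); column i is the item factor Y_i

# Low-rank approximation of the n x m input matrix: A ~ X^T Y
A_hat = X.T @ Y

# A single predicted rating is the inner product X_u^T Y_i
u, i = 3, 2
pred = X[:, u] @ Y[:, i]
print(A_hat.shape, np.isclose(pred, A_hat[u, i]))
```

Because every entry of `A_hat` is built from only $f$ numbers per user and per item, the rank of `A_hat` is at most $f$.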
10 / 27
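Find $X$ and $Y$ that maximize $p(X, Y \mid A, \sigma)$, where
$$p(A \mid X, Y, \sigma) = \prod_{A_{ui} \neq 0} \mathcal{N}(A_{ui} \mid X_u^T Y_i, \sigma)$$
$$p(X \mid \sigma_X) = \prod_{u} \mathcal{N}(X_u \mid 0, \sigma_X I)$$
$$p(Y \mid \sigma_Y) = \prod_{i} \mathcal{N}(Y_i \mid 0, \sigma_Y I)$$
Here $X_u$ and $Y_i$ denote the $u$-th column of $X$ and the $i$-th column of $Y$, and $\mathcal{N}$ is the normal distribution.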
11 / 27
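According to Bayes' theorem,
$$p(X, Y \mid A, \sigma, \sigma_X, \sigma_Y) = p(A \mid X, Y, \sigma)\, p(X \mid \sigma_X)\, p(Y \mid \sigma_Y) \times \text{const.}$$
Thus, up to a positive scale factor,
$$-\log p(X, Y \mid A, \sigma, \sigma_X, \sigma_Y) = \sum_{A_{ui} \neq 0} (A_{ui} - X_u^T Y_i)^2 + \lambda_X \|X\|_{\mathrm{Fro}}^2 + \lambda_Y \|Y\|_{\mathrm{Fro}}^2 + \text{const.},$$
where $\|\cdot\|_{\mathrm{Fro}}$ means the Frobenius norm. Maximizing the posterior is therefore equivalent to minimizing a regularized squared error over the observed ratings.
How can this be computed? Use MCMC. See [Salakhutdinov et al., 2008].
Once $X$ and $Y$ are determined, let $\tilde{A} := X^T Y$; the prediction for $A_{ui}$ is then estimated by $\tilde{A}_{ui}$.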
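The slide computes the posterior with MCMC. As a simpler stand-in (a swapped-in technique, not the method of [Salakhutdinov et al., 2008]), the sketch below finds a point estimate by minimizing the regularized squared error above with full-batch gradient descent in NumPy. The function name `fit_map_mf`, the hyperparameters, and the toy matrix are all hypothetical.

```python
import numpy as np

def fit_map_mf(A, f=2, lam_x=0.1, lam_y=0.1, lr=0.005, epochs=5000, seed=0):
    """Gradient descent on
    sum_{A_ui != 0} (A_ui - X_u^T Y_i)^2 + lam_x ||X||_Fro^2 + lam_y ||Y||_Fro^2.
    Zeros in A are treated as unknown and excluded from the error term."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    X = 0.1 * rng.normal(size=(f, n))   # column u is X_u
    Y = 0.1 * rng.normal(size=(f, m))   # column i is Y_i
    mask = (A != 0).astype(float)       # 1 where a rating is observed
    for _ in range(epochs):
        E = mask * (A - X.T @ Y)        # residuals on observed entries only
        # Both gradients are computed from the current X and Y before updating
        grad_X = -2.0 * Y @ E.T + 2.0 * lam_x * X
        grad_Y = -2.0 * X @ E + 2.0 * lam_y * Y
        X -= lr * grad_X
        Y -= lr * grad_Y
    return X, Y

# Hypothetical 4 x 4 rating matrix (0 = unknown); not the table from the slides
A = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
X, Y = fit_map_mf(A)
A_tilde = X.T @ Y   # predictions for every (u, i), including the unknown entries
print(np.round(A_tilde, 1))
```

Full-batch gradient descent keeps the sketch short; on large matrices one would typically switch to a stochastic variant like the one described later for BPR.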
12 / 27
Difference between Rating and Shopping
Rating
user\movie W X Y Z
A 5 4 1 4
B 4
C 2 3
D 1 4 ?
Includes negative feedback
"1" means "boring"
Zero means "unknown"
Shopping (Browsing)
user\item W X Y Z
A 1 1 1 1
B 1
C 1
D 1 1 ?
Includes no negative feedback
Zero means "unknown" or "negative"
More degrees of freedom
Consequently, an algorithm that is effective for the rating matrix is not necessarily
effective for the shopping matrix.
13 / 27
Evaluation Metrics for Recommendation Systems
Rating prediction
Root Mean Squared Error (RMSE)
The square root of the mean of the squared errors
Shopping prediction
Precision
(# of Recommended and Purchased)/(# of Recommended)
Recall
(# of Recommended and Purchased)/(# of Purchased)
The criteria are different. This is another reason why different algorithms should
be applied.
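A minimal sketch of the three metrics in plain Python; the function names and the sample inputs are hypothetical. Note that RMSE averages the squared errors before taking the square root.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error over paired predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def precision_recall(recommended, purchased):
    """Precision and recall of a recommendation list against actual purchases."""
    rec, pur = set(recommended), set(purchased)
    hits = len(rec & pur)                        # recommended and purchased
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(pur) if pur else 0.0
    return precision, recall

print(rmse([4.0, 2.5], [5.0, 2.5]))                   # sqrt((1 + 0) / 2), about 0.707
print(precision_recall(["W", "X", "Y"], ["X", "Z"]))  # one hit: precision 1/3, recall 1/2
```

RMSE needs a predicted value for every held-out rating, whereas precision and recall only need a list of recommended items, which is why the two settings are evaluated so differently.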
14 / 27
Solutions
Adding a constraint to the optimization problem
Changing the objective function itself
15 / 27
Adding a Constraint
The problem is too many degrees of freedom
A desirable characteristic is that many elements of the product $X^T Y$ should be
zero.
Assume that a certain ratio of zero elements of the input matrix remains
zero after the optimization [Sindhwani et al., 2010]
Experimentally outperforms the "zero-as-negative" method
16 / 27
[Sindhwani et al., 2010]
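Introduced variables $p_{ui}$ to relax the problem.
Minimize
$$\sum_{A_{ui} \neq 0} (A_{ui} - X_u^T Y_i)^2 + \lambda_X \|X\|_{\mathrm{Fro}}^2 + \lambda_Y \|Y\|_{\mathrm{Fro}}^2 + \sum_{A_{ui} = 0} \left[ p_{ui} (0 - X_u^T Y_i)^2 + (1 - p_{ui})(1 - X_u^T Y_i)^2 \right] + T \sum_{A_{ui} = 0} \left[ -p_{ui} \log p_{ui} - (1 - p_{ui}) \log (1 - p_{ui}) \right]$$
subject to
$$\frac{1}{|\{A_{ui} \mid A_{ui} = 0\}|} \sum_{A_{ui} = 0} p_{ui} = r$$
Here $p_{ui}$ weights how strongly an unobserved entry is pulled toward 0 rather than 1, $r$ is the assumed ratio of zero elements that stay zero, and $T$ weights the entropy term.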
17 / 27
Ranking prediction
Another strategy for shopping prediction
"Learn from the order" approach
Predict whether item $i$ is more likely to be bought than item $j$, rather than the
purchase probability of $i$ or $j$.
18 / 27
Bayesian Personalized Ranking
[Rendle et al., 2009]
Uses the matrix factorization model, but the parameters are updated
according to the observed "orders"
The parameters are the same as in ordinary matrix factorization, but the
objective function is different
Consider a total order $>_u$ for each $u \in U$. Suppose that $i >_u j$ ($i, j \in I$) means
"the user $u$ is more likely to buy $i$ than $j$."
The objective is to calculate $p(i >_u j)$ such that $A_{ui} = 0$ and $A_{uj} = 0$ (which means
$i$ and $j$ are not bought by $u$).
19 / 27
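Let
$$D_A = \{(u, i, j) \in U \times I \times I \mid A_{ui} = 1, A_{uj} = 0\},$$
and define
$$\prod_{u \in U} p(>_u \mid X, Y) := \prod_{(u,i,j) \in D_A} p(i >_u j \mid X, Y),$$
where we assume
$$p(i >_u j \mid X, Y) = \sigma(X_u^T Y_i - X_u^T Y_j), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}.$$
According to Bayes' theorem, the function to be optimized becomes:
$$p(X, Y \mid \{>_u\}_{u \in U}) = \prod_{u \in U} p(>_u \mid X, Y) \times p(X)\, p(Y) \times \text{const.}$$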
20 / 27
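Taking the log of this,
$$\begin{aligned}
L &:= \log \left[ \prod_{u \in U} p(>_u \mid X, Y) \times p(X)\, p(Y) \right] \\
&= \log \prod_{(u,i,j) \in D_A} p(i >_u j \mid X, Y) - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2 + \text{const.} \\
&= \sum_{(u,i,j) \in D_A} \log \sigma(X_u^T Y_i - X_u^T Y_j) - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2 + \text{const.}
\end{aligned}$$
Now consider the following problem:
$$\max_{X, Y} \left[ \sum_{(u,i,j) \in D_A} \log \sigma(X_u^T Y_i - X_u^T Y_j) - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2 \right]$$
This means "find a pair of matrices $X, Y$ that preserves the order of the
elements of the input matrix for each $u$."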
21 / 27
Computation
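The function we want to optimize,
$$\sum_{(u,i,j) \in D_A} \log \sigma(X_u^T Y_i - X_u^T Y_j) - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2,$$
sums over a subset of $U \times I \times I$, which is huge, so in practice a stochastic method is necessary.
Let the parameters be $\Theta = (X, Y)$.
The algorithm is the following:
Repeat the following
Choose $(u, i, j) \in D_A$ randomly
Update $\Theta$ with
$$\Theta \leftarrow \Theta + \alpha \frac{\partial}{\partial \Theta} \left( \log \sigma(X_u^T Y_i - X_u^T Y_j) - \lambda_X \|X\|_{\mathrm{Fro}}^2 - \lambda_Y \|Y\|_{\mathrm{Fro}}^2 \right)$$
This method is called Stochastic Gradient Descent (SGD). Because the objective is maximized, each step moves along the gradient rather than against it.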
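Below is a minimal NumPy sketch of this update loop. Assumptions not in the slides: the function name `bpr_sgd` and its hyperparameters, regularizing only the vectors touched by each sample (with the factor 2 from the squared norm absorbed into `lam`), and drawing a bought pair $(u, i)$ uniformly and then an unbought $j$ uniformly, which is close to, but not exactly, uniform sampling from $D_A$.

```python
import numpy as np

def bpr_sgd(A, f=4, alpha=0.05, lam=0.01, n_steps=20000, seed=0):
    """Stochastic gradient ascent on
    sum_{(u,i,j) in D_A} log sigma(X_u^T Y_i - X_u^T Y_j) - lam ||X||^2 - lam ||Y||^2
    for a binary matrix A (1 = bought, 0 = not bought)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    X = 0.1 * rng.normal(size=(f, n))   # column u is X_u
    Y = 0.1 * rng.normal(size=(f, m))   # column i is Y_i
    # Bought pairs (u, i) whose user also has at least one unbought item
    positives = [(u, i) for u, i in np.argwhere(A == 1) if (A[u] == 0).any()]
    for _ in range(n_steps):
        u, i = positives[rng.integers(len(positives))]
        j = rng.choice(np.flatnonzero(A[u] == 0))     # an unbought item
        xu, yi, yj = X[:, u].copy(), Y[:, i].copy(), Y[:, j].copy()
        x_uij = xu @ (yi - yj)
        # d/dx log sigma(x) = sigma(-x) = 1 / (1 + e^x), computed stably
        g = np.exp(-np.logaddexp(0.0, x_uij))
        X[:, u] += alpha * (g * (yi - yj) - lam * xu)
        Y[:, i] += alpha * (g * xu - lam * yi)
        Y[:, j] += alpha * (-g * xu - lam * yj)
    return X, Y

# Hypothetical purchases: users 0-2 share tastes, users 3-5 share tastes
A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1, 1]])
X, Y = bpr_sgd(A)
scores = X.T @ Y   # rank the unbought items of user u by scores[u]
```

Each step touches only three factor vectors, so its cost is independent of the size of $D_A$; this is what makes the stochastic method practical.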
22 / 27
Practical Aspects of Recommendation
Problems
Computational time
Memory consumption
How many services can be hosted in a single server rack?
Super-high accuracy that requires a supercomputer is useless for real business
23 / 27
Sparsification
Representing a big matrix as a sparse matrix saves both computational
time and memory
It is advantageous to employ a model whose parameters become sparse
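To illustrate the savings, the sketch below (assuming NumPy and SciPy; the matrix size and number of entries are hypothetical) stores a random binary user-item matrix in SciPy's CSR format and compares its memory footprint with what a dense float64 array of the same shape would need, without actually allocating the dense array.

```python
import numpy as np
from scipy import sparse

n_users, n_items, n_entries = 10_000, 5_000, 50_000   # hypothetical sizes
rng = np.random.default_rng(0)
rows = rng.integers(0, n_users, size=n_entries)
cols = rng.integers(0, n_items, size=n_entries)

# Duplicate (row, col) pairs are summed on conversion to CSR; reset values to 1
A = sparse.csr_matrix((np.ones(n_entries), (rows, cols)), shape=(n_users, n_items))
A.data[:] = 1.0

dense_bytes = n_users * n_items * 8                    # a float64 dense array
sparse_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
print(f"dense: {dense_bytes / 1e6:.0f} MB, CSR: {sparse_bytes / 1e6:.2f} MB")

# A matrix-vector product only visits the stored entries
item_weights = rng.random(n_items)
user_scores = A @ item_weights                         # length n_users
```

Both memory and the cost of the product scale with the number of stored entries rather than with $n \times m$, which is the motivation for models whose parameters are sparse as well.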
24 / 27
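Example of a sparse model: Elastic Net
In the regression model, adding an L1 term makes the solution sparse:
$$\min_{w} \left[ \frac{1}{2n} \|Xw - y\|_2^2 + \frac{\lambda(1 - \rho)}{2} \|w\|_2^2 + \lambda\rho \|w\|_1 \right]$$
A similar idea is applied to recommendation by approximating $A$ with $AW$ [Ning et al., 2011]:
Minimize
$$\|A - AW\|_{\mathrm{Fro}}^2 + \frac{\lambda(1 - \rho)}{2} \|W\|_{\mathrm{Fro}}^2 + \lambda\rho \|W\|_1$$
subject to
$$\operatorname{diag} W = 0$$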
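A minimal sketch of the column-wise idea, assuming scikit-learn. Its `ElasticNet` minimizes $\frac{1}{2n}\|y - Xw\|_2^2 + \alpha\rho\|w\|_1 + \frac{\alpha(1-\rho)}{2}\|w\|_2^2$ with `alpha` $= \lambda$ and `l1_ratio` $= \rho$, which is the regression objective above; fitting one column of $W$ at a time therefore solves the matrix problem up to the $\frac{1}{2n}$ scaling of the error term. Column $j$ is hidden from its own regression to enforce $\operatorname{diag} W = 0$. The function name `slim_like` and the toy data are hypothetical, and the SLIM paper also constrains $W$ to be nonnegative, which this sketch omits.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def slim_like(A, lam=0.01, rho=0.5):
    """Fit W column by column so that A @ W approximates A, with diag(W) = 0."""
    n, m = A.shape
    W = np.zeros((m, m))
    for j in range(m):
        A_minus_j = A.copy()
        A_minus_j[:, j] = 0.0          # item j may not be used to predict itself
        model = ElasticNet(alpha=lam, l1_ratio=rho, fit_intercept=False, max_iter=10_000)
        model.fit(A_minus_j, A[:, j])
        W[:, j] = model.coef_
        W[j, j] = 0.0                  # already zero; set explicitly for clarity
    return W

# Hypothetical binary purchase matrix: 5 users x 4 items
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)
W = slim_like(A)
scores = A @ W                         # recommend high-scoring items not yet bought
print(np.count_nonzero(W), "nonzeros out of", W.size)
```

Raising `lam` (or moving `rho` toward 1) strengthens the L1 penalty and drives more entries of $W$ exactly to zero, which is the sparsity that saves memory and time at serving.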
25 / 27
Conclusion: What is Important for Good
Prediction?
Theory
Machine learning
Mathematical optimization
Implementation
Algorithms
Computer architecture
Mathematics
Human factors!
Hand tuning of parameters
Domain-specific knowledge
26 / 27
References
Salakhutdinov, Ruslan, and Andriy Mnih. "Bayesian probabilistic matrix
factorization using Markov chain Monte Carlo." Proceedings of the 25th
international conference on Machine learning. ACM, 2008.
Sindhwani, Vikas, et al. "One-class matrix completion with low-density
factorizations." Data Mining (ICDM), 2010 IEEE 10th International
Conference on. IEEE, 2010.
Rendle, Steffen, et al. "BPR: Bayesian personalized ranking from implicit
feedback." Proceedings of the Twenty-Fifth Conference on Uncertainty in
Artificial Intelligence. AUAI Press, 2009.
Zou, Hui, and Trevor Hastie. "Regularization and variable selection via the
elastic net." Journal of the Royal Statistical Society: Series B (Statistical
Methodology) 67.2 (2005): 301-320.
Ning, Xia, and George Karypis. "SLIM: Sparse linear methods for top-n
recommender systems." Data Mining (ICDM), 2011 IEEE 11th
International Conference on. IEEE, 2011.
27 / 27