SVD Applied to
Collaborative Filtering
      ~ URUG 7-12-07 ~
Recommendation System
Answers the question:
What do I want next?!?

 Very consumer driven.

 Must provide good results or a user may not
 trust the system in the future.
Collaborative Filtering
Base a user's recommendations on:

  The user’s past history.

  The history of like-minded users.

View the data as a product × user matrix.

Find a “neighborhood” of similar users
for that user.

Return the top-N recommendations.
Early Approaches

Goldberg et al. (1992), Using
collaborative filtering to weave an
information tapestry
Konstan, J., et al. (1997), Applying
Collaborative Filtering to Usenet news.

Use Pearson correlation or cosine similarity
as a measure of similarity to form
neighborhoods.
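
As a minimal sketch of how such a neighborhood could be formed, the snippet below computes the Pearson correlation between two users over the products both have rated, then keeps the most similar users. The ratings dictionary, the function names, and the neighborhood size are illustrative assumptions, not taken from the cited papers.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the products both users rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # too little overlap to correlate
    xs = [ratings_a[p] for p in common]
    ys = [ratings_b[p] for p in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rater has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def neighborhood(target, all_ratings, size=2):
    """Return the `size` users most correlated with `target`."""
    scores = [
        (pearson(all_ratings[target], r), user)
        for user, r in all_ratings.items()
        if user != target
    ]
    scores.sort(reverse=True)
    return [user for _, user in scores[:size]]

# Hypothetical toy data: user -> {product: rating}
ratings = {
    "ann": {"A": 5, "B": 3, "C": 1},
    "bob": {"A": 4, "B": 3, "C": 2},
    "cat": {"A": 1, "B": 3, "C": 5},
    "dan": {"D": 4},
}
closest = neighborhood("ann", ratings, size=1)
```

Note that "dan" shares no rated product with "ann", so no correlation can be computed for that pair at all.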
Early CF Challenges
Sparsity - Users share too few rated products,
so often no correlation between users can be
found. Coverage is reduced.

Scalability - Nearest-neighbor computation
time grows with the number of products and
users.

Synonymy
Dimensionality Reduction
 Latent Semantic Indexing (LSI)

   Algorithm from the IR community (late
   80s to early 90s).

   Addresses the problems of synonymy,
   polysemy, sparsity, and scalability for
   large datasets.

   Reduces dimensionality of a dataset
   and captures the latent relationships.

 Easily maps to CF!
Framing LSI for CF
Products × Users matrix instead of Terms ×
Documents.

        Netflix Dataset
480,189 users, 17,770 movies, only ~100 million ratings.

17,770 × 480,189 matrix that is about 99% sparse!

  About 8.5 billion potential ratings.
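
The arithmetic behind those figures can be checked with a short sketch. The rating count is the rounded "~100 million" from the slide, so the resulting sparsity is approximate.

```python
users = 480_189
movies = 17_770
ratings = 100_000_000  # rounded figure from the slide, not the exact count

cells = users * movies            # every (movie, user) pair is a potential rating
density = ratings / cells         # fraction of cells that hold a rating
sparsity = 1.0 - density          # fraction of cells that are empty

print(f"{cells:,} potential ratings")   # 8,532,958,530
print(f"{sparsity:.1%} of the matrix is empty")
```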
SVD - The math behind LSI
   Singular Value Decomposition

      Any M x N matrix A of rank r can be
      decomposed as:

      $A = U \Sigma V^T$

 U is an M x M orthogonal matrix.
 V is an N x N orthogonal matrix.
 Σ is an M x N diagonal matrix whose first r diagonal
 entries are the nonzero singular values of A.
 $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > \sigma_{r+1} = \dots = \sigma_n = 0$
Related to eigenvalue
  decomposition (PCA)
The columns of U are orthonormal eigenvectors
of AA^T. They span the “column space” and are
known as the left singular vectors.
The columns of V are orthonormal eigenvectors
of A^TA. They span the “row space” and are
known as the right singular vectors.
Singular values are the square roots of
the nonzero eigenvalues of A^TA (equivalently AA^T).
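
A small worked example may make the relationship concrete. The 2 x 2 matrix below is a hypothetical choice, picked only because its eigenvalues come out cleanly.

```latex
A = \begin{pmatrix} 2 & 2 \\ -1 & 1 \end{pmatrix}, \qquad
A^T A = \begin{pmatrix} 5 & 3 \\ 3 & 5 \end{pmatrix}, \qquad
A A^T = \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix}

% Both products have eigenvalues 8 and 2, so
\sigma_1 = \sqrt{8} = 2\sqrt{2}, \qquad \sigma_2 = \sqrt{2}

% Eigenvectors of A A^T give U, eigenvectors of A^T A give V:
U = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} 2\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix}, \qquad
V = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}

% Check:
U \Sigma V^T
  = \begin{pmatrix} 2\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix}
    \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}
  = \begin{pmatrix} 2 & 2 \\ -1 & 1 \end{pmatrix} = A
```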
Reducing Dimensionality


 $A_k = U_k \Sigma_k V_k^T$

 A_k, built from the k largest singular values, is the
 closest rank-k approximation to A.

 A_k minimizes the Frobenius norm $\|A - A_k\|_F$ over all
 rank-k matrices.
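
To illustrate the k = 1 case, here is a rough sketch that finds the largest singular triple by power iteration on A^T A and builds the rank-1 approximation from it. Power iteration is a stand-in chosen to keep the example dependency-free, and the matrix and function names are assumptions made for illustration.

```python
import math

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def top_singular_triple(A, iters=200):
    """Power iteration on A^T A; returns (sigma_1, u_1, v_1).

    Assumes A is nonzero and the start vector is not orthogonal to v_1.
    """
    At = transpose(A)
    v = [1.0 / (j + 1) for j in range(len(A[0]))]  # arbitrary nonzero start
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        length = norm(w)
        v = [x / length for x in w]
    Av = matvec(A, v)
    sigma = norm(Av)
    u = [x / sigma for x in Av]
    return sigma, u, v

def rank1_approx(A):
    """A_1 = sigma_1 * u_1 * v_1^T."""
    sigma, u, v = top_singular_triple(A)
    return [[sigma * ui * vj for vj in v] for ui in u]

def frobenius_error(A, B):
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))

A = [[2.0, 2.0], [-1.0, 1.0]]  # singular values 2*sqrt(2) and sqrt(2)
sigma1, u1, v1 = top_singular_triple(A)
A1 = rank1_approx(A)
err = frobenius_error(A, A1)
```

For this matrix the remaining error equals the discarded singular value, sqrt(2), which is what the Frobenius-norm property predicts.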
Making Recommendations
 Cosine similarity - a common way to find a neighborhood.

 $\cos(i, j) = \dfrac{i \cdot j}{\|i\|_2 \, \|j\|_2}$

Then base recommendations on that
neighborhood and its users.
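
A minimal sketch of the cosine measure and a neighborhood lookup follows. The user vectors stand in for rows of a reduced user-feature matrix, and their values, like the function names, are made up for illustration.

```python
import math

def cosine(i, j):
    """cos(i, j) = (i . j) / (||i||_2 * ||j||_2)."""
    dot = sum(a * b for a, b in zip(i, j))
    norm_i = math.sqrt(sum(a * a for a in i))
    norm_j = math.sqrt(sum(b * b for b in j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_i * norm_j)

def nearest(target, vectors, size=2):
    """Names of the `size` vectors closest to `target` by cosine."""
    others = [name for name in vectors if name != target]
    others.sort(key=lambda name: cosine(vectors[target], vectors[name]),
                reverse=True)
    return others[:size]

# Hypothetical users in a 2-dimensional reduced space
user_vectors = {
    "ann": [1.0, 0.0],
    "bob": [2.0, 0.0],
    "cat": [0.0, 3.0],
    "dan": [1.0, 1.0],
}
neighbors = nearest("ann", user_vectors, size=2)
```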

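Can also predict a customer's rating of a product with a
simple dot product if the singular values are combined
with the singular vectors:

 $CP_{prod} = C_{avg} + U_k S_k^{1/2}(c) \cdot S_k^{1/2} V_k^T(p)$

Here (c) selects the customer's row, (p) selects the
product's column, and C_avg is the customer's average rating.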
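
The prediction formula can be sketched as below. The factor values are hypothetical (an exact rank-2 factorization of the mean-centred matrix [[2, 2], [-1, 1]]), and indexing U by customer and V by product is an assumption about the layout.

```python
import math

def predict(customer_avg, u_row, singular_values, v_row):
    """customer_avg + (U_k S_k^(1/2))(c) . (S_k^(1/2) V_k^T)(p)."""
    left = [u * math.sqrt(s) for u, s in zip(u_row, singular_values)]
    right = [math.sqrt(s) * v for s, v in zip(singular_values, v_row)]
    return customer_avg + sum(l * r for l, r in zip(left, right))

r = 1 / math.sqrt(2)
U = [[1.0, 0.0], [0.0, 1.0]]            # row c: customer c
S = [2 * math.sqrt(2), math.sqrt(2)]    # singular values
V = [[r, -r], [r, r]]                   # row p: product p

# Customer 0 with an assumed average rating of 3.0, product 1
pred = predict(3.0, U[0], S, V[1])
```

Splitting each singular value into two square roots lets the customer and product factors be stored separately, so a prediction at serving time is a single k-term dot product.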
Challenges with SVD
Scalability - Once again, compute
time grows with the number of users
and products: O(m^3).
  Offline stage.
  Online stage.
Even doing the SVD computation offline
is not feasible for large datasets.
Other methods are needed.
Incremental SVD
 $u_k = u^T V_k \Sigma_k^{-1}$
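
The projection can be sketched as follows: a new user's ratings vector u is folded into the existing k-dimensional space without recomputing the decomposition. The sketch assumes u has one entry per row of V_k, and the factor values are hypothetical.

```python
import math

def fold_in(u, V_k, sigma_k):
    """u_k = u^T V_k Sigma_k^{-1}: project u into the reduced space."""
    k = len(sigma_k)
    return [
        sum(u[p] * V_k[p][f] for p in range(len(u))) / sigma_k[f]
        for f in range(k)
    ]

r = 1 / math.sqrt(2)
V_k = [[r, -r], [r, r]]                       # assumed right singular vectors
sigma_k = [2 * math.sqrt(2), math.sqrt(2)]    # assumed singular values

# A vector already in the decomposed matrix maps back to its own coordinates.
existing = fold_in([2.0, 2.0], V_k, sigma_k)
# A previously unseen user gets coordinates in the same space.
new_user = fold_in([1.0, 3.0], V_k, sigma_k)
```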
Incremental SVD Results
GHA for SVD
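  Gorrell (2006), GHA for Incremental SVD in
  NLP

      Based on Sanger’s (1989) GHA for eigen
      decomposition.

 $\Delta c_i^a = c_i^b \cdot b \left( a - \sum_{j<i} (a \cdot c_j^a)\, c_j^a \right)$

 $\Delta c_i^b = c_i^a \cdot a \left( b - \sum_{j<i} (b \cdot c_j^b)\, c_j^b \right)$

 Here (a, b) is a training pair of vectors, and c_i^a and c_i^b
 are the current estimates of the i-th left and right singular vectors.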
GHA extended by Funk

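 // One gradient step for a single feature.
 // lrate, userValue[], movieValue[] and predictRating() are defined elsewhere.
 void train(int user, int movie, real rating)
 {
     // Prediction error, scaled by the learning rate.
     real err = lrate * (rating - predictRating(movie, user));

     // Cache the old user value so both updates use pre-update values.
     real uv = userValue[user];
     userValue[user]   += err * movieValue[movie];
     movieValue[movie] += err * uv;
 }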
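
To show the update rule running end to end, here is a rough Python sketch that trains features one at a time on a tiny, made-up ratings set. The data, the hyperparameters, and the choice to predict from only the features trained so far are simplifying assumptions, not details taken from Funk's implementation.

```python
import math

def predict(user_f, movie_f, user, movie, upto):
    """Sum the products of features 0..upto for one (user, movie) pair."""
    return sum(user_f[user][f] * movie_f[movie][f] for f in range(upto + 1))

def train_funk(ratings, n_users, n_movies, n_features=2,
               lrate=0.01, epochs=500, init=0.1):
    user_f = [[init] * n_features for _ in range(n_users)]
    movie_f = [[init] * n_features for _ in range(n_movies)]
    for f in range(n_features):            # train one feature at a time
        for _ in range(epochs):
            for user, movie, rating in ratings:
                err = lrate * (rating - predict(user_f, movie_f, user, movie, f))
                uv = user_f[user][f]       # cache before updating
                user_f[user][f] += err * movie_f[movie][f]
                movie_f[movie][f] += err * uv
    return user_f, movie_f

def rmse(ratings, user_f, movie_f):
    last = len(user_f[0]) - 1
    squared = sum((r - predict(user_f, movie_f, u, m, last)) ** 2
                  for u, m, r in ratings)
    return math.sqrt(squared / len(ratings))

# Hypothetical (user, movie, rating) triples; only known ratings are visited
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]

untrained = ([[0.1, 0.1] for _ in range(3)], [[0.1, 0.1] for _ in range(3)])
before = rmse(ratings, *untrained)
user_f, movie_f = train_funk(ratings, n_users=3, n_movies=3)
after = rmse(ratings, user_f, movie_f)
```

Because the loop touches only observed ratings, the full sparse matrix never has to be materialized.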
Netflix Results
Best RMSEs

  0.9283

  0.9212

Blended to get 0.9189, 3.42% better than
Netflix.
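
For reference, a small sketch of how RMSE and the relative improvement might be computed. The baseline of 0.9514 is the commonly cited quiz-set RMSE of Netflix's own system; it does not appear on the slide and is an assumption here.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def improvement_pct(baseline_rmse, new_rmse):
    """Relative RMSE reduction over the baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

BASELINE_RMSE = 0.9514  # assumed baseline, not stated on the slide
gain = improvement_pct(BASELINE_RMSE, 0.9189)
print(f"{gain:.2f}% better than the baseline")
```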
Summary
SVD provides an elegant and automatic
recommendation system that has the
potential to scale.

There are many different algorithms to
calculate or at least approximate SVD which
can be used in offline stages for websites
that need to have CF.

Every dataset is different and requires
experimentation to get the best results.

More Related Content

What's hot

Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Xavier Amatriain
 
Recommendation system
Recommendation systemRecommendation system
Recommendation systemRishabh Mehta
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsLei Guo
 
Movie lens movie recommendation system
Movie lens movie recommendation systemMovie lens movie recommendation system
Movie lens movie recommendation systemGaurav Sawant
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringViet-Trung TRAN
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 
A Hybrid Recommendation system
A Hybrid Recommendation systemA Hybrid Recommendation system
A Hybrid Recommendation systemPranav Prakash
 
Movie lens recommender systems
Movie lens recommender systemsMovie lens recommender systems
Movie lens recommender systemsKapil Garg
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation SystemsRobin Reni
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineNYC Predictive Analytics
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systemsAravindharamanan S
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation enginesGeorgian Micsa
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringChangsung Moon
 
Recommendation System
Recommendation SystemRecommendation System
Recommendation SystemAnamta Sayyed
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017Shuai Zhang
 
Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systemsNAVER Engineering
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Alexandros Karatzoglou
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems BasicsJarin Tasnim Khan
 

What's hot (20)

Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender Systems
 
Movie lens movie recommendation system
Movie lens movie recommendation systemMovie lens movie recommendation system
Movie lens movie recommendation system
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
A Hybrid Recommendation system
A Hybrid Recommendation systemA Hybrid Recommendation system
A Hybrid Recommendation system
 
Movie lens recommender systems
Movie lens recommender systemsMovie lens recommender systems
Movie lens recommender systems
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
Recommender system
Recommender systemRecommender system
Recommender system
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 
Content based recommendation systems
Content based recommendation systemsContent based recommendation systems
Content based recommendation systems
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative Filtering
 
Recommendation System
Recommendation SystemRecommendation System
Recommendation System
 
Talk@rmit 09112017
Talk@rmit 09112017Talk@rmit 09112017
Talk@rmit 09112017
 
Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systems
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems Basics
 

Similar to SVD and the Netflix Dataset

NIPS2007: structured prediction
NIPS2007: structured predictionNIPS2007: structured prediction
NIPS2007: structured predictionzukun
 
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...IRJET Journal
 
Recommendation system using collaborative deep learning
Recommendation system using collaborative deep learningRecommendation system using collaborative deep learning
Recommendation system using collaborative deep learningRitesh Sawant
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
Large Scale Recommendation: a view from the Trenches
Large Scale Recommendation: a view from the TrenchesLarge Scale Recommendation: a view from the Trenches
Large Scale Recommendation: a view from the TrenchesAnne-Marie Tousch
 
IRJET- K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
IRJET-  	  K-SVD: Dictionary Developing Algorithms for Sparse Representation ...IRJET-  	  K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
IRJET- K-SVD: Dictionary Developing Algorithms for Sparse Representation ...IRJET Journal
 
Download
DownloadDownload
Downloadbutest
 
Download
DownloadDownload
Downloadbutest
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Daniel Valcarce
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfPolytechnique Montréal
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Zihui Li
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsYONG ZHENG
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersUniversity of Huddersfield
 
Evaluation of conditional images synthesis: generating a photorealistic image...
Evaluation of conditional images synthesis: generating a photorealistic image...Evaluation of conditional images synthesis: generating a photorealistic image...
Evaluation of conditional images synthesis: generating a photorealistic image...SamanthaGallone
 
Performance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHT
Performance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHTPerformance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHT
Performance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHTIRJET Journal
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimationData Con LA
 
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...acijjournal
 

Similar to SVD and the Netflix Dataset (20)

Group Project
Group ProjectGroup Project
Group Project
 
NIPS2007: structured prediction
NIPS2007: structured predictionNIPS2007: structured prediction
NIPS2007: structured prediction
 
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
Enhanced Watemarked Images by Various Attacks Based on DWT with Differential ...
 
Recommendation system using collaborative deep learning
Recommendation system using collaborative deep learningRecommendation system using collaborative deep learning
Recommendation system using collaborative deep learning
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Large Scale Recommendation: a view from the Trenches
Large Scale Recommendation: a view from the TrenchesLarge Scale Recommendation: a view from the Trenches
Large Scale Recommendation: a view from the Trenches
 
IRJET- K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
IRJET-  	  K-SVD: Dictionary Developing Algorithms for Sparse Representation ...IRJET-  	  K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
IRJET- K-SVD: Dictionary Developing Algorithms for Sparse Representation ...
 
Download
DownloadDownload
Download
 
Download
DownloadDownload
Download
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
 
Safety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdfSafety Verification of Deep Neural Networks_.pdf
Safety Verification of Deep Neural Networks_.pdf
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
 
Gene's law
Gene's lawGene's law
Gene's law
 
Evaluation of conditional images synthesis: generating a photorealistic image...
Evaluation of conditional images synthesis: generating a photorealistic image...Evaluation of conditional images synthesis: generating a photorealistic image...
Evaluation of conditional images synthesis: generating a photorealistic image...
 
Performance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHT
Performance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHTPerformance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHT
Performance Analysis on Fingerprint Image Compression Using K-SVD-SR and SPIHT
 
HalifaxNGGs
HalifaxNGGsHalifaxNGGs
HalifaxNGGs
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimation
 
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
COMPARATIVE PERFORMANCE ANALYSIS OF RNSC AND MCL ALGORITHMS ON POWER-LAW DIST...
 

More from Ben Mabey

PCA for the uninitiated
PCA for the uninitiatedPCA for the uninitiated
PCA for the uninitiatedBen Mabey
 
Clojure, Plain and Simple
Clojure, Plain and SimpleClojure, Plain and Simple
Clojure, Plain and SimpleBen Mabey
 
Cucumber: Automating the Requirements Language You Already Speak
Cucumber: Automating the Requirements Language You Already SpeakCucumber: Automating the Requirements Language You Already Speak
Cucumber: Automating the Requirements Language You Already SpeakBen Mabey
 
Writing Software not Code with Cucumber
Writing Software not Code with CucumberWriting Software not Code with Cucumber
Writing Software not Code with CucumberBen Mabey
 
Outside-In Development With Cucumber
Outside-In Development With CucumberOutside-In Development With Cucumber
Outside-In Development With CucumberBen Mabey
 
Disconnecting the Database with ActiveRecord
Disconnecting the Database with ActiveRecordDisconnecting the Database with ActiveRecord
Disconnecting the Database with ActiveRecordBen Mabey
 
The WHY behind TDD/BDD and the HOW with RSpec
The WHY behind TDD/BDD and the HOW with RSpecThe WHY behind TDD/BDD and the HOW with RSpec
The WHY behind TDD/BDD and the HOW with RSpecBen Mabey
 

More from Ben Mabey (8)

PCA for the uninitiated
PCA for the uninitiatedPCA for the uninitiated
PCA for the uninitiated
 
Clojure, Plain and Simple
Clojure, Plain and SimpleClojure, Plain and Simple
Clojure, Plain and Simple
 
Github flow
Github flowGithub flow
Github flow
 
Cucumber: Automating the Requirements Language You Already Speak
Cucumber: Automating the Requirements Language You Already SpeakCucumber: Automating the Requirements Language You Already Speak
Cucumber: Automating the Requirements Language You Already Speak
 
Writing Software not Code with Cucumber
Writing Software not Code with CucumberWriting Software not Code with Cucumber
Writing Software not Code with Cucumber
 
Outside-In Development With Cucumber
Outside-In Development With CucumberOutside-In Development With Cucumber
Outside-In Development With Cucumber
 
Disconnecting the Database with ActiveRecord
Disconnecting the Database with ActiveRecordDisconnecting the Database with ActiveRecord
Disconnecting the Database with ActiveRecord
 
The WHY behind TDD/BDD and the HOW with RSpec
The WHY behind TDD/BDD and the HOW with RSpecThe WHY behind TDD/BDD and the HOW with RSpec
The WHY behind TDD/BDD and the HOW with RSpec
 

Recently uploaded

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 

SVD and the Netflix Dataset

  • 1. SVD Applied to Collaborative Filtering ~ URUG 7-12-07 ~
  • 4. Recommendation System Answers the question: What do I want next?!? Very consumer driven. Must provide good results or a user may not trust the system in the future.
  • 5. Collaborative Filtering Base user recommendations off of: User’s past history. History of like-minded users. View data as product X user matrix. Find a “neighborhood” of similar users for that user. Return the top-N recommendations.
  • 6. Early Approaches Goldberg et al. (1992), Using collaborative filtering to weave an information tapestry. Konstan et al. (1997), Applying Collaborative Filtering to Usenet news. Use Pearson correlation or cosine similarity as a measure of similarity to form neighborhoods.
  • 10. Early CF Challenges Sparsity - No correlation between users can be found. Reduced coverage occurs. Scalability - Nearest-neighbor algorithms' computation time grows with the number of products and users. Synonymy
  • 17. Dimensionality Reduction Latent Semantic Indexing (LSI) Algorithm from IR community (late 80s-early 90s.) Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets. Reduces dimensionality of a dataset and captures the latent relationships. Easily maps to CF!
  • 18. Framing LSI for CF Products X Users matrix instead of Terms X Documents. Netflix Dataset: 480,189 users, 17,770 movies, only ~100 million ratings. A 17,770 X 480,189 matrix that is about 99% sparse! About 8.5 billion potential ratings.
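The figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the rating count is rounded to 100 million as on the slide, so the density it prints is approximate.

```python
# Dimensions quoted on the slide.
users = 480_189
movies = 17_770
ratings = 100_000_000  # "~100 million" on the slide, rounded here

cells = users * movies            # potential ratings in the product X user matrix
density = ratings / cells         # fraction of cells that hold a rating
sparsity = 1.0 - density          # fraction of cells that are empty

print(f"potential ratings: {cells:,}")
print(f"sparsity: {sparsity:.2%}")
```

The product comes to roughly 8.5 billion cells, and a little over 1% of them are filled, which is where the "99% sparse" figure comes from.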
  • 19. SVD - The math behind LSI: Singular Value Decomposition. Any M x N matrix A of rank r can be decomposed as $A = U \Sigma V^T$. U is an M x M orthogonal matrix. V is an N x N orthogonal matrix. Σ is an M x N diagonal matrix whose first r diagonal entries are the nonzero singular values of A: $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > \sigma_{r+1} = \dots = \sigma_n = 0$
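  • 20. Related to eigenvalue decomposition (PCA). The columns of U are orthonormal eigenvectors of AA^T; the first r of them span the “column space” and are known as the left singular vectors. The columns of V are orthonormal eigenvectors of A^TA; the first r of them span the “row space” and are the right singular vectors. The singular values are the square roots of the nonzero eigenvalues of A^TA (equivalently, of AA^T).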
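The link to eigenvalue decomposition follows directly from the factorization of A and the orthogonality of U and V. A short derivation, using only the symbols already defined on the SVD slide:

```latex
\begin{aligned}
A^T A &= (U \Sigma V^T)^T (U \Sigma V^T) = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T, \\
A A^T &= (U \Sigma V^T)(U \Sigma V^T)^T = U \Sigma V^T V \Sigma^T U^T = U (\Sigma \Sigma^T) U^T, \\
\text{so}\quad A^T A \, v_i &= \sigma_i^2 \, v_i, \qquad A A^T u_i = \sigma_i^2 \, u_i \qquad (i = 1, \dots, r).
\end{aligned}
```

Both products are diagonalized by the singular vectors, and the eigenvalues are the squared singular values.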
  • 21. Reducing Dimensionality $A_k = U_k \Sigma_k V_k^T$. A_k is the closest rank-k approximation to A: it minimizes the Frobenius norm of the error, $\|A - A_k\|_F$, over all rank-k matrices.
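To make the rank-k idea concrete, here is a minimal pure-Python sketch for k = 1. It finds the largest singular triplet by power iteration on A^T A (a technique chosen here for brevity, not one named on the slides) and checks that the Frobenius error of the rank-1 approximation equals the discarded singular value. The 2 x 2 matrix and the helper names are made up for illustration.

```python
import math

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def top_singular_triplet(A, iters=200):
    """Power iteration on A^T A; returns (sigma_1, u_1, v_1)."""
    At = transpose(A)
    # Start vector is an assumption: it must not be orthogonal to v_1.
    v = [1.0] + [0.0] * (len(A[0]) - 1)
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return sigma, u, v

# Toy matrix: A^T A has eigenvalues 45 and 5, so sigma_1 = sqrt(45), sigma_2 = sqrt(5).
A = [[3.0, 0.0], [4.0, 5.0]]
sigma1, u1, v1 = top_singular_triplet(A)

# Rank-1 approximation A_1 = sigma_1 * u_1 * v_1^T.
A1 = [[sigma1 * ui * vj for vj in v1] for ui in u1]
frob_err = math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, A1)
                         for a, b in zip(row_a, row_b)))
```

For this matrix the error left after keeping one singular value is exactly the second singular value, which is the smallest error any rank-1 matrix can achieve.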
  • 22. Making Recommendations Cosine similarity is a common way to find a neighborhood: $\cos(i, j) = \frac{i \cdot j}{\|i\|_2 \, \|j\|_2}$. Somehow base recommendations off of that neighborhood and its users. Can also make predictions of products with a simple dot product if the singular values are combined with the singular vectors: $CP_{prod} = C_{avg} + U_k S_k^{1/2}(c) \cdot S_k^{1/2} V_k^T(p)$
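Both ideas on this slide fit in a few lines. The sketch below is illustrative only: the user names, the 2-dimensional latent vectors, and the function names `cosine`, `neighborhood`, and `predict` are all made up. `predict` assumes the caller has already formed the customer's row of U_k S_k^{1/2} and the product's column of S_k^{1/2} V_k^T.

```python
import math

def cosine(i, j):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(i, j))
    norm_i = math.sqrt(sum(a * a for a in i))
    norm_j = math.sqrt(sum(b * b for b in j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_i * norm_j)

def neighborhood(target, users, n):
    """Return the ids of the n users most similar to `target`."""
    scores = [(cosine(users[target], vec), uid)
              for uid, vec in users.items() if uid != target]
    scores.sort(reverse=True)
    return [uid for _, uid in scores[:n]]

def predict(c_avg, user_row, product_col):
    """Customer average plus the dot product of the two latent vectors."""
    return c_avg + sum(a * b for a, b in zip(user_row, product_col))

# Hypothetical users already projected into a 2-dimensional latent space.
users = {
    "ann": [0.9, 0.1],
    "bob": [0.8, 0.2],
    "cat": [-0.1, 1.0],
    "dan": [0.5, 0.5],
}
nearest = neighborhood("ann", users, 2)
estimate = predict(3.5, [1.0, 0.5], [0.4, -0.2])
```

Here `nearest` is `["bob", "dan"]`, since those two vectors point in nearly the same direction as ann's.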
  • 23. Challenges with SVD Scalability - Once again, compute time grows with the number of users and products: O(m^3). Offline stage. Online stage. Even doing the SVD computation offline is not possible for large datasets. Other methods are needed.
  • 24. Incremental SVD $u_k = u^T V_k \Sigma_k^{-1}$
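The projection on this slide (often called folding-in) places a new rating vector into an existing k-dimensional space without recomputing the SVD. A minimal sketch for k = 1 follows, assuming the vector u is laid out along the dimension that V_k spans. The numbers come from the toy matrix [[3, 0], [4, 5]], whose first right singular vector is (1, 1)/sqrt(2) with singular value 3*sqrt(5); the function name `fold_in` is made up.

```python
import math

def fold_in(u, V_k, sigma_k):
    """Project rating vector u into the latent space: u^T V_k Sigma_k^{-1}."""
    k = len(sigma_k)
    return [sum(u[n] * V_k[n][d] for n in range(len(u))) / sigma_k[d]
            for d in range(k)]

# Rank-1 factors of the toy matrix [[3, 0], [4, 5]].
s = 1.0 / math.sqrt(2.0)
V_k = [[s], [s]]                  # N x k, one column because k = 1
sigma_k = [3.0 * math.sqrt(5.0)]  # the k largest singular values

# Folding in a row that was already in the matrix reproduces its row of U_k.
existing = fold_in([3.0, 0.0], V_k, sigma_k)

# A previously unseen rating vector gets coordinates in the same space.
new_user = fold_in([1.0, 3.0], V_k, sigma_k)
```

Because the existing factors are left untouched, the result is only an approximation of what a full recomputation would give once many new vectors have been added.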
  • 26. GHA for SVD Gorrell (2006), GHA for Incremental SVD in NLP. Based off of Sanger’s (1989) GHA for eigen decomposition. $\Delta c_i^a = c_i^b \cdot b \, \bigl(a - \sum_{j<i} (a \cdot c_j^a) \, c_j^a\bigr)$ and $\Delta c_i^b = c_i^a \cdot a \, \bigl(b - \sum_{j<i} (b \cdot c_j^b) \, c_j^b\bigr)$
  • 27. GHA extended by Funk void train(int user, int movie, real rating) { real err = lrate * (rating - predictRating(movie, user)); userValue[user] += err * movieValue[movie]; movieValue[movie] += err * userValue[user]; }
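The training step on this slide can be sketched in Python as a single-feature stochastic gradient update. This is a hedged illustration, not the code used for the results that follow: the toy ratings, the 0.1 starting values, the learning rate, and the epoch count are all assumptions, and `predict_rating` is taken to be the product of the two latent values. Unlike the snippet on the slide, this version caches the user value so that both updates use pre-update values.

```python
import math

def predict_rating(user_value, movie_value, user, movie):
    """Single-feature prediction: product of the two latent values."""
    return user_value[user] * movie_value[movie]

def train(user_value, movie_value, user, movie, rating, lrate):
    """One gradient step on a single (user, movie, rating) observation."""
    err = lrate * (rating - predict_rating(user_value, movie_value, user, movie))
    old_user = user_value[user]  # cache so the movie update sees the old value
    user_value[user] += err * movie_value[movie]
    movie_value[movie] += err * old_user

def rmse(user_value, movie_value, ratings):
    sq = sum((r - predict_rating(user_value, movie_value, u, m)) ** 2
             for u, m, r in ratings)
    return math.sqrt(sq / len(ratings))

# Toy data: an exactly rank-1 rating table (user, movie, rating).
ratings = [(0, 0, 4.0), (0, 1, 2.0), (1, 0, 2.0), (1, 1, 1.0)]
user_value = [0.1, 0.1]
movie_value = [0.1, 0.1]

before = rmse(user_value, movie_value, ratings)
for _ in range(2000):
    for u, m, r in ratings:
        train(user_value, movie_value, u, m, r, lrate=0.01)
after = rmse(user_value, movie_value, ratings)
```

Only observed ratings are ever touched, so the cost per pass depends on the number of ratings rather than on the size of the full product X user matrix.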
  • 28. Netflix Results Best RMSEs 0.9283 0.9212 Blended to get 0.9189, 3.42% better than Netflix.
  • 29. Summary SVD provides an elegant and automatic recommendation system that has the potential to scale. There are many different algorithms to calculate, or at least approximate, the SVD, and these can be used in the offline stage for websites that need CF. Every dataset is different and requires experimentation to get the best results.