People who liked this talk also liked …
Building Recommendation Systems
             Using Ruby

            Ryan Weald, @rweald
             LA RubyConf 2013




Who is this guy?

 What does he know
about recommendation
       systems?

Data Scientist @Sharethrough




 Native advertising
     platform
Outline
1) What is a recommendation system?
2) Collaborative filtering based
   recommendations
3) Content based recommendations
4) Hybrid systems - the best of both worlds
5) Evaluating your recommendation system
6) Resources & existing libraries


What this Talk is Not
• Everything there is to know about
  recommendation systems.
• Bleeding edge machine learning
• How to use a specific library




What is a
recommendation system?



A program that predicts
a user’s preferences using information
 about the user, other users, and the
         items in your system.




LinkedIn




Netflix




Spotify




Amazon




How do I build
recommendations?



Two Main Categories of Algorithm



1. Collaborative Filtering (CF)

2. Content Based - Classification




Collaborative Filtering


Fill in missing user preferences using
         similar users or items




Two Types of CF
1. Memory Based - Uses similarity
between users or items. Dataset
usually kept in memory

2. Model Based - Model generated
to “explain” observed ratings


User Based CF


 (User x Item) Matrix + Similarity
Function = Top-K most similar users




Collaborative Filtering
         Video 1    Video 2   Video 3      Video 4   Video 5

User 1      0          1          0           5         0

User 2      1          2          1           0         5

User 3      2          5          0           0         2

User 4      5          4          4           1         1

User 5      2          4          ?           ?         2
                   * 0 denotes not rated
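
As a rough illustration of the user-based step (matrix + similarity function = top-K similar users), the sketch below runs on the ratings matrix above. Cosine similarity stands in as the similarity function, and the names `RATINGS`, `cosine_similarity`, and `top_k_similar_users` are made up for this example rather than taken from the talk. User 5's two unknown ratings are treated as 0 (not rated).

```ruby
# Ratings from the example matrix; 0 means "not rated".
RATINGS = {
  "User 1" => [0, 1, 0, 5, 0],
  "User 2" => [1, 2, 1, 0, 5],
  "User 3" => [2, 5, 0, 0, 2],
  "User 4" => [5, 4, 4, 1, 1],
  "User 5" => [2, 4, 0, 0, 2]
}.freeze

# Cosine similarity between two equal-length rating vectors.
# Returns 0.0 if either vector is all zeros.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Names of the k users most similar to `target`, most similar first.
def top_k_similar_users(ratings, target, k)
  ratings
    .reject { |user, _| user == target }
    .map { |user, row| [user, cosine_similarity(ratings[target], row)] }
    .max_by(k) { |_, score| score }
    .map(&:first)
end

p top_k_similar_users(RATINGS, "User 5", 2) # => ["User 3", "User 4"]
```

Treating unrated cells as zeros is a simplification; it lets missing ratings pull the similarity down, which may or may not be what a real system wants.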

Similarity Functions

• Pearson Correlation Coefficient
• Cosine Similarity
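
For reference, cosine similarity between users u and v is usually written as below. The notation is an assumption of this write-up, not the slides: r_{u,i} is user u's rating of item i, and the sums run over all items.

```latex
\operatorname{sim}_{\cos}(u, v) =
  \frac{\sum_{i} r_{u,i}\, r_{v,i}}
       {\sqrt{\sum_{i} r_{u,i}^{2}}\;\sqrt{\sum_{i} r_{v,i}^{2}}}
```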




Pearson Correlation Coefficient
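
The formula on this slide did not survive extraction. A common form of the Pearson correlation for collaborative filtering is sketched below, with assumed notation: I_{uv} is the set of items rated by both users, and \bar{r}_u is user u's mean rating over that set.

```latex
\operatorname{sim}_{\mathrm{PCC}}(u, v) =
  \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
       {\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^{2}}\;
        \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^{2}}}
```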




Calculating PCC
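
The worked calculation on these slides was image-only, so the following is a minimal sketch of one way to compute PCC in Ruby, not a reconstruction of the speaker's steps. It assumes that only items rated by both users count (0 = not rated) and that means are taken over those shared items. The function name `pearson` is illustrative.

```ruby
# Pearson correlation over the items both users have rated (0 = not rated).
# Returns 0.0 when fewer than two items overlap or either user's
# shared ratings have no variance.
def pearson(a, b)
  pairs = a.zip(b).select { |x, y| x > 0 && y > 0 }
  n = pairs.size
  return 0.0 if n < 2

  mean_a = pairs.sum { |x, _| x }.to_f / n
  mean_b = pairs.sum { |_, y| y }.to_f / n
  cov    = pairs.sum { |x, y| (x - mean_a) * (y - mean_b) }
  var_a  = pairs.sum { |x, _| (x - mean_a)**2 }
  var_b  = pairs.sum { |_, y| (y - mean_b)**2 }
  return 0.0 if var_a.zero? || var_b.zero?

  cov / Math.sqrt(var_a * var_b)
end

user2 = [1, 2, 1, 0, 5]
user3 = [2, 5, 0, 0, 2]
user5 = [2, 4, 0, 0, 2]

puts pearson(user3, user5).round(3) # => 1.0
puts pearson(user2, user5).round(3) # => -0.277
```

On the example matrix, User 3 rates the shared videos in the same pattern as User 5, while User 2 trends the opposite way.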




Using similarity to
recommend items



Collaborative Filtering
         Video 1    Video 2   Video 3      Video 4   Video 5

User 1      0          1          0           5         0

User 2      1          2          1           0         5

User 3      2          5          0           0         2

User 4      5          4          4           1         1

User 5      2          4          ?           ?         2
                   * 0 denotes not rated
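
One simple way to fill in User 5's two missing ratings is a similarity-weighted average of the ratings from positively correlated users. The sketch below assumes Pearson similarity over co-rated items and ignores neighbours with zero or negative similarity; both choices, and the names `pearson` and `predict`, are illustrative rather than the talk's own method.

```ruby
# Ratings from the example matrix; 0 means "not rated".
RATINGS = {
  "User 1" => [0, 1, 0, 5, 0],
  "User 2" => [1, 2, 1, 0, 5],
  "User 3" => [2, 5, 0, 0, 2],
  "User 4" => [5, 4, 4, 1, 1],
  "User 5" => [2, 4, 0, 0, 2]
}.freeze

# Pearson correlation over co-rated items; 0.0 when it is undefined.
def pearson(a, b)
  pairs = a.zip(b).select { |x, y| x > 0 && y > 0 }
  return 0.0 if pairs.size < 2

  mean_a = pairs.sum { |x, _| x }.to_f / pairs.size
  mean_b = pairs.sum { |_, y| y }.to_f / pairs.size
  cov    = pairs.sum { |x, y| (x - mean_a) * (y - mean_b) }
  var_a  = pairs.sum { |x, _| (x - mean_a)**2 }
  var_b  = pairs.sum { |_, y| (y - mean_b)**2 }
  return 0.0 if var_a.zero? || var_b.zero?

  cov / Math.sqrt(var_a * var_b)
end

# Predicted rating of `target` for the item at `item_index`, or nil when
# no positively correlated user has rated that item.
def predict(ratings, target, item_index)
  weighted_sum = 0.0
  weight_total = 0.0
  ratings.each do |user, row|
    next if user == target || row[item_index].zero?

    sim = pearson(ratings[target], row)
    next unless sim > 0

    weighted_sum += sim * row[item_index]
    weight_total += sim
  end
  weight_total.zero? ? nil : weighted_sum / weight_total
end

unrated = RATINGS["User 5"].each_index.select { |i| RATINGS["User 5"][i].zero? }
predictions = unrated.map { |i| ["Video #{i + 1}", predict(RATINGS, "User 5", i)] }.to_h
predictions.each { |video, score| puts "#{video}: #{score}" }
```

With this tiny matrix only User 4 is a usable neighbour for either video, so the predictions simply echo User 4's ratings (4 for Video 3, 1 for Video 4), and Video 3 would be the one to recommend.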

Problems With CF

• Cold Start
• Data Sparsity
• Resource expensive



Doesn’t the video
content matter for
recommendations?


Content Based Recommendations


  Classify items based on features of
   the item. Pick other items from
      same class to recommend.




Content Based Algorithms
• K-means clustering
• Random Forest
• Support Vector Machines
• ...
• Insert your favorite ML algorithm

Content Based Algorithms
          Type of content   Duration   Maturity rating
Video 1   comedy            60         G
Video 2   action            120        G
Video 3   comedy            34         PG-13
Video 4   romantic          15         R
Video 5   sports            120        G
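
A deliberately naive sketch of "pick other items from the same class" using the table above: here the class is just the type of content, which is an assumption made for illustration. A real classifier would learn classes from several features. `Video`, `CATALOG`, and `same_class` are hypothetical names.

```ruby
Video = Struct.new(:name, :type, :duration, :rating)

CATALOG = [
  Video.new("Video 1", "comedy",   60,  "G"),
  Video.new("Video 2", "action",   120, "G"),
  Video.new("Video 3", "comedy",   34,  "PG-13"),
  Video.new("Video 4", "romantic", 15,  "R"),
  Video.new("Video 5", "sports",   120, "G")
].freeze

# Names of every other item that shares the liked item's class.
# Returns an empty list if the liked item is not in the catalog.
def same_class(catalog, liked_name)
  liked = catalog.find { |video| video.name == liked_name }
  return [] unless liked

  catalog
    .select { |video| video.type == liked.type && video.name != liked.name }
    .map(&:name)
end

p same_class(CATALOG, "Video 1") # => ["Video 3"]
```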




K-means Clustering


  Group items into K clusters.
Assign new item to a cluster and
  pick items from that cluster
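
The three steps above (cluster, assign the new item, pick from its cluster) can be sketched with a small k-means (Lloyd's algorithm). Assumptions worth flagging: the only feature is each video's duration from the content table, the first k points seed the centroids so the run is repeatable, and the new item with duration 100 is invented for the example.

```ruby
# Squared Euclidean distance between two numeric vectors.
def distance_sq(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Index of the centroid closest to `point`.
def nearest(centroids, point)
  centroids.each_index.min_by { |i| distance_sq(centroids[i], point) }
end

# Plain k-means. Seeds with the first k points (an assumption that keeps
# the example deterministic; real code would usually seed randomly).
# Returns [centroids, assignments], where assignments[j] is the cluster
# index of points[j].
def kmeans(points, k, max_iterations: 100)
  centroids = points.first(k).map { |point| point.map(&:to_f) }
  assignments = nil
  max_iterations.times do
    updated = points.map { |point| nearest(centroids, point) }
    break if updated == assignments # no point changed cluster: converged

    assignments = updated
    centroids = centroids.each_with_index.map do |old, i|
      members = points.each_index.select { |j| assignments[j] == i }.map { |j| points[j] }
      next old if members.empty? # keep an empty cluster's centroid in place

      members.transpose.map { |column| column.sum.to_f / members.size }
    end
  end
  [centroids, assignments]
end

names     = ["Video 1", "Video 2", "Video 3", "Video 4", "Video 5"]
durations = [[60], [120], [34], [15], [120]]
centroids, assignments = kmeans(durations, 2)

new_item = [100] # hypothetical new video, duration only
cluster  = nearest(centroids, new_item)
p names.each_index.select { |i| assignments[i] == cluster }.map { |i| names[i] }
# => ["Video 2", "Video 5"]
```

With more than one feature, the features would normally be scaled first so that no single one (such as duration) dominates the distance.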




Problems With Content Based
      Recommendations

• Unsupervised Learning is hard
• Training data limited or expensive
• Doesn’t take user into account
• Limited by features of content

Hybrid Recommendations


Combine collaborative filtering with a
content based algorithm to get better
results than either gives alone




Hybrid Recommendations

Input -> CF Based Recommender      -+
                                    +-> Combiner -> Reco
Input -> Content Based Recommender -+
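
One possible combiner is a weighted sum of the two recommenders' scores. This is only a sketch: the slide does not say how the combiner works, the scores are invented, and `combine` and `cf_weight` are hypothetical names.

```ruby
# Blend two score hashes (item => score, assumed to be on the same 0..1
# scale) with a weighted sum and return item names, best first.
# An item missing from one recommender contributes 0 for that side.
def combine(cf_scores, content_scores, cf_weight: 0.5)
  items = cf_scores.keys | content_scores.keys
  blended = items.map do |item|
    score = cf_weight * cf_scores.fetch(item, 0.0) +
            (1 - cf_weight) * content_scores.fetch(item, 0.0)
    [item, score]
  end
  blended.sort_by { |_, score| -score }.map(&:first)
end

cf_scores      = { "Video 3" => 0.9, "Video 4" => 0.2 }                   # made-up scores
content_scores = { "Video 3" => 0.4, "Video 4" => 0.3, "Video 1" => 0.8 } # made-up scores

p combine(cf_scores, content_scores, cf_weight: 0.7)
# => ["Video 3", "Video 1", "Video 4"]
```

The weight is a tuning knob; a low `cf_weight` leans on content features, which can help while collaborative data is still sparse.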




Hybrid Recommendations



Input -> Content Recommender -> CF Recommender -> Reco
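
Reading this diagram as a pipeline, one interpretation is a cascade: the content stage narrows the candidates and the CF stage ranks what is left. The sketch below assumes that reading. The genre filter, the scores, and the name `cascade` are all invented for illustration.

```ruby
# Stage 1: keep only candidates the content stage accepts.
# Stage 2: order the survivors by their CF score, best first.
def cascade(candidates, content_filter, cf_scores)
  candidates
    .select { |item| content_filter.call(item) }
    .sort_by { |item| -cf_scores.fetch(item, 0.0) }
end

genres      = { "Video 1" => "comedy", "Video 2" => "action", "Video 3" => "comedy" }
cf_scores   = { "Video 1" => 0.3, "Video 2" => 0.9, "Video 3" => 0.6 } # made-up scores
comedy_only = ->(item) { genres[item] == "comedy" }

p cascade(genres.keys, comedy_only, cf_scores) # => ["Video 3", "Video 1"]
```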




Hybrid Recommendations


         +-> CF Recommender      -+
Input  --+                        +-> Reco
         +-> Content Recommender -+




Evaluating Recommendation Quality


• Precision vs. Recall
• Clicks
• Click through rate
• Direct user feedback


Precision vs. Recall
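
The slide's figure is missing, so here is a minimal sketch of the two measures under their standard definitions: precision is the share of recommended items that were relevant, and recall is the share of relevant items that were recommended. The item lists are invented for the example.

```ruby
# Returns [precision, recall] for a list of recommended items against
# the list of items the user actually found relevant.
def precision_recall(recommended, relevant)
  hits = (recommended & relevant).size.to_f
  precision = recommended.empty? ? 0.0 : hits / recommended.size
  recall    = relevant.empty?    ? 0.0 : hits / relevant.size
  [precision, recall]
end

recommended = ["Video 1", "Video 3", "Video 4", "Video 5"] # what we showed
relevant    = ["Video 3", "Video 5", "Video 2"]            # what the user liked

precision, recall = precision_recall(recommended, relevant)
puts precision       # => 0.5
puts recall.round(3) # => 0.667
```

Recommending more items tends to raise recall and lower precision, which is why the two are usually reported together.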




Summary of What We’ve Learned


 • Collaborative Filtering using similar users
 • Content clustering using k-means
 • Combining 2 algorithms to boost quality
 • How to evaluate your recommender


Don’t Reinvent the Wheel

• Apache Mahout
• JRuby Mahout gem
• SciRuby
• Recommenderlab for R


Resources & Further Reading
• Recommender Systems: An Introduction
• Linden, Greg, Brent Smith, and Jeremy York.
"Amazon.com recommendations: Item-to-item
collaborative filtering."
• Resnick, Paul, et al. "GroupLens: an open architecture
for collaborative filtering of netnews."
• ACM RecSys Conference Proceedings


We’re Hiring
http://bit.ly/str-engineering




Thanks!
        Twitter: @rweald
Email: ryan@sharethrough.com




