SlideShare a Scribd company logo
1 of 33
RECOMMENDATION
ENGINE
DEMYSTIFIED
NEIGHBORHOOD METHODS
COLLABORATIVE FILTERING
Alex Lin
Senior Architect
Intelligent Mining
Outline
 Introduction

 User-oriented  Collaborative Filtering
 Item-oriented Collaborative Filtering

 Challenges

 Best Practices
Recommendation Engine
What is a Recommendation Engine (RE)?
  RE takes “observation” data and uses machine
   learning / statistical algorithms to predict
   outcomes or levels of interest.
  “Recommender systems form a specific type of
   information filtering (IF) technique that attempts
   to present information items (movies, music,
   books, news, images, web pages, etc.) that are
   likely of interest to the user.” – Wikipedia
Recommendation Engine
                                            Demographic
                                               based
                                                                         Clustering
       Social                   Knowledge                                  based
       Graph                      based
       based

                                                          Content
                Collaborative                              based
                  Filtering                                         Category
                                               Mixture
                                               Models                based




  This
      presentation will focus on
  Neighborhood-based Collaborative Filtering
      User-oriented method
      Item-oriented method
Neighborhood-based
Collaborative Filtering
                                   Data Normalization

     Data
                   Input Data
 (Observations)   Representation
                                      User-user similarity
                                              vs.
                                      Item-item similarity
                   Neighborhood
                    Formation

                                           Make
                                      Recommendation
                  Recommendation
                    Generation
                                      Web Page, Email,
                                         Direct Mail,
                                     Marketing Campaign,
                    Delivering               etc.
                    Mechanism
Outline
 Introduction

 User-oriented  Collaborative Filtering
 Item-oriented Collaborative Filtering

 Challenges

 Best Practices
User-oriented CF
  Input       Data Representation: Users / Items matrix.
                                    users
                1   2   3   4   5    6   7   8   9   10   …   n

           1    1       1            1               1

           2                             1   1   1
   items




           3    1   1       1                1   1

           4        1           1            1   1

                        1                1
           …




           m



   Cell       value “1” means user purchased the item.
   Data       Normalization is not shown on this slide.
User-oriented CF
  Neighborhood             Formation:
         Find the k most like-minded users in the system.
                              users
                1   2   3   4   5   6   7   8   9   10   …   n

            1   1   1   1           1               1

            2                           1   1   1
  items




            3   1   1       1               1   1            1

            4       1           1           1   1

                        1               1
           …




            m               1                   1
User-oriented CF
  Neighborhood             Formation:
         Find the k most like-minded users in the system.
                              users
                1   2   3   4   5   6   7   8   9   10   …   n

            1   1   1   1           1               1

            2                           1   1   1
  items




            3   1   1       1               1   1            1

            4       1           1           1   1

                        1               1
           …




            m               1                   1


   Identify        U9 and U2 are similar to U8
User-oriented CF
  Recommendation            Generation:

                                 users
             1   2   3   4   5    6   7   8   9   10   …   n

         1   1   1   1            1               1

         2                            1   1   1
 items




         3   1   1       1                1   1            1

         4       1           1            1   1

                     1                1
         …




         m               1                    1


  Identify I1       and I9 are not yet purchased by U8
User-oriented CF
  Recommendation             Generation:

                                  users
             1    2   3   4   5    6   7   8     9   10   …   n

         1   1    1   1            1       0.7       1

         2                             1   1     1
 items




         3   1    1       1                1     1            1

         4        1           1            1     1

                      1                1
         …




         m                1                0.9   1


  Predict       by taking weighted sum
User-oriented CF
Practical Implementation
    Compute and store all user-user similarities.

                                                                   u•v
         Cosine similarity:   sim(u,v) = cos(u,v) =
                                                                 u ∗ v
                                                                    2       2

    Find N items that will be most likely purchased by user u.
         Find k most similar users to u, save to Usim
         Get all items purchased by Usim, save to Icandidate
                   €
         Remove unavailable items in Icandidate
         Get all items purchased by u, save to Ipurchased
         Take Icandidate – Ipurchased = Irecmd
         Re-order items in Irecmd based on sum of user-user similarity

                               ∑ v ∈k−similarUser(u)
                                                       userSim(u,v) * rvi
                   pred(u,i) =
                                ∑   v ∈k−similarUser( u)
                                                           userSim(u,v)



          €
Outline
 Introduction

 User-oriented  Collaborative Filtering
 Item-oriented Collaborative Filtering

 Challenges

 Best Practices
Item-oriented CF
  Input       Data Representation: Users / Items matrix.
                                    users
                1   2   3   4   5    6   7   8   9   10   …   n

           1    1       1            1               1

           2                             1   1   1
   items




           3    1   1       1                1   1

           4        1           1            1   1

                        1                1
           …




           m



   Cell       value “1” means user purchased the item.
   Data       Normalization is not shown on this slide.
Item-oriented CF
  Neighborhood             Formation:
         Find the k items that have the most similar user
          vectors
                             users
                1   2   3   4   5   6   7   8   9   10   …   n

            1   1       1           1               1

            2       1                   1   1   1
  items




            3   1           1               1   1

            4       1           1       1   1   1

                        1               1
            …




            m               1                   1
Item-oriented CF – cont.
  Recommendation           Generation
                                1
                   W2-1

             2     W2-3         3
                                    Predict by taking weighted sum
                                4
                                    TopN Recmd. for U8 : {1,9,3}
  U8         4
                   W4-1         1


                                8
             8

       purchased   W8-9         9
         items
                   item-item
                   similarity
Item-oriented CF
Practical Implementation
    Compute and store all item-item similarities.

                                                               a•b
         Cosine similarity: sim(a,b) = cos(a,b) =
                                                              a ∗ b
                                                                2     2


    Find N items that will be most likely purchased by user u.
         Get all items purchased by u, save to Ipurchased
                     €
         For each item in Ipurchased, find k most similar items save them
          to Icandidate
         Remove unavailable items in Icandidate
         Get all items purchased by u, save to Ipurchased
         Take Icandidate – Ipurchased = Irecmd
         Re-order items in Irecmd based on
                                   ∑  j ∈ purchasedItems(u)
                                                              itemSim(i, j) * ruj
                       pred(u,i) =
                                    ∑    j ∈ purchasedItems(u)
                                                                 itemSim(i, j)
Outline
 Introduction

 User-oriented  Collaborative Filtering
 Item-oriented Collaborative Filtering

 Challenges

 Best Practices
The Challenges you will face
  Data Sparsity Issue
  Cold Start Problem
  Curse of Dimensionality

  Scalability
Data Sparsity Issue
  Missing       values in the Users / Items matrix.
                                  users
                                                                        Not
             1   2   3    4   5    6   7     8     9     10       …   n
                                                                      Reliable     Mostly
                                                                                  Unknown
         1

         2                           ∑                       userSim(u,v) * rvi
 items




                                       v ∈k−similarUser(u)
         3   1
                         pred(u,i) =         1
                                      ∑    v ∈k−similarUser( u)
                                                                  userSim(u,v)
         4                                   1

                          1
         …




         m   €

  NetflixPrize data set: 98.82% of cells are blank
  Typical e-commerce txn data set can be 10-100
   time more sparse than Netflix Prize data set !!
Cold Start Problem
    It occurs when new item or new user is added to the
     data matrix.
                                     users
                 1   2   3   4   5    6   7   8   9   10   …   n

             1   1       1            1               1

             2                            1   1   1
     items




             3   1   1       1                1   1            1

             4       1           1            1   1

                         1                1
             …




             m


    RE does not have enough knowledge about this new user or
     this new item yet.
    Content-based REs can be incorporated to alleviate cold start
     problem.
Curse of Dimensionality
  Adding  more features (items or users) can
   increase the noise, and hence the error.
  There aren’t enough observations to get good
   estimates. users
            items
Scalability
  User  neighborhood formation: O(n2) for n users
  Item neighborhood formation: O(m2) for m items

  When m (# of items) << n (# of users), item-based
   CF will be more efficient than user-based CF
  Ability to update neighborhood incrementally
Outline
 Introduction

 User-oriented  Collaborative Filtering
 Item-oriented Collaborative Filtering

 Challenges

 Best Practices
Best Practices
  Understand  the data thoroughly
  Define business objectives and conversion metrics
   judiciously
  Understand context and user intent

  Apply adaptive reinforcement learning

  Optimize RE using cost-based methods

  Be aware of data-shift issue

  Optimize marketing messages delivered with
   recommendation results
Understand the data thoroughly
  What data are available?
  E-commerce data set typically contains:
      Clickstream
      Shopping cart / Saved Items / Wish list / Shared Item
      Order / Return
      User profile
      User ratings
  How  are these data points being collected?
  Is there pre-existing bias in the data? or leakage?

  Is the data related to what we want to predict?
Define business objectives and
conversion metrics judiciously
  Definingcorrect conversion metrics can be a
  competitive advantage.


                     mitigate
                                   increase
                    navigation
                   inefficiency
                                  # of orders




                     up-sell       increase
       Web                                        more
                     within        average
     property      the context    item price
                                                 revenue




                    cross-sell    increase
                     within         items
                   the context    per order




                 Recommendation
                     Engine
Understand context and user
intent
  Context
        should be considered when RE making
  recommendation.

             Month: December
             Temperature: 45oF
Apply adaptive reinforcement
learning

                      C1

                      C2

                      C3



                                                                Incorporating clickstream
                                                                 adaptive reinforcement


                 ∑
                 j ∈ purchasedItems(u)
                                         itemSim(i, j) * ruj
     pred(u,i) =                                               + W iC i
                  ∑   j ∈ purchasedItems(u)
                                              itemSim(i, j)



 €
Optimize RE using cost-based
models
  Cost-based     engine optimization

        Metrics




                                        Parameter
Be aware of the data-shift issue
      Data    collection UI changes will influence data
         significantly, creating artificial data shifting




Netflix prize data set
Y. Koren, "Collaborative Filtering with Temporal Dynamics," Proc. 15th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data
Mining (KDD 09), ACM Press, 2009, pp. 447-455.12
Optimize marketing message
delivered with recommendation result
  It's   How You Say It




                           Screenshots from Amazon.com
RECOMMENDATION
ENGINE
DEMYSTIFIED
Alex Lin
Intelligent Mining
Email: alin@intelligentmining.com
Twitter: DKALab

More Related Content

Viewers also liked

Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architectureLiang Xiang
 
Recommender Systems and Active Learning
Recommender Systems and Active LearningRecommender Systems and Active Learning
Recommender Systems and Active LearningDain Kaplan
 
Online recommendations at scale using matrix factorisation
Online recommendations at scale using matrix factorisationOnline recommendations at scale using matrix factorisation
Online recommendations at scale using matrix factorisationMarcus Ljungblad
 
Requirements for Processing Datasets for Recommender Systems
Requirements for Processing Datasets for Recommender SystemsRequirements for Processing Datasets for Recommender Systems
Requirements for Processing Datasets for Recommender SystemsStoitsis Giannis
 
Recommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab GhoshRecommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab GhoshBigDataCloud
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineNYC Predictive Analytics
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Xavier Amatriain
 
Intro to Factorization Machines
Intro to Factorization MachinesIntro to Factorization Machines
Intro to Factorization MachinesPavel Kalaidin
 
Text mining to correct missing CRM information: a practical data science project
Text mining to correct missing CRM information: a practical data science projectText mining to correct missing CRM information: a practical data science project
Text mining to correct missing CRM information: a practical data science projectJonathan Sedar
 
Text Mining to Correct Missing CRM Information by Jonathan Sedar
Text Mining to Correct Missing CRM Information by Jonathan SedarText Mining to Correct Missing CRM Information by Jonathan Sedar
Text Mining to Correct Missing CRM Information by Jonathan SedarPyData
 
Warsaw Data Science - Factorization Machines Introduction
Warsaw Data Science -  Factorization Machines IntroductionWarsaw Data Science -  Factorization Machines Introduction
Warsaw Data Science - Factorization Machines IntroductionBartlomiej Twardowski
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringChangsung Moon
 
DataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation EngineDataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation EngineHakka Labs
 
Customer relationship management_dwm_ankita_dubey
Customer relationship management_dwm_ankita_dubeyCustomer relationship management_dwm_ankita_dubey
Customer relationship management_dwm_ankita_dubeyAnkita Dubey
 
Factorization Machines with libFM
Factorization Machines with libFMFactorization Machines with libFM
Factorization Machines with libFMLiangjie Hong
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsDKALab
 
Ranking Related News Predictions
Ranking Related News PredictionsRanking Related News Predictions
Ranking Related News PredictionsNattiya Kanhabua
 
How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.customersforever
 
Recommender.system.presentation.pjug.01.21.2014
Recommender.system.presentation.pjug.01.21.2014Recommender.system.presentation.pjug.01.21.2014
Recommender.system.presentation.pjug.01.21.2014rpbrehm
 

Viewers also liked (20)

Recommender system algorithm and architecture
Recommender system algorithm and architectureRecommender system algorithm and architecture
Recommender system algorithm and architecture
 
Recommender Systems and Active Learning
Recommender Systems and Active LearningRecommender Systems and Active Learning
Recommender Systems and Active Learning
 
Online recommendations at scale using matrix factorisation
Online recommendations at scale using matrix factorisationOnline recommendations at scale using matrix factorisation
Online recommendations at scale using matrix factorisation
 
Requirements for Processing Datasets for Recommender Systems
Requirements for Processing Datasets for Recommender SystemsRequirements for Processing Datasets for Recommender Systems
Requirements for Processing Datasets for Recommender Systems
 
Recommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab GhoshRecommendation Engine Powered by Hadoop - Pranab Ghosh
Recommendation Engine Powered by Hadoop - Pranab Ghosh
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
Intro to Factorization Machines
Intro to Factorization MachinesIntro to Factorization Machines
Intro to Factorization Machines
 
Text mining to correct missing CRM information: a practical data science project
Text mining to correct missing CRM information: a practical data science projectText mining to correct missing CRM information: a practical data science project
Text mining to correct missing CRM information: a practical data science project
 
Text Mining to Correct Missing CRM Information by Jonathan Sedar
Text Mining to Correct Missing CRM Information by Jonathan SedarText Mining to Correct Missing CRM Information by Jonathan Sedar
Text Mining to Correct Missing CRM Information by Jonathan Sedar
 
Warsaw Data Science - Factorization Machines Introduction
Warsaw Data Science -  Factorization Machines IntroductionWarsaw Data Science -  Factorization Machines Introduction
Warsaw Data Science - Factorization Machines Introduction
 
Datamining for crm
Datamining for crmDatamining for crm
Datamining for crm
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative Filtering
 
DataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation EngineDataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation Engine
 
Customer relationship management_dwm_ankita_dubey
Customer relationship management_dwm_ankita_dubeyCustomer relationship management_dwm_ankita_dubey
Customer relationship management_dwm_ankita_dubey
 
Factorization Machines with libFM
Factorization Machines with libFMFactorization Machines with libFM
Factorization Machines with libFM
 
Naive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event ModelsNaive Bayesian Text Classifier Event Models
Naive Bayesian Text Classifier Event Models
 
Ranking Related News Predictions
Ranking Related News PredictionsRanking Related News Predictions
Ranking Related News Predictions
 
How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.
 
Recommender.system.presentation.pjug.01.21.2014
Recommender.system.presentation.pjug.01.21.2014Recommender.system.presentation.pjug.01.21.2014
Recommender.system.presentation.pjug.01.21.2014
 

Similar to Recommendation Engine Demystified

Building a Predictive Model
Building a Predictive ModelBuilding a Predictive Model
Building a Predictive ModelDKALab
 
Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systemsNAVER Engineering
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System ExplainedCrossing Minds
 
Tag And Tag Based Recommender
Tag And Tag Based RecommenderTag And Tag Based Recommender
Tag And Tag Based Recommendergu wendong
 
PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...predictionio
 
IntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdfIntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdfAlphaIssaghaDiallo
 
Semantically-Enhanced Recommendation Algorithms
Semantically-Enhanced Recommendation AlgorithmsSemantically-Enhanced Recommendation Algorithms
Semantically-Enhanced Recommendation AlgorithmsLuigi Ceccaroni
 
DN 2017 | Building a recommender system using collaborative filtering (CF) | ...
DN 2017 | Building a recommender system using collaborative filtering (CF) | ...DN 2017 | Building a recommender system using collaborative filtering (CF) | ...
DN 2017 | Building a recommender system using collaborative filtering (CF) | ...Dataconomy Media
 
Quest for prediction accuracy by Mekbib Awono
Quest for prediction accuracy by Mekbib AwonoQuest for prediction accuracy by Mekbib Awono
Quest for prediction accuracy by Mekbib AwonoFrosmo
 
Explain Yourself: Why You Get the Recommendations You Do
Explain Yourself: Why You Get the Recommendations You DoExplain Yourself: Why You Get the Recommendations You Do
Explain Yourself: Why You Get the Recommendations You DoDatabricks
 
Aaa ped-21-Recommender Systems: Content-based Filtering
Aaa ped-21-Recommender Systems: Content-based FilteringAaa ped-21-Recommender Systems: Content-based Filtering
Aaa ped-21-Recommender Systems: Content-based FilteringAminaRepo
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation SystemsRobin Reni
 
Machine Learning with Mahout
Machine Learning with MahoutMachine Learning with Mahout
Machine Learning with Mahoutbigdatasyd
 
Fundamentals of Deep Recommender Systems
 Fundamentals of Deep Recommender Systems Fundamentals of Deep Recommender Systems
Fundamentals of Deep Recommender SystemsWQ Fan
 
large scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraphlarge scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache GiraphDataWorks Summit
 
How to make intelligent web apps
How to make intelligent web appsHow to make intelligent web apps
How to make intelligent web appsiapain
 
EdChang - Parallel Algorithms For Mining Large Scale Data
EdChang - Parallel Algorithms For Mining Large Scale DataEdChang - Parallel Algorithms For Mining Large Scale Data
EdChang - Parallel Algorithms For Mining Large Scale Datagu wendong
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
 
Creating Documentation Your Users Will Love
Creating Documentation Your Users Will LoveCreating Documentation Your Users Will Love
Creating Documentation Your Users Will LoveEna Arel
 

Similar to Recommendation Engine Demystified (20)

Building a Predictive Model
Building a Predictive ModelBuilding a Predictive Model
Building a Predictive Model
 
Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systems
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
Tag And Tag Based Recommender
Tag And Tag Based RecommenderTag And Tag Based Recommender
Tag And Tag Based Recommender
 
PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...
 
IntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdfIntroductionRecommenderSystems_Petroni.pdf
IntroductionRecommenderSystems_Petroni.pdf
 
Semantically-Enhanced Recommendation Algorithms
Semantically-Enhanced Recommendation AlgorithmsSemantically-Enhanced Recommendation Algorithms
Semantically-Enhanced Recommendation Algorithms
 
DN 2017 | Building a recommender system using collaborative filtering (CF) | ...
DN 2017 | Building a recommender system using collaborative filtering (CF) | ...DN 2017 | Building a recommender system using collaborative filtering (CF) | ...
DN 2017 | Building a recommender system using collaborative filtering (CF) | ...
 
Quest for prediction accuracy by Mekbib Awono
Quest for prediction accuracy by Mekbib AwonoQuest for prediction accuracy by Mekbib Awono
Quest for prediction accuracy by Mekbib Awono
 
Explain Yourself: Why You Get the Recommendations You Do
Explain Yourself: Why You Get the Recommendations You DoExplain Yourself: Why You Get the Recommendations You Do
Explain Yourself: Why You Get the Recommendations You Do
 
Ai use cases
Ai use casesAi use cases
Ai use cases
 
Aaa ped-21-Recommender Systems: Content-based Filtering
Aaa ped-21-Recommender Systems: Content-based FilteringAaa ped-21-Recommender Systems: Content-based Filtering
Aaa ped-21-Recommender Systems: Content-based Filtering
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
Machine Learning with Mahout
Machine Learning with MahoutMachine Learning with Mahout
Machine Learning with Mahout
 
Fundamentals of Deep Recommender Systems
 Fundamentals of Deep Recommender Systems Fundamentals of Deep Recommender Systems
Fundamentals of Deep Recommender Systems
 
large scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraphlarge scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraph
 
How to make intelligent web apps
How to make intelligent web appsHow to make intelligent web apps
How to make intelligent web apps
 
EdChang - Parallel Algorithms For Mining Large Scale Data
EdChang - Parallel Algorithms For Mining Large Scale DataEdChang - Parallel Algorithms For Mining Large Scale Data
EdChang - Parallel Algorithms For Mining Large Scale Data
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Creating Documentation Your Users Will Love
Creating Documentation Your Users Will LoveCreating Documentation Your Users Will Love
Creating Documentation Your Users Will Love
 

Recently uploaded

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 

Recently uploaded (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 

Recommendation Engine Demystified

  • 2. Outline  Introduction  User-oriented Collaborative Filtering  Item-oriented Collaborative Filtering  Challenges  Best Practices
  • 3. Recommendation Engine What is a Recommendation Engine (RE)?   RE takes “observation” data and uses machine learning / statistical algorithms to predict outcomes or levels of interest.   “Recommender systems form a specific type of information filtering (IF) technique that attempts to present information items (movies, music, books, news, images, web pages, etc.) that are likely of interest to the user.” – Wikipedia
  • 4. Recommendation Engine Demographic based Clustering Social Knowledge based Graph based based Content Collaborative based Filtering Category Mixture Models based   This presentation will focus on Neighborhood-based Collaborative Filtering   User-oriented method   Item-oriented method
  • 5. Neighborhood-based Collaborative Filtering Data Normalization Data Input Data (Observations) Representation User-user similarity vs. Item-item similarity Neighborhood Formation Make Recommendation Recommendation Generation Web Page, Email, Direct Mail, Marketing Campaign, Delivering etc. Mechanism
  • 6. Outline  Introduction  User-oriented Collaborative Filtering  Item-oriented Collaborative Filtering  Challenges  Best Practices
  • 7. User-oriented CF   Input Data Representation: Users / Items matrix. users 1 2 3 4 5 6 7 8 9 10 … n 1 1 1 1 1 2 1 1 1 items 3 1 1 1 1 1 4 1 1 1 1 1 1 … m   Cell value “1” means user purchased the item.   Data Normalization is not shown on this slide.
  • 8. User-oriented CF   Neighborhood Formation:   Find the k most like-minded users in the system. users 1 2 3 4 5 6 7 8 9 10 … n 1 1 1 1 1 1 2 1 1 1 items 3 1 1 1 1 1 1 4 1 1 1 1 1 1 … m 1 1
  • 9. User-oriented CF   Neighborhood Formation:   Find the k most like-minded users in the system. users 1 2 3 4 5 6 7 8 9 10 … n 1 1 1 1 1 1 2 1 1 1 items 3 1 1 1 1 1 1 4 1 1 1 1 1 1 … m 1 1   Identify U9 and U2 are similar to U8
  • 10. User-oriented CF   Recommendation Generation: users 1 2 3 4 5 6 7 8 9 10 … n 1 1 1 1 1 1 2 1 1 1 items 3 1 1 1 1 1 1 4 1 1 1 1 1 1 … m 1 1   Identify I1 and I9 are not yet purchased by U8
  • 11. User-oriented CF   Recommendation Generation: users 1 2 3 4 5 6 7 8 9 10 … n 1 1 1 1 1 0.7 1 2 1 1 1 items 3 1 1 1 1 1 1 4 1 1 1 1 1 1 … m 1 0.9 1   Predict by taking weighted sum
  • 12. User-oriented CF Practical Implementation   Compute and store all user-user similarities. u•v   Cosine similarity: sim(u,v) = cos(u,v) = u ∗ v 2 2   Find N items that will be most likely purchased by user u.   Find k most similar users to u, save to Usim   Get all items purchased by Usim, save to Icandidate €   Remove unavailable items in Icandidate   Get all items purchased by u, save to Ipurchased   Take Icandidate – Ipurchased = Irecmd   Re-order items in Irecmd based on sum of user-user similarity ∑ v ∈k−similarUser(u) userSim(u,v) * rvi pred(u,i) = ∑ v ∈k−similarUser( u) userSim(u,v) €
  • 13. Outline  Introduction  User-oriented Collaborative Filtering  Item-oriented Collaborative Filtering  Challenges  Best Practices
  • 14. Item-oriented CF   Input Data Representation: Users / Items matrix. users 1 2 3 4 5 6 7 8 9 10 … n 1 1 1 1 1 2 1 1 1 items 3 1 1 1 1 1 4 1 1 1 1 1 1 … m   Cell value “1” means user purchased the item.   Data Normalization is not shown on this slide.
  • 15. Item-oriented CF   Neighborhood Formation:   Find the k items that have the most similar user vectors users 1 2 3 4 5 6 7 8 9 10 … n 1 1 1 1 1 2 1 1 1 1 items 3 1 1 1 1 4 1 1 1 1 1 1 1 … m 1 1
  • 16. Item-oriented CF – cont.   Recommendation Generation 1 W2-1 2 W2-3 3 Predict by taking weighted sum 4 TopN Recmd. for U8 : {1,9,3} U8 4 W4-1 1 8 8 purchased W8-9 9 items item-item similarity
  • 17. Item-oriented CF Practical Implementation   Compute and store all item-item similarities. a•b   Cosine similarity: sim(a,b) = cos(a,b) = a ∗ b 2 2   Find N items that will be most likely purchased by user u.   Get all items purchased by u, save to Ipurchased €   For each item in Ipurchased, find k most similar items save them to Icandidate   Remove unavailable items in Icandidate   Get all items purchased by u, save to Ipurchased   Take Icandidate – Ipurchased = Irecmd   Re-order items in Irecmd based on ∑ j ∈ purchasedItems(u) itemSim(i, j) * ruj pred(u,i) = ∑ j ∈ purchasedItems(u) itemSim(i, j)
  • 18. Outline  Introduction  User-oriented Collaborative Filtering  Item-oriented Collaborative Filtering  Challenges  Best Practices
  • 19. The Challenges you will face   Data Sparsity Issue   Cold Start Problem   Curse of Dimensionality   Scalability
  • 20. Data Sparsity Issue   Missing values in the Users / Items matrix. users Not 1 2 3 4 5 6 7 8 9 10 … n Reliable Mostly Unknown 1 2 ∑ userSim(u,v) * rvi items v ∈k−similarUser(u) 3 1 pred(u,i) = 1 ∑ v ∈k−similarUser( u) userSim(u,v) 4 1 1 … m €   NetflixPrize data set: 98.82% of cells are blank   Typical e-commerce txn data set can be 10-100 time more sparse than Netflix Prize data set !!
  • 21. Cold Start Problem   It occurs when new item or new user is added to the data matrix. users 1 2 3 4 5 6 7 8 9 10 … n 1 1 1 1 1 2 1 1 1 items 3 1 1 1 1 1 1 4 1 1 1 1 1 1 … m   RE does not have enough knowledge about this new user or this new item yet.   Content-based REs can be incorporated to alleviate cold start problem.
  • 22. Curse of Dimensionality   Adding more features (items or users) can increase the noise, and hence the error.   There aren’t enough observations to get good estimates. users items
  • 23. Scalability   User neighborhood formation: O(n2) for n users   Item neighborhood formation: O(m2) for m items   When m (# of items) << n (# of users), item-based CF will be more efficient than user-based CF   Ability to update neighborhood incrementally
  • 24. Outline  Introduction  User-oriented Collaborative Filtering  Item-oriented Collaborative Filtering  Challenges  Best Practices
  • 25. Best Practices   Understand the data thoroughly   Define business objectives and conversion metrics judiciously   Understand context and user intent   Apply adaptive reinforcement learning   Optimize RE using cost-based methods   Be aware of data-shift issue   Optimize marketing messages delivered with recommendation results
  • 26. Understand the data thoroughly   What data are available?   E-commerce data set typically contains:   Clickstream   Shopping cart / Saved Items / Wish list / Shared Item   Order / Return   User profile   User ratings   How are these data points being collected?   Is there pre-existing bias in the data? or leakage?   Is the data related to what we want to predict?
  • 27. Define business objectives and conversion metrics judiciously   Definingcorrect conversion metrics can be a competitive advantage. mitigate increase navigation inefficiency # of orders up-sell increase Web more within average property the context item price revenue cross-sell increase within items the context per order Recommendation Engine
  • 28. Understand context and user intent   Context should be considered when RE making recommendation. Month: December Temperature: 45oF
  • 29. Apply adaptive reinforcement learning C1 C2 C3 Incorporating clickstream adaptive reinforcement ∑ j ∈ purchasedItems(u) itemSim(i, j) * ruj pred(u,i) = + W iC i ∑ j ∈ purchasedItems(u) itemSim(i, j) €
  • 30. Optimize RE using cost-based models   Cost-based engine optimization Metrics Parameter
  • 31. Be aware of the data-shift issue   Data collection UI changes will influence data significantly, creating artificial data shifting Netflix prize data set Y. Koren, "Collaborative Filtering with Temporal Dynamics," Proc. 15th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD 09), ACM Press, 2009, pp. 447-455.12
  • 32. Optimize marketing message delivered with recommendation result   It's How You Say It Screenshots from Amazon.com
  • 33. RECOMMENDATION ENGINE DEMYSTIFIED Alex Lin Intelligent Mining Email: alin@intelligentmining.com Twitter: DKALab