Focused Expertise                     Industries Served

 • Data Warehouse Design              •   Healthcare / Insurance
 • Business Intelligence              •   Financial Services
 • Big Data Analytics                 •   Retail / eCommerce
 • Search / Relevance                 •   Digital Media / Marketing
 • Infographics                       •   K-12 / Higher Education

 445 Park Ave New York, NY | 1-855-755-2246 | info@casertaconcepts.com

Big Data
Analytics
Recommendations
• Your customers expect them
   • Good recommendations make life easier
   • Help them find information, products, and services they might not
     have thought of


• What makes a good recommendation?
  • Relevant but not obvious
  • Sense of “surprise”



      [Figure: a 23” LED TV marked SOLD!!, shown with two rows of suggested items:
       23” LED TV, 24” LED TV, 25” LED TV; and Blu-Ray, Home Theater, HDMI Cables]
Where can recommendation
engines be found?
• Recommendation engines are used in a wide variety of industries
 and applications:
  • Travel
  • Service Industry
  • Music/Online radio
  • TV and Video
  • Online Publications
  • Retail
   …and countless others


   Our Use Case: Movie Ratings!
Our Goal
• Create a powerful, scalable recommendation engine with minimal
 development

• Make recommendations to users as they are browsing movie titles -
 instantaneously

• Recommendation must have context to the movie they are currently
 viewing.
                       OOPS! – too much surprise!
How do we hope to accomplish this?
Hadoop – distributed file system and processing platform
Mahout – collection of machine learning libraries

We will leverage 2 algorithms:
• Item Similarity – how similar is this particular movie to other
  movies based on usage
• Item-Based Recommender – predict an individual's
  preference based on their peers' ratings

• Both algorithms only require a simple dataset of 3 fields:
 “User ID” , “Item ID”, “Rating”
Item Similarity – Context, Content Filtering
“People who liked this movie liked these as well”

• Item Similarity builds a matrix of items to other items and calculates
 similarity (based on user rating)

• The most similar items are then output as a list:
  • Item ID, Similar Item ID, Similarity Score
  • Items with the highest score are most similar
  • In this example users who liked “Twelve Monkeys” (7) also liked “Fargo” (100)

               7         100        0.690951001800917
               7         50         0.653299445638532
               7         117        0.643701303640083
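
The slides do not say which similarity measure was used. As a minimal sketch of the idea, assuming cosine similarity over the users who rated both items, the following builds the same kind of (Item ID, Similar Item ID, Similarity Score) list from (User ID, Item ID, Rating) triples. The function name `item_similarity` and the sample ratings are made up for illustration; this is not Mahout's implementation.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarity(ratings):
    """Return (item, similar_item, score) rows from (user, item, rating) triples."""
    # Group ratings by item: item_id -> {user_id: rating}
    by_item = defaultdict(dict)
    for user, item, rating in ratings:
        by_item[item][user] = rating

    rows = []
    for a, b in combinations(sorted(by_item), 2):
        common = by_item[a].keys() & by_item[b].keys()
        if not common:
            continue  # no user rated both items, so no evidence of similarity
        dot = sum(by_item[a][u] * by_item[b][u] for u in common)
        norm_a = sqrt(sum(r * r for r in by_item[a].values()))
        norm_b = sqrt(sum(r * r for r in by_item[b].values()))
        score = dot / (norm_a * norm_b)
        rows.append((a, b, score))
        rows.append((b, a, score))  # similarity is symmetric

    # Within each item, list the most similar items first
    rows.sort(key=lambda row: (row[0], -row[2]))
    return rows


# Hypothetical ratings: (user_id, item_id, rating)
sample = [(1, 7, 5), (1, 100, 5), (2, 7, 4), (2, 100, 4),
          (2, 50, 1), (3, 50, 5), (3, 7, 1)]

for item, similar, score in item_similarity(sample):
    if item == 7:
        print(item, similar, round(score, 3))
```

In this toy data, users who rated item 7 highly also rated item 100 highly, so item 100 comes out as the closest match to item 7.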
Item-Base – Peer, Collaborative Filtering
“People with similar taste to you liked these movies”
• Item-Base takes the Item Similarity matrix and weights based on
 “peer” user preference.

• Essentially it determines the best movie critics for you to follow


• The items with the highest recommendation score are then output as tuples
  • User ID [Item ID1:Score,…., Item IDn:Score]
  • Items with the highest recommendation score are the most relevant to this user
  • For user “Johny Sisklebert” (572), the two most highly recommended movies are
     “Seven” and “Donnie Brasco”
572 [11:5.0,293:4.70718,8:4.688335,273:4.687676,427:4.685926,234:4.683155,168:4.669672,89:4.66959,4:4.65515]
573 [487:4.54397,1203:4.5291,616:4.51644,605:4.49344,709:4.3406,502:4.33706,152:4.32263,503:4.20515,432:4.26455,611:4.22019]
574 [1:5.0,902:5.0,546:5.0,13:5.0,534:5.0,533:5.0,531:5.0,1082:5.0,1631:5.0,515:5.0]
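
One common way to turn item similarities into a per-user score is a similarity-weighted average of the ratings that user has already given. The sketch below assumes that formulation; the slides do not state the exact formula Mahout applies. The item IDs are borrowed from the similarity example, while the ratings, the similarity values, and the names `predict` and `recommend` are hypothetical.

```python
def predict(user_ratings, similarity, candidate):
    """Similarity-weighted average of the user's own ratings.

    user_ratings: {item_id: rating} for one user
    similarity:   {(candidate_id, rated_item_id): score}
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item, rating in user_ratings.items():
        s = similarity.get((candidate, item), 0.0)
        weighted_sum += s * rating
        weight_total += abs(s)
    # None means no rated item is similar to the candidate
    return weighted_sum / weight_total if weight_total else None


def recommend(user_ratings, similarity, candidates, top_n=10):
    """Score every unseen candidate and return the best (item_id, score) pairs."""
    scored = []
    for candidate in candidates:
        if candidate in user_ratings:
            continue  # do not recommend what the user already rated
        score = predict(user_ratings, similarity, candidate)
        if score is not None:
            scored.append((candidate, score))
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_n]


# Made-up input for a single user
user_ratings = {7: 5.0, 50: 3.0}
similarity = {(100, 7): 0.69, (100, 50): 0.20,
              (117, 7): 0.10, (117, 50): 0.60}

recs = recommend(user_ratings, similarity, candidates=[7, 100, 117])
# Print in the same "User ID [Item ID:Score,...]" shape as the output above
print(572, "[" + ",".join(f"{item}:{score:.5f}" for item, score in recs) + "]")
```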
Recommendation Store
• Serving recommendations needs to be instantaneous
    We need a database!

• The core of this solution is two reference tables:


    Rec_Item_Similarity          Rec_User_Item_Base
    Item_ID                      User_ID
    Similar_Item                 Item_ID
    Similarity_Score             Recommendation_Score


• When called to make recommendations we query our store
  • Rec_Item_Similarity based on the Item_ID they are viewing
  • Rec_User_Item_Base based on their User_ID
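
The slides only say "a database", so the following is a sketch that assumes SQLite purely for illustration. The table and column names are the ones listed above; the column types, the primary keys, and the helper names `similar_items` and `user_items` are assumptions. The sample rows are taken from the earlier output examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rec_Item_Similarity (
    Item_ID          INTEGER,
    Similar_Item     INTEGER,
    Similarity_Score REAL,
    PRIMARY KEY (Item_ID, Similar_Item)
);
CREATE TABLE Rec_User_Item_Base (
    User_ID              INTEGER,
    Item_ID              INTEGER,
    Recommendation_Score REAL,
    PRIMARY KEY (User_ID, Item_ID)
);
""")

# Rows copied from the Item Similarity and Item-Base output shown earlier
conn.executemany(
    "INSERT INTO Rec_Item_Similarity VALUES (?, ?, ?)",
    [(7, 100, 0.690951001800917),
     (7, 50, 0.653299445638532),
     (7, 117, 0.643701303640083)],
)
conn.executemany(
    "INSERT INTO Rec_User_Item_Base VALUES (?, ?, ?)",
    [(572, 11, 5.0), (572, 293, 4.70718), (572, 8, 4.688335)],
)


def similar_items(conn, item_id, limit=10):
    """Items most similar to the one currently being viewed."""
    return conn.execute(
        "SELECT Similar_Item, Similarity_Score FROM Rec_Item_Similarity "
        "WHERE Item_ID = ? ORDER BY Similarity_Score DESC LIMIT ?",
        (item_id, limit),
    ).fetchall()


def user_items(conn, user_id, limit=10):
    """Items with the highest recommendation score for this user."""
    return conn.execute(
        "SELECT Item_ID, Recommendation_Score FROM Rec_User_Item_Base "
        "WHERE User_ID = ? ORDER BY Recommendation_Score DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()


print(similar_items(conn, 7))
print(user_items(conn, 572))
```

Because both lookups are keyed on the leading column of a primary key, each request is a simple indexed read, which is what makes serving at browse time feasible.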
Delivering Recommendations
    So if Johny is viewing “Twelve Monkeys” we query our
    recommendation store and present the results
    Item Similarity               Raw Score   Score
    Fargo                           0.691     1.000
    Star Wars                       0.653     0.946
    Rock, The                       0.644     0.932
    Pulp Fiction                    0.628     0.909
    Return of the Jedi              0.627     0.908
    Independence Day                0.618     0.894
    Willy Wonka                     0.603     0.872
    Mission: Impossible             0.597     0.864
    Silence of the Lambs, The       0.596     0.863
    Star Trek: First Contact        0.594     0.859
    Raiders of the Lost Ark         0.584     0.845
    Terminator, The                 0.574     0.831
    Blade Runner                    0.571     0.826
    Usual Suspects, The             0.569     0.823
    Seven (Se7en)                   0.569     0.823

    Item-Base (Peer)              Raw Score   Score
    Seven                           5.000     1.000
    Donnie Brasco                   4.707     0.941
    Babe                            4.688     0.938
    Heat                            4.688     0.938
    To Kill a Mockingbird           4.686     0.937
    Jaws                            4.683     0.937
    Monty Python, Holy Grail        4.670     0.934
    Blade Runner                    4.670     0.934
    Get Shorty                      4.655     0.931

    Top 10 Recommendations                    Score
    Seven (Se7en)                             1.823
    Blade Runner                              1.760
    Fargo                                     1.000
    Star Wars                                 0.946
    Donnie Brasco                             0.941
    Babe                                      0.938
    Heat                                      0.938
    To Kill a Mockingbird                     0.937
    Jaws                                      0.937
    Monty Python, Holy Grail                  0.934

    [Diagram labels: “Item-Based: Peers like these Movies”; “Best Recommendations”]
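
The slide does not spell out how the Score columns are derived, but the numbers are consistent with dividing each raw score by the highest raw score in its list and then adding the two normalized scores for movies that appear in both lists. A sketch under that assumption, using a few of the raw scores shown above (the function names `normalize` and `combine` are hypothetical):

```python
def normalize(raw):
    """Scale scores so that the best item in the list gets 1.0."""
    top = max(raw.values())
    return {title: score / top for title, score in raw.items()}


def combine(*score_sets):
    """Add up normalized scores per title and rank the totals, best first."""
    total = {}
    for scores in score_sets:
        for title, score in scores.items():
            total[title] = total.get(title, 0.0) + score
    return sorted(total.items(), key=lambda pair: -pair[1])


# Raw scores taken from the two result lists above (subset only)
item_similarity_raw = {"Fargo": 0.691, "Star Wars": 0.653,
                       "Blade Runner": 0.571, "Seven (Se7en)": 0.569}
item_base_raw = {"Seven (Se7en)": 5.000, "Donnie Brasco": 4.707,
                 "Babe": 4.688, "Blade Runner": 4.670}

top = combine(normalize(item_similarity_raw), normalize(item_base_raw))
for title, score in top:
    print(f"{title:<20} {score:.3f}")
```

Seven and Blade Runner rise to the top because they are the only titles both algorithms return. In a real store the join would be on Item_ID rather than on the title string.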
From Good to Great Recommendations
• Note that the first 5 recommendations look pretty good
    …but the 6th result would have been “Babe” the children's movie
                                                   OOPS!




• Tuning the algorithms might help: parameter changes, similarity
 measures.

• How else can we make it better?
1. Delivery filters
2. Introduce additional algorithms such as K-Means, or Fuzzy K-Means
Delivery Scoring and Filters
   Apply assumptions to control the results of collaborative filtering
   • One or more categories must match
   • Only children's movies will be recommended for children's movies.


                        Action   Adventure Children's Comedy   Crime   Drama   Film-Noir   Horror   Romance   Sci-Fi   Thriller
Twelve Monkeys            0         0         0        0        0       1         0          0        0        1          0
Babe                      0         0         1        1        0       1         0          0        0        0          0
Seven (Se7en)             0         0         0        0        1       1         0          0        0        0          1
Star Wars                 1         1         0        0        0       0         0          0        1        1          0
Blade Runner              0         0         0        0        0       0         1          0        0        1          0
Fargo                     0         0         0        0        1       1         0          0        0        0          1
Willy Wonka               0         1         1        1        0       0         0          0        0        0          0
Monty Python              0         0         0        1        0       0         0          0        0        0          0
Jaws                      1         0         0        0        0       0         0          1        0        0          0
Heat                      1         0         0        0        1       0         0          0        0        0          1
Donnie Brasco             0         0         0        0        1       1         0          0        0        0          0
To Kill a Mockingbird     0         0         0        0        0       1         0          0        0        0          0
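
The two category rules can be sketched as a filter applied when recommendations are delivered. The genre sets below are read off the table above. The slide states the children's rule in one direction only; this sketch assumes it is meant symmetrically, so that a children's title is also withheld from someone viewing a non-children's title (which is what keeps “Babe” away from “Twelve Monkeys”). The name `passes_filter` is hypothetical.

```python
GENRES = {
    "Twelve Monkeys": {"Drama", "Sci-Fi"},
    "Babe": {"Children's", "Comedy", "Drama"},
    "Seven (Se7en)": {"Crime", "Drama", "Thriller"},
    "Star Wars": {"Action", "Adventure", "Romance", "Sci-Fi"},
    "Blade Runner": {"Film-Noir", "Sci-Fi"},
    "Jaws": {"Action", "Horror"},
}


def passes_filter(viewing, candidate, genres=GENRES):
    """Apply both delivery rules to one candidate recommendation."""
    viewed, offered = genres[viewing], genres[candidate]
    # Rule 1: one or more categories must match
    if not viewed & offered:
        return False
    # Rule 2 (assumed symmetric): the Children's flag must agree
    return ("Children's" in viewed) == ("Children's" in offered)


candidates = ["Babe", "Seven (Se7en)", "Star Wars", "Blade Runner", "Jaws"]
kept = [c for c in candidates if passes_filter("Twelve Monkeys", c)]
print(kept)
```

Note that “Babe” shares the Drama category with “Twelve Monkeys”, so the first rule alone would not remove it; it is the children's rule that does.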


     Similar logic could be applied to promote more favorable options
     • New Releases
     • Retail Case: Items that are on-sale, overstock
Additional Algorithm – K-Means
  “These movies are similar based on their attributes”

 • Treats items as coordinates
 • Places a number of random
   “centroids” and assigns the
   nearest items
 • Moves the centroids around based
   on average location
 • Process repeats until the
   assignments stop changing


We would use the major attributes of the Movie to create coordinate points.
• Categories
• Actors
• Director
• Synopsis Text
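
The loop described above can be sketched in plain Python. This is a minimal illustration, not Mahout's implementation: it uses only the Categories attribute (the 0/1 genre flags from the earlier table) as coordinates, and it picks the starting centroids by sampling existing items rather than placing them at arbitrary random positions. The function name `kmeans` and the `seed` parameter are assumptions.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; returns (assignment, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen items (an assumption, see the note above)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assign every item to its nearest centroid (squared Euclidean distance)
        new_assignment = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
        # Move each centroid to the average location of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Genre flags in table order: Action, Adventure, Children's, Comedy, Crime,
# Drama, Film-Noir, Horror, Romance, Sci-Fi, Thriller
titles = ["Seven (Se7en)", "Fargo", "Babe", "Willy Wonka"]
points = [
    (0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1),
    (0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1),
    (0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0),
    (0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0),
]

assignment, centroids = kmeans(points, k=2)
for title, cluster in zip(titles, assignment):
    print(cluster, title)
```

The result depends on the starting centroids, which is one reason K-Means is usually run several times. Items with identical coordinates, such as “Seven” and “Fargo” here, always land in the same cluster.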
Integrating K-Means into the process
Movies recommended by more than 1 algorithm are the most highly rated




    [Venn diagram: overlapping circles labeled “Item-Based”, “Item Similarity”,
     and “K-Means: Similar”; the overlap is labeled “Best Recommendations”]
Summary
• Mahout and Hadoop can provide a relatively low cost and
 extremely scalable platform for recommendations

• Mahout offers a great collection of established machine learning
 libraries, reducing development effort

• A good recommendation system combines Collaborative and
 Content filtering algorithms


             elliott@casertaconcepts.com
