Developing a Movie Recommendation Engine with Spark


Edureka!

Slide 2 www.edureka.co/apache-spark-scala-training
What are we going to learn today?
By the end of the session, you will know:
• What a recommendation engine is
• Which major companies use recommendation engines
• The different approaches to building a recommendation engine
• How to build a recommendation engine using Spark and its machine learning library (MLlib)

Slide 3
Transition – Search to Recommendation
"We are leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you are looking for something. Discovery is when something wonderful that you didn't know existed, finds you."
Source: CNN Money, "The race to create a smart Google"

Slide 4
Recommendations make life easier
Recommendations help users find information, products, and services that they might not have thought of.

Slide 5
Recommendation Approaches
Collaborative filtering
The user is recommended items that people with similar tastes and preferences liked in the past.
Content-based
The user is recommended items similar to the ones that user preferred in the past.
Hybrid methods
The user is recommended items by combining the collaborative filtering and content-based approaches.
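The three approaches can be illustrated with a small, self-contained Python sketch. It is not taken from the session: the users, movies, ratings, genres, and all function names are hypothetical, cosine similarity and Jaccard overlap are just two common choices of similarity measure, and the equal-weight blend in the hybrid is an arbitrary assumption.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating on a 1-5 scale}
ratings = {
    "alice": {"Alien": 5, "Blade Runner": 4, "Notting Hill": 1},
    "bob": {"Alien": 4, "Blade Runner": 5, "The Matrix": 5},
    "carol": {"Notting Hill": 5, "Love Actually": 4, "Alien": 1},
}

# Hypothetical item attributes, used only by the content-based approach
genres = {
    "Alien": {"sci-fi", "horror"},
    "Blade Runner": {"sci-fi"},
    "The Matrix": {"sci-fi", "action"},
    "Notting Hill": {"romance", "comedy"},
    "Love Actually": {"romance", "comedy"},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    dot = sum(a[m] * b[m] for m in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def collaborative_scores(user):
    """Collaborative filtering: score unseen movies by what similar users rated."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for movie, r in their.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * r
    return scores


def content_scores(user, liked_threshold=4):
    """Content-based: score unseen movies by genre overlap with movies the user liked."""
    liked = [m for m, r in ratings[user].items() if r >= liked_threshold]
    scores = {}
    for movie, g in genres.items():
        if movie in ratings[user]:
            continue
        overlaps = [len(g & genres[m]) / len(g | genres[m]) for m in liked]
        scores[movie] = max(overlaps, default=0.0)
    return scores


def hybrid_scores(user, weight=0.5):
    """Hybrid: blend the two scores after scaling collaborative scores to [0, 1]."""
    collab = collaborative_scores(user)
    content = content_scores(user)
    top = max(collab.values(), default=0.0) or 1.0
    return {
        m: weight * collab.get(m, 0.0) / top + (1 - weight) * content.get(m, 0.0)
        for m in set(collab) | set(content)
    }


for name, fn in [("collaborative", collaborative_scores),
                 ("content-based", content_scores),
                 ("hybrid", hybrid_scores)]:
    scores = fn("alice")
    print(name, max(scores, key=scores.get))
```

The collaborative scorer never looks at genres, and the content-based scorer never looks at other users. That is the practical difference between the two approaches.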

Slide 6
Let's take a small quiz

Slide 7
Recommendation Engine at Last.fm
Recommended tracks by last.fm
Which approach does last.fm use to recommend music?

Slide 8
Recommendation Engine at IMDB
Movie recommendations by IMDB
Which approach does IMDB use to recommend movies?

Slide 9
Recommendation Engine at Amazon
Recommended books by Amazon
Which approach does Amazon use to recommend items?

Slide 10
Recommendation Engine at YouTube
Recommended videos by YouTube
Which approach does YouTube use to recommend videos?

Slide 11
Recommendation Engine at LinkedIn
Job recommendations by LinkedIn
Which approach does LinkedIn use to recommend jobs?

Slide 12
Implementing a Recommendation Engine
To implement a recommendation engine, we need the following:
• Data source – to store historical data, e.g. MySQL, MongoDB, HBase
• Spark – low-latency computing
• MLlib – library of machine learning algorithms

Slide 13
High Level Architecture - Recommendation Engine
[Figure: Recommendation Engine Architecture. Components shown: Data Source, Hadoop, Spark Application, MLlib]

Slide 14
Step 1 - Data Source

Slide 15
Step 2 – Hadoop to the rescue
One problem with having different types of data sources is that the raw data is not well structured, so we need something that can store data from all of these sources in a single place.
Hadoop is a good fit for this problem.
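To make the "single place" idea concrete, here is a small Python sketch of the normalization that typically comes before loading data into a shared store. It does not use Hadoop itself. The three sample sources, their field names, and the Rating record are all hypothetical and only stand in for what a relational table, a document store, and a flat-file export might hold.

```python
import csv
import io
from typing import Iterator, NamedTuple


class Rating(NamedTuple):
    """One common schema for every source (hypothetical)."""
    user_id: int
    movie_id: int
    rating: float


# Hypothetical samples of the same kind of event as three sources might store it
sql_rows = [(1, 10, 4.0), (2, 10, 5.0)]                       # relational rows
documents = [{"user": "3", "movie": {"id": 11}, "stars": 3}]  # nested documents
csv_export = "user_id,movie_id,rating\n4,12,2.5\n"            # flat-file export


def from_sql(rows) -> Iterator[Rating]:
    for user_id, movie_id, rating in rows:
        yield Rating(int(user_id), int(movie_id), float(rating))


def from_documents(docs) -> Iterator[Rating]:
    for doc in docs:
        yield Rating(int(doc["user"]), int(doc["movie"]["id"]), float(doc["stars"]))


def from_csv(text: str) -> Iterator[Rating]:
    for row in csv.DictReader(io.StringIO(text)):
        yield Rating(int(row["user_id"]), int(row["movie_id"]), float(row["rating"]))


def consolidate() -> list:
    """Merge all sources into one list that shares a single schema."""
    merged = []
    merged.extend(from_sql(sql_rows))
    merged.extend(from_documents(documents))
    merged.extend(from_csv(csv_export))
    return merged


all_ratings = consolidate()
for record in all_ratings:
    print(record)
```

Once every record has the same shape, downstream code only has to understand one format, regardless of where the data came from.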

Slide 16
Step 3 - Spark
Once we have all the data in place, we can use Spark to do in-memory computation on the data.
Apache Spark is an in-memory cluster computing system that provides real-time data processing capability.
Note that it's possible to build a recommendation engine without Spark, using only Hadoop. However, Hadoop (MapReduce) reads from and writes to disk rather than working in memory, which takes extra time. So a recommendation engine built using only Hadoop will not be real-time.

Slide 17
Step 4 - MLlib
[Figure: Spark with its libraries MLlib, Spark SQL, and Spark Streaming]
Rather than writing the entire recommendation engine from scratch, we can use the popular MLlib library, which provides machine learning algorithms for building a recommendation engine.

Slide 18
High Level Architecture - Recommendation Engine
[Figure: Recommendation Engine Architecture. Components shown: Data Source, Hadoop, Spark Application, MLlib]

Slide 19
Let's See a Code Example
Code to build a recommendation engine
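The code shown in the session is not included in this transcript. As a rough stand-in, the sketch below implements alternating least squares (ALS), the matrix factorization technique that MLlib offers for collaborative filtering, in plain Python without Spark. The toy ratings, the function names, and the default values for rank, iterations, and regularization are all assumptions made for illustration, and the plain L2 regularization used here is a simplification that may differ from MLlib's.

```python
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(fixed, observed, rank, reg):
    """Re-solve one side's factors by regularized least squares.

    observed maps an id on this side to a list of (id on the fixed side, rating).
    """
    updated = {}
    for key, pairs in observed.items():
        a = [[reg if i == j else 0.0 for j in range(rank)] for i in range(rank)]
        b = [0.0] * rank
        for other, rating in pairs:
            f = fixed[other]
            for i in range(rank):
                b[i] += rating * f[i]
                for j in range(rank):
                    a[i][j] += f[i] * f[j]
        updated[key] = solve(a, b)
    return updated


def loss(ratings, users, movies, reg):
    """Regularized squared error, the objective each ALS half-step minimizes."""
    err = sum((r - dot(users[u], movies[m])) ** 2 for u, m, r in ratings)
    penalty = sum(x * x for vec in list(users.values()) + list(movies.values()) for x in vec)
    return err + reg * penalty


def train(ratings, rank=2, iterations=10, reg=0.1, seed=0):
    rng = random.Random(seed)
    by_user, by_movie = {}, {}
    for u, m, r in ratings:
        by_user.setdefault(u, []).append((m, r))
        by_movie.setdefault(m, []).append((u, r))
    users = {u: [rng.random() for _ in range(rank)] for u in by_user}
    movies = {m: [rng.random() for _ in range(rank)] for m in by_movie}
    history = [loss(ratings, users, movies, reg)]
    for _ in range(iterations):
        users = update(movies, by_user, rank, reg)   # fix movies, solve users
        movies = update(users, by_movie, rank, reg)  # fix users, solve movies
        history.append(loss(ratings, users, movies, reg))
    return users, movies, history


# Hypothetical toy ratings: (user, movie, rating on a 1-5 scale)
ratings = [
    ("alice", "Alien", 5.0), ("alice", "Blade Runner", 4.0), ("alice", "Notting Hill", 1.0),
    ("bob", "Alien", 4.0), ("bob", "Blade Runner", 5.0), ("bob", "The Matrix", 5.0),
    ("carol", "Notting Hill", 5.0), ("carol", "Love Actually", 4.0), ("carol", "Alien", 1.0),
]

users, movies, history = train(ratings)
# Predicted rating for a movie alice has not rated yet
print(dot(users["alice"], movies["The Matrix"]))
```

Each half-step solves an exact least-squares problem with the other side held fixed, so the regularized loss recorded in history never increases. In a Spark application the same idea would run distributed over the cluster through MLlib rather than in a single Python process.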

Slide 20
Questions

Slide 21
References
http://recommender-systems.org/content-based-filtering/
http://archive.fortune.com/magazines/fortune/fortune_archive/2006/11/27/8394347/index.htm
http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html
