Apache Mahout – Tutorial (2015)
Cataldo Musto, Ph.D.
Course on Intelligent Information Access and Natural Language Processing
Università degli Studi di Bari – Department of Computer Science – Academic Year 2014/2015
07/01/2015
Outline
• What is Mahout?
– Overview
• How to use Mahout?
– Hands-on session
Part 1
What is Mahout?
What is (a) Mahout?
an elephant driver
• Mahout is a Java library
– Implementing Machine Learning techniques
• Clustering
• Classification
• Recommendation
• Frequent ItemSet (removed)
What can we do?
• Mahout currently supports mainly three use cases (a fourth, frequent itemset mining, has been removed):
– Recommendation: takes users' behavior and tries to find items users might like.
– Clustering: takes e.g. text documents and groups them into sets of topically related documents.
– Classification: learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category.
Why Mahout?
• Mahout is not the only ML framework
– Weka (http://www.cs.waikato.ac.nz/ml/weka/)
– R (http://www.r-project.org/)
• Why do we prefer Mahout?
– Apache License
– Good Community
– Good Documentation
– Scalable
• Based on Hadoop (not mandatory!)
Why do we need a scalable framework?
Big Data!
When do we need a scalable framework?
(e.g. Recommendation Task)
Over 100 million user-preference connections
http://mahout.apache.org/users/recommender/recommender-first-timer-faq.html
Use Cases
• Recommendation Engine on Foursquare
• User Interest Modeling on Twitter
• Pattern Mining on Yahoo! (as anti-spam)
Algorithms
• Recommendation
– User-based Collaborative Filtering
– Item-based Collaborative Filtering
– Matrix Factorization-based CF
• Several factorization techniques
• Clustering
– K-Means
– Fuzzy K-Means
– Streaming K-Means
– etc.
• Topic Modeling
– LDA (Latent Dirichlet Allocation)
• Classification
– Logistic Regression
– Bayes (only on Hadoop, from release 0.9)
– Random Forests (only on Hadoop, from release 0.9)
– Hidden Markov Models
– Perceptrons
• Other
– Text processing
• Creation of sparse vectors from text
– Dimensionality Reduction techniques
• Principal Component Analysis (PCA)
• Singular Value Decomposition (SVD)
– (and much more.)
Mahout in the Apache Software Foundation
• Original Mahout Project
• Taste: collaborative filtering framework
• Lucene: information retrieval software library
• Hadoop: framework for distributed storage and programming based on MapReduce
Next Releases: Watch out!
Hadoop is going to be replaced by Apache Spark (http://spark.apache.org)
General Architecture
Three-tier architecture (Application, Algorithms and Shared Libraries)
• Business Logic
• Data Storage and Shared Libraries
• External Applications invoking Mahout APIs
In this tutorial we will focus on
Recommendation
Recommendation
• Mahout implements a Collaborative Filtering framework
– Uses historical data (ratings, clicks, and purchases) to provide recommendations
• User-based: recommend items by finding similar users. This is often harder to scale because of the dynamic nature of users.
• Item-based: calculate similarity between items and make recommendations. Items usually don't change much, so this can often be computed offline.
– Popularized by Amazon and others
• Matrix factorization-based: split the original user-item matrix into smaller matrices in order to analyze item rating patterns and learn latent factors explaining users' behavior and items' characteristics.
– Popularized by the Netflix Prize
Recommendation - Architecture
Recommendation Workflow
Basic idea: a Java/J2EE application invokes a Mahout Recommender whose DataModel is based on a set of user preferences built on top of a physical data store.
• External Application
• Recommender
• Data Model
• Physical Storage (database, files, etc.)
Recommendation in Mahout
• Input: raw data (user preferences)
• Output: preference estimates
• Step 1
– Mapping raw data into a Mahout-compliant DataModel
• Step 2
– Tuning recommender components
• Similarity measure, neighborhood, etc.
• Step 3
– Computing rating estimates
• Step 4
– Evaluating the recommendations
Recommendation Components
• Mahout's key abstractions are implemented through Java interfaces:
– DataModel interface
• Methods for mapping raw data to a Mahout-compliant form
– UserSimilarity interface
• Methods to calculate the degree of correlation between two users
– ItemSimilarity interface
• Methods to calculate the degree of correlation between two items
– UserNeighborhood interface
• Methods to define the concept of ‘neighborhood’
– Recommender interface
• Methods to implement the recommendation step itself
• Example: the DataModel interface
– Each way of mapping raw data to a Mahout-compliant form is an implementation of the generic interface
– e.g. MySQLJDBCDataModel feeds a DataModel from a MySQL database
– (and so on)
Components: DataModel
• A DataModel is the interface used to access information about user preferences.
• Which sources can it draw from?
– Database
• MySQLJDBCDataModel, PostgreSQLDataModel
• NoSQL databases supported: MongoDBDataModel, CassandraDataModel
– External Files
• FileDataModel
– Generic (preferences fed directly through Java code)
• GenericDataModel
(They are all implementations of the DataModel interface)
• GenericDataModel
– Fed through Java calls
• FileDataModel
– CSV (Comma Separated Values)
• JDBCDataModel
– JDBC Driver
– Standard database structure
FileDataModel – CSV input
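As an illustration of the CSV input, each line is assumed here to hold one preference in the form userID,itemID,value. The sketch below is plain Java written for this tutorial, not Mahout code: the class name CsvPreferenceSketch and the sample rows are invented, and it only shows how such lines map to (user, item, score) triples.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the "userID,itemID,value" line format.
// This is NOT Mahout code; the sample rows below are invented.
final class CsvPreferenceSketch {

  /** One (user, item, score) triple. */
  static final class Triple {
    final long userId;
    final long itemId;
    final float value;

    Triple(long userId, long itemId, float value) {
      this.userId = userId;
      this.itemId = itemId;
      this.value = value;
    }
  }

  /** Parses lines such as "1,101,5.0", skipping blank lines. */
  static List<Triple> parse(String csv) {
    List<Triple> triples = new ArrayList<>();
    for (String line : csv.split("\n")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      String[] fields = trimmed.split(",");
      triples.add(new Triple(
          Long.parseLong(fields[0].trim()),
          Long.parseLong(fields[1].trim()),
          Float.parseFloat(fields[2].trim())));
    }
    return triples;
  }

  public static void main(String[] args) {
    String sample = "1,101,5.0\n1,102,3.0\n2,101,2.0\n";
    for (Triple t : parse(sample)) {
      System.out.println("user " + t.userId + " rated item " + t.itemId + " with " + t.value);
    }
  }
}
```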
• Regardless of the source, they all share a common representation.
• Basic object: Preference
– A Preference is a triple (user, item, score)
– Stored in a PreferenceArray
• Two implementations
– GenericUserPreferenceArray
• Stores the numerical preference value as well.
– BooleanUserPreferenceArray
• Skips numerical preference values.
Components: UserSimilarity
• UserSimilarity defines a notion of similarity between two users.
– Likewise, ItemSimilarity defines a notion of similarity between two items.
• Which definitions of similarity are available?
– Pearson Correlation
– Spearman Correlation
– Euclidean Distance
– Tanimoto Coefficient
– LogLikelihood Similarity
– Already implemented!
Example: TanimotoDistance
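A minimal sketch of the idea, assuming the usual definition of the Tanimoto coefficient: the number of items two users have in common divided by the number of items at least one of them has. Preference values are ignored, and the distance is taken as 1 minus the coefficient. The class name TanimotoSketch and the item IDs are invented; this is plain Java, not Mahout's implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Plain-Java sketch of the Tanimoto coefficient on two item sets.
final class TanimotoSketch {

  /** |A ∩ B| / |A ∪ B|; returns 0.0 for two empty sets (a convention assumed here). */
  static double tanimoto(Set<Long> a, Set<Long> b) {
    if (a.isEmpty() && b.isEmpty()) {
      return 0.0;
    }
    Set<Long> intersection = new HashSet<>(a);
    intersection.retainAll(b);
    int unionSize = a.size() + b.size() - intersection.size();
    return (double) intersection.size() / unionSize;
  }

  public static void main(String[] args) {
    Set<Long> user1Items = new HashSet<>(Arrays.asList(101L, 102L, 103L));
    Set<Long> user2Items = new HashSet<>(Arrays.asList(102L, 103L, 104L));
    double coefficient = tanimoto(user1Items, user2Items); // 2 shared items out of 4 distinct ones
    System.out.println("coefficient = " + coefficient);    // prints "coefficient = 0.5"
    System.out.println("distance = " + (1.0 - coefficient)); // prints "distance = 0.5"
  }
}
```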
Example: CosineSimilarity
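A minimal sketch, assuming the textbook definition: the cosine of the angle between two rating vectors, i.e. their dot product divided by the product of their lengths. Both vectors are assumed to hold ratings for the same items in the same order. The class name CosineSketch and the ratings are invented; this is plain Java, not Mahout's implementation.

```java
// Plain-Java sketch of cosine similarity between two rating vectors.
final class CosineSketch {

  /** dot(x, y) / (|x| * |y|); NaN when either vector has zero length. */
  static double cosine(double[] x, double[] y) {
    if (x.length != y.length) {
      throw new IllegalArgumentException("vectors must have the same length");
    }
    double dot = 0.0;
    double normX = 0.0;
    double normY = 0.0;
    for (int i = 0; i < x.length; i++) {
      dot += x[i] * y[i];
      normX += x[i] * x[i];
      normY += y[i] * y[i];
    }
    if (normX == 0.0 || normY == 0.0) {
      return Double.NaN; // similarity is undefined for an all-zero vector
    }
    return dot / (Math.sqrt(normX) * Math.sqrt(normY));
  }

  public static void main(String[] args) {
    double[] user1 = {1.0, 2.0, 3.0};
    double[] user2 = {2.0, 4.0, 6.0}; // same direction as user1, so the cosine is close to 1
    System.out.println(cosine(user1, user2));
  }
}
```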
Different Similarity definitions influence neighborhood formation
Pearson’s vs. Euclidean distance
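The difference can be seen on a small invented example: two users who rate the same three items with the same trend, but one is consistently two points more generous. The sketch assumes the standard Pearson correlation formula and, for the Euclidean case, the common convention similarity = 1 / (1 + distance). Mahout's own scaling may differ, and the class name is hypothetical.

```java
// Plain-Java sketch comparing Pearson correlation with a Euclidean-based similarity.
final class PearsonVsEuclideanSketch {

  /** Standard Pearson correlation; NaN when one vector has no variance. */
  static double pearson(double[] x, double[] y) {
    int n = x.length;
    double meanX = 0.0;
    double meanY = 0.0;
    for (int i = 0; i < n; i++) {
      meanX += x[i] / n;
      meanY += y[i] / n;
    }
    double covariance = 0.0;
    double varianceX = 0.0;
    double varianceY = 0.0;
    for (int i = 0; i < n; i++) {
      double dx = x[i] - meanX;
      double dy = y[i] - meanY;
      covariance += dx * dy;
      varianceX += dx * dx;
      varianceY += dy * dy;
    }
    double denominator = Math.sqrt(varianceX * varianceY);
    return denominator == 0.0 ? Double.NaN : covariance / denominator;
  }

  /** 1 / (1 + Euclidean distance): an assumed convention, not necessarily Mahout's formula. */
  static double euclideanSimilarity(double[] x, double[] y) {
    double sumOfSquares = 0.0;
    for (int i = 0; i < x.length; i++) {
      double diff = x[i] - y[i];
      sumOfSquares += diff * diff;
    }
    return 1.0 / (1.0 + Math.sqrt(sumOfSquares));
  }

  public static void main(String[] args) {
    double[] strictUser = {1.0, 2.0, 3.0};
    double[] generousUser = {3.0, 4.0, 5.0}; // same trend, shifted by +2
    System.out.println("Pearson:   " + pearson(strictUser, generousUser));
    System.out.println("Euclidean: " + euclideanSimilarity(strictUser, generousUser));
  }
}
```

Pearson treats the two users as perfectly correlated because it ignores the constant offset, while the Euclidean-based score stays low because the ratings are far apart in absolute terms.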
Components: UserNeighborhood
• Which definitions of neighborhood are available?
– Nearest N users
• The first N users with the highest similarity are labeled as ‘neighbors’
– Thresholds
• Users whose similarity is above a threshold are labeled as ‘neighbors’
– Already implemented!
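The two neighborhood definitions above can be sketched in plain Java as follows. The similarity scores of the other users with respect to a target user are assumed to be already computed; the class name, user IDs and scores are invented, and the threshold test is taken as "greater than or equal to", which is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of "nearest N" and "threshold" neighborhoods.
final class NeighborhoodSketch {

  /** IDs of the n users with the highest similarity to the target user. */
  static List<Long> nearestN(Map<Long, Double> similarities, int n) {
    List<Map.Entry<Long, Double>> entries = new ArrayList<>(similarities.entrySet());
    entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
    List<Long> neighbors = new ArrayList<>();
    for (int i = 0; i < Math.min(n, entries.size()); i++) {
      neighbors.add(entries.get(i).getKey());
    }
    return neighbors;
  }

  /** IDs of all users whose similarity reaches the threshold. */
  static List<Long> aboveThreshold(Map<Long, Double> similarities, double threshold) {
    List<Long> neighbors = new ArrayList<>();
    for (Map.Entry<Long, Double> entry : similarities.entrySet()) {
      if (entry.getValue() >= threshold) {
        neighbors.add(entry.getKey());
      }
    }
    return neighbors;
  }

  public static void main(String[] args) {
    // Similarity of users 2..5 to the target user (invented values).
    Map<Long, Double> similarities = new LinkedHashMap<>();
    similarities.put(2L, 0.9);
    similarities.put(3L, 0.4);
    similarities.put(4L, 0.7);
    similarities.put(5L, -0.2);
    System.out.println(nearestN(similarities, 2));         // prints [2, 4]
    System.out.println(aboveThreshold(similarities, 0.4)); // prints [2, 3, 4]
  }
}
```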
Components: Recommender
• Given a DataModel, a definition of similarity between
users (items) and a definition of neighborhood, a
recommender produces as output an estimation of
relevance for each unseen item
• Which recommendation algorithms are implemented?
– User-based CF
– Item-based CF
– SVD-based CF
(and much more…)
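For user-based CF, one common way to estimate the relevance of an unseen item is a similarity-weighted average of the ratings given by the neighbors. The sketch below assumes that formulation and positive similarities; Mahout's exact computation may differ, and the class name and numbers are invented.

```java
// Plain-Java sketch of a similarity-weighted rating estimate for one unseen item.
final class WeightedEstimateSketch {

  /**
   * Estimate = sum(similarity_i * rating_i) / sum(similarity_i),
   * where i ranges over the neighbors who rated the item.
   */
  static double estimate(double[] neighborSimilarities, double[] neighborRatings) {
    double weightedSum = 0.0;
    double totalSimilarity = 0.0;
    for (int i = 0; i < neighborSimilarities.length; i++) {
      weightedSum += neighborSimilarities[i] * neighborRatings[i];
      totalSimilarity += neighborSimilarities[i];
    }
    return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
  }

  public static void main(String[] args) {
    double[] similarities = {0.75, 0.25}; // two neighbors of the target user
    double[] ratings = {4.0, 2.0};        // their ratings for the unseen item
    System.out.println(estimate(similarities, ratings)); // prints 3.5
  }
}
```

The more similar neighbor dominates: the estimate (3.5) is closer to the rating 4.0 than to 2.0.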
Recap
• Many implementations of a CF-based recommender!
– Different recommendation algorithms
– Different neighborhood definitions
– Different similarity definitions
• Evaluating the different implementations is very time-consuming
– The strength of Mahout is that it saves time when evaluating the different combinations of parameters
– Standard interface for the evaluation of a Recommender System
Evaluation
• Mahout provides classes for the evaluation of
a recommender system
– Prediction-based measures
• MAE (Mean Absolute Error)
• RMSE (Root Mean Square Error)
– IR-based measures
• Precision, Recall, F1-measure, F1@n
• NDCG (ranking measure)
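The two prediction-based measures can be illustrated with their standard definitions: MAE is the mean of the absolute differences between predicted and actual ratings, RMSE is the square root of the mean squared difference. The class name and the ratings below are invented; this is a plain-Java sketch, not Mahout's evaluator.

```java
// Plain-Java sketch of MAE and RMSE over predicted vs. actual ratings.
final class ErrorMeasuresSketch {

  /** Mean Absolute Error. */
  static double mae(double[] predicted, double[] actual) {
    double sum = 0.0;
    for (int i = 0; i < predicted.length; i++) {
      sum += Math.abs(predicted[i] - actual[i]);
    }
    return sum / predicted.length;
  }

  /** Root Mean Square Error: penalizes large errors more than MAE does. */
  static double rmse(double[] predicted, double[] actual) {
    double sum = 0.0;
    for (int i = 0; i < predicted.length; i++) {
      double diff = predicted[i] - actual[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum / predicted.length);
  }

  public static void main(String[] args) {
    double[] predicted = {4.0, 3.0, 5.0};
    double[] actual = {5.0, 3.0, 3.0}; // errors: 1, 0, 2
    System.out.println("MAE  = " + mae(predicted, actual));
    System.out.println("RMSE = " + rmse(predicted, actual));
  }
}
```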
Evaluation
• Prediction-based Measures
– Class: AverageAbsoluteDifferenceRecommenderEvaluator
– Method: evaluate()
– Parameters:
• Recommender implementation
• DataModel implementation
• TrainingSet size (e.g. 70%)
• % of the data to use in the evaluation (smaller % for
fast prototyping)
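The last two parameters can be pictured with a simplified, seeded random split. This is only an illustration of the idea, not Mahout's code: here every preference is first kept or skipped according to the evaluation percentage, then assigned to training or test according to the training percentage. The class name, the parameter names and the seed are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Plain-Java sketch of a seeded training/test split over preference indexes.
final class SplitSketch {

  /** Holds the indexes assigned to each side of the split. */
  static final class Split {
    final List<Integer> training = new ArrayList<>();
    final List<Integer> test = new ArrayList<>();
  }

  static Split split(int numPreferences, double trainingPercentage,
                     double evaluationPercentage, long seed) {
    Random random = new Random(seed); // fixed seed: the same split on every run
    Split result = new Split();
    for (int i = 0; i < numPreferences; i++) {
      if (random.nextDouble() >= evaluationPercentage) {
        continue; // preference left out of this (faster) evaluation run
      }
      if (random.nextDouble() < trainingPercentage) {
        result.training.add(i);
      } else {
        result.test.add(i);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Split split = split(1000, 0.7, 1.0, 42L); // 70% training, whole dataset used
    System.out.println("training: " + split.training.size());
    System.out.println("test:     " + split.test.size());
  }
}
```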
Evaluation
• IR-based Measures
– Class: GenericRecommenderIRStatsEvaluator
– Method: evaluate()
– Parameters:
• Recommender implementation
• DataModel implementation
• Relevance Threshold (mean+standard deviation)
• % of the data to use in the evaluation (smaller % for
fast prototyping)
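The IR-based measures can be illustrated with their standard definitions, given a top-n recommendation list and the set of items considered relevant for the user (for instance those rated above the relevance threshold). The class name and the item IDs are invented; this is a plain-Java sketch, not Mahout's evaluator.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of precision, recall and F1 for one user's top-n list.
final class IrMeasuresSketch {

  /** Number of recommended items that are also relevant. */
  static int hits(List<Long> recommended, Set<Long> relevant) {
    int count = 0;
    for (Long item : recommended) {
      if (relevant.contains(item)) {
        count++;
      }
    }
    return count;
  }

  /** Fraction of the recommended items that are relevant. */
  static double precision(List<Long> recommended, Set<Long> relevant) {
    return (double) hits(recommended, relevant) / recommended.size();
  }

  /** Fraction of the relevant items that were recommended. */
  static double recall(List<Long> recommended, Set<Long> relevant) {
    return (double) hits(recommended, relevant) / relevant.size();
  }

  /** Harmonic mean of precision and recall; 0 when both are 0. */
  static double f1(double precision, double recall) {
    return precision + recall == 0.0 ? 0.0 : 2.0 * precision * recall / (precision + recall);
  }

  public static void main(String[] args) {
    List<Long> top4 = Arrays.asList(101L, 102L, 103L, 104L);
    Set<Long> relevant = new HashSet<>(Arrays.asList(102L, 104L, 105L));
    double p = precision(top4, relevant); // 2 hits out of 4 recommended
    double r = recall(top4, relevant);    // 2 hits out of 3 relevant
    System.out.println("precision = " + p);
    System.out.println("recall    = " + r);
    System.out.println("F1        = " + f1(p, r));
  }
}
```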
Part 2
How to use Mahout?
Hands-on
Download Mahout
• Download
– The latest Mahout release is 0.9
– Available at: http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9.zip
– Extract all the libraries and include them in a new NetBeans (or Eclipse) project
• Requirement: Java 1.6.x or greater.
• Hadoop is not mandatory!
Important
JavaDoc
https://builds.apache.org/job/Mahout-Quality/javadoc/
Exercise 1
• Create a Preference object
• Set preferences through some simple Java calls
• Print some statistics about preferences
– How many preferences? On which items?
– Whether a user has expressed a preference on a certain item.
– Which item has the highest score?
Hints
• Hints about objects to be used:
– Preference
• Methods: getUserID(), getItemID(), getValue()
– GenericUserPreferenceArray
• Dimension = number of preferences to be defined
• Methods: setUserID(), setItemID(), setValue(), getIDs(), sortByValueReversed(), hasPrefWithItemID(id)
Exercise 1: preferences
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;

class CreatePreferenceArray {

  private CreatePreferenceArray() {
  }

  public static void main(String[] args) {
    // Two preferences, both belonging to user 1
    PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);
    user1Prefs.setUserID(0, 1L);
    // Score 2 for item 101
    user1Prefs.setItemID(0, 101L);
    user1Prefs.setValue(0, 2.0f);
    // Score 3 for item 102
    user1Prefs.setItemID(1, 102L);
    user1Prefs.setValue(1, 3.0f);
    // Print the second preference (item 102)
    Preference pref = user1Prefs.get(1);
    System.out.println(pref);
  }
}
Exercise 2
• Create a DataModel
• Feed the DataModel through some simple Java calls
• Print some statistics about the data (how many users, how many items, maximum rating, etc.)
Exercise 2
• Hints about objects to be used:
– FastByIDMap
• A PreferenceArray stores the preferences of a single user
• Where are the preferences of all the users stored?
– A HashMap? No.
– Mahout introduces data structures optimized for recommendation tasks
– HashMap is replaced by FastByIDMap
– DataModel
• Methods: getNumItems(), getNumUsers(), getMaxPreference()
• General statistics about the model.
Exercise 2: data model
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

class CreateGenericDataModel {

  private CreateGenericDataModel() {
  }

  public static void main(String[] args) {
    // Maps each user ID to that user's preferences
    FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>();
    // User 1 expresses two preferences, so the array has size 2
    PreferenceArray prefsForUser1 = new GenericUserPreferenceArray(2);
    prefsForUser1.setUserID(0, 1L);
    prefsForUser1.setItemID(0, 101L);
    prefsForUser1.setValue(0, 3.0f);
    prefsForUser1.setItemID(1, 102L);
    prefsForUser1.setValue(1, 4.5f);
    preferences.put(1L, prefsForUser1);
    // Build the in-memory DataModel from the map
    DataModel model = new GenericDataModel(preferences);
    System.out.println(model);
  }
}
Exercise 3
• Create a DataModel
• Feed the DataModel through a CSV file
• Calculate similarities between users
– CSV file should contain enough data!
Exercise 3
• Hints about objects to be used:
– FileDataModel
• Argument: a new File with the path of the CSV
– PearsonCorrelationSimilarity, TanimotoCoefficientSimilarity, etc.
• Argument: the model
Exercise 3: similarity
import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

class Example3_Similarity {

  public static void main(String[] args) throws Exception {
    // FileDataModel: instantiate the DataModel from the CSV file
    DataModel model = new FileDataModel(new File("intro.csv"));
    // Similarity definitions
    UserSimilarity pearson = new PearsonCorrelationSimilarity(model);
    UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);
    // Output: similarity of user 1 with users 2 and 3
    System.out.println("Pearson:" + pearson.userSimilarity(1, 2));
    System.out.println("Euclidean:" + euclidean.userSimilarity(1, 2));
    System.out.println("Pearson:" + pearson.userSimilarity(1, 3));
    System.out.println("Euclidean:" + euclidean.userSimilarity(1, 3));
  }
}
Exercise 4
• Create a DataModel
• Feed the DataModel through a CSV file
• Calculate similarities between users
– CSV file should contain enough data!
• Generate the neighborhood
• Generate recommendations
– Compare different combinations of parameters!
Exercise 4
• Hints about objects to be used:
– NearestNUserNeighborhood
– GenericUserBasedRecommender
• Parameters:
– data model (already shown)
– neighborhood
» Class: NearestNUserNeighborhood(n, similarity, model)
» Class: ThresholdUserNeighborhood(thr, similarity, model)
– similarity measure (already shown)
Exercise 4: First Recommender
import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {

  private RecommenderIntro() {
  }

  public static void main(String[] args) throws Exception {
    // FileDataModel: load the preferences from the CSV file
    DataModel model = new FileDataModel(new File("intro.csv"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // Neighborhood made of the 2 most similar users
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
    Recommender recommender = new GenericUserBasedRecommender(
        model, neighborhood, similarity);
    // Top-1 recommendation for user 1
    List<RecommendedItem> recommendations = recommender.recommend(1, 1);
    for (RecommendedItem recommendation : recommendations) {
      System.out.println(recommendation);
    }
  }
}
Exercise 5: MovieLens Recommender
• Download the MovieLens 100k dataset from GroupLens
– Its format is already Mahout-compliant
– http://files.grouplens.org/datasets/movielens/ml-100k.zip
• Preparatory exercise: repeat Exercise 3 (similarity calculations) with a bigger dataset
• Next: now we can run the recommendation framework against a state-of-the-art dataset
import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {

  private RecommenderIntro() {
  }

  public static void main(String[] args) throws Exception {
    // ua.base is one of the files shipped with the MovieLens 100k dataset
    DataModel model = new FileDataModel(new File("ua.base"));
    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
    // Neighborhood made of the 100 most similar users
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
    Recommender recommender = new GenericUserBasedRecommender(
        model, neighborhood, similarity);
    // Top-20 recommendations for user 1
    List<RecommendedItem> recommendations = recommender.recommend(1, 20);
    for (RecommendedItem recommendation : recommendations) {
      System.out.println(recommendation);
    }
  }
}
We can play with parameters! For example, change the recommend() call to ask for the top 50 recommendations for user 10:
    List<RecommendedItem> recommendations = recommender.recommend(10, 50);
Exercise 5: MovieLens Recommender
• Analyze the Recommender's behavior with different combinations of parameters
– Do the recommendations change with a different similarity measure?
– Do the recommendations change with different neighborhood sizes?
– Which combination is the best one?
• Let's go to the next exercise!
Exercise 6: Recommender Evaluation
• Evaluate different CF recommender configurations on MovieLens data
• Metrics: RMSE, MAE, Precision
• Hints: useful classes
– Implementations of the RecommenderEvaluator interface
• AverageAbsoluteDifferenceRecommenderEvaluator
• RMSRecommenderEvaluator
• Further hints:
– Use RandomUtils.useTestSeed() to ensure consistency among different evaluation runs
– Invoke the evaluate() method
• Parameters
– RecommenderBuilder: builds the recommender (as in the previous exercises)
– DataModelBuilder: specific criterion for building the training DataModel (null in the examples)
– Training/test split: double value (e.g. 0.7 for 70%)
– Amount of data to use in the evaluation: double value (e.g. 1.0 for 100%)
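Exercise 6: evaluation
import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.*;
import org.apache.mahout.cf.taste.impl.eval.*;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import org.apache.mahout.common.RandomUtils;

class EvaluatorIntro {

  private EvaluatorIntro() {
  }

  public static void main(String[] args) throws Exception {
    // Ensures the consistency between different evaluation runs
    RandomUtils.useTestSeed();
    DataModel model = new FileDataModel(new File("ua.base"));
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    // Recommendation engine: build the same recommender as in the previous exercise
    RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel model) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
        return new GenericUserBasedRecommender(model, neighborhood, similarity);
      }
    };
    // 70% training, evaluation on the whole dataset
    double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
    System.out.println(score);
  }
}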
Exercise 6: evaluation
We can add more measures, e.g. RMSE next to MAE:
import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.*;
import org.apache.mahout.cf.taste.impl.eval.*;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import org.apache.mahout.common.RandomUtils;

class EvaluatorIntro {

  private EvaluatorIntro() {
  }

  public static void main(String[] args) throws Exception {
    RandomUtils.useTestSeed();
    DataModel model = new FileDataModel(new File("ua.base"));
    // One evaluator per measure
    RecommenderEvaluator maeEvaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
    // Build the same recommender for testing that we did last time:
    RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel model) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
        return new GenericUserBasedRecommender(model, neighborhood, similarity);
      }
    };
    double mae = maeEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
    double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
    System.out.println(mae);
    System.out.println(rmse);
  }
}
Exercise 7: item-based recommender
• Mahout provides Java classes for building an item-based recommender system
– Amazon-like
– Recommendations are based on similarities among items (generally pre-computed offline)
– Evaluate it with the MovieLens dataset!
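import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.*;
import org.apache.mahout.cf.taste.impl.eval.*;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import org.apache.mahout.common.RandomUtils;

class IREvaluatorIntro {

  private IREvaluatorIntro() {
  }

  public static void main(String[] args) throws Exception {
    RandomUtils.useTestSeed();
    DataModel model = new FileDataModel(new File("ua.base"));
    RecommenderEvaluator maeEvaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
    // Item-based recommender: no neighborhood, only an ItemSimilarity
    RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel model) throws TasteException {
        ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
        return new GenericItemBasedRecommender(model, similarity);
      }
    };
    double mae = maeEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
    double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
    System.out.println(mae);
    System.out.println(rmse);
  }
}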
Exercise 7: item-based recommender
• The code is the same as on the previous slide; two points to note:
– The similarity is declared as an ItemSimilarity (PearsonCorrelationSimilarity can be used both as a UserSimilarity and as an ItemSimilarity)
– No Neighborhood definition is needed for item-based recommenders
102
Exercise 8: MF-based recommender
• Mahout provides Java classes for building a CF
recommender system based on state-of-the-art
matrix factorization techniques
– Class: SVDRecommender
• Parameters: DataModel, Factorizer
• Factorizer: a factorization algorithm
– Alternating Least Squares (ALSWRFactorizer)
– SVD++ (SVDPlusPlusFactorizer)
– Stochastic Gradient Descent (ParallelSGDFactorizer)… etc
– Several parameters to tune!
– Evaluate it with the MovieLens dataset!
103
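As a rough intuition for what a Factorizer produces, the plain-Java sketch below learns one small vector of latent factors per user and per item, so that their dot product approximates the known ratings. It uses plain stochastic gradient descent with L2 regularization, which is a deliberately simpler technique than Mahout's ALSWRFactorizer; all names, the toy ratings and the learning rate are assumptions made for this example.

```java
import java.util.Random;

// Illustrative sketch only: matrix factorization trained with plain SGD.
// This is NOT the ALS-WR algorithm used by ALSWRFactorizer.
final class MatrixFactorizationSketch {
    private final double[][] userFactors;
    private final double[][] itemFactors;

    MatrixFactorizationSketch(int numUsers, int numItems, int numFactors, long seed) {
        Random random = new Random(seed);
        userFactors = smallRandomMatrix(numUsers, numFactors, random);
        itemFactors = smallRandomMatrix(numItems, numFactors, random);
    }

    private static double[][] smallRandomMatrix(int rows, int cols, Random random) {
        double[][] matrix = new double[rows][cols];
        for (double[] row : matrix) {
            for (int k = 0; k < cols; k++) {
                row[k] = 0.1 * random.nextDouble();
            }
        }
        return matrix;
    }

    // Predicted rating = dot product of the user and item factor vectors.
    double predict(int user, int item) {
        double sum = 0.0;
        for (int k = 0; k < userFactors[user].length; k++) {
            sum += userFactors[user][k] * itemFactors[item][k];
        }
        return sum;
    }

    // Each row of ratings is {userIndex, itemIndex, rating}.
    void train(double[][] ratings, double learningRate, double lambda, int iterations) {
        for (int iteration = 0; iteration < iterations; iteration++) {
            for (double[] r : ratings) {
                int user = (int) r[0];
                int item = (int) r[1];
                double error = r[2] - predict(user, item);
                for (int k = 0; k < userFactors[user].length; k++) {
                    double pu = userFactors[user][k];
                    double qi = itemFactors[item][k];
                    // Gradient step on the squared error plus the L2 penalty.
                    userFactors[user][k] += learningRate * (error * qi - lambda * pu);
                    itemFactors[item][k] += learningRate * (error * pu - lambda * qi);
                }
            }
        }
    }

    // Root mean squared error on the given ratings.
    double rmse(double[][] ratings) {
        double sum = 0.0;
        for (double[] r : ratings) {
            double diff = r[2] - predict((int) r[0], (int) r[1]);
            sum += diff * diff;
        }
        return Math.sqrt(sum / ratings.length);
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {0, 0, 5}, {0, 1, 3}, {1, 0, 4}, {1, 2, 1}, {2, 1, 2}, {2, 2, 5},
        };
        MatrixFactorizationSketch mf = new MatrixFactorizationSketch(3, 3, 2, 42L);
        System.out.println("RMSE before training: " + mf.rmse(ratings));
        mf.train(ratings, 0.01, 0.065, 200);
        System.out.println("RMSE after training:  " + mf.rmse(ratings));
    }
}
```

The number of factors, the regularization weight lambda and the number of iterations in this sketch play the same role as the parameters to tune mentioned on the slide.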
Exercise 8: MF-based recommender
class IREvaluatorIntro {
  private IREvaluatorIntro() {
  }
  public static void main(String[] args) throws Exception {
    RandomUtils.useTestSeed();
    DataModel model = new FileDataModel(new File("ua.base"));
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
    // Build a matrix-factorization recommender on top of an ALS-WR factorizer
    RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel model) throws TasteException {
        Factorizer factorizer = new ALSWRFactorizer(model, 10, 0.065, 60);
        return new SVDRecommender(model, factorizer);
      }
    };
    double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
    double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
    System.out.println(score);
    System.out.println(rmse);
  }
}
104
Exercise 8: MF-based recommender
• Same code as on the previous slide
• Hyperparameters passed to ALSWRFactorizer(model, 10, 0.065, 60):
– latent factors (10)
– lambda (0.065)
– iterations (60)
105
Mahout Strengths
• Fast-prototyping and evaluation
– To evaluate a different configuration of the same
algorithm we just need to update a parameter and
run again.
– Example
• Different Neighborhood Size
• Different similarity measures, etc.
106
5 minutes to look for the best configuration
107
Exercise 9: Recommender Evaluation
• Evaluation of CF algorithms through IR
measures
• Metrics: Precision, Recall
• Hints: useful classes
– GenericRecommenderIRStatsEvaluator
– evaluate() method
• Takes the same RecommenderBuilder and DataModel as in exercises 6 and 7,
plus the cutoff and the relevance threshold
109
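Before calling the Mahout evaluator, it may help to see what the two metrics measure. The plain-Java sketch below computes precision and recall at a cutoff N from a ranked recommendation list and a set of relevant items. Class and method names are invented for this example, and the choice of dividing by the number of items actually recommended (when fewer than N are available) is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: precision@N and recall@N for a single user.
final class IRMetricsSketch {
    private IRMetricsSketch() {
    }

    // Number of relevant items among the first 'cutoff' recommendations.
    private static int hits(List<Long> recommended, Set<Long> relevant, int cutoff) {
        int count = 0;
        for (int i = 0; i < cutoff; i++) {
            if (relevant.contains(recommended.get(i))) {
                count++;
            }
        }
        return count;
    }

    // Fraction of the top-N recommendations that are relevant.
    static double precisionAt(List<Long> recommended, Set<Long> relevant, int n) {
        int cutoff = Math.min(n, recommended.size());
        if (cutoff == 0) {
            return 0.0;
        }
        return (double) hits(recommended, relevant, cutoff) / cutoff;
    }

    // Fraction of the relevant items that appear in the top-N recommendations.
    static double recallAt(List<Long> recommended, Set<Long> relevant, int n) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int cutoff = Math.min(n, recommended.size());
        return (double) hits(recommended, relevant, cutoff) / relevant.size();
    }

    // Harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        return precision + recall == 0.0 ? 0.0 : 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Made-up item IDs, only to exercise the methods.
        List<Long> recommended = Arrays.asList(10L, 20L, 30L, 40L, 50L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(20L, 50L, 60L, 70L));
        double precision = precisionAt(recommended, relevant, 5);
        double recall = recallAt(recommended, relevant, 5);
        System.out.println("Precision@5 = " + precision);
        System.out.println("Recall@5    = " + recall);
        System.out.println("F1          = " + f1(precision, recall));
    }
}
```

In a full evaluation these per-user values are averaged over all the users for whom recommendations can be produced.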
Exercise 9: IR-based evaluation
class IREvaluatorIntro {
  private IREvaluatorIntro() {
  }
  public static void main(String[] args) throws Exception {
    RandomUtils.useTestSeed();
    DataModel model = new FileDataModel(new File("ua.base"));
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    // Build the same recommender for testing that we did last time:
    RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
      @Override
      public Recommender buildRecommender(DataModel model) throws TasteException {
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
        return new GenericUserBasedRecommender(model, neighborhood, similarity);
      }
    };
    // at = 5: precision and recall are computed on the top 5 recommendations;
    // the last argument (1.0) evaluates on 100% of the users
    IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
        GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    System.out.println(stats.getPrecision());
    System.out.println(stats.getRecall());
    System.out.println(stats.getF1Measure());
  }
}
Precision@5, Recall@5, etc.
110
Exercise 9: IR-based evaluation
• Set the neighborhood size to 500; the rest of the code is unchanged:
UserNeighborhood neighborhood = new NearestNUserNeighborhood(500, similarity, model);
112
Exercise 9: IR-based evaluation
• Switch the similarity measure to Euclidean distance (neighborhood size still 500); the rest of the code is unchanged:
UserSimilarity similarity = new EuclideanDistanceSimilarity(model);
113
Exercise 10: Recommender Evaluation
• Write a class that automatically runs
evaluation with different parameters
– e.g. fixed neighborhood sizes from an array of
values
– Print the best scores and the configuration
114
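One possible shape for such a class is sketched below in plain Java. The search loop is generic: it takes the candidate neighborhood sizes and a function that returns an error score for a given size. In the exercise that function would build a recommender with that neighborhood size and call evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0); here a made-up formula stands in for it, so the class name, the stand-in function and the printed values are assumptions of this sketch, not real results.

```java
import java.util.function.IntToDoubleFunction;

// Illustrative sketch only: try several neighborhood sizes and keep the best one.
final class NeighborhoodGridSearch {
    private NeighborhoodGridSearch() {
    }

    // Returns the candidate size with the lowest score (lower error = better).
    // NaN scores (no estimate available) never beat a real score.
    static int bestNeighborhoodSize(int[] candidateSizes, IntToDoubleFunction scoreForSize) {
        if (candidateSizes.length == 0) {
            throw new IllegalArgumentException("at least one candidate size is required");
        }
        int bestSize = candidateSizes[0];
        double bestScore = Double.POSITIVE_INFINITY;
        for (int size : candidateSizes) {
            double score = scoreForSize.applyAsDouble(size);
            System.out.println("neighborhood=" + size + " score=" + score);
            if (score < bestScore) {
                bestScore = score;
                bestSize = size;
            }
        }
        System.out.println("best: neighborhood=" + bestSize + " score=" + bestScore);
        return bestSize;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 50, 100, 200, 500};
        // Hypothetical stand-in for a real evaluation run; replace it with a
        // call to the Mahout evaluator that uses 'size' in the RecommenderBuilder.
        IntToDoubleFunction standInEvaluation = size -> 0.8 + Math.abs(size - 100) / 1000.0;
        bestNeighborhoodSize(sizes, standInEvaluation);
    }
}
```

The same loop can be nested to cover a second parameter, for instance a list of similarity measures.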
Exercise 11: more datasets!
• Find the best configuration for several
datasets
– Download datasets from
http://mahout.apache.org/users/basics/collections.html
– Write classes to transform input data into a
Mahout-compliant form
– Extend exercise 10!
115
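As a starting point for the transformation step, the plain-Java sketch below rewrites a ratings file whose fields are separated by an arbitrary delimiter into the comma-separated userID,itemID,preference lines that FileDataModel reads. The "::" delimiter in the usage example is the one used by the MovieLens 1M ratings file; the class and method names and the file names are invented for this example. The sketch assumes the first three fields are user, item and rating, and that the IDs are already numeric.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Illustrative sketch only: convert delimited rating lines to "user,item,rating".
final class RatingsConverter {
    private RatingsConverter() {
    }

    // Converts a single line; any field after the third one is dropped.
    static String toMahoutCsv(String line, String delimiter) {
        String[] fields = line.trim().split(Pattern.quote(delimiter));
        if (fields.length < 3) {
            throw new IllegalArgumentException("expected user, item and rating: " + line);
        }
        return fields[0].trim() + "," + fields[1].trim() + "," + fields[2].trim();
    }

    // Converts a whole file, skipping blank lines.
    static void convert(Path input, Path output, String delimiter) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                writer.write(toMahoutCsv(line, delimiter));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names; pass your own paths as arguments.
        Path input = Paths.get(args.length > 0 ? args[0] : "ratings.dat");
        Path output = Paths.get(args.length > 1 ? args[1] : "ratings.csv");
        convert(input, output, "::");
    }
}
```

Datasets with non-numeric user or item identifiers would need an extra step that maps each identifier to a numeric ID before writing the file.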
End. Do you want more?
116
Do you want more?
• Recommendation
– Deployment of a Mahout-based Web Recommender
– Integration with Hadoop/Spark
– Integration of content-based information
– Custom similarities, custom recommenders,
re-scoring functions
• Content-based Recommender Systems
through Classification Algorithms!
117

Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...Cataldo Musto
 
Il Linguaggio dell'Odio sui Social Network
Il Linguaggio dell'Odio sui Social NetworkIl Linguaggio dell'Odio sui Social Network
Il Linguaggio dell'Odio sui Social NetworkCataldo Musto
 




Mahout Tutorial and Hands-on (version 2015)

  • 1. Apache Mahout – Tutorial (2015) Cataldo Musto, Ph.D. Corso di Accesso Intelligente all’Informazione ed Elaborazione del Linguaggio Naturale Università degli Studi di Bari – Dipartimento di Informatica – A.A. 2014/2015 07/01/2015
  • 2. Outline • What is Mahout? – Overview • How to use Mahout? – Hands-on session
  • 3. Part 1: What is Mahout?
  • 4. What is (a) Mahout? An elephant driver
  • 5. What is (a) Mahout? • Mahout is a Java library – Implementing Machine Learning techniques
  • 6. What is (a) Mahout? • Mahout is a Java library – Implementing Machine Learning techniques • Clustering • Classification • Recommendation • Frequent ItemSet
  • 7. What is (a) Mahout? • Mahout is a Java library – Implementing Machine Learning techniques • Clustering • Classification • Recommendation • Frequent ItemSet (removed)
  • 8. What can we do? • Currently Mahout supports mainly three use cases (frequent itemset mining has been removed): – Recommendation - takes users' behavior and tries to find items users might like. – Clustering - takes e.g. text documents and groups them into groups of topically related documents. – Classification - learns from existing categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category.
  • 9. Why Mahout? • Mahout is not the only ML framework – Weka (http://www.cs.waikato.ac.nz/ml/weka/) – R (http://www.r-project.org/) • Why do we prefer Mahout?
  • 10. Why Mahout? • Why do we prefer Mahout? – Apache License – Good Community – Good Documentation
  • 11. Why Mahout? • Why do we prefer Mahout? – Apache License – Good Community – Good Documentation – Scalable
  • 12. Why Mahout? • Why do we prefer Mahout? – Apache License – Good Community – Good Documentation – Scalable • Based on Hadoop (not mandatory!)
  • 13. Why do we need a scalable framework? Big Data!
  • 14. When do we need a scalable framework? (e.g. Recommendation Task) Over 100m user-preferences connections http://mahout.apache.org/users/recommender/recommender-first-timer-faq.html
  • 17. Use Cases: User Interest Modeling on Twitter
  • 18. Use Cases: Pattern Mining on Yahoo! (as anti-spam)
  • 19. Algorithms • Recommendation – User-based Collaborative Filtering – Item-based Collaborative Filtering – Matrix Factorization-based CF • Several factorization techniques
  • 20. Algorithms • Clustering – K-Means – Fuzzy K-Means – Streaming K-Means – etc. • Topic Modeling – LDA (Latent Dirichlet Allocation)
  • 21. Algorithms • Classification – Logistic Regression – Bayes (only on Hadoop, from release 0.9) – Random Forests (only on Hadoop, from release 0.9) – Hidden Markov Models – Perceptrons
  • 22. Algorithms • Other – Text processing • Creation of sparse vectors from text – Dimensionality Reduction techniques • Principal Component Analysis (PCA) • Singular Value Decomposition (SVD) – (and much more.)
  • 23. Mahout in the Apache Software Foundation
  • 24. Mahout in the Apache Software Foundation: Original Mahout Project
  • 25. Mahout in the Apache Software Foundation: Taste, a collaborative filtering framework
  • 26. Mahout in the Apache Software Foundation: Lucene, an information retrieval software library
  • 27. Mahout in the Apache Software Foundation: Hadoop, a framework for distributed storage and programming based on MapReduce
  • 28. Next Releases: Watch out! Hadoop is going to be replaced by Apache Spark (http://spark.apache.org)
  • 29. General Architecture: three-tier architecture (Application, Algorithms and Shared Libraries)
  • 31. General Architecture: Data Storage and Shared Libraries
  • 33. In this tutorial we will focus on Recommendation
  • 34. Recommendation • Mahout implements a Collaborative Filtering framework – Uses historical data (ratings, clicks, and purchases) to provide recommendations • User-based: recommend items by finding similar users. This is often harder to scale because of the dynamic nature of users; • Item-based: calculate similarity between items and make recommendations. Items usually don't change much, so this often can be computed offline; – Popularized by Amazon and others • Matrix factorization-based: split the original user-item matrix into smaller matrices in order to analyze item rating patterns and learn latent factors that explain users' behavior and item characteristics. – Popularized by the Netflix Prize
  • 36. Recommendation Workflow. Basic idea: a Java/J2EE application invokes a Mahout Recommender whose DataModel is based on a set of user preferences built on top of a physical data store
  • 37. Recommendation Workflow: Physical Storage (database, files, etc.)
  • 38. Recommendation Workflow: Physical Storage (database, files, etc.) → Data Model
  • 39. Recommendation Workflow: Physical Storage (database, files, etc.) → Data Model → Recommender
  • 40. Recommendation Workflow: Physical Storage (database, files, etc.) → Data Model → Recommender → External Application
  • 41. Recommendation in Mahout • Input: raw data (user preferences) • Output: preference estimations • Step 1 – Mapping raw data into a Mahout-compliant DataModel • Step 2 – Tuning recommender components • Similarity measure, neighborhood, etc. • Step 3 – Computing rating estimations • Step 4 – Evaluating the recommendations
  • 42. Recommendation Components • Mahout's key abstractions are implemented through Java interfaces: – DataModel interface • Methods for mapping raw data to a Mahout-compliant form – UserSimilarity interface • Methods to calculate the degree of correlation between two users – ItemSimilarity interface • Methods to calculate the degree of correlation between two items – UserNeighborhood interface • Methods to define the concept of ‘neighborhood’ – Recommender interface • Methods to implement the recommendation step itself
  • 43. Recommendation Components • Mahout's key abstractions are implemented through Java interfaces: – example: DataModel interface • Each way of mapping raw data to a Mahout-compliant form is an implementation of the generic interface • e.g. MySQLJDBCDataModel feeds a DataModel from a MySQL database • (and so on)
  • 44. Components: DataModel • A DataModel is the interface used to draw information about user preferences. • Which sources can it draw from? – Database • MySQLJDBCDataModel, PostgreSQLJDBCDataModel • NoSQL databases supported: MongoDBDataModel, CassandraDataModel – External Files • FileDataModel – Generic (preferences fed directly through Java code) • GenericDataModel (They are all implementations of the DataModel interface)
  • 45. Components: DataModel • GenericDataModel – Fed through Java calls • FileDataModel – CSV (Comma Separated Values) • JDBCDataModel – JDBC driver – Standard database structure
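The slide does not show what the CSV file looks like. A minimal plain-Java sketch, assuming one `userID,itemID,value` triple per line (the usual layout for preference files) and hypothetical sample data, shows how such a file maps onto per-user preferences. It does not use Mahout classes; the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvPreferencesSketch {

    /** Parses "userID,itemID,value" lines into userID -> (itemID -> value). */
    static Map<Long, Map<Long, Float>> parse(List<String> lines) {
        Map<Long, Map<Long, Float>> byUser = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] fields = trimmed.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            float value = Float.parseFloat(fields[2].trim());
            // Group the (item, value) pairs under their user.
            byUser.computeIfAbsent(userId, k -> new LinkedHashMap<>()).put(itemId, value);
        }
        return byUser;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from the tutorial.
        List<String> sample = Arrays.asList("1,101,5.0", "1,102,3.0", "2,101,2.0");
        Map<Long, Map<Long, Float>> prefs = parse(sample);
        System.out.println("users: " + prefs.size());
        System.out.println("user 1: " + prefs.get(1L));
    }
}
```

In Mahout itself this grouping is handled by FileDataModel; the sketch only makes the expected shape of the data concrete.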
  • 47. Components: DataModel • Regardless of the source, they all share a common implementation. • Basic object: Preference – A Preference is a triple (user, item, score) – Stored in a UserPreferenceArray
  • 48. Components: DataModel • Basic object: Preference – A Preference is a triple (user, item, score) – Stored in a UserPreferenceArray • Two implementations – GenericUserPreferenceArray • It also stores the numerical preference values. – BooleanUserPreferenceArray • It skips the numerical preference values.
  • 49. Components: UserSimilarity • UserSimilarity defines a notion of similarity between two users. – (respectively) ItemSimilarity defines a notion of similarity between two items. • Which definitions of similarity are available? – Pearson Correlation – Spearman Correlation – Euclidean Distance – Tanimoto Coefficient – LogLikelihood Similarity – Already implemented!
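To make three of the listed measures concrete, here is a plain-Java sketch of their textbook definitions. It is an illustration, not Mahout's implementation: the class name is hypothetical, ratings are passed as simple maps, and the Euclidean variant uses one common convention, 1 / (1 + d), which may differ from the scaling Mahout applies.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SimilaritySketch {

    /** Items rated by both users. */
    static Set<Long> coRated(Map<Long, Double> a, Map<Long, Double> b) {
        Set<Long> common = new HashSet<>(a.keySet());
        common.retainAll(b.keySet());
        return common;
    }

    /** Pearson correlation over co-rated items; NaN when it is undefined. */
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        Set<Long> common = coRated(a, b);
        int n = common.size();
        if (n < 2) {
            return Double.NaN;
        }
        double meanA = 0, meanB = 0;
        for (Long item : common) {
            meanA += a.get(item);
            meanB += b.get(item);
        }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA;
            double db = b.get(item) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        if (varA == 0 || varB == 0) {
            return Double.NaN; // a user with constant ratings has no correlation
        }
        return cov / Math.sqrt(varA * varB);
    }

    /** Euclidean distance d over co-rated items, mapped to (0, 1] as 1 / (1 + d). */
    static double euclidean(Map<Long, Double> a, Map<Long, Double> b) {
        Set<Long> common = coRated(a, b);
        if (common.isEmpty()) {
            return Double.NaN;
        }
        double sumSquares = 0;
        for (Long item : common) {
            double diff = a.get(item) - b.get(item);
            sumSquares += diff * diff;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSquares));
    }

    /** Tanimoto coefficient: |A ∩ B| / |A ∪ B|, ignoring rating values. */
    static double tanimoto(Set<Long> itemsA, Set<Long> itemsB) {
        Set<Long> union = new HashSet<>(itemsA);
        union.addAll(itemsB);
        if (union.isEmpty()) {
            return Double.NaN;
        }
        Set<Long> intersection = new HashSet<>(itemsA);
        intersection.retainAll(itemsB);
        return (double) intersection.size() / union.size();
    }
}
```

Pearson and Euclidean use the rating values, while Tanimoto only looks at which items were rated, so it also suits boolean preference data.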
  • 52. Different similarity definitions influence neighborhood formation
  • 56. Components: UserNeighborhood • Which definitions of neighborhood are available? – Nearest N users • The N users with the highest similarity are labeled as ‘neighbors’ – Thresholds • Users whose similarity is above a threshold are labeled as ‘neighbors’ – Already implemented!
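The two neighborhood definitions can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Mahout's code: the input is a hypothetical map from each candidate user to their similarity with the target user, undefined (NaN) similarities are skipped, and the threshold variant keeps users at or above the threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class NeighborhoodSketch {

    /** Returns the IDs of the n users with the highest similarity, best first. */
    static List<Long> nearestN(Map<Long, Double> similarities, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            if (!Double.isNaN(e.getValue())) {
                entries.add(e);
            }
        }
        // Sort by similarity, descending.
        entries.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        List<Long> neighbors = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            neighbors.add(entries.get(i).getKey());
        }
        return neighbors;
    }

    /** Returns the IDs of all users whose similarity is at least the threshold. */
    static List<Long> aboveThreshold(Map<Long, Double> similarities, double threshold) {
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            double sim = e.getValue();
            if (!Double.isNaN(sim) && sim >= threshold) {
                neighbors.add(e.getKey());
            }
        }
        return neighbors;
    }
}
```

Nearest-N always yields a fixed-size neighborhood (when enough users exist), whereas the threshold variant may return many neighbors or none, depending on how similar the other users are.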
  • 57. Components: Recommender • Given a DataModel, a definition of similarity between users (items) and a definition of neighborhood, a recommender produces as output an estimation of relevance for each unseen item • Which recommendation algorithms are implemented? – User-based CF – Item-based CF – SVD-based CF (and much more…)
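For user-based CF, the estimation step is commonly formulated as a similarity-weighted average of the neighbors' ratings. The plain-Java sketch below shows that formulation; it is an assumption about the general technique, not a copy of Mahout's recommender, and all names are hypothetical.

```java
import java.util.Map;

class UserBasedEstimateSketch {

    /**
     * Estimates the target user's rating for an item as the similarity-weighted
     * average of the ratings given by the neighbors who rated that item.
     *
     * @param neighborSims neighbor userID -> similarity with the target user
     * @param ratings      userID -> (itemID -> rating)
     * @return the estimate, or NaN if no neighbor rated the item
     */
    static double estimate(Map<Long, Double> neighborSims,
                           Map<Long, Map<Long, Double>> ratings,
                           long itemId) {
        double weightedSum = 0;
        double totalWeight = 0;
        for (Map.Entry<Long, Double> e : neighborSims.entrySet()) {
            Map<Long, Double> neighborRatings = ratings.get(e.getKey());
            Double rating = neighborRatings == null ? null : neighborRatings.get(itemId);
            if (rating == null) {
                continue; // this neighbor did not rate the item
            }
            weightedSum += e.getValue() * rating;
            totalWeight += Math.abs(e.getValue());
        }
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }
}
```

For example, with neighbors of similarity 0.5 and 1.0 who rated the item 4.0 and 1.0, the estimate is (0.5 × 4.0 + 1.0 × 1.0) / 1.5 = 2.0: the more similar neighbor pulls the estimate towards its own rating.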
  • 58. Recap • Many implementations of a CF-based recommender! – Different recommendation algorithms – Different neighborhood definitions – Different similarity definitions • Evaluating the different implementations is very time-consuming – A strength of Mahout is that it saves time when evaluating the different combinations of parameters! – Standard interface for the evaluation of a Recommender System
  • 59. Evaluation • Mahout provides classes for the evaluation of a recommender system – Prediction-based measures • MAE (Mean Absolute Error) • RMSE (Root Mean Square Error) – IR-based measures • Precision, Recall, F1-measure, F1@n • NDCG (ranking measure)
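The two prediction-based measures are simple to state in code. The following plain-Java sketch computes them from parallel arrays of predicted and actual ratings; the class name and the array-based input are illustrative choices, not Mahout's evaluator API.

```java
class ErrorMetricsSketch {

    /** Mean Absolute Error: average of |predicted - actual|. */
    static double mae(double[] predicted, double[] actual) {
        double sum = 0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    /** Root Mean Square Error: square root of the average squared error. */
    static double rmse(double[] predicted, double[] actual) {
        double sum = 0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / predicted.length);
    }

    public static void main(String[] args) {
        double[] predicted = {3.0, 4.0, 5.0};
        double[] actual = {2.0, 4.0, 7.0};
        System.out.println("MAE:  " + mae(predicted, actual));
        System.out.println("RMSE: " + rmse(predicted, actual));
    }
}
```

Because RMSE squares each error before averaging, it penalizes large mistakes more heavily than MAE. For both measures, lower is better.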
  • 60. Evaluation • Prediction-based Measures – Class: AverageAbsoluteDifferenceRecommenderEvaluator – Method: evaluate() – Parameters: • Recommender implementation • DataModel implementation • Training set size (e.g. 70%) • % of the data to use in the evaluation (smaller % for fast prototyping)
  • 61. Evaluation • IR-based Measures – Class: GenericRecommenderIRStatsEvaluator – Method: evaluate() – Parameters: • Recommender implementation • DataModel implementation • Relevance threshold (mean + standard deviation) • % of the data to use in the evaluation (smaller % for fast prototyping)
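The IR-based measures depend on deciding which items count as relevant. The plain-Java sketch below illustrates the "mean + standard deviation" threshold named on the slide, together with precision and recall over the top n recommendations. It is an illustration only: the names are hypothetical, and the use of the population standard deviation is an assumption that may differ from what Mahout computes internally.

```java
import java.util.List;
import java.util.Set;

class IrMetricsSketch {

    /** Relevance threshold for one user: mean + (population) standard deviation of their ratings. */
    static double relevanceThreshold(double[] ratings) {
        double mean = 0;
        for (double r : ratings) {
            mean += r;
        }
        mean /= ratings.length;
        double variance = 0;
        for (double r : ratings) {
            variance += (r - mean) * (r - mean);
        }
        variance /= ratings.length;
        return mean + Math.sqrt(variance);
    }

    /** Number of relevant items among the first n recommendations. */
    static int hitsAtN(List<Long> recommended, Set<Long> relevant, int n) {
        int hits = 0;
        for (int i = 0; i < Math.min(n, recommended.size()); i++) {
            if (relevant.contains(recommended.get(i))) {
                hits++;
            }
        }
        return hits;
    }

    /** Precision@n: fraction of the top-n recommendations that are relevant. */
    static double precisionAtN(List<Long> recommended, Set<Long> relevant, int n) {
        int considered = Math.min(n, recommended.size());
        return considered == 0 ? 0.0 : (double) hitsAtN(recommended, relevant, n) / considered;
    }

    /** Recall@n: fraction of the relevant items that appear in the top n. */
    static double recallAtN(List<Long> recommended, Set<Long> relevant, int n) {
        return relevant.isEmpty() ? 0.0 : (double) hitsAtN(recommended, relevant, n) / relevant.size();
    }
}
```

With this threshold, only items a user rated clearly above their own average count as relevant, which makes the measure less sensitive to users who rate everything high or everything low.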
  • 62. Part 2: How to use Mahout? Hands-on
  • 63. Download Mahout • Download – The latest Mahout release is 0.9 – Available at: http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9.zip – Extract all the libraries and include them in a new NetBeans (or Eclipse) project • Requirement: Java 1.6.x or greater. • Hadoop is not mandatory!
  • 65. Exercise 1 • Create a Preference object • Set preferences through some simple Java calls • Print some statistics about the preferences – How many preferences? On which items? – Whether a user has expressed a preference on a certain item. – Which item has the highest score?
  • 66. Hints • Hints about objects to be used: – Preference (set through a PreferenceArray) • Methods: setUserID(), setItemID(), setValue(); – GenericUserPreferenceArray • Dimension = number of preferences to be defined; • Methods: getIDs(), sortByValueReversed(), hasPrefWithItemID(id);
  • 67. Exercise 1: preferences
        import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
        import org.apache.mahout.cf.taste.model.Preference;
        import org.apache.mahout.cf.taste.model.PreferenceArray;

        class CreatePreferenceArray {
            private CreatePreferenceArray() { }

            public static void main(String[] args) {
                // Array holding the two preferences of user 1
                PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);
                user1Prefs.setUserID(0, 1L);
                // First preference: score 2 for item 101
                user1Prefs.setItemID(0, 101L);
                user1Prefs.setValue(0, 2.0f);
                // Second preference: score 3 for item 102
                user1Prefs.setItemID(1, 102L);
                user1Prefs.setValue(1, 3.0f);
                // Read back and print the second preference
                Preference pref = user1Prefs.get(1);
                System.out.println(pref);
            }
        }
  • 68. Exercise 1: preferences (same code as slide 67, with a callout on the first preference: score 2 for item 101)
  • 69. Exercise 2 • Create a DataModel • Feed the DataModel through some simple Java calls • Print some statistics about the data (how many users, how many items, maximum rating, etc.)
  • 70. Exercise 2 • Hints about objects to be used: – FastByIDMap • A PreferenceArray stores the preferences of a single user • Where are the preferences of all the users stored? – A HashMap? No. – Mahout introduces data structures optimized for recommendation tasks – HashMap is replaced by FastByIDMap – DataModel • Methods: getNumItems(), getNumUsers(), getMaxPreference() • General statistics about the model.
  • 71. Exercise 2: data model
        import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
        import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
        import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
        import org.apache.mahout.cf.taste.model.DataModel;
        import org.apache.mahout.cf.taste.model.PreferenceArray;

        class CreateGenericDataModel {
            private CreateGenericDataModel() { }

            public static void main(String[] args) {
                // Maps each user ID to that user's preferences
                FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>();
                // Size the array to the number of preferences actually set (2),
                // so that no empty placeholder entries end up in the model
                PreferenceArray prefsForUser1 = new GenericUserPreferenceArray(2);
                prefsForUser1.setUserID(0, 1L);
                prefsForUser1.setItemID(0, 101L);
                prefsForUser1.setValue(0, 3.0f);
                prefsForUser1.setItemID(1, 102L);
                prefsForUser1.setValue(1, 4.5f);
                preferences.put(1L, prefsForUser1);
                // Build the in-memory DataModel from the map
                DataModel model = new GenericDataModel(preferences);
                System.out.println(model);
            }
        }
  • 72. Exercise 3 • Create a DataModel • Feed the DataModel through a CSV file • Calculate similarities between users – CSV file should contain enough data!
  • 73. Exercise 3 • Hints about objects to be used: – FileDataModel • Argument: a java.io.File pointing to the CSV file – PearsonCorrelationSimilarity, TanimotoCoefficientSimilarity, etc. • Argument: the model
  • 74. Exercise 3: similarity
      import java.io.File;
      import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.DataModel;
      import org.apache.mahout.cf.taste.similarity.UserSimilarity;

      class Example3_Similarity {

        public static void main(String[] args) throws Exception {
          // Instantiate the DataModel from the CSV file
          DataModel model = new FileDataModel(new File("intro.csv"));
          // Two similarity measures defined over the same model
          UserSimilarity pearson = new PearsonCorrelationSimilarity(model);
          UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);
          // Compare user 1 with users 2 and 3
          System.out.println("Pearson:" + pearson.userSimilarity(1, 2));
          System.out.println("Euclidean:" + euclidean.userSimilarity(1, 2));
          System.out.println("Pearson:" + pearson.userSimilarity(1, 3));
          System.out.println("Euclidean:" + euclidean.userSimilarity(1, 3));
        }
      }
  • 75. Exercise 3: similarity (same code as slide 74). Callout: FileDataModel, on the line new FileDataModel(new File("intro.csv")), loads the preferences from the CSV file.
  • 76. Exercise 3: similarity (same code as slide 74). Callout: similarity definitions, on the two lines that create PearsonCorrelationSimilarity and EuclideanDistanceSimilarity from the model.
  • 77. Exercise 3: similarity (same code as slide 74). Callout: output, on the four System.out.println calls that print the similarity of user 1 to users 2 and 3.
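To make the similarity step of Exercise 3 concrete, here is a minimal plain-Java sketch of a Pearson correlation computed over the items that two users have both rated. It is not Mahout's implementation: the class name PearsonSketch, the Map-based representation of a user's ratings and the toy ratings in main are all assumptions made for illustration, and returning NaN for undefined cases is a choice of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

public class PearsonSketch {

    // Pearson correlation restricted to the items rated by both users.
    // This sketch returns NaN when fewer than two items overlap
    // or when one of the two users has zero variance on the overlap.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // item not rated by the second user
            }
            double x = e.getValue();
            double y = other;
            n++;
            sumA += x;
            sumB += y;
            sumAA += x * x;
            sumBB += y * y;
            sumAB += x * y;
        }
        if (n < 2) {
            return Double.NaN;
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        double denom = Math.sqrt(varA * varB);
        return denom == 0.0 ? Double.NaN : cov / denom;
    }

    public static void main(String[] args) {
        // Hypothetical ratings: item ID -> rating
        Map<Long, Double> user1 = new HashMap<>();
        user1.put(101L, 1.0);
        user1.put(102L, 2.0);
        user1.put(103L, 3.0);
        Map<Long, Double> user2 = new HashMap<>();
        user2.put(101L, 2.0);
        user2.put(102L, 4.0);
        user2.put(103L, 6.0);
        user2.put(104L, 1.0); // not rated by user1, so it is ignored
        System.out.println("Pearson: " + pearson(user1, user2));
    }
}
```

Only co-rated items contribute to the score, which is one reason the exercise asks for a CSV file with enough data: with little overlap between users the correlation is undefined or unstable.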
  • 78. Exercise 4 • Create a DataModel • Feed the DataModel through a CSV file • Calculate similarities between users – CSV file should contain enough data! • Generate the neighborhood • Generate recommendations
  • 79. Exercise 4 • Create a DataModel • Feed the DataModel through a CSV file • Calculate similarities between users – CSV file should contain enough data! • Generate the neighborhood • Generate recommendations – Compare different combinations of parameters!
  • 81. Exercise 4 • Hints about objects to be used: – NearestNUserNeighborhood – GenericUserBasedRecommender • Parameters: – data model (already shown) – neighborhood » Class: NearestNUserNeighborhood(n, similarity, model) » Class: ThresholdUserNeighborhood(thr, similarity, model) – similarity measure (already shown)
  • 82. Exercise 4: First Recommender
      import java.io.File;
      import java.util.List;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.neighborhood.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.neighborhood.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;

      class RecommenderIntro {

        private RecommenderIntro() { }

        public static void main(String[] args) throws Exception {
          // Load the preferences from the CSV file
          DataModel model = new FileDataModel(new File("intro.csv"));
          UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
          // Neighborhood made of the 2 most similar users
          UserNeighborhood neighborhood =
              new NearestNUserNeighborhood(2, similarity, model);
          Recommender recommender =
              new GenericUserBasedRecommender(model, neighborhood, similarity);
          // Top-1 recommendation for user 1
          List<RecommendedItem> recommendations = recommender.recommend(1, 1);
          for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
          }
        }
      }
  • 83. Exercise 4: First Recommender (same code as slide 82). Callout: FileDataModel, on the line new FileDataModel(new File("intro.csv")).
  • 84. Exercise 4: First Recommender (same code as slide 82). Callout: 2 neighbours, on the line new NearestNUserNeighborhood(2, similarity, model).
  • 85. Exercise 4: First Recommender (same code as slide 82). Callout: top-1 recommendation for user 1, on the line recommender.recommend(1, 1).
  • 86. Exercise 5: MovieLens Recommender • Download the GroupLens MovieLens dataset (100k) – Its format is already Mahout compliant – http://files.grouplens.org/datasets/movielens/ml-100k.zip • Preparatory exercise: repeat exercise 3 (similarity calculations) with a bigger dataset • Next: now we can run the recommendation framework against a widely used benchmark dataset
  • 87. Exercise 5: MovieLens Recommender
      import java.io.File;
      import java.util.List;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.neighborhood.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.neighborhood.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;

      class RecommenderIntro {

        private RecommenderIntro() { }

        public static void main(String[] args) throws Exception {
          // ua.base is one of the files shipped in the MovieLens 100k archive
          DataModel model = new FileDataModel(new File("ua.base"));
          UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
          // Neighborhood made of the 100 most similar users
          UserNeighborhood neighborhood =
              new NearestNUserNeighborhood(100, similarity, model);
          Recommender recommender =
              new GenericUserBasedRecommender(model, neighborhood, similarity);
          // Top-20 recommendations for user 1
          List<RecommendedItem> recommendations = recommender.recommend(1, 20);
          for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
          }
        }
      }
  • 88. Exercise 5: MovieLens Recommender (same code as slide 87, except for the final call). Callout: we can play with parameters! Here recommender.recommend(10, 50) asks for the top-50 recommendations for user 10.
  • 89. Exercise 5: MovieLens Recommender • Analyze the recommender's behavior with different combinations of parameters – Do the recommendations change with a different similarity measure? – Do the recommendations change with different neighborhood sizes? – Which configuration is the best one? • … Let's go to the next exercise!
  • 90. Exercise 6: Recommender Evaluation • Evaluate different CF recommender configurations on MovieLens data • Metrics: RMSE, MAE, Precision
  • 91. Exercise 6: Recommender Evaluation • Evaluate different CF recommender configurations on MovieLens data • Metrics: RMSE, MAE • Hints: useful classes – Implementations of the RecommenderEvaluator interface • AverageAbsoluteDifferenceRecommenderEvaluator • RMSRecommenderEvaluator
  • 92. Exercise 6: Recommender Evaluation • Further hints: – Use RandomUtils.useTestSeed() to ensure consistency among different evaluation runs – Invoke the evaluate() method • Parameters – RecommenderBuilder: builds the recommender instance (as in previous exercises) – DataModelBuilder: specific criterion for training – Training-test split: double value (e.g. 0.7 for 70%) – Amount of data to use in the evaluation: double value (e.g. 1.0 for 100%)
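  • 93. Exercise 6: evaluation
      import java.io.File;
      import org.apache.mahout.cf.taste.common.TasteException;
      import org.apache.mahout.cf.taste.eval.*;
      import org.apache.mahout.cf.taste.impl.eval.*;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.neighborhood.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.neighborhood.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;
      import org.apache.mahout.common.RandomUtils;

      class EvaluatorIntro {

        private EvaluatorIntro() { }

        public static void main(String[] args) throws Exception {
          // Callout: ensures consistency between different evaluation runs
          RandomUtils.useTestSeed();
          DataModel model = new FileDataModel(new File("ua.base"));
          // MAE evaluator
          RecommenderEvaluator evaluator =
              new AverageAbsoluteDifferenceRecommenderEvaluator();
          // Build the same recommender for testing that we did last time:
          RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
              UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
              UserNeighborhood neighborhood =
                  new NearestNUserNeighborhood(100, similarity, model);
              return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
          };
          // 0.7 = 70% training split, 1.0 = use 100% of the data
          double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
          System.out.println(score);
        }
      }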
  • 95. Exercise 6: evaluation (same code as slide 93). Callout on evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0): 0.7 means 70% of the data is used for training, 1.0 means the whole dataset is used in the evaluation.
  • 96. Exercise 6: evaluation (same code as slide 93). Callout: recommendation engine, on the body of buildRecommender() (similarity, neighborhood, GenericUserBasedRecommender).
  • 97. Exercise 6: evaluation (same code as slide 93, plus one line). Callout: we can add more measures, by declaring a second evaluator next to the first one:
      RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
  • 98. Exercise 6: evaluation (same code as slide 93, with a second evaluator for RMSE). The lines that are added or changed:
      // MAE and RMSE evaluators side by side
      RecommenderEvaluator evaluator =
          new AverageAbsoluteDifferenceRecommenderEvaluator();
      RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
      ...
      // Each score must come from its own evaluator
      double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
      double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
      System.out.println(score);
      System.out.println(rmse);
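The two metrics of Exercise 6 can be sketched in plain Java as follows. This is only the arithmetic behind MAE and RMSE, not the code of Mahout's evaluator classes; the class name ErrorMetricsSketch and the toy ratings are assumptions made for illustration.

```java
public class ErrorMetricsSketch {

    // Mean Absolute Error: average of |actual - predicted|
    static double mae(double[] actual, double[] predicted) {
        checkSameLength(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]);
        }
        return sum / actual.length;
    }

    // Root Mean Squared Error: square root of the average squared difference
    static double rmse(double[] actual, double[] predicted) {
        checkSameLength(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        // Hypothetical held-out ratings and the recommender's estimates for them
        double[] actual = {4.0, 3.0, 5.0};
        double[] predicted = {3.0, 3.0, 3.0};
        System.out.println("MAE:  " + mae(actual, predicted));
        System.out.println("RMSE: " + rmse(actual, predicted));
    }
}
```

For both metrics lower is better. RMSE squares each difference before averaging, so a few large errors weigh more than they do in MAE.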
  • 99. Exercise 7: item-based recommender • Mahout provides Java classes for building an item-based recommender system – Amazon-like – Recommendations are based on similarities among items (generally pre-computed offline) – Evaluate it with the MovieLens dataset!
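  • 100. Exercise 7: item-based recommender
      import java.io.File;
      import org.apache.mahout.cf.taste.common.TasteException;
      import org.apache.mahout.cf.taste.eval.*;
      import org.apache.mahout.cf.taste.impl.eval.*;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;
      import org.apache.mahout.common.RandomUtils;

      class IREvaluatorIntro {

        private IREvaluatorIntro() { }

        public static void main(String[] args) throws Exception {
          RandomUtils.useTestSeed();
          DataModel model = new FileDataModel(new File("ua.base"));
          RecommenderEvaluator evaluator =
              new AverageAbsoluteDifferenceRecommenderEvaluator();
          RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
          // Build an item-based recommender: similarity among items, no neighborhood
          RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
              ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
              return new GenericItemBasedRecommender(model, similarity);
            }
          };
          // Each score comes from its own evaluator
          double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
          double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
          System.out.println(score);
          System.out.println(rmse);
        }
      }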
  • 101. Exercise 7: item-based recommender (same code as slide 100). Callout: ItemSimilarity, on the line ItemSimilarity similarity = new PearsonCorrelationSimilarity(model).
  • 102. Exercise 7: item-based recommender (same code as slide 100). Callout: no neighborhood definition for item-based recommenders; GenericItemBasedRecommender takes only the model and the similarity.
  • 103. Exercise 8: MF-based recommender • Mahout provides Java classes for building a CF recommender system based on state-of-the-art matrix factorization techniques – Class: SVDRecommender • Parameters: DataModel, Factorizer • Factorizer: a factorization algorithm – Alternating Least Squares (ALSWRFactorizer) – SVD++ (SVDPlusPlusFactorizer) – Stochastic Gradient Descent (ParallelSGDFactorizer), etc. – Several parameters to tune! – Evaluate it with the MovieLens dataset!
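  • 104. Exercise 8: MF-based recommender
      import java.io.File;
      import org.apache.mahout.cf.taste.common.TasteException;
      import org.apache.mahout.cf.taste.eval.*;
      import org.apache.mahout.cf.taste.impl.eval.*;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.recommender.svd.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.common.RandomUtils;

      class IREvaluatorIntro {

        private IREvaluatorIntro() { }

        public static void main(String[] args) throws Exception {
          RandomUtils.useTestSeed();
          DataModel model = new FileDataModel(new File("ua.base"));
          RecommenderEvaluator evaluator =
              new AverageAbsoluteDifferenceRecommenderEvaluator();
          RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
          // Build a matrix-factorization recommender
          RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
              // Hyperparameters: 10 latent factors, lambda 0.065, 60 iterations
              ALSWRFactorizer factorizer = new ALSWRFactorizer(model, 10, 0.065, 60);
              return new SVDRecommender(model, factorizer);
            }
          };
          // Each score comes from its own evaluator
          double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
          double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
          System.out.println(score);
          System.out.println(rmse);
        }
      }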
  • 105. Exercise 8: MF-based recommender (same code as slide 104). Callout on new ALSWRFactorizer(model, 10, 0.065, 60): the hyperparameters are the number of latent factors (10), lambda (0.065) and the number of iterations (60).
  • 106. Mahout Strengths • Fast prototyping and evaluation – To evaluate a different configuration of the same algorithm we just need to update a parameter and run again. – Example • Different neighborhood sizes • Different similarity measures, etc.
  • 107. 5 minutes to look for the best configuration
  • 108. Exercise 9: Recommender Evaluation • Evaluation of CF algorithms through IR measures • Metrics: Precision, Recall
  • 109. Exercise 9: Recommender Evaluation • Evaluation of CF algorithms through IR measures • Metrics: Precision, Recall • Hints: useful classes – GenericRecommenderIRStatsEvaluator – evaluate() method • Same RecommenderBuilder and DataModel parameters as in exercises 6 and 7
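  • 110. Exercise 9: IR-based evaluation
      import java.io.File;
      import org.apache.mahout.cf.taste.common.TasteException;
      import org.apache.mahout.cf.taste.eval.*;
      import org.apache.mahout.cf.taste.impl.eval.*;
      import org.apache.mahout.cf.taste.impl.model.file.*;
      import org.apache.mahout.cf.taste.impl.neighborhood.*;
      import org.apache.mahout.cf.taste.impl.recommender.*;
      import org.apache.mahout.cf.taste.impl.similarity.*;
      import org.apache.mahout.cf.taste.model.*;
      import org.apache.mahout.cf.taste.neighborhood.*;
      import org.apache.mahout.cf.taste.recommender.*;
      import org.apache.mahout.cf.taste.similarity.*;
      import org.apache.mahout.common.RandomUtils;

      class IREvaluatorIntro {

        private IREvaluatorIntro() { }

        public static void main(String[] args) throws Exception {
          RandomUtils.useTestSeed();
          DataModel model = new FileDataModel(new File("ua.base"));
          RecommenderIRStatsEvaluator evaluator =
              new GenericRecommenderIRStatsEvaluator();
          // Build the same recommender for testing that we did last time:
          RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
              UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
              UserNeighborhood neighborhood =
                  new NearestNUserNeighborhood(100, similarity, model);
              return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
          };
          // Callout: Precision@5, Recall@5, etc. (cutoff 5, whole dataset)
          IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
              GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
          System.out.println(stats.getPrecision());
          System.out.println(stats.getRecall());
          System.out.println(stats.getF1Measure());
        }
      }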
  • 112. Exercise 9: IR-based evaluation (same code as slide 110, with one change). Callout: set the neighborhood size to 500, i.e. new NearestNUserNeighborhood(500, similarity, model).
  • 113. Exercise 9: IR-based evaluation (same code as slide 110, with neighborhood size 500 and a different similarity). Callout: set Euclidean distance, i.e. UserSimilarity similarity = new EuclideanDistanceSimilarity(model).
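The IR measures of Exercise 9 can be illustrated with a minimal plain-Java sketch that computes precision, recall and F1 at a cutoff N for one user. It is not Mahout's evaluator: the class name PrecisionRecallSketch, the item IDs and the set of relevant items are assumptions made for illustration, and dividing the hits by N for precision is the convention chosen for this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PrecisionRecallSketch {

    // Number of relevant items among the first n entries of the ranked list
    static int hitsAt(List<Long> ranked, Set<Long> relevant, int n) {
        int hits = 0;
        int cutoff = Math.min(n, ranked.size());
        for (int i = 0; i < cutoff; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return hits;
    }

    // Precision@n: fraction of the n recommended slots filled with relevant items
    static double precisionAt(List<Long> ranked, Set<Long> relevant, int n) {
        return n <= 0 ? 0.0 : hitsAt(ranked, relevant, n) / (double) n;
    }

    // Recall@n: fraction of the relevant items found in the top n
    static double recallAt(List<Long> ranked, Set<Long> relevant, int n) {
        return relevant.isEmpty() ? 0.0 : hitsAt(ranked, relevant, n) / (double) relevant.size();
    }

    // F1: harmonic mean of precision and recall
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Hypothetical top-5 list for one user and the items that user actually liked
        List<Long> ranked = Arrays.asList(10L, 20L, 30L, 40L, 50L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(20L, 50L, 60L));
        double p = precisionAt(ranked, relevant, 5);
        double r = recallAt(ranked, relevant, 5);
        System.out.println("Precision@5: " + p);
        System.out.println("Recall@5:    " + r);
        System.out.println("F1:          " + f1(p, r));
    }
}
```

Unlike MAE and RMSE, these measures ignore the predicted rating values and only look at which items reach the top of the list, so higher is better.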
  • 114. Exercise 10: Recommender Evaluation • Write a class that automatically runs the evaluation with different parameters – e.g. neighborhood sizes taken from an array of fixed values – Print the best scores and the corresponding configuration
  • 115. Exercise 11: more datasets! • Find the best configuration for several datasets – Download datasets from http://mahout.apache.org/users/basics/collections.html – Write classes to transform the input data into a Mahout-compliant form – Extend exercise 10!
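One possible shape for the loop requested in Exercise 10 is sketched below in plain Java. It is a skeleton under stated assumptions, not a full solution: the class name SweepSketch and the errorFor function are hypothetical, and the toy error function in main is made up. In a real run, errorFor would wrap a call to the evaluator's evaluate() method with a RecommenderBuilder that uses the given neighborhood size.

```java
import java.util.function.IntToDoubleFunction;

public class SweepSketch {

    // Tries every neighborhood size and returns the one with the lowest error.
    // errorFor is a hypothetical stand-in for one evaluation run (e.g. MAE or RMSE).
    static int bestNeighborhoodSize(int[] sizes, IntToDoubleFunction errorFor) {
        if (sizes.length == 0) {
            throw new IllegalArgumentException("at least one size is required");
        }
        int best = sizes[0];
        double bestError = Double.POSITIVE_INFINITY;
        for (int n : sizes) {
            double error = errorFor.applyAsDouble(n);
            System.out.println("neighborhood=" + n + " error=" + error);
            // Skip undefined scores, keep the lowest error seen so far
            if (!Double.isNaN(error) && error < bestError) {
                bestError = error;
                best = n;
            }
        }
        System.out.println("best neighborhood=" + best + " error=" + bestError);
        return best;
    }

    public static void main(String[] args) {
        int[] sizes = {10, 50, 100, 500};
        // Made-up error curve used only to exercise the loop
        bestNeighborhoodSize(sizes, n -> 0.7 + Math.abs(n - 100) / 1000.0);
    }
}
```

The same loop extends to a second dimension (for example a list of similarity measures) by nesting it, and for Precision or Recall the comparison flips to keep the highest score instead of the lowest.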
  • 116. End. Do you want more?
  • 117. Do you want more? • Recommendation – Deployment of a Mahout-based Web recommender – Integration with Hadoop/Spark – Integration of content-based information – Custom similarities, custom recommenders, re-scoring functions • Content-based recommender systems through classification algorithms!