Using Mahout, implement a collaborative filtering framework on historical data (in this instance, movie ratings by 943 users) to provide item-based recommendations. Three item-based recommendations are provided for each user.
Example: MovieLens data with Mahout
1. MovieLens data with Mahout
MovieLens data sets are collected by the GroupLens Research Project at the University of Minnesota and are
available from http://grouplens.org/datasets/movielens/
This data set consists of:
Users and items are numbered consecutively from 1.
The data is randomly ordered.
Each line is a tab-separated record of user id, item id, rating, and timestamp.
The timestamps are Unix seconds since 1/1/1970 UTC.
Example:
1 272 3 887431647
2 1 4 888550871
2 10 2 888551853
Line 1:
1 (user id) 272 (item id) 3 (rating) 887431647 (timestamp)
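To make the record layout concrete, here is a minimal Python sketch that parses one line of the ratings file. The names `Rating` and `parse_rating` are made up for this illustration and are not part of Mahout or the MovieLens distribution; the sketch assumes the fields are separated by tabs, as described above.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    item_id: int
    rating: int
    timestamp: datetime  # converted from Unix seconds to a UTC datetime


def parse_rating(line: str) -> Rating:
    """Parse one tab-separated line: user id, item id, rating, timestamp."""
    user_id, item_id, rating, ts = line.rstrip("\n").split("\t")
    return Rating(
        int(user_id),
        int(item_id),
        int(rating),
        datetime.fromtimestamp(int(ts), tz=timezone.utc),
    )


# First example line from above, with the tabs written out explicitly.
first = parse_rating("1\t272\t3\t887431647")
print(first.user_id, first.item_id, first.rating, first.timestamp.isoformat())
```

Mahout itself reads the raw file directly, so a step like this is only useful for inspecting or sanity-checking the data before a run.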
The objective:
The objective is to implement a collaborative filtering framework using historical data, in this instance movie
ratings by 943 users, to provide item-based recommendations. Three item-based recommendations will be
provided for each user.
To make these recommendations, it is necessary to calculate the similarity between items. Items
usually don't change much, so these similarities can often be computed offline, an approach popularized by Amazon and
others.
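As a rough illustration of how precomputed item similarities can turn into recommendations, the Python sketch below scores each unrated item as a similarity-weighted average of the user's existing ratings. This is one common formulation of item-based filtering and is not claimed to be Mahout's exact computation. The function names, the item labels "A" to "D", and the similarity values are all made up for the example.

```python
def predict(user_ratings, similarities, target_item):
    """Similarity-weighted average of the user's ratings on other items.

    user_ratings: dict item -> rating given by this user
    similarities: dict (target_item, rated_item) -> precomputed similarity
    Returns None when no rated item is similar to the target.
    """
    num = den = 0.0
    for item, rating in user_ratings.items():
        sim = similarities.get((target_item, item), 0.0)
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


def recommend(user_ratings, similarities, all_items, n=3):
    """Return up to n (item, score) pairs for items the user has not rated."""
    scored = []
    for item in all_items:
        if item in user_ratings:
            continue  # only recommend unseen items
        score = predict(user_ratings, similarities, item)
        if score is not None:
            scored.append((item, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical similarity table, computed offline in a real system.
similarities = {("C", "A"): 0.5, ("C", "B"): 0.25}
user_ratings = {"A": 4, "B": 2}
top = recommend(user_ratings, similarities, ["A", "B", "C", "D"])
print(top)
```

Because the similarity table is only read at recommendation time, it can be rebuilt on a schedule while the lookup itself stays cheap.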
In the example provided, the similarity measure used is Euclidean distance; other measures are
available, including:
- Pearson correlation
- Spearman correlation
- Tanimoto coefficient
- LogLikelihood similarity
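For intuition, a Euclidean-distance similarity between two items can be sketched as follows. The sketch compares the ratings given by users who rated both items and maps the distance d to a similarity of 1 / (1 + d). This mapping is one common convention and is an assumption here; Mahout's own implementation may normalize differently. The function name and the sample ratings are made up.

```python
import math


def euclidean_similarity(ratings_a, ratings_b):
    """Similarity of two items from the users who rated both.

    ratings_a, ratings_b: dict user_id -> rating for one item each.
    Returns a value in (0, 1]; 0.0 when no user rated both items.
    """
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0
    dist = math.sqrt(sum((ratings_a[u] - ratings_b[u]) ** 2 for u in common))
    return 1.0 / (1.0 + dist)


# Made-up ratings: users 1 and 2 rated both items.
item_x = {1: 5, 2: 3, 3: 4}
item_y = {1: 2, 2: 3, 4: 1}
print(euclidean_similarity(item_x, item_y))
```

Identical ratings on the shared users give a similarity of 1, and the value shrinks toward 0 as the ratings diverge.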
The code:
mahout recommenditembased \
--input /user/cloudera/ua.base \
--tempDir /user/cloudera/run1 \
--similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE \
--output /user/cloudera/run1/results \
--numRecommendations 3
The output:
Item recommendations for the first 33 users: