Yong Zheng, Mayur Agnani, Mili Singh. “Identifying Grey Sheep Users By The Distribution of User Similarities In Collaborative Filtering”. Proceedings of The 6th ACM Conference on Research in Information Technology (RIIT), Rochester, NY, USA, October, 2017
[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similarities In Collaborative Filtering
1. Yong Zheng, Mayur Agnani, Mili Singh
School of Applied Technology
Illinois Institute of Technology
Chicago, IL, 60616, USA
Identifying Grey Sheep Users By The Distribution of User Similarities In Collaborative Filtering
2. Agenda
• Background: Recommender Systems
• Grey Sheep Users In Collaborative Filtering
• Methodology and Solutions
• Experimental Results
• Conclusions and Future Work
11. Traditional Recommendation Algorithms
Content-Based Recommendation Algorithms
The user is recommended items similar to the ones the
user preferred in the past, e.g., in book or movie recommender systems
Collaborative Filtering Based Recommendation Algorithms
The user is recommended items that people with similar
tastes and preferences liked in the past, e.g., in movie recommender systems
Hybrid Recommendation Algorithms
Combine content-based and collaborative filtering based
algorithms to produce item recommendations.
13. Collaborative Filtering: Algorithms
User-Based KNN Collaborative Filtering (UBCF)
Assumption: user u’s rating on item t is similar to the
ratings that like-minded users gave item t. This group of
similar users is called the user’s K nearest neighbors
     Pirates of the Caribbean 4   Kung Fu Panda 2   Harry Potter 6   Harry Potter 7
U1   4                            4                 1                2
U2   3                            4                 2                1
U3   2                            2                 4                4
U4   4                            4                 1                ?
14. Collaborative Filtering: Algorithms
User-Based KNN Collaborative Filtering (UBCF)
     Pirates of the Caribbean 4   Kung Fu Panda 2   Harry Potter 6   Harry Potter 7
U1   4                            4                 1                2
U2   3                            4                 2                1
U3   2                            2                 4                4
U4   4                            4                 1                ?
a = the target user
i = the target item
N = user neighborhood
u = a user neighbor in N
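The prediction formula from this slide did not survive in the transcript. As a minimal sketch, assuming the commonly used mean-centered weighted average over the neighborhood N with cosine similarity on co-rated items (the slide may use a different variant), the missing rating of U4 could be estimated as follows. The function names and the neighborhood size k = 2 are illustrative assumptions.

```python
import math

# Ratings from the table above; U4's rating for "Harry Potter 7" is unknown.
ratings = {
    "U1": {"Pirates of the Caribbean 4": 4, "Kung Fu Panda 2": 4,
           "Harry Potter 6": 1, "Harry Potter 7": 2},
    "U2": {"Pirates of the Caribbean 4": 3, "Kung Fu Panda 2": 4,
           "Harry Potter 6": 2, "Harry Potter 7": 1},
    "U3": {"Pirates of the Caribbean 4": 2, "Kung Fu Panda 2": 2,
           "Harry Potter 6": 4, "Harry Potter 7": 4},
    "U4": {"Pirates of the Caribbean 4": 4, "Kung Fu Panda 2": 4,
           "Harry Potter 6": 1},
}


def mean_rating(user):
    """Average of all ratings the user has given."""
    values = ratings[user].values()
    return sum(values) / len(values)


def cosine_sim(a, u):
    """Cosine similarity computed over the items both users rated."""
    common = [t for t in ratings[a] if t in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[a][t] * ratings[u][t] for t in common)
    norm_a = math.sqrt(sum(ratings[a][t] ** 2 for t in common))
    norm_u = math.sqrt(sum(ratings[u][t] ** 2 for t in common))
    return dot / (norm_a * norm_u)


def predict(a, i, k=2):
    """Predict the rating of target user a on target item i.

    N is taken to be the k users most similar to a who rated i
    (an assumption; the slide only names the symbols).
    """
    candidates = [u for u in ratings if u != a and i in ratings[u]]
    sims = {u: cosine_sim(a, u) for u in candidates}
    neighborhood = sorted(candidates, key=sims.get, reverse=True)[:k]
    numerator = sum(sims[u] * (ratings[u][i] - mean_rating(u))
                    for u in neighborhood)
    denominator = sum(abs(sims[u]) for u in neighborhood)
    if denominator == 0:
        return mean_rating(a)
    return mean_rating(a) + numerator / denominator


print(round(predict("U4", "Harry Potter 7"), 2))  # about 1.88 with k=2
```

With k = 2 the neighborhood is U1 and U2, both of whom rated Harry Potter 7 below their own average, so the estimate falls below U4's average rating of 3.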
15. Collaborative Filtering: Algorithms
Popular Challenges in Collaborative Filtering
Data sparsity problems
Cold-start users or items
Grey-sheep users
Incorporate content into collaborative filtering
….
17. Grey Sheep Users
Definition 1 by Mark Claypool, et al., 1999
A group of users who neither agree nor disagree with any group
of users. Therefore, they will not benefit from the user-based
collaborative filtering technique
Definition 2 by John McCrae, et al., 2004
White Sheep Users: high correlations with other users
Black Sheep Users: very few or no correlating users
Grey Sheep Users: unusual tastes, low correlations with others
19. Existing Approaches
There are two existing approaches
Clustering Technique by Ghazanfar, et al., 2011
Distribution of User Ratings by Gras, et al., 2016
Both were developed based on the 1st definition
We propose a novel approach based on the 2nd definition
White Sheep Users: high correlations with other users
Black Sheep Users: very few or no correlating users
Grey Sheep Users: unusual tastes, low correlations with others
Note: A black sheep user is a user about whom we do not have enough knowledge.
21. Proposed Solution
We propose a novel approach based on the 2nd definition
White Sheep Users: high correlations with other users
Black Sheep Users: very few or no correlating users
Grey Sheep Users: unusual tastes, low correlations with others
The Distribution of User-User Correlations or Similarities
22. Proposed Solution
Step 1, represent each user as a distribution of user correlations
Step 2, select good and bad examples
Step 3, apply outlier detection to the selected examples. Grey sheep
users are the intersection of the bad examples and the identified
outliers
Step 4, examine the quality of the identified grey sheep users
23. Proposed Solution
Step 1, Distribution Representations
We calculate user-user correlations by cosine similarity
Obtain the descriptive statistics of the distribution
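Step 1 can be sketched as follows. The slide does not list which descriptive statistics are used, so the mean, standard deviation, quartiles, and skewness below are an assumption, as are the function names and the toy ratings.

```python
import math
import statistics


def cosine(r_a, r_u):
    """Cosine similarity over co-rated items; 0.0 when nothing overlaps."""
    common = set(r_a) & set(r_u)
    if not common:
        return 0.0
    dot = sum(r_a[t] * r_u[t] for t in common)
    norm_a = math.sqrt(sum(r_a[t] ** 2 for t in common))
    norm_u = math.sqrt(sum(r_u[t] ** 2 for t in common))
    return dot / (norm_a * norm_u)


def skewness(xs):
    """Population skewness (third standardized moment); 0.0 if constant."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        return 0.0
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


def describe_user(user, ratings):
    """Summarize the distribution of similarities between user and all others."""
    sims = [cosine(ratings[user], ratings[v]) for v in ratings if v != user]
    q1, median, q3 = statistics.quantiles(sims, n=4)
    return {
        "mean": statistics.fmean(sims),
        "std": statistics.pstdev(sims),
        "q1": q1,
        "median": median,
        "q3": q3,
        "skew": skewness(sims),
    }


# Toy ratings (item -> rating per user), for illustration only.
ratings = {
    "U1": {"A": 4, "B": 4, "C": 1, "D": 2},
    "U2": {"A": 3, "B": 4, "C": 2, "D": 1},
    "U3": {"A": 2, "B": 2, "C": 4, "D": 4},
    "U4": {"A": 4, "B": 4, "C": 1},
}

profiles = {user: describe_user(user, ratings) for user in ratings}
print(profiles["U4"])
```

Each user is then a small fixed-length vector, regardless of how many other users exist in the data.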
24. Proposed Solution
Step 2, Example Selection
Good examples: high correlations and left-skewed
Bad examples: low correlations and right-skewed
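A minimal sketch of this selection rule, assuming each user is summarized by the mean and skewness of their similarity distribution. Left-skewed is read as negative skewness and right-skewed as positive skewness. The cut-offs `low_mean` and `high_mean` and the profile values are hypothetical; the slides do not state the thresholds actually used.

```python
def select_examples(profiles, low_mean, high_mean):
    """Split users into good and bad examples.

    good: high average similarity and left-skewed (skew < 0)
    bad:  low average similarity and right-skewed (skew > 0)
    Users matching neither rule are left unlabeled.
    """
    good, bad = set(), set()
    for user, stats in profiles.items():
        if stats["mean"] >= high_mean and stats["skew"] < 0:
            good.add(user)
        elif stats["mean"] <= low_mean and stats["skew"] > 0:
            bad.add(user)
    return good, bad


# Made-up profiles for illustration only.
profiles = {
    "u1": {"mean": 0.82, "skew": -0.9},  # similar to most users
    "u2": {"mean": 0.55, "skew": 0.1},   # in between, unlabeled
    "u3": {"mean": 0.21, "skew": 1.4},   # dissimilar to most users
}

good, bad = select_examples(profiles, low_mean=0.3, high_mean=0.7)
print(good, bad)
```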
25. Proposed Solution
Step 3, Outlier Detection by Local Outlier Factor (LOF)
LOF helps identify outliers by the local density
Observations with LOF > 1 will be considered as outliers
We set different threshold values to find
the optimal one for identifying grey sheep
users, for example
LOF threshold = 1.0
LOF threshold = 1.1
LOF threshold = 1.2
LOF threshold = ….
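To make the threshold idea concrete, here is a small pure-Python sketch of the LOF computation. It is a simplified version: each point gets exactly k neighbors with ties broken by index, and all points are assumed distinct. The 2-D points stand in for per-user feature vectors and are made up, as is the set of bad examples.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor of every point, using its k nearest neighbors."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]

    # LOF: average density of the neighbors relative to the point's own.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i])
            for i in range(n)]


# Five points in a tight group plus one far away (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (0.5, 0.5), (5.0, 5.0)]
scores = lof_scores(points, k=3)

threshold = 1.2
outliers = {i for i, s in enumerate(scores) if s > threshold}

# Step 3: grey sheep = bad examples that are also outliers.
bad_examples = {4, 5}  # hypothetical output of the example selection
grey_sheep = bad_examples & outliers
print(scores, grey_sheep)
```

Points inside the group score close to 1, while the isolated point scores far higher, which is why raising the threshold above 1.0 trims borderline cases.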
26. Proposed Solution
Step 4, Examine the quality of identified GS Users
The parameters in our solution
Example Selection
LOF threshold
Number of neighbors in the LOF method
Our goals or examination criteria
To find as many GS users as possible
Recommendations by UBCF should be worse for GS users than for
non-GS users
28. Experimental Setting
• Data: MovieLens 10 million rating data
– 10M ratings
– 72K users
– 10K movies
– Each user has rated at least 20 movies
• Evaluation
– 80% for training, 20% for testing
– Mean absolute error (MAE) to evaluate rating predictions
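The evaluation can be sketched as computing MAE separately for GS and non-GS users on the held-out ratings. The predictions and the grey sheep set below are made-up illustrations, not results from the paper.

```python
def mean_absolute_error(pairs):
    """MAE over (actual, predicted) rating pairs."""
    pairs = list(pairs)
    return sum(abs(actual - predicted) for actual, predicted in pairs) / len(pairs)


# Hypothetical test-set predictions: (user, actual rating, predicted rating).
test_predictions = [
    ("u1", 4, 3.5),
    ("u1", 2, 2.5),
    ("u2", 5, 4.0),
    ("u3", 1, 3.0),
    ("u3", 5, 3.5),
]
grey_sheep = {"u3"}  # hypothetical set of identified GS users

gs_pairs = [(a, p) for u, a, p in test_predictions if u in grey_sheep]
non_gs_pairs = [(a, p) for u, a, p in test_predictions if u not in grey_sheep]

mae_gs = mean_absolute_error(gs_pairs)
mae_non_gs = mean_absolute_error(non_gs_pairs)
print(mae_gs, mae_non_gs)
```

Under the examination criteria stated earlier, a good identification should yield a higher MAE for the GS group than for the non-GS group.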
33. Conclusions
• We develop a novel approach to identify GS
users by utilizing the definition related to the
user-user correlations
• Our approach can successfully identify GS users
• Our approach is less complicated than the
existing approaches
34. Drawbacks and Future Work
• We did not compare our solution with the two
existing methods
• The user-user correlations may not be reliable
if the rating data is sparse
35. Stay Tuned
• Yong Zheng, Mayur Agnani, Mili Singh. “Identification of Grey
Sheep Users By Histogram Intersection In Recommender Systems”.
Proceedings of The 13th International Conference on Advanced
Data Mining and Applications, 2017
– We improve the proposed solution in the RIIT paper
– We better measure user-user correlations
– We compare our solution with the two existing methods and
demonstrate the advantages and effectiveness of our proposed
solution