[ADMA 2017] Identification of Grey Sheep Users By Histogram Intersection In Recommender Systems
1. Yong Zheng, Mayur Agnani, Mili Singh
Illinois Institute of Technology
Chicago, IL, 60616, USA
2017 International Conference on Advanced Data Mining and
Applications, Singapore, Nov 5-6, 2017
2. Agenda
• Background: Recommender Systems
• Grey Sheep Users In Collaborative Filtering
• Methodology and Solutions
• Experimental Results
• Conclusions and Future Work
4. Traditional Recommendation Algorithms
Content-Based Recommendation Algorithms
The user is recommended items similar to the ones the user preferred in the past, e.g., in book or movie recommender systems
Collaborative Filtering Based Recommendation Algorithms
The user is recommended items that people with similar tastes and preferences liked in the past, e.g., in movie recommender systems
Hybrid Recommendation Algorithms
Combine content-based and collaborative filtering based
algorithms to produce item recommendations.
5. Collaborative Filtering: Algorithms
User-Based KNN Collaborative Filtering (UBCF)
Assumption: a user u’s rating on item t is similar to the ratings that other, similar users gave to item t; this group of similar users is called the user's K nearest neighbors
User | Pirates of the Caribbean 4 | Kung Fu Panda 2 | Harry Potter 6 | Harry Potter 7
U1 | 4 | 4 | 1 | 2
U2 | 3 | 4 | 2 | 1
U3 | 2 | 2 | 4 | 4
U4 | 4 | 4 | 1 | ?
6. Collaborative Filtering: Algorithms
User-Based KNN Collaborative Filtering (UBCF)
a = the target user
i = the target item
N = user neighborhood
u = a user neighbor in N
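The prediction formula that used these symbols did not survive extraction. A common user-based form consistent with them (assumed here, not recovered from the slide) is P(a,i) = r̄_a + Σ_{u∈N} sim(a,u) · (r_{u,i} − r̄_u) / Σ_{u∈N} |sim(a,u)|, where r̄ is a user's mean rating. The minimal sketch below applies it to the toy table, assuming cosine similarity over co-rated items and K = 2; the helper names `cosine`, `mean_rating`, and `predict` are hypothetical.

```python
import math

# Toy rating matrix from the slide; U4's rating on "Harry Potter 7" is unknown.
RATINGS = {
    "U1": {"Pirates of the Caribbean 4": 4, "Kung Fu Panda 2": 4, "Harry Potter 6": 1, "Harry Potter 7": 2},
    "U2": {"Pirates of the Caribbean 4": 3, "Kung Fu Panda 2": 4, "Harry Potter 6": 2, "Harry Potter 7": 1},
    "U3": {"Pirates of the Caribbean 4": 2, "Kung Fu Panda 2": 2, "Harry Potter 6": 4, "Harry Potter 7": 4},
    "U4": {"Pirates of the Caribbean 4": 4, "Kung Fu Panda 2": 4, "Harry Potter 6": 1},
}


def cosine(ra, rb):
    """Cosine similarity over the items both users rated; None if they share no items."""
    common = ra.keys() & rb.keys()
    if not common:
        return None
    dot = sum(ra[t] * rb[t] for t in common)
    norm_a = math.sqrt(sum(ra[t] ** 2 for t in common))
    norm_b = math.sqrt(sum(rb[t] ** 2 for t in common))
    return dot / (norm_a * norm_b)


def mean_rating(r):
    return sum(r.values()) / len(r)


def predict(ratings, a, i, k=2):
    """Predict user a's rating on item i from the k most similar users who rated i."""
    candidates = []
    for u, ru in ratings.items():
        if u == a or i not in ru:
            continue
        sim = cosine(ratings[a], ru)
        if sim is not None:
            candidates.append((sim, u))
    candidates.sort(reverse=True)
    neighbors = candidates[:k]  # N: the user's K nearest neighbors
    num = sum(sim * (ratings[u][i] - mean_rating(ratings[u])) for sim, u in neighbors)
    den = sum(abs(sim) for sim, _ in neighbors)
    base = mean_rating(ratings[a])
    prediction = base + num / den if den > 0 else base
    return prediction, [u for _, u in neighbors]


pred, nbrs = predict(RATINGS, "U4", "Harry Potter 7")
print(nbrs, round(pred, 2))  # ['U1', 'U2'] 1.88
```

U1 and U2 rated the first three movies much like U4 and both gave Harry Potter 7 a below-average rating, so the predicted rating is low.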
7. Collaborative Filtering: Algorithms
Popular Challenges in Collaborative Filtering
Data sparsity problems
Cold-start users or items
Grey-sheep users
Incorporating content into collaborative filtering
….
8. Grey Sheep Users
Definition 1 by Mark Claypool et al., 1999
A group of users who neither agree nor disagree with any group of users; therefore, they do not benefit from the user-based collaborative filtering technique
Clustering Technique by Ghazanfar et al., 2011
Distribution of User Ratings by Gras et al., 2016
Definition 2 by John McCrae et al., 2004
White Sheep Users may have high correlations with other users; Black Sheep Users have very few or no correlating users; Grey Sheep Users have unusual tastes and low correlations with others
Distribution of User Similarities by Zheng et al., 2017
10. Proposed Solution
Approach Based on The Distribution of User Similarities
White Sheep Users: high correlations with other users
Black Sheep Users: very few or no correlating users
Grey Sheep Users: unusual tastes, low correlations with others
The Distribution of User-User Correlations or Similarities
11. Proposed Solution
Step 1, represent each user as a distribution of user-user correlations
Step 2, select good and bad examples
Step 3, apply outlier detection on the selected examples; grey sheep users are the intersection of the bad examples and the identified outliers
Step 4, examine the quality of the identified grey sheep users
12. Proposed Solution
Step 1, Distribution Representations
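We calculate user-user correlations by cosine similarity, so each user is described by the distribution of his or her correlations with all other users
We obtain the descriptive statistics of each user's distribution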
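A minimal sketch of Step 1, assuming cosine similarity over co-rated items and a hypothetical choice of descriptive statistics (mean, median, population standard deviation, and Fisher-Pearson skewness); the helper names are illustrative, not the authors' code. It reuses the toy ratings, with items abbreviated to hypothetical labels. A user with no co-rated peers would get an empty distribution, which is the co-rating drawback noted later in the deck.

```python
import math
import statistics

# Toy ratings (items abbreviated: PC4, KFP2, HP6, HP7).
RATINGS = {
    "U1": {"PC4": 4, "KFP2": 4, "HP6": 1, "HP7": 2},
    "U2": {"PC4": 3, "KFP2": 4, "HP6": 2, "HP7": 1},
    "U3": {"PC4": 2, "KFP2": 2, "HP6": 4, "HP7": 4},
    "U4": {"PC4": 4, "KFP2": 4, "HP6": 1},
}


def cosine(ra, rb):
    """Cosine similarity over co-rated items; None when there are no co-ratings."""
    common = ra.keys() & rb.keys()
    if not common:
        return None
    dot = sum(ra[t] * rb[t] for t in common)
    return dot / (math.sqrt(sum(ra[t] ** 2 for t in common)) *
                  math.sqrt(sum(rb[t] ** 2 for t in common)))


def similarity_distribution(ratings, u):
    """Similarities between user u and every other user that shares co-ratings."""
    sims = []
    for v, rv in ratings.items():
        if v != u:
            s = cosine(ratings[u], rv)
            if s is not None:
                sims.append(s)
    return sims


def skewness(xs):
    """Population (Fisher-Pearson) skewness; 0.0 for a constant sample."""
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def describe(xs):
    """Descriptive statistics of one user's similarity distribution (needs len(xs) >= 1)."""
    return {
        "mean": statistics.fmean(xs),
        "median": statistics.median(xs),
        "std": statistics.pstdev(xs),
        "skewness": skewness(xs),
    }


stats = {u: describe(similarity_distribution(RATINGS, u)) for u in RATINGS}
for user, s in stats.items():
    print(user, {name: round(value, 3) for name, value in s.items()})
```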
13. Proposed Solution
Step 2, Example Selection
Good examples: high correlations and left-skewed
Bad examples: low correlations and right-skewed
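One way to express Step 2, assuming each user already has the descriptive statistics from Step 1. A left-skewed distribution (negative skewness) puts most of its mass at high correlations; a right-skewed one (positive skewness) puts it at low correlations. The cut-offs `high_mean` and `low_mean` are hypothetical parameters; the slides list example selection as a tunable part of the method but do not give its exact rule.

```python
def select_examples(stats, high_mean, low_mean):
    """Split users into good and bad examples from their similarity statistics.

    stats maps user -> {"mean": ..., "skewness": ...}. The thresholds are
    hypothetical knobs, not values taken from the paper.
    """
    good, bad = set(), set()
    for user, s in stats.items():
        if s["mean"] >= high_mean and s["skewness"] < 0:    # high, left-skewed
            good.add(user)
        elif s["mean"] <= low_mean and s["skewness"] > 0:   # low, right-skewed
            bad.add(user)
    return good, bad


example_stats = {
    "a": {"mean": 0.8, "skewness": -0.5},
    "b": {"mean": 0.1, "skewness": 1.2},
    "c": {"mean": 0.5, "skewness": 0.0},
}
print(select_examples(example_stats, high_mean=0.7, low_mean=0.3))  # ({'a'}, {'b'})
```

Users that fit neither pattern, like "c", are left out of the selected examples.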
14. Proposed Solution
Step 3, Outlier Detection by Local Outlier Factor (LOF)
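LOF identifies outliers by comparing the local density around an observation with the local densities around its nearest neighbors
An LOF value close to 1 means a density similar to that of the neighbors; observations with LOF > 1 lie in sparser regions than their neighbors and are considered outliers
We try different threshold values to find the optimal one for identifying grey sheep users, for example: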
LOF threshold = 1.0
LOF threshold = 1.1
LOF threshold = 1.2
LOF threshold = ….
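A brute-force sketch of Step 3 in plain Python, assuming each selected user is represented by a numeric feature vector (for instance, the mean and skewness of the user's similarity distribution). It computes the standard LOF quantities (k-distance, reachability distance, local reachability density) and returns the bad examples whose LOF exceeds a threshold. `lof_scores` and `identify_grey_sheep` are hypothetical names; a library implementation such as scikit-learn's `LocalOutlierFactor` could be used instead.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor for each point (brute force; requires k < len(points))."""
    n = len(points)
    k_dist, neighbors = [], []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        kd = dists[k - 1][0]                                 # k-distance of point i
        k_dist.append(kd)
        neighbors.append([j for d, j in dists if d <= kd])  # ties at kd are kept

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(k_dist[j], math.dist(points[i], points[j]))

    # Local reachability density; the tiny constant avoids division by zero
    # when several points coincide.
    lrd = [1.0 / (sum(reach_dist(i, j) for j in neighbors[i]) / len(neighbors[i]) + 1e-10)
           for i in range(n)]
    # LOF: average density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / len(neighbors[i]) / lrd[i] for i in range(n)]


def identify_grey_sheep(features, good, bad, k=3, threshold=1.0):
    """Grey sheep = bad examples that LOF flags as outliers among all selected examples."""
    selected = sorted(good | bad)
    scores = lof_scores([features[u] for u in selected], k)
    outliers = {u for u, s in zip(selected, scores) if s > threshold}
    return outliers & bad


# Hypothetical 2-D features: two tight groups plus one isolated bad example.
features = {
    "g1": (0.8, -0.5), "g2": (0.9, -0.5), "g3": (0.8, -0.4), "g4": (0.9, -0.4),
    "b1": (0.2, 1.0), "b2": (0.3, 1.0), "b3": (0.2, 1.1), "b4": (0.3, 1.1),
    "b5": (0.05, 3.0),
}
good = {"g1", "g2", "g3", "g4"}
bad = {"b1", "b2", "b3", "b4", "b5"}
print(identify_grey_sheep(features, good, bad, k=3, threshold=1.2))  # {'b5'}
```

Sweeping the threshold (1.0, 1.1, 1.2, ...) and the neighborhood size `k` is a loop over `identify_grey_sheep`; each candidate set is then judged by the Step 4 criteria.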
15. Proposed Solution
Step 4, Examine the quality of identified GS Users
The parameters in our solution
Example Selection
LOF threshold
Number of neighbors (neighborhood size) in the LOF method
Our goals, i.e., examination criteria
To find as many GS users as possible
Recommendations by UBCF should be worse for GS users than for non-GS users
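The second criterion can be checked by splitting test-set predictions into GS and non-GS users and comparing their MAE (higher MAE means worse). This sketch assumes predictions are available as hypothetical `(user, actual, predicted)` triples from a UBCF run; `compare_groups` is an illustrative name.

```python
def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(actual - predicted) for actual, predicted in pairs) / len(pairs)


def compare_groups(predictions, gs_users):
    """Return (MAE of GS users, MAE of non-GS users, whether GS users fare worse)."""
    gs, non_gs = [], []
    for user, actual, predicted in predictions:
        (gs if user in gs_users else non_gs).append((actual, predicted))
    if not gs or not non_gs:
        raise ValueError("both groups need at least one test rating")
    mae_gs, mae_non_gs = mae(gs), mae(non_gs)
    return mae_gs, mae_non_gs, mae_gs > mae_non_gs


preds = [("a", 4, 3), ("a", 2, 4), ("b", 5, 5), ("b", 3, 3.5)]
print(compare_groups(preds, {"a"}))  # (1.5, 0.25, True)
```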
16. Improved Approach
Drawback
Cosine similarities rely on co-ratings: if two users have not rated any items in common, we cannot measure their similarity
Improved Approach
We represent each user by his or her similarity distribution
The distribution can be represented by a histogram
The intersection of two histograms gives the user-user similarity
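For two normalized histograms h_a and h_b over the same bins, histogram intersection is HI(h_a, h_b) = Σ_b min(h_a(b), h_b(b)). It ranges from 0 (no overlap) to 1 (identical distributions) and does not require the two users to share any rated items. The sketch assumes similarities lie in [0, 1], which holds for cosine similarity on positive ratings, and uses 10 equal-width bins as a hypothetical choice.

```python
def histogram(values, bins=10, lo=0.0, hi=1.0):
    """Normalized histogram of similarity values over equal-width bins in [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), bins - 1)  # clamp v == hi into the last bin
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima of two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


h_low = histogram([0.05, 0.15])   # similarities concentrated near 0
h_high = histogram([0.95])        # similarities concentrated near 1
print(histogram_intersection(h_low, h_low), histogram_intersection(h_low, h_high))  # 1.0 0
```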
17. Experimental Setting
• Data: MovieLens 100K rating data
– 100,000 ratings
– 943 users
– 1,682 movies
– Each user has rated at least 20 movies
• Evaluation
– 80% as training, 20% as testing
– Mean absolute error (MAE) to evaluate rating predictions
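A minimal sketch of this evaluation protocol, assuming the ratings are `(user, item, rating)` triples and that the 80/20 split is a random split over individual ratings with a fixed seed. The slides do not say how the split was drawn, so the splitting rule and the helper names are assumptions.

```python
import random


def train_test_split(ratings, test_ratio=0.2, seed=42):
    """Randomly hold out test_ratio of the (user, item, rating) triples for testing."""
    rows = list(ratings)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_ratio)
    return rows[: len(rows) - n_test], rows[len(rows) - n_test:]


def mae(actual, predicted):
    """Mean absolute error between two equally long, non-empty rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


triples = [(u, i, 3) for u in range(10) for i in range(10)]
train, test = train_test_split(triples)
print(len(train), len(test), mae([4, 2], [3, 4]))  # 80 20 1.5
```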
21. Conclusions
• We develop a novel approach to identify GS users based on their definition in terms of user-user correlations
• We propose to use histogram intersection to better measure user-user similarities
• Our approach is shown to work better than the compared approaches on the MovieLens 100K data
22. Future Work
• Evaluate the approach on other data sets
• Seek approaches to improve the
recommendation performance for the group of
Grey Sheep Users