
[RIIT 2017] Identifying Grey Sheep Users By The Distribution of User Similarities In Collaborative Filtering

Yong Zheng, Mayur Agnani, Mili Singh. “Identifying Grey Sheep Users By The Distribution of User Similarities In Collaborative Filtering”. Proceedings of The 6th ACM Conference on Research in Information Technology (RIIT), Rochester, NY, USA, October, 2017

  1. Yong Zheng, Mayur Agnani, Mili Singh
     School of Applied Technology, Illinois Institute of Technology, Chicago, IL 60616, USA
     Identifying Grey Sheep Users By The Distribution of User Similarities In Collaborative Filtering
  2. Agenda
     • Background: Recommender Systems
     • Grey Sheep Users In Collaborative Filtering
     • Methodology and Solutions
     • Experimental Results
     • Conclusions and Future Work
  4. Information Overload
  5. Alleviating Information Overload: information extraction and filtering, either by query (information retrieval) or by recommender systems
  6. Recommender Systems (RS) • RS provide item recommendations tailored to user tastes
  7. Recommender Systems in E-Commerce: Amazon.com, eBay.com, BestBuy, NewEgg
  8. Recommender Systems in Online Streaming: Netflix, Pandora, Spotify, YouTube, etc.
  9. Recommender Systems in Social Media: Facebook, Twitter, Weibo, etc.
  10. How It Works [Figure: User Profile → Recommender Systems → Recommendations; example items: Red Mars, Jurassic Park, Lost World, 2001, Foundation, Difference Engine, Neuromancer, 2010]
  11. Traditional Recommendation Algorithms
      • Content-Based Recommendation Algorithms: the user is recommended items similar to the ones the user preferred in the past, e.g., in book or movie recommender systems
      • Collaborative Filtering Based Recommendation Algorithms: the user is recommended items that people with similar tastes and preferences liked in the past, e.g., in movie recommender systems
      • Hybrid Recommendation Algorithms: combine content-based and collaborative filtering based algorithms to produce item recommendations
  12. Collaborative Filtering
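  13. Collaborative Filtering: Algorithms
      • User-Based KNN Collaborative Filtering (UBCF)
      • Assumption: a user u's rating on item t is similar to the ratings that other, similar users gave to item t; this group of similar users is called u's K nearest neighbors
      • Example rating matrix (? = the rating to be predicted):

      | User | Pirates of the Caribbean 4 | Kung Fu Panda 2 | Harry Potter 6 | Harry Potter 7 |
      |------|----------------------------|-----------------|----------------|----------------|
      | U1   | 4                          | 4               | 1              | 2              |
      | U2   | 3                          | 4               | 2              | 1              |
      | U3   | 2                          | 2               | 4              | 4              |
      | U4   | 4                          | 4               | 1              | ?              |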
  14. Collaborative Filtering: Algorithms
      • User-Based KNN Collaborative Filtering (UBCF), applied to the same rating matrix as slide 13 to predict U4's rating on Harry Potter 7
      • Notation: a = the target user; i = the target item; N = the user neighborhood; u = a user neighbor in N
  15. Collaborative Filtering: Algorithms
      • Popular challenges in collaborative filtering: data sparsity; cold-start users or items; grey-sheep users; incorporating content into collaborative filtering; …
  17. Grey Sheep Users
      • Definition 1 (Mark Claypool et al., 1999): a group of users who neither agree nor disagree with any group of users; therefore, they do not benefit from user-based collaborative filtering
      • Definition 2 (John McCrae et al., 2004):
        – White sheep users: high correlations with other users
        – Black sheep users: very few or no correlating users
        – Grey sheep users: unusual tastes, low correlations with others
  18. Research Problem: Identifying Grey Sheep Users [Figure: Collaborative Filtering vs. Other Algorithms]
  19. Existing Approaches
      • There are two existing approaches:
        – Clustering technique (Ghazanfar et al., 2011)
        – Distribution of user ratings (Gras et al., 2016)
      • Both were developed based on the 1st definition
      • We propose a novel approach based on the 2nd definition:
        – White sheep users: high correlations with other users
        – Black sheep users: very few or no correlating users
        – Grey sheep users: unusual tastes, low correlations with others
      • Note: a black sheep user is a user about whom we do not have enough knowledge
  21. Proposed Solution
      • We propose a novel approach based on the 2nd definition:
        – White sheep users: high correlations with other users
        – Black sheep users: very few or no correlating users
        – Grey sheep users: unusual tastes, low correlations with others
      • The approach uses the distribution of user-user correlations (similarities)
  22. Proposed Solution
      • Step 1: represent each user as a distribution of user correlations
      • Step 2: select good and bad examples
      • Step 3: apply outlier detection to the selected examples; grey sheep users are the intersection of the bad examples and the identified outliers
      • Step 4: examine the quality of the identified grey sheep users
  23. Proposed Solution
      • Step 1, Distribution Representations
        – We calculate user-user correlations by cosine similarity
        – We obtain the descriptive statistics of each user's distribution of similarities
  24. Proposed Solution
      • Step 2, Example Selection
        – Good examples: high correlations and left-skewed
        – Bad examples: low correlations and right-skewed
  25. Proposed Solution
      • Step 3, Outlier Detection by Local Outlier Factor (LOF)
        – LOF identifies outliers by their local density
        – Observations with LOF > 1 are considered outliers
        – We try different threshold values to find the best one for identifying grey sheep users, e.g., LOF threshold = 1.0, 1.1, 1.2, …
  26. Proposed Solution
      • Step 4, Examine the quality of the identified GS users
      • Parameters in our solution:
        – Example selection
        – LOF threshold
        – Number of neighbors in the LOF method
      • Our goals (examination criteria):
        – Find as many GS users as possible
        – Recommendations by UBCF should be worse for GS users than for non-GS users
  28. Experimental Setting
      • Data: MovieLens 10M rating data
        – 10M ratings
        – 72K users
        – 10K movies
        – Each user has rated at least 20 movies
      • Evaluation
        – 80% of the data for training, 20% for testing
        – Mean absolute error (MAE) to evaluate rating predictions
  29. Results • Impact of the number of neighbors in the LOF method
  30. Results • Comparison of recommendation quality
  31. Results • Visualization of GS and non-GS users
  33. Conclusions
      • We develop a novel approach that identifies GS users by using the definition based on user-user correlations
      • Our approach can successfully identify GS users
      • Our approach is less complicated than the existing approaches
  34. Drawbacks and Future Work
      • We did not compare our solution with the two existing methods
      • User-user correlations may be unreliable when the rating data is sparse
  35. Stay Tuned
      • Yong Zheng, Mayur Agnani, Mili Singh. “Identification of Grey Sheep Users By Histogram Intersection In Recommender Systems”. Proceedings of The 13th International Conference on Advanced Data Mining and Applications, 2017
        – We improve the solution proposed in the RIIT paper
        – We improve how user-user correlations are measured
        – We compare our solution with the two existing methods and demonstrate its advantages and effectiveness
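The UBCF prediction on slides 13 and 14 can be sketched in Python. The slides give only the notation (a, i, N, u), not the formula, so this sketch assumes the common mean-centered weighted average P(a, i) = r̄_a + Σ_{u∈N} sim(a, u)·(r_{u,i} − r̄_u) / Σ_{u∈N} |sim(a, u)|, with cosine similarity over co-rated items and K = 2 neighbors. The function names and the choice of K are illustrative assumptions.

```python
import math

# Rating matrix from slide 13; None marks the unknown rating (the "?").
ITEMS = ["Pirates of the Caribbean 4", "Kung Fu Panda 2", "Harry Potter 6", "Harry Potter 7"]
RATINGS = {
    "U1": [4, 4, 1, 2],
    "U2": [3, 4, 2, 1],
    "U3": [2, 2, 4, 4],
    "U4": [4, 4, 1, None],
}


def mean_rating(ratings):
    """Average of one user's known ratings."""
    known = [r for r in ratings if r is not None]
    return sum(known) / len(known)


def cosine(x, y):
    """Cosine similarity computed only over the items both users rated."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    dot = sum(a * b for a, b in pairs)
    nx = math.sqrt(sum(a * a for a, _ in pairs))
    ny = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (nx * ny) if nx and ny else 0.0


def predict_ubcf(data, a, i, k=2):
    """Assumed mean-centered UBCF prediction of user a's rating on item index i."""
    # Candidate neighbors: other users who have rated item i.
    candidates = [u for u in data if u != a and data[u][i] is not None]
    sims = {u: cosine(data[a], data[u]) for u in candidates}
    # N = the k candidates most similar to a.
    neighbors = sorted(candidates, key=lambda u: sims[u], reverse=True)[:k]
    num = sum(sims[u] * (data[u][i] - mean_rating(data[u])) for u in neighbors)
    den = sum(abs(sims[u]) for u in neighbors)
    base = mean_rating(data[a])
    return base + num / den if den else base


if __name__ == "__main__":
    p = predict_ubcf(RATINGS, "U4", ITEMS.index("Harry Potter 7"))
    print(f"Predicted rating of U4 on Harry Potter 7: {p:.2f}")
```

On this matrix, U1 and U2 are U4's two nearest neighbors and both rated Harry Potter 7 below their own averages, so the sketch predicts a low rating of about 1.88.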
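Step 1 on slide 23 represents each user by the distribution of that user's cosine similarities to all other users. A minimal sketch, assuming similarity is computed over co-rated items and that the descriptive statistics are count, mean, standard deviation, median, min, max and skewness; the slide does not list the exact statistics, so this feature set and the helper names are hypothetical.

```python
import math
import statistics


def cosine(x, y):
    """Cosine similarity of two users (dicts item -> rating) over co-rated items."""
    common = x.keys() & y.keys()
    if not common:
        return None  # no co-rated items: similarity is undefined
    dot = sum(x[i] * y[i] for i in common)
    nx = math.sqrt(sum(x[i] ** 2 for i in common))
    ny = math.sqrt(sum(y[i] ** 2 for i in common))
    return dot / (nx * ny)


def skewness(values):
    """Population Fisher-Pearson skewness; 0.0 if all values are equal."""
    m = statistics.fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def similarity_profile(ratings, user):
    """Descriptive statistics of `user`'s similarities to every other user (hypothetical feature set)."""
    sims = []
    for other in ratings:
        if other == user:
            continue
        s = cosine(ratings[user], ratings[other])
        if s is not None:
            sims.append(s)
    if not sims:
        return None  # nobody to compare with
    return {
        "count": len(sims),
        "mean": statistics.fmean(sims),
        "std": statistics.pstdev(sims),
        "median": statistics.median(sims),
        "min": min(sims),
        "max": max(sims),
        "skewness": skewness(sims),
    }


# Toy data: the slide 13 matrix (U4 has not rated Harry Potter 7).
RATINGS = {
    "U1": {"PotC4": 4, "KFP2": 4, "HP6": 1, "HP7": 2},
    "U2": {"PotC4": 3, "KFP2": 4, "HP6": 2, "HP7": 1},
    "U3": {"PotC4": 2, "KFP2": 2, "HP6": 4, "HP7": 4},
    "U4": {"PotC4": 4, "KFP2": 4, "HP6": 1},
}

if __name__ == "__main__":
    for u in RATINGS:
        print(u, similarity_profile(RATINGS, u))
```

In this toy matrix U4's similarities (about 1.0, 0.97 and 0.71) are high and left-skewed, while U3 has a lower mean similarity, which is the kind of contrast Step 2 relies on.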
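Step 2 on slide 24 selects good examples (high correlations, left-skewed) and bad examples (low correlations, right-skewed). The sketch below assumes "high" and "low" are judged by the mean similarity against two fixed cut-offs and that skewness sign decides the direction; both thresholds and the toy profiles are hypothetical, not values from the paper.

```python
def select_examples(profiles, high_mean=0.8, low_mean=0.5):
    """Split users into good and bad examples using their similarity profiles.

    Good: mean similarity >= high_mean and left-skewed (negative skewness).
    Bad: mean similarity <= low_mean and right-skewed (positive skewness).
    The two mean thresholds are hypothetical tuning parameters.
    """
    good, bad = set(), set()
    for user, p in profiles.items():
        if p["mean"] >= high_mean and p["skewness"] < 0:
            good.add(user)
        elif p["mean"] <= low_mean and p["skewness"] > 0:
            bad.add(user)
    return good, bad


# Toy profiles with made-up numbers; only "mean" and "skewness" are used.
PROFILES = {
    "u1": {"mean": 0.86, "skewness": -1.2},
    "u2": {"mean": 0.82, "skewness": -0.4},
    "u3": {"mean": 0.65, "skewness": 0.1},   # middling correlations: neither
    "u4": {"mean": 0.31, "skewness": 1.5},
    "u5": {"mean": 0.45, "skewness": -0.2},  # low mean but left-skewed: neither
}

if __name__ == "__main__":
    good, bad = select_examples(PROFILES)
    print("good:", sorted(good))  # ['u1', 'u2']
    print("bad:", sorted(bad))    # ['u4']
```

Users that satisfy neither rule are simply left out of the selected examples that Step 3 works on.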
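Step 3 on slides 22 and 25 applies the Local Outlier Factor to the selected examples and keeps the bad examples that are also outliers. The following is a from-scratch sketch of the standard LOF definition (k-distance, reachability distance, local reachability density), simplified so that ties at the k-th distance are broken arbitrarily and exactly k neighbors are used. The 2-D feature vectors, the bad-example set and the helper names are toy assumptions. In practice, scikit-learn's `sklearn.neighbors.LocalOutlierFactor` computes a comparable score, exposed negated as `negative_outlier_factor_`.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor of each point (equal-length tuples) using k nearest neighbors."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point, excluding the point itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # k-distance: distance to the k-th nearest neighbor.
    k_distance = [dist[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_distance[j], dist[i][j]) for j in knn[i]]
        return 1.0 / max(sum(reach) / k, 1e-12)  # guard against duplicate points

    lrds = [lrd(i) for i in range(n)]
    # LOF: average density of the neighbors relative to the point's own density.
    return [sum(lrds[j] for j in knn[i]) / (k * lrds[i]) for i in range(n)]


def grey_sheep(features, bad_users, k=2, threshold=1.0):
    """Grey sheep = bad examples whose LOF exceeds `threshold`."""
    users = list(features)
    scores = lof_scores([features[u] for u in users], k)
    outliers = {u for u, s in zip(users, scores) if s > threshold}
    return outliers & set(bad_users)


# Toy 2-D feature vectors for five selected examples (made-up values).
FEATURES = {
    "u1": (0.5, -1.0),
    "u2": (0.5, -0.5),
    "u3": (0.75, -1.0),
    "u4": (0.75, -0.5),
    "u5": (0.25, 1.5),
}
BAD = {"u4", "u5"}  # hypothetical bad examples from Step 2

if __name__ == "__main__":
    users = list(FEATURES)
    for user, score in zip(users, lof_scores([FEATURES[u] for u in users], k=2)):
        print(f"{user}: LOF = {score:.3f}")
    # Sweep LOF thresholds as on slide 25.
    for t in (1.0, 1.1, 1.2):
        print(t, sorted(grey_sheep(FEATURES, BAD, k=2, threshold=t)))
```

On these toy points the four clustered users get an LOF of exactly 1.0 and u5 gets about 4.08, so u5 is the only grey sheep at every threshold in the sweep, while u4 stays a bad example that is not an outlier.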
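Step 4 on slide 26 checks that UBCF works worse for the identified GS users than for everyone else, while finding as many GS users as possible. A minimal sketch, assuming per-rating test predictions are available as (user, actual, predicted) triples; the rule that combines the two goals in `pick_configuration` is a hypothetical reading of the slide, and the configuration numbers are made up rather than results from the paper.

```python
def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


def group_mae(predictions, gs_users):
    """MAE of UBCF predictions for GS users and for non-GS users.

    `predictions` holds (user, actual rating, predicted rating) triples.
    """
    gs = [(a, p) for u, a, p in predictions if u in gs_users]
    non_gs = [(a, p) for u, a, p in predictions if u not in gs_users]
    return mae(gs), mae(non_gs)


def pick_configuration(results):
    """Hypothetical rule: among configurations where GS users get a higher UBCF MAE
    than non-GS users, keep the one that identifies the most GS users."""
    valid = [r for r in results if r["mae_gs"] > r["mae_non_gs"]]
    return max(valid, key=lambda r: r["n_gs"], default=None)


# Toy test-set predictions (made-up values).
PREDICTIONS = [
    ("u1", 4, 3.5), ("u1", 2, 2.5), ("u2", 5, 4.5),
    ("u5", 1, 3.0), ("u5", 4, 2.5),
]
# Toy parameter sweep over LOF threshold and neighborhood size (made-up values).
CONFIGS = [
    {"lof_threshold": 1.0, "k": 20, "n_gs": 120, "mae_gs": 0.70, "mae_non_gs": 0.72},
    {"lof_threshold": 1.1, "k": 20, "n_gs": 80, "mae_gs": 0.81, "mae_non_gs": 0.66},
    {"lof_threshold": 1.2, "k": 20, "n_gs": 45, "mae_gs": 0.88, "mae_non_gs": 0.66},
]

if __name__ == "__main__":
    print(group_mae(PREDICTIONS, {"u5"}))  # (1.75, 0.5)
    print(pick_configuration(CONFIGS))
```

With these toy numbers the threshold-1.0 run finds the most users but fails the "worse for GS users" check, so the sketch keeps the threshold-1.1 run.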
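The evaluation on slide 28 uses an 80/20 train/test split and MAE. A minimal sketch, assuming the MovieLens 10M `ratings.dat` layout of one `UserID::MovieID::Rating::Timestamp` record per line and a random split over individual ratings with a fixed seed; the slide does not say how the split was drawn, so that choice and the helper names are assumptions.

```python
import random


def parse_rating(line):
    """Parse one MovieLens 10M line: UserID::MovieID::Rating::Timestamp."""
    user, movie, rating, _timestamp = line.strip().split("::")
    return int(user), int(movie), float(rating)


def holdout_split(ratings, train_fraction=0.8, seed=42):
    """Random hold-out split over individual ratings (assumed split strategy)."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def mae(actual, predicted):
    """Mean absolute error of rating predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Synthetic lines in the ratings.dat format: 10 users x 10 movies.
    lines = [f"{u}::{m}::{(u + m) % 5 + 1}::0" for u in range(1, 11) for m in range(1, 11)]
    train, test = holdout_split([parse_rating(line) for line in lines])
    print(len(train), len(test))  # 80 20
```

A per-user split (holding out 20% of each user's ratings) is a common alternative that guarantees every test user also appears in training.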
