3. First, About Datasets
• In the beginning there was MovieLens
Good for analysis, experimentation, simulation, and seed data
• Now: old, static, synthetic
Less interesting for present-day experiments
• New dataset to fill the void: MovieTweetings
Daily updated dataset of movie ratings extracted from IMDb ratings posted to Twitter
Recent movies, growing, natural dataset
ACM RecSys Challenge 2014 3
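As a rough illustration of what "extracted from IMDb ratings posted to Twitter" could involve, here is a minimal sketch. It assumes rating tweets follow a template like "I rated <title> <n>/10 <IMDb link> #IMDb"; that template, the regular expression, and the function name parse_rating_tweet are assumptions made for this example, not the dataset's actual extraction pipeline.

```python
import re

# Assumed tweet template (not taken from the slides):
#   "I rated <title> <n>/10 <IMDb URL> #IMDb"
RATING_PATTERN = re.compile(
    r"I rated (?P<title>.+?) (?P<rating>10|[1-9])/10\b"
    r".*?imdb\.com/title/(?P<imdb_id>tt\d+)",
    re.IGNORECASE,
)


def parse_rating_tweet(text):
    """Return title, rating, and IMDb id from a rating tweet, or None.

    Hypothetical helper: it only recognizes the assumed template above.
    """
    match = RATING_PATTERN.search(text)
    if match is None:
        return None
    return {
        "title": match.group("title"),
        "rating": int(match.group("rating")),
        "imdb_id": match.group("imdb_id"),
    }


example = "I rated The Matrix 9/10 http://www.imdb.com/title/tt0133093 #IMDb"
parsed = parse_rating_tweet(example)
```

Tweets that do not fit the assumed template return None, so they can be skipped rather than guessed at.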
4. The Challenge
• Why stop at ratings?
• Tweets have much more information!
• Focus on ‘user interaction’
• How can we predict user engagement?
How does that benefit recsys?
12. Discussion
• Are some ratings more important than others?
• Which external data source was useful?
• Which are the important features of the tweets?
• What else can we do with this dataset?