The MovieLens Datasets: History and Context
Presented at IUI 2016.
The MovieLens datasets are widely used in education, research, and industry. They are downloaded hundreds of thousands of times each year, reflecting their use in popular press programming books, traditional and online courses, and software. These datasets are a product of member activity in the MovieLens movie recommendation system, an active research platform that has hosted many experiments since its launch in 1997. This article documents the history of MovieLens and the MovieLens datasets. We include a discussion of lessons learned from running a long-standing, live research platform from the perspective of a research organization. We document best practices and limitations of using the MovieLens datasets in new research.
10. MovieLens benchmark datasets
Name Dates Users Movies Ratings Density
ML 100K ‘97 – ‘98 943 1,682 100,000 6.30%
ML 1M ‘00 – ‘03 6,040 3,706 1,000,209 4.47%
ML 10M ‘95 – ‘09 69,878 10,681 10,000,054 1.34%
ML 20M ‘95 – ‘15 138,493 27,278 20,000,263 0.54%
designed for replicability
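» Density = ratings / (users × movies); a quick check in Python (illustrative, using the ML 100K row above):

    # density = ratings / (users * movies); for ML 100K:
    density = 100_000 / (943 * 1_682)
    print(f"{density:.2%}")  # prints 6.30%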
11. MovieLens latest datasets
Name Dates Users Movies Ratings Density
ML Latest ‘95 – ‘16 247,753 34,208 22,884,377 0.27%
ML Latest Small ‘96 – ‘16 668 10,329 105,339 1.53%
designed for recency
13. tension: datasets vs. system
» ideal (pure) vs. actual (it’s complex)
» systems want to change
• stay current, constant improvements
• A/B tests, beta testing, and other experiments
» context changes
• devices, competing sites, changing user base
19. some key changes
» core flow of browse/search
» rating widget
» recommender
» new user experience
» …
20. history of experiments
» both online field experiments and online lab experiments
» created temporary and permanent changes, changed pattern of use
22. in the paper
» the story of MovieLens (1997 origins)
• lessons learned from running a “real” system in a research lab
• lots of fun descriptive stats/charts
» best practices for dataset researchers
• limitations
• alternatives
23. people who made this possible
» John Riedl
» Istvan Albert, Al Borchers, Dan Cosley, Brent J. Dahlen, Rich Davies, Michael Ekstrand, Dan Frankowski, Nathaniel Good, Jon Herlocker, Daniel Kluver, Shyong (Tony) Lam, Michael Ludwig, Sean McNee, Chad Salvatore, Shilad Sen, and Loren Terveen
» MovieLens users
24. in ACM Transactions on Interactive Intelligent Systems, Dec. 2015
» feedback? contact us: grouplens-info@cs.umn.edu
presented by Max Harper, Research Scientist, University of Minnesota, harper@cs.umn.edu
written with Joe Konstan, Distinguished McKnight University Professor, University of Minnesota, konstan@cs.umn.edu
This material is based on work supported by the National Science Foundation under grants DGE-9554517, IIS-9613960, IIS-9734442, IIS-9978717, EIA-9986042, IIS-0102229, IIS-0324851, IIS-0534420, IIS-0808692, IIS-0964695, IIS-0968483, IIS-1017697, IIS-1210863. This project was also supported by the University of Minnesota’s Undergraduate Research Opportunities Program and by grants and/or gifts from Net Perceptions, Inc., CFK Productions, and Google.
28. key dataset limitations (1/2)
» system UI and recommender changes
» bias towards “successful” users
» possible bias towards users with tolerance for “research quality” design
» timestamps do not reflect time of consumption
29. key dataset limitations (2/2)
» recommender systems research community attitudes
• implicit behaviors > ratings?
• dataset-only research increasingly discouraged
34. lessons from running MovieLens
» lessons from startups apply (it’s hard, fail fast)
» continual work, not one-time effort
» encourage code quality through good social coding conventions
» invest in tools that allow users to help
35. dataset uses
» recommender systems research
» recommender systems MOOC
• http://coursera.org/learn/recommender-systems
» code examples (popular press, blogs)
» higher education
» commercial – internal testing
I am the current caretaker of a system called MovieLens and of the datasets that are derived from that system.
I'm here to present a paper that we published in ACM Transactions on Interactive Intelligent Systems about MovieLens and the MovieLens datasets.
Notes:
what is the point? why should I listen to this talk? why are you telling us this?
theme: tension between building/maintaining a real system and producing a “pure” dataset
- a solution (impossible to implement retroactively) is to document extensively (e.g., add a version number to each rating — see the sketch after these notes)
there are many other things that changed beyond the ones listed in the current talk…mention them briefly?
add a road-map at the beginning of the talk
- maybe “things to know if you use the movielens datasets”
include most cited papers (+1)
don’t say specifics about recommenders – just say how high-level effects might have influenced ratings
why are you telling us MovieLens history? we’re sharing these lessons because we think they’re useful for dataset users and for people who want to generate their own datasets
say as a theme: the system changes, and that has an impact on the dataset?
mention the genome and other GroupLens datasets?
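on the version-number idea in the notes above, a hypothetical record layout for illustration (the systemVersion column is an assumption — the released datasets do not contain it):

    # userId,movieId,rating,timestamp,systemVersion   (hypothetical)
    # 4169,260,4.5,1436323199,ml4-beta-arm2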
MovieLens is a web site that collects 5-star ratings on movies
We have collected the results of many users providing these movie ratings in the MovieLens datasets, a publicly available resource for exploring rating data.
Notes:
possibly convert to a data table
Fundamentally, MovieLens is relevant because ratings-based systems have become so prevalent across a variety of domains.
(maybe cut this slide?)
most of these books and papers refer to the datasets, rather than the system
Notes:
just say “mooc”
Two goals in this talk:
introduce the MovieLens datasets to make sure everyone knows what I’m talking about, and to catch some of you up on new releases
discuss the tension between system-building and dataset purity, which I hope will be useful
both to inform us about some potential limitations inherent in dataset-based research
and to inform researchers engaged in releasing datasets of their own
---
relevance to IUI folks who…
conduct dataset research
peer review dataset research
build systems
release datasets
fundamentally, the MovieLens datasets describe users’ movie rating behavior
the core of the dataset contains tuples of the form shown here.
for example: user Max rated the movie Toy Story 4 stars at a particular time
rating values represent “half-star” ratings, from 0.5 stars to 5 stars
timestamps represent the most recent time when the rating was provided
In our latest dataset, there are about 20 million records like these
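to make the record format concrete, here is a minimal loading sketch (illustrative, not official tooling; it assumes the ml-20m release, whose ratings.csv has a userId,movieId,rating,timestamp header):

    import csv
    from datetime import datetime, timezone

    # Iterate over (user, movie, rating, timestamp) tuples from ml-20m.
    with open("ratings.csv", newline="") as f:
        for row in csv.DictReader(f):
            user = int(row["userId"])
            movie = int(row["movieId"])
            rating = float(row["rating"])  # half-star steps, 0.5 .. 5.0
            # seconds since the Unix epoch; per the talk, this is the most
            # recent time the rating was set, not when the movie was watched
            when = datetime.fromtimestamp(int(row["timestamp"]), tz=timezone.utc)
            print(user, movie, rating, when)
            break  # just show the first record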
here are the four dataset versions
we’ve released one about every five years
they vary quite a bit in their characteristics
the older datasets are most useful for comparing new work to existing published studies
we recommend that new work that is not comparative use the 20M dataset
for development or educational work, we have released a set of non-stable “latest” datasets
kept up to date (generated in 2016 to include new movies)
latest is unabridged, containing all users, including those with just 1 rating
latest-small is kept to 100k ratings for speed of development and testing, designed for educational purposes, demos, and other needs that don’t require big data
latest-small is also redistributable for non-commercial purposes
ideal: “pure” datasets
actual: user-generated datasets come from user interaction with a system
these changes work against the concept of generating pure data
movielens is a good case study, since it has been around for so long
Here it is! This is MovieLens, circa August 1997, around the time of its launch, as rendered by Netscape Navigator 4.
MovieLens has operated continuously since that time.
Let’s look through some screenshots showing its evolution.
version 1, released September 1999
version 2, released February 2000
version 3, released February 2003
and most recently, version 4, released November 2014
and this is basically what it looks like if you visit today
core flow of browse/search
rating widget
half stars, number of clicks
recommender
prediction, ordering
new user experience
“entry barrier”, initial personalization
there’s more: tagging, movie management, social features, …
recommender (1997 user-user via GroupLens, 1999 user-user Net Perceptions, 2003 item-item MultiLens, 2012 item-item LensKit, 2014 popularity blending with item-item or SVD)
new user (1997 rate 5 from 10 at a time (9 random, 1 easy), 2002 rate 15 selected for popularity, 2014 pick-groups recommender)
ratings widget (1997 5-star dropdown, 2003 half-star pulldown, 2014 clickable stars)
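to make “item-item” concrete: a minimal sketch of the general technique (illustrative only, not the MultiLens or LensKit implementation). It predicts a user’s rating for a target movie as a similarity-weighted average of that user’s ratings of similar movies:

    import numpy as np

    # Toy user x movie ratings matrix; 0.0 marks "unrated".
    R = np.array([
        [4.0, 0.0, 3.5, 5.0],
        [3.0, 4.5, 0.0, 4.0],
        [0.0, 4.0, 4.5, 3.5],
    ])

    def item_sim(a, b):
        """Cosine similarity between movies a and b over co-rating users."""
        mask = (R[:, a] > 0) & (R[:, b] > 0)
        if not mask.any():
            return 0.0
        x, y = R[mask, a], R[mask, b]
        return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

    def predict(user, target):
        """Similarity-weighted average of the user's other ratings."""
        rated = [m for m in range(R.shape[1]) if R[user, m] > 0 and m != target]
        sims = np.array([item_sim(target, m) for m in rated])
        ratings = np.array([R[user, m] for m in rated])
        if sims.sum() == 0:
            return float(ratings.mean()) if len(ratings) else 0.0
        return float(sims @ ratings / sims.sum())

    print(round(predict(user=0, target=1), 2))  # user 0's predicted rating of movie 1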
Notes:
more visuals
too much here
…this is not unique to MovieLens; the practice of A/B testing affects most datasets (e.g., Netflix, Amazon)
and yet we find remarkable stability in general use of the ratings widget in aggregate
chart shows average and median ratings across time, aggregated by month.
given the extent of changes we’ve just discussed, it is somewhat remarkable to observe so little monthly variation
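the aggregation behind that chart is simple; a sketch of it with pandas (assuming the ml-20m ratings.csv described earlier):

    import pandas as pd

    # Monthly mean and median rating, as plotted in the chart.
    ratings = pd.read_csv("ratings.csv")
    month = pd.to_datetime(ratings["timestamp"], unit="s").dt.to_period("M")
    print(ratings.groupby(month)["rating"].agg(["mean", "median"]))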
Notes:
get rid of median line?
a brief acknowledgement of the people who made this retrospective look possible
the core idea or premise hasn’t really changed since its initial release!
movielens is a system that helps people find movies to watch
it works by asking users to rate movies on a 1-to-5-star scale to express their preferences
it uses those ratings to predict subsequent ratings
it can prioritize the display of movies with high predicted ratings to personalize the experience
Notes:
polish presentation of timestamps + influence
usage
MovieLens has been used by lots of people, all around the world.
we’ve registered about 280,000 people since launching in 1997
and the system has welcomed several thousand monthly active users since 2001
Notes:
maybe combine with other chart?
To understand the datasets, it is critical to understand the underlying system
Like all systems, MovieLens has changed.
Like many systems, MovieLens has experimented with features.
there are a variety of other datasets that provide different characteristics
this table shows some of the most prominent ones
the two biggest alternatives in the movies space, EachMovie and Netflix, have each been withdrawn and are no longer legally available
however, there are a number of great alternatives for ratings data across other domains
Notes:
Maybe cut this slide
explain the cross-outs
Let’s go back to the mid-’90s.
Digital Equipment Corporation (DEC) was running an experimental system called EachMovie
EachMovie was built to explore the still young idea of personalized recommendations with collaborative filtering
But in 1997, DEC decided to shut down EachMovie
The DEC researchers reached out to the recommender systems community, looking for an organization to develop a replacement site, to serve the same users
Joe Konstan and John Riedl (pictured here) responded, and had their graduate students build a “copy” of EachMovie, backed by the GroupLens recommender engine.
our paper has links to all of those, if you’re interested!