Search engines and recommender systems have come to play prominent roles in our online lives, providing us with access to the information we need and with recommendations for new interesting content to consume.
However, for more complex information needs, 'pure' search engines and recommender systems are not always able to satisfy a user's needs. There are tens of thousands of examples of such complex needs on the Web, where users express what they are looking for in rich narrative descriptions.
This talk presents a high-level overview of the work done in the Social Book Search Lab workshops, which ranges from detecting and mining such complex book requests to exploring how to generate the best recommendations and how users interact with book search engines.
Presented on October 7, 2016 at the AoIR 2016 conference in Berlin, Germany.
What to Read Next? Three Stages of Data-driven Book Discovery
1. WHAT TO READ NEXT?
THREE STAGES OF DATA-DRIVEN BOOK DISCOVERY
METTE SKOV
TOINE BOGERS
AALBORG UNIVERSITY
AOIR PANEL ON ‘NETWORKED READING’ OCTOBER 7, 2016
3. BOOKS ARE NOT DEAD (THEY AREN’T EVEN SICK!)
Books remain very popular!
– Slow but steady increase in book sales, to 2.7 billion books in the US in 2015
  E-books make up 18.9% of that in the US
– Total sales revenue: $29.2 billion in the US in 2015
  E-book sales revenue was $5.3 billion
So there is definitely a market & need for discovering (new) interesting books!
4. BOOK DISCOVERY IS NOT THAT EASY...
Readers often struggle with existing systems (search engines & recommender systems) to discover new books
– Information needs are highly complex
  Topical match, complex relevance aspects, personal interests & preferences, context of use
– Search engines and recommenders are ill-equipped to address such needs!
6. SOCIAL BOOK SEARCH LAB
Series of workshops (2011-2016) with a shared data challenge using data from LibraryThing and Amazon
Focus is on the design, development & evaluation of systems that can address complex book requests
1. Detecting complex book requests
2. Analyzing book requests for relevance aspects
3. Developing better algorithms for suggesting relevant books
4. Exploring interactions with book search engines
7. OVERVIEW OF DATA SOURCES
[Diagram: data sources — Books, Users, and Book requests; 2.8 million books, 944 annotated requests]
9. BOOK REQUESTS
Forum posts describing realistic book search requests
– Book request narratives can touch upon many different aspects
  Users search for topics, genres, authors, plots, etc.
  Users want books that are engaging, funny, well-written, educational, etc.
  Users have different preferences, knowledge, reading level, etc.
– Book discussion fora contain many such focused requests!
  LibraryThing, Goodreads, …
10. OVERVIEW OF DATA SOURCES
[Diagram: adds Suggestions — 2.8 million books, 944 annotated requests, 5,658 suggestions]
13. OVERVIEW OF DATA SOURCES
[Diagram: adds User profiles — 94,000 user profiles, 2.8 million books, 944 annotated requests, 5,658 suggestions]
14. OVERVIEW OF DATA SOURCES
[Diagram: book records combine bibliographic metadata, curated metadata, and user-generated content (user profiles, tags, reviews)]
16. DETECTING BOOK REQUESTS
How common are requests for book recommendations in the LibraryThing forums?
– Currently 233,000+ threads in the LibraryThing forums
– Annotated a random sample of 4,000 threads, of which 15.1% were book requests
– Means there are potentially over 35,000 book requests on LibraryThing!
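The "over 35,000" figure follows directly from extrapolating the sample's request rate to all forum threads:

```python
# Extrapolating the annotated sample to the whole forum.
total_threads = 233_000   # threads in the LibraryThing forums (2016)
request_rate = 0.151      # share of the 4,000-thread sample that were book requests

estimated_requests = total_threads * request_rate
print(round(estimated_requests))  # → 35183, i.e. "over 35,000" book requests
```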
17. DETECTING BOOK REQUESTS
Can we detect such book requests automatically?
– Initial experiments achieved an accuracy of 94.17% on a test set of 2,000 annotated book requests
– Most predictive characteristics
  Words such as any, suggestions, looking, recommendations, thanks, anyone, read, books, and recommend
  No. of sentences ending in a question mark
  Degree of expertise of LibraryThing users replying to the thread
  Ratio of suggested books cataloged afterwards by the requester
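The first two signals above (cue words and question marks) are easy to extract from raw post text. A minimal sketch in Python, using a simple rule as a stand-in for the trained classifier from the experiments; the function names, cue-word set usage, and threshold here are illustrative assumptions, not the actual SBS implementation:

```python
import re

# Cue words reported as most predictive on the slide above
CUE_WORDS = {"any", "suggestions", "looking", "recommendations",
             "thanks", "anyone", "read", "books", "recommend"}

def extract_features(post: str) -> dict:
    """Extract the two text-derived signals mentioned on the slide:
    cue-word occurrences and the number of question sentences."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return {
        "cue_word_hits": sum(1 for t in tokens if t in CUE_WORDS),
        "question_sentences": post.count("?"),
        "n_tokens": len(tokens),
    }

def looks_like_request(post: str, min_hits: int = 2) -> bool:
    """Toy rule-based stand-in for the trained classifier: flag a post
    as a book request if it contains enough cue words and a question."""
    f = extract_features(post)
    return f["cue_word_hits"] >= min_hits and f["question_sentences"] >= 1

print(looks_like_request(
    "Looking for recommendations: any books like The Martian? Thanks!"))
# → True
print(looks_like_request("I finished this novel yesterday."))
# → False
```

The thread-level signals (replier expertise, books cataloged afterwards) require forum and catalog data and are omitted from this sketch.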
19. ANALYZING BOOK REQUESTS
Book requests contain many elements that could be mined to benefit search engines & recommendation systems
– Example: Relevance aspects
  What makes a suggested book relevant to the user?
  Identified eight relevance aspects in book search requests (Reuter, 2007; Koolen et al., 2015)
20. ANALYZING BOOK REQUESTS
Accessibility
– Accessibility in terms of the language, length, or level of difficulty of a book.
Content
– Aspects such as topic, plot, genre, style, or comprehensiveness of a book.
Engagement
– Books that fit a particular mood or interest, are considered high quality, or provide a particular reading experience.
Familiarity
– Books that are similar to known books or related to a previous experience.
21. ANALYZING BOOK REQUESTS
Known-item
– Descriptions of known books with the sole purpose of identifying their title and/or author.
Metadata
– Books with a certain title or by a certain author, editor, illustrator, or publisher, in a particular format, or written or published in a certain year or period.
Novelty
– Books with content that is novel to the reader, books that are unusual or quirky.
Socio-Cultural
– Books related to the user's socio-cultural background or values, books that are popular or obscure, or books that have had a particular cultural or social impact.
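The eight aspects form a closed annotation vocabulary, so annotated requests can be validated against it. A hypothetical sketch of such an annotation record (the dict format and `annotate` function are illustrative, not the actual SBS annotation scheme):

```python
# The eight relevance aspects from Reuter (2007) / Koolen et al. (2015)
RELEVANCE_ASPECTS = {
    "accessibility", "content", "engagement", "familiarity",
    "known-item", "metadata", "novelty", "socio-cultural",
}

def annotate(request: str, aspects: set) -> dict:
    """Attach a set of relevance aspects to a book request,
    validating them against the eight-aspect scheme."""
    unknown = aspects - RELEVANCE_ASPECTS
    if unknown:
        raise ValueError(f"unknown aspects: {unknown}")
    return {"request": request, "aspects": sorted(aspects)}

record = annotate(
    "Looking for a funny, well-written novel like Good Omens",
    {"engagement", "familiarity", "content"},
)
print(record["aspects"])  # → ['content', 'engagement', 'familiarity']
```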
25. SUGGESTING RELEVANT BOOKS
Different types of book metadata fields:
– Bibliographic metadata: title, publisher, editorial, creator, series, award, character, place
– Content: blurb, epigraph, first words, last words, quotation
– Reviews: user reviews
– Curated metadata: Dewey, thesaurus, index terms
– Tags: tags
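One common way such field groups feed a retrieval system is to concatenate selected groups into a searchable document per book. A minimal sketch, assuming hypothetical field names rather than the actual SBS/Amazon record schema:

```python
# Illustrative grouping of book metadata fields (names are assumptions,
# not the actual SBS index schema).
FIELD_GROUPS = {
    "bibliographic": ["title", "publisher", "editorial", "creator",
                      "series", "award", "character", "place"],
    "content": ["blurb", "epigraph", "first_words", "last_words", "quotation"],
    "reviews": ["user_reviews"],
    "curated": ["dewey", "thesaurus", "index_terms"],
    "tags": ["tags"],
}

def index_document(book: dict, groups=("bibliographic", "content")) -> str:
    """Concatenate the selected field groups into one searchable text blob,
    skipping fields the record does not have."""
    parts = []
    for group in groups:
        for field in FIELD_GROUPS[group]:
            value = book.get(field)
            if value:
                parts.append(str(value))
    return " ".join(parts)

doc = index_document({"title": "Dune", "blurb": "A desert planet epic",
                      "tags": "science-fiction"})
print(doc)  # → 'Dune A desert planet epic'
```

Varying which groups are indexed is one way to test how much user-generated content (tags, reviews) contributes over curated metadata alone.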
29. AIM & APPROACH
We aim to contribute to building dedicated book search and discovery services
Our long-term goal is to investigate book search behaviour through a range of user tasks and interfaces:
– How should the user interface combine professional, curated metadata and user-generated metadata?
– Should the user interface adapt itself as the user progresses through their search task, and if so, how?
– When do users prefer to browse or search?
– How can we best support different types of search tasks?
30. USER STUDY OF INTERACTIVE BOOK SEARCH BEHAVIOUR
Comparative user studies with 192 + 111 participants (2015 & 2016)
Study flow: Welcome → Informed Consent → Background → Pre-Task Information → Task → Post-Task Questions → Experience → Thank you
31. EXPERIMENTAL TASKS
Goal-oriented task: Imagine you participate in an experiment on a desert island for one month. There will be no people, no TV, radio or other distraction. The only things you are allowed to take with you are 5 books:
– On surviving on a desert island
– That will teach you something new
– Highly recommended by other users
– For fun
– About one of your personal hobbies or interests
Non-goal task: Imagine you are waiting to meet a friend in a coffee shop or pub or the airport or your office. While waiting, you come across this website and explore it looking for any book that you find interesting, engaging, or relevant. Explore anything you wish until you are completely and utterly bored…
36. WHAT HAVE WE LEARNED SO FAR?
Need for heterogeneous record information (user-generated and curated, professional data)
Multi-stage interface:
– Longer search sessions
– Fewer queries issued (more browsing)
– No differences in number of books added to book bag
Clear differences in search behaviour between the different types of tasks
(Gäde et al. 2015, 2016)
38. CONCLUSIONS
Tens of thousands of information needs are going unmet
– Just the tip of the iceberg?
– Search engines and recommender systems are ill-equipped to deal with this!
39. OPEN QUESTIONS
How (dis)similar are relevance aspects for books to those for other domains?
How do relevance aspects influence the choice of algorithm(s) & data representation(s)?
How does the combination of data from different sources (Amazon, LibraryThing, Library of Congress, British Library) affect the quality of the results and UX?
Decontextualized metadata: What happens when we mix metadata from different sources?
– Example: reuse of recommendations or tags 'out of context'
41. REFERENCES
Slide 3
– Book sales statistics taken from https://www.statista.com/topics/1177/book-market/ and https://www.statista.com/topics/1474/e-books/; last visited October 1, 2016
Slide 6
– Official website of the Social Book Search Lab: http://social-book-search.humanities.uva.nl/
Slide 20
– Reuter, K. (2007). Assessing Aesthetic Relevance: Children's Book Selection in a Digital Library. JASIST, 58(12), 1745–1763.
– Koolen, M., Bogers, T., Van den Bosch, A., and Kamps, J. (2015). Looking for Books in Social Media: An Analysis of Complex Search Requests. Proceedings of ECIR 2015, Volume 9022 of Lecture Notes in Computer Science, pp. 184–196.
42. REFERENCES
Slide 40
– Gäde, M., Hall, M., Huurdeman, H., Kamps, J., Koolen, M., Skov, M., Toms, E. & Walsh, D. (2015). Overview of the SBS 2015 Interactive Track. Working Notes of CLEF 2015 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, vol. 1391.
– Gäde, M., Hall, M., Huurdeman, H., Kamps, J., Koolen, M., Skov, M., Bogers, T. & Walsh, D. (2016). Overview of the SBS 2016 Interactive Track. Working Notes of CLEF 2016 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, vol. 1609.