Designing with algorithms

How text miners and UX can work together
@micheleidesmith @j_h_kim
Michele Ide-Smith, Product Manager
Jee-Hyub Kim, Text Miner
European Bioinformatics Institute (EMBL-EBI)

European Bioinformatics Institute
The home for big data in biology

What we’ll cover
• Context: finding evidence in research literature
• What are annotations?
• What is text mining?
• Research insights and our design process
• Summary: what we learnt

Research scientists are
expected to publish
several articles every year

Finding evidence in
scientific literature is a
challenge

“I was looking for the cellular
location (cytoplasm or nucleus) of
ribonucleotide reductase. It’s like a
needle in a haystack.”

• Abstracts: 31.4 million
• Agricola records: 631,222
• Full text articles: 3.8 million
• NHS guidelines: 780
• Patents: 4.2 million

Full text is free to read and share,
but reuse requires a license
such as CC-BY

Researchers still like to
read PDFs

“Sometimes it’s nicer to scan a PDF, in my
opinion...less scrolling and the ﬁgures are more
prominent. I really don’t like to read on the screen.”
“I can search in the PDF a little bit more easily than
in the full text article.”
“This [full text] is fairly clear but sometimes PDFs are
slightly easier to read, slightly easier on the eye.”

Younger researchers
prefer reading online

“I almost never look at PDFs, they are a bit of a pain.”
“I never go to the publisher site - I like to see all the
articles in the same format. I don’t go to the PDF
unless I want to print it out.”

Our users
• Life sciences researchers: find evidence for their research questions, learn new methods and find all available literature on a topic
• Curators: find evidence for, e.g., a gene function so that they can curate a page in a database

“If I notice it’s really important then I’ll print it, so I
can highlight it with a pen.”

GOAL:
To help researchers find useful
information in articles quickly, and
link to related data resources

annotation
noun
a note by way of explanation or comment
added to a text or diagram.

An annotation is metadata (e.g. a comment,
explanation, presentational markup) attached
to text, image, or other data.
https://en.wikipedia.org/wiki/Annotation

Annotations can be
private

Annotations can be
public

Annotations can be
created by humans

Or by machines…
By TedColes - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=38648141

Human curation of
annotations is valuable,
but hard to sustain

“Text mining…, refers to the process of
deriving high-quality information from
text.”
https://en.wikipedia.org/wiki/Text_mining

Text miners find useful information in
unstructured text using algorithms (sets of
rules) and build data pipelines.
By W Gossett https://www.flickr.com/photos/systemf/

Scientific literature
• Biological terms, e.g. diseases, organisms, genes, proteins and chemicals (identified using ontologies)
• Biological processes and functions, e.g. gene-disease relationships, protein-protein interactions or gene function (identified from the proximity of words in the text and their position in the article)
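A minimal sketch of the two ideas on this slide, assuming a small hand-written dictionary in place of real ontology-derived term lists: whole-word dictionary matching to tag terms, and same-sentence co-occurrence as a crude stand-in for word proximity. The dictionary entries, type labels and function names are hypothetical; this is not the pipeline described in the talk.

```python
import re
from itertools import product

# Hypothetical mini-dictionary standing in for ontology-derived term lists.
DICTIONARY = {
    "BRCA1": "gene/protein",
    "TP53": "gene/protein",
    "breast cancer": "disease",
    "zebrafish": "organism",
}

def tag_terms(text, dictionary):
    """Return (start, end, matched_text, type) for every whole-word dictionary match."""
    hits = []
    for term, term_type in dictionary.items():
        # \b stops "cell" from matching inside "cellular".
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), m.group(0), term_type))
    return sorted(hits)  # order by position in the text

def cooccurring_pairs(text, dictionary, type_a="gene/protein", type_b="disease"):
    """Pair every type_a term with every type_b term found in the same sentence."""
    pairs = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):  # naive sentence splitter
        hits = tag_terms(sentence, dictionary)
        a_terms = {h[2] for h in hits if h[3] == type_a}
        b_terms = {h[2] for h in hits if h[3] == type_b}
        pairs.update(product(a_terms, b_terms))
    return pairs
```

For example, given "Mutations in BRCA1 increase the risk of breast cancer. TP53 was also studied in zebrafish.", `cooccurring_pairs` returns {("BRCA1", "breast cancer")}, because TP53 shares its sentence with an organism rather than a disease. Real pipelines use far richer rules; sentence co-occurrence mainly illustrates why proximity alone produces false positives.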

RESEARCH GOAL 1:
To understand how researchers
and curators find literature and
make decisions about what to read

Interviews with 8
researchers and 2
curators

Find evidence to inform research

Prioritise what to read:
• Skim read abstracts
• Look at figures
• Skim read results
• Ctrl+F to find keywords in the text
• Check for data files
Researchers prioritise what they want to read, as their time is limited. They use different strategies to identify articles that are worth reading in full.

RESEARCH GOAL 2:
To find out how well the
prototype worked for users

Usability tests with 17
users, in 2 iterations

Research questions
• Do participants discover/use the feature?
• How easy is it to use/navigate through annotations?
• Do they trust the information?
• How do they feel about inaccurate annotations?
• Would they provide feedback if they had the
opportunity?

I used an InVision clickable prototype
for the first few sessions

But we got the best results using a
prototype with real data behind it

Text miners and
developers took turns to
observe usability tests

Issues were logged in a spreadsheet

We worked in sprints to
address UX, technical
and text mining issues

“If it’s not specific enough,
I end up with a lot of
things being highlighted.”

Granularity
• Some terms appeared too frequently, or were too general to be useful, e.g. “cell” or “formation”.
• Participants expected us to split Gene Ontology (GO) terms into three separate categories: biological process, molecular function and cellular component.
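One simple way to address both granularity issues, sketched under stated assumptions: a hypothetical stoplist of overly general terms plus a per-article frequency cap, and grouping GO annotations by namespace. The cap of 20, the stoplist and the dict keys `term` and `namespace` are illustrative; the three namespace names are the standard GO aspects.

```python
from collections import Counter, defaultdict

# Hypothetical stoplist of terms judged too general to be worth highlighting.
GENERAL_TERMS = {"cell", "formation"}

# The three Gene Ontology aspects, as named in GO's OBO files.
GO_NAMESPACES = ("biological_process", "molecular_function", "cellular_component")

def filter_annotations(annotations, max_per_article=20, stoplist=GENERAL_TERMS):
    """Drop stoplisted terms and any term mentioned more than max_per_article times."""
    counts = Counter(a["term"].lower() for a in annotations)
    return [
        a
        for a in annotations
        if a["term"].lower() not in stoplist
        and counts[a["term"].lower()] <= max_per_article
    ]

def group_go_terms(annotations):
    """Group GO annotations by namespace instead of one combined GO category."""
    groups = defaultdict(list)
    for a in annotations:
        namespace = a.get("namespace")
        if namespace in GO_NAMESPACES:
            groups[namespace].append(a["term"])
    return dict(groups)
```

A fixed cap is a blunt instrument; a cut-off tuned per term type, informed by usability feedback like the quotes above, would likely work better.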

“I guess false positives
automatically make me anxious
about whether to believe…”

Trust
• Users lost trust in the information when they spotted false positives, e.g.:
• “oxide” is not an organism, but “oxidae” is
• “ubiquitin” is a process not a gene/protein
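Once a false positive has been confirmed, one option is to suppress it only for the type it was wrongly assigned, so "oxide" can still be tagged as, say, a chemical. This sketch assumes annotations are dicts with hypothetical `text` and `type` keys and a hand-maintained set of reported (text, type) pairs; it is one possible approach, not how the production pipeline works.

```python
# Hypothetical (lower-cased text, type) pairs reported as wrong by users or curators.
REPORTED_FALSE_POSITIVES = {("oxide", "organism")}

def report_false_positive(reported, text, term_type):
    """Record a confirmed false positive for one specific annotation type."""
    reported.add((text.lower(), term_type))

def suppress_false_positives(annotations, reported=REPORTED_FALSE_POSITIVES):
    """Remove annotations whose (text, type) pair has been reported as wrong."""
    return [a for a in annotations if (a["text"].lower(), a["type"]) not in reported]
```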

Feedback
• Machine annotations are not perfect, so we offered users a way to provide feedback.

“If you do disease, could
you do variation? That
would be a killer.”

Annotation types
• Users suggested other types of annotation that would be useful to them
• Our platform enables other text mining groups to provide annotations, as we don’t have the capacity to develop them all ourselves

“I thought that if it [annotations
control panel] had something to tell
me about those things, it would
already be there”

Discoverability
• We can only show annotations on articles with a CC-BY, CC-BY-NC or CC-0 license
• We can’t display numbers in brackets because of the performance impact on page loading
• Participants didn’t want highlights on by default
• Some participants claimed to ignore the right-hand column
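The license rule above amounts to a filter on article metadata. The sketch below assumes each article is a dict with an optional `license` string (a hypothetical field name) and normalises labels such as "CC BY 4.0" or "CC-0" before comparing; the normalisation rules are assumptions, not a description of the real system.

```python
import re

# Licenses named on the slide, normalised to upper-case alphanumerics.
ANNOTATABLE = {"CCBY", "CCBYNC", "CC0"}

def normalise_license(label):
    """Turn e.g. 'CC BY 4.0', 'cc-by-nc' or 'CC-0' into 'CCBY', 'CCBYNC', 'CC0'."""
    label = re.sub(r"\s+\d+(\.\d+)*$", "", label.strip())  # drop a trailing version
    return "".join(ch for ch in label.upper() if ch.isalnum())

def annotatable(articles):
    """Keep only articles whose license permits showing annotations."""
    return [a for a in articles if normalise_license(a.get("license") or "") in ANNOTATABLE]
```

Note that "CC BY-SA" normalises to "CCBYSA" and is excluded, which matches the list on the slide.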

“It would look like a Christmas
tree!...For me it would be quite
disturbing in terms of reading”

“I think it’s good that you can click more
than one. Because you can more easily
associate proteins or genes with GO, or the
organism. Which is very good. I would look
for yellow close to blue or orange.”

“The details one is an extra level
of clicking that’s frustrating. This
[structure diagram] is great.”

Engagement
• Once annotations were highlighted in the text, participants didn’t necessarily realise they could interact with them
• They expected clicking on an annotation to reveal something useful enough to make the click worthwhile

“Maybe I’m trying to be too lazy”.
“With my curator hat on, accession numbers are
exciting, but I wouldn’t want to have to scroll
through the article to see if there was one.”
“If you click on organisms I’d expect it to expand
out and see the unique items e.g. zebrafish”

Navigation
• Participants wanted to jump straight to highlighted
terms in the text
• They also wanted to navigate through highlights
• Some expected to see a list of terms that appear in
the text under the checkbox
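The navigation requests map onto simple operations over highlight positions. This sketch assumes each highlight is a hypothetical (start_offset, text, type) tuple: one function lists the distinct terms of a type in order of first appearance (the list under a checkbox), the other finds the next highlight after the reader's current position, wrapping back to the first.

```python
from bisect import bisect_right

def unique_terms(highlights, term_type):
    """Distinct terms of one type, case-insensitive, in order of first appearance."""
    seen = {}  # lower-cased term -> first spelling seen (dicts keep insertion order)
    for _, text, t in sorted(highlights):
        if t == term_type and text.lower() not in seen:
            seen[text.lower()] = text
    return list(seen.values())

def next_highlight(highlights, term_type, cursor):
    """Offset of the next highlight of term_type after cursor, wrapping to the first."""
    offsets = sorted(start for start, _, t in highlights if t == term_type)
    if not offsets:
        return None
    i = bisect_right(offsets, cursor)  # first offset strictly greater than cursor
    return offsets[i] if i < len(offsets) else offsets[0]
```

With organism highlights at offsets 50 ("mouse"), 120 ("zebrafish") and 300 ("Zebrafish"), `unique_terms` gives ["mouse", "zebrafish"], and stepping from offset 300 wraps back to 50.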

So did we meet our
goal?

“I really think this is amazingly useful
to have all the names of the genes
highlighted because you can get a
quick overview, which is much better
than trying to read the text quickly.”

“I do like it, it’s clever! ...It makes
life much faster, rather than going
in and out….It makes information
and searching much faster”

It’s early days and we still have
many improvements to make,
but early indications are positive

Two important lessons
from working together

1. User research and
feedback are essential for
improving text mining pipelines

2. Compromises between
technical/performance constraints
and user needs are inevitable - but
make decisions together

Thank you for listening!
