A case study presented at UX Cambridge 2016.
For hundreds of years, discoveries in science have been discussed, debated and advanced within the scientific literature. Finding evidence in the literature, to test a hypothesis, is fundamental to scientific research.
But finding evidence in scientific literature can be time-consuming and difficult, especially as the number of published articles increases significantly each year. Advances in text mining technology offer the potential to make this task easier and quicker. Text miners are software engineers and subject experts who write algorithms to find useful information in vast amounts of unstructured text content. Deciding what information is useful to end users, and presenting it in an intuitive way, at the right point in time, is where UX can help.
This is a case study about annotating scientific terms and concepts in millions of research articles, with the goal of helping life science researchers identify relevant information in articles quickly and easily. We explain how text miners, UX and developers collaborated; what we discovered about user needs; the challenges and constraints we faced; and the iterative improvements we have made to the design.
1. Designing with algorithms
How text miners and UX can work together
@micheleidesmith @j_h_kim
Michele Ide-Smith, Product Manager
Jee-Hyub Kim, Text Miner
European Bioinformatics Institute (EMBL-EBI)
3. What we’ll cover
• Context - finding evidence in research literature
• What are annotations?
• What is text mining?
• Research insights and our design process
• Summary - what we learnt
12. “Sometimes it’s nicer to scan a PDF, in my
opinion...less scrolling and the figures are more
prominent. I really don’t like to read on the screen.”
“I can search in the PDF a little bit more easily than
in the full text article.”
“This [full text] is fairly clear but sometimes PDFs are
slightly easier to read, slightly easier on the eye.”
14. “I almost never look at PDFs, they are a bit of a pain.”
“I never go to the publisher site - I like to see all the
articles in the same format. I don’t go to the PDF
unless I want to print it out.”
15. Our users
• Life sciences researchers - find evidence for their
research questions, learn new methods and find all
available literature on a topic
• Curators - find evidence for e.g. a gene function so
that they can curate a page in a database
34. Scientific literature
• Biological terms e.g. diseases, organisms, genes,
proteins and chemicals (using ontologies)
• Biological processes and functions e.g. gene-disease
relationships, protein-protein interactions or
gene function (from proximity of words in text and
position in the article)
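The two kinds of annotation listed above can be illustrated with a minimal sketch. It is not the pipeline used for this project: the `TERMS` dictionary is a hypothetical stand-in for real ontologies, and "proximity" is simplified to "appears in the same sentence". The function names `annotate` and `cooccurring_pairs` are also made up for illustration.

```python
import re

# Hypothetical mini-dictionary standing in for real ontologies (assumption).
TERMS = {
    "BRCA1": "gene",
    "breast cancer": "disease",
    "zebrafish": "organism",
}


def annotate(text):
    """Return (start, end, matched_text, category) for each dictionary hit."""
    hits = []
    for term, category in TERMS.items():
        # Whole-word, case-insensitive match of the dictionary term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), m.group(0), category))
    return sorted(hits)


def cooccurring_pairs(text, first="gene", second="disease"):
    """Pair terms of two categories that occur in the same sentence.

    Same-sentence co-occurrence is a crude proxy for word proximity.
    """
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        hits = annotate(sentence)
        firsts = [h[2] for h in hits if h[3] == first]
        seconds = [h[2] for h in hits if h[3] == second]
        pairs.extend((a, b) for a in firsts for b in seconds)
    return pairs


text = ("Mutations in BRCA1 increase the risk of breast cancer. "
        "Zebrafish are a model organism.")
for start, end, matched, category in annotate(text):
    print(start, end, matched, category)
print(cooccurring_pairs(text))
```

The character offsets returned by `annotate` are what a front end would need in order to highlight each term in the article text.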
38. Prioritise what to read
• Skim read abstracts
• Look at figures
• Skim read results
• CTRL & F to find keywords in text
• Check for data files
Researchers prioritise what they want to read, as their
time is limited.
They use different strategies to identify articles which are
worth reading in full.
41. Research questions
• Do participants discover/use the feature?
• How easy is it to use/navigate through annotations?
• Do they trust the information?
• How do they feel about inaccurate annotations?
• Would they provide feedback if they had the
opportunity?
47. “If it’s not specific enough,
I end up with a lot of
things being highlighted.”
48. Granularity
• Some terms appeared too frequently, or were too
general to be useful e.g. “cell” or “formation”.
• Participants expected us to split Gene Ontology (GO)
terms into 3 separate categories: biological
process, molecular function and cellular component
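One way to act on both findings is sketched below, under stated assumptions. The stop list `TOO_GENERAL`, the `max_count` threshold and both function names are hypothetical; the talk does not say how over-general terms were actually handled.

```python
from collections import Counter

# Hypothetical stop list of terms judged too general to highlight (assumption).
TOO_GENERAL = {"cell", "formation"}


def filter_annotations(annotations, max_count=20):
    """Drop stop-listed terms and terms annotated more than max_count times.

    annotations: list of (term, category) tuples for one article.
    max_count is an illustrative threshold, not a value from the study.
    """
    counts = Counter(term.lower() for term, _ in annotations)
    return [
        (term, category)
        for term, category in annotations
        if term.lower() not in TOO_GENERAL
        and counts[term.lower()] <= max_count
    ]


def group_by_category(annotations):
    """Group the distinct terms under each category, e.g. the 3 GO categories."""
    groups = {}
    for term, category in annotations:
        groups.setdefault(category, set()).add(term)
    return groups


annotations = [
    ("cell", "cellular component"),
    ("apoptosis", "biological process"),
    ("kinase activity", "molecular function"),
]
print(group_by_category(filter_annotations(annotations)))
```

Grouping by category gives the interface one list per GO category, rather than a single undifferentiated set of GO highlights.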
49. “I guess false positives
automatically make me anxious
about whether to believe…”
57. Discoverability
• We can only show annotations on articles with a
CC-BY, CC-BY-NC or CC0 license
• We can’t display numbers in brackets due to the
performance impact on page loading
• Participants didn’t want highlights on by default
• Some people claim to ignore the right column
60. “I think it’s good that you can click more
than one. Because you can more easily
associate proteins or genes with GO, or the
organism. Which is very good. I would look
for yellow close to blue or orange.”
62. “The details one is an extra level
of clicking that’s frustrating. This
[structure diagram] is great.”
63. Engagement
• Once annotations were highlighted in the text,
participants didn’t necessarily realise they could
interact with them
• They expected to see something useful, which makes
clicking on the annotation worthwhile
64. “Maybe I’m trying to be too lazy.”
“With my curator hat on accession numbers are
exciting, but I wouldn’t want to have to scroll
through the article to see if there was one.”
“If you click on organisms I’d expect it to expand
out and see the unique items e.g. zebrafish”
69. “I really think this is amazingly useful
to have all the names of the genes
highlighted because you can get a
quick overview, which is much better
than trying to read the text quickly.”
70. “I do like it, it’s clever! ... It makes
life much faster, rather than going
in and out ... It makes information
and searching much faster.”