Paolo Ciccarese DILS 2013 keynote

Open Annotation (in Biomedicine)
Mass General Hospital Harvard Medical School
Annotation, Semantic Annotation and
Keeping the right crowd in the loop
Paolo Ciccarese, PhD
@paolociccarese

Research Questions
• How do we get the best, up-to-date knowledge to the final users* while preserving the historical record?
• How do we involve experts in the knowledge creation/extraction process?
* healthcare providers, researchers, scientists, scholars, librarians, students…
Paolo Ciccarese, PhD DILS 2013

Salesman: Answer is simple
• By crowd-sourcing annotation
and semantic annotation
• Annotation
– intuitive and agile
– micro data integration
– traceable
– large scale
– unstructured/structured
– manual/automatic/semi-automatic
– supports disagreement
– personal/groups/public
– velocity and fast turnaround
– …

Scientist: Answer not that simple but
slowly things are getting better
• Growing interest in annotation
• Annotation is an important
tool to be combined with other
methods
• It nicely keeps
knowledgeable human agents
in the loop
• Still lots of research to be done
but we have a standard and
tools are improving fast
• Right time to annotate!!!

Annotation in teaching: learning from the experts
Greg Nagy, Professor of Classics at Harvard University,
Director of the Harvard Center for Hellenic Studies in Washington DC
Gary King, Professor of Government,
Director for the Institute for Quantitative Social Science at Harvard University
http://www.annotations.harvard.edu/
MOOCs, edX, HarvardX, MITx

Annotation Convergence Workshop 2013
• More than 100
participants from
Harvard (plus visitors)
• More than 25
annotation related
presentations
• Morning session videos
are online
http://www.annotations.harvard.edu/
Big interest from libraries

Harvard Library Cloud
Harvard Libraries, how do we make them
discoverable and how do we integrate such a great
variety of resources. Data integration gets more
value out of existing records.
David Weinberger, Writer, Senior researcher
at the Berkman Center and co-director
of the Harvard Library Innovation Lab.
There is only so much you can do at the record
level. When you have scholars and students… they
are doing the work of discovering the relationships
between the parts. Annotation is the platform
http://www.librarycloud.org/

Filtered Push (Biodiversity)
There are 2-3 billion specimens, and it has been estimated [1] that no more than 3% have any digital record.
Bob Morris
Emeritus Professor, University of Massachusetts Boston
IT Research Staff, Harvard University Herbaria
1. Arturo H. Ariño, Approaches to estimating the universe of natural history collections data; Biodiversity Informatics, 7, 2010, pp. 81–92
2. Nelson et al., Five task clusters that enable efficient and effective digitization of biological collections, ZooKeys 209: 19–45, doi: 10.3897/zookeys.209.3135
http://wiki.filteredpush.org/

Research Objects
Stian Soiland-Reyes, Researcher,
University of Manchester, UK
Carole Goble, Full Professor
School of Computer Science
University of Manchester, UK
How can we record research
for anticipated but also
unanticipated re-use?
http://wiki.myexperiment.org/index.php/Research_Objects

Neuroscience Information Framework (NIF)
Maryann Martone, PhD
Professor in Residence, Department of Neurosciences, UCSD
Co-Director, National Center for Microscopy and Imaging Research (NCMIR)
http://neuinfo.org
A dynamic inventory of Web-based neuroscience
resources: data, materials, and tools accessible
via any computer connected to the Internet.
Annotation can be used to link scientific
literature with the NIF resources such as
antibodies and animal strains and mutants

A (few?) years back…

Data integration learned in College
• University of Pavia (Italy) mid/late-Nineties
• Software engineering: Database integration
Knowledge

Hypertension database integration
• Electronic Patient Records from several
institutions and departments
• Creating a normalized database for analysis of
patient data
• ‘Classic’ integration issues
– Columns nature
– Formats (names, dates and units of measure)
– Unstructured content
– Social interactions (assisted annotation of records)
• Tacit → Explicit knowledge/semantics
Annotation of patient records
After 15 years I still get at least an email a month on this topic

Data integration during my PhD
• University of Pavia (Italy) 2001-2004
• PhD in Bioengineering and Bioinformatics
• Evidence Based Clinical Decision Support
Knowledge

Hypothesis (EBM)
• If we deliver up to date computerized clinical
practice guidelines to the point of care
– We will provide decision support reducing errors,
malpractice and costs
– We will improve the quality of care by leveraging
the best scientific evidence
– We will be able to collect structured data for
updating the guidelines, speeding up the
guideline creation/dissemination process.

CPG representation and enactment
Annotation of clinical guidelines
After 12 years I still review ‘innovative’ papers on the topic

The Guide Project* (1999-2004)
• Beyond Evidence Based clinical decision
support
– integrates a formalized model of the medical
knowledge expressed in clinical guidelines and
protocols with both WorkFlow Management
Systems and Electronic Patient Record
technologies
*Guide on OpenClinical: http://www.openclinical.org/gmm_guide.html
P Ciccarese, E Caffi, S Quaglini, M Stefanelli
Architectures and tools for innovative health information systems: the Guide Project
International journal of medical informatics 74 (7-8), 553-562, 2005

The Guide Project (1999-2004)
• Integrated Clinical Knowledge Management
infrastructure through separation of concerns
(SoC)
Integration:
- Datatypes system
- Terminologies
- Contracts (XML)
- Web Services (WSDL)
- Social interaction

Guide: lesson learned (1)
• Guidelines are semi-structured knowledge
that is hard for medical operators or
knowledge engineers to formalize
alone (we needed both)
• Interaction between health care providers and
knowledge engineers causes behavioral
modifications for both
• Annotation was a big part of the process and
it made the physicians feel in control

• Knowledge extraction and encoding in a
three-step process
1. From paper to a list of recommendations (possibly
using markup/annotation tools?)
2. From the recommendations to a flow-chart like
model where all the entities (agents, patients
variables, drugs) were explicit (< semantics)
3. From the flow-chart like model to a formal model

• The architecture proved to be robust and
scalable
– Datatypes, Terminologies, Contracts, Web Services
and XML were good for components to communicate
• But the semantics was still not completely explicit
– XML not ideal to represent knowledge and graphs
– Data integration was relying on tacit knowledge
– Low quality of patient data in the EPRs
• How about ontologies… and RDF?
Prof. Barry Smith

Semantics at work… Protégé EON, Sage
• Frame-based logic with
Protégé for Knowledge
representation
– Clinical practice guidelines
– Domain ontologies
– Virtual medical record
– Organizational entities
Samson Tu
Stanford University
Prof. Mark Musen
Stanford University
http://www.openclinical.org/gmm_eon.html
http://www.openclinical.org/gmm_sage.html

Growing Interest in Semantic
Technologies led me to Boston
• Simile (2003-2006): Semantic Interoperability
of Metadata and Information in unLike
Environments
– to enhance inter-operability among digital assets,
schemata/vocabularies/ontologies, metadata, and
services.
• PIs: Eric Miller (Zepheira), David Karger (MIT)
and MacKenzie Smith (UC Davis)

Stefano Mazzocchi
Google Inc
David Huynh, PhD
Google Inc
Simile widgets
• Exhibit
• Timeline
• Timeplot
• Welkin and Vicino
• Piggy Bank
• Potluck
• Playground

Piggy Bank
http://simile.mit.edu/wiki/Piggy_Bank

Simile Potluck
http://simile.mit.edu/potluck/

Simile Playground
• Combined most of the Simile technologies
• Data extraction, semantic integration,
annotation and publishing in the same
platform… in the browser!!!
http://simile.mit.edu/wiki/Playground

Boston (Summer 2006)
Clinical Space-> Neurology Research

SWAN (Semantic Web Applications in
Neuromedicine) (2004-2010)
• Developing cures for highly complex diseases requires extensive interdisciplinary collaboration and exchange of biomedical information in context.
• Our ability to exchange such information across sub-specialties today is limited by the current scientific knowledge ecosystem’s inability to properly contextualize and integrate data and discourse in machine-interpretable form.
June Kinoshita
Tim Clark
Director of MIND Informatics
Mass General Hospital

A ‘structured’ view of a publication
classic publication
scientific discourse ‘semantic’ representation
http://tinyurl.com/cgyna2m
Semantic Web Applications in Neuromedicine
(SWAN) project [2007]
Annotation of scientific papers

AlzSWAN Curation Process
http://hypothesis.alzforum.org

AlzSwan: the SWAN-Alzheimer KB
http://hypothesis.alzforum.org/

Golde hypothesis

A claim

Nature News: Literature mining: Speed reading (27 January 2010)

Nature

SWAN in numbers (1.5 years)
• 2398 Research Statements
– 184 Hypotheses
• 60 deeply annotated
• 124 simply annotated
– 2214 Claims
• 61 Research Questions
• 48 Comments
• 2825 Journal Articles
Fewer papers than
those published in
a week on the
topic

SWAN, data integration and
interoperability
• RDF, Triple Store and SPARQL
• Integration of data from PubMed, UniProt,
PRO, GO, data repositories
• Ontologies (OWL DL)
– SWAN (Scientific Discourse)
– PAV (Provenance Authoring and Versioning)
– CO (Collections)
• ≈ Linked Data
PROV
Nanopublications
Elsevier Satellite
Research Objects
…
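A minimal sketch of what it means to store discourse as RDF-style triples and query it by pattern. The identifiers (`ex:claim1`, `ex:supports`, `ex:hypothesis1`, and so on) and the `match` helper are hypothetical, invented for illustration; they are not the actual SWAN ontology terms, and a real deployment would use a triple store and SPARQL rather than a Python list.

```python
# Toy triple list; every identifier below is made up for illustration.
TRIPLES = [
    ("ex:claim1", "rdf:type", "ex:Claim"),
    ("ex:claim2", "rdf:type", "ex:Claim"),
    ("ex:claim1", "ex:supports", "ex:hypothesis1"),
    ("ex:claim2", "ex:disputes", "ex:hypothesis1"),
    ("ex:claim1", "ex:citedIn", "ex:article42"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Which statements support hypothesis1?
# (roughly the pattern: ?c ex:supports ex:hypothesis1)
supporting = [t[0] for t in match(TRIPLES, p="ex:supports", o="ex:hypothesis1")]
print(supporting)
```

Because every statement has the same three-part shape, data from different sources can be appended to the same list and queried with the same patterns, which is the integration property the slide points at.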

W3C HCLS Working Group Notes

SWAN: lesson learned (1)
• Labor intensive + subjectivity + loss of context
(missed links back to the original content)
• Full article representation not attractive,
scientists want to ‘formalize’ only what is
interesting for them at that very moment
(during their normal activities)
• Form based approach not efficient (too many
copy and paste involved)

SWAN: lesson learned (2)
• Discourse elements can be further structured
(relationships provided value but text is not
actionable)
– see nanopublications, HyBrow, HyQue, BEL
• Integration with external sources not trivial
(normalized models)… and we needed more!

Semantic Resources Project
• Antibodies
• Mouse Models
• Protein Ontology
extensions for APP
• Ontology Broker
(adding new temporary
terms to the ontologies
during the activities)
Alan Ruttenberg, Jonathan Rees, Timothy Danford
http://neurocommons.org/page/Semantic_resources_project

… thinking of SWAN 2…
But wait a minute…
Unstructured Knowledge → Annotation → Structured Knowledge
Structured Knowledge → Annotation → Better Structured Knowledge
How can we build SWAN, Guide and, at the same time
be helpful to a larger crowd?

Science is big
• As (biomedical) scientists we deal with an
increasing amount of digital/online resources:
publications, dataset/databases, big data,
reports, grants, images, videos, guidelines,
protocols, vocabularies, linked data, software…
• Journal publications are still the tip of the
iceberg (bottleneck?) of science:
• About 150-250 articles a week
• 10mins/article ≈ 34 hours/week
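The estimate can be checked with a few lines of arithmetic. Assuming 10 minutes per article, as on the slide, the 150-250 range corresponds to roughly 25 to 42 hours a week, and the ≈ 34 hours figure sits near the middle of that range (about 200 articles). The function name is illustrative.

```python
def reading_hours(articles_per_week, minutes_per_article=10):
    """Weekly reading time in hours for a given number of articles."""
    return articles_per_week * minutes_per_article / 60


for n in (150, 200, 250):
    print(n, "articles/week ->", round(reading_hours(n), 1), "hours/week")
```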

Science is social
• We publish and participate in conferences in
order to contribute to and be part of science
• We belong to formal/informal and
vertical/horizontal scientific communities
• We communicate with colleagues via emails,
voice, video; we broadcast to colleagues
through publications, blogs, screencasts,
twitter, social networks…
• We build on each other’s work!

Science is connected
Courtesy of Tim Clark

… and with the new technologies
The Journal of Laryngology, Rhinology, and Otology
Volume 29 / Issue 10 / October 1914, pp 500-510
Better access and links

Network of knowledge
How do we keep track of it?

… we commonly use annotation
• We annotate prints,
HTML and PDFs
• We bookmark/tag web
pages…
• … and publications
(citations/references)
• We comment on web
pages, blogs, forums and
emails
• YouTube, Vimeo,
Flickr, SlideShare, Twitter…

How is that working out for you?
• Can you integrate annotations?
• Can you leverage machine computation?
• Can you share it easily with your colleagues?
• Can you capitalize on the work of colleagues?
• Can you easily discover valuable resources?
• Can you integrate it with other resources?
• Can you detect the up-to-date science?
• …

Annotation and Semantics
And Open!!!
A generic model and platform for
creating annotation and semantic
annotation on any online content

Annotation Ontology (AO) - 2009
• OWL vocabulary for representing and sharing
annotation of digital resources (text, images,
audio, video, …) and their fragments in RDF
format
• Focus on biomedicine and sciences. But desire to
make the AO framework more broadly usable.
Ciccarese et al, 2011
An open annotation ontology for science on web 3.0
J Biomed Semantics 2011, 2(Suppl 2):S4 (17 May 2011)

Annotation Ontology crowd
The Living Document
Project
Biotea

Open Annotation Collaboration
• Focus on interoperability for annotations in
order to allow sharing of annotations across:
– Annotation clients;
– Content collections;
– Services that leverage annotations.
• Focus on annotation for scholarly purposes.
But desire to make the OAC framework more
broadly usable.
http://openannotation.org/

Interoperability starts from people
• OA started with the reconciliation of
– Open Annotation Collaboration (OAC)
– Annotation Ontology (AO)

W3C Open Annotation Community Group
• 93 participants from around the world: 5th of
132 groups
http://www.w3.org/community/openannotation/

Open Annotation Model (Feb 2013)
http://www.openannotation.org/spec/core/
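A rough sketch of the shape of an Open Annotation: a body (what is said) related to a target (what it is about), where the target can be a fragment of an online resource. The `oa:` names below follow the Open Annotation vocabulary as best recalled and should be checked against the specification; the `make_annotation` helper, the `example.org` URLs and the plain-dict serialization are assumptions made for illustration, not the official JSON-LD context.

```python
import json


def make_annotation(target_url, exact, tag_uri, prefix="", suffix=""):
    """Build a semantic-tag annotation on a text fragment as a plain dict."""
    return {
        "@type": "oa:Annotation",
        "oa:motivatedBy": "oa:tagging",
        # Body: the "what is said" part; here a term with a unique identifier.
        "oa:hasBody": {"@id": tag_uri, "@type": "oa:SemanticTag"},
        # Target: the "what it is about" part; a fragment of an online resource.
        "oa:hasTarget": {
            "@type": "oa:SpecificResource",
            "oa:hasSource": target_url,
            "oa:hasSelector": {
                "@type": "oa:TextQuoteSelector",
                "oa:prefix": prefix,
                "oa:exact": exact,
                "oa:suffix": suffix,
            },
        },
    }


annotation = make_annotation(
    target_url="http://example.org/article.html",  # placeholder document
    exact="amyloid precursor protein",
    tag_uri="http://example.org/terms/APP",         # placeholder term URI
    prefix="cleavage of the ",
    suffix=" by secretases",
)
print(json.dumps(annotation, indent=2))
```

Keeping body and target as separate resources is what lets different clients, collections and services exchange the same annotation.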

Web Annotation Tool
• Domeo is a web application for producing and
sharing stand-off annotation
• Science and semantics linked in a few clicks
• Domeo is open source and designed as an
open system… we are working to make it
easier to customize.
– http://annotationframework.org
– https://twitter.com/DomeoTool
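To illustrate what "stand-off" means: the annotation lives outside the document and points into it, so the document itself is never modified. The sketch below is not Domeo code; the `StandOffAnnotation` class and `locate` function are hypothetical, and anchoring by quoted text plus surrounding context is just one possible strategy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StandOffAnnotation:
    """A note stored apart from the document it refers to."""
    exact: str    # the annotated text
    prefix: str   # text immediately before it, for disambiguation
    suffix: str   # text immediately after it
    note: str


def locate(text: str, ann: StandOffAnnotation) -> Optional[Tuple[int, int]]:
    """Find the (start, end) offsets of the annotated span, or None if lost."""
    needle = ann.prefix + ann.exact + ann.suffix
    pos = text.find(needle)
    if pos < 0:
        return None
    start = pos + len(ann.prefix)
    return start, start + len(ann.exact)


document = "The cell divides. Later, the cell dies."
ann = StandOffAnnotation(exact="the cell", prefix="Later, ", suffix=" dies",
                         note="second mention")

span = locate(document, ann)
print(span, document[span[0]:span[1]] if span else None)
```

The prefix and suffix let the annotation find the right occurrence when the same words appear more than once, and `locate` returns `None` when the page has changed too much to re-anchor.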

Annotating while we are reading

Manual and automatic annotation
URL I am annotating
Manual annotation tools
Automatic annotation tools
Exploration panels

Manual annotation: notes/comments

Semantic tagging
NCBO BioPortal
NIF Registry
Domeo can query external services and use as qualifiers anything that
has a unique identifier.

Semantic tagging
We could refer to historic figures, galaxies, places, events…

Semantic Tag on text
Links to further readings
and additional resources
Annotation and Pop-up

Image annotation

Image annotation
By semantically tagging figures in a paper, I make them discoverable…
And we can integrate inference capabilities

Defining permissions (annotation sets)

Support for extensions: antibodies
Contributed to PubMed LinkOut through NIF (http://neuinfo.org)
Translates into a formal OWL/RDF representation
Antibodyregistry.org

Hypotheses management (v1)
Translates into a formal OWL/RDF representation (SWAN Ontology)
Possibility for integrating
Nanopublications and BEL
Data as evidence

Hypotheses management (SWAN)
classic publication scientific discourse ‘semantic’ representation
Semantic Web Applications in Neuromedicine
(SWAN) project [2007]

Hypotheses management (SWAN)
graph representation
Paolo Ciccarese, PhD NFAIS Workshop 2013

Infinite possibilities
• Integration of Nanopubs, HyBrow, HyQue, BEL
• Capturing microdata and metadata
• Annotating videos, audios, 3D models, database
records
• Plug-ins for: Clinical guidelines, Clinical trials,
Drug-drug interaction, Protocols, Databases
curation
• Legislation, Astronomy, Humanities
• …

Text mining

Reflect
http://reflect.ws/

Domeo Text Mining Selection
Domeo can trigger external text mining services and transform the results
into annotation (that can be annotated)
- NCBO Annotator, NIF Annotator, Textpresso, UIMA-based algorithms
Many other possibilities
- SADI services
- WhatIzIt
- DBpedia Spotlight
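A hedged sketch of the idea of turning text mining results into annotations. The hit format (`start`, `end`, `concept`), the `hits_to_annotations` function, the `status` field and the example URLs are all assumptions made for illustration; none of the services listed above is called, and real services return their own result formats.

```python
def hits_to_annotations(source_url, text, hits, creator):
    """Turn text-mining hits into annotation dicts that keep provenance."""
    annotations = []
    for hit in hits:
        start, end = hit["start"], hit["end"]
        annotations.append({
            "target": source_url,
            "start": start,
            "end": end,
            "exact": text[start:end],
            "concept": hit["concept"],  # identifier returned by the service
            "creator": creator,         # which service produced the hit
            "status": "unreviewed",     # a curator may later accept or reject
        })
    return annotations


text = "APP is cleaved by BACE1."
hits = [  # made-up results, not output from a real service
    {"start": 0, "end": 3, "concept": "http://example.org/terms/APP"},
    {"start": 18, "end": 23, "concept": "http://example.org/terms/BACE1"},
]
annotations = hits_to_annotations("http://example.org/article.html", text, hits,
                                  creator="hypothetical-text-miner")
for a in annotations:
    print(a["exact"], "->", a["concept"])
```

Recording the creator and a review status on each result is what makes it possible to compare services and to let human curators correct them.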

Text Mining Results

Text mining services comparison and improvement
Text Mining Results and social-curation

Support for comments/discussions

Domeo supports extraction pipelines

Self Reference

References

References are annotations!

Virtual bibliography

Extend your reading

Search example

Serialization in AO/RDF (working on OA)

Utopia for PDF
http://getutopia.com

Integration through APIs (ex NIF)
PubMed LinkOuts!!

Stemcell
http://www.stembook.org/

Stembook.org and Domeo

Integration with Drupal 7 (Biblio module)
Thanks to Stephane Corlosquet, Drupal Core developer

In conclusion…
• Consider annotation as a first-class citizen for
your projects… annotation is a great
ubiquitous way to keep the crowd in the loop
• Consider using the Open Annotation Model
and joining the community… we can help!
• Domeo is a complete playground/framework
for creating and sharing semantic annotation
• There are lots of other open source tools…

annotator.js (Text)
• Open Knowledge Foundation Project for text
annotation: easy to integrate and supports
extensions
http://okfnlabs.org/annotator/

annotorious.js (Images)
• Image annotation: to add drawing and
commenting to images in web pages
http://annotorious.github.io/

Shared Canvas (Manuscripts)
www.shared-canvas.org/

MapHub (Maps)
• Maps annotation
http://maphub.github.io/

Keep annotating… and sharing!
Thank you
