Paolo Ciccarese DILS 2013 keynote
1. Open Annotation (in Biomedicine)
Mass General Hospital Harvard Medical School
Annotation, Semantic Annotation and
Keeping the right crowd in the loop
Paolo Ciccarese, PhD
@paolociccarese
2. Research Questions
• How do we get the best up-to-date knowledge to the final users* while preserving the historical record?
• How do we involve experts in the knowledge creation/extraction process?
Paolo Ciccarese, PhD DILS 2013
* healthcare providers, researchers, scientists, scholars, librarians, students…
3. Salesman: Answer is simple
• By crowd-sourcing annotation
and semantic annotation
• Annotation
– intuitive and agile
– micro data integration
– traceable
– large scale
– unstructured/structured
– manual/automatic/semi-automatic
– supports disagreement
– personal/groups/public
– velocity and fast turn
– …
4. Scientist: Answer not that simple but
slowly things are getting better
• Growing interest in annotation
• Annotation is an important
tool to be combined with other
methods
• It neatly allows us to keep knowledgeable human agents in the loop
• Still lots of research to be done
but we have a standard and
tools are improving fast
• Right time to annotate!!!
5. Annotation in teaching: learning from the experts
Greg Nagy, Professor of Classics at Harvard University, Director of the Harvard Center for Hellenic Studies in Washington DC
Gary King, Professor of Government, Director for the Institute for Quantitative Social Science at Harvard University
http://www.annotations.harvard.edu/
MOOCs, edX, HarvardX, MITX
6. Annotation Convergence Workshop 2013
• More than 100
participants from
Harvard (plus visitors)
• More than 25
annotation related
presentations
• Morning session videos
are online
http://www.annotations.harvard.edu/
Big interest from libraries
7. Harvard Library Cloud
Harvard Libraries: how do we make them discoverable, and how do we integrate such a great variety of resources? Data integration gets more value out of existing records.
David Weinberger, Writer, Senior researcher
at the Berkman Center and co-director
of the Harvard Library Innovation Lab.
There is only so much you can do at the record
level. When you have scholars and students… they
are doing the work of discovering the relationships
between the parts. Annotation is the platform
http://www.librarycloud.org/
8. Filtered Push (Biodiversity)
There are 2-3 billion specimens, and it has been estimated [1] that no more than 3% have any digital record
Bob Morris
Emeritus Professor, University of Massachusetts Boston
IT Research Staff, Harvard University Herbaria
1. Arturo H. Ariño, Approaches to estimating the universe of natural history collections data; Biodiversity Informatics, 7, 2010, pp. 81-92
2. Nelson et al., Five task clusters that enable efficient and effective digitization of biological collections, ZooKeys 209: 19–45, doi: 10.3897/zookeys.209.3135
http://wiki.filteredpush.org/
10. Neuroscience Information Framework (NIF)
Maryann Martone, PhD
Professor in Residence, Department of Neurosciences, UCSD
Co-Director, National Center for Microscopy and Imaging Research (NCMIR)
http://neuinfo.org
A dynamic inventory of Web-based neuroscience resources: data, materials, and tools accessible via any computer connected to the Internet.
Annotation can be used to link scientific
literature with the NIF resources such as
antibodies and animal strains and mutants
12. Data integration learned in College
• University of Pavia (Italy) mid/late-Nineties
• Software engineering: Databases integration
Knowledge
13. Hypertension databases integration
• Electronic Patient Records from several
institutions and departments
• Creating a normalized database for analysis of
patient data
• ‘Classic’ integration issues
– Columns nature
– Formats (names, dates and units of measure)
– Unstructured content
– Social interactions (assisted annotation of records)
• Tacit → Explicit knowledge/semantics
Annotation of patient records
After 15 years I still get at least an email a month on this topic
14. Data integration during my PhD
• University of Pavia (Italy) 2001-2004
• PhD in Bioengineering and Bioinformatics
• Evidence Based Clinical Decision Support
Knowledge
15. Hypothesis (EBM)
• If we deliver up-to-date computerized clinical practice guidelines to the point of care
– We will provide decision support, reducing errors, malpractice and costs
– We will improve the quality of care by leveraging the best scientific evidence
– We will be able to collect structured data for updating the guidelines, speeding up the guideline creation/dissemination process.
16. CPG representation and enactment
Annotation of clinical guidelines
After 12 years I still review ‘innovative’ papers on the topic
17. The Guide Project* (1999-2004)
• Beyond Evidence Based clinical decision
support
– integrates a formalized model of the medical
knowledge expressed in clinical guidelines and
protocols with both WorkFlow Management
Systems and Electronic Patient Record
technologies
*Guide on OpenClinical: http://www.openclinical.org/gmm_guide.html
P Ciccarese, E Caffi, S Quaglini, M Stefanelli
Architectures and tools for innovative health information systems: the Guide Project
International journal of medical informatics 74 (7-8), 553-562, 2005
18. The Guide Project (1999-2004)
• Integrated Clinical Knowledge Management infrastructure through separation of concerns (SoC)
Integration:
- Datatypes system
- Terminologies
- Contracts (XML)
- Web Services (WSDL)
- Social interaction
19. Guide: lessons learned (1)
• Guidelines are semi-structured knowledge that is hard for medical operators or knowledge engineers to formalize directly on their own (we needed both)
• Interaction between health care providers and knowledge engineers causes behavioral modifications for both
• Annotation was a big part of the process and it made the physicians feel in control
20. Guide: lessons learned (2)
• Knowledge extraction and encoding in a three-step process
1. From paper to a list of recommendations (possibly using markup/annotation tools?)
2. From the recommendations to a flow-chart-like model where all the entities (agents, patient variables, drugs) were explicit (< semantics)
3. From the flow-chart like model to a formal model
21. Guide: lessons learned (3)
• The architecture proved to be robust and scalable
– Datatypes, Terminologies, Contracts, Web Services
and XML were good for components to communicate
• But the semantics was still not completely explicit
– XML not ideal to represent knowledge and graphs
– Data integration was relying on tacit knowledge
– Low quality of patient data in the EPRs
• How about ontologies… and RDF?
Prof. Barry Smith
22. Semantics at work… Protégé EON, Sage
• Frame-based logic with
Protégé for Knowledge
representation
– Clinical practice guidelines
– Domain ontologies
– Virtual medical record
– Organizational entities
Samson Tu
Stanford University
Prof. Mark Musen
Stanford University
http://www.openclinical.org/gmm_eon.html
http://www.openclinical.org/gmm_sage.html
23. Growing interest in Semantic Technologies led me to Boston
• Simile (2003-2006): Semantic Interoperability
of Metadata and Information in unLike
Environments
– to enhance inter-operability among digital assets,
schemata/vocabularies/ontologies, metadata, and
services.
• PIs: Eric Miller (Zepheira), David Karger (MIT) and MacKenzie Smith (UC Davis)
24. Stefano Mazzocchi
Google Inc
David Huynh, PhD
Google Inc
Simile widgets
• Exhibit
• Timeline
• Timeplot
• Welkin and Vicino
• Piggy Bank
• Potluck
• Playground
27. Simile Playground
• Combined most of the Simile technologies
• Data extraction, semantic integration,
annotation and publishing in the same
platform… in the browser!!!
http://simile.mit.edu/wiki/Playground
29. SWAN (Semantic Web Applications in
Neuromedicine) (2004-2010)
• Developing cures for highly complex diseases requires extensive interdisciplinary collaboration and exchange of biomedical information in context.
• Our ability to exchange such information across sub-specialties today is limited by the current scientific knowledge ecosystem’s inability to properly contextualize and integrate data and discourse in machine-interpretable form.
June Kinoshita
Tim Clark
Director of MIND Informatics
Mass General Hospital
30. A ‘structured’ view of a publication
classic publication
scientific discourse ‘semantic’ representation
http://tinyurl.com/cgyna2m
Semantic Web Applications in Neuromedicine
(SWAN) project [2007]
Annotation of scientific papers
37. SWAN in numbers (1.5 years)
• 2398 Research Statements
– 184 Hypotheses
• 60 deeply annotated
• 124 simply annotated
– 2214 Claims
• 61 Research Questions
• 48 Comments
• 2825 Journal Articles
Fewer papers than those published in a week on the topic
38. SWAN, data integration and
interoperability
• RDF, Triple Store and SPARQL
• Integration of data from PubMed, UniProt,
PRO, GO, data repositories
• Ontologies (OWL DL)
– SWAN (Scientific Discourse)
– PAV (Provenance Authoring and Versioning)
– CO (Collections)
• ≈ Linked Data
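To make the "RDF, Triple Store and SPARQL" bullet concrete, here is a minimal sketch of the idea behind querying triples by pattern. It is not SPARQL and not the SWAN implementation: the `ex:` identifiers, the `ex:derivedFrom` property, and the helper names `match` and `claims_from_article` are all hypothetical, chosen only for illustration. `pav:createdBy` is used as an example of a PAV provenance property.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Toy data. Every ex: name below is a made-up placeholder, not SWAN vocabulary.
TRIPLES = [
    ("ex:claim1", "rdf:type", "ex:Claim"),
    ("ex:claim1", "pav:createdBy", "ex:curatorA"),
    ("ex:claim1", "ex:derivedFrom", "ex:article42"),
    ("ex:claim2", "rdf:type", "ex:Claim"),
    ("ex:claim2", "pav:createdBy", "ex:curatorB"),
    ("ex:claim2", "ex:derivedFrom", "ex:article42"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def claims_from_article(triples: Iterable[Triple], article: str) -> list:
    """Join two patterns: subjects typed as claims that derive from `article`."""
    triples = list(triples)
    claims = {t[0] for t in match(triples, p="rdf:type", o="ex:Claim")}
    return sorted(
        t[0] for t in match(triples, p="ex:derivedFrom", o=article) if t[0] in claims
    )


if __name__ == "__main__":
    print(claims_from_article(TRIPLES, "ex:article42"))
```

A real deployment would delegate this matching to a triple store and express the join as a SPARQL query; the sketch only shows why a uniform subject/predicate/object shape makes data from different sources easy to combine.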
PROV
Nanopublications
Elsevier Satellite
Research Objects
…
40. SWAN: lessons learned (1)
• Labor intensive + subjectivity + loss of context
(missed links back to the original content)
• Full article representation not attractive,
scientists want to ‘formalize’ only what is
interesting for them at that very moment
(during their normal activities)
• Form-based approach not efficient (too much copy and paste involved)
41. SWAN: lessons learned (2)
• Discourse elements can be further structured
(relationships provided value but text is not
actionable)
– see nanopublications, HyBrow, HyQue, BEL
• Integration with external sources not trivial
(normalized models)… and we needed more!
42. Semantic Resources Project
• Antibodies
• Mouse Models
• Protein Ontology
extensions for APP
• Ontology Broker
(adding new temporary
terms to the ontologies
during the activities)
Alan Ruttenberg
Jonathan Rees
http://neurocommons.org/page/Semantic_resources_project
Timothy Danford
43. … thinking of SWAN 2…
But wait a minute…
Unstructured Knowledge → Annotation → Structured Knowledge
Structured Knowledge → Annotation → Better Structured Knowledge
How can we build SWAN, Guide and, at the same time
be helpful to a larger crowd?
44. Science is big
• As (biomedical) scientists we deal with an increasing amount of digital/online resources: publications, datasets/databases, big data, reports, grants, images, videos, guidelines, protocols, vocabularies, linked data, software…
• Journal publications are still the tip of the iceberg (bottleneck?) of science:
• About 150-250 articles a week
• 10 min/article ≈ 34 hours/week
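The arithmetic behind the last two bullets can be checked with a short sketch. The 150-250 articles per week and the 10 minutes per article come from the slide; the function name and the choice of 200 articles as a midpoint are assumptions made only for illustration.

```python
def weekly_reading_hours(articles_per_week: int, minutes_per_article: float = 10.0) -> float:
    """Hours per week needed to read every article at the given pace."""
    return articles_per_week * minutes_per_article / 60.0


if __name__ == "__main__":
    # Low end, assumed midpoint, and high end of the range quoted on the slide.
    for n in (150, 200, 250):
        print(f"{n} articles/week -> {weekly_reading_hours(n):.1f} hours/week")
```

At 10 minutes each, the quoted range works out to between 25 and roughly 42 hours a week, so the slide's figure of about 34 hours corresponds to roughly 200 articles.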
45. Science is social
• We publish and participate in conferences in order to contribute to and be part of science
• We belong to formal/informal and
vertical/horizontal scientific communities
• We communicate with colleagues via emails,
voice, video; we broadcast to colleagues
through publications, blogs, screencasts,
twitter, social networks…
• We build on each other’s work!
47. … and with the new technologies
The Journal of Laryngology, Rhinology, and Otology
Volume 29 / Issue 10 / October 1914, pp 500-510
Better access and links
49. … we commonly use annotation
• We annotate prints,
HTML and PDFs
• We bookmark/tag web
pages…
• … and publications
(citations/references)
• We comment on web
pages, blogs, forums and
emails
• YouTube, Vimeo, Flickr, SlideShare, Twitter…
50. How is that working out for you?
• Can you integrate annotations?
• Can you leverage machine computation?
• Can you share it easily with your colleagues?
• Can you capitalize on the work of colleagues?
• Can you easily discover valuable resources?
• Can you integrate it with other resources?
• Can you detect the up-to-date science?
• …
51. Annotation and Semantics
And Open!!!
A generic model and platform for
creating annotation and semantic
annotation on any online content
52. Annotation Ontology (AO) - 2009
• OWL vocabulary for representing and sharing
annotation of digital resources (text, images,
audio, video, …) and their fragments in RDF
format
• Focus on biomedicine and sciences. But desire to
make the AO framework more broadly usable.
Ciccarese et al, 2011
An open annotation ontology for science on web 3.0
J Biomed Semantics 2011, 2(Suppl 2):S4 (17 May 2011)
54. Open Annotation Collaboration
• Focus on interoperability for annotations in
order to allow sharing of annotations across:
– Annotation clients;
– Content collections;
– Services that leverage annotations.
• Focus on annotation for scholarly purposes.
But desire to make the OAC framework more
broadly usable.
http://openannotation.org/
55. Interoperability starts from people
• OA started with the reconciliation of
– Open Annotation Collaboration (OAC)
– Annotation Ontology (AO)
56. W3C Open Annotation Community Group
• 93 participants from around the world: 5th of
132 groups
http://www.w3.org/community/openannotation/
57. Open Annotation Model (Feb 2013)
http://www.openannotation.org/spec/core/
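As a rough illustration of the shape of an annotation in the Open Annotation Model (a body linked to a target, with the target optionally narrowed to a fragment by a selector), here is a minimal sketch that builds a JSON-LD-style dictionary. The `oa:` terms are written as I understand them from the February 2013 community draft and should be checked against the spec linked above; the function name and every `example.org` URI are hypothetical.

```python
import json

OA_NS = "http://www.w3.org/ns/oa#"


def make_semantic_tag_annotation(annotation_id, source_url, exact, prefix, suffix,
                                 tag_uri, annotator_uri):
    """Build a dict describing one semantic tag on a text fragment.

    Illustrative only: it mirrors the body/target/selector pattern of the
    Open Annotation Model and is not a validated serialization.
    """
    return {
        "@context": {"oa": OA_NS},
        "@id": annotation_id,
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:tagging"},
        "oa:annotatedBy": {"@id": annotator_uri},
        # Body: the term used as a tag, identified by a URI.
        "oa:hasBody": {"@id": tag_uri, "@type": "oa:SemanticTag"},
        # Target: a specific part of the source document, located by quoting it.
        "oa:hasTarget": {
            "@type": "oa:SpecificResource",
            "oa:hasSource": {"@id": source_url},
            "oa:hasSelector": {
                "@type": "oa:TextQuoteSelector",
                "oa:exact": exact,
                "oa:prefix": prefix,
                "oa:suffix": suffix,
            },
        },
    }


if __name__ == "__main__":
    # All URIs below are placeholders.
    annotation = make_semantic_tag_annotation(
        annotation_id="http://example.org/annotations/1",
        source_url="http://example.org/articles/42",
        exact="amyloid beta",
        prefix="accumulation of ",
        suffix=" in the brain",
        tag_uri="http://example.org/terms/amyloid-beta",
        annotator_uri="http://example.org/people/curator",
    )
    print(json.dumps(annotation, indent=2))
```

Because the annotation stands off from the document (it quotes the fragment rather than modifying the source), it can be stored, shared and merged independently of the annotated content.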
58. Web Annotation Tool
• Domeo is a web application for producing and sharing stand-off annotation
• Science and semantics linked in a few clicks
• Domeo is open source and designed as an
open system… we are working to make it
easier to customize.
– http://annotationframework.org
– https://twitter.com/DomeoTool
62. Semantic tagging
NCBO BioPortal
NIF Registry
Domeo can query external services and use as qualifiers anything that
has a unique identifier.
63. Semantic tagging
We could refer to historic figures, galaxies, places, events…
64. Semantic Tag on text
Links to further readings
and additional resources
Annotation and Pop-up
66. Image annotation
By semantically tagging figures in a paper, I make them discoverable…
And we can integrate inference capabilities
68. Support for extensions: antibodies
Contributed to PubMed LinkOut through NIF (http://neuinfo.org)
Translates into a formal OWL/RDF representation
Antibodyregistry.org
69. Hypotheses management (v1)
Translates into a formal OWL/RDF representation (SWAN Ontology)
Possibility for integrating
Nanopublications and BEL
Data as evidence
70. Hypotheses management (SWAN)
classic publication scientific discourse ‘semantic’ representation
Semantic Web Applications in Neuromedicine
(SWAN) project [2007]
75. Domeo Text Mining Selection
Paolo Ciccarese, PhD NFAIS Workshop 2013
Domeo can trigger external text mining services and transform the results
into annotation (that can be annotated)
- NCBO Annotator, NIF Annotator, Textpresso, UIMA-based algorithms
Many other possibilities
- SADI services
- WhatIzIt
- DBPedia Spotlight
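The step of transforming text mining results into annotation can be sketched as follows. This is not Domeo's code and does not call any of the services listed above: the input format (character offsets plus a concept identifier), the function name, and the `ex:` identifier are assumptions for illustration. The idea shown is only that each hit becomes a stand-off annotation that quotes its target text with some surrounding context, so it can later be reviewed, curated or annotated in turn.

```python
def hits_to_annotations(text, hits, context=10, created_by="hypothetical-text-mining-service"):
    """Convert (start, end, concept) hits into stand-off annotation dicts.

    `hits` uses character offsets into `text`; this input format is an
    assumption, since real services each return their own structure.
    """
    annotations = []
    for start, end, concept in hits:
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"offsets out of range: {start}-{end}")
        annotations.append({
            "body": concept,
            "target": {
                # Quote the matched text plus context instead of editing the source.
                "exact": text[start:end],
                "prefix": text[max(0, start - context):start],
                "suffix": text[end:end + context],
            },
            # Provenance: record that a machine, not a curator, produced this.
            "createdBy": created_by,
        })
    return annotations


if __name__ == "__main__":
    sentence = "Amyloid beta accumulates in the brain."
    for annotation in hits_to_annotations(sentence, [(0, 12, "ex:AmyloidBeta")]):
        print(annotation)
```

Keeping the provenance field separate from the body is what lets automatically produced annotations sit alongside manual ones while remaining distinguishable.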
91. Integration with Drupal 7 (Biblio module)
Thanks to Stephane Corlosquet, Drupal Core developer
92. In conclusion…
• Consider annotation as a first-class citizen for your projects… annotation is a great, ubiquitous way to keep the crowd in the loop
• Consider using the Open Annotation Model
and joining the community… we can help!
• Domeo is a complete playground/framework
for creating and sharing semantic annotation
• There are lots of other open source tools…
93. annotator.js (Text)
• Open Knowledge Foundation Project for text
annotation: easy to integrate and supports
extensions
http://okfnlabs.org/annotator/
94. annotorious.js (Images)
• Image annotation: to add drawing and
commenting to images in web pages
http://annotorious.github.io/
And talking about students, MOOCs are another amazing opportunity for annotation.
Hypertension study with 3-4 different databases to be ultimately cleaned up by hand. It was no fun at all.
Before and during my PhD I focused on evidence-based decision support. We still have the problem of accessing patient data, but now we also have the problem of accessing organizational data and evidence-based guidelines/protocols. Normally every ward had a different database, most of them produced by small companies in a very fragmented market. I saw XML as an easier way to convey knowledge. The problem is how the data are generated in the first place.
Clinical practice guideline, domain ontologies, a view of patient data (virtual medical record), and other entities (e.g. those that define roles in an organization)
17min
SIMILE sought to enhance inter-operability among digital assets, schemata/vocabularies/ontologies, metadata, and services. A key challenge it solved was to make collections interoperable which are distributed across individual, community, and institutional stores, by drawing on the assets, schemata/vocabularies/ontologies, and metadata held in such stores.
MIT Libraries and MIT CSAIL (founding partners also included HP Laboratories and the World Wide Web Consortium) with support from the Andrew W. Mellon Foundation.
So, what's the difference? Wikipedia says "Interoperability: the capability of different programs to exchange data via a common set of business procedures, and to read and write the same file formats and use the same protocols" and "Integration allows data from one device or software to be read or manipulated by another, resulting in ease of use." Yuck, those aren't much help.
To me, interoperability means that two (or more) systems work together unchanged even though they weren't necessarily designed to work together. Integration means that you've written some custom code to connect two (or more) systems together. So integrating two systems which are already interoperable is trivial; you just configure them to know about each other. Integrating non-interoperable systems takes more work.
The beauty of interoperability is that two systems developed completely independently can still work together. Magic? No, standards (or at least specifications, open or otherwise); see Open Standards in Everyday Life. Consider a Web services consumer that wants to invoke a particular WSDL, and a provider that implements the same WSDL; they'll work together, even if they were implemented independently. Why? Because they agree on the same WSDL (which may have come from a third party) and a protocol (such as SOAP over HTTP) discovered in the binding. How does the consumer discover the provider? Some registry, perhaps one that implements UDDI (which sucks, BTW). So SOAP, HTTP, WSDL, UDDI (all that good WS-I stuff) make Web services interoperable.
Another example I like is the "X/Open Distributed Transaction Processing (DTP) model" (aka the XA spec); see "Configuring and using XA distributed transactions in WebSphere Studio." With it, a transaction manager by one vendor can use resource managers by other vendors. Even though they weren't all written for each other, they still work together because they follow the same spec. They're interoperable.
Now consider two systems that weren't designed to be interoperable, or perhaps interoperable but with different specs. This requires integration. The integration code (could be Java, Message Broker, etc.; I co-authored a whole book on this) takes the interface one system expects and converts it to the one the other system provides. This is why WPS has stuff like Interface Maps and Business Object Maps.
So, you want interoperable systems; integrating them is simple. Otherwise, you have to integrate them yourself.
26mins
Developing cures for highly complex diseases, such as neurodegenerative disorders, requires extensive interdisciplinary collaboration and exchange of biomedical information in context. Our ability to exchange such information across sub-specialties today is limited by the current scientific knowledge ecosystem’s inability to properly contextualize and integrate data and discourse in machine-interpretable form. This inherently limits the productivity of research and the progress toward cures for devastating diseases such as Alzheimer’s and Parkinson’s. The SWAN (Semantic Web Applications in Neuromedicine) ontology is an ontology for modeling scientific discourse and has been developed in the context of building a series of applications for biomedical researchers, as well as extensive discussions and collaborations with the larger bio-ontologies community. This document describes the SWAN ontology of scientific discourse.
But no scientist is an island: we know we cannot scale very well, so we normally organize ourselves in groups.
Scientists are connected and science is
Credits: http://www.tnca.org/2012/08/30/for-immediate-release-secretary-of-state-has-authority-to-stop-certification-of-election-and-should-use-it/
People are connected and so is science
For a resource we recognize we can FIND many other connected ones. FIND because most of the time these links are not there. We SPEND TIME searching and putting the network together, and how do we keep track of it?