Paolo Ciccarese DILS 2013 keynote

Slides presented for the keynote at DILS 2013 in Montreal, Canada.

  1. Open Annotation (in Biomedicine): Annotation, Semantic Annotation and Keeping the Right Crowd in the Loop • Paolo Ciccarese, PhD (@paolociccarese), Mass General Hospital, Harvard Medical School
  2. Research Questions • How do we get the best up-to-date knowledge to the end users* while preserving the historical record? • How do we involve experts in the knowledge creation/extraction process? (* healthcare providers, researchers, scientists, scholars, librarians, students…) Paolo Ciccarese, PhD DILS 2013
  3. Salesman: the answer is simple • By crowdsourcing annotation and semantic annotation • Annotation: – intuitive and agile – micro data integration – traceable – large scale – unstructured/structured – manual/automatic/semi-automatic – supports disagreement – personal/groups/public – velocity and fast turnaround – …
  4. Scientist: the answer is not that simple, but things are slowly getting better • Growing interest in annotation • Annotation is an important tool to be combined with other methods • It keeps knowledgeable human agents in the loop • There is still a lot of research to be done, but we have a standard and tools are improving fast • Right time to annotate!
  5. Annotation in teaching: learning from the experts • Greg Nagy, Professor of Classics at Harvard University, Director of the Harvard Center for Hellenic Studies in Washington, DC • Gary King, Professor of Government, Director of the Institute for Quantitative Social Science at Harvard University • MOOCs, edX, HarvardX, MITx • http://www.annotations.harvard.edu/
  6. Annotation Convergence Workshop 2013 • More than 100 participants from Harvard (plus visitors) • More than 25 annotation-related presentations • Morning session videos are online: http://www.annotations.harvard.edu/ • Big interest from libraries
  7. Harvard Library Cloud • Harvard Libraries: how do we make them discoverable, and how do we integrate such a great variety of resources? Data integration gets more value out of existing records. • David Weinberger, writer, senior researcher at the Berkman Center and co-director of the Harvard Library Innovation Lab • There is only so much you can do at the record level. When you have scholars and students… they are doing the work of discovering the relationships between the parts. • Annotation is the platform • http://www.librarycloud.org/
  8. Filtered Push (Biodiversity) • There are 2–3 billion specimens, and it has been estimated [1] that no more than 3% have any digital record • Bob Morris, Emeritus Professor, University of Massachusetts Boston; IT Research Staff, Harvard University Herbaria • http://wiki.filteredpush.org/ • [1] Arturo H. Ariño, Approaches to estimating the universe of natural history collections data, Biodiversity Informatics 7, 2010, pp. 81–92 • [2] Nelson et al., Five task clusters that enable efficient and effective digitization of biological collections, ZooKeys 209: 19–45, doi: 10.3897/zookeys.209.3135
  9. Research Objects • Stian Soiland-Reyes, Researcher, University of Manchester, UK • Carole Goble, Full Professor, School of Computer Science, University of Manchester, UK • How can we record research for anticipated, but also unanticipated, re-use? • http://wiki.myexperiment.org/index.php/Research_Objects
  10. Neuroscience Information Framework (NIF) • Maryann Martone, PhD, Professor in Residence, Department of Neurosciences, UCSD; Co-Director, National Center for Microscopy and Imaging Research (NCMIR) • http://neuinfo.org • A dynamic inventory of Web-based neuroscience resources (data, materials, and tools) accessible via any computer connected to the Internet • Annotation can be used to link the scientific literature with NIF resources such as antibodies, animal strains and mutants
  11. A (few?) years back…
  12. Data integration learned in college • University of Pavia (Italy), mid/late nineties • Software engineering: database integration
  13. Hypertension database integration • Electronic Patient Records from several institutions and departments • Creating a normalized database for the analysis of patient data • ‘Classic’ integration issues – nature of the columns – formats (names, dates and units of measure) – unstructured content – social interactions (assisted annotation of records) • Tacit → explicit knowledge/semantics • Annotation of patient records • After 15 years I still get at least one email a month on this topic
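The format issues listed for slide 13 (dates and units of measure in particular) can be illustrated with a small normalization step. This is a minimal sketch, not the code used in that project: the field names, the accepted date formats and the choice of mmHg as the target unit are assumptions made for illustration; only the kPa-to-mmHg factor is a standard constant.

```python
from datetime import date, datetime

# Hypothetical date formats seen across source departments (assumption).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

# Standard pressure conversion: 1 kPa = 7.50062 mmHg.
KPA_TO_MMHG = 7.50062


def parse_date(text: str) -> date:
    """Try each known format in order until one parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def to_mmhg(value: float, unit: str) -> float:
    """Convert a blood-pressure reading to mmHg."""
    unit = unit.strip().lower()
    if unit == "mmhg":
        return value
    if unit == "kpa":
        return round(value * KPA_TO_MMHG, 1)
    raise ValueError(f"unsupported unit: {unit!r}")


def normalize(record: dict) -> dict:
    """Map one source record onto a single normalized schema (hypothetical field names)."""
    return {
        "patient_id": record["patient_id"].strip().upper(),
        "visit_date": parse_date(record["date"]).isoformat(),
        "systolic_mmhg": to_mmhg(float(record["systolic"]), record["unit"]),
        "diastolic_mmhg": to_mmhg(float(record["diastolic"]), record["unit"]),
    }


rows = [
    {"patient_id": "pv-001", "date": "1997-03-14", "systolic": "150", "diastolic": "95", "unit": "mmHg"},
    {"patient_id": "PV-002", "date": "14/03/1997", "systolic": "20", "diastolic": "12.7", "unit": "kPa"},
]
for row in rows:
    print(normalize(row))
```

The order of `DATE_FORMATS` silently decides how an ambiguous string such as 03/04/1997 is read (day-first here). That choice is exactly the kind of tacit knowledge the slide says has to be made explicit.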
  14. Data integration during my PhD • University of Pavia (Italy), 2001–2004 • PhD in Bioengineering and Bioinformatics • Evidence-based clinical decision support
  15. Hypothesis (EBM) • If we deliver up-to-date computerized clinical practice guidelines to the point of care: – we will provide decision support, reducing errors, malpractice and costs – we will improve the quality of care by leveraging the best scientific evidence – we will be able to collect structured data for updating the guidelines, speeding up guideline creation and dissemination
  16. CPG representation and enactment • Annotation of clinical guidelines • After 12 years I still review ‘innovative’ papers on the topic
  17. The Guide Project* (1999–2004) • Beyond evidence-based clinical decision support: integrates a formalized model of the medical knowledge expressed in clinical guidelines and protocols with both Workflow Management Systems and Electronic Patient Record technologies • *Guide on OpenClinical: http://www.openclinical.org/gmm_guide.html • P. Ciccarese, E. Caffi, S. Quaglini, M. Stefanelli, Architectures and tools for innovative health information systems: the Guide Project, International Journal of Medical Informatics 74 (7-8), 553–562, 2005
  18. The Guide Project (1999–2004) • Integrated clinical knowledge management infrastructure through separation of concerns (SoC) • Integration: – datatype system – terminologies – contracts (XML) – Web Services (WSDL) – social interaction
  19. Guide: lessons learned (1) • Guidelines are semi-structured knowledge that is hard for medical operators or knowledge engineers to formalize alone (we needed both) • Interaction between health care providers and knowledge engineers changes the behavior of both • Annotation was a big part of the process, and it made the physicians feel in control
  20. Guide: lessons learned (2) • Knowledge extraction and encoding as a three-step process: (1) from paper to a list of recommendations (possibly using markup/annotation tools?); (2) from the recommendations to a flowchart-like model where all the entities (agents, patient variables, drugs) were explicit (< semantics); (3) from the flowchart-like model to a formal model
  21. Guide: lessons learned (3) • The architecture proved to be robust and scalable – datatypes, terminologies, contracts, Web Services and XML worked well for communication between components • But the semantics were still not completely explicit – XML is not ideal for representing knowledge and graphs – data integration relied on tacit knowledge – low quality of patient data in the EPRs • How about ontologies… and RDF? (Prof. Barry Smith)
  22. Semantics at work… Protégé, EON, SAGE • Frame-based logic with Protégé for knowledge representation – clinical practice guidelines – domain ontologies – virtual medical record – organizational entities • Samson Tu, Stanford University; Prof. Mark Musen, Stanford University • http://www.openclinical.org/gmm_eon.html • http://www.openclinical.org/gmm_sage.html
  23. Growing interest in semantic technologies led me to Boston • Simile (2003–2006): Semantic Interoperability of Metadata and Information in unLike Environments – to enhance interoperability among digital assets, schemata/vocabularies/ontologies, metadata, and services • PIs: Eric Miller (Zepheira), David Karger (MIT) and MacKenzie Smith (UC Davis)
  24. Simile widgets • Exhibit • Timeline • Timeplot • Welkin and Vicino • Piggy Bank • Potluck • Playground • Stefano Mazzocchi, Google Inc.; David Huynh, PhD, Google Inc.
  25. Piggy Bank: http://simile.mit.edu/wiki/Piggy_Bank
  26. Simile Potluck: http://simile.mit.edu/potluck/
  27. Simile Playground • Combined most of the Simile technologies • Data extraction, semantic integration, annotation and publishing in the same platform… in the browser! • http://simile.mit.edu/wiki/Playground
  28. Boston (summer 2006): clinical space → neurology research
  29. SWAN (Semantic Web Applications in Neuromedicine) (2004–2010) • Developing cures for highly complex diseases requires extensive interdisciplinary collaboration and exchange of biomedical information in context. • Our ability to exchange such information across subspecialties today is limited by the current scientific knowledge ecosystem’s inability to properly contextualize and integrate data and discourse in machine-interpretable form. • June Kinoshita; Tim Clark, Director of MIND Informatics, Mass General Hospital
  30. A ‘structured’ view of a publication • Diagram panels: classic publication; scientific discourse; ‘semantic’ representation • Semantic Web Applications in Neuromedicine (SWAN) project [2007], http://tinyurl.com/cgyna2m • Annotation of scientific papers
  31. AlzSWAN curation process • http://hypothesis.alzforum.org
  32. AlzSWAN: the SWAN-Alzheimer KB • http://hypothesis.alzforum.org/
  33. Golde hypothesis
  34. A claim
  35. Nature News: Literature mining: Speed reading (27 January 2010)
  36. Nature • http://hypothesis.alzforum.org
  37. SWAN in numbers (1.5 years) • 2398 research statements – 184 hypotheses (60 deeply annotated, 124 simply annotated) – 2214 claims • 61 research questions • 48 comments • 2825 journal articles • Fewer papers than are published in a week on the topic
  38. SWAN, data integration and interoperability • RDF, triple store and SPARQL • Integration of data from PubMed, UniProt, PRO, GO and other data repositories • Ontologies (OWL DL) – SWAN (scientific discourse) – PAV (Provenance, Authoring and Versioning) – CO (Collections) • ≈ Linked Data • Related: PROV, Nanopublications, Elsevier Satellite, Research Objects, …
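What "RDF, triple store and SPARQL" buys for integration can be shown in miniature: once statements from different sources share identifiers, one pattern query joins them, and provenance is queried the same way as content. The sketch below is a toy in-memory store written for illustration, not SWAN's implementation and not a SPARQL engine; every `ex:` name is hypothetical, `pav:authoredBy` is used as the PAV authoring property, and the data are made up.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Toy graph: statements that could come from different sources once they
# share identifiers. All ex: names are hypothetical; pav:authoredBy is a PAV term.
GRAPH: list[Triple] = [
    ("ex:claim1", "rdf:type", "ex:Claim"),
    ("ex:claim1", "ex:mentionsProtein", "uniprot:P05067"),  # APP (human)
    ("ex:claim1", "ex:citedFrom", "ex:article1"),
    ("ex:claim1", "pav:authoredBy", "ex:curatorA"),
    ("uniprot:P05067", "rdfs:label", "Amyloid-beta precursor protein"),
    ("ex:article1", "ex:title", "A hypothetical article"),
]


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> Iterator[Triple]:
    """Yield triples matching a pattern; None plays the role of a SPARQL variable."""
    for triple in GRAPH:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def claim_protein_labels() -> list[tuple[str, str]]:
    """Join two patterns, roughly equivalent to:
    SELECT ?claim ?label WHERE { ?claim ex:mentionsProtein ?p . ?p rdfs:label ?label }
    """
    results = []
    for claim, _, protein in match(None, "ex:mentionsProtein", None):
        for _, _, label in match(protein, "rdfs:label", None):
            results.append((claim, label))
    return results


print(claim_protein_labels())
```

In a real deployment the store and the join would be handled by a triple store and a SPARQL query; the point here is only that provenance (who authored the claim) sits in the same graph as the claim itself.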
  39. W3C HCLS Working Group Notes
  40. SWAN: lessons learned (1) • Labor intensive + subjectivity + loss of context (missing links back to the original content) • Full-article representation is not attractive: scientists want to ‘formalize’ only what interests them at that very moment (during their normal activities) • Form-based approach not efficient (too much copy and paste involved)
  41. SWAN: lessons learned (2) • Discourse elements can be further structured (relationships provided value, but text is not actionable) – see nanopublications, HyBrow, HyQue, BEL • Integration with external sources is not trivial (normalized models)… and we needed more!
  42. Semantic Resources Project • Antibodies • Mouse models • Protein Ontology extensions for APP • Ontology Broker (adding new temporary terms to the ontologies during the activities) • Alan Ruttenberg, Jonathan Rees, Timothy Danford • http://neurocommons.org/page/Semantic_resources_project
  43. … thinking of SWAN 2… But wait a minute… • Unstructured knowledge + annotation → structured knowledge • Structured knowledge + annotation → better structured knowledge • How can we build SWAN and Guide and, at the same time, be helpful to a larger crowd?
  44. Science is big • As (biomedical) scientists we deal with an increasing amount of digital/online resources: publications, datasets/databases, big data, reports, grants, images, videos, guidelines, protocols, vocabularies, linked data, software… • Journal publications are still the tip of the iceberg (bottleneck?) of science: – about 150–250 articles a week – 10 min/article ≈ 34 hours/week
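The reading-time figure on slide 44 is a one-line calculation. Taking n articles per week at 10 minutes each, the weekly reading time t is shown below; the 34-hour figure corresponds to roughly 200 articles, near the middle of the quoted range.

```latex
t(n) = \frac{n \times 10\ \text{min}}{60\ \text{min/h}}, \qquad
t(150) = 25\ \text{h}, \qquad
t(250) \approx 41.7\ \text{h}, \qquad
t(204) = 34\ \text{h}
```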
  45. Science is social • We publish and participate in conferences in order to contribute to and be part of science • We belong to formal/informal and vertical/horizontal scientific communities • We communicate with colleagues via email, voice and video; we broadcast to colleagues through publications, blogs, screencasts, Twitter, social networks… • We build on each other’s work!
  46. Science is connected (courtesy of Tim Clark)
  47. … and with the new technologies • The Journal of Laryngology, Rhinology, and Otology, Volume 29, Issue 10, October 1914, pp. 500–510 • Better access and links
  48. Network of knowledge: how do we keep track of it?
  49. … we commonly use annotation • We annotate prints, HTML and PDFs • We bookmark/tag web pages… • … and publications (citations/references) • We comment on web pages, blogs, forums and emails • YouTube, Vimeo, Flickr, SlideShare, Twitter…
  50. How is that working out for you? • Can you integrate annotations? • Can you leverage machine computation? • Can you share them easily with your colleagues? • Can you capitalize on the work of colleagues? • Can you easily discover valuable resources? • Can you integrate them with other resources? • Can you detect the up-to-date science? • …
  51. Annotation and semantics… and open! • A generic model and platform for creating annotation and semantic annotation on any online content
  52. Annotation Ontology (AO), 2009 • OWL vocabulary for representing and sharing annotations of digital resources (text, images, audio, video, …) and their fragments in RDF format • Focus on biomedicine and the sciences, but with the desire to make the AO framework more broadly usable • Ciccarese et al., An open annotation ontology for science on web 3.0, J Biomed Semantics 2011, 2(Suppl 2):S4 (17 May 2011)
  53. Annotation Ontology crowd • The Living Document Project • Biotea
  54. Open Annotation Collaboration • Focus on interoperability for annotations, to allow sharing of annotations across: – annotation clients – content collections – services that leverage annotations • Focus on annotation for scholarly purposes, but with the desire to make the OAC framework more broadly usable • http://openannotation.org/
  55. Interoperability starts with people • OA started by reconciling – the Open Annotation Collaboration (OAC) – the Annotation Ontology (AO)
  56. W3C Open Annotation Community Group • 93 participants from around the world: 5th of 132 groups • http://www.w3.org/community/openannotation/
  57. Open Annotation Model (Feb 2013): http://www.openannotation.org/spec/core/
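To make the model concrete: an Open Annotation links a body (what is said) to a target (what it is about), and a selector identifies the fragment of the target being annotated. The sketch below builds one commenting annotation as a JSON-LD-style Python dictionary using only the standard library. It approximates the Feb 2013 Community Draft vocabulary (`oa:Annotation`, `oa:hasBody`, `oa:hasTarget`, `oa:SpecificResource`, `oa:hasSource`, `oa:hasSelector`, `oa:TextQuoteSelector`, `oa:motivatedBy`, `cnt:ContentAsText`); check the specification linked on slide 57 for the exact terms and context. All `example.org` URIs are placeholders.

```python
import json

# Prefixes used below; cnt: is the W3C Content-in-RDF vocabulary.
CONTEXT = {
    "oa": "http://www.w3.org/ns/oa#",
    "cnt": "http://www.w3.org/2011/content#",
}


def make_comment_annotation(ann_id: str, doc_uri: str, exact: str, prefix: str,
                            suffix: str, comment: str, author: str) -> dict:
    """Build a commenting annotation on a quoted fragment of a document."""
    return {
        "@context": CONTEXT,
        "@id": ann_id,
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:commenting"},
        "oa:annotatedBy": {"@id": author},
        # Body: the comment text, embedded in the annotation itself.
        "oa:hasBody": {"@type": "cnt:ContentAsText", "cnt:chars": comment},
        # Target: a fragment of the document, identified by quoting it with context.
        "oa:hasTarget": {
            "@type": "oa:SpecificResource",
            "oa:hasSource": {"@id": doc_uri},
            "oa:hasSelector": {
                "@type": "oa:TextQuoteSelector",
                "oa:exact": exact,
                "oa:prefix": prefix,
                "oa:suffix": suffix,
            },
        },
    }


ann = make_comment_annotation(
    "http://example.org/annotations/1",
    "http://example.org/papers/42",
    exact="amyloid beta",
    prefix="levels of ",
    suffix=" in the cortex",
    comment="Which antibody was used here?",
    author="http://example.org/users/alice",
)
print(json.dumps(ann, indent=2))
```

Because body and target are independent resources, the same structure covers a free-text note, a semantic tag (body = an ontology term URI) or an annotation on another annotation (target = an annotation URI).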
  58. Web Annotation Tool • Domeo is a web application for producing and sharing stand-off annotation • Science and semantics linked in a few clicks • Domeo is open source and designed as an open system… we are working to make it easier to customize. – http://annotationframework.org – https://twitter.com/DomeoTool
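Stand-off annotation is stored apart from the annotated content and re-attached to it by description. One common way to do that, sketched below, is to keep the quoted text plus a little surrounding context and search for it again when the page is loaded. This illustrates the general technique under that assumption; it is not Domeo's anchoring code, and `anchor_quote` is a hypothetical helper.

```python
from typing import Optional


def anchor_quote(text: str, exact: str, prefix: str = "",
                 suffix: str = "") -> Optional[tuple[int, int]]:
    """Return (start, end) character offsets of `exact` in `text`, or None.

    Occurrences whose surrounding text matches `prefix`/`suffix` are preferred,
    which disambiguates quotes that appear more than once. If no occurrence
    matches the context, fall back to the first occurrence.
    """
    candidates = []
    start = text.find(exact)
    while start != -1:
        candidates.append(start)
        start = text.find(exact, start + 1)
    if not candidates:
        return None
    for start in candidates:
        end = start + len(exact)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start, end
    first = candidates[0]
    return first, first + len(exact)


page = "APP is cleaved to amyloid beta. Levels of amyloid beta in the cortex rose."
span = anchor_quote(page, "amyloid beta", prefix="Levels of ", suffix=" in the cortex")
print(span, page[span[0]:span[1]])
```

The phrase occurs twice in `page`; the prefix and suffix select the second occurrence. Because the annotation never modifies the page, it can be shared, filtered or discarded independently of the document.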
  59. Annotating while we are reading
  60. Manual and automatic annotation (screenshot callouts: URL I am annotating; manual annotation tools; automatic annotation tools; exploration panels)
  61. Manual annotation: notes/comments
  62. Semantic tagging (NCBO BioPortal, NIF Registry) • Domeo can query external services and use anything that has a unique identifier as a qualifier
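A semantic tag differs from a free-text note in that its body is an identifier rather than a string, so any term with a stable URI can act as a qualifier. The sketch below assumes a term has already been chosen from an external service (the lookup itself is not shown, and no real BioPortal or NIF API call is implied). It builds OA-style tagging annotations whose bodies are term URIs, using `oa:tagging` and `oa:SemanticTag` as the assumed Open Annotation names, and then shows discovery by identifier. The `Term` record shape and all URIs are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A term already chosen from an external vocabulary service (hypothetical shape)."""
    uri: str
    label: str


def semantic_tags(ann_base: str, doc_uri: str, exact: str, terms: list[Term]) -> list[dict]:
    """One tagging annotation per term; the body is the term's identifier, not free text."""
    return [
        {
            "@id": f"{ann_base}{i}",
            "@type": "oa:Annotation",
            "oa:motivatedBy": {"@id": "oa:tagging"},
            "oa:hasBody": {"@id": term.uri, "@type": "oa:SemanticTag", "rdfs:label": term.label},
            "oa:hasTarget": {
                "@type": "oa:SpecificResource",
                "oa:hasSource": {"@id": doc_uri},
                "oa:hasSelector": {"@type": "oa:TextQuoteSelector", "oa:exact": exact},
            },
        }
        for i, term in enumerate(terms, start=1)
    ]


def documents_tagged_with(annotations: list[dict], term_uri: str) -> set[str]:
    """Discovery by identifier: which documents carry a given semantic tag?"""
    return {
        a["oa:hasTarget"]["oa:hasSource"]["@id"]
        for a in annotations
        if a["oa:hasBody"]["@id"] == term_uri
    }


# Placeholder URIs standing in for, e.g., an ontology term and an antibody registry entry.
terms = [
    Term("http://example.org/ontology/APP", "amyloid precursor protein"),
    Term("http://example.org/registry/antibody-1", "anti-APP antibody"),
]
tags = semantic_tags("http://example.org/annotations/", "http://example.org/papers/42", "APP", terms)
print(documents_tagged_with(tags, "http://example.org/ontology/APP"))
```

Searching by URI rather than by the string "APP" avoids confusing different senses of the same word, which is what makes tagged content discoverable across documents.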
  63. Semantic tagging • We could refer to historical figures, galaxies, places, events…
  64. Semantic tag on text • Annotation and pop-up • Links to further reading and additional resources
  65. Image annotation
  66. Image annotation • By semantically tagging figures in a paper, I make them discoverable… and we can integrate inference capabilities
  67. Defining permissions (annotation sets)
  68. Support for extensions: antibodies • Translates into a formal OWL/RDF representation • Contributed to PubMed LinkOut through NIF (http://neuinfo.org) • antibodyregistry.org
  69. Hypotheses management (v1) • Translates into a formal OWL/RDF representation (SWAN Ontology) • Possibility of integrating nanopublications and BEL • Data as evidence
  70. Hypotheses management (SWAN) • Diagram panels: classic publication; scientific discourse; ‘semantic’ representation • Semantic Web Applications in Neuromedicine (SWAN) project [2007]
  71. Hypotheses management (SWAN): graph representation • Paolo Ciccarese, PhD, NFAIS Workshop 2013
  72. Infinite possibilities • Integration of nanopubs, HyBrow, HyQue, BEL • Capturing microdata and metadata • Annotating video, audio, 3D models, database records • Plug-ins for: clinical guidelines, clinical trials, drug-drug interaction, protocols, database curation • Legislation, astronomy, humanities • …
  73. Text mining
  74. Reflect: http://reflect.ws/
  75. Domeo text mining selection • Domeo can trigger external text mining services and transform the results into annotations (which can themselves be annotated) – NCBO Annotator, NIF Annotator, Textpresso, UIMA-based algorithms • Many other possibilities – SADI services – Whatizit – DBpedia Spotlight
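Turning text-mining output into annotations mostly means mapping each hit (a character span plus a matched term) onto a target selector and a body. The sketch below assumes a generic, hypothetical hit format of `(start, end, term_uri)`; real services such as the NCBO Annotator return their own JSON structures and would each need an adapter. It stores the quoted text with prefix/suffix context and records which service produced each annotation, so the results can be re-anchored and then reviewed, corrected or annotated like manual ones. All URIs are placeholders.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One text-mining match (hypothetical, service-independent shape)."""
    start: int
    end: int
    term_uri: str


def hits_to_annotations(doc_uri: str, text: str, hits: list[Hit],
                        service: str, context: int = 20) -> list[dict]:
    """Convert mining hits into OA-style tagging annotations with quote selectors."""
    annotations = []
    for hit in sorted(hits):  # order by position in the text
        annotations.append({
            "@type": "oa:Annotation",
            "oa:motivatedBy": {"@id": "oa:tagging"},
            # Provenance: which automatic service produced this annotation.
            "oa:annotatedBy": {"@id": service},
            "oa:hasBody": {"@id": hit.term_uri, "@type": "oa:SemanticTag"},
            "oa:hasTarget": {
                "@type": "oa:SpecificResource",
                "oa:hasSource": {"@id": doc_uri},
                "oa:hasSelector": {
                    "@type": "oa:TextQuoteSelector",
                    "oa:exact": text[hit.start:hit.end],
                    "oa:prefix": text[max(0, hit.start - context):hit.start],
                    "oa:suffix": text[hit.end:hit.end + context],
                },
            },
        })
    return annotations


text = "Presenilin 1 mutations increase amyloid beta production."
hits = [Hit(32, 44, "http://example.org/terms/amyloid-beta"),
        Hit(0, 12, "http://example.org/terms/PSEN1")]
anns = hits_to_annotations("http://example.org/papers/7", text, hits,
                           service="http://example.org/services/annotator")
for a in anns:
    sel = a["oa:hasTarget"]["oa:hasSelector"]
    print(sel["oa:exact"], "->", a["oa:hasBody"]["@id"])
```

Keeping the producing service as provenance is what allows results from different text-mining services to be compared side by side and curated by people.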
  76. Text mining results
  77. Text mining results and social curation • Comparison and improvement of text mining services
  78. Support for comments/discussions
  79. Domeo supports extraction pipelines
  80. Self reference
  81. References
  82. References are annotations!
  83. Virtual bibliography
  84. Extend your reading
  85. Search example
  86. Serialization in AO/RDF (working on OA)
  87. Utopia for PDF: http://getutopia.com
  88. Integration through APIs (e.g., NIF) • PubMed LinkOuts!
  89. Stemcell: http://www.stembook.org/
  90. Stembook.org and Domeo
  91. Integration with Drupal 7 (Biblio module) • Thanks to Stephane Corlosquet, Drupal core developer
  92. In conclusion… • Consider annotation a first-class citizen in your projects… annotation is a great, ubiquitous way to keep the crowd in the loop • Consider using the Open Annotation Model and joining the community… we can help! • Domeo is a complete playground/framework for creating and sharing semantic annotation • There are lots of other open-source tools…
  93. annotator.js (text) • Open Knowledge Foundation project for text annotation: easy to integrate and supports extensions • http://okfnlabs.org/annotator/
  94. annotorious.js (images) • Image annotation: adds drawing and commenting to images in web pages • http://annotorious.github.io/
  95. Shared Canvas (manuscripts): www.shared-canvas.org/
  96. MapHub (maps) • Map annotation • http://maphub.github.io/
  97. [no text]
  98. Keep annotating… and sharing! Thank you
