Overview of the different approaches for addressing reproducibility (using semantics) in laboratory protocols, workflow description and publication, and workflow infrastructure. Research Objects are then introduced as a means of capturing the context and annotations of scientific experiments, together with the privacy and IPR concerns that may arise. This presentation was given at Dagstuhl Seminar 16041: http://www.dagstuhl.de/16041
Reproducibility Using Semantics: An Overview
1. Reproducibility Using Semantics: An Overview
Dagstuhl Seminar
Jan 2016
Daniel Garijo, Olga Giraldo, Idafen Santana-Pérez,
Victor Rodriguez Doncel, Oscar Corcho
Ontology Engineering Group
Universidad Politécnica de Madrid
Madrid, Spain
2. The Research Method in different disciplines
[Diagram: the research method in two settings. In vivo/in vitro: input data, laboratory protocol, equipment. In silico: dataset, scientific workflow, infrastructure.]
3. Some problems in lab protocols
Some protocols have insufficient granularity, and their instructions can be imprecise or ambiguous because they are written in natural language. Examples:
• Incubate the centrifuge tubes in a water bath.
• Incubate the samples for 5 min with gentle shaking.
• Rinse DNA briefly in 1-2 ml of wash.
• Incubate at -20 °C overnight.
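The kind of ambiguity in these instructions can be illustrated with a small check. This is only a sketch: the word list, the regular expressions, and the function name review_instruction are hypothetical heuristics chosen for illustration, not part of the SMART Protocols work.

```python
import re

# Hypothetical heuristics (assumptions for illustration only).
VAGUE_TERMS = re.compile(r"\b(briefly|gentle|gently|overnight)\b", re.IGNORECASE)
TEMPERATURE = re.compile(r"-?\d+(?:\.\d+)?\s*°?\s*C\b")
DURATION = re.compile(
    r"\b\d+(?:\.\d+)?\s*(?:s|sec|min|h|hr|hours?|minutes?|seconds?)\b",
    re.IGNORECASE,
)


def review_instruction(text):
    """Return a list of reasons why an instruction may be underspecified."""
    issues = [f"vague term: {m.group(0).lower()}" for m in VAGUE_TERMS.finditer(text)]
    # Assumption: an incubation step should state a temperature and a duration.
    if text.lower().startswith("incubate"):
        if not TEMPERATURE.search(text):
            issues.append("no temperature given")
        if not DURATION.search(text):
            issues.append("no duration given")
    return issues


if __name__ == "__main__":
    for line in [
        "Incubate the centrifuge tubes in a water bath.",
        "Incubate the samples for 5 min with gentle shaking.",
        "Rinse DNA briefly in 1-2 ml of wash.",
        "Incubate at -20C overnight.",
    ]:
        print(line, "->", review_instruction(line))
```

Every one of the four example instructions is flagged by at least one rule, which is the point the slide makes: free text leaves out parameters that a structured description would require.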
5. Semantic annotation
The protocol as a document
[Diagram: document structure of a protocol. Terms are grouped here under the section label that accompanies them on the slide.]
• sp:introduction section: sp:application of the protocol, sp:advantage of the protocol, sp:limitation of the protocol, sp:provenance of the protocol, sp:purpose of the protocol
• sp:materials section: sp:buffer list, sp:equipment and supplies list, sp:kit list, sp:primer list, sp:reagent list, sp:software list, sp:solution list
• sp:methods section: exact:caution, sp:critical step, sp:hint, sp:pause point, sp:storage condition, sp:timing, sp:troubleshooting
• Other classes: sp:experimental protocol, iao:document, iao:document part, iao:textual entity, iao:data set, exact:alert message
• Relations: owl:subClassOf, ro:hasPart, ro:partOf
Basic Steps of DNA Extraction
[Diagram: DNA extraction described with P-Plan terms.]
• Steps (p-plan:Step): sp:basic step of DNA extraction, sp:cell disruption, sp:digestion reaction, sp:DNA purification
• Variables (p-plan:Variable): sp:plant tissue, sp:powdered tissue, sp:digested contaminant, obi:DNA extract
• Relations: p-plan:hasInputVariable, p-plan:hasOutputVariable, bfo:isPrecededBy, owl:subClassOf
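A minimal sketch of how the DNA extraction steps might be chained with the P-Plan properties named on the slide. The slide text lists the terms but not which arrow connects which node, so the wiring below (tissue, then powdered tissue, then digested contaminant, then DNA extract) is an assumption, and the names STEPS, to_triples, and execution_order are hypothetical.

```python
# Terms are written as they appear on the slide; they are labels, not resolvable IRIs.
# ASSUMPTION: the input/output/predecessor wiring is inferred, not taken from the source.
STEPS = {
    "sp:DNA purification": {
        "input": "sp:digested contaminant",
        "output": "obi:DNA extract",
        "preceded_by": "sp:digestion reaction",
    },
    "sp:cell disruption": {
        "input": "sp:plant tissue",
        "output": "sp:powdered tissue",
        "preceded_by": None,
    },
    "sp:digestion reaction": {
        "input": "sp:powdered tissue",
        "output": "sp:digested contaminant",
        "preceded_by": "sp:cell disruption",
    },
}


def to_triples(steps):
    """Flatten the step descriptions into (subject, predicate, object) triples."""
    triples = []
    for name, info in steps.items():
        triples.append((name, "p-plan:hasInputVariable", info["input"]))
        triples.append((name, "p-plan:hasOutputVariable", info["output"]))
        if info["preceded_by"] is not None:
            triples.append((name, "bfo:isPrecededBy", info["preceded_by"]))
    return triples


def execution_order(steps):
    """Order steps so that each one comes after the step it is preceded by."""
    ordered = []
    remaining = dict(steps)
    while remaining:
        ready = [
            name
            for name, info in remaining.items()
            if info["preceded_by"] is None or info["preceded_by"] in ordered
        ]
        if not ready:
            raise ValueError("cycle or missing predecessor in bfo:isPrecededBy links")
        for name in ready:
            ordered.append(name)
            del remaining[name]
    return ordered


if __name__ == "__main__":
    for triple in to_triples(STEPS):
        print(triple)
    print(execution_order(STEPS))
```

The dictionary is deliberately declared out of order: the sequence is recovered from the bfo:isPrecededBy links alone, which is what makes the plan machine-readable.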
SMART Protocols ontology is available here:
http://vocab.linkeddata.es/SMARTProtocols/
GATE Smart Protocols
6. The Research Method in different disciplines
[Diagram: the research method in two settings. In vivo/in vitro: input data, laboratory protocol, equipment. In silico: dataset, scientific workflow, infrastructure.]
7. Vocabularies and methodologies for representing and publishing workflows
[Figure: Methodology for workflow publishing. Labels shown: WINGS on a local laptop, on a shared host, and on a web server (each with core, portal, workflow template, workflow instance, and PROV export); Wings workflow generation; OPM/PROV conversion; Linked Data publication; RDF triple store holding the workflow plan and workflow provenance; interactive browsing (Pubby frontend); programmatic access (external apps); users; other workflow environments; publication, share, reuse.]
Repository of linked workflows: http://www.opmw.org/sparql
P-Plan ontology: http://purl.org/net/p-plan
OPMW ontology: http://www.opmw.org/ontology/
Daniel Garijo and Yolanda Gil. 2011. A new approach for publishing workflows: abstractions, standards, and linked data. (WORKS '11). ACM, New York, NY, USA, 47-56.
Daniel Garijo and Yolanda Gil. 2012. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston.
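Programmatic access to the repository of linked workflows could look roughly like the sketch below, which only builds a SPARQL request URL for the endpoint listed above. The query assumes that published templates are typed as opmw:WorkflowTemplate in the OPMW namespace and carry an rdfs:label; the helper name build_request_url is hypothetical, and the endpoint may not be online.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://www.opmw.org/sparql"  # endpoint given on the slide

# ASSUMPTION: templates are instances of opmw:WorkflowTemplate with an optional label.
QUERY = """
PREFIX opmw: <http://www.opmw.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?template ?label WHERE {
  ?template a opmw:WorkflowTemplate .
  OPTIONAL { ?template rdfs:label ?label }
}
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode a query as an HTTP GET request, using the 'query' parameter
    defined by the SPARQL protocol."""
    return endpoint + "?" + urlencode({"query": query.strip()})


if __name__ == "__main__":
    url = build_request_url(ENDPOINT, QUERY)
    print(url)
    # Round trip: the encoded parameter decodes back to the original query text.
    print(parse_qs(urlsplit(url).query)["query"][0])
```

The URL can then be fetched with any HTTP client, for instance with an Accept header of application/sparql-results+json to request JSON results.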
8. The Research Method in different disciplines
[Diagram: the research method in two settings. In vivo/in vitro: input data, laboratory protocol, equipment. In silico: dataset, scientific workflow, infrastructure.]
10. Some results
• Pegasus Montage Workflow
• Astronomy workflow
• Construct large image mosaics of the sky
• Montage Software distribution
• 59 binaries
• Target IaaS Cloud Providers
• Amazon EC2 & FutureGrid
• Vagrant
RO available at http://pegasus.isi.edu/publications/reppar
11. The Research Method in different disciplines
[Diagram: the research method in two settings. In vivo/in vitro: input data, laboratory protocol, equipment. In silico: dataset, scientific workflow, infrastructure.]
+ CONTEXT!
12. Research Objects
ROs as web pages http://rohub.linkeddata.es/
ROs as part of a Linked Data Platform (alpha): http://purl.org/net/ldp4ro
13. How to preserve Workflows/Research Objects?
Three main ways/levels:
• Descriptive reproducibility
• Documentation
• Workflow execution reproducibility
• Can we run the workflow?
• Workflow results reproducibility
• Can we get the same results?
Checklists!
• Corcho et al.: Checklist for workflow conservation.
• http://dx.doi.org/10.6084/m9.figshare.1285011
• 40 different aspects
• Documentation
• Goals
• Results
• Metadata
• Corcho et al.: Checklist for a workflow conservation plan
• http://dx.doi.org/10.6084/m9.figshare.1285012
• Based on the DCC’s data management plan
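The three levels listed on this slide can be sketched as a simple classification. This is an illustration under stated assumptions: the levels are treated as cumulative, "same results" is approximated by byte-identical output (a strict criterion that many workflows would need to relax), and the names Level, WorkflowCheck, and reproducibility_level are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    NONE = 0
    DESCRIPTIVE = 1  # the workflow is documented
    EXECUTION = 2    # the workflow can be run again
    RESULTS = 3      # re-running yields the same results


@dataclass
class WorkflowCheck:
    has_documentation: bool
    ran_successfully: bool
    original_output: Optional[bytes] = None
    rerun_output: Optional[bytes] = None


def digest(data):
    """SHA-256 hex digest of an output, used to compare results."""
    return hashlib.sha256(data).hexdigest()


def reproducibility_level(check):
    """Return the highest level reached. ASSUMPTION: each level requires the previous one."""
    if not check.has_documentation:
        return Level.NONE
    if not check.ran_successfully:
        return Level.DESCRIPTIVE
    if check.original_output is None or check.rerun_output is None:
        return Level.EXECUTION
    if digest(check.original_output) == digest(check.rerun_output):
        return Level.RESULTS
    return Level.EXECUTION


if __name__ == "__main__":
    check = WorkflowCheck(True, True, b"mosaic-v1", b"mosaic-v1")
    print(reproducibility_level(check))
```

A checklist such as the ones cited above covers far more aspects than these three flags; the sketch only shows how the levels build on one another.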
16. Acknowledgements
• The Semantic e-Science team at UPM
• Carlos Badenes
• Daniel Garijo
• Olga Giraldo
• Rafael González-Cabero
• Idafen Santana
• Victor Rodriguez Doncel
• The Wf4Ever team
• Carole Goble, José Manuel Gómez Pérez, Raúl Palma, Jun Zhao, Stian Soiland-Reyes, Khalid Belhajjame, José Enrique Ruíz, Marco Roos, Lourdes Verdes-Montenegro, Norman Morrison, Sean Bechhofer, Graham Klyne, Matt Gamble, and many others
• The Research Object community group
• http://www.researchobject.org/