Short talk on Research Objects and their use for reproducibility and publishing in the Systems Biology Commons platform FAIRDOMHub and its underlying software, SEEK.
Research Objects, SEEK and FAIRDOM
1. Research Objects, SEEK4Science and FAIRDOM
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
The Software Sustainability Institute, UK
carole.goble@manchester.ac.uk
researchobject.org
Schloss Dagstuhl Seminar 16041 Reproducibility of Data-Oriented Experiments in e-Science, 25-29 January 2016
2. Metadata objects – not necessarily encapsulated!
Platform-independent framework to bundle, port and link (scattered) resources and related experiments, and to hold context
Container
Packaging:
Zip files, Docker images, BagIt, …
Catalogues & Commons Platforms:
FAIRDOM SEEK, STELAR eLab
Manifest
Metadata
Describes the aggregated resources, their annotations and their provenance
Manifest
3. Manifest Metadata
Manifest Construction
• Identification – id, title, creator, status….
• Aggregates – list of ids/links to resources
• Annotations – list of annotations about resources
Manifest
Manifest Description
• Checklists – what should be there
• Provenance – where it came from
• Versioning – its evolution
• Dependencies – what else is needed
Manifest
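The manifest parts listed above (identification, aggregates, annotations) and the checklist idea can be sketched in a few lines of Python. This is only an illustration: the field names, the example identifiers and the helper `missing_from_checklist` are assumptions made for this sketch, not the Research Object vocabulary itself.

```python
import json


def build_manifest(ro_id, title, creator, resources, annotations):
    """Assemble a manifest-like dict (field names are illustrative)."""
    return {
        # Identification
        "id": ro_id,
        "title": title,
        "creator": creator,
        # Aggregates: list of ids/links to resources
        "aggregates": [{"uri": uri} for uri in resources],
        # Annotations: (resource, annotation document) pairs
        "annotations": [
            {"about": about, "content": content}
            for about, content in annotations
        ],
    }


def missing_from_checklist(manifest, required_uris):
    """Checklist: return the required resources the manifest does not aggregate."""
    present = {entry["uri"] for entry in manifest["aggregates"]}
    return [uri for uri in required_uris if uri not in present]


manifest = build_manifest(
    ro_id="urn:example:ro-1",                         # hypothetical identifier
    title="Example investigation",
    creator="https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID
    resources=["data/results.csv", "models/model.xml"],
    annotations=[("data/results.csv", "annotations/results-provenance.ttl")],
)
print(json.dumps(manifest, indent=2))
print(missing_from_checklist(manifest, ["data/results.csv", "sops/protocol.pdf"]))
```

The checklist call reports the SOP as missing, which is the "what should be there" test in its simplest form.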
4. Manifest Construction
Unique identifiers as names for things.
doi, epic, orcid, purl, RII, Identifiers.org
Mechanism of aggregation to group things together.
OAI-ORE
Metadata about those things & how they relate to each other.
W3C OADM
http://w3id.org/ro/
5. FAIR Manifest Descriptions: RO types
Content Annotation Profiles
Checklist
Provenance
Versioning
Dependencies
NISO-JATS
Dublin Core
EFO ISA
SBML
JERM
Gamble, Goble, Klyne, Zhao. MIM: A Minimum Information Model vocabulary and framework for Scientific Linked Data, IEEE 8th Intl Conf on eScience, 2012
SED-ML
SBOL
MIAPE
6. FAIRDOMHub Sys Bio Commons
https://doi.org/10.15490/seek.1.investigation.56
7. Aggregating Commons
Links into public resources
Integration with public resources
Integration with modelling tools
FAIRDOM Repositories
SOPs
models
data
Launch local executables
Integration with lab tools
Integration with lab management systems and local file stores
Direct Upload
BiVes
8. RO Unzip
• Reproducibility
• Versioning
• Systematic and extensible metadata collection
• Cross-platform exchange
• Publishing
Living Snapshot
https://doi.org/10.15490/seek.1.investigation.56
https://dx.doi.org/10.1111/febs.13237
Research Objects: why, what and how. Examples, challenges
In practice the exchange, reuse and reproduction of scientific experiments is hard, dependent on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is not "finished": codes fork, data is updated, algorithms are revised, workflows break, service updates are released. Neither should they be viewed just as second-class artifacts tethered to publications, but as the focus of research outcomes in their own right: articles clustered around datasets, methods with citation profiles. Many funders and publishers have come to acknowledge this, moving to data sharing policies and provisioning e-infrastructure platforms. Many researchers recognise the importance of working with Research Objects. The term has become widespread.
However, what is a Research Object? How do you mint one, exchange one, build a platform to support one, curate one? How do we introduce them in a lightweight way that platform developers can migrate to? What is the practical impact of a Research Object Commons on training, stewardship, scholarship, sharing? How do we address the scholarly and technological debt of making and maintaining Research Objects? Are there any examples? I'll present our practical experiences of the why, what and how of Research Objects.
Scattered Assets across silos
Access, Interoperate, Reuse
Bridge and overlap between execution environment and publishing environment
Vagrant
BagIt is a hierarchical file packaging format designed to support disk-based storage and network transfer of arbitrary digital content. A "bag" consists of a "payload" (the arbitrary content) and "tags", which are metadata files intended to document the storage and transfer of the bag. A required tag file contains a manifest listing every file in the payload together with its corresponding checksum. The name, BagIt, is inspired by the "enclose and deposit" method, sometimes referred to as "bag it and tag it".
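The payload manifest described in this note can be sketched with the Python standard library. This is a minimal illustration, assuming the payload lives under `data/` and the manifest is written as `manifest-sha256.txt` with one "checksum, path" line per file; the helper name `write_payload_manifest` is hypothetical, and a complete bag needs further tag files that are omitted here.

```python
import hashlib
import tempfile
from pathlib import Path


def write_payload_manifest(bag_dir):
    """Write a checksum manifest covering every file under <bag_dir>/data."""
    bag = Path(bag_dir)
    lines = []
    for path in sorted((bag / "data").rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            # One line per payload file: checksum, then the path relative to the bag
            lines.append(f"{digest}  {path.relative_to(bag).as_posix()}")
    (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    (bag / "data").mkdir()
    (bag / "data" / "results.csv").write_text("time,value\n0,1.0\n", encoding="utf-8")
    manifest_lines = write_payload_manifest(bag)
    print(manifest_lines)
```

A receiver can recompute the checksums and compare them with the manifest to detect files that were lost or altered in transfer.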
ReproZip is another packaging system
Packaging – physical and logical containers
Open Archives Initiative Object Reuse and Exchange (OAI-ORE) is a standard for describing aggregations of web resources
http://www.openarchives.org/ore/
Uses a Resource Map to describe the aggregated resources
Proxies allow for statements about the resources within the aggregation
Capturing context and viewpoints
Several concrete serialisations
RDF/XML, Atom, RDFa
The Open Annotation specification is a community-developed data model for annotation of web resources
http://www.openannotation.org/spec/core/
Developed by the W3C Open Annotation Community Group
Allows for “stand-off” annotations
Annotation as a first class citizen
Developed to fit with Web Architecture
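A "stand-off" annotation keeps the statement about a resource outside the resource itself. The sketch below expresses one as a JSON-LD-style dictionary using the `oa:Annotation`, `oa:hasTarget` and `oa:hasBody` terms of the Open Annotation data model; the helper name and all URIs are invented for illustration.

```python
import json


def stand_off_annotation(annotation_id, target, body):
    """Describe `target` with `body` without modifying either resource."""
    return {
        "@context": {"oa": "http://www.w3.org/ns/oa#"},
        "@id": annotation_id,
        "@type": "oa:Annotation",
        "oa:hasTarget": {"@id": target},  # the resource being annotated
        "oa:hasBody": {"@id": body},      # the resource holding the annotation content
    }


annotation = stand_off_annotation(
    "urn:example:annotation-1",                                    # hypothetical id
    target="http://example.org/data/results.csv",                  # hypothetical URI
    body="http://example.org/annotations/results-description.ttl"  # hypothetical URI
)
print(json.dumps(annotation, indent=2))
```

Because the annotation has its own identifier, it can itself be aggregated, versioned and cited, which is what treating annotation as a first class citizen amounts to in practice.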
How do you make a research object? Well, gather your resources, describe them in the manifest.
Different types of containers can be used to transfer and package the Research Object.
The Research Object Bundle is a structured ZIP file format… but more specific and more general formats are also used, such as:
Docker images (a bit low-level, capturing the whole execution environment)
BagIt (a digital archiving format that is commonly used by libraries), or
Simply existing Web resources (which may be subject to change).
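To make "gather your resources, describe them in the manifest" concrete, here is a small sketch that packs files and a manifest into a ZIP archive in memory. The layout (a `mimetype` entry stored first and uncompressed, and a manifest at `.ro/manifest.json`) follows my reading of the Research Object Bundle specification and should be checked against it; the function name `make_bundle` and the file contents are assumptions.

```python
import io
import json
import zipfile

# Media type assumed from the Research Object Bundle specification
MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def make_bundle(files):
    """Return the bytes of a ZIP holding `files` plus a manifest describing them."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],
        "id": "/",
        "aggregates": [{"uri": "/" + name} for name in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # The mimetype entry goes first and is stored without compression
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for name, content in sorted(files.items()):
            bundle.writestr(name, content)
    return buffer.getvalue()


data = make_bundle({"data/results.csv": "time,value\n0,1.0\n"})
with zipfile.ZipFile(io.BytesIO(data)) as bundle:
    names = bundle.namelist()
    aggregated = json.loads(bundle.read(".ro/manifest.json"))["aggregates"]
print(names)
print(aggregated)
```

The same manifest could equally sit beside a BagIt payload or point at existing Web resources; only the container changes.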
You can register and archive Research Objects in domain-specific repositories like FAIRDOM's SEEK (systems biology models) and FARR Commons CKAN (public health medical data), in technology-specific repositories (myExperiment for workflow-centric Research Objects), or in generic data repositories you have probably already heard of, like Zenodo and Figshare.
Pericles could be looked at – for preservation.
Each RO adheres to a profile.
Core profiles: citation, e.g. the NISO-JATS spec
Library: Dublin Core Application Profile
It works as an aggregated asset manager, allowing assets to be stored on SEEK or linked from disparate databases.
Export as RO Model, Data, SOP, Parameters
Freezing the whole ISA for an investigation would be too restrictive and impractical. Changing an investigation could be as subtle as uploading a new version of a data file that is associated with an assay related to the investigation. So the concept of versioning for an Investigation gets quite complicated. At the same time, it is important that the DOI should be able to point to the investigation in the state it was in at the time it was published and linked to the DOI. To solve this, we have been implementing the construction of Research Objects for investigations, which can be used to create a snapshot of the investigation and all its parts at a specific time. It is also a digital object that can be deposited outside of SEEK.
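The snapshot idea can be illustrated with a toy example: copy the investigation and its parts at a given time, fingerprint the copy, and let the live investigation keep changing. This is not SEEK's implementation; the data layout, the `snapshot` function and the use of a SHA-256 fingerprint are assumptions made for the sketch.

```python
import copy
import hashlib
import json
from datetime import datetime, timezone


def snapshot(investigation, taken_at=None):
    """Freeze a deep copy of the investigation together with a content fingerprint."""
    taken_at = taken_at or datetime.now(timezone.utc)
    frozen = copy.deepcopy(investigation)
    payload = json.dumps(frozen, sort_keys=True).encode("utf-8")
    return {
        "taken_at": taken_at.isoformat(),
        "content": frozen,
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


investigation = {
    "title": "Example investigation",  # hypothetical content
    "assays": [
        {"name": "assay-1", "data_files": [{"name": "results.csv", "version": 1}]}
    ],
}

snap = snapshot(investigation)

# The live investigation moves on: a new version of a data file is uploaded
investigation["assays"][0]["data_files"][0]["version"] = 2

print(snap["content"]["assays"][0]["data_files"][0]["version"])  # still the published state
print(snap["sha256"] != snapshot(investigation)["sha256"])       # fingerprints now differ
```

A DOI can then point at the frozen snapshot while the investigation itself continues to evolve.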
RO Query
Vagrant – an automation tool for building virtual environments, which can use Docker as a provider
Docker “Github”
Using their own tools
One-command click repeatability