Research Object Composer: A Tool for Publishing Complex Data Objects in the Cloud
1. Research Object Composer:
Publishing Complex Data Objects in the FAIRground
Presented by Anita de Waard, VP Research Collaborations
September 24 2019
2. Biomedical data moving to the cloud
“Storing, managing, standardizing and publishing the vast amounts of data produced by biomedical research is a critical mission for the National Institutes of Health.”
“One important purpose of the Commons Pilot is to collectively agree on a set of best practices … to eliminate barriers for accessing, sharing and analyzing biomedical data.”
Findable, Accessible, Interoperable, Reusable Data
Building blocks for the FAIRGround!
Fairly AI-Ready?
3. Building an open interoperable data ecosystem:
User story:
As a researcher studying genetic disease X…
I want to
• access 1000s of DNA sequences of a population, run analysis Y and
• share results of my findings, protocol and input data with my collaborators/community
• publish an article about it in a way that data is FAIR along each step of the process
so that others can reproduce and build on this work.
Open API!
4. Cloud data is accessible if openly disseminated
Need open data & identifiers for workflow tools:
Requirements:
1. Landing page URL including GUID
2. URL for page where file can be accessed (downloaded)
3. Metadata for object
4. Reference to the Task (zero or one) that this dataset was Derived From
5. Reference to the Task(s) that this dataset is the Source Of
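The five requirements above can be read as the fields of a single dataset record. The following is a minimal sketch under that reading, not a schema from the presentation: the class name `DatasetRecord`, its field names, and the `example.org` URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetRecord:
    """Hypothetical record holding the five requirements listed above."""
    guid: str
    landing_page_url: str                 # 1. landing page URL, should include the GUID
    download_url: str                     # 2. URL where the file can be downloaded
    metadata: dict                        # 3. metadata for the object
    derived_from: Optional[str] = None    # 4. zero or one task that produced this dataset
    source_of: list = field(default_factory=list)  # 5. tasks that consume this dataset

    def problems(self) -> list:
        """Return descriptions of requirements 1-3 that are not met."""
        found = []
        if self.guid not in self.landing_page_url:
            found.append("landing page URL does not include the GUID")
        if not self.download_url:
            found.append("missing download URL")
        if not self.metadata:
            found.append("missing metadata")
        return found


# Illustrative values only; none of these identifiers come from the slides.
record = DatasetRecord(
    guid="abc-123",
    landing_page_url="https://example.org/datasets/abc-123",
    download_url="https://example.org/datasets/abc-123/file",
    metadata={"title": "Example sequences"},
    derived_from="task-1",
    source_of=["task-2", "task-3"],
)
print(record.problems())
```

Modelling `derived_from` as optional and `source_of` as a list mirrors the "zero or one" and "Task(s)" wording of requirements 4 and 5.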
5. Building an open interoperable data ecosystem:
Aggregates: link things together
Annotations: about things & their relationships
Container: packaging content & links (Zip files, BagIt, Docker images)
Identification: locate things regardless where
6. Building an open interoperable data ecosystem:
[Diagram: database, Open repository, Workflow Tool (Workflow Input, Task 1, Task 2, Task 3, Output), Open API, and an RO containing items 1, 2, 3]
Research Object Composer
http://www.researchobject.org
Research Object Profiler: add annotations and relationships (metadata) to a collection to describe a research object:
- URI
- Length
- Filename
- Checksums
etc.
Research Object Serializer: serialise the Research Object in a standard format based on BagIt (a manifest itemizing file names)
Mendeley Data
• DOIs
• Metadata (Findability)
• Open repo (Accessibility)
• Versioning
• RO Standard (Interoperability, Reusability)
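To make the profiler and serializer steps concrete, here is a minimal sketch that records filename, length and a checksum for each file and then prints lines in the style of a BagIt `manifest-sha256.txt` (checksum, then a path under `data/`). It is hand-rolled with the Python standard library and is not the RO Composer's implementation; the helper names `describe_file` and `manifest_lines` and the sample files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_file(path: Path, root: Path) -> dict:
    """Collect the per-file metadata named on the slide: filename, length, checksum."""
    data = path.read_bytes()
    return {
        "filename": path.relative_to(root).as_posix(),
        "length": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def manifest_lines(entries: list) -> list:
    """Format entries as BagIt-style manifest lines, sorted by filename."""
    ordered = sorted(entries, key=lambda e: e["filename"])
    return [f"{e['sha256']}  data/{e['filename']}" for e in ordered]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Two small placeholder payload files, created only for this example.
    (root / "input.txt").write_bytes(b"ACGT\n")
    (root / "results.csv").write_bytes(b"sample,score\n1,0.9\n")
    entries = [describe_file(p, root) for p in root.iterdir() if p.is_file()]
    manifest = manifest_lines(entries)
    for line in manifest:
        print(line)
```

A complete bag also needs a `bagit.txt` declaration and the payload files placed under `data/`; the sketch covers only the manifest.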
7. Purpose of the Research Object Composer*:
• The RO Composer is not a registry of research objects, but it can list research objects currently under construction.
• The RO Composer is a microservice whose responsibility is to help other services create and deposit research objects.
• The composer acts as a temporary construction site that can be completed by multiple services (e.g. a data management system, a workflow system, a user interface).
• These clients will be jointly building a Research Object that can then be validated according to the schema, before the RO is downloaded or deposited into an archive (like Zenodo or Mendeley Data).
• Clients of the RO Composer are applications (driven by a user interface) or agents (engaged automatically from other events, e.g. a workflow run).
• The RO Composer is not a required component for this: any software may generate research objects by following Research Object specifications.
* From: https://github.com/ResearchObject/research-object-composer/blob/master/introduction.ipynb
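The "temporary construction site" idea can be sketched as an in-memory draft that several clients fill in and that is checked for completeness before deposit. This is an illustration only and does not reproduce the RO Composer's actual API or schema validation: the class `DraftResearchObject`, its methods, the required field names and the `example.org` URLs are all assumptions.

```python
class DraftResearchObject:
    """Hypothetical in-memory 'construction site' for one research object."""

    def __init__(self, required_fields):
        self.required_fields = set(required_fields)
        self.content = {}
        self.contributors = []

    def contribute(self, client, fields):
        """Merge the fields supplied by one client into the shared draft."""
        self.content.update(fields)
        self.contributors.append(client)

    def missing_fields(self):
        """Return the required fields that no client has supplied yet."""
        return sorted(self.required_fields - set(self.content))

    def deposit(self):
        """Return a copy of the completed content, or refuse if incomplete."""
        missing = self.missing_fields()
        if missing:
            raise ValueError(f"draft is incomplete, missing: {missing}")
        return dict(self.content)


# Three clients jointly build one draft; names and URLs are placeholders.
draft = DraftResearchObject(required_fields=["data", "workflow", "title"])
draft.contribute("data-management-system", {"data": ["https://example.org/input.csv"]})
draft.contribute("workflow-system", {"workflow": "https://example.org/workflow.cwl"})
still_missing = draft.missing_fields()
draft.contribute("user-interface", {"title": "Analysis Y of disease X"})
deposited = draft.deposit()
print(still_missing, sorted(deposited))
```

Checking required fields stands in here for validation against a schema; a real service would validate the draft's structure as well as its completeness.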
9. Use case for the ROC: Earth Sciences!
EVER-EST – RO in Earth Sciences
12 EU partners
4 research communities
Powered by ROHub
10. Other use case: Chemistry! NMReDATA
http://nmredata.org/
NMReDATA:
• chemical shifts, scalar couplings, multiplet analysis and 2D cross peaks extracted from a set of NMR spectra
• linked to the assigned chemical structure.
• data resulting from full analysis of organic compounds and natural products using various spectra.
NMR Record:
• Database entry or folders including a .sdf file (containing the chemical structure and the NMReDATA)
• Folders including the relevant NMR spectra (with FID, acquisition and processing parameters in the manufacturer’s format).
• In order to facilitate transfers and exchanges of records, the folder can be compressed in the .zip format.
• The NMR records (and the .sdf file) will be generated by computer-assisted structure elucidation software or web-based tools.
Sounds like a ResearchObject to us…?
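As a rough illustration of the packaging step for an NMR record, the sketch below compresses a record folder into a .zip archive with Python's standard `zipfile` module. The folder layout (`compound.sdf`, `spectra/1H/fid`) and the helper name `zip_record` are hypothetical and do not come from the NMReDATA specification.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_record(folder: Path, archive: Path) -> list:
    """Write every file under `folder` into `archive`, keeping relative paths."""
    names = []
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                name = path.relative_to(folder).as_posix()
                zf.write(path, arcname=name)
                names.append(name)
    return names


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder record: one .sdf file plus one spectrum folder.
    record = Path(tmp) / "record"
    (record / "spectra" / "1H").mkdir(parents=True)
    (record / "compound.sdf").write_text("placeholder structure and NMReDATA tags\n")
    (record / "spectra" / "1H" / "fid").write_bytes(b"\x00\x01")

    archive = Path(tmp) / "record.zip"  # kept outside the record folder
    names = zip_record(record, archive)
    with zipfile.ZipFile(archive) as zf:
        stored = sorted(zf.namelist())
    print(stored)
```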
11. Some questions to ponder:
• How to enable interoperability between ROC and other repositories?
• How do we get the word out there and get people to use ROs at scale?
• What are the challenges for wide adoption by repositories? Authoring tools? Workflow tools?
• How do ROs fit in with other initiatives: is an RO Data, Software, both?
− Citations? Cf. Software citation
− Credit? Does it go along with new credit metrics, Make Data Count, etc.?
• What role can publishers play in this?
− Support standards (sit on panels, etc.)
− What else?
12. Acknowledgements:
This work was funded by the National Institutes of Health, National Heart, Lung and Blood Institute STAGE Project, with Seven Bridges Genomics Inc. Agreement No. 1 OT3 OD025463-01
And performed by:
Marina Soares E Silva
Chris Wright
Wouter Haak
Carole Goble
Stian Soiland-Reyes
Finn Bacall
Editor's Notes
Big biomedical data has the potential to deliver more knowledge about diseases, faster.
Collaboration between, among others, data service providers and developers of research object standards increases the chance of delivering an interoperable, open research data ecosystem that we aim to make sustainable and scalable.
Research Object: a standards-based metadata framework for logically and physically bundling resources with context. http://researchobject.org