Metadata and Semantics Research Conference, Manchester, UK 2015
Research Objects: why, what and how
In practice the exchange, reuse and reproduction of scientific experiments are hard, depending on bundling and exchanging the experimental methods, computational codes, data, algorithms, workflows and so on along with the narrative. These "Research Objects" are not fixed, just as research is never “finished”: codes fork, data is updated, algorithms are revised, workflows break, service updates are released. Nor should they be viewed merely as second-class artifacts tethered to publications, but as the focus of research outcomes in their own right: articles clustered around datasets, methods with citation profiles. Many funders and publishers have come to acknowledge this, moving to data-sharing policies and provisioning e-infrastructure platforms. Many researchers recognise the importance of working with Research Objects, and the term has become widespread. However: what is a Research Object? How do you mint one, exchange one, build a platform to support one, curate one? How do we introduce them in a lightweight way that platform developers can migrate to? What is the practical impact of a Research Object Commons on training, stewardship, scholarship and sharing? How do we address the scholarly and technological debt of making and maintaining Research Objects? Are there any examples?
I’ll present our practical experiences of the why, what and how of Research Objects.
1. Research Objects: why, what and how
Professor Carole Goble CBE FREng FBCS
The University of Manchester, UK
The Software Sustainability Institute, UK
carole.goble@manchester.ac.uk
researchobject.org
Metadata and Semantics Research Conference 2015, 9-11 Sept 2015, Manchester, UK
2. Prologue
• e-Lab Collabs & Shared Asset Repositories
• Knowledge, Metadata, Linked Data, Ontologies
• Software Engineering for Scientists
• Computational Workflow Systems
• Reproducibility
• Micro Publications
• Open Science
• Research Objects
• Linked Data for Science
• Scholarly Comms
4. Knowledge Turning, Info Flow
Barriers to Cure:
• Access to scientific resources
• Coordination and collaboration
• Flow of information
http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation
7. Virtual Witnessing*
Scientific publications:
• announce a result
• convince readers the result is correct
“papers in experimental [and computational science] should describe the results and provide a clear enough protocol [algorithm] to allow successful repetition and extension”
Jill Mesirov, Broad Institute, 2010**
**Accessible Reproducible Research, Science 22 January 2010, Vol. 327 no. 5964 pp. 415-416, DOI: 10.1126/science.1179653
*Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer.
8. Bramhall et al., Quality of Methods Reporting in Animal Models of Colitis, Inflammatory Bowel Diseases, 2015
“Only one of the 58 papers reported all essential criteria on our checklist. Animal age, gender, housing conditions and mortality/morbidity were all poorly reported…”
50 papers randomly chosen from 378 manuscripts in 2011 that use the Burrows-Wheeler Aligner for mapping Illumina reads:
• 31 gave no software version, parameters, or exact version of the genomic reference sequence
• 26 gave no access to primary data sets
Nekrutenko & Taylor, Next-generation sequencing data interpretation: enhancing reproducibility and accessibility, Nature Reviews Genetics 13 (2012)
9. “I can’t immediately reproduce the research in my own laboratory. It took an estimated 280 hours for an average user to approximately reproduce the paper.”
Prof Phil Bourne, Associate Director, NIH Big Data 2 Knowledge Program
10. “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.”
David Donoho, “Wavelab and Reproducible Research,” 1995
11. From Manuscripts to “Research Objects”
Multifarious, citable research products/assets
13. From manuscripts to “Research Objects”
Pre-packaged Docker images containing a bioinformatics tool and a standardised interface through which data and parameters are passed.
http://bioboxes.org
14. FAIR Research, crossing silos
From Manuscripts to “Research Objects”:
• Datasets, data collections
• Standard operating procedures
• Software, algorithms
• Configurations
• Tools and apps, services
• Codes, code libraries
• Workflows, scripts
• System software
• Infrastructure
• Compilers, hardware
Fragmentation
20. "Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al.
Workflow Commons
21. Instruments, Materials, Method
Data scopes: input data, output data, software, config, parameters.
Experiment setup:
• Methods: techniques, algorithms, specification of the steps
• Materials: datasets, parameters, algorithm seeds
• Instruments: codes, services, scripts, underlying libraries
• Laboratory: software and hardware infrastructure, systems software, integrative platforms
Drummond, Replicability is not Reproducibility: Nor is it Good Science, online
Peng, Reproducible Research in Computational Science, Science, 2 Dec 2011: 1226-1227.
22. Instruments, Materials, Method
Read. Run. Remake.
Science changes, experiments & results vary, and so do labs. Instruments break, labs decay.
Zhao et al., Why workflows break: understanding and combating decay in Taverna workflows, 8th Intl Conf on e-Science, 2012
http://atyourservice.blogs.xerox.com/files/2011/09/cloning-results-may-vary.jpg
26. FAIRDOM Metadata Framework
Link studies, link assets, map content.
• Common elements and relationships between things produced and used in experiments.
• Specific elements for specific data types.
Just Enough Results Model (JERM)
http://seek4science.org/JERMOntology
http://isatab.sourceforge.net/format.html
27. Penkler et al (2015) FEBS J 282:1481-1511
https://dx.doi.org/10.1111/febs.13237
29. Why Research Objects?
Preserved, portable research products: snapshots for inter-platform exchange and reproducibility.
Commons. New discovery.
30. Cross-Institutional e-Lab fragmentation
Parts scattered across subject-specific and general resources.
101 Innovations in Scholarly Communication: the Changing Research Workflow, Bosman and Kramer, 2015,
http://figshare.com/articles/101_Innovations_in_Scholarly_Communication_the_Changing_Research_Workflow/1286826
31. Why Research Objects?
Active research products, snapshots:
• Fork
• Merge
• Version
• Cite
• Snapshot
• Live
[Martin Scharm]
Haus et al., BMC Systems Biology, 2011, 5:10
Solvent production by Clostridium acetobutylicum
32. F1000Research Living Figures
Versioned articles, in-article data manipulation.
R. Lawrence, Force2015 Vision Award Runner Up
http://f1000.com/posters/browse/summary/1097482
Simply data + code: can change the definition of a figure, and ultimately the journal article.
Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is] F1000Research 2014, 3:176
Other labs can replicate the study, or contribute their data to a meta-analysis or disease model; the figure automatically updates. Data updates are time-stamped. New conclusions are added via versions.
33. Publish, Release (like Software)
An “evolving manuscript” would begin with a pre-publication, pre-peer-review “beta 0.9” version of an article, followed by the approved published article itself, [ … ] “version 1.0”. Subsequently, scientists would update this paper with details of further work as the area of research develops. Versions 2.0 and 3.0 might allow for the “accretion of confirmation [and] reputation”.
Ottoline Leyser: […] assessment criteria in science revolve around the individual. “People have stopped thinking about the scientific enterprise”.
http://www.timeshighereducation.co.uk/news/evolving-manuscripts-the-future-of-scientific-communication/2020200.article
34. Jennifer Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012
Software-like release paradigm:
• Agile development methods
• Free Open Source Software methods
https://tctechcrunch2011.files.wordpress.com/2011/05/tcdisrupt_tc-9.jpg
36. Multifarious products, platforms, resources.
First-class citizens: id, manage, credit, track, profile, focus.
A framework to bundle, port and link (scattered) resources and related experiments. Metadata objects that carry research context. Units of exchange.
Bechhofer et al., Why linked data is not enough for scientists, DOI: 10.1016/j.future.2011.08.004
37. Metadata Objects
Evolving: multi-typed, stewarded, sited, authored; spanning research, researchers, platforms, time.
Contributions. Content. Stewardship. Citation. Scholarship.
closed <-> open
local <-> alien
embed <-> refer
Bigger on the inside than the outside: content may be logically or physically inside.
TARDIS: Time and Relative Dimension in Space
https://meditationsfromzion.files.wordpress.com/2013/05/tardis.jpg
38. What and How Framework
• Manifest: core model using standards
• Annotation profiles: progressive extensions
• Implementation profiles: using legacy & commodity platforms
• Policies, tools, lifecycle, stewardship, training
Principles & conventions. API specification. Metadata formats.
40. Manifests and Containers
Container packaging: Zip files, Docker images, BagIt, …
Catalogues & Commons platforms: FAIRDOM SEEK, Farr Commons, CKAN, STELAR eLab, myExperiment
Manifest metadata: describes the aggregated resources, their annotations and their provenance.
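The container-plus-manifest split above can be sketched in a few lines. The example below zips some hypothetical resource files together with a minimal JSON manifest; the file names and manifest layout are simplified illustrations, not the full Research Object specification.

```python
import json
import tempfile
import zipfile
from pathlib import Path

def bundle_research_object(archive_path, resources, metadata):
    """Package resources plus a JSON manifest into a single zip container.

    `resources` maps archive-internal paths to byte content. The manifest
    layout here is a simplified illustration of the idea, not the RO spec.
    """
    manifest = {
        "id": metadata.get("id"),
        "title": metadata.get("title"),
        "aggregates": sorted(resources),  # paths of the bundled resources
    }
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for name, content in resources.items():
            zf.writestr(name, content)
    return manifest

# Example: bundle a tiny (made-up) workflow run
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "experiment.ro.zip"
    manifest = bundle_research_object(
        archive,
        {"data/input.csv": b"a,b\n1,2\n", "workflow/run.sh": b"echo run\n"},
        {"id": "urn:example:ro-1", "title": "Example run"},
    )
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
```

The same pattern transfers to BagIt bags or Docker images: the container format changes, the manifest idea does not.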
41. Manifest Metadata
Manifest construction:
• Identification: id, title, creator, status, …
• Aggregates: list of ids/links to resources
• Annotations: list of annotations about resources
Manifest description:
• Checklists: what should be there
• Provenance: where it came from
• Versioning: its evolution
• Dependencies: what else is needed
42. Manifest Construction
Unique identifiers as
names for things.
doi, epic, orcid, purl, RII,
Identifiers.org
Mechanism of
aggregation to group
things together.
OAI-ORE
Metadata about those
things & how they relate
to each other.
W3C OADM
http://w3id.org/ro/
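Put together, a minimal manifest is identification plus an aggregation plus annotations. The sketch below builds one as a JSON-LD-style Python dict; the field names loosely follow the researchobject.org manifest vocabulary but are simplified, and every identifier in it is a made-up example.

```python
import json

# An illustrative manifest: identify the RO, aggregate resources by
# identifier, and attach annotations linking them (OADM-style).
# All URIs, the ORCID and the context value are placeholder examples.
manifest = {
    "@context": "https://w3id.org/ro/context",   # JSON-LD context (example)
    "id": "https://example.org/ro/42/",          # unique identifier for the RO
    "createdBy": {"orcid": "https://orcid.org/0000-0000-0000-0000"},
    "aggregates": [
        {"uri": "https://doi.org/10.5281/zenodo.10439"},  # external resource
        {"uri": "data/results.csv"},                       # bundled resource
    ],
    "annotations": [
        {   # an annotation about an aggregated thing
            "about": "data/results.csv",
            "content": "annotations/results-provenance.ttl",
        }
    ],
}
serialized = json.dumps(manifest, indent=2)
```

The construction mirrors the three bullets above: identifiers name things, `aggregates` groups them, and `annotations` records how they relate.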
44. Checklists aka Reporting Guidelines
Consistent reporting, standardised cataloguing, validation.
Gamble, Goble, Klyne, Zhao, MIM: A Minimum Information Model vocabulary and framework for Scientific Linked Data, IEEE 8th Intl Conf on eScience, 2012
Example: MeanWhealDiameter reports must include values for the properties SubjectId, SptSolution, Date, FollowUp, and should include values for the property VariableLabel.
47. RO Unzip
• Reproducibility
• Versioning
• Systematic and extensible metadata collection
• Cross-platform exchange
• Publishing
Living snapshot: Sys and Syn Bio experiment management and publishing.
49. Sys & Syn Biology Community Standards
Bergmann, Rodriguez, Le Novère, COMBINE archive specification, <http://identifiers.org/combine.specifications/omex.version-1> (2014)
Bergmann et al., COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project, BMC Bioinformatics 2014, 15:369
Combine with RO: standardised metadata & API.
http://co.mbine.org/documents/archive
https://github.com/stain/ro-combine-archive doi:10.5281/zenodo.10439
Martin Scharm, Universität Rostock
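A COMBINE archive is a zip whose manifest.xml declares the location and format of every entry. The sketch below builds a tiny in-memory example and reads its entries back; the namespace follows the OMEX specification cited above, but the archive content is a hand-made illustration, not a real model.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace from the COMBINE archive (OMEX) specification
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"

def read_omex_entries(archive_bytes):
    """Return {location: format} for each <content> entry in manifest.xml."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
    return {
        el.get("location"): el.get("format")
        for el in root.iter(f"{{{OMEX_NS}}}content")
    }

# Build a minimal example archive in memory (illustrative content)
manifest_xml = f"""<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="{OMEX_NS}">
  <content location="./model.xml"
           format="http://identifiers.org/combine.specifications/sbml"/>
  <content location="./manifest.xml"
           format="{OMEX_NS}"/>
</omexManifest>"""
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.xml", manifest_xml)
    zf.writestr("model.xml", "<sbml/>")
entries = read_omex_entries(buf.getvalue())
```

The ro-combine-archive work linked above layers RO manifest metadata over exactly this kind of archive.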
50. ATLAS Collider Data Analytics
Portable, lightweight application runtime and packaging tool (image).
ATLAS and CMS detector data: all data and files of the execution + instructions.
Convert, bundle, manifest:
• Relate files and layers
• Add provenance and annotations
• Link in other content
Exchange. Reproducibility: same data, same code, same run-time environment. Systematic and extensible metadata collection.
Charles Vardeman, Da Huo, University of Notre Dame
52. STELAR Asthma Research e-Lab
STELAR e-Lab: requests for data, data exports, comments, questions.
On-going data collection: ALSPAC, MAAS, SEATON, Ashford, Isle of Wight.
STELAR researchers: data collection, methods and results.
STELAR Team, Farr Institute @ Manchester
54. NIH BD2K Commons and Research Objects
Metadata profiles:
• RO Model API
• Community IDs
• RO Model manifest profile
• Implementation profiles
https://datascience.nih.gov/commons
56. Many outstanding issues…
Social & cultural. Technical.
Tragedy of the Commons.
https://doctorwhothing.files.wordpress.com/2014/01/doctor-who-fan-girl-group.jpg
59. RO Ramps. Born RO.
Commodity tooling, libraries, lightweight.
• Making and auto-making manifest descriptions
• Making containers
• Literate programming, electronic lab notebooks
• Rendering & using manifests
60. FAIR Citation, Credit, Tracking
• Citation: resolution and semantics
• Tamper-proof currency: blockchain, Ethereum
• RO trajectories: data trajectories [Missier], provenance propagation
• Credit trajectories: micro-credit tracking
• Social-political acceptance: all research products valued, FAIR publishing effort recognised

• Defend it (snapshot)
• Locate it (most recent)
• Reuse it (a version, a component)
• Credit it (contributory authorship)
• Cross-link it (connections)
61. Knowledge Turning with ROs
A simple approach, towards transparent FAIR principles.
https://d2t1xqejof9utc.cloudfront.net/screenshots/pics/1ddf584eb4cf6b1283baf9aa6d380cff/original.jpg
62. Knowledge Turning with ROs (inspired by Bob Harrison)
• Incremental shift for infrastructure providers.
• Moderate shift for policy makers and stewards.
• Paradigm shift for researchers, their institutions and publishers.
63. All the members of the Wf4Ever team
Colleagues in Manchester’s Information Management Group
http://www.researchobject.org
http://www.wf4ever-project.org
http://www.fair-dom.org
http://seek4science.org
http://rightfield.org.uk
http://www.software.ac.uk
http://www.datafairport.org
Alan Williams
Jo McEntyre
Norman Morrison
Stian Soiland-Reyes
Paul Groth
Tim Clark
Juliana Freire
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
Barend Mons
Sean Bechhofer
Philip Bourne
Matthew Gamble
Raul Palma
Jun Zhao
Neil Chue Hong
Josh Sommer
Matthias Obst
Jacky Snoep
David Gavaghan
Rebecca Lawrence
Stuart Owen
Finn Bacall