FAIR data and model management for systems biology (and SOPs too!)
1. FAIR Data and Model
Management for
Systems Biology
(and SOPs too!)
Prof Carole Goble
The University of Manchester
The Software Sustainability Institute
ELIXIR UK, SynBioChem Centre
carole.goble@manchester.ac.uk
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
2. • Project-centric data and
model management
• Respects & expects other systems
• Forged in fire of
national & international
projects
• PhDs/postgrads/PIs
• Context
• FAIRDOM Initiative
• Challenges
http://www.fair-dom.org
http://www.fairdomhub.org
3. republic of science*
regulation of science
institutions
libraries
*Merton’s four norms of scientific behaviour (1942)
public archives
cloud services
12. Challenge: Most quantitative databases provide kinetic constants
for enzymes, sometimes binding constants….
Little to help building quantitative descriptions, i.e. concentrations,
sizes, diffusions….
Exceptions: gene expression data, proteomics, metabolomics.
Localisation: The average concentration of a protein in a piece of
brain is of limited use (mix of tissues and subcellular compartments)
[Nicolas Le Novère, 2015]
Public-Centric Asset Management
public archives
13. FAIR for the Researcher
Collaborative, data/model-driven
science
Publication
Local and Public Resources
Skills and Productivity
Compliance
17. Project-Centric Asset Management
Is this data available?
What SOP was used for
this sample?
Where is the validation data for
this model?
• Retain results beyond a
project / the PhD student
• Exchange & find assets.
• Share, disseminate and
publish assets sensitively
• Consistent reporting for
interpretation, interop &
comparison
• Promote standardised
metadata practices.
• Organise and link assets
• Reuse results
18. Find
Data, models, protocols,
projects, people
Catalogued and linked assets
Link studies to assets
Control sharing, versioning,
gateway to scattered
public/local archives
Access
Interoperate
Standards (SBML, SED-
ML…), vocabs, formats, ids
harvesting, export, API
Reuse
Download assets
Run models with experimental data
DOI citation
21. SEEK:
Science Commons
Web-based Cataloguing and Rich web
interface for describing, finding,
linking and promoting ongoing
research and outcomes. Small files,
aggregates across data archives.
openBIS:
Scaled local LIMS and analytics
Extract, Transform and Load tooling
direct from the instrumentation, data
analysis pipelines. Automatic
archiving. Handles large data.
FAIRDOM Suite
28. Metadata standards & templates to
link studies and link assets
Just Enough
Results Model
Describes
common
elements and
relationships
between things
produced and
used in
experiments.
Structured
descriptions for
consistency and
comparison
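A minimal sketch of what "common elements and relationships" could look like in code. The class names (Asset, Assay, Study) and the helper assets_of_kind are hypothetical and chosen for illustration only; they are not the actual Just Enough Results Model schema or any SEEK API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """A catalogued item such as a data file, a model, or an SOP (hypothetical)."""
    title: str
    kind: str  # e.g. "data", "model", "sop"


@dataclass
class Assay:
    """One experimental or modelling activity that produces or uses assets."""
    title: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Study:
    """Groups related assays so that their assets stay linked."""
    title: str
    assays: List[Assay] = field(default_factory=list)


def assets_of_kind(study: Study, kind: str) -> List[Asset]:
    """Return every asset of the given kind linked to any assay of the study."""
    return [a for assay in study.assays for a in assay.assets if a.kind == kind]


# Illustrative records only; the titles are made up for this sketch.
sop = Asset("Sampling protocol", "sop")
data = Asset("Time course measurements", "data")
model = Asset("Kinetic model", "model")
study = Study(
    "Example study",
    [Assay("Wet-lab assay", [sop, data]), Assay("Modelling assay", [model, data])],
)
```

With assets linked this way, a question such as "What SOP was used for this sample?" becomes a lookup, e.g. assets_of_kind(study, "sop").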
35. http://seek.virtual-liver.de/
• Navigation
• Single standards
at one scale
• Multi-type hosting
“To integrate the detailed
knowledge that we have at the
molecular level up to the
functional level at
tissue/organ/whole body level”
Multi-scale?
Multi-silos ….
36. Handling/converting data of
different levels of detail to
make the model run.
Representing in the SBML
model the DNA bindings at the
level of detail that had been
measured in the experiments
Whole Cell model by Jonathan Karr
(Rostock Summer School, Dagmar Waltemath)
Support for aggregating data to find the appropriate
level of representation for a given model.
Karr JR, Sanghvi JC, Macklin DN, et al. A Whole-Cell Computational Model Predicts Phenotype
from Genotype. Cell. 2012;150(2):389-401. doi:10.1016/j.cell.2012.05.044.
37. Challenge: mismatches
• Systems on different scales
– incompatible time scales, data may be too sparse or
need to be aggregated to work with another module
• Different levels of complexity
– comparing results from different modelling
approaches.
• Linking models needs thinking and standards
– connecting the single standards
– interfacing between the different scales
– connecting (experimental/simulation) data to models
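One of the mismatches above, data that must be aggregated before another module can use it, can be sketched as simple time binning. This is only an illustration under stated assumptions: the function name aggregate_to_step, the use of a plain mean, and the sample values are all hypothetical, and a real coupling between scales would need a domain-appropriate aggregation.

```python
from statistics import mean
from typing import Dict, List, Tuple


def aggregate_to_step(
    samples: List[Tuple[float, float]], step: float
) -> List[Tuple[float, float]]:
    """Average (time, value) samples into bins of width `step`.

    Returns (bin_start_time, mean_value) for each non-empty bin, in time order.
    Empty bins are skipped, so sparse data leaves visible gaps.
    """
    bins: Dict[int, List[float]] = {}
    for t, v in samples:
        index = int(t // step)  # which coarse time step this sample falls into
        bins.setdefault(index, []).append(v)
    return [(index * step, mean(values)) for index, values in sorted(bins.items())]


# Fine-grained samples (made-up values) reduced to a time step of 1.0
fine = [(0.0, 1.0), (0.5, 3.0), (1.0, 5.0), (1.5, 7.0), (3.0, 9.0)]
coarse = aggregate_to_step(fine, 1.0)
```

Note that the bin starting at 2.0 is absent from coarse: the sketch does not interpolate, which keeps the "data may be too sparse" problem visible rather than hiding it.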
39. Challenge: model evolution
BiVeS tool: diff in versions of computational models
Provenance, Versioning, Parameter tracking
Releasing updated versions into the literature
Identifying, Interpreting, and Communicating Changes in XML-encoded Models of Biological Systems. Scharm et al.
2015, under revision at Bioinformatics
Haus et al, BMC
Systems Biology, 2011, 5:10
Solvent production by
Clostridium acetobutylicum
[Martin Scharm]
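To give a feel for parameter tracking between model versions, here is a deliberately naive sketch that compares parameter elements in two XML documents. It is not BiVeS and does not reproduce its algorithm; the function names and the two tiny model snippets are hypothetical, and the snippets are SBML-like fragments rather than valid SBML.

```python
import xml.etree.ElementTree as ET


def read_parameters(xml_text):
    """Collect {id: value} for every <parameter> element, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    params = {}
    for element in root.iter():
        # Strip any XML namespace: "{ns}parameter" -> "parameter"
        if element.tag.rsplit("}", 1)[-1] == "parameter":
            params[element.get("id")] = element.get("value")
    return params


def diff_parameters(old_xml, new_xml):
    """Report parameters that were added, removed, or changed between versions."""
    old, new = read_parameters(old_xml), read_parameters(new_xml)
    shared = sorted(set(old) & set(new))
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": {k: (old[k], new[k]) for k in shared if old[k] != new[k]},
    }


# Two made-up versions of the same model fragment
V1 = """<model><listOfParameters>
<parameter id="k1" value="0.1"/><parameter id="k2" value="2.0"/>
</listOfParameters></model>"""

V2 = """<model><listOfParameters>
<parameter id="k1" value="0.15"/><parameter id="k3" value="1.0"/>
</listOfParameters></model>"""

report = diff_parameters(V1, V2)
```

Values are compared as strings, so "0.1" and "0.10" would count as a change; a real tool would also need to handle structure, units, and reactions.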
40. F1000Research Living Figures,
versioned articles, in-article data manipulation
R Lawrence Force2015, Vision Award Runner Up http://f1000.com/posters/browse/summary/1097482
Simply data + code
Can change the definition of
a figure, and ultimately the
journal article
Colomb J and Brembs B.
Sub-strains of Drosophila Canton-S differ
markedly in their locomotor behavior [v1;
ref status: indexed, http://f1000r.es/3is]
F1000Research 2014, 3:176
Other labs can replicate the study, or
contribute their data to a meta-analysis
or disease model - figure automatically
updates.
Data updates time-stamped.
New conclusions added via versions.
41. Challenge: reproducibility
bridging from research to FAIR publishing
Bergmann, Rodriguez, Le Novère. COMBINE archive specification.
<http://identifiers.org/combine.specifications/omex.version-1> (2014)
Describe
Access
Port
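As a rough illustration of the packaging idea behind the COMBINE archive, the sketch below writes a ZIP container holding a manifest and one model file. The manifest layout and format URIs are written from memory of the specification cited above and should be checked against it; build_archive is a hypothetical helper, and "<sbml/>" is a placeholder, not a valid SBML model.

```python
import io
import zipfile

# Manifest listing the archive contents; layout assumed from the OMEX specification.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <content location="./manifest.xml" format="http://identifiers.org/combine.specifications/omex-manifest"/>
  <content location="./model.xml" format="http://identifiers.org/combine.specifications/sbml"/>
</omexManifest>
"""


def build_archive(model_xml: str) -> bytes:
    """Return the bytes of a ZIP file containing the manifest and one model file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", MANIFEST)
        archive.writestr("model.xml", model_xml)
    return buffer.getvalue()


archive_bytes = build_archive("<sbml/>")  # placeholder model content

# Read the archive back to list what was packaged
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    names = sorted(archive.namelist())
```

Keeping model, data, and descriptions in one file is what makes the result easy to describe, access, and port between tools.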
49. Enabling multi-scale modelling in systems medicine
1. Exploit existing data for multi-scale modelling
2. Develop SOPs and quality standards for systematic collection of quantitative
data and information.
3. Identify required standards and ontologies for models and data repositories in
systems medicine.
4. Develop modelling workflows for the integration of data and models; support
data management, model construction and analysis.
5. Develop mathematical formalism to analyze and compare multi-scale models
(parameter estimation, sensitivity analysis, identifiability analysis and image
analysis).
Wolkenhauer et al, Enabling multiscale modeling in systems medicine, 2014, Genome Medicine 6(3)
50. Carole Goble, Stuart Owen, Finn Bacall, Jacky Snoep, Wolfgang Mueller,
Olga Krebs, Quyen Nguyen, Natalie Stanford, Katy Wolstencroft,
Peter Kunszt, Bernd Rinn, Donal Fellows, Alan Williams,
Rostyslav Kuzyakiv, Jakub Straszewski, Chandrasekhar Ramakrishnan,
Caterina Barillari, Norman Morrison
fairdom@fair-dom.org
fair-dom@fair-dom.org
http://www.fair-dom.org
http://www.fairdomhub.org
http://seek4science.org
http://www.rightfield.org.uk
http://jjj.biochem.sun.ac.za
http://sybit.net/software/openBIS