Partners Funding
bioexcel.eu
Provenance and Research Object
Stian Soiland-Reyes
eScience Lab, The University of Manchester
2017-11-03, Aix-en-Provence
CESAB workshop: Reproducible Workflows
orcid.org/0000-0001-9842-9718 @soilandreyes
This work is licensed under a
Creative Commons Attribution 4.0 International License.
http://www.myexperiment.org Find and Share
https://view.commonwl.org/
http://doi.org/10.7490/f1000research.1114375.1
Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved.
http://www.w3.org/TR/prov-overview/
Core PROV model
Entity – A “thing” in the world
Document, Excel file, database row, molecule,
LEGO structure, house, …
Activity – Something that happened
Usually defined start/end time
May use and generate entities
Agent – Someone/something
Participating in activities
Person, SoftwareAgent, Organization
Key principles:
Provenance statements point backwards in time
Any PROV document is one particular view on history
More than one entity can describe same “thing”
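A minimal sketch of the three core PROV types, assuming plain Python dataclasses rather than any PROV library. The class names follow the slide; the `Relation` record and the `ex:` identifiers are illustrative assumptions, not part of the PROV specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A 'thing' in the world, e.g. a document or a sample."""
    id: str


@dataclass(frozen=True)
class Activity:
    """Something that happened, usually with a start and end time."""
    id: str
    started: str = ""
    ended: str = ""


@dataclass(frozen=True)
class Agent:
    """Someone or something participating in activities."""
    id: str
    kind: str = "Person"  # or "SoftwareAgent", "Organization"


@dataclass(frozen=True)
class Relation:
    """One provenance statement; it points backwards in time (subject -> obj)."""
    subject: str
    name: str
    obj: str


# Hypothetical identifiers, for illustration only
data = Entity("ex:data")
run = Activity("ex:run1", started="2012-06-21")
alice = Agent("ex:alice")

statements = [
    Relation(data.id, "wasGeneratedBy", run.id),
    Relation(run.id, "wasAssociatedWith", alice.id),
    Relation(data.id, "wasAttributedTo", alice.id),
]
for s in statements:
    print(s.subject, s.name, s.obj)
```

Every statement has the later thing as its subject and the earlier thing as its object, which is one way to read "provenance statements point backwards in time".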
Attribution
Who collected this sample? Who helped?
Which lab performed the sequencing?
Who did the data analysis?
Who wrote the analysis workflow?
Who made the data set used by analysis?
Who curated the results?
[Diagram: Alice, The lab, Data; relations: wasAttributedTo, actedOnBehalfOf]
Why do I need this?
i. To be recognized for my work
ii. Who should I give credits to?
iii. Who should I complain to?
iv. Can I trust them?
v. Who should I make friends with?
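The attribution questions above can be sketched as a small lookup, assuming the statements are held as plain Python tuples. The identifiers `ex:data`, `ex:alice` and `ex:the-lab` are hypothetical stand-ins for the Data, Alice and The lab labels on the slide.

```python
# Hypothetical statements: (entity, agent) for wasAttributedTo
was_attributed_to = [("ex:data", "ex:alice")]
# (delegate, responsible) for actedOnBehalfOf
acted_on_behalf_of = [("ex:alice", "ex:the-lab")]


def credit_for(entity):
    """Agents to credit: direct attributions, then whoever they acted for."""
    seen = [agent for ent, agent in was_attributed_to if ent == entity]
    queue = list(seen)
    while queue:
        current = queue.pop(0)
        for delegate, responsible in acted_on_behalf_of:
            if delegate == current and responsible not in seen:
                seen.append(responsible)
                queue.append(responsible)
    return seen


print(credit_for("ex:data"))
```

Following `actedOnBehalfOf` transitively is a design choice of this sketch: it answers "who should I give credit to?" with both the person and the organisation responsible for them.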
Derivation
Which sample was this metagenome sequenced from?
Which metagenomes was this sequence extracted from?
Which sequence was the basis for the results?
What is the previous revision of the new results?
[Diagram: Sample, Metagenome, Sequence, Old results, New results; relations: wasDerivedFrom, wasQuotedFrom, wasRevisionOf, wasInfluencedBy]
Why do I need this?
i. To verify consistency (did I use
the correct sequence?)
ii. To find the latest revision
iii. To backtrack where a diversion
appeared after a change
iv. To credit work I depend on
v. Auditing and defence for
peer review
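A minimal sketch of backtracking through derivations, assuming the chain implied by the questions on this slide (sample, then metagenome, then sequence, then results) and acyclic relations. The `ex:` identifiers and the dictionary layout are assumptions for illustration.

```python
# Hypothetical wasDerivedFrom edges: derived entity -> its sources
was_derived_from = {
    "ex:metagenome": ["ex:sample"],
    "ex:sequence": ["ex:metagenome"],
    "ex:new-results": ["ex:sequence"],
}
# Hypothetical wasRevisionOf edges: newer -> older
was_revision_of = {"ex:new-results": "ex:old-results"}


def lineage(entity):
    """Everything `entity` was derived from, nearest source first."""
    found = []
    queue = list(was_derived_from.get(entity, []))
    while queue:
        source = queue.pop(0)
        if source not in found:
            found.append(source)
            queue.extend(was_derived_from.get(source, []))
    return found


def latest_revision(entity):
    """Follow wasRevisionOf forwards until no newer revision is known."""
    newer = {old: new for new, old in was_revision_of.items()}
    while entity in newer:
        entity = newer[entity]
    return entity


print(lineage("ex:new-results"))
print(latest_revision("ex:old-results"))
```

`lineage` supports the consistency and crediting questions, while `latest_revision` inverts the backwards-pointing `wasRevisionOf` statements to find the newest version.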
Activities
What happened? When? Who?
What was used and generated?
Why was this workflow started?
Which workflow ran? Where?
[Diagram: Sample, Metagenome, Results, Sequencing, Workflow run, Workflow server, Workflow definition, Alice, Lab technician; relations: used, wasGeneratedBy, wasStartedAt "2012-06-21", wasAssociatedWith, wasInformedBy, wasStartedBy, hadPlan, hadRole]
Why do I need this?
i. To see which analysis was performed
ii. To find out who did what
iii. What was the metagenome
used for?
iv. To understand the whole process
“make me a Methods section”
v. To track down inconsistencies
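As a rough illustration of "make me a Methods section", the sketch below assumes activity records held in a plain dictionary. The record layout is hypothetical, and which activity the start date belongs to is an assumption, since the flattened diagram labels do not say.

```python
# Hypothetical activity records, loosely following the diagram labels
activities = {
    "Sequencing": {
        "used": ["Sample"],
        "generated": ["Metagenome"],
        "associated_with": "Alice",
        "role": "Lab technician",
    },
    "Workflow run": {
        "used": ["Metagenome"],
        "generated": ["Results"],
        "associated_with": "Workflow server",
        "plan": "Workflow definition",
        "started": "2012-06-21",  # assumed to belong to the workflow run
    },
}


def used_for(entity):
    """Which activities used this entity?"""
    return [name for name, act in activities.items() if entity in act["used"]]


def methods_section():
    """One sentence per activity, in the order they were recorded."""
    sentences = []
    for name, act in activities.items():
        text = (
            f"{name} used {', '.join(act['used'])} and generated "
            f"{', '.join(act['generated'])} (performed by {act['associated_with']}"
        )
        if "plan" in act:
            text += f", following {act['plan']}"
        sentences.append(text + ").")
    return sentences


print(used_for("Metagenome"))
print("\n".join(methods_section()))
```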
Input ports
Processors
Output ports
Workflow
Typical (?) workflow structure
Data links
http://taverna.incubator.apache.org/
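The structure named on this slide (input ports, processors, output ports, data links) can be sketched as a small data model. This is an illustrative assumption using Python dataclasses, not Taverna's own representation; the port naming scheme `processor.port` is made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    inputs: list
    outputs: list


@dataclass
class Workflow:
    name: str
    inputs: list    # workflow input ports
    outputs: list   # workflow output ports
    processors: list = field(default_factory=list)
    links: list = field(default_factory=list)  # data links as (source, sink)

    def unconnected_sinks(self):
        """Processor inputs and workflow outputs that no data link feeds."""
        sinks = {f"{p.name}.{port}" for p in self.processors for port in p.inputs}
        sinks |= set(self.outputs)
        fed = {sink for _, sink in self.links}
        return sorted(sinks - fed)


# Hypothetical one-step workflow
wf = Workflow(
    name="example",
    inputs=["X"],
    outputs=["A"],
    processors=[Processor("align", ["seq"], ["hits"])],
    links=[("X", "align.seq"), ("align.hits", "A")],
)
print(wf.unconnected_sinks())
```

Keeping data links as explicit pairs makes simple checks possible, such as spotting a port that nothing is connected to.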
Workflow description (wfdesc)
http://purl.org/wf4ever/wfdesc#
Workflow run provenance (wfprov)
http://purl.org/wf4ever/wfprov#
Workflow Run Bundle
output/A.txt
output/C.jpg
output/B/
intermediates/
1.txt
2.txt
3.txt
de/def2e58b-50e2-4949-9980-fd310166621a.txt
input/X.txt
workflow
URI references
attribution
execution environment
ZIP folder structure (RO Bundle)
mimetype
application/vnd.wf4ever.robundle+zip
.ro/manifest.json
https://doi.org/10.5281/zenodo.51314
workflowrun.prov.ttl
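A minimal sketch of writing such a ZIP with Python's standard `zipfile` module. The file names and the `mimetype` value come from the slide; the manifest keys (`@context`, `id`, `aggregates`, `uri`), the context URL, and the rule that `mimetype` goes first and uncompressed reflect my reading of the RO Bundle specification and should be checked against it. `write_bundle` is a hypothetical helper name.

```python
import io
import json
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def write_bundle(target, files):
    """Write `files` (path -> bytes) into an RO Bundle style ZIP."""
    manifest = {
        "@context": ["https://w3id.org/bundle/context"],  # assumed context URL
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry is written first and stored without compression
        zf.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path in sorted(files):
            zf.writestr(path, files[path])
        zf.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))


buffer = io.BytesIO()
write_bundle(buffer, {
    "output/A.txt": b"example output",
    "input/X.txt": b"example input",
    "workflowrun.prov.ttl": b"# PROV trace would go here\n",
})
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    stored = zf.getinfo("mimetype").compress_type == zipfile.ZIP_STORED
    manifest = json.loads(zf.read(".ro/manifest.json"))
print(names)
```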
https://doi.org/10.1016/j.websem.2015.01.003
application/vnd.wf4ever.robundle+zip
Research Object Bundle
http://www.researchobject.org/
A Research Object bundles and relates digital
resources of a scientific experiment/investigation +
context
Data used and results produced in
experimental study
Methods employed to produce and
analyse that data
Provenance and settings for the
experiments
People involved in the investigation
Annotations about these resources, to
improve understanding and
interpretation
Standards-based metadata framework for bundling embedded and
referenced resources with context
Citable Reproducible Packaging
researchobject.org
Systems Biology Research Objects
exchange, portability and maintenance
components
packaged into
various containers
ISA-TAB
checksum
Download as a
Research Object Bundle
Snapshots evolving CWL files in GitHub
Permalink to snapshot the workflow
identifier for RO
Common Workflow Language Viewer
CWL files packaged in a RO
CWL RO + added richness
Lift out parts into the manifest
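A minimal sketch of why a commit-based permalink works as an identifier: it depends only on the git commit hash and the file path, not on where the repository is hosted. The URL prefix below is an assumption about the CWL Viewer permalink pattern, and the commit hash and path are made-up examples.

```python
import re

# Assumed permalink prefix; not stated on the slide
VIEWER_PREFIX = "https://w3id.org/cwl/view/git/"


def workflow_permalink(commit, path):
    """Build a permalink from a full git commit hash and a path in the repository."""
    if not re.fullmatch(r"[0-9a-f]{40}", commit):
        raise ValueError("expected a full 40-character git commit hash")
    return f"{VIEWER_PREFIX}{commit}/{path.lstrip('/')}"


# Hypothetical commit hash and workflow path
example = workflow_permalink(
    "0123456789abcdef0123456789abcdef01234567",
    "workflows/example.cwl",
)
print(example)
```

Because the inputs are available in any local checkout (for example from `git rev-parse HEAD`), the same identifier can be produced without contacting a server.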
Artist's Impression
https://osf.io/h59uh/ https://doi.org/10.1101/191783
https://doi.org/10.1101/191783
identifiers.org
identifiers.org
PROV
JSON
https://doi.org/10.1109/BigData.2016.7840618
manifest.json
Provenance from cwltool
Farah Z Khan:
Modify cwltool reference implementation to capture provenance
Generates BagIt Research Object
Mints identifiers for data and run
Capture intermediate values
Workflow activities as PROV
wfdesc, OPMW, ProvONE
http://doi.org/10.7490/f1000research.1114781.1
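To illustrate the BagIt side of such a Research Object, here is a minimal sketch that writes the checksum manifest for a bag's `data/` payload using only the standard library. It is not how cwltool does it; `write_bag_manifest` and the example file are hypothetical, and the `bagit.txt` version line is an assumption.

```python
import hashlib
import tempfile
from pathlib import Path


def write_bag_manifest(bag_dir):
    """Write bagit.txt and manifest-sha256.txt for files under <bag_dir>/data."""
    bag = Path(bag_dir)
    lines = []
    for path in sorted((bag / "data").rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(bag).as_posix()}")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n"
    )
    (bag / "manifest-sha256.txt").write_text("\n".join(lines) + "\n")
    return lines


with tempfile.TemporaryDirectory() as tmp:
    payload = Path(tmp) / "data" / "output"
    payload.mkdir(parents=True)
    (payload / "A.txt").write_bytes(b"hello")  # hypothetical workflow output
    manifest_lines = write_bag_manifest(tmp)
    manifest_exists = (Path(tmp) / "manifest-sha256.txt").is_file()

print(manifest_lines)
```

A receiver can recompute the digests and compare them with the manifest to confirm that every payload file arrived complete and unchanged.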
Acknowledgements
Farah Z Khan
Carole Goble
Michael R. Crusoe
Apache Taverna
BioExcel
Common Workflow Language
Research Object
W3C PROV WG
https://www.slideshare.net/StianSoilandReyes/

Editor's Notes

  1. S. Woodman, H. Hiden, P. Watson, P. Missier. Achieving Reproducibility by Combining Provenance with Service and Workflow Versioning. In: The 6th Workshop on Workflows in Support of Large-Scale Science. 2011, Seattle
  2. Sequencing machines: Illumina
  3. Workflow description (wfdesc) is a model for describing the abstract workflow. It can be extracted automatically from an existing workflow definition (i.e. Taverna workflows as found on myExperiment) or be made manually by users. Even if the workflow system used is no longer applicable, this model describes the workflow at a level that can be reimplemented in other languages, such as done in SHIWA with (??). The wfdesc model gives hooks to annotate the different steps with description, purpose, tasks and example values, and is also the recipe behind the abstract workflow provenance. (next slide)
  4. wfprov is not intended to be a provenance model. Rather, it provides a place where other models can be hooked in: a “convergence layer”. It is easily mappable to OPM and PROV. It relates to the wfdesc model, so you can see an actual workflow run and relate artifacts aggregated in the RO to their provenance within the workflow run.
  5. This is how we represent a workflow run as a Workflow Results RO Bundle. We aggregate the workflow outputs, the workflow definition, the inputs used for execution, a description of the execution environment, external URI references (such as the project homepage) and attribution to the scientists who contributed to the bundle. This effectively forms a Research Object, all tied together by the RO Bundle manifest, which is in JSON-LD format (normal JSON that is also valid RDF).
  6. 12
  7. Mimetype: robundle+zip. ZIP or BagIt folder structure. JSON and YAML. Linked-ISA.
  8. The permalink would be the same wherever the git commit lives, so the links can also be generated locally from a git checkout, e.g. as we are doing in the cwltool reference runner provenance when we need to refer to which workflow was run. This solved the problem we had in Taverna, where we didn't know where the workflow lived. We still might not know that, but if it is a public workflow and it is later visualized, then CWL Viewer can show it. Future-proof!
  9. BioCompute Objects (BCO) is a community-driven project backed by the FDA and George Washington University to standardize the exchange of High-Throughput Sequencing workflows for regulatory submissions between the FDA, pharma, bioinformatics platform providers and researchers. There is a particular challenge for regulatory bodies like the FDA in areas like personalized medicine: to review and approve the bioinformatic side, they need to inspect and in some cases replicate the computational analytical workflow. The challenge is not just the usual reproducibility concern of packaging software and providing the required datasets, but also human understanding of what has been done, by expressing the higher-level steps of the workflow, their parameter spaces and algorithm settings. At the heart of the BCO is a domain-specific object model which captures this essential information without going into the details of the actual execution. The BCO is expressed in a JSON format, which also includes additional metadata and external identifiers.
  10. If we look at this JSON briefly, it is split into metadata and a brief overview of the pipeline with arguments and scripts. The actual workflow definition is defined outside. In addition we define the parametric domain and, for verification, the input and output domains. This allows inspection to see the scope of the analysis. (Click for Animation) Many of these are actually external links. The authors and contributors are identified using ORCID, which is a de facto standard identifier for researchers. Cross-references are given, and scripts within the pipeline can be provided in any language, like Python. The workflow can be specified using the Common Workflow Language, which gives portability as well as capturing the execution environment, e.g. which Python version to use for the scripts. The referenced data files are of course in multiple formats, like CSV or, for sequencing data, more specific formats like SAM.
  11. Now while the BCO references these resources in several places in its JSON structure, some may also be indirectly referenced. For instance, the CWL workflow might reference particular Docker images that capture the Python version to use. W3C PROV files might be provided, which can explain more detailed provenance of workflows; this might however become specific to the workflow engine used, and might not directly identify all the resources seen in the BCO. While we can identify authors with ORCID, they might author different parts of the BCO. If you made a clever Python script used by a BCO, then it is only FAIR that you should be attributed, even if you were nowhere in the vicinity when the BCO was later created. So you can think of these pink, green and blue arrows as each giving a partial picture of the whole BioCompute Object. There is also the question of how to move the BCO around: the JSON has many external references as well as relative references to plain files, so how can you capture it all without understanding all of the BCO spec? We are looking at using BagIt Research Objects for this purpose. BagIt is a digital archive format used by the Library of Congress and digital repositories. It handles checksums and completeness of files, even if they are large or external. Research Object (RO) is a model for capturing and describing research outputs: embedding data, executables, publications, metadata, provenance and annotations. Although it is a general model, ROs have been used in particular for capturing reproducible workflows. The combination of these, ro-bagit, has recently been used by the NIH-funded Big Data for Discovery Service for transferring and archiving very large HTS datasets in a location-independent way, so naturally this could be a good choice for how to archive BCOs. (Click for Animation) So here the manifest of the Research Object ties everything together.
The manifest is in JSON-LD format, so it is Linked Data, but you don’t have to know unless you really want to; it is also just JSON. The manifest **aggregates** all the other resources, including the BCO, but also external resources as well as outside references like identifiers.org. The aggregation also provides attribution and provenance of each resource, so the authors get the credit they deserve. This is of course also important for regulatory purposes, e.g. to check if the latest version of a tool was used. An important aspect of research objects is also to capture annotations, using the W3C Web Annotation Model. This allows any part of the BCO to be further described, textually or semantically, so you are not limited to what is supported by the specification of BCO or Research Object. In particular this might be where community-driven standards like BioSchemas can be used.
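The point that the manifest "is also just JSON" can be illustrated with a small sketch that reads per-resource attribution from a manifest held as a Python dictionary. The key names (`aggregates`, `uri`, `mediatype`, `authoredBy`, `orcid`, `annotations`, `about`, `content`) follow my reading of the RO Bundle manifest and should be treated as assumptions; every file name, person and ORCID below is a made-up placeholder.

```python
import json

# Hypothetical manifest; all names and identifiers are placeholders
manifest = {
    "@context": ["https://w3id.org/bundle/context"],
    "id": "/",
    "aggregates": [
        {
            "uri": "/bco.json",
            "mediatype": "application/json",
            "authoredBy": {
                "name": "Alice",
                "orcid": "https://orcid.org/0000-0000-0000-0000",
            },
        },
        {
            "uri": "/scripts/analysis.py",
            "authoredBy": [{"name": "Bob"}, {"name": "Alice"}],
        },
        {"uri": "https://example.org/external-dataset"},
    ],
    "annotations": [
        {"about": "/bco.json", "content": "/annotations/bco-description.txt"}
    ],
}


def credits(manifest):
    """Map each aggregated resource to the names of its authors."""
    result = {}
    for entry in manifest.get("aggregates", []):
        authors = entry.get("authoredBy", [])
        if isinstance(authors, dict):  # a single agent given as one object
            authors = [authors]
        result[entry["uri"]] = [
            a.get("name", a.get("orcid", "unknown")) for a in authors
        ]
    return result


print(json.dumps(credits(manifest), indent=2))
```

No JSON-LD processing is needed for this kind of lookup, which matches the note's remark that Linked Data awareness is optional for consumers.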