Slide deck presenting the provenance support of the Taverna workflow system, detailing the architecture, the ontologies, and how results are exported as Research Object bundles, including the PROV-O provenance of the workflow run.
This is the original PPTX version (PowerPoint 2013); for the PDF version, see http://www.slideshare.net/soilandreyes/20130529-taverna-provenance
2. ARCHITECTURE
Provenance: workflow, workflow run, process run (iteration), parameter bindings
Data: lists, values, references, errors
[Diagram: two processors, Process1 and Process2, each with ports A, B, C, D and E]
[Diagram: processor dispatch stack with the layers Invoke, Retry, Failover, Loop, Error bounce, Provenance (layer injected by plugin) and Parallelise]
P Missier, S Soiland-Reyes, S Owen, W Tan, A Nenadic, I Dunlop, C Goble (2010, January). Taverna, reloaded. In Scientific and Statistical Database Management (pp. 471-481). Springer Berlin Heidelberg. DOI 10.1007/978-3-642-13818-8_33
[Diagram labels: the Provenance layer captures the provenance trace during workflow execution]
4. INTERMEDIATE RESULTS
• Within the Taverna Workbench, the provenance database is used for showing intermediate results and previous runs
[Screenshot: clicking a processor shows the inputs and outputs of individual invocations]
* Provenance is captured in Taverna by plugging into the execution stack of processors (see Taverna, reloaded).
* While running, data values and provenance traces are stored in an internal database.
* Provenance is captured for the workflow run (including a copy of the workflow definition), process iterations (start/stop) and parameter input/output bindings to value references.
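The layered capture described in these notes can be illustrated with a small sketch. It is not Taverna's actual API: the `Layer`, `InvokeLayer` and `ProvenanceLayer` classes are hypothetical, and the stack is reduced to two layers to show how an injected layer could record start/stop times and input/output bindings around each invocation.

```python
import time
from typing import Callable, Dict, List, Optional

Inputs = Dict[str, object]
Outputs = Dict[str, object]


class Layer:
    """One layer of a hypothetical dispatch stack; delegates to the layer below."""

    def __init__(self, below: Optional["Layer"] = None) -> None:
        self.below = below

    def dispatch(self, processor: str, inputs: Inputs) -> Outputs:
        if self.below is None:
            raise RuntimeError("no layer below to dispatch to")
        return self.below.dispatch(processor, inputs)


class InvokeLayer(Layer):
    """Bottom layer: calls the function registered for the processor."""

    def __init__(self, functions: Dict[str, Callable[..., Outputs]]) -> None:
        super().__init__(None)
        self.functions = functions

    def dispatch(self, processor: str, inputs: Inputs) -> Outputs:
        return self.functions[processor](**inputs)


class ProvenanceLayer(Layer):
    """Injected layer: appends one trace entry per process run."""

    def __init__(self, below: Layer, trace: List[dict]) -> None:
        super().__init__(below)
        self.trace = trace

    def dispatch(self, processor: str, inputs: Inputs) -> Outputs:
        started = time.time()
        outputs = self.below.dispatch(processor, inputs)
        self.trace.append({
            "processor": processor,
            "started": started,
            "ended": time.time(),
            "inputs": dict(inputs),    # parameter input bindings
            "outputs": dict(outputs),  # parameter output bindings
        })
        return outputs


# Hypothetical processor with two input ports and one output port.
trace: List[dict] = []
stack = ProvenanceLayer(InvokeLayer({"Process1": lambda a, b: {"d": a + b}}), trace)
result = stack.dispatch("Process1", {"a": 1, "b": 2})
```

Because the provenance layer only wraps the call to the layer below, the other layers (retry, failover and so on) would not need to know that a trace is being recorded.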
PROV-O: Standard W3C ontology for provenance. We use it directly to record activity start/stop and generation of values.
wfprov: An extension of PROV-O for tracking workflow execution, parameter bindings and subprocesses. Relates execution to a higher-level view of workflow structure (wfdesc).
tavernaprov: An extension of wfprov for tracking Taverna-specific features, such as lists and error documents, and for embedding (not-so-large) byte content in RDF.
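As a rough illustration of how PROV-O and wfprov terms combine, the sketch below writes a handful of N-Triples statements for one process run and one generated value. It is not Taverna's exporter: the `http://example.org/` identifiers and the timestamps are made up, and the namespace URIs and term names (`prov:startedAtTime`, `prov:wasGeneratedBy`, `wfprov:ProcessRun`, `wfprov:wasOutputFrom` and so on) are quoted from memory of the published ontologies, so they should be checked before reuse. tavernaprov terms are left out.

```python
from typing import List, Tuple

# Namespace URIs recalled from the published ontologies (verify before use).
PROV = "http://www.w3.org/ns/prov#"
WFPROV = "http://purl.org/wf4ever/wfprov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"

# Hypothetical identifiers for a workflow run, a process run and a value.
RUN = "http://example.org/run/1/"
PROCESS_RUN = RUN + "process/Process1/"
VALUE = RUN + "data/value1"

Triple = Tuple[str, str, str]


def iri(uri: str) -> str:
    """Format an IRI in N-Triples syntax."""
    return f"<{uri}>"


def date_time(text: str) -> str:
    """Format an xsd:dateTime literal in N-Triples syntax."""
    return f'"{text}"^^<{XSD_DATETIME}>'


triples: List[Triple] = [
    # PROV-O records start/stop of the activity (made-up timestamps).
    (iri(PROCESS_RUN), iri(PROV + "startedAtTime"), date_time("2013-05-29T10:00:00Z")),
    (iri(PROCESS_RUN), iri(PROV + "endedAtTime"), date_time("2013-05-29T10:00:05Z")),
    # wfprov ties the process run to its enclosing workflow run.
    (iri(PROCESS_RUN), iri(RDF_TYPE), iri(WFPROV + "ProcessRun")),
    (iri(PROCESS_RUN), iri(WFPROV + "wasPartOfWorkflowRun"), iri(RUN)),
    # The generated value, described with both vocabularies.
    (iri(VALUE), iri(RDF_TYPE), iri(WFPROV + "Artifact")),
    (iri(VALUE), iri(PROV + "wasGeneratedBy"), iri(PROCESS_RUN)),
    (iri(VALUE), iri(WFPROV + "wasOutputFrom"), iri(PROCESS_RUN)),
]

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Plain string formatting keeps the sketch free of dependencies; an RDF library would be the more robust choice for real data.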
Within the Taverna Workbench, the internal provenance database is consulted to look up intermediate input/output values (individual processor invocations). This is used for debugging and verification. Workflow runs within the Workbench can also be reloaded from the database at a later stage.
Saving workflow results to a folder structure:
* Generates a file per value in the workflow outputs, named after ports.
* Nested folders for list outputs.
* Provenance trace (in RDF, using the ontology stack) relates output files to the execution and links to intermediate values (values passed between processors that did not make it to an output port).
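The folder layout can be sketched as follows. This is a minimal illustration rather than Taverna's own code: the `save_outputs` and `save_value` helpers are hypothetical, values are assumed to be strings, list items are assumed to be numbered from 1, and the provenance trace file is not written.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Union

# A value is either a single string or an arbitrarily nested list of values.
Value = Union[str, List["Value"]]


def save_value(value: Value, target: Path) -> None:
    """Write a single value to a file, or a list to a folder of numbered entries."""
    if isinstance(value, list):
        target.mkdir(parents=True, exist_ok=True)
        # Assumption: list items are numbered starting from 1.
        for index, item in enumerate(value, start=1):
            save_value(item, target / str(index))
    else:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(value, encoding="utf-8")


def save_outputs(outputs: Dict[str, Value], folder: Path) -> None:
    """Save each output port as a file or folder named after the port."""
    for port, value in outputs.items():
        save_value(value, folder / port)


# Hypothetical outputs: one single value and one nested list.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    save_outputs({"summary": "ok", "sequences": ["ACGT", ["TT", "GG"]]}, root)
    saved = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Here `saved` lists the relative paths of the written files, so the nesting of the list output shows up directly in the path names.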
The structure is similar to the folder structure, but is now inside a ZIP file. It augments the previous structure by also including the workflow definition, the inputs used for execution, a description of the execution environment, external URI references (such as the project homepage) and attribution to the scientists who contributed to the bundle. This effectively forms a Research Object, all tied together by the RO Bundle manifest, which is in JSON-LD format (normal JSON that is also valid RDF).
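A bundle of this kind could be assembled roughly as shown below. This is a sketch under stated assumptions, not Taverna's export code: the media type string, the `.ro/manifest.json` location, the JSON-LD context URI and the `id`/`aggregates` keys are quoted from memory of the RO Bundle specification and should be checked against it, and the file names and contents passed to the hypothetical `build_bundle` helper are placeholders.

```python
import io
import json
import zipfile
from typing import Dict

# Assumptions recalled from the RO Bundle specification (verify before use).
MIMETYPE = "application/vnd.wf4ever.robundle+zip"
CONTEXT = "https://w3id.org/bundle/context"
MANIFEST_PATH = ".ro/manifest.json"


def build_bundle(files: Dict[str, str]) -> bytes:
    """Zip the given {path: text} files together with a minimal manifest."""
    manifest = {
        "@context": [CONTEXT],
        "id": "/",
        "aggregates": [{"uri": "/" + path} for path in sorted(files)],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # The media type entry is written first and stored uncompressed.
        bundle.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        bundle.writestr(MANIFEST_PATH, json.dumps(manifest, indent=2))
        for path, text in files.items():
            bundle.writestr(path, text)
    return buffer.getvalue()


# Placeholder contents; a real bundle would hold the actual files.
data = build_bundle({
    "workflow.wfbundle": "(workflow definition placeholder)",
    "outputs/summary": "ok",
    "workflowrun.prov.ttl": "# provenance trace placeholder",
})

# Read the bundle back to inspect the entries and the manifest.
with zipfile.ZipFile(io.BytesIO(data)) as bundle:
    names = bundle.namelist()
    manifest = json.loads(bundle.read(MANIFEST_PATH))
```

The manifest here only aggregates the bundled files; attribution, annotations and external URI references mentioned in the notes would be further keys in the same JSON document.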