6. Taverna Uses — Scientific Areas
• Biodiversity — BioVeL project
• Digital Preservation — SCAPE project
• Astronomy — AstroTaverna product
• Solar Wind Physics — HELIO project
• In silico Medicine — VPH-Share project
NERSC Workflow Day 6
7. Biodiversity: BioVeL
• Virtual e-Laboratory for
Biodiversity
– Service and knowledge commons
– Supporting biodiversity research
– Integrating with third-party
applications
• For example, IPython Notebook
• Portal for running production-grade workflows on users’ data
– Powered by Taverna Server
– Integration with major biodiversity
databases
– Taverna interaction support was developed to support BioVeL
8. Digital Preservation: SCAPE
• Automated petabyte-scale
digital collection maintenance
– Century of scanned
newspapers
– Whole national radio/TV
output
– Major Web archives
• Processing engine powered by
Taverna
– Lift simple workflows to work
at collection level
– Metadata management
– Semantic annotations and
components for guided
workflow construction
9. Astronomy: AstroTaverna
• Taverna plugin: IVOA (Virtual Observatory)
– Astronomy data services and tools
• Example workflow:
– List of galaxy names → Look up VO properties → Find similar/near galaxies → Add bibliography
• VOTable support (select/merge/split/…)
– Later adapted by bioinformatics community
• Projects: CANUBE, Wf4Ever, VAMDC, ER-Flow
• Taverna Workbench used on the desktop:
– IVOA service registry user interface
– Integrated with standalone astronomy tools (SAMP protocol): Aladin, TOPCAT
10. Astrophysics: HELIO
• Virtual laboratory for Solar Wind Science
– Observation catalogs
– Processing
– Data integration platform
• Taverna is workflow glue
– Taverna Server was created to support HELIO
– Workflows manage
catalog access
– Workflows manage data
processing
11. Medicine and Physiology: VPH-Share
• Platform for computer-aided
medicine
– Support for diagnosis and
treatment prognosis
• Osteoarthritis, Dementia, Liver
disease, Cardiovascular disease
– Driven by specially-configured
cloud instances
• Taverna is control and data
management layer
– Coordinates processing within
cloud instances
– User communication with cloud instances via Taverna interactions
• Including complex 3D tasks
13. The Basics of a Taverna Workflow
Input Ports (data in)
SOAP processor (web service call)
XML handling processors
Data Links (connect processors)
Output Ports (data out)
Get concept suggestions from term
Eelke van der Horst
http://www.myexperiment.org/workflows/4590.html
14. Taverna Workflows
• Describe how data flows between processing nodes
– Control dependencies also supported
• Processing service nodes of various kinds
– Invoke programs (local or on cluster or grid or …)
– Call services (SOAP or REST)
– Read from and write to databases
– Transfer data
– Interact with the user
• Built-in parallelism and iteration
– Processes lists of data in parallel
• Large data usually handled by reference
– Avoids having to transfer it where not necessary
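The built-in iteration over lists can be sketched in plain Python. This is not Taverna code or its API: `to_upper`, `run_processor` and the use of a thread pool are illustrative assumptions. The sketch only shows how a node written for a single value can be applied to every element of a list in parallel without the node knowing about the list.

```python
from concurrent.futures import ThreadPoolExecutor


def to_upper(term: str) -> str:
    """Hypothetical processing node that handles one value at a time."""
    return term.upper()


def run_processor(node, value):
    """Apply `node` to a single value, or to each element of a list in parallel.

    The node itself never sees the list; iteration is added around it.
    """
    if isinstance(value, list):
        with ThreadPoolExecutor() as pool:
            # pool.map keeps results in input order even when calls overlap
            return list(pool.map(node, value))
    return node(value)


single = run_processor(to_upper, "gene")
many = run_processor(to_upper, ["gene", "protein", "pathway"])
print(single, many)
```

The same node definition serves both calls; only the shape of the incoming data decides whether iteration happens.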
15. Taverna Workflows can get complex…
BioVeL Population Model Construction and Analysis
Maria Paula Balcázar-Vargas, Jonathan Giddy and Gerard Oostermeijer
http://www.myexperiment.org/workflows/3684.html
16. Managing Workflow Complexity
• Subworkflows
– Put smaller workflows within larger ones
– Like using a user-defined function in a
programming language
– Can hide contents of subworkflow
• Components
– “Black box” (but implemented with subworkflow)
– Semantically-annotated; described behaviour
– Like using a library in a programming language
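The function analogy for subworkflows can be made concrete with a small Python sketch. All names here (`clean`, `deduplicate`, `prepare_names`, `count_taxa`) are hypothetical and have nothing to do with Taverna's own API; the point is only that the outer workflow treats the nested one as a single node and does not need to see its internal steps.

```python
def clean(names):
    """Inner step: normalise whitespace and case."""
    return [n.strip().lower() for n in names]


def deduplicate(names):
    """Inner step: drop repeated names and sort the rest."""
    return sorted(set(names))


def prepare_names(names):
    """'Subworkflow': two inner steps behind one input and one output."""
    return deduplicate(clean(names))


def count_taxa(raw_names):
    """Outer workflow: uses the subworkflow as a single node."""
    return len(prepare_names(raw_names))


result = count_taxa([" Quercus robur", "quercus robur ", "Fagus sylvatica"])
print(result)
```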
17. Taverna Engine
• Executes (“enacts”) Taverna Workflows
• Pushes data through system in parallel
– Subject to limits described in workflow
• Processor nodes invoked when their data
becomes available
– Turn inputs into outputs
• Captures detailed trace of what happened
(“provenance”)
– Follows W3C PROV specification
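A minimal Python sketch of data-driven enactment, under stated assumptions: the `Workflow` structure and the `enact` function are invented for illustration and are not the Taverna Engine's API. Ready nodes are run one after another here for simplicity, where a real engine would run them in parallel, and the trace is a plain list rather than W3C PROV.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical workflow description: node name -> (function, names of its inputs)
Workflow = Dict[str, Tuple[Callable[..., object], List[str]]]


def enact(workflow: Workflow, inputs: Dict[str, object]):
    """Run each node once all of its inputs are available; record a trace."""
    values = dict(inputs)   # data produced so far, keyed by input or node name
    trace = []              # simplified provenance: (node, inputs used, output)
    pending = dict(workflow)
    while pending:
        ready = [name for name, (_, needs) in pending.items()
                 if all(key in values for key in needs)]
        if not ready:
            raise ValueError(f"unsatisfied inputs for: {sorted(pending)}")
        for name in ready:
            func, needs = pending.pop(name)
            args = [values[key] for key in needs]
            values[name] = func(*args)
            trace.append((name, dict(zip(needs, args)), values[name]))
    return values, trace


# Nodes are listed out of order on purpose: data availability decides the order.
workflow: Workflow = {
    "total": (lambda a, b: a + b, ["doubled", "y"]),
    "doubled": (lambda x: x * 2, ["x"]),
}
values, trace = enact(workflow, {"x": 3, "y": 4})
print(values["total"])
for node, used, produced in trace:
    print(node, used, produced)
```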
18. Taverna Command Line Tool
• Simple wrapper around the Taverna Workflow Engine
• Inputs as simple files
• Outputs as directory structure
• Provenance packaged in Research Object
– ZIP Archive
– Inputs, Outputs, Intermediate values
– Workflow, Provenance, Overall metadata
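Because the result is packaged as a ZIP archive, it can be inspected with standard tools. The sketch below uses only Python's `zipfile` module and groups entries by top-level folder. The folder and file names are invented for the example and are an assumption; they are not the actual layout the Command Line Tool produces.

```python
import io
import zipfile
from collections import defaultdict


def summarise_bundle(data: bytes) -> dict:
    """Group the files in a ZIP archive by their top-level folder."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            top, _, rest = name.partition("/")
            groups[top if rest else "(root)"].append(name)
    return dict(groups)


# Build a small stand-in archive in memory; these paths are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("inputs/term.txt", "gene")
    archive.writestr("outputs/concepts.txt", "concept-1\nconcept-2\n")
    archive.writestr("intermediates/step1.xml", "<r/>")
    archive.writestr("workflow.xml", "<workflow/>")

summary = summarise_bundle(buffer.getvalue())
for folder, files in sorted(summary.items()):
    print(folder, files)
```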
19. Taverna Server
• Extends the Workflow Engine to work for multiple simultaneous users
– Isolates workflows from each other
– Allows asynchronous usage
– Manages resources
– Clients can be in any language, not just Java
• Designed to sit behind a Portal
– User interfaces are domain-specific
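The asynchronous, per-user pattern can be sketched from the client side. `FakeWorkflowServer` is an in-memory stand-in written for this example; its methods and status strings are assumptions and do not reproduce the Taverna Server REST or SOAP interface. The sketch shows submission returning at once, the client polling for completion, and one user's run being invisible to another.

```python
import itertools


class FakeWorkflowServer:
    """In-memory stand-in for a multi-user workflow server (hypothetical API)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._runs = {}  # run id -> per-run state, never shared between runs

    def submit(self, user: str, workflow: str) -> int:
        run_id = next(self._ids)
        self._runs[run_id] = {"user": user, "workflow": workflow, "polls_left": 2}
        return run_id  # returns immediately; the caller checks back later

    def status(self, user: str, run_id: int) -> str:
        run = self._runs[run_id]
        if run["user"] != user:
            raise PermissionError("run belongs to another user")
        if run["polls_left"] > 0:
            run["polls_left"] -= 1  # pretend the run is still working
            return "Operating"
        return "Finished"


def wait_for(server, user, run_id, max_polls=10):
    """Poll until the run finishes; a real client would sleep between polls."""
    for _ in range(max_polls):
        if server.status(user, run_id) == "Finished":
            return True
    return False


server = FakeWorkflowServer()
run_a = server.submit("alice", "workflow-a")
run_b = server.submit("bob", "workflow-b")
print(wait_for(server, "alice", run_a), wait_for(server, "bob", run_b))
```

A portal would wrap calls like these behind its own domain-specific interface, so end users never handle run identifiers directly.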
20. Taverna Server Architecture
[Figure: Taverna Server architecture diagram. Labelled elements: Tomcat Container + CXF Framework; Taverna Server Webapp; Common System Model; Per-User File Manager; Per-Run Taverna Workflow Engine; Web Portal; Ruby Client; Taverna Workbench (forthcoming); Processing Service; Catalog Services; Storage Services; Deployment Host; Common Management Model; Selected Notification Endpoints; Management Interface (separate auth)]
24. The Apache Software Foundation
• Non-profit organization, forming a community of open-source software projects.
• Strong emphasis on openness, collaboration
and a consensus-based development
process.
• Examples:
– Apache HTTP Server, Tomcat, Maven, Hadoop, OpenOffice, Subversion
25. Why Apache Taverna?
• Open development: Everything on mailing
list
• Engagement: Encourage developer
involvement – not just making plugins
• Independence: Apache Taverna is an independent project
– Not a “Manchester thing”
• Shared ownership: equal participation
• Sustainability: self-managed community
26. Apache Incubator
Gradually becoming an Apache project
• Intellectual Property assigned to ASF
– License changed to Apache License 2.0
• Infrastructure change – everything at
*.apache.org
• Community building – growing developer
base
• Mentoring on the “Apache Way” by volunteers from other Apache projects
27. Taverna Releases
• Current stable release: Taverna 2.5
– Command Line (2.5.1), Server (2.5.4), Workbench (2.5.1)
• http://www.taverna.org.uk/download/
• Taverna 3 Release plan:
– Apache Taverna Language
• API for workflow definitions
– Apache Taverna Engine & Command Line
• Can also run workflows from Taverna 2 Workbench
– Apache Taverna Server
– Apache Taverna Workbench
Taverna Server spawns the Command Line Tool for each run to keep users separate.
The components of the architecture:
• An OSGi platform, with the Taverna Platform API
– Implemented by Taverna Core, which executes a workflow using the Taverna Engine and uses Activity plugins for the different service types (WSDL, REST, BioMart, R scripts, command line tools, etc.)
– Also implemented by the Taverna Server client, which uses the Java client library to proxy the running of a workflow on the Taverna Server
• The Taverna Workbench, to design and run workflows
– UI plugins for each service type
– Executes workflows using the Taverna Platform API
• The Taverna Command Line, which executes workflows using the Taverna Platform API
• A Taverna Server, which exposes the Taverna Platform API as a REST API and a SOAP API for executing workflows
• Taverna Player, which uses the Ruby client library to execute workflows on the Taverna Server
• Taverna Lite, which also uses the Ruby client library to execute workflows, and additionally manages a repository of workflows and allows user interactions
The OSGi framework (OSGi being an acronym for "Open Services Gateway initiative") is a module system and service platform for the Java programming language that implements a complete and dynamic component model, something that does not exist in standalone Java/VM environments. Applications or components (coming in the form of bundles for deployment) can be remotely installed, started, stopped, updated, and uninstalled without requiring a reboot; management of Java packages/classes is specified in great detail. Application life cycle management (start, stop, install, etc.) is done via APIs that allow for remote downloading of management policies. The service registry allows bundles to detect the addition of new services, or the removal of services, and adapt accordingly.
The OSGi specifications have moved beyond the original focus of service gateways, and are now used in applications ranging from mobile phones to the open source Eclipse IDE. Other application areas include automobiles, industrial automation, building automation, PDAs, grid computing, entertainment, fleet management and application servers.