SlideShare a Scribd company logo
1 of 39
Introduction to Workflows with 
Taverna and myExperiment 
Stian Soiland-Reyes and Christian Brenninkmeijer 
University of Manchester 
materials by Dr Katy Wolstencroft and Dr Aleksandra Pawlik 
http://orcid.org/0000-0001-9842-9718 
http://orcid.org/0000-0002-2937-7819 
http://orcid.org/0000-0002-1279-5133 
http://orcid.org/0000-0001-8418-6735 
Bonn, 2014-09-01 
This work is licensed under a http://www.taverna.org.uk/ 
Creative Commons Attribution 3.0 Unported License
Data Deluge 
 ‘Omics data 
 Next Gen Sequencing 
 eGovernment 
 World bank data 
 Climate change data 
 Large scale physics 
 Large Hadron collider 
 Astronomy
Lots of Resources 
NAR 2014: 1552 databases 
Genbank 2014-04: 172 million sequences, 
162 billion basepairs 
WGS 2014-04: 774 billion basepairs
Next Generation Sequencing 
 2008-2012: 1000 Genome Project 
 A Deep Catalog of Human Genetic Variation 
 2009-: Genome 10k project 
 A genomic zoo—DNA sequences of 10,000 vertebrate 
species, approximately one for every vertebrate genus. 
 2012-: Human Microbiome Project 
 Characterise the microbial communities found at 
several different sites on the human body
Where is the data? 
 Repositories run by major service providers 
(e.g. NCBI, EBI) 
 Local project stores 
 Static web pages 
 Dynamic web applications 
 FTP servers (!) 
 Inside PDFs  
 Web Services 
The implicit workflow 
Bioinformatics research combines: 
 Data resources (public and private) 
 Computational power (standard and custom) 
 Researchers and collaborators 
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 
cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta 
atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg 
ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg 
gatcttaatt tttttaaatt attgatttgt 12481 aggagctatt tatatattct ggatacaagt tctttatcag 
atacacagtt tgtgactatt 12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt 
tttacaattg 12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga 
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc 12721 
atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa 12781 taacccattt 
tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
What that means for 
Bioinformatics 
 Sequential use of distributed tools 
 Incompatible input and output formats 
 Challenging to record/reproduce/tweak 
 parameter selections 
 service selection 
 results of each step 
 OK for one gene or one protein, but what about 
10,000? 
 Analysing large data sets requires programmatic help
Workflow as a Solution 
 Sophisticated analysis pipeline 
 Graphical representation of 
executable analysis 
 Combine a set of services to 
analyse or manage data (local or 
remote) 
 Data flow from one service (boxes) 
to the next (connected with 
arrows) 
 Iteration – process multiple data 
items 
 Automation – rerun workflow
Example Taverna Workflow 
Workflow: Get the weather 
forecast of the day given the 
city and the country 
Green box is a Web Service 
Purple boxes are local XML 
services to assemble/ extract 
XML 
Blue boxes are workflow 
input and output ports 
Arrows define the direction 
of data flow
Workflows as a solution 
 Flow of data from one tool to the next is 
automatic – just connect inputs and outputs 
 Incompatibilities overcome in the workflow with 
helper services (shims) 
 Allowing new tool combinations 
 Workflow engine records parameter values and 
algorithms – provenance 
 Workflows can include data integration and 
visualization 
 Iteration over large data sets automatic – ideal 
for high throughput analysis (e.g. omics)
Reproducible Research 
Preventing non-reproducible research 
 An array of errors 
http://www.economist.com/node/21528593 
 Duke University, 2006 - Prediction of the course 
of a patient’s lung cancer using expression arrays 
and recommendations on different 
chemotherapies from cell cultures – reported in 
Nature Medicine 
 3 different groups could not reproduce the results 
and uncovered mistakes in the original work
If the Analyses were done 
using Workflows..... 
 Reviewers could re-run the in-silico experiments 
and see results for themselves 
 Methods could be properly examined and 
criticized by inspecting the workflow 
 Mistakes and opportunities could be pinpointed 
earlier
Kepler 
Triana 
BPEL 
Ptolemy II 
Taverna 
Different Workflow Systems 
VisTrails 
Galaxy 
Pipeline Pilot
Freely available, 
open source 
80,000+ downloads 
across versions 
Installers for Windows, 
Mac OS X, Linux 
Taverna Workbench 
http://www.taverna.org.uk/ 
Current version: 2.5.0 
Wolstencroft et al. (2013): The Taverna workflow suite: designing and executing 
workflows of Web Services on the desktop, web or in the cloud”, 
Nucleic Acids Research, 41(W1): W557-W561. doi:10.1093/nar/gkt328
Taverna Workflow System 
History: 
 2003: Taverna 0.1 
(300 downloads) 
 2014: Taverna 2.5.0 
(5100 downloads) 
Products: 
 TavernaWorkbench 
 Taverna Server 
 Taverna Command line 
 Taverna Online 
 Taverna Player 
 Plugins and integrations 
http://www.taverna.org.uk
Taverna editions and extensibility 
 Core 
 Astronomy 
 Bioinformatics 
 Biodiversity 
 Digital Preservation 
 Enterprise 
Taverna is a generic workflow 
system that can be extended 
by plugins and customized for 
use in different domains. 
The Taverna editions are pre-built 
downloads of Taverna 
with plugins for the most 
popular domains. 
http://www.taverna.org.uk/download/workbench/2-5/
Taverna Workbench 
Workflow engine 
to run workflows 
List of services 
Construct and 
visualise workflows 
Web Services e.g. KEGG 
Programming 
libraries 
e.g. libSBML
Using Tools and Services 
from Taverna workflows 
 Web Services 
 WSDL 
 REST 
 Data services 
 BioMart 
 Local scripts: 
 R 
 Beanshell 
 Command line (e.g. Python, Perl) 
 Other workflows 
 And more..... Add your own!
What are Web Services? 
Web Services: HTTP-based programmatic access (API). 
Instead of “GET me the web page 
http://example.com/cat-pics”, 
Web Services allow “GET me a genome sequence 
http://example.com/gene/WAP_RAT” 
Connect to and use remote services from your 
computer in an automated way 
NOT the same as services on the web (i.e. forms that 
shows results as a web page)
Who Provides the Services? 
Open domain services and resources 
• Taverna accesses thousands of services 
• Third party – we don’t own them – we didn’t build them 
• All the major providers 
– NCBI, DDBJ, EBI … 
• Enforce NO common data model.
Asynchronous services 
(Submit, Wait, Fetch) 
Simple WSDL 
services 
BioMoby Semantic 
Services 
How do you use the services?
Tags 
Service Description 
Monitoring 
Provider 
Submitter
What do Scientists use Taverna for? 
Astronomy 
Music 
Meteorology 
Social Science 
Cheminformatics
Research Example 
Lymphoma Prediction Workflow 
MicroArray from 
tumor tissue 
Microarray 
preprocessing 
caArray 
Use gene-expression 
Lymphoma 
prediction Wei Tan Univ. Chicago 
GenePattern 
Ack. Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI) 
Jared Nedzel (MIT) 
patterns 
associated with two 
lymphoma types to 
predict the type of 
an unknown 
sample.
Systems Biology Data Integration 
Read enzyme 
names from SBML 
Query maxd 
database using 
enzyme names 
Calculate colours 
based on gene 
expn level 
Create new SBML 
model with new 
colour nodes 
Mapping transcriptomics data onto SBML models 
Peter Li, Doug Kell, U Manchester
Workflows are … 
... records and protocols (i.e. your in silico 
experimental method) 
... know-how and intellectual property 
... hard work to develop and get right 
…..re-usable methods (i.e. you can build on the 
work of others) 
So why not share and re-use them
Workflow Repository
Just Enough Sharing…. 
 myExperiment can provide a central location for 
workflows from one community/group 
 myExperiment allows you to say 
 Who can look at your workflow 
 Who can download your workflow 
 Who can modify your workflow 
 Who can run your workflow 
 Ownership and attribution
Trypanosomiasis (Sleeping Sickness) 
in sub-Saharan Africa 
Microarray data 
QTL data 
http://www.genomics.liv.ac.uk/tryps/ 
Steve Kemp Andy Brass Paul Fisher 
Slides from Paul Fisher
Reuse, Recycle, Repurpose 
Workflows 
Dr Paul Fisher 
Identify QTg and pathways implicated in resistance to 
Trypanosomiasis in cattle 
Dr Jo Pennock 
Identify the QTg and pathways of 
colitis and helminth infections in 
the mouse model 
PubMed ID: 20687192 doi:10.1186/1471-2164-14-127
Another Host, Another 
Parasite...but the SAME 
Method 
 Mouse whipworm infection - parasite model of the human 
parasite - Trichuris trichuria 
Understanding Phenotype 
 Comparing resistant vs susceptible strains – Microarrays 
Understanding Genotype 
 Mapping quantitative traits – Classical genetics QTL 
Joanne Pennock, Richard Grencis 
University of Manchester
Workflow Results 
 Identified the biological pathways involved in sex 
dependence in the mouse model, previously believed to be 
involved in the ability of mice to expel the parasite. 
 Manual experimentation: Two year study of candidate 
genes, processes unidentified 
 Workflow experimentation: Two weeks study – identified 
candidate genes 
Joanne Pennock, Richard Grencis 
University of Manchester
Workflow Success 
 Workflow analysed each piece of data systematically 
 Eliminated user bias and premature filtering of 
datasets 
 The size of the QTL and amount of the microarray 
data made a manual approach impractical 
 Workflows capture exactly where data came from 
and how it was analysed 
 Workflow output produced a manageable amount of 
data for the biologists to interpret and verify 
 “make sense of this data” -> “does this make sense?”
Spectrum of Users 
Advanced users design and build 
workflows (informaticians) 
Intermediate users reuse and 
modify existing workflows or 
components 
Others “replay” workflows 
through web page
A Collection of Tools 
Workflow GUI Workbench Client User Interfaces 
and 3rd party plug-ins 
Workflow Repository 
Service Catalogue 
Web Portals 
Programming and 
APIs 
Activity and Service 
Plug-in Manager 
Provenance 
Store 
Workflow 
Server 
W3C 
PROV 
Secure Service Access, and 
Programming APIs 
E-Laboratories
Summary –Workflow Advantages 
 Informatics often relies on data integration and 
large-scale data analysis 
 Workflows are a mechanism for linking together 
resources and analyses 
 Promote reproducible research 
 Find and use successful analysis methods 
developed by others with myExperiment
More Information 
 Taverna 
 http://www.taverna.org.uk 
 myExperiment 
 http://www.myexperiment.org 
 BioCatalogue 
 http://www.biocatalogue.org
Tutorials 
 Using Taverna to design and build workflows 
 Reusing workflows from myExperiment 
 Finding and using different services: 
REST, Xpath, Beanshell, R, … 
 Exploring the workflow engine: iteration, looping, 
retries, parallel invocation 
 Web: Taverna Online, Taverna Player 
 Interactions 
 Components

More Related Content

Viewers also liked

2014 Taverna Tutorial Nested workflows
2014 Taverna Tutorial Nested workflows2014 Taverna Tutorial Nested workflows
2014 Taverna Tutorial Nested workflowsmyGrid team
 
2014 Taverna Tutorial R script
2014 Taverna Tutorial R script2014 Taverna Tutorial R script
2014 Taverna Tutorial R scriptmyGrid team
 
2014 Taverna tutorial Simple workflow
2014 Taverna tutorial Simple workflow2014 Taverna tutorial Simple workflow
2014 Taverna tutorial Simple workflowmyGrid team
 
2014 Taverna Tutorial Components
2014 Taverna Tutorial Components2014 Taverna Tutorial Components
2014 Taverna Tutorial ComponentsmyGrid team
 
2014 Taverna tutorial Xpath
2014 Taverna tutorial Xpath2014 Taverna tutorial Xpath
2014 Taverna tutorial XpathmyGrid team
 
2014 Taverna tutorial REST and Biocatalogue
2014 Taverna tutorial REST and Biocatalogue2014 Taverna tutorial REST and Biocatalogue
2014 Taverna tutorial REST and BiocataloguemyGrid team
 
2014 Taverna tutorial Advanced Taverna
2014 Taverna tutorial Advanced Taverna2014 Taverna tutorial Advanced Taverna
2014 Taverna tutorial Advanced TavernamyGrid team
 
2014 Taverna tutorial Spreadsheet import
2014 Taverna tutorial Spreadsheet import2014 Taverna tutorial Spreadsheet import
2014 Taverna tutorial Spreadsheet importmyGrid team
 
2014 Taverna tutorial Shims and Beanshell scripts
2014 Taverna tutorial Shims and Beanshell scripts2014 Taverna tutorial Shims and Beanshell scripts
2014 Taverna tutorial Shims and Beanshell scriptsmyGrid team
 
2014 Taverna Tutorial Interactions
2014 Taverna Tutorial Interactions2014 Taverna Tutorial Interactions
2014 Taverna Tutorial InteractionsmyGrid team
 
If we build it will they come?
If we build it will they come?If we build it will they come?
If we build it will they come?myGrid team
 

Viewers also liked (12)

2014 Taverna Tutorial Nested workflows
2014 Taverna Tutorial Nested workflows2014 Taverna Tutorial Nested workflows
2014 Taverna Tutorial Nested workflows
 
2014 Taverna Tutorial R script
2014 Taverna Tutorial R script2014 Taverna Tutorial R script
2014 Taverna Tutorial R script
 
2014 Taverna tutorial Simple workflow
2014 Taverna tutorial Simple workflow2014 Taverna tutorial Simple workflow
2014 Taverna tutorial Simple workflow
 
2014 Taverna Tutorial Components
2014 Taverna Tutorial Components2014 Taverna Tutorial Components
2014 Taverna Tutorial Components
 
กลุ่มที่ ๔ กตัญญูกตเวที
กลุ่มที่ ๔ กตัญญูกตเวทีกลุ่มที่ ๔ กตัญญูกตเวที
กลุ่มที่ ๔ กตัญญูกตเวที
 
2014 Taverna tutorial Xpath
2014 Taverna tutorial Xpath2014 Taverna tutorial Xpath
2014 Taverna tutorial Xpath
 
2014 Taverna tutorial REST and Biocatalogue
2014 Taverna tutorial REST and Biocatalogue2014 Taverna tutorial REST and Biocatalogue
2014 Taverna tutorial REST and Biocatalogue
 
2014 Taverna tutorial Advanced Taverna
2014 Taverna tutorial Advanced Taverna2014 Taverna tutorial Advanced Taverna
2014 Taverna tutorial Advanced Taverna
 
2014 Taverna tutorial Spreadsheet import
2014 Taverna tutorial Spreadsheet import2014 Taverna tutorial Spreadsheet import
2014 Taverna tutorial Spreadsheet import
 
2014 Taverna tutorial Shims and Beanshell scripts
2014 Taverna tutorial Shims and Beanshell scripts2014 Taverna tutorial Shims and Beanshell scripts
2014 Taverna tutorial Shims and Beanshell scripts
 
2014 Taverna Tutorial Interactions
2014 Taverna Tutorial Interactions2014 Taverna Tutorial Interactions
2014 Taverna Tutorial Interactions
 
If we build it will they come?
If we build it will they come?If we build it will they come?
If we build it will they come?
 

Similar to 2014 Taverna Tutorial Introduction to eScience and workflows

wolstencroft-ogf20-astro
wolstencroft-ogf20-astrowolstencroft-ogf20-astro
wolstencroft-ogf20-astrowebuploader
 
Accelerating GWAS epistatic interaction analysis methods
Accelerating GWAS epistatic interaction analysis methodsAccelerating GWAS epistatic interaction analysis methods
Accelerating GWAS epistatic interaction analysis methodsPriscill Orue Esquivel
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
2014-06-03-Taverna-IS-ENES2
2014-06-03-Taverna-IS-ENES22014-06-03-Taverna-IS-ENES2
2014-06-03-Taverna-IS-ENES2myGrid team
 
Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...Anubis Hosein
 
ReComp and P4@NU: Reproducible Data Science for Health
ReComp and P4@NU: Reproducible Data Science for HealthReComp and P4@NU: Reproducible Data Science for Health
ReComp and P4@NU: Reproducible Data Science for HealthPaolo Missier
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009Ian Foster
 
Next Generation Sequencing methods
Next Generation Sequencing methods Next Generation Sequencing methods
Next Generation Sequencing methods Zohaib HUSSAIN
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Solutions
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Ian Foster
 
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesProcessing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesMartin Hartmann
 
Jeff Grethe: CAMERA
Jeff Grethe: CAMERAJeff Grethe: CAMERA
Jeff Grethe: CAMERAIddo
 
dkNET Webinar: The Human BioMolecular Atlas Program (HuBMAP) 10/14/2022
dkNET Webinar: The Human BioMolecular Atlas Program (HuBMAP) 10/14/2022dkNET Webinar: The Human BioMolecular Atlas Program (HuBMAP) 10/14/2022
dkNET Webinar: The Human BioMolecular Atlas Program (HuBMAP) 10/14/2022dkNET
 
Utility HPC: Right Systems, Right Scale, Right Science
Utility HPC: Right Systems, Right Scale, Right ScienceUtility HPC: Right Systems, Right Scale, Right Science
Utility HPC: Right Systems, Right Scale, Right ScienceChef Software, Inc.
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...Chris Evelo
 

Similar to 2014 Taverna Tutorial Introduction to eScience and workflows (20)

DCC Keynote 2007
DCC Keynote 2007DCC Keynote 2007
DCC Keynote 2007
 
wolstencroft-ogf20-astro
wolstencroft-ogf20-astrowolstencroft-ogf20-astro
wolstencroft-ogf20-astro
 
Accelerating GWAS epistatic interaction analysis methods
Accelerating GWAS epistatic interaction analysis methodsAccelerating GWAS epistatic interaction analysis methods
Accelerating GWAS epistatic interaction analysis methods
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
2014-06-03-Taverna-IS-ENES2
2014-06-03-Taverna-IS-ENES22014-06-03-Taverna-IS-ENES2
2014-06-03-Taverna-IS-ENES2
 
Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...Una estrategia para la integración de ontologías, servicios web y PLN en el a...
Una estrategia para la integración de ontologías, servicios web y PLN en el a...
 
ReComp and P4@NU: Reproducible Data Science for Health
ReComp and P4@NU: Reproducible Data Science for HealthReComp and P4@NU: Reproducible Data Science for Health
ReComp and P4@NU: Reproducible Data Science for Health
 
Services For Science April 2009
Services For Science April 2009Services For Science April 2009
Services For Science April 2009
 
Next Generation Sequencing methods
Next Generation Sequencing methods Next Generation Sequencing methods
Next Generation Sequencing methods
 
OVium Bioinformatic Solutions
OVium Bioinformatic SolutionsOVium Bioinformatic Solutions
OVium Bioinformatic Solutions
 
Collins seattle-2014-final
Collins seattle-2014-finalCollins seattle-2014-final
Collins seattle-2014-final
 
Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009Quantitative Medicine Feb 2009
Quantitative Medicine Feb 2009
 
Cloud bioinformatics 2
Cloud bioinformatics 2Cloud bioinformatics 2
Cloud bioinformatics 2
 
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial CommunitiesProcessing Amplicon Sequence Data for the Analysis of Microbial Communities
Processing Amplicon Sequence Data for the Analysis of Microbial Communities
 
Jeff Grethe: CAMERA
Jeff Grethe: CAMERAJeff Grethe: CAMERA
Jeff Grethe: CAMERA
 
AJH CV sept2016
AJH CV sept2016AJH CV sept2016
AJH CV sept2016
 
dkNET Webinar: The Human BioMolecular Atlas Program (HuBMAP) 10/14/2022
dkNET Webinar: The Human BioMolecular Atlas Program (HuBMAP) 10/14/2022dkNET Webinar: The Human BioMolecular Atlas Program (HuBMAP) 10/14/2022
dkNET Webinar: The Human BioMolecular Atlas Program (HuBMAP) 10/14/2022
 
Utility HPC: Right Systems, Right Scale, Right Science
Utility HPC: Right Systems, Right Scale, Right ScienceUtility HPC: Right Systems, Right Scale, Right Science
Utility HPC: Right Systems, Right Scale, Right Science
 
BioData World Basel 2018
BioData World Basel 2018BioData World Basel 2018
BioData World Basel 2018
 
WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...WikiPathways: how open source and open data can make omics technology more us...
WikiPathways: how open source and open data can make omics technology more us...
 

More from myGrid team

SWeDe - Scientific Webservice Description
SWeDe - Scientific Webservice DescriptionSWeDe - Scientific Webservice Description
SWeDe - Scientific Webservice DescriptionmyGrid team
 
Taverna workflows in the cloud
Taverna workflows in the cloudTaverna workflows in the cloud
Taverna workflows in the cloudmyGrid team
 
The Taverna Software Suite
The Taverna Software SuiteThe Taverna Software Suite
The Taverna Software SuitemyGrid team
 
The Taverna Workflow Management Software Suite - Past, Present, Future
The Taverna Workflow Management Software Suite - Past, Present, FutureThe Taverna Workflow Management Software Suite - Past, Present, Future
The Taverna Workflow Management Software Suite - Past, Present, FuturemyGrid team
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and modelsmyGrid team
 

More from myGrid team (6)

Taverna summary
Taverna summaryTaverna summary
Taverna summary
 
SWeDe - Scientific Webservice Description
SWeDe - Scientific Webservice DescriptionSWeDe - Scientific Webservice Description
SWeDe - Scientific Webservice Description
 
Taverna workflows in the cloud
Taverna workflows in the cloudTaverna workflows in the cloud
Taverna workflows in the cloud
 
The Taverna Software Suite
The Taverna Software SuiteThe Taverna Software Suite
The Taverna Software Suite
 
The Taverna Workflow Management Software Suite - Past, Present, Future
The Taverna Workflow Management Software Suite - Past, Present, FutureThe Taverna Workflow Management Software Suite - Past, Present, Future
The Taverna Workflow Management Software Suite - Past, Present, Future
 
The beauty of workflows and models
The beauty of workflows and modelsThe beauty of workflows and models
The beauty of workflows and models
 

Recently uploaded

Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXDole Philippines School
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Tamer Koksalan, PhD
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 

Recently uploaded (20)

Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 

2014 Taverna Tutorial Introduction to eScience and workflows

  • 1. Introduction to Workflows with Taverna and myExperiment Stian Soiland-Reyes and Christian Brenninkmeijer University of Manchester materials by Dr Katy Wolstencroft and Dr Aleksandra Pawlik http://orcid.org/0000-0001-9842-9718 http://orcid.org/0000-0002-2937-7819 http://orcid.org/0000-0002-1279-5133 http://orcid.org/0000-0001-8418-6735 Bonn, 2014-09-01 This work is licensed under a http://www.taverna.org.uk/ Creative Commons Attribution 3.0 Unported License
  • 2. Data Deluge  ‘Omics data  Next Gen Sequencing  eGovernment  World bank data  Climate change data  Large scale physics  Large Hadron collider  Astronomy
  • 3. Lots of Resources NAR 2014: 1552 databases Genbank 2014-04: 172 million sequences, 162 billion basepairs WGS 2014-04: 774 billion basepairs
  • 4. Next Generation Sequencing  2008-2012: 1000 Genome Project  A Deep Catalog of Human Genetic Variation  2009-: Genome 10k project  A genomic zoo—DNA sequences of 10,000 vertebrate species, approximately one for every vertebrate genus.  2012-: Human Microbiome Project  Characterise the microbial communities found at several different sites on the human body
  • 5. Where is the data?  Repositories run by major service providers (e.g. NCBI, EBI)  Local project stores  Static web pages  Dynamic web applications  FTP servers (!)  Inside PDFs   Web Services 
  • 6. The implicit workflow Bioinformatics research combines:  Data resources (public and private)  Computational power (standard and custom)  Researchers and collaborators 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt 12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt 12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg 12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga 12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc 12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa 12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
  • 7. What that means for Bioinformatics  Sequential use of distributed tools  Incompatible input and output formats  Challenging to record/reproduce/tweak  parameter selections  service selection  results of each step  OK for one gene or one protein, but what about 10,000?  Analysing large data sets requires programmatic help
  • 8. Workflow as a Solution  Sophisticated analysis pipeline  Graphical representation of executable analysis  Combine a set of services to analyse or manage data (local or remote)  Data flow from one service (boxes) to the next (connected with arrows)  Iteration – process multiple data items  Automation – rerun workflow
  • 9. Example Taverna Workflow Workflow: Get the weather forecast of the day given the city and the country Green box is a Web Service Purple boxes are local XML services to assemble/ extract XML Blue boxes are workflow input and output ports Arrows define the direction of data flow
  • 10. Workflows as a solution  Flow of data from one tool to the next is automatic – just connect inputs and outputs  Incompatibilities overcome in the workflow with helper services (shims)  Allowing new tool combinations  Workflow engine records parameter values and algorithms – provenance  Workflows can include data integration and visualization  Iteration over large data sets automatic – ideal for high throughput analysis (e.g. omics)
  • 11. Reproducible Research Preventing non-reproducible research  An array of errors http://www.economist.com/node/21528593  Duke University, 2006 - Prediction of the course of a patient’s lung cancer using expression arrays and recommendations on different chemotherapies from cell cultures – reported in Nature Medicine  3 different groups could not reproduce the results and uncovered mistakes in the original work
  • 12. If the Analyses were done using Workflows.....  Reviewers could re-run the in-silico experiments and see results for themselves  Methods could be properly examined and criticized by inspecting the workflow  Mistakes and opportunities could be pinpointed earlier
  • 13. Kepler Triana BPEL Ptolemy II Taverna Different Workflow Systems VisTrails Galaxy Pipeline Pilot
  • 14. Freely available, open source 80,000+ downloads across versions Installers for Windows, Mac OS X, Linux Taverna Workbench http://www.taverna.org.uk/ Current version: 2.5.0 Wolstencroft et al. (2013): The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud”, Nucleic Acids Research, 41(W1): W557-W561. doi:10.1093/nar/gkt328
  • 15. Taverna Workflow System History:  2003: Taverna 0.1 (300 downloads)  2014: Taverna 2.5.0 (5100 downloads) Products:  TavernaWorkbench  Taverna Server  Taverna Command line  Taverna Online  Taverna Player  Plugins and integrations http://www.taverna.org.uk
  • 16. Taverna editions and extensibility  Core  Astronomy  Bioinformatics  Biodiversity  Digital Preservation  Enterprise Taverna is a generic workflow system that can be extended by plugins and customized for use in different domains. The Taverna editions are pre-built downloads of Taverna with plugins for the most popular domains. http://www.taverna.org.uk/download/workbench/2-5/
  • 17. Taverna Workbench Workflow engine to run workflows List of services Construct and visualise workflows Web Services e.g. KEGG Programming libraries e.g. libSBML
  • 18. Using Tools and Services from Taverna workflows  Web Services  WSDL  REST  Data services  BioMart  Local scripts:  R  Beanshell  Command line (e.g. Python, Perl)  Other workflows  And more..... Add your own!
  • 19. What are Web Services? Web Services: HTTP-based programmatic access (API). Instead of “GET me the web page http://example.com/cat-pics”, Web Services allow “GET me a genome sequence http://example.com/gene/WAP_RAT” Connect to and use remote services from your computer in an automated way NOT the same as services on the web (i.e. forms that shows results as a web page)
  • 20. Who Provides the Services? Open domain services and resources • Taverna accesses thousands of services • Third party – we don’t own them – we didn’t build them • All the major providers – NCBI, DDBJ, EBI … • Enforce NO common data model.
  • 21. Asynchronous services (Submit, Wait, Fetch) Simple WSDL services BioMoby Semantic Services How do you use the services?
  • 22.
  • 23. Tags Service Description Monitoring Provider Submitter
  • 24. What do Scientists use Taverna for? Astronomy Music Meteorology Social Science Cheminformatics
  • 25. Research Example Lymphoma Prediction Workflow MicroArray from tumor tissue Microarray preprocessing caArray Use gene-expression Lymphoma prediction Wei Tan Univ. Chicago GenePattern Ack. Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI) Jared Nedzel (MIT) patterns associated with two lymphoma types to predict the type of an unknown sample.
  • 26. Systems Biology Data Integration Read enzyme names from SBML Query maxd database using enzyme names Calculate colours based on gene expn level Create new SBML model with new colour nodes Mapping transcriptomics data onto SBML models Peter Li, Doug Kell, U Manchester
  • 27. Workflows are … ... records and protocols (i.e. your in silico experimental method) ... know-how and intellectual property ... hard work to develop and get right …..re-usable methods (i.e. you can build on the work of others) So why not share and re-use them
  • 29. Just Enough Sharing….  myExperiment can provide a central location for workflows from one community/group  myExperiment allows you to say  Who can look at your workflow  Who can download your workflow  Who can modify your workflow  Who can run your workflow  Ownership and attribution
  • 30. Trypanosomiasis (Sleeping Sickness) in sub-Saharan Africa Microarray data QTL data http://www.genomics.liv.ac.uk/tryps/ Steve Kemp Andy Brass Paul Fisher Slides from Paul Fisher
  • 31. Reuse, Recycle, Repurpose Workflows Dr Paul Fisher Identify QTg and pathways implicated in resistance to Trypanosomiasis in cattle Dr Jo Pennock Identify the QTg and pathways of colitis and helminth infections in the mouse model PubMed ID: 20687192 doi:10.1186/1471-2164-14-127
  • 32. Another Host, Another Parasite...but the SAME Method  Mouse whipworm infection - parasite model of the human parasite - Trichuris trichuria Understanding Phenotype  Comparing resistant vs susceptible strains – Microarrays Understanding Genotype  Mapping quantitative traits – Classical genetics QTL Joanne Pennock, Richard Grencis University of Manchester
  • 33. Workflow Results  Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.  Manual experimentation: Two year study of candidate genes, processes unidentified  Workflow experimentation: Two weeks study – identified candidate genes Joanne Pennock, Richard Grencis University of Manchester
  • 34. Workflow Success  Workflow analysed each piece of data systematically  Eliminated user bias and premature filtering of datasets  The size of the QTL and amount of the microarray data made a manual approach impractical  Workflows capture exactly where data came from and how it was analysed  Workflow output produced a manageable amount of data for the biologists to interpret and verify  “make sense of this data” -> “does this make sense?”
  • 35. Spectrum of Users Advanced users design and build workflows (informaticians) Intermediate users reuse and modify existing workflows or components Others “replay” workflows through web page
  • 36. A Collection of Tools Workflow GUI Workbench Client User Interfaces and 3rd party plug-ins Workflow Repository Service Catalogue Web Portals Programming and APIs Activity and Service Plug-in Manager Provenance Store Workflow Server W3C PROV Secure Service Access, and Programming APIs E-Laboratories
  • 37. Summary –Workflow Advantages  Informatics often relies on data integration and large-scale data analysis  Workflows are a mechanism for linking together resources and analyses  Promote reproducible research  Find and use successful analysis methods developed by others with myExperiment
  • 38. More Information  Taverna  http://www.taverna.org.uk  myExperiment  http://www.myexperiment.org  BioCatalogue  http://www.biocatalogue.org
  • 39. Tutorials  Using Taverna to design and build workflows  Reusing workflows from myExperiment  Finding and using different services: REST, Xpath, Beanshell, R, …  Exploring the workflow engine: iteration, looping, retries, parallel invocation  Web: Taverna Online, Taverna Player  Interactions  Components