Today's bioinformatics lesson
is brought to you by the letter 'W'
by
Keith Bradnam
Image from flickr.com/91619273@N00/
W is for Workflows
A typical bioinformatics workflow
Illumina data
(FASTQ format)
Remove adapter contamination
scythe
cutadapt
trimgalore
skewer
Btrim
Trimmomatic
Lots of tools
you could use!
Trim reads for low quality bases
sickle
Qtrim
FastQC
FastX
PRINSEQ
Trimmomatic
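As a rough illustration of what quality trimming does, the Python sketch below removes low-quality bases from the 3' end of a single read. It assumes Phred+33 quality encoding and an arbitrary threshold of 20. The function name `trim_3prime` is hypothetical, and the logic is much simpler than what the tools listed above actually implement.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    # Walk backwards from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Example read: the last three bases have quality '#' (Phred 2 in Phred+33).
seq, qual = trim_3prime("ACGTACGT", "IIIII###")
print(seq, qual)
```

A read that is low quality along its whole length comes back empty, which is one reason read counts drop after this step.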
Map reads to genome/transcriptome
BWA
Bowtie
TopHat
SHRiMP
BFAST
MAQ
From ebi.ac.uk/~nf/hts_mappers/
There are a lot of
read mappers out there!
[Figure: timeline of read mappers by year, 2001 to 2015 (x-axis: Years), from ebi.ac.uk/~nf/hts_mappers/]
[Image: the read-mapper timeline from ebi.ac.uk/~nf/hts_mappers/, overlaid with the first page of the paper "ARYANA: Aligning Reads by Yet Another Approach" (BMC Bioinformatics 2014, 15(Suppl 9):S12)]
Filter for uniquely mapped reads
SAMtools
Picard
GATK
Unix
Filter for high quality alignments
SAMtools
Picard
GATK
Unix
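One way to picture these two filtering steps is as a pass over the SAM records. The Python sketch below keeps mapped reads whose mapping quality (MAPQ, the fifth SAM column) meets a threshold. Treating a high MAPQ as a proxy for a unique, high-quality alignment is an assumption: aligners report uniqueness differently, the threshold of 30 is arbitrary, and `keep_alignment` is a hypothetical helper, not part of SAMtools, Picard, or GATK.

```python
def keep_alignment(sam_line, min_mapq=30):
    """Return True for a mapped SAM record with MAPQ >= min_mapq."""
    if sam_line.startswith("@"):          # header line, not an alignment
        return False
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & 0x4:                        # FLAG bit 0x4 = read is unmapped
        return False
    return mapq >= min_mapq


# Made-up records: one header, one confident hit, one ambiguous hit, one unmapped.
records = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t0\tchr1\t200\t1\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII",
]
kept = [r.split("\t")[0] for r in records if keep_alignment(r)]
print(kept)
```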
Data suitable for
final analysis
Some questions you should ask yourself…
W is for 'Why?'
Why is each of these steps needed?
Why should I use tool 'X' at this step?
W is for 'What?'
What is the effect of running each step?
What is a good result?
The effect of applying many
'bioinformatics axes'
Illumina data
(FASTQ format)
2 FASTQ files
Files are ~6.5 GB
52.5 million reads total
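A read count like this can be checked directly, since each FASTQ record normally occupies four lines. The sketch below counts records in uncompressed FASTQ text. It assumes sequences are not wrapped over multiple lines, and `count_fastq_reads` is a hypothetical helper.

```python
import io


def count_fastq_reads(handle):
    """Count FASTQ records, assuming exactly four lines per record."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    return n_lines // 4


# Two tiny in-memory records stand in for a real file opened with open().
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
print(count_fastq_reads(example))
```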
Remove adapters & trim
50.1 million reads
Align to transcriptome with Bowtie
35.8 million reads map
Filter for uniquely mapped reads
31.4 million reads align uniquely
Filter for high quality alignments
22.7 million reads have alignment scores of zero
Data suitable for
final analysis
Reduced data from 52.5 to 22.7 million reads
It can be helpful to know how the different
steps in a workflow reduce your data
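The read counts from the preceding slides can be tabulated to show how much each step removes. The sketch below uses only the numbers given above; the step labels are shortened for display.

```python
# Read counts (millions) after each step, taken from the slides above.
steps = [
    ("Raw FASTQ", 52.5),
    ("Adapters removed & trimmed", 50.1),
    ("Mapped with Bowtie", 35.8),
    ("Uniquely mapped", 31.4),
    ("High quality alignments", 22.7),
]

total = steps[0][1]
previous = total
for name, reads in steps:
    pct_total = 100 * reads / total        # share of the starting reads
    pct_prev = 100 * reads / previous      # share of the previous step
    print(f"{name:<28}{reads:>6.1f}M {pct_total:>6.1f}% {pct_prev:>6.1f}%")
    previous = reads

# Fraction of the starting reads that reach the final analysis.
final_fraction = steps[-1][1] / total
```

With these numbers, roughly 43% of the original reads reach the final analysis, and the largest single drop is at the mapping step.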
One final tip…
ls -ltr
Run this command after
every step of a workflow
Lets you see whether output files
were actually created
Lets you see whether output files
contain any data
Most recently modified files will be
at the bottom of your terminal window
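The same check can be scripted. The Python sketch below lists the regular files in a directory from oldest to newest modification time, in the spirit of `ls -ltr`, and marks empty ones. The function name `list_outputs`, the demo file names, and the "EMPTY" marker are illustrative choices.

```python
import os
import tempfile


def list_outputs(directory="."):
    """Return (name, size_in_bytes) for regular files, oldest first."""
    entries = [e for e in os.scandir(directory) if e.is_file()]
    entries.sort(key=lambda e: e.stat().st_mtime)
    return [(e.name, e.stat().st_size) for e in entries]


# Demo in a temporary directory with one empty and one non-empty file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "empty.sam"), "w").close()
    with open(os.path.join(tmp, "reads.fastq"), "w") as fh:
        fh.write("@r1\nACGT\n+\nIIII\n")
    listing = list_outputs(tmp)
    for name, size in listing:
        print(f"{size:>10}  {name}{'  <- EMPTY' if size == 0 else ''}")
```

A zero-byte output file after a step is usually a sign that the step failed silently.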
The end

BIOINFORMATICS WORKFLOW STEPS

  • 1. Today's bioinformatics lesson is brought to you by the letter 'W' by Keith Bradnam Image from flickr.com/91619273@N00/ Today'sbloinformatieslesson isbroughttoyoubytheletter1W1 Imagefromflickr.com/91619273©NO0/
  • 3. A typical bioinformatics workflow Illumina data (FASTQ format) Remove adapter contamination Atypicalbioinformaticsworkflow Removeadaptercontamination
  • 4. A typical bioinformatics workflow Illumina data (FASTQ format) Remove adapter contamination scythe cutadapt trimgalore skewer Btrim Trimmomatic Atypicalbioinformaticsworkflow Removeadaptercontamination scythe cutadapt trimgalore skewer Btrim Trimmomatic
  • 5. A typical bioinformatics workflow Illumina data (FASTQ format) Remove adapter contamination scythe cutadapt trimgalore skewer Btrim Trimmomatic Lots of tools you could use! Atypicalbioinformaticsworkflow Lotsoftools youcoulduse! Removeadaptercontamination scythe cutadapt trimgalore skewer Btrim Trimmomatic
  • 6. Trim reads for low quality bases sickle Qtrim FastQC FastX PRINSEQ Trimmomatic Trimreadsforlowqualitybases sickle Qtrim FastQC FastX PRINSEC) Trimmomatic
  • 7. Map reads to genome/transcriptome BWA Bowtie TopHat SHRiMP BFAST MAQ From ebi.ac.uk/~nf/hts_mappers/ There are a lot of read mappers out there! Fromebi.ac.uk/-nf/hts_mappers/ H I S A T •-JAGuaR • - BWA-PSSM • - - MOSAIK•- - - - - - Hobbes2 • CUSHAW3a- NextGenMap • Subread/Subjunc • CRAC•- SRmapper•- GEM• STAR • ERNE•- BatMelh•- BLASRa- YAHA • SeciAlto • Batmis • Therearealotof DynMaPp O S A • ContextMap•- as?n1 •- RUMa_ readmappersoutthere!StampydrFAST•-Bismark•- •- MapSplicea-REALa-- BS-Seekera-- - B S - S e e k e r 2 - •• Supersplat liceMapRAT • - B R A T - S W -•- BFAST•- segemeht•- GNUMAP•- GenomeMapper•- mrFAST • • - mrsFAST m r s FA S T- L i l t r a - -• - - - - PerM • - - - - - --- RNA-Mate • - - -X-Matea- - - - SBSMAP • - - - - S p l a z e r RazerS • --•--MicroRazerS - • - - • RazerS3 SHRIMPa ——•SHR1MP2-• BWAs - - •BWA-SW CloudBurst • ProbeMatch •• W H A M - • TopHata- T o p H a t 2-•- Bowlie •- B o w t i e 2 •- MOM4- PASS•- P A S S - b i s - -• Slider • - - -Slider-II- ()PALMA • SOCS"- MAO• SegMap • ZOOM• PalMaNa- RMAP• SOAP• —SOAP2--• BWT-SW • - - S O A P S p l i c e - -• Blata- SSAHA• GMAP • Exonerate • Mummer3 • ELAND • GSNAP-a- 20012002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Years
  • 8. Map reads to genome/transcriptome: BWA, Bowtie, TopHat, SHRiMP, BFAST, MAQ. [Figure: the read mapper timeline from ebi.ac.uk/~nf/hts_mappers/, overlaid with the first page of the paper "ARYANA: Aligning Reads by Yet Another Approach".]
  • 9. Filter for uniquely mapped reads: SAMtools, Picard, GATK, Unix.
  • 10. Filter for high quality alignments: SAMtools, Picard, GATK, Unix.
  • 11. Data suitable for final analysis.
  • 12. Some questions you should ask yourself…
  • 14. Why is each of these steps needed?
  • 15. Why should I use tool 'X' at this step?
  • 17. What is the effect of running each step?
  • 18. What is a good result?
  • 19. The effect of applying many 'bioinformatics axes': Illumina data (FASTQ format). 2 FASTQ files. Files are ~6.5 GB. 52.5 million reads total.
  • 20. Remove adapters & trim: 50.1 million reads.
  • 21. Align to transcriptome with Bowtie: 35.8 million reads map.
  • 22. Filter for uniquely mapped reads: 31.4 million reads align uniquely.
  • 23. Filter for high quality alignments: 22.7 million reads have alignment scores of zero.
  • 24. Data suitable for final analysis: reduced data from 52.5 to 22.7 million reads.
  • 25. It can be helpful to know how the different steps in a workflow reduce your data.
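One way to see how each step reduces the data is to tabulate the read counts reported on slides 19 to 24. The following is a minimal Python sketch, not part of the original talk: the counts come from the slides, while the step labels and the `retention_table` helper are illustrative names chosen here.

```python
# Read counts (in millions) as reported on slides 19-24.
# The step labels are paraphrased for this sketch.
steps = [
    ("Raw FASTQ input", 52.5),
    ("Adapters removed and trimmed", 50.1),
    ("Mapped to transcriptome (Bowtie)", 35.8),
    ("Uniquely mapped", 31.4),
    ("High quality alignments", 22.7),
]


def retention_table(steps):
    """Return (label, reads, pct_of_previous_step, pct_of_input) per step."""
    start = steps[0][1]
    previous = start
    rows = []
    for label, reads in steps:
        rows.append((label, reads, 100 * reads / previous, 100 * reads / start))
        previous = reads
    return rows


for label, reads, pct_prev, pct_input in retention_table(steps):
    print(f"{label:<34} {reads:5.1f} M  "
          f"{pct_prev:5.1f}% of previous  {pct_input:5.1f}% of input")
```

The last row shows that about 43% of the input reads survive to the final analysis, and the per-step column makes it easy to spot which step removes the most data.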
  • 28. Run this command after every step of a workflow.
  • 29. Lets you see whether output files were actually created.
  • 30. Lets you see whether output files contain any data.
  • 31. Most recently modified files will be at the bottom of your terminal window.
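The transcript does not preserve the command itself. A long listing sorted by modification time, such as `ls -ltr` on Unix, would match the description on slides 29 to 31, but that is an assumption. The same check can be sketched in Python; the function names `list_by_mtime` and `report` are hypothetical and chosen for this example.

```python
import os
import time


def list_by_mtime(directory="."):
    """Return (name, size_in_bytes, mtime) for regular files, oldest first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            if entry.is_file():
                info = entry.stat()
                entries.append((entry.name, info.st_size, info.st_mtime))
    # Sort so the most recently modified file comes last.
    return sorted(entries, key=lambda e: e[2])


def report(directory="."):
    """Print size, timestamp, and name for each file; flag empty files."""
    for name, size, mtime in list_by_mtime(directory):
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
        flag = "  <-- EMPTY" if size == 0 else ""
        print(f"{size:>12}  {stamp}  {name}{flag}")


if __name__ == "__main__":
    report()
```

Running this after each workflow step shows whether the expected output file exists, whether it has a non-zero size, and puts the newest files at the bottom of the listing.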