Today's bioinformatics lesson
is brought to you by the letter 'D'
by
Keith Bradnam
Image from flickr.com/91619273@N00/
D is for Default parameters
D is also for Danger!
Nobody reads a toaster manual!
But everyone would read a manual for this
Bioinformatics programs are not toasters!
Read the manual!
At least, read *some* of the manual
(Screenshot: the Bowtie home page, bowtie-bio.sourceforge.net. Latest release shown: Bowtie 1.1.1, 10/1/14.)
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
How to use bowtie
bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
Bowtie has a lot of options!
-q  The query input files (specified either as <m1> and <m2>, or as <s>) are FASTQ files (usually having extension .fq or .fastq). This is the default. See also: --solexa-quals and --integer-quals.
-f  The query input files (specified either as <m1> and <m2>, or as <s>) are FASTA files (usually having extension .fa, .mfa, .fna or similar). All quality values are assumed to be 40 on the Phred quality scale.
-r  The query input files (specified either as <m1> and <m2>, or as <s>) are Raw files: one sequence per line, without quality values or names. All quality values are assumed to be 40 on the Phred quality scale.
-c  The query sequences are given on command line. I.e. <m1>, <m2> and <singles> are comma-separated lists of reads rather than lists of read files.
-C/--color  Align in colorspace. Read characters are interpreted as colors. The index specified must be a colorspace index (i.e. built with bowtie-build -C), or bowtie will print an error message and quit. See Colorspace alignment for more details.
-Q/--quals <files>  Comma-separated list of files containing quality values for corresponding unpaired CSFASTA reads. Use in combination with -C and -f. --integer-quals is set automatically when -Q/--quals is specified.
--Q1 <files>  Comma-separated list of files containing quality values for corresponding CSFASTA #1 mates. Use in combination with -C, -f, and -1. --integer-quals is set automatically when --Q1 is specified.
--Q2 <files>  Comma-separated list of files containing quality values for corresponding CSFASTA #2 mates. Use in combination with -C, -f, and -2. --integer-quals is set automatically when --Q2 is specified.
-s/--skip <int>  Skip (i.e. do not align) the first <int> reads or pairs in the input.
-u/--qupto <int>  Only align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped). Default: no limit.
-5/--trim5 <int>  Trim <int> bases from high-quality (left) end of each read before alignment (default: 0).
-3/--trim3 <int>  Trim <int> bases from low-quality (right) end of each read before alignment (default: 0).
--phred33-quals  Input qualities are ASCII chars equal to the Phred quality plus 33. Default: on.
--phred64-quals  Input qualities are ASCII chars equal to the Phred quality plus 64. Default: off.
--solexa-quals  Convert input qualities from Solexa (which can be negative) to Phred (which can't). This is usually the right option for use with (unconverted) reads emitted by GA Pipeline versions prior to 1.3. Default: off.
--solexa1.3-quals  Same as --phred64-quals. This is usually the right option for use with (unconverted) reads emitted by GA Pipeline version 1.3 or later. Default: off.
--integer-quals  Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40..., rather than ASCII characters. Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified.
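The quality options above mostly differ in which ASCII offset is subtracted from each character, which is exactly the kind of default that fails silently. A minimal sketch of the decoding, assuming the standard Phred+33 and Phred+64 encodings and the usual Solexa-to-Phred conversion formula; the function names are made up for illustration and are not part of bowtie:

```python
import math


def decode_phred(quality_string, offset=33):
    """Convert an ASCII quality string to a list of integer scores.

    offset=33 corresponds to --phred33-quals (the bowtie default),
    offset=64 corresponds to --phred64-quals / --solexa1.3-quals.
    """
    return [ord(ch) - offset for ch in quality_string]


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled score (which may be negative) to Phred."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


# The same four scores written in the two encodings.
print(decode_phred("II5+"))             # Phred+33 input
print(decode_phred("hhTJ", offset=64))  # Phred+64 input
```

Decoding Phred+64 data with the default offset of 33 would report every score 31 points too high without raising any error, so it is worth checking which encoding your reads use.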
-k <int>  Report up to <int> valid alignments per read or pair (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment "stratum" will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
-a/--all  Report all valid alignments per read or pair (default: off). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment "stratum" will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower if -a/--all is specified. If you would like to use Bowtie with -a, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
-m <int>  Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
-M <int>  Behaves like -m except that if a read has more than <int> reportable alignments, one is reported at random. In default output mode, the selected alignment's 7th column is set to <int>+1 to indicate the read has at least <int>+1 valid alignments. In -S/--sam mode, the selected alignment is given a MAPQ (mapping quality) of 0 and the XM:I field is set to <int>+1. This option requires --best; if specified without --best, --best is enabled automatically.
--best  Make Bowtie guarantee that reported singleton alignments are "best" in terms of stratum (i.e. number of mismatches, or mismatches in the seed in the case of -n mode) and in terms of the quality values at the mismatched position(s). Stratum always trumps quality; e.g. a 1-mismatch alignment where the mismatched position has Phred quality 40 is preferred over a 2-mismatch alignment where the mismatched positions both have Phred quality 10. When --best is not specified, Bowtie may report alignments that are sub-optimal in terms of stratum and/or quality (though an effort is made to report the best alignment). --best mode also removes all strand bias. Note that --best does not affect which alignments are considered "valid" by bowtie, only which valid alignments are reported by bowtie. When --best is specified and multiple hits are allowed (via -k or -a), the alignments for a given read are guaranteed to appear in best-to-worst order in bowtie's output. bowtie is somewhat slower when --best is specified.
--strata  If many valid alignments exist and are reportable (e.g. are not disallowed via the -k option) and they fall into more than one alignment "stratum", report only those alignments that fall into the best stratum. By default, Bowtie reports all reportable alignments regardless of whether they fall into multiple strata. When --strata is specified, --best must also be specified.
-v <int>  Report alignments with at most <int> mismatches. -e and -l options are ignored and quality values have no effect on what alignments are valid. -v is mutually exclusive with -n.
-n/--seedmms <int>  Maximum number of mismatches permitted in the "seed", i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. This option is mutually exclusive with the -v option.
-e/--maqerr <int>  Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
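The -e/--maqerr rule above is easy to misjudge by eye, because the qualities are rounded before they are summed. A minimal sketch of that arithmetic, assuming that ties (e.g. a quality of 15) round upward; the manual text does not say how ties are handled, and the function names are hypothetical:

```python
def maq_round(quality):
    """Round a Phred quality to the nearest 10, saturating at 30.

    Assumption: a value exactly halfway (5, 15, 25) rounds upward.
    """
    return min(30, ((quality + 5) // 10) * 10)


def passes_maqerr(mismatch_qualities, maqerr=70, maq_rounding=True):
    """Return True if the summed mismatch qualities do not exceed -e."""
    if maq_rounding:
        mismatch_qualities = [maq_round(q) for q in mismatch_qualities]
    return sum(mismatch_qualities) <= maqerr


# Two mismatches at Phred 40: each counts as 30 after rounding.
print(passes_maqerr([40, 40]))                      # 60 <= 70
print(passes_maqerr([40, 40], maq_rounding=False))  # 80 > 70
```

Under the default settings the same pair of mismatches is accepted with rounding and rejected with --nomaqround, so a single flag changes which alignments count as valid.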
-l/--seedlen <int>  The "seed length"; i.e., the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
--nomaqround  Maq accepts quality values in the Phred quality scale, but internally rounds values to the nearest 10, with a maximum of 30. By default, bowtie also rounds this way. --nomaqround prevents this rounding in bowtie.
-I/--minins <int>  The minimum insert size for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates. Default: 0.
-X/--maxins <int>  The maximum insert size for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates. Default: 250.
--fr/--rf/--ff  The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met, that alignment is valid. Also, if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate1 and a downstream mate2 to be forward-oriented. Default: --fr when -C (colorspace alignment) is not specified, --ff when -C is specified.
--nofw/--norc  If --nofw is specified, bowtie will not attempt to align against the forward reference strand. If --norc is specified, bowtie will not attempt to align against the reverse-complement reference strand. For paired-end reads using --fr or --rf modes, --nofw and --norc apply to the forward and reverse-complement pair orientations. I.e. specifying --nofw and --fr will only find reads in the R/F orientation where mate 2 occurs upstream of mate 1 with respect to the forward reference strand.
--maxbts  The maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode (default: 125 without --best, 800 with --best). A "backtrack" is the introduction of a speculative substitution into the alignment. Without this limit, the default ...
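The -I/--minins and -X/--maxins examples in the manual text above reduce to one sum: both mates plus the gap between them. A minimal sketch that reproduces those worked examples, using the Bowtie 1 defaults quoted above (0 and 250); the function names are hypothetical:

```python
def fragment_length(mate1_len, mate2_len, gap):
    """Total insert size: both mates plus the gap between them."""
    return mate1_len + gap + mate2_len


def insert_size_ok(mate1_len, mate2_len, gap, minins=0, maxins=250):
    """Check a paired-end alignment against -I/--minins and -X/--maxins."""
    size = fragment_length(mate1_len, mate2_len, gap)
    return minins <= size <= maxins


# The manual's own examples: two 20-bp mates.
print(insert_size_ok(20, 20, 20, minins=60))   # 60 bp insert, allowed by -I 60
print(insert_size_ok(20, 20, 19, minins=60))   # 59 bp insert, too short
print(insert_size_ok(20, 20, 60, maxins=100))  # 100 bp insert, allowed by -X 100
print(insert_size_ok(20, 20, 61, maxins=100))  # 101 bp insert, too long
```

Any read pair whose fragment is longer than -X is simply not reported as a valid pair, which is why this default deserves a second look for every new library.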
-t/--time  Print the amount of wall-clock time taken by each phase.
-B/--offbase <int>  When outputting alignments in Bowtie format, consider the first base of a reference sequence to have offset <int>. This option has no effect in -S/--sam mode, since SAM mandates 1-based offsets. Default: 0.
--quiet  Print nothing besides alignments.
--refout  Write alignments to a set of files named refXXXXX.map, where XXXXX is the 0-padded index of the reference sequence aligned to. This can be a useful way to break up work for downstream analyses when dealing with, for example, large numbers of reads aligned to the assembled human genome. If <hits> is also specified, it will be ignored.
--refidx  When a reference sequence is referred to in a reported alignment, refer to it by 0-based index (its offset into the list of references that were indexed) rather than by name.
--al <filename>  Write all reads for which at least one alignment was reported to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is aligned.fq, the #1 and #2 mates that align at least once will be written to aligned_1.fq and aligned_2.fq respectively.
--un <filename>  Write all reads that could not be aligned to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within Bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is unaligned.fq, the #1 and #2 mates that fail to align will be written to unaligned_1.fq and unaligned_2.fq respectively. Unless --max is also specified, reads with a number of valid alignments exceeding the limit set with the -m option are also written to <filename>.
--max <filename>  Write all reads with a number of valid alignments exceeding the limit set with the -m option to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is max.fq, the #1 and #2 mates that exceed the -m limit will be written to max_1.fq and max_2.fq respectively. These reads are not written to the file specified with --un.
--suppress <cols>  Suppress columns of output in the default output mode. E.g. if --suppress 1,5,6 is specified, the read name, read sequence, and read quality fields will be omitted. See Default Bowtie output for field descriptions. This option is ignored if the output mode is -S/--sam.
--fullref  Print the full reference sequence name, including whitespace, in alignment output. By default bowtie prints everything up to but not including the first whitespace.
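The --al, --un and --max descriptions above all rely on the same naming rule for paired-end output, which matters when a downstream script has to find those files. A minimal sketch of that rule as the manual text describes it, assuming the suffix goes immediately before the final extension; how bowtie treats names without an extension is not covered by the text above, and the function name is hypothetical:

```python
import os


def mate_filenames(filename):
    """Insert _1 and _2 before the extension of <filename>.

    Assumption: the suffix is placed before the last extension only.
    """
    root, ext = os.path.splitext(filename)
    return f"{root}_1{ext}", f"{root}_2{ext}"


print(mate_filenames("unaligned.fq"))
print(mate_filenames("max.fq"))
```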
Colorspace
--snpphred <int>  When decoding colorspace alignments, use <int> as the SNP penalty. This should be set to the user's best guess of the true ratio of SNPs per base in the subject genome, converted to the Phred quality scale. E.g., if the user expects about 1 SNP every 1,000 positions, --snpphred should be set to 30 (which is also the default). To specify the fraction directly, use --snpfrac.
--snpfrac <dec>  When decoding colorspace alignments, use <dec> as the estimated ratio of SNPs per base. For best decoding results, this should be set to the user's best guess of the true ratio. bowtie internally converts the ratio to a Phred quality, and behaves as if that quality had been set via the --snpphred option. Default: 0.001.
--col-cseq  If reads are in colorspace and the default output mode is active, --col-cseq causes the reads' color sequence to appear in the read-sequence column (column 5) instead of the decoded nucleotide sequence. See the Decoding colorspace alignments section for details about decoding. This option is ignored in -S/--sam mode.
--col-cqual  If reads are in colorspace and the default output mode is active, --col-cqual causes the reads' original (color) quality sequence to appear in the quality column (column 6) instead of the decoded qualities. See the Colorspace alignment section for details about decoding. This option is ignored in -S/--sam mode.
--col-keepends  When decoding colorspace alignments, bowtie trims off a nucleotide and quality from the left and right edges of the alignment. This is because those nucleotides are supported by only one color, in contrast to the middle nucleotides which are supported by two. Specify --col-keepends to keep the extreme-end nucleotides and qualities.
SAM
-S/--sam  Print alignments in SAM format. See the SAM output section of the manual for details. To suppress all SAM headers, use --sam-nohead in addition to -S/--sam. To suppress just the @SQ headers (e.g. if the alignment is against a very large number of reference sequences), use --sam-nosq in addition to -S/--sam. bowtie does not write BAM files directly, but SAM output can be converted to BAM on the fly by piping bowtie's output to samtools view. -S/--sam is not compatible with --refout.
--mapq <int>  If an alignment is non-repetitive (according to -m, --strata and other options) set the MAPQ (mapping quality) field to this value. See the SAM Spec for details about the MAPQ field. Default: 255.
--sam-nohead  Suppress header lines (starting with @) when output is -S/--sam. This must be specified in addition to -S/--sam. --sam-nohead is ignored unless -S/--sam is also specified.
--sam-nosq  Suppress @SQ header lines when output is -S/--sam. This must be specified in addition to -S/--sam. --sam-nosq is ignored unless -S/--sam is also specified.
--sam-RG <text>  Add <text> (usually of the form TAG:VAL, e.g. ID:IL7LANE2) as a field on the @RG header line. Specify --sam-RG multiple times to set multiple fields. See the SAM Spec for details about what fields are legal. Note that, if any @RG fields are set using this option, the ID and SM fields must both be among them to make the @RG line legal according to the SAM Spec. --sam-RG is ignored unless -S/--sam is also specified.
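The --snpphred and --snpfrac options in the Colorspace group above are two views of the same number, linked by the Phred formula Q = -10 log10(p). A minimal sketch of the conversion, checking the manual's example of 1 SNP per 1,000 positions; the function names are hypothetical:

```python
import math


def frac_to_phred(snp_frac):
    """Convert a per-base SNP fraction to the Phred scale."""
    return -10 * math.log10(snp_frac)


def phred_to_frac(phred):
    """Convert a Phred-scaled value back to a per-base fraction."""
    return 10 ** (-phred / 10)


print(frac_to_phred(0.001))  # the --snpfrac default, on the Phred scale
print(phred_to_frac(30))     # the --snpphred default, as a fraction
```

A genome with about 1 SNP per 100 positions would correspond to a penalty near 20 rather than the default of 30, so the default quietly assumes a particular level of divergence.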
Performance
-o/--offrate <int>  Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.
-p/--threads <int>  Launch <int> parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is fairly close to linear. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time).
--mm  Use memory-mapped I/O to load the index, rather than normal C file I/O. Memory-mapping the index allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible.
--shmem  Use shared memory to load the index, rather than normal C file I/O. Using shared memory allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not desirable. Unlike --mm, --shmem installs the index into shared memory permanently, or until the user deletes the shared memory chunks manually. See your operating system documentation for details on how to manually list and remove shared memory chunks (on Linux and Mac OS X, these commands are ipcs and ipcrm). You may also need to increase your OS's maximum shared-memory chunk size to accommodate larger indexes; see your OS documentation.
Other
--seed <int>  Use <int> as the seed for pseudo-random number generator.
--verbose  Print verbose output (for debugging).
--version  Print version information and quit.
-h/--help  Print usage information and quit.
flickr.com/photos/dannyjackson
"I'll just use the default parameters!"
"What could go wrong?"
First, some terminology…
(Diagram labels: adapter, DNA/RNA fragment, adapter; Read 1 and Read 2; 'insert'; inner-mate pair distance.)
We can plot the distribution
of inner mate pair distances
(Plot: reads mapped to transcriptome with Bowtie 2. X-axis: inner size between mapped read pairs, with ticks at 200, 400, 600 and 800.)
Notice anything unusual?
Bowtie 2 has an -X option
for 'max fragment length'
The default value is 500 bp
= 100 + 100 + 300
What happens if we
increase -X to 2000 bp?
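The "100 + 100 + 300" sum above can be turned around to ask how large an inner distance a given -X setting can ever report. A minimal sketch, assuming 100-bp reads as on the slide; the index and file names are placeholders, and only the standard Bowtie 2 flags -x, -1, -2, -X and -S are used:

```python
def max_inner_distance(max_fragment, read1_len, read2_len):
    """Largest inner mate distance that still fits within -X."""
    return max_fragment - read1_len - read2_len


def bowtie2_paired_command(index, reads1, reads2, out_sam, max_fragment=500):
    """Build a Bowtie 2 paired-end command with -X stated explicitly."""
    return ["bowtie2", "-x", index, "-1", reads1, "-2", reads2,
            "-X", str(max_fragment), "-S", out_sam]


print(max_inner_distance(500, 100, 100))   # default -X
print(max_inner_distance(2000, 100, 100))  # -X 2000
# Placeholder file names, for illustration only.
print(" ".join(bowtie2_paired_command(
    "transcriptome", "reads_1.fq", "reads_2.fq", "out.sam", max_fragment=2000)))
```

With the default, no pair of 100-bp reads can be reported with an inner distance above 300 bp, whatever the real library looks like.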
New data!
(Plot: reads mapped to transcriptome with Bowtie 2. X-axis: inner size between mapped read pairs, with ticks at 0, 200, 400, 600 and 800.)
Most programs will have some options
that you should consider changing
Some options from TopHat

TopHat command-line option    Meaning                                                      Default value
--num-threads                 How many CPU threads to use when running TopHat              1
--min-intron-length           Minimum intron length                                        70
-r / --mate-inner-dist        Expected (mean) inner distance between mate pairs            50
--mate-std-dev                Standard deviation for the distribution on inner distances   20
You nearly always can run with more
processors/threads than the default (1)
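One way to stop relying on defaults is to write every option into the command, even the ones you leave unchanged. A minimal sketch using only the four TopHat options and default values from the table above; the index and read file names are placeholders, and the helper is hypothetical rather than part of TopHat:

```python
import os

# Defaults as listed on the slide.
TOPHAT_DEFAULTS = {
    "--num-threads": 1,
    "--min-intron-length": 70,
    "--mate-inner-dist": 50,
    "--mate-std-dev": 20,
}


def tophat_command(index, reads1, reads2, **overrides):
    """Build a TopHat command that states every option explicitly."""
    options = dict(TOPHAT_DEFAULTS)
    for name, value in overrides.items():
        flag = "--" + name.replace("_", "-")
        if flag not in options:
            raise ValueError(f"unknown option: {flag}")
        options[flag] = value
    argv = ["tophat"]
    for flag, value in options.items():
        argv += [flag, str(value)]
    return argv + [index, reads1, reads2]


# Use every available core instead of the default single thread.
print(" ".join(tophat_command(
    "genome_index", "reads_1.fq", "reads_2.fq",
    num_threads=os.cpu_count() or 1)))
```

Because the unchanged defaults also appear in the command, anyone reading the record later can see exactly which values were used.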
The default minimum intron length (70) might not be suitable for non-vertebrates
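The two mate-distance options from the TopHat table (-r / --mate-inner-dist and --mate-std-dev) describe your library, so they can be estimated from data rather than left at 50 and 20. A minimal sketch, assuming you already have a list of inner distances from a trial mapping; the numbers below are invented for illustration, and the function name is hypothetical:

```python
import statistics


def mate_distance_options(inner_distances):
    """Estimate --mate-inner-dist and --mate-std-dev from observed distances."""
    mean = round(statistics.mean(inner_distances))
    std_dev = round(statistics.stdev(inner_distances))
    return ["--mate-inner-dist", str(mean), "--mate-std-dev", str(std_dev)]


# Invented inner distances (bp) from a hypothetical trial mapping.
observed = [100, 200, 300]
print(mate_distance_options(observed))
```

A trial mapping used for this purpose needs a generous maximum fragment length, otherwise the estimate only reflects the pairs the mapper was allowed to report.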
D is for Documentation
You should document your efforts!
You should document as you go
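Documenting as you go is easier when recording a command costs one extra line of code. A minimal sketch that formats a dated entry and appends it to a plain-text log; the log file name, the note text and the function names are all hypothetical:

```python
import shlex
from datetime import datetime


def format_entry(argv, note="", when=None):
    """Format one log line: timestamp, the exact command, optional note."""
    when = when or datetime.now()
    entry = f"{when.strftime('%Y-%m-%d %H:%M')}  {shlex.join(argv)}"
    if note:
        entry += f"  # {note}"
    return entry


def log_command(log_path, argv, note=""):
    """Append a formatted entry to a plain-text log file."""
    entry = format_entry(argv, note)
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
    return entry


# Show what an entry looks like (nothing is written to disk here).
print(format_entry(["bowtie2", "-X", "2000"], note="raised max fragment length"))
```

Calling `log_command("analysis_log.txt", argv, note)` just before running each step leaves a record of every parameter that was actually used.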
Lab books are good…
…but electronic lab books are more helpful
Tools like Microsoft Word
might not be future proof
Consider using plain text files
I.e. something that can be read using 'less'
I like to write README files in Markdown format for everything
Milk-DNase-Seq-Project: RNA-Seq Analysis
========================================

See main [README](README.md) file for more information about this project.

## Bovine RNA-seq data ##

Stored in `/share/tamu/Data/RNA-Seq/Cow/2014-10`. Looks like paired-read 100 bp data. In total 31 x 2 files, ranging from 1-3.5 GB in size. See also the `/share/tamu/Data/RNA-Seq/Cow/Metadata` directory which contains a metadata file which suggests that we have data from 15 virgin cows and 16 lactating cows.

The ultimate goal is to find genes that are differentially expressed between these two developmental stages.

These files were originally compressed with bzip2, will re-compress with gzip so that existing pipelines can work with them. And will also rename them to have fastq suffix:

```bash
cd /share/tamu/Data/RNA-Seq/Cow/2014-10
bunzip2 *.bz2
rename.pl s/txt/fastq/ *.txt
gzip *.fastq
```

## Checking barcodes in RNA-Seq data ##

Let's check on all barcodes being used. Will make some soft links to the data files

```bash
cd Analysis/Test
mkdir RNA-Seq_Barcode_check
cd RNA-Seq_Barcode_check
qlogin
bunzip2 -c ../../../Data/RNA-Seq/Cow/2014-10/*.bz2 | grep "@HWI" | sed 's/.* [12]:N:0://' | sort | uniq -c > barcodes_in_identifiers.txt
```

Unfortunately this failed due to a 'No space left on device' error. So maybe need to treat each file separately.
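One way to treat each file separately is to stream the compressed files one at a time and keep only a running tally of barcodes. A minimal sketch, assuming the disk-space error came from temporary files written by `sort` (the README does not say) and that headers look like `@HWI... 1:N:0:BARCODE`; the header in the example is invented, the function names are hypothetical, and unlike `grep "@HWI"` the pattern only matches at the start of a line:

```python
import bz2
import glob
import re
from collections import Counter

# Header lines start with @HWI and end with " 1:N:0:<barcode>" or " 2:N:0:<barcode>".
BARCODE = re.compile(r"@HWI.* [12]:N:0:(\S*)")


def count_barcodes(lines, counts=None):
    """Tally the barcode from every matching FASTQ header line."""
    counts = Counter() if counts is None else counts
    for line in lines:
        match = BARCODE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_barcodes_in_files(pattern):
    """Stream each bzip2 file in turn; nothing is decompressed to disk."""
    counts = Counter()
    for path in sorted(glob.glob(pattern)):
        with bz2.open(path, "rt") as handle:
            count_barcodes(handle, counts)
    return counts


# Invented header line, for illustration only.
print(count_barcodes(["@HWI-EXAMPLE:1:1101:1000:2000 1:N:0:ACGTAC\n"]))
```

Calling `count_barcodes_in_files("../../../Data/RNA-Seq/Cow/2014-10/*.bz2")` would then cover the same files as the shell pipeline, holding only the distinct barcodes in memory.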
## Test run of Scythe and Sickle ##
Easy to output to HTML or PDF
(Screenshot: the same README rendered as HTML, with formatted headings and code blocks. The rendered view also shows the start of the next section, reproduced here.)

Test run of Scythe and Sickle

Unlike the DNase-Seq data, we now have paired-end data, which requires running Sickle a little differently. So first, let's do a test (using 10,000 reads from each of two paired FASTQ files):

    cd /share/tamu/Analysis/Test
    mkdir Paired_end_scythe_sickle_test
http://korflab.ucdavis.edu/bootcamp.md
http://korflab.ucdavis.edu/bootcamp.html
Markdown is easy to read, and converts to
useful HTML (with hyperlinks and formatting)
Title: Command-line Bootcamp
Authors: Keith Bradnam
Date: 2015-06-14
Address: Genome Center, UC Davis, Davis, CA, 95616

# Command-line Bootcamp
### Keith Bradnam
### UC Davis Genome Center
### Version 1.0 --- June 2015

<br><br><br>

><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. Please send feedback, questions, money, or abuse to <krbradnam@ucdavis.edu>

## Introduction [Introduction]

This 'bootcamp' is intended to provide the reader with a basic overview of essential Unix/Linux commands that will allow them to navigate a file system and move, copy, edit files. It will also introduce a brief overview of some 'power' commands in Unix.

## Why Unix? [Why Unix]

The [Unix operating system][Unix] has been around since 1969. Back then there was no such thing as a graphical user interface. You typed everything. It may seem archaic to use a keyboard to issue commands today, but it's much easier to automate keyboard tasks than mouse tasks. There are several variants of Unix (including [Linux][]), though the differences do not matter much for most basic functions.

[Unix]: http://en.wikipedia.org/wiki/Unix
[Linux]: http://en.wikipedia.org/wiki/Linux

Increasingly, the raw output of biological research exists as _in silico_ data, usually in the form of large text files. Unix is particularly suited to working with such files and has several powerful (and flexible) commands that can process your data for you. The real strength of learning Unix is that most of these commands can be combined in an almost unlimited fashion. So if you can learn just five Unix commands, you will be able to do a lot more than just five things.

## Typeset Conventions [Typeset]

Command-line examples that you are meant to type into a terminal window will be shown ...
Command-line Bootcamp
Keith Bradnam
UC Davis Genome Center
Version 1.0 - June 2015
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please send
feedback, questions, money, or abuse to krbradnam@ucdavis.edu
Introduction
This 'bootcamp' is intended to provide the reader with a basic overview of essential
Unix/Linux commands that will allow them to navigate a file system and move, copy,
edit files. It will also introduce a brief overview of some 'power' commands in Unix.
Why Unix?
The Unix operating system has been around since 1969. Back then there was no
such thing as a graphical user interface. You typed everything. It may seem archaic to
use a keyboard to issue commands today, but it's much easier to automate keyboard
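The bootcamp's point that a few Unix commands combine into far more than a few tasks can be sketched with a short pipeline. This is only an illustration, not part of the bootcamp text: the sample data and the function name top_values are made up for the example.

```shell
# Build a tiny tab-separated file (hypothetical data, for illustration only).
demo_file=$(mktemp)
printf 'chr1\tgene\nchr2\tgene\nchr1\texon\nchr1\tgene\n' > "$demo_file"

# Combine four commands into one task: cut extracts column 1, sort groups
# identical values together, uniq -c counts each group, and sort -rn ranks
# the groups from most to least frequent.
top_values() {
  cut -f 1 "$1" | sort | uniq -c | sort -rn
}

top_values "$demo_file"
```

None of the four commands counts values per column on its own; the pipeline does, and swapping any single stage gives a different tool.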
Sites like GitHub use Markdown
KorfLab/Milk-DNase-Seq-Project
Milk-DNase-Seq-Project: RNA-Seq Analysis
See main README file for more information about this project.
Bovine RNA-seq data
Stored in /share/tamu/Data/RNA-Seq/Cow/2014-10. Looks like paired-read 100 bp data. In total 31 x 2 files, ranging
from 1-3.5 GB in size. See also the /share/tamu/Data/RNA-Seq/Cow/Metadata directory which contains a metadata
file which suggests that we have data from 15 virgin cows and 16 lactating cows.
The ultimate goal is to find genes that are differentially expressed between these two developmental stages.
These files were originally compressed with bzip2, will re-compress with gzip so that existing pipelines can work with
them. And will also rename them to have fastq suffix:
    cd /share/tamu/Data/RNA-Seq/Cow/2014-10
    bunzip2 *.bz2
    rename.pl s/txt/fastq/ *.txt
    gzip *.fastq
Checking barcodes in RNA-Seq data
Let's check on all barcodes being used. Will make some soft links to the data files
    cd Analysis/Test
    mkdir RNA-Seq_Barcode_check
    cd RNA-Seq_Barcode_check
    qlogin
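The renaming step in the README relies on rename.pl, which appears to be a local script that applies a Perl substitution to file names; its exact behaviour is an assumption here. Where no such script is available, a similar suffix change can be sketched in plain shell. The function name rename_txt_to_fastq is hypothetical, and unlike a substitution on the first "txt" in the name, it only touches a trailing .txt suffix.

```shell
# Rename every *.txt file in a directory to *.fastq using only the shell.
# rename_txt_to_fastq is a hypothetical helper, not part of the README.
rename_txt_to_fastq() {
  dir=$1
  for f in "$dir"/*.txt; do
    # Skip the unexpanded pattern when the directory has no .txt files.
    [ -e "$f" ] || continue
    # ${f%.txt} strips the trailing .txt before the new suffix is added.
    mv -- "$f" "${f%.txt}.fastq"
  done
}
```

Calling rename_txt_to_fastq on the data directory would then take the place of the rename.pl line, with the bunzip2 and gzip steps unchanged.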
Reproducible science is important!
Reviewers increasingly want more
details regarding bioinformatics methods
Make it easy for others to follow your work
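One way to make work easier to follow is to keep a record of every command exactly as it was run, options included. The sketch below is an assumption-laden illustration rather than anything from the slides: run_logged and the COMMAND_LOG variable are hypothetical names, and the sort command simply stands in for a real analysis tool.

```shell
# run_logged is a hypothetical helper: it appends a UTC timestamp and the
# exact command line to a log file, then runs the command unchanged.
run_logged() {
  log=${COMMAND_LOG:-commands.log}
  printf '%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$log"
  "$@"
}

# Usage: spell out every option rather than relying on defaults, so the
# log shows which parameters were actually used.
COMMAND_LOG=$(mktemp)
run_logged sort -k1,1 -n /dev/null
```

The resulting log can be pasted into a methods section or committed alongside a README, so that parameter choices are stated rather than left to a program's defaults.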
The end

This bioinformatics lesson is brought to you by the letter 'D'

  • 1. Today's bioinformatics lesson is brought to you by the letter 'D' by Keith Bradnam. Image from flickr.com/91619273@N00/
  • 2. D is for Default parameters
  • 3. D is also for Danger!
  • 4.
  • 5. [Image: text from a toaster manual]
  • 6. X
  • 7. X Nobody reads a toaster manual!
  • 8.
  • 9. But everyone would read a manual for this
  • 10. Bioinformatics programs are not toasters!
  • 12. At least, read *some* of the manual
  • 13. Bowtie: An ultrafast memory-efficient short read aligner. Johns Hopkins University. Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end). OSI certified. Recent news: Lighter released. Lighter is an extremely fast and memory-efficient program for correcting sequencing errors in DNA sequencing data. For details on how error correction can help improve the speed and accuracy of downstream analysis tools, see the paper in Genome Biology. Source and software available at GitHub. 1.1.1 - 10/1/2014: Fixed a compiling linkage problem related with Mac OS X Mavericks. Improved performance for cases where the reference contains many stretches of Ns. Some minor automatic tests updates. 1.1.0 - 7/19/2014: Added support for large and small indexes, removing 4-billion-nucleotide barrier. Bowtie can now be used with reference genomes of any size. No longer releasing 32-bit binaries. Simplified manual and Makefile accordingly. Phased out CygWin support. Improved efficiency of index files loading. Fixed a bug that made bowtie-inspect fail in some situations. (This release was briefly given version number 2.0.0, but we changed it to 1.1.0 to avoid confusion with Bowtie 2.) 1.0.1 release - 3/14/2014. bowtie-bio.sourceforge.net. Site Map: Home, News archive, Getting started, Manual, Tools that use Bowtie. Latest Release: Bowtie 1.1.1, 10/1/14. Please cite: Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25. For release updates, subscribe to the mailing list.
Related Tools: Bowtie 2: Fast, accurate read alignment. Crossbow: Genotyping, cloud computing. Tophat: RNA-Seq splice junction mapper. Cufflinks: Isoform assembly, quantitation. Myrna: Cloud, differential gene expression. Lighter: Fast error correction. Other tools using Bowtie. Pre-built indexes: Consider using Illumina's iGenomes collection. Each iGenomes archive contains pre-built Bowtie and Bowtie 2 indexes. H. sapiens, NCBI GRCh38, 2.7 GB
  • 14. How to use bowtie: bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
  • 15. How to use bowtie: bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
  • 16. bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>] Bowtie has a lot of options!
  • 17. Bowtie input options:
      -q: The query input files (specified either as <m1> and <m2>, or as <s>) are FASTQ files (usually having extension .fq or .fastq). This is the default. See also: --solexa-quals and --integer-quals.
      -f: The query input files (specified either as <m1> and <m2>, or as <s>) are FASTA files (usually having extension .fa, .mfa, .fna or similar). All quality values are assumed to be 40 on the Phred quality scale.
      -r: The query input files (specified either as <m1> and <m2>, or as <s>) are Raw files: one sequence per line, without quality values or names. All quality values are assumed to be 40 on the Phred quality scale.
      -c: The query sequences are given on command line. I.e. <m1>, <m2> and <singles> are comma-separated lists of reads rather than lists of read files.
      -C/--color: Align in colorspace. Read characters are interpreted as colors. The index specified must be a colorspace index (i.e. built with bowtie-build -C), or bowtie will print an error message and quit. See Colorspace alignment for more details.
      -Q/--quals <files>: Comma-separated list of files containing quality values for corresponding unpaired CSFASTA reads. Use in combination with -C and -f. --integer-quals is set automatically when -Q/--quals is specified.
      --Q1 <files>: Comma-separated list of files containing quality values for corresponding CSFASTA #1 mates. Use in combination with -C, -f, and -1. --integer-quals is set automatically when --Q1 is specified.
      --Q2 <files>: Comma-separated list of files containing quality values for corresponding CSFASTA #2 mates. Use in combination with -C, -f, and -2. --integer-quals is set automatically when --Q2 is specified.
      -s/--skip <int>: Skip (i.e. do not align) the first <int> reads or pairs in the input.
      -u/--qupto <int>: Only align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped). Default: no limit.
      -5/--trim5 <int>: Trim <int> bases from high-quality (left) end of each read before alignment (default: 0).
      -3/--trim3 <int>: Trim <int> bases from low-quality (right) end of each read before alignment (default: 0).
      --phred33-quals: Input qualities are ASCII chars equal to the Phred quality plus 33. Default: on.
      --phred64-quals: Input qualities are ASCII chars equal to the Phred quality plus 64. Default: off.
      --solexa-quals: Convert input qualities from Solexa (which can be negative) to Phred (which can't). This is usually the right option for use with (unconverted) reads emitted by GA Pipeline versions prior to 1.3. Default: off.
      --solexa1.3-quals: Same as --phred64-quals. This is usually the right option for use with (unconverted) reads emitted by GA Pipeline version 1.3 or later. Default: off.
      --integer-quals: Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40..., rather than ASCII characters. Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified.
  • 18. -k <int>: Report up to <int> valid alignments per read or pair (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment "stratum" will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
    -a/--all: Report all valid alignments per read or pair (default: off). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment "stratum" will be reported. Bowtie is designed to be very fast for small -k but bowtie can become significantly slower if -a/--all is specified. If you would like to use Bowtie with -a, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
    -m <int>: Suppress all alignments for a particular read or pair if more than <int> reportable alignments exist for it. Reportable alignments are those that would be reported given the -n, -v, -l, -e, -k, -a, --best, and --strata options. Default: no limit. Bowtie is designed to be very fast for small -m but bowtie can become significantly slower for larger values of -m. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
    -M <int>: Behaves like -m except that if a read has more than <int> reportable alignments, one is reported at random. In default output mode, the selected alignment's 7th column is set to <int>+1 to indicate the read has at least <int>+1 valid alignments. In -S/--sam mode, the selected alignment is given a MAPQ (mapping quality) of 0 and the XM:I field is set to <int>+1. This option requires --best; if specified without --best, --best is enabled automatically.
    --best: Make Bowtie guarantee that reported singleton alignments are "best" in terms of stratum (i.e. number of mismatches, or mismatches in the seed in the case of -n mode) and in terms of the quality values at the mismatched position(s). Stratum always trumps quality; e.g. a 1-mismatch alignment where the mismatched position has Phred quality 40 is preferred over a 2-mismatch alignment where the mismatched positions both have Phred quality 10. When --best is not specified, Bowtie may report alignments that are sub-optimal in terms of stratum and/or quality (though an effort is made to report the best alignment). --best mode also removes all strand bias. Note that --best does not affect which alignments are considered "valid" by bowtie, only which valid alignments are reported by bowtie. When --best is specified and multiple hits are allowed (via -k or -a), the alignments for a given read are guaranteed to appear in best-to-worst order in bowtie's output. bowtie is somewhat slower when --best is specified.
    --strata: If many valid alignments exist and are reportable (e.g. are not disallowed via the -k option) and they fall into more than one alignment "stratum", report only those alignments that fall into the best stratum. By default, Bowtie reports all reportable alignments regardless of whether they fall into multiple strata. When --strata is specified, --best must also be specified.
  • 19. -v <int>: Report alignments with at most <int> mismatches. -e and -l options are ignored and quality values have no effect on what alignments are valid. -v is mutually exclusive with -n.
    -n/--seedmms <int>: Maximum number of mismatches permitted in the "seed", i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. This option is mutually exclusive with the -v option.
    -e/--maqerr <int>: Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
    -l/--seedlen <int>: The "seed length"; i.e., the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
    --nomaqround: Maq accepts quality values in the Phred quality scale, but internally rounds values to the nearest 10, with a maximum of 30. By default, bowtie also rounds this way. --nomaqround prevents this rounding in bowtie.
    -I/--minins <int>: The minimum insert size for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates. Default: 0.
    -X/--maxins <int>: The maximum insert size for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates. Default: 250.
    --fr/--rf/--ff: The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met, that alignment is valid. Also, if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate1 and a downstream mate2 to be forward-oriented. Default: --fr when -C (colorspace alignment) is not specified, --ff when -C is specified.
    --nofw/--norc: If --nofw is specified, bowtie will not attempt to align against the forward reference strand. If --norc is specified, bowtie will not attempt to align against the reverse-complement reference strand. For paired-end reads using --fr or --rf modes, --nofw and --norc apply to the forward and reverse-complement pair orientations. I.e. specifying --nofw and --fr will only find reads in the R/F orientation where mate 2 occurs upstream of mate 1 with respect to the forward reference strand.
    --maxbts: The maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode (default: 125 without --best, 800 with --best). A "backtrack" is the introduction of a speculative substitution into the alignment. Without this limit, the default
  • 20. -t/--time: Print the amount of wall-clock time taken by each phase.
    -B/--offbase <int>: When outputting alignments in Bowtie format, consider the first base of a reference sequence to have offset <int>. This option has no effect in -S/--sam mode, since SAM mandates 1-based offsets. Default: 0.
    --quiet: Print nothing besides alignments.
    --refout: Write alignments to a set of files named refXXXXX.map, where XXXXX is the 0-padded index of the reference sequence aligned to. This can be a useful way to break up work for downstream analyses when dealing with, for example, large numbers of reads aligned to the assembled human genome. If <hits> is also specified, it will be ignored.
    --refidx: When a reference sequence is referred to in a reported alignment, refer to it by 0-based index (its offset into the list of references that were indexed) rather than by name.
    --al <filename>: Write all reads for which at least one alignment was reported to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is aligned.fq, the #1 and #2 mates that align at least once will be written to aligned_1.fq and aligned_2.fq respectively.
    --un <filename>: Write all reads that could not be aligned to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within Bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is unaligned.fq, the #1 and #2 mates that fail to align will be written to unaligned_1.fq and unaligned_2.fq respectively. Unless --max is also specified, reads with a number of valid alignments exceeding the limit set with the -m option are also written to <filename>.
    --max <filename>: Write all reads with a number of valid alignments exceeding the limit set with the -m option to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is max.fq, the #1 and #2 mates that exceed the -m limit will be written to max_1.fq and max_2.fq respectively. These reads are not written to the file specified with --un.
    --suppress <cols>: Suppress columns of output in the default output mode. E.g. if --suppress 1,5,6 is specified, the read name, read sequence, and read quality fields will be omitted. See Default Bowtie output for field descriptions. This option is ignored if the output mode is -S/--sam.
    --fullref: Print the full reference sequence name, including whitespace, in alignment output. By default bowtie prints everything up to but not including the first whitespace.
  • 21. Colorspace
    --snpphred <int>: When decoding colorspace alignments, use <int> as the SNP penalty. This should be set to the user's best guess of the true ratio of SNPs per base in the subject genome, converted to the Phred quality scale. E.g., if the user expects about 1 SNP every 1,000 positions, --snpphred should be set to 30 (which is also the default). To specify the fraction directly, use --snpfrac.
    --snpfrac <dec>: When decoding colorspace alignments, use <dec> as the estimated ratio of SNPs per base. For best decoding results, this should be set to the user's best guess of the true ratio. bowtie internally converts the ratio to a Phred quality, and behaves as if that quality had been set via the --snpphred option. Default: 0.001.
    --col-cseq: If reads are in colorspace and the default output mode is active, --col-cseq causes the reads' color sequence to appear in the read-sequence column (column 5) instead of the decoded nucleotide sequence. See the Decoding colorspace alignments section for details about decoding. This option is ignored in -S/--sam mode.
    --col-cqual: If reads are in colorspace and the default output mode is active, --col-cqual causes the reads' original (color) quality sequence to appear in the quality column (column 6) instead of the decoded qualities. See the Colorspace alignment section for details about decoding. This option is ignored in -S/--sam mode.
    --col-keepends: When decoding colorspace alignments, bowtie trims off a nucleotide and quality from the left and right edges of the alignment. This is because those nucleotides are supported by only one color, in contrast to the middle nucleotides which are supported by two. Specify --col-keepends to keep the extreme-end nucleotides and qualities.
    SAM
    -S/--sam: Print alignments in SAM format. See the SAM output section of the manual for details. To suppress all SAM headers, use --sam-nohead in addition to -S/--sam. To suppress just the @SQ headers (e.g. if the alignment is against a very large number of reference sequences), use --sam-nosq in addition to -S/--sam. bowtie does not write BAM files directly, but SAM output can be converted to BAM on the fly by piping bowtie's output to samtools view. -S/--sam is not compatible with --refout.
    --mapq <int>: If an alignment is non-repetitive (according to -m, --strata and other options) set the MAPQ (mapping quality) field to this value. See the SAM Spec for details about the MAPQ field. Default: 255.
    --sam-nohead: Suppress header lines (starting with @) when output is -S/--sam. This must be specified in addition to -S/--sam. --sam-nohead is ignored unless -S/--sam is also specified.
    --sam-nosq: Suppress @SQ header lines when output is -S/--sam. This must be specified in addition to -S/--sam. --sam-nosq is ignored unless -S/--sam is also specified.
    --sam-RG <text>: Add <text> (usually of the form TAG:VAL, e.g. ID:IL7LANE2) as a field on the @RG header line. Specify --sam-RG multiple times to set multiple fields. See the SAM Spec for details about what fields are legal. Note that, if any @RG fields are set using this option, the ID and SM fields must both be among them to make the @RG line legal according to the SAM Spec. --sam-RG is ignored unless -
  • 22. Performance
    -o/--offrate <int>: Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.
    -p/--threads <int>: Launch <int> parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is fairly close to linear. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time).
    --mm: Use memory-mapped I/O to load the index, rather than normal C file I/O. Memory-mapping the index allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible.
    --shmem: Use shared memory to load the index, rather than normal C file I/O. Using shared memory allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not desirable. Unlike --mm, --shmem installs the index into shared memory permanently, or until the user deletes the shared memory chunks manually. See your operating system documentation for details on how to manually list and remove shared memory chunks (on Linux and Mac OS X, these commands are ipcs and ipcrm). You may also need to increase your OS's maximum shared-memory chunk size to accommodate larger indexes; see your OS documentation.
    Other
    --seed <int>: Use <int> as the seed for pseudo-random number generator.
    --verbose: Print verbose output (for debugging).
    --version: Print version information and quit.
    -h/--help: Print usage information and quit.
  • 24. "I'll just use the default parameters!"
  • 25. "What could go wrong?"
  • 26. First, some terminology… (diagram labels: DNA/RNA fragment, adapter, Read 1, Read 2, 'Insert', inner-mate pair distance)
  • 27. We can plot the distribution of inner mate pair distances
  • 28. (plot) Reads mapped to transcriptome with Bowtie 2; x-axis: inner size between mapped read pairs
  • 30. Bowtie 2 has an -X option for 'max fragment length'. The default value is 500 bp = 100 + 100 + 300. What happens if we increase -X to 2000 bp?
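The arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration, assuming 100 bp reads as in the slide; the function names are hypothetical and are not part of Bowtie 2.

```python
def fragment_length(read1_len, read2_len, inner_distance):
    """Fragment length is both reads plus the gap between them."""
    return read1_len + read2_len + inner_distance


def max_inner_distance(max_fragment_len, read1_len, read2_len):
    """Largest inner distance a pair can have and still fit within the limit."""
    return max_fragment_len - read1_len - read2_len


# Default limit from the slide: 500 bp = 100 + 100 + 300
print(max_inner_distance(500, 100, 100))   # 300
# Raising the limit to 2000 bp admits much larger gaps between the mates
print(max_inner_distance(2000, 100, 100))  # 1800
```

With the default limit, any pair whose mates sit more than 300 bp apart cannot be reported as a valid pair, so it cannot show up in a plot of inner distances either.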
  • 31. New data! (plot) Reads mapped to transcriptome with Bowtie 2; x-axis: inner size between mapped read pairs
  • 32. Most programs will have some options that you should consider changing
  • 33. Some options from TopHat
    TopHat command-line option | Meaning | Default value
    --num-threads | How many CPU threads to use when running TopHat | 1
    --min-intron-length | Minimum intron length | 70
    -r / --mate-inner-dist | Expected (mean) inner distance between mate pairs | 50
    --mate-std-dev | Standard deviation for the distribution on inner distances | 20
  • 34. You nearly always can run with more processors/threads than the default (1)
  • 35. Some options from TopHat
    TopHat command-line option | Meaning | Default value
    --num-threads | How many CPU threads to use when running TopHat | 1
    --min-intron-length | Minimum intron length | 70
    -r / --mate-inner-dist | Expected (mean) inner distance between mate pairs | 50
    --mate-std-dev | Standard deviation for the distribution on inner distances | 20
  • 36. This might not be suitable for non-vertebrates
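One way to avoid silently inheriting these defaults is to set each option explicitly from what is known about the data. The sketch below builds a TopHat argument list using only the four options from the table; the function name, file names, inner distances and the intron length of 30 are hypothetical values chosen for illustration, not recommendations.

```python
import os
import statistics


def tophat_command(index_base, reads1, reads2, inner_distances,
                   min_intron_length, threads=None):
    """Build a TopHat argument list with explicit, data-derived settings."""
    if threads is None:
        # Use every available core rather than the default of 1
        threads = os.cpu_count() or 1
    mean_inner = round(statistics.mean(inner_distances))
    std_dev = round(statistics.stdev(inner_distances))
    return [
        "tophat",
        "--num-threads", str(threads),
        "--min-intron-length", str(min_intron_length),
        "--mate-inner-dist", str(mean_inner),
        "--mate-std-dev", str(std_dev),
        index_base, reads1, reads2,
    ]


# Hypothetical inner distances (bp) measured from a pilot mapping
observed = [180, 210, 195, 220, 205, 190]
cmd = tophat_command("genome_index", "reads_1.fastq", "reads_2.fastq",
                     observed, min_intron_length=30, threads=8)
print(" ".join(cmd))
```

Building the command as a list keeps every setting visible in one place, and the same list can be handed to `subprocess.run` or written into a lab notebook.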
  • 38. You should document your efforts!
  • 39. You should document as you go
  • 40. (image: handwritten lab notebook page)
  • 41. Lab books are good…
  • 42. …but electronic lab books are more helpful
  • 45. Tools like Microsoft Word might not be future proof
  • 46. Consider using plain text files
  • 47. I.e. something that can be read using 'less'
  • 48. I like to write README files in Markdown format for everything

    Milk-DNase-Seq-Project: RNA-Seq Analyis
    ---
    See main [README](README.md) file for more information about this project.

    ## Bovine RNA-seq data ##

    Stored in `/share/tamu/Data/RNA-Seq/Cow/2014-10`. Looks like paired-read 100 bp data. In total 31 x 2 files, ranging from 1-3.5 GB in size. See also the `/share/tamu/Data/RNA-Seq/Cow/Metadata` directory which contains a metadata file which suggests that we have data from 15 virgin cows and 16 lacating cows.

    The ultimate goal is to find genes that are differentially expressed between these two developmental stages.

    These files were originally compressed with bzip2, will re-compress with gzip so that existing pipelines can work with them. And will also rename them to have fastq suffix:

        cd /share/tamu/Data/RNA-Seq/Cow/2014-10
        bunzip2 *.bz2
        rename.pl s/txt/fastq/ *.txt
        gzip *.fastq

    ## Checking barcodes in RNA-Seq data ##

    Let's check on all barcodes being used. Will make some soft links to the data files

        cd Analysis/Test
        mkdir RNA-Seq_Barcode_check
        cd RNA-Seq_Barcode_check
        qlogin
        bunzip2 -c ../../../Data/RNA-Seq/Cow/2014-10/*.bz2 | grep "@HWI" | sed 's/.* [12]:N:0://' | sort | uniq -c > barcodes_in_identifiers.txt

    Unfortunately this failed due to a 'No space left on device error'. So maybe need to treat each file separately.

    # Test run of Scythe and Sickle #
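Keeping such a README current is easier if each command is logged at the moment it is run. The following is a minimal sketch of that habit, not the author's workflow: `log_command` is a hypothetical helper, and the index and read file names in the example are placeholders.

```python
import datetime
import shlex
import tempfile
from pathlib import Path


def log_command(readme_path, note, command, when=None):
    """Append a dated note and the exact command to a Markdown README."""
    when = when or datetime.date.today()
    fence = "`" * 3  # Markdown code fence, built here to avoid nesting fences
    entry = (
        f"\n## {when.isoformat()} ##\n\n"
        f"{note}\n\n"
        f"{fence}bash\n{shlex.join(command)}\n{fence}\n"
    )
    with Path(readme_path).open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry


# Demonstration in a temporary directory so nothing real is overwritten
with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    log_command(
        readme,
        "Raised the maximum fragment length from the default.",
        ["bowtie2", "-X", "2000", "-x", "transcriptome_index",
         "-1", "reads_1.fastq", "-2", "reads_2.fastq", "-S", "mapped.sam"],
        when=datetime.date(2015, 6, 14),
    )
    print(readme.read_text(encoding="utf-8"))
```

Because the file is opened in append mode, earlier entries are never rewritten, which keeps the README a chronological record that can still be read with `less`.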
  • 49. Easy to output to HTML or PDF (screenshot: the Markdown source of the README from the previous slide next to its rendered output; the rendered page continues into the next section)
    Test run of Scythe and Sickle
    Unlike the DNase-Seq data, we now have paired-end data, which requires running Sickle a little differently. So first, let's do a test (using 10,000 reads from each of two paired FASTQ files):
        cd /share/tamu/Analysis/Test
        mkdir Paired_end_scythe_sickle_test
  • 50. http://korflab.ucdavis.edu/bootcamp.md http://korflab.ucdavis.edu/bootcamp.html Markdown is easy to read, and converts to useful HTML (with hyperlinks and formatting)
  • 51. Title: Command-line Bootcamp
    Authors: Keith Bradnam
    Date: 2015-06-14
    Address: Genome Center, UC Davis, Davis, CA, 95616

    # Command-line Bootcamp
    ### Keith Bradnam
    ### UC Davis Genome Center
    #### Version 1.0 - - - June 2015
    <br><br><br>
    ><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. Please send feedback, questions, money, or abuse to <krbradnam@ucdavis.edu>

    ## Introduction [Introduction]

    This 'bootcamp' is intended to provide the reader with a basic overview of essential Unix/Linux commands that will allow them to navigate a file system and move, copy, edit files. It will also introduce a brief overview of some 'power' commands in Unix.

    ## Why Unix? [Why Unix]

    The [Unix operating system][Unix] has been around since 1969. Back then there was no such thing as a graphical user interface. You typed everything. It may seem archaic to use a keyboard to issue commands today, but it's much easier to automate keyboard … mouse tasks. There are several variants of Unix (including [Linux][Linux … differences do not matter much for most basic functions.

    [Unix]: http://en.wikipedia.org/wiki/Unix
    [Linux]: http://en.wikipedia.org/wiki/Linux

    Increasingly, the raw output of biological research exists as _in silico_ data, usually in the form of large text files. Unix is particularly suited to working with such files and has several powerful (and flexible) commands that can process your data for you. The real strength of learning Unix is that most of these commands can be combined in an almost unlimited fashion. So if you can learn just five Unix commands, you will be able to do a lot more than just five things.

    ## Typeset Conventions [Typeset]

    Command-line examples that you are meant to type into a terminal window will be shown
  • 52. (screenshot: the same RNA-Seq analysis README rendered on GitHub in the KorfLab/Milk-DNase-Seq-Project repository, branch master; latest commit by kbradnam, 3 days ago, "New analysis using R to run DEseq2"; 317 lines (213 sloc))
  • 53. Sites like GitHub use Markdown
  • 54. Reproducible science is important!
  • 55. Reviewers increasingly want more details regarding bioinformatics methods
  • 56. Make it easy for others to follow your work