Some tips on Unix and Bioinformatics. 'D' is for 'Default parameters', 'Danger', and 'Documentation'
This was a talk given at UC Davis on 15th June 2015 as part of a Bioinformatics Core teaching workshop.
Author: Keith Bradnam, Genome Center, UC Davis. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
16. bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
Bowtie has a lot of options!
17. -q  The query input files (specified either as <m1> and <m2>, or as <s>) are FASTQ files (usually having extension .fq or .fastq). This is the default. See also: --solexa-quals and --integer-quals.
-f  The query input files (specified either as <m1> and <m2>, or as <s>) are FASTA files (usually having extension .fa, .mfa, .fna or similar). All quality values are assumed to be 40 on the Phred quality scale.
-r  The query input files (specified either as <m1> and <m2>, or as <s>) are Raw files: one sequence per line, without quality values or names. All quality values are assumed to be 40 on the Phred quality scale.
-c  The query sequences are given on command line. I.e. <m1>, <m2> and <singles> are comma-separated lists of reads rather than lists of read files.
-C/--color  Align in colorspace. Read characters are interpreted as colors. The index specified must be a colorspace index (i.e. built with bowtie-build -C), or bowtie will print an error message and quit. See Colorspace alignment for more details.
-Q/--quals <files>  Comma-separated list of files containing quality values for corresponding unpaired CSFASTA reads. Use in combination with -C and -f. --integer-quals is set automatically when -Q/--quals is specified.
--Q1 <files>  Comma-separated list of files containing quality values for corresponding CSFASTA #1 mates. Use in combination with -C, -f, and -1. --integer-quals is set automatically when --Q1 is specified.
--Q2 <files>  Comma-separated list of files containing quality values for corresponding CSFASTA #2 mates. Use in combination with -C, -f, and -2. --integer-quals is set automatically when --Q2 is specified.
-s/--skip <int>  Skip (i.e. do not align) the first <int> reads or pairs in the input.
-u/--qupto <int>  Only align the first <int> reads or read pairs from the input (after the -s/--skip reads or pairs have been skipped). Default: no limit.
-5/--trim5 <int>  Trim <int> bases from high-quality (left) end of each read before alignment (default: 0).
-3/--trim3 <int>  Trim <int> bases from low-quality (right) end of each read before alignment (default: 0).
--phred33-quals  Input qualities are ASCII chars equal to the Phred quality plus 33. Default: on.
--phred64-quals  Input qualities are ASCII chars equal to the Phred quality plus 64. Default: off.
--solexa-quals  Convert input qualities from Solexa (which can be negative) to Phred (which can't). This is usually the right option for use with (unconverted) reads emitted by GA Pipeline versions prior to 1.3. Default: off.
--solexa1.3-quals  Same as --phred64-quals. This is usually the right option for use with (unconverted) reads emitted by GA Pipeline version 1.3 or later. Default: off.
--integer-quals  Quality values are represented in the read input file as space-separated ASCII integers, e.g., 40 40 30 40, rather than ASCII characters. Integers are treated as being on the Phred quality scale unless --solexa-quals is also specified.
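The difference between the Phred+33 and Phred+64 encodings described above is easy to check by hand. The following is a minimal sketch, not part of the bowtie manual; the function name `phred_from_char` is made up for illustration.

```bash
#!/usr/bin/env bash
# Decode a single FASTQ quality character into a Phred score.
# The offset is 33 for --phred33-quals (the default) and 64 for --phred64-quals.
# phred_from_char is a hypothetical helper, not a bowtie command.
phred_from_char() {
    local char=$1 offset=${2:-33}
    local ascii
    # printf with a leading single quote prints the character's numeric code
    ascii=$(printf '%d' "'$char")
    echo $(( ascii - offset ))
}

phred_from_char 'I'       # interpreted as Phred+33
phred_from_char 'h' 64    # interpreted as Phred+64
```

Reading a Phred+64 file with the Phred+33 default would inflate every score by 31, which is why this default is worth checking against the age of your data.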
19. -v <int>  Report alignments with at most <int> mismatches. -e and -l options are ignored and quality values have no effect on what alignments are valid. -v is mutually exclusive with -n.
-n/--seedmms <int>  Maximum number of mismatches permitted in the "seed", i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. This option is mutually exclusive with the -v option.
-e/--maqerr <int>  Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
-l/--seedlen <int>  The "seed length"; i.e., the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
--nomaqround  Maq accepts quality values in the Phred quality scale, but internally rounds values to the nearest 10, with a maximum of 30. By default, bowtie also rounds this way. --nomaqround prevents this rounding in bowtie.
-I/--minins <int>  The minimum insert size for valid paired-end alignments. E.g. if -I 60 is specified and a paired-end alignment consists of two 20-bp alignments in the appropriate orientation with a 20-bp gap between them, that alignment is considered valid (as long as -X is also satisfied). A 19-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -I constraint is applied with respect to the untrimmed mates. Default: 0.
-X/--maxins <int>  The maximum insert size for valid paired-end alignments. E.g. if -X 100 is specified and a paired-end alignment consists of two 20-bp alignments in the proper orientation with a 60-bp gap between them, that alignment is considered valid (as long as -I is also satisfied). A 61-bp gap would not be valid in that case. If trimming options -3 or -5 are also used, the -X constraint is applied with respect to the untrimmed mates, not the trimmed mates. Default: 250.
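The -I and -X examples above come down to simple arithmetic: the insert size is the two mate lengths plus the gap between them. A minimal sketch of that check follows; `pair_is_valid` is a hypothetical helper written for this illustration, not something bowtie provides.

```bash
#!/usr/bin/env bash
# usage: pair_is_valid MATE1_LEN GAP MATE2_LEN MININS MAXINS
# Reports whether mate1 + gap + mate2 lies within [MININS, MAXINS],
# mirroring the worked examples in the -I and -X descriptions.
pair_is_valid() {
    local insert=$(( $1 + $2 + $3 ))
    if (( insert >= $4 && insert <= $5 )); then
        echo "valid (insert = $insert)"
    else
        echo "invalid (insert = $insert)"
    fi
}

pair_is_valid 20 20 20 60 250   # -I 60 with a 20-bp gap
pair_is_valid 20 19 20 60 250   # -I 60 with a 19-bp gap
pair_is_valid 20 60 20 0 100    # -X 100 with a 60-bp gap
pair_is_valid 20 61 20 0 100    # -X 100 with a 61-bp gap
```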
--fr/--rf/--ff  The upstream/downstream mate orientations for a valid paired-end alignment against the forward reference strand. E.g., if --fr is specified and there is a candidate paired-end alignment where mate1 appears upstream of the reverse complement of mate2 and the insert length constraints are met, that alignment is valid. Also, if mate2 appears upstream of the reverse complement of mate1 and all other constraints are met, that too is valid. --rf likewise requires that an upstream mate1 be reverse-complemented and a downstream mate2 be forward-oriented. --ff requires both an upstream mate1 and a downstream mate2 to be forward-oriented. Default: --fr when -C (colorspace alignment) is not specified, --ff when -C is specified.
--nofw/--norc  If --nofw is specified, bowtie will not attempt to align against the forward reference strand. If --norc is specified, bowtie will not attempt to align against the reverse-complement reference strand. For paired-end reads using --fr or --rf modes, --nofw and --norc apply to the forward and reverse-complement pair orientations. I.e. specifying --nofw and --fr will only find reads in the R/F orientation where mate 2 occurs upstream of mate 1 with respect to the forward reference strand.
--maxbts  The maximum number of backtracks permitted when aligning a read in -n 2 or -n 3 mode (default: 125 without --best, 800 with --best). A "backtrack" is the introduction of a speculative substitution into the alignment. Without this limit, the default
20. -t/--time  Print the amount of wall-clock time taken by each phase.
-B/--offbase <int>  When outputting alignments in Bowtie format, consider the first base of a reference sequence to have offset <int>. This option has no effect in -S/--sam mode, since SAM mandates 1-based offsets. Default: 0.
--quiet  Print nothing besides alignments.
--refout  Write alignments to a set of files named refXXXXX.map, where XXXXX is the 0-padded index of the reference sequence aligned to. This can be a useful way to break up work for downstream analyses when dealing with, for example, large numbers of reads aligned to the assembled human genome. If <hits> is also specified, it will be ignored.
--refidx  When a reference sequence is referred to in a reported alignment, refer to it by 0-based index (its offset into the list of references that were indexed) rather than by name.
--al <filename>  Write all reads for which at least one alignment was reported to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is aligned.fq, the #1 and #2 mates that align at least once will be written to aligned_1.fq and aligned_2.fq respectively.
--un <filename>  Write all reads that could not be aligned to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is unaligned.fq, the #1 and #2 mates that fail to align will be written to unaligned_1.fq and unaligned_2.fq respectively. Unless --max is also specified, reads with a number of valid alignments exceeding the limit set with the -m option are also written to <filename>.
--max <filename>  Write all reads with a number of valid alignments exceeding the limit set with the -m option to a file with name <filename>. Written reads will appear as they did in the input, without any of the trimming or translation of quality values that may have taken place within bowtie. Paired-end reads will be written to two parallel files with _1 and _2 inserted in the filename, e.g., if <filename> is max.fq, the #1 and #2 mates that exceed the -m limit will be written to max_1.fq and max_2.fq respectively. These reads are not written to the file specified with --un.
--suppress <cols>  Suppress columns of output in the default output mode. E.g. if --suppress 1,5,6 is specified, the read name, read sequence, and read quality fields will be omitted. See Default bowtie output for field descriptions. This option is ignored if the output mode is -S/--sam.
--fullref  Print the full reference sequence name, including whitespace, in alignment output. By default bowtie prints everything up to but not including the first whitespace.
21. Colorspace
--snpphred <int>  When decoding colorspace alignments, use <int> as the SNP penalty. This should be set to the user's best guess of the true ratio of SNPs per base in the subject genome, converted to the Phred quality scale. E.g., if the user expects about 1 SNP every 1,000 positions, --snpphred should be set to 30 (which is also the default). To specify the fraction directly, use --snpfrac.
--snpfrac <dec>  When decoding colorspace alignments, use <dec> as the estimated ratio of SNPs per base. For best decoding results, this should be set to the user's best guess of the true ratio. bowtie internally converts the ratio to a Phred quality, and behaves as if that quality had been set via the --snpphred option. Default: 0.001.
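The relationship between --snpfrac and --snpphred is the standard Phred conversion, Q = -10 * log10(p). A minimal sketch of that conversion follows; `snpfrac_to_phred` is a hypothetical helper, and the rounding to a whole number is an assumption made for display, not a statement about how bowtie rounds internally.

```bash
#!/usr/bin/env bash
# Convert a per-base SNP fraction into a Phred-scaled value: Q = -10 * log10(p).
# awk's log() is the natural logarithm, so divide by log(10).
snpfrac_to_phred() {
    awk -v frac="$1" 'BEGIN { printf "%.0f\n", -10 * log(frac) / log(10) }'
}

snpfrac_to_phred 0.001   # 1 SNP per 1,000 positions
snpfrac_to_phred 0.01    # 1 SNP per 100 positions
```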
--col-cseq  If reads are in colorspace and the default output mode is active, --col-cseq causes the reads' color sequence to appear in the read-sequence column (column 5) instead of the decoded nucleotide sequence. See the Decoding colorspace alignments section for details about decoding. This option is ignored in -S/--sam mode.
--col-cqual  If reads are in colorspace and the default output mode is active, --col-cqual causes the reads' original (color) quality sequence to appear in the quality column (column 6) instead of the decoded qualities. See the Colorspace alignment section for details about decoding. This option is ignored in -S/--sam mode.
--col-keepends  When decoding colorspace alignments, bowtie trims off a nucleotide and quality from the left and right edges of the alignment. This is because those nucleotides are supported by only one color, in contrast to the middle nucleotides which are supported by two. Specify --col-keepends to keep the extreme-end nucleotides and qualities.
SAM
-S/--sam  Print alignments in SAM format. See the SAM output section of the manual for details. To suppress all SAM headers, use --sam-nohead in addition to -S/--sam. To suppress just the @SQ headers (e.g. if the alignment is against a very large number of reference sequences), use --sam-nosq in addition to -S/--sam. bowtie does not write BAM files directly, but SAM output can be converted to BAM on the fly by piping bowtie's output to samtools view. -S/--sam is not compatible with --refout.
--mapq <int>  If an alignment is non-repetitive (according to -m, --strata and other options) set the MAPQ (mapping quality) field to this value. See the SAM Spec for details about the MAPQ field. Default: 255.
--sam-nohead  Suppress header lines (starting with @) when output is -S/--sam. This must be specified in addition to -S/--sam. --sam-nohead is ignored unless -S/--sam is also specified.
--sam-nosq  Suppress @SQ header lines when output is -S/--sam. This must be specified in addition to -S/--sam. --sam-nosq is ignored unless -S/--sam is also specified.
--sam-RG <text>  Add <text> (usually of the form TAG:VAL, e.g. ID:IL7LANE2) as a field on the @RG header line. Specify --sam-RG multiple times to set multiple fields. See the SAM Spec for details about what fields are legal. Note that, if any @RG fields are set using this option, the ID and SM fields must both be among them to make the @RG line legal according to the SAM Spec. --sam-RG is ignored unless -S/--sam is also specified.
22. Performance
-o/--offrate <int>  Override the offrate of the index with <int>. If <int> is greater than the offrate used to build the index, then some row markings are discarded when the index is read into memory. This reduces the memory footprint of the aligner but requires more time to calculate text offsets. <int> must be greater than the value used to build the index.
-p/--threads <int>  Launch <int> parallel search threads (default: 1). Threads will run on separate processors/cores and synchronize when parsing reads and outputting alignments. Searching for alignments is highly parallel, and speedup is fairly close to linear. This option is only available if bowtie is linked with the pthreads library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time).
--mm  Use memory-mapped I/O to load the index, rather than normal C file I/O. Memory-mapping the index allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not possible.
--shmem  Use shared memory to load the index, rather than normal C file I/O. Using shared memory allows many concurrent bowtie processes on the same computer to share the same memory image of the index (i.e. you pay the memory overhead just once). This facilitates memory-efficient parallelization of bowtie in situations where using -p is not desirable. Unlike --mm, --shmem installs the index into shared memory permanently, or until the user deletes the shared memory chunks manually. See your operating system documentation for details on how to manually list and remove shared memory chunks (on Linux and Mac OS X, these commands are ipcs and ipcrm). You may also need to increase your OS's maximum shared-memory chunk size to accommodate larger indexes; see your OS documentation.
Other
--seed <int>  Use <int> as the seed for pseudo-random number generator.
--verbose  Print verbose output (for debugging).
--version  Print version information and quit.
-h/--help  Print usage information and quit.
30. Bowtie 2 has an -X option
for 'max fragment length'
The default value is 500 bp
= 100 + 100 + 300
What happens if we
increase -X to 2000 bp?
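To make the arithmetic on this slide concrete: with two 100 bp mates (an assumption read off the "100 + 100 + 300" sum), the -X value fixes the largest gap that can separate them. The sketch below uses a hypothetical helper, `max_inner_gap`, and only prints an example Bowtie 2 command rather than running it; the index and file names are placeholders.

```bash
#!/usr/bin/env bash
# Largest gap between two mates of equal length that still fits within -X.
# usage: max_inner_gap MAXINS READ_LEN
max_inner_gap() {
    local maxins=$1 read_len=$2
    echo $(( maxins - 2 * read_len ))
}

max_inner_gap 500 100    # the default -X
max_inner_gap 2000 100   # the increased -X

# Placeholder names; the command is printed, not executed.
cmd=(bowtie2 -X 2000 -x genome_index -1 reads_1.fastq -2 reads_2.fastq -S out.sam)
echo "${cmd[*]}"
```

Whether a larger -X is appropriate depends on the library's real fragment size distribution, which is exactly the kind of thing the default cannot know.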
32. Most programs will have some options
that you should consider changing
33. Some options from TopHat

TopHat command-line option   Meaning                                                      Default value
--num-threads                How many CPU threads to use when running TopHat              1
--min-intron-length          Minimum intron length                                        70
-r / --mate-inner-dist       Expected (mean) inner distance between mate pairs            50
--mate-std-dev               Standard deviation for the distribution on inner distances   20
34. You nearly always can run with more
processors/threads than the default (1)
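As a rough illustration of moving away from the single-thread default, the sketch below asks the operating system how many cores are online and passes that to TopHat's --num-threads option. The index and read file names are placeholders, and the command is only printed, not run.

```bash
#!/usr/bin/env bash
# Ask the OS how many processors are online; fall back to 1 if that fails.
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# genome_index and the read files are placeholder names for illustration.
cmd=(tophat --num-threads "$threads" genome_index reads_1.fastq reads_2.fastq)
echo "${cmd[*]}"
```

On a shared server or cluster it is usually better to request a fixed number of threads that matches your allocation than to take every core on the machine.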
35. Some options from TopHat

TopHat command-line option   Meaning                 Default value
--min-intron-length          Minimum intron length   70
36. This might not be suitable for non-vertebrates
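One way to judge whether the default of 70 suits your organism is to count how many annotated introns are shorter than it. The sketch below assumes a hypothetical tab-delimited, BED-like list of intron intervals (chromosome, start, end) on standard input; the helper name and the three example intervals are made up for illustration.

```bash
#!/usr/bin/env bash
# Count introns shorter than a given minimum length.
# Input: tab-delimited chrom, start, end (BED-style, so length = end - start).
count_short_introns() {
    awk -F'\t' -v min="$1" '($3 - $2) < min { n++ } END { print n + 0 }'
}

# Three made-up introns of length 48, 700 and 65 bp.
printf 'chrI\t100\t148\nchrI\t500\t1200\nchrII\t10\t75\n' | count_short_introns 70
```

If a noticeable share of real introns falls below the threshold, lowering --min-intron-length is worth considering.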
41. Lab books are good…
42. …but electronic lab books are more helpful
47. I.e. something that can be read using 'less'
48. I like to write README files in Markdown format for everything
# Milk-DNase-Seq-Project: RNA-Seq Analysis

See main [README](README.md) file for more information about this project.

## Bovine RNA-seq data ##

Stored in `/share/tamu/Data/RNA-Seq/Cow/2014-10`. Looks like paired-read 100
bp data. In total 31 x 2 files, ranging from 1-3.5 GB in size. See also the
`/share/tamu/Data/RNA-Seq/Cow/Metadata` directory which contains a metadata file
which suggests that we have data from 15 virgin cows and 16 lactating cows.
The ultimate goal is to find genes that are differentially expressed between
these two developmental stages.

These files were originally compressed with bzip2, will re-compress with gzip
so that existing pipelines can work with them. And will also rename them to
have fastq suffix:

```bash
cd /share/tamu/Data/RNA-Seq/Cow/2014-10
bunzip2 *.bz2
rename.pl s/txt/fastq/ *.txt
gzip *.fastq
```

## Checking barcodes in RNA-Seq data ##

Let's check on all barcodes being used. Will make some soft links to the data
files

```bash
cd Analysis/Test
mkdir RNA-Seq_Barcode_check
cd RNA-Seq_Barcode_check
qlogin
bunzip2 -c ../../../Data/RNA-Seq/Cow/2014-10/*.bz2 | grep "@HWI" | sed 's/.* [12]:N:0://' | sort | uniq -c > barcodes_in_identifiers.txt
```

Unfortunately this failed due to a 'No space left on device' error. So maybe
need to treat each file separately.
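
The idea of treating each file separately could be sketched as a loop like the one below. This is not the command the author ended up running; the helper name `count_barcodes` and the per-file output names are made up for illustration, and the guess that `sort` ran out of temporary space is only an assumption.

```bash
#!/usr/bin/env bash
# Pull the barcode off each Illumina-style identifier line read from stdin
# and count how often each barcode occurs (same pipeline as in the README).
count_barcodes() {
    grep "@HWI" | sed 's/.* [12]:N:0://' | sort | uniq -c
}

# Process one compressed file at a time so that no single sort has to
# hold every identifier from all files at once.
for f in ../../../Data/RNA-Seq/Cow/2014-10/*.bz2; do
    [ -e "$f" ] || continue   # skip cleanly if the glob matched nothing
    bunzip2 -c "$f" | count_barcodes > "$(basename "$f" .bz2).barcodes.txt"
done
```

If temporary space is the real bottleneck, `sort -T` can also point the sort at a directory with more room.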

## Test run of Scythe and Sickle ##
49. Easy to output to HTML or PDF
Milk-DNase-Seq-Project: RNA-Seq Analysis

See main README file for more information about this project.

Bovine RNA-seq data

Stored in /share/tamu/Data/RNA-Seq/Cow/2014-10. Looks like paired-read 100 bp data. In total 31 x 2 files, ranging from 1-3.5 GB in size. See also the /share/tamu/Data/RNA-Seq/Cow/Metadata directory which contains a metadata file which suggests that we have data from 15 virgin cows and 16 lactating cows.

The ultimate goal is to find genes that are differentially expressed between these two developmental stages.

These files were originally compressed with bzip2, will re-compress with gzip so that existing pipelines can work with them. And will also rename them to have fastq suffix:

    cd /share/tamu/Data/RNA-Seq/Cow/2014-10
    bunzip2 *.bz2
    rename.pl s/txt/fastq/ *.txt
    gzip *.fastq

Checking barcodes in RNA-Seq data

Let's check on all barcodes being used. Will make some soft links to the data files

    qlogin
    bunzip2 -c ../../../Data/RNA-Seq/Cow/2014-10/*.bz2 | grep "@HWI" | sed 's/.* [12]:N:0://' | sort | uniq -c > barcodes_in_identifiers.txt

Unfortunately this failed due to a 'No space left on device' error. So maybe need to treat each file separately.

Test run of Scythe and Sickle

Unlike the DNase-Seq data, we now have paired-end data, which requires running Sickle a little differently. So first, let's do a test (using 10,000 reads from each of two paired FASTQ files):

    cd /share/tamu/Analysis/Test
    mkdir Paired_end_scythe_sickle_test
51. Title: Command-line Bootcamp
Authors: Keith Bradnam
Date: 2015-06-14
Address: Genome Center, UC Davis, Davis, CA, 95616

# Command-line Bootcamp
### Keith Bradnam
### UC Davis Genome Center
### Version 1.0 --- June 2015
<br><br><br>
><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img
alt="Creative Commons License" style="border-width:0"
src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is
licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative
Commons Attribution-NonCommercial-ShareAlike 4.0 International
License</a>. Please send feedback, questions, money, or abuse to <krbradnam@ucdavis.edu>

## Introduction [Introduction]
This 'bootcamp' is intended to provide the reader with a basic overview of essential
Unix/Linux commands that will allow them to navigate a file system and move, copy, edit
files. It will also introduce a brief overview of some 'power' commands in Unix.

## Why Unix? [Why Unix]
The [Unix operating system][Unix] has been around since 1969. Back then there was no such
thing as a graphical user interface. You typed everything. It may seem archaic to use a
keyboard to issue commands today, but it's much easier to automate keyboard tasks than
mouse tasks. There are several variants of Unix (including [Linux][Linux]), though the
differences do not matter much for most basic functions.

[Unix]: http://en.wikipedia.org/wiki/Unix
[Linux]: http://en.wikipedia.org/wiki/Linux

Increasingly, the raw output of biological research exists as _in silico_ data, usually
in the form of large text files. Unix is particularly suited to working with such files
and has several powerful (and flexible) commands that can process your data for you. The
real strength of learning Unix is that most of these commands can be combined in an
almost unlimited fashion. So if you can learn just five Unix commands, you will be able
to do a lot more than just five things.

## Typeset Conventions [Typeset]
Command-line examples that you are meant to type into a terminal window will be shown
Command-line Bootcamp
Keith Bradnam
UC Davis Genome Center
Version 1.0 - June 2015

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please send feedback, questions, money, or abuse to krbradnam@ucdavis.edu

Introduction

This 'bootcamp' is intended to provide the reader with a basic overview of essential Unix/Linux commands that will allow them to navigate a file system and move, copy, edit files. It will also introduce a brief overview of some 'power' commands in Unix.

Why Unix?

The Unix operating system has been around since 1969. Back then there was no such thing as a graphical user interface. You typed everything. It may seem archaic to use a keyboard to issue commands today, but it's much easier to automate keyboard
53. Sites like GitHub use Markdown
[Screenshot: the RNA-Seq analysis README from the KorfLab/Milk-DNase-Seq-Project repository (master branch), rendered by GitHub with formatted headings and code blocks.]