BLAST and FASTA


Pairwise Alignment

          Global                        Local
• Best score from among       • Best score from among
  alignments of full-length     alignments of partial
  sequences                     sequences
• Needleman-Wunsch            • Smith-Waterman
  algorithm                     algorithm
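
The contrast between global and local scoring can be sketched with one small dynamic-programming routine. The scoring values (match +1, mismatch -1, linear gap -1) and the name alignment_scores are assumptions chosen for illustration, not values from the slides.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-1):
    """Return (global_score, local_score) for sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # Global (Needleman-Wunsch): leading gaps are paid for in row/column 0.
    glob = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap
    # Local (Smith-Waterman): a cell is never allowed to drop below zero.
    loc = [[0] * cols for _ in range(rows)]
    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])
    return glob[-1][-1], best_local

# A short sequence against a longer one: the global score is dragged down
# by the unaligned ends, the local score only counts the shared "ACG".
print(alignment_scores("TTTTACGTTTTT", "ACG"))  # (-6, 3)
```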




Why do we need local alignments?

 •   To compare a short sequence to a large one.

 •   To compare a single sequence to an entire
     database

 •   To compare a partial sequence to the whole.



Why do we need local alignments?
 • Identify newly determined sequences
 • Compare new genes to known ones
 • Guess functions for entire genomes full of
   ORFs of unknown function




Mathematical Basis
for Local Alignment
• Model matches as a sequence of coin
  tosses
• Let p be the probability of “head”
   – For a “fair” coin, p = 0.5
• According to the Erdős-Rényi law
  (Paul Erdős, Alfréd Rényi):
  If there are n throws, then the expected
  length, R, of the longest run of “heads”
  is approximately
               R = log_{1/p}(n).
Mathematical Basis
for Local Alignment

• Example: Suppose n = 20 for a “fair” coin
             R = log_2(20) ≈ 4.32
• Problem: How does one model DNA (or
  amino acid) alignments as coin tosses?
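
The fair-coin example can be checked numerically. The sketch below evaluates the logarithmic estimate and, as a rough comparison, simulates coin tosses. The function names, the trial count and the random seed are assumptions made for this illustration. The law is asymptotic, so the simulated mean for small n need not equal the estimate.

```python
import math
import random

def expected_longest_run(n, p):
    """Estimate R = log_{1/p}(n) for n tosses with P(head) = p."""
    return math.log(n, 1 / p)

def longest_run(tosses, symbol="H"):
    """Length of the longest unbroken run of `symbol` in `tosses`."""
    best = current = 0
    for t in tosses:
        current = current + 1 if t == symbol else 0
        best = max(best, current)
    return best

def simulated_mean_longest_run(n, p, trials=2000, seed=0):
    """Average longest run of heads over `trials` simulated experiments."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        tosses = ["H" if rng.random() < p else "T" for _ in range(n)]
        total += longest_run(tosses)
    return total / trials

print(round(expected_longest_run(20, 0.5), 2))  # 4.32
print(simulated_mean_longest_run(20, 0.5))
```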




Modeling Sequence Alignments
• To model random sequence alignments, replace a match by
  “head” (H) and mismatch by “tail” (T).

             AATCAT
                              HTHHHT
             ATTCAG

• For ungapped DNA alignments, the probability of a “head”
  is 1/4.

• For ungapped amino acid alignments, the probability of a
  “head” is 1/20.
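
The head/tail encoding of the example alignment can be reproduced in a few lines. The function name coin_string is a hypothetical helper, and the sketch assumes an ungapped alignment of two equal-length sequences.

```python
def coin_string(seq1, seq2):
    """Encode an ungapped alignment as H (match) / T (mismatch)."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return "".join("H" if a == b else "T" for a, b in zip(seq1, seq2))

print(coin_string("AATCAT", "ATTCAG"))  # HTHHHT
```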
Modeling Sequence Alignments
• Thus, for any one particular alignment, the
  Erdős-Rényi law can be applied
• What about for all possible alignments?
   – Consider that the sequences can be shifted back and
     forth in the dot matrix plot
• The expected length of the longest match is
                   R = log_{1/p}(mn)
  where m and n are the lengths of the two
  sequences.
Modeling Sequence Alignments
• Suppose m = n = 10, and we deal with DNA
  sequences
            R = log_4(100) ≈ 3.32
• This analysis assumes that the base
  composition is uniform and the alignment is
  ungapped. The result is approximate, but
  not bad.
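
A minimal sketch of the two-sequence estimate, together with a routine that measures the longest exact match over all diagonals of two concrete sequences (a longest-common-substring scan). Both function names are assumptions introduced for this example.

```python
import math

def expected_longest_match(m, n, p):
    """Estimate R = log_{1/p}(m*n) for sequences of length m and n."""
    return math.log(m * n, 1 / p)

def longest_common_run(s, t):
    """Longest run of consecutive matches over every diagonal of s vs t."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the run that ended on the previous diagonal cell.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(round(expected_longest_match(10, 10, 0.25), 2))  # 3.32
print(longest_common_run("AATCAT", "ATTCAG"))          # 3 (the shared "TCA")
```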

Heuristic Methods: FASTA and BLAST

FASTA
• First fast sequence searching algorithm for
  comparing a query sequence against a database.

BLAST
• Basic Local Alignment Search Tool
• Improvement on FASTA: search speed, ease of
  use, statistical rigor.
FASTA and BLAST
• Basic idea: a good alignment contains
  subsequences of absolute identity (short lengths
  of exact matches):

  – First, identify very short exact matches.
  – Next, the best short hits from the first step are
    extended to longer regions of similarity.
  – Finally, the best hits are optimized.
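
The first step, finding very short exact matches, can be sketched as a word lookup. This is an illustration under assumed names (word_hits, k), not the code used by FASTA or BLAST. Each hit is a pair of start positions (query, subject).

```python
from collections import defaultdict

def word_hits(query, subject, k):
    """Return (query_pos, subject_pos) for every shared exact word of length k."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = []
    for j in range(len(subject) - k + 1):
        # .get avoids creating empty entries for words absent from the query.
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits

print(word_hits("AATCAT", "ATTCAG", 3))  # [(2, 2)]
```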


FASTA
Derived from logic of the dot plot
  – compute best diagonals from all frames of
    alignment
The method looks for exact matches between
 words in query and test sequence
  – DNA words are usually 6 nucleotides long
  – protein words are 2 amino acids long
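
The dot-plot logic can be sketched by counting word hits per diagonal: hits at query position i and test-sequence position j lie on diagonal j - i, and a diagonal with many hits marks a candidate region. The function name and the toy sequences are assumptions. The real program also rescores and joins diagonals, which this sketch omits.

```python
from collections import Counter

def diagonal_counts(query, subject, k):
    """Count exact k-word hits on each diagonal (subject_pos - query_pos)."""
    positions = {}
    for i in range(len(query) - k + 1):
        positions.setdefault(query[i:i + k], []).append(i)
    counts = Counter()
    for j in range(len(subject) - k + 1):
        for i in positions.get(subject[j:j + k], []):
            counts[j - i] += 1
    return counts

counts = diagonal_counts("ACGTAC", "TTACGTAC", 2)
print(counts.most_common(1))  # [(2, 5)]: five word hits on diagonal +2
```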



FASTA Algorithm




Makes Longest Diagonal
After all diagonals are found, tries to join
 diagonals by adding gaps

Computes alignments in regions of best
 diagonals


FASTA Alignments




FASTA Results - Histogram
!!SEQUENCE_LIST 1.0
(Nucleotide) FASTA of: b2.seq from: 1 to: 693 December 9, 2002 14:02
TO: /u/browns02/Victor/Search-set/*.seq Sequences:     2,050 Symbols:
913,285 Word Size: 6
 Searching with both strands of the query.
 Scoring matrix: GenRunData:fastadna.cmp
 Constant pamfactor used
 Gap creation penalty: 16 Gap extension penalty: 4

Histogram Key:
 Each histogram symbol represents 4 search set sequences
 Each inset symbol represents 1 search set sequences
 z-scores computed from opt scores
z-score obs    exp
        (=)    (*)
< 20      0      0:
  22      0      0:
  24      3      0:=
  26      2      0:=
  28      5      0:==
  30     11      3:*==
  32     19     11:==*==
  34     38     30:=======*==
  36     58     61:===============*
  38     79    100:====================    *
  40    134    140:==================================*
  42    167    171:==========================================*
  44    205    189:===============================================*====
  46    209    192:===============================================*=====
  48    177    184:=============================================*
FASTA Results - List
The best scores are:                   init1 initn      opt     z-sc E(1018780)..

SW:PPI1_HUMAN    Begin: 1 End: 269
! Q00169 homo sapiens (human). phosph... 1854   1854   1854   2249.3   1.8e-117
SW:PPI1_RABIT    Begin: 1 End: 269
! P48738 oryctolagus cuniculus (rabbi... 1840   1840   1840   2232.4   1.6e-116
SW:PPI1_RAT    Begin: 1 End: 270
! P16446 rattus norvegicus (rat). pho... 1543   1543   1837   2228.7   2.5e-116
SW:PPI1_MOUSE    Begin: 1 End: 270
! P53810 mus musculus (mouse). phosph... 1542   1542   1836   2227.5   2.9e-116
SW:PPI2_HUMAN    Begin: 1 End: 270
! P48739 homo sapiens (human). phosph... 1533   1533   1533   1861.0   7.7e-96
SPTREMBL_NEW:BAC25830    Begin: 1 End: 270
! Bac25830 mus musculus (mouse). 10, ... 1488   1488   1522   1847.6   4.2e-95
SP_TREMBL:Q8N5W1    Begin: 1 End: 268
! Q8n5w1 homo sapiens (human). simila... 1477   1477   1522   1847.6   4.3e-95
SW:PPI2_RAT    Begin: 1 End: 269
! P53812 rattus norvegicus (rat). pho... 1482   1482   1516   1840.4   1.1e-94




FASTA Results - Alignment
SCORES   Init1: 1515 Initn: 1565 Opt: 1687 z-score: 1158.1 E(): 2.3e-58
>>GB_IN3:DMU09374                                         (2038 nt)
 initn: 1565 init1: 1515 opt: 1687 Z-score: 1158.1 expect(): 2.3e-58
  66.2% identity in 875 nt overlap
 (83-957:151-1022)

                   60        70        80         90      100       110
u39412.gb_pr CCCTTTGTGGCCGCCATGGACAATTCCGGGAAGGAAGCGGAGGCGATGGCGCTGTTGGCC
                                            || ||| | ||||| |    ||| |||||
DMU09374     AGGCGGACATAAATCCTCGACATGGGTGACAACGAACAGAAGGCGCTCCAACTGATGGCC
                    130       140       150        160      170       180

                  120       130       140       150       160       170
u39412.gb_pr GAGGCGGAGCGCAAAGTGAAGAACTCGCAGTCCTTCTTCTCTGGCCTCTTTGGAGGCTCA
             |||||||||   || |||    |   | || ||| |         || || ||||| ||
DMU09374     GAGGCGGAGAAGAAGTTGACCCAGCAGAAGGGCTTTCTGGGATCGCTGTTCGGAGGGTCC
                    190       200       210       220       230       240

                  180       190       200       210       220       230
u39412.gb_pr TCCAAAATAGAGGAAGCATGCGAAATCTACGCCAGAGCAGCAAACATGTTCAAAATGGCC
               ||| | ||||| ||    |||   ||||    | || | |||||||| || ||| ||
DMU09374     AACAAGGTGGAGGACGCCATCGAGTGCTACCAGCGGGCGGGCAACATGTTTAAGATGTCC
                    250       260       270       280       290       300

                  240       250       260       270       280       290
u39412.gb_pr AAAAACTGGAGTGCTGCTGGAAACGCGTTCTGCCAGGCTGCACAGCTGCACCTGCAGCTC
             ||||||||||     ||||| |     |||||| |||| |||   || ||| || |
DMU09374     AAAAACTGGACAAAGGCTGGGGAGTGCTTCTGCGAGGCGGCAACTCTACACGCGCGGGCT
                    310       320       330       340       350       360
FASTA on the Web

• Many websites offer
  FASTA searches
• Each server has its limits
• Be aware that you
  depend “on the kindness
  of strangers.”

Institut de Génétique Humaine, Montpellier France, GeneStream server
         http://www2.igh.cnrs.fr/bin/fasta-guess.cgi
Oak Ridge National Laboratory GenQuest server
         http://avalon.epm.ornl.gov/
European Bioinformatics Institute, Cambridge, UK
         http://www.ebi.ac.uk/htbin/fasta.py?request
EMBL, Heidelberg, Germany
         http://www.embl-heidelberg.de/cgi/fasta-wrapper-free
Munich Information Center for Protein Sequences (MIPS)
at Max-Planck-Institut, Germany
         http://speedy.mips.biochem.mpg.de/mips/programs/fasta.html
Institute of Biology and Chemistry of Proteins Lyon, France
         http://www.ibcp.fr/serv_main.html
Institut Pasteur, France
         http://central.pasteur.fr/seqanal/interfaces/fasta.html
GenQuest at The Johns Hopkins University
         http://www.bis.med.jhmi.edu/Dan/gq/gq.form.html
National Cancer Center of Japan
         http://bioinfo.ncc.go.jp

FASTA Format
• Simple format used by almost all programs
• A header line beginning with ">" and ending with a [return]
• Sequence lines (no specific requirements for line
  length, characters, etc.)
>URO1 uro1.seq   Length: 2018   November 9, 2000 11:50   Type: N   Check: 3854   ..
CGCAGAAAGAGGAGGCGCTTGCCTTCAGCTTGTGGGAAATCCCGAAGATGGCCAAAGACA
ACTCAACTGTTCGTTGCTTCCAGGGCCTGCTGATTTTTGGAAATGTGATTATTGGTTGTT
GCGGCATTGCCCTGACTGCGGAGTGCATCTTCTTTGTATCTGACCAACACAGCCTCTACC
CACTGCTTGAAGCCACCGACAACGATGACATCTATGGGGCTGCCTGGATCGGCATATTTG
TGGGCATCTGCCTCTTCTGCCTGTCTGTTCTAGGCATTGTAGGCATCATGAAGTCCAGCA
GGAAAATTCTTCTGGCGTATTTCATTCTGATGTTTATAGTATATGCCTTTGAAGTGGCAT
CTTGTATCACAGCAGCAACACAACAAGACTTTTTCACACCCAACCTCTTCCTGAAGCAGA
TGCTAGAGAGGTACCAAAACAACAGCCCTCCAAACAATGATGACCAGTGGAAAAACAATG
GAGTCACCAAAACCTGGGACAGGCTCATGCTCCAGGACAATTGCTGTGGCGTAAATGGTC
CATCAGACTGGCAAAAATACACATCTGCCTTCCGGACTGAGAATAATGATGCTGACTATC
CCTGGCCTCGTCAATGCTGTGTTATGAACAATCTTAAAGAACCTCTCAACCTGGAGGCTT
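
Reading this format takes only a few lines. The parser below is a minimal sketch with an assumed function name (parse_fasta). It treats every line starting with ">" as a header and joins the following lines into one sequence, without validating the characters.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

example = ">s1 demo\nACGT\nTTGA\n>s2\nGG\n"
print(parse_fasta(example))  # [('s1 demo', 'ACGTTTGA'), ('s2', 'GG')]
```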
Assessing Alignment Significance
• Generate random alignments and
calculate their scores
• Compute the mean and the standard
deviation (SD) for random scores
• Compute the deviation of the actual score
from the mean of random scores
               Z = (X - mean)/SD
  where X is the actual score
• Evaluate the significance of the alignment
• The E score is the number of alignments expected
  by chance to reach a Z value at least this high
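
The procedure can be sketched by shuffling one sequence to generate the random scores. The scoring function used here (a count of identical positions in an ungapped, equal-length alignment) is a deliberately simple stand-in, and the shuffle count and seed are assumptions. Search programs use their own alignment scores and statistics.

```python
import random
import statistics

def identity_score(a, b):
    """Toy score: number of identical positions in an ungapped alignment."""
    return sum(x == y for x, y in zip(a, b))

def z_score(seq1, seq2, score=identity_score, shuffles=200, seed=0):
    """Deviation of the actual score from the mean of shuffled scores, in SDs."""
    rng = random.Random(seed)
    actual = score(seq1, seq2)
    letters = list(seq2)
    random_scores = []
    for _ in range(shuffles):
        rng.shuffle(letters)  # keeps composition, destroys the order
        random_scores.append(score(seq1, "".join(letters)))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("random scores do not vary; Z is undefined")
    return (actual - mean) / sd

print(z_score("ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGT"))
```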
E scores or E values
E scores are not equivalent to p
values, for which
             p < 0.05
is generally considered
statistically significant.
E values (rules of thumb)
E values below 10^-6 are most probably
statistically significant.
E values above 10^-6 but below 10^-3
deserve a second look.
E values above 10^-3 should not be
tossed aside lightly; they should be
thrown out with great force.
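
These rules of thumb translate directly into a small filter. The function name and labels are hypothetical. The treatment of values exactly equal to a threshold is an assumption, since the rules do not specify it.

```python
def classify_e_value(e):
    """Apply the rule-of-thumb thresholds 10^-6 and 10^-3 to an E value."""
    if e < 1e-6:
        return "probably significant"
    if e < 1e-3:
        return "second look"
    return "discard"

for e in (1.8e-117, 1e-4, 0.5):
    print(e, classify_e_value(e))
```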
BLAST
• Basic Local Alignment Search Tool
  – Altschul et al. 1990,1994,1997
• Heuristic method for local alignment
• Designed specifically for database searches
• Based on the same assumption as FASTA
  that good alignments contain short lengths
  of exact matches
BLAST
• Both BLAST and FASTA search for local
  sequence similarity - indeed they have exactly
  the same goals, though they use somewhat
  different algorithms and statistical approaches.

• BLAST benefits
  – Speed
  – User friendly
  – Statistical rigor
  – More sensitive
Input/Output
• Input:
  – Query sequence Q
  – Database of sequences DB
  – Minimal score S

• Output:
  – Sequences from DB (Seq), such that Q and Seq
    have scores > S

BLAST Searches GenBank
[BLAST= Basic Local Alignment Search Tool]
The NCBI BLAST web server lets you compare your
  query sequence to various sections of GenBank:
        –   nr = non-redundant (main sections)
        –   month = new sequences from the past few weeks
        –   refseq_rna = RNA entries from NCBI's Reference Sequence project
        –   refseq_genomic = genomic entries from NCBI's Reference Sequence project
        –   ESTs
        –   Taxon = e.g., human, Drosophila, yeast, E. coli
        –   proteins (by automatic translation)
        –   pdb = sequences derived from the 3-dimensional structures
            in the Brookhaven Protein Data Bank
BLAST
• Uses word matching like FASTA
• Similarity matching of words (3 amino acids, 11
  bases)
  – does not require identical words.
• If no words are similar, then no alignment
  – Will not find matches for very short sequences

• Does not handle gaps well
• “gapped BLAST” is somewhat better
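
Similarity matching of words can be illustrated by listing every word that scores at least a threshold against a query word. The scoring scheme below (+4 identical, +1 same residue group, -2 otherwise), the residue groups and the threshold are all invented for this sketch. BLAST itself uses a substitution matrix such as BLOSUM62, which is not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical similarity groups, for illustration only (not BLOSUM or PAM).
GROUPS = ["AILMV", "FWY", "KRH", "DE", "STNQ", "CGP"]
GROUP_OF = {aa: group for group in GROUPS for aa in group}

def toy_score(a, b):
    """Toy residue score: identical > same group > different group."""
    if a == b:
        return 4
    if GROUP_OF[a] == GROUP_OF[b]:
        return 1
    return -2

def neighborhood(word, threshold):
    """All words of the same length scoring >= threshold against `word`."""
    hits = []
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        score = sum(toy_score(a, b) for a, b in zip(word, candidate))
        if score >= threshold:
            hits.append("".join(candidate))
    return hits

similar = neighborhood("KLV", threshold=9)
print(len(similar), similar)
```

With this toy scheme the neighborhood of KLV holds the word itself plus every word with a single same-group substitution, so non-identical words such as KIV can still seed a hit.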
BLAST Algorithm




BLAST Word Matching
Break query into words:

MEAAVKEEISVEDEAVDKNI
MEA
 EAA
  AAV
   AVK
    VKE
     KEE
      EEI
       EIS
        ISV
        ...

Break database sequences into words:
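
Breaking a sequence into overlapping words is a one-line sliding window. The function name words and the default word length of 3 (the protein word size mentioned earlier) are the only assumptions. The same function applies to database sequences.

```python
def words(seq, w=3):
    """Return every overlapping word of length w, in order of position."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

query_words = words("MEAAVKEEISVEDEAVDKNI")
print(query_words[:9])
# ['MEA', 'EAA', 'AAV', 'AVK', 'VKE', 'KEE', 'EEI', 'EIS', 'ISV']
```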


Find locations of matching words
       in database sequences

      ELEPRRPRYRVPDVLVADPPIARLSVSGRDENSVELT MEAT
MEA
EAA     TDVRWMSETGIIDVFLLLGPSISDVFRQYASLTGTQALPPLFSLGYHQSRWNY
AAV        IWLDIEEIHADGKRYFTWDPSRFPQPRTMLERLASKRRV KLVAIVDPH
AVK
KLV
KEE
EEI
EIS
ISV




Extend hits one base at a time




Seq_XYZ:      HVTGRSAF_FSYYGYGCYCGLGTGKGLPVDATDRCCWA
Query:           QSVFDYIYYGCYCGWGLG_GK__PRDA

E-val = 10^-13




  • Use two word matches as anchors to build an alignment
    between the query and a database sequence.

  • Then score the alignment.
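
Extension one residue at a time can be sketched as follows. This simplified version extends a single exact word hit without gaps in both directions and stops once the running score falls more than x_drop below the best score seen. The scoring values, the x_drop rule as written here and the function names are assumptions. The sketch does not implement the two-anchor strategy.

```python
def _extend(query, subject, qi, sj, step, match, mismatch, x_drop):
    """Walk from (qi, sj) in direction `step`; return (best_gain, best_len)."""
    score = best = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        score += match if query[qi] == subject[sj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # fallen too far below the best score: give up
        qi += step
        sj += step
    return best, best_len

def extend_hit(query, subject, qi, sj, k, match=1, mismatch=-1, x_drop=2):
    """Extend an exact k-word hit at (qi, sj) without gaps.

    Returns (query_start, subject_start, length, score) of the segment.
    """
    right, right_len = _extend(query, subject, qi + k, sj + k, 1,
                               match, mismatch, x_drop)
    left, left_len = _extend(query, subject, qi - 1, sj - 1, -1,
                             match, mismatch, x_drop)
    return (qi - left_len, sj - left_len,
            left_len + k + right_len, k * match + left + right)

# The seed "CGT" at position 3 of both sequences grows into "ACGTAC".
print(extend_hit("GGACGTACGG", "TTACGTACTT", 3, 3, 3))  # (2, 2, 6, 6)
```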
HSPs are Aligned Regions
• The results of the word matching and
  attempts to extend the alignment are
  segments
   - called HSPs (High-Scoring Segment
     Pairs)
• BLAST often produces several short HSPs
  rather than a single aligned region

>gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
          Length = 369
 Score =    272 bits (137),   Expect = 4e-71
 Identities = 258/297 (86%), Gaps = 1/297 (0%)
 Strand = Plus / Plus

Query: 17    aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76
             |||||||||||||||| | ||| | ||| || ||| | |||| ||||| |||||||||
Sbjct: 1     aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59

Query: 77    agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136
             |||||||||||||||||||||||| | || ||||||||| | ||||||||||| ||| ||
Sbjct: 60    agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119

Query: 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196
            |||||||| | || | ||||||||||||||| ||||||||||| || ||||||||||||
Sbjct: 120 tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179

Query: 197 atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256
           ||||||||| | |||||||| |||||||||||||||||| ||||||||||||||||||||
Sbjct: 180 atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239

Query: 257 gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313
           || || ||||| || ||||||||||| | |||||||||||||||||| ||||||||
Sbjct: 240 gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296




BLAST variants




Understanding BLAST output




Choosing the right parameters




Controlling the output




More on BLAST

NCBI Blast Glossary
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html

Education: Blast Information
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html

Steve Altschul's Blast Course
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html




Blast fasta 4

  • 2. Pairwise Alignment • Global: best score from among alignments of full-length sequences (Needleman-Wunsch algorithm). • Local: best score from among alignments of partial sequences (Smith-Waterman algorithm).
  • 3. Why do we need local alignments? • To compare a short sequence to a large one. • To compare a single sequence to an entire database. • To compare a partial sequence to the whole.
  • 4. Why do we need local alignments? • Identify newly determined sequences. • Compare new genes to known ones. • Guess functions for entire genomes full of ORFs of unknown function.
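The global/local distinction from the Pairwise Alignment slide can be illustrated with a small dynamic-programming sketch. The function name `alignment_score` and the scoring values (match +1, mismatch -1, linear gap -2) are illustrative assumptions, not parameters from the slides. The global branch follows the Needleman-Wunsch recurrence; the local branch follows the Smith-Waterman idea of clamping every cell at zero and reporting the best cell.

```python
def alignment_score(a, b, local, match=1, mismatch=-1, gap=-2):
    """Score of the best global (local=False) or local (local=True) alignment.

    Assumed scoring scheme: linear gap penalty, fixed match/mismatch values.
    """
    cols = len(b) + 1
    # Global alignment pays for leading gaps; local alignment starts free.
    prev = [0] * cols if local else [j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            value = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                value = max(value, 0)      # a local alignment may restart anywhere
                best = max(best, value)    # and may end anywhere
            cur[j] = value
        prev = cur
    return best if local else prev[-1]


print(alignment_score("ACGT", "ACGT", local=False))            # 4
print(alignment_score("GGGACGTCCC", "TTACGTAA", local=True))   # 4
```

In the second call the shared core ACGT is found even though the flanking regions do not match, which is the situation the slides describe for comparing a partial sequence to the whole.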
  • 5. Mathematical Basis for Local Alignment • Model matches as a sequence of coin tosses. • Let p be the probability of a “head”; for a “fair” coin, p = 0.5. • According to the Erdős-Rényi law (Paul Erdős and Alfréd Rényi): if there are n throws, then the expected length, R, of the longest run of “heads” is R = log_{1/p}(n).
  • 6. Mathematical Basis for Local Alignment • Example: suppose n = 20 for a “fair” coin; then R = log_2(20) ≈ 4.32. • Problem: how does one model DNA (or amino acid) alignments as coin tosses?
  • 7. Modeling Sequence Alignments • To model random sequence alignments, replace a match by “head” (H) and a mismatch by “tail” (T). For example, aligning AATCAT with ATTCAG gives HTHHHT. • For ungapped DNA alignments, the probability of a “head” is 1/4. • For ungapped amino acid alignments, the probability of a “head” is 1/20.
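The coin-toss model can be tried out directly. The following is a minimal sketch, with hypothetical helper names (`coin_string`, `longest_head_run`, `expected_longest_run`), that encodes the slide's example alignment as heads and tails and evaluates the formula R = log_{1/p}(n).

```python
import math


def coin_string(top, bottom):
    """Encode an ungapped alignment as H (match) / T (mismatch)."""
    if len(top) != len(bottom):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return "".join("H" if x == y else "T" for x, y in zip(top, bottom))


def longest_head_run(tosses):
    """Length of the longest run of consecutive H characters."""
    best = run = 0
    for toss in tosses:
        run = run + 1 if toss == "H" else 0
        best = max(best, run)
    return best


def expected_longest_run(n, p):
    """Erdős-Rényi estimate R = log_{1/p}(n) for n tosses with head probability p."""
    return math.log(n) / math.log(1.0 / p)


tosses = coin_string("AATCAT", "ATTCAG")
print(tosses)                                    # HTHHHT
print(longest_head_run(tosses))                  # 3
print(round(expected_longest_run(20, 0.5), 2))   # 4.32
```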
  • 8. Modeling Sequence Alignments • Thus, for any one particular alignment, the Erdős-Rényi law can be applied. • What about all possible alignments? Consider that the sequences can be shifted back and forth in the dot matrix plot. • The expected length of the longest match is R = log_{1/p}(mn), where m and n are the lengths of the two sequences.
  • 9. Modeling Sequence Alignments • Suppose m = n = 10 and we deal with DNA sequences: R = log_4(100) ≈ 3.32. • This analysis assumes that the base composition is uniform and the alignment is ungapped. The result is approximate, but not bad.
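One way to see how rough the R = log_{1/p}(mn) estimate is would be to compare it with a small simulation. The sketch below is an illustration under stated assumptions: uniform base composition, ungapped matches only, and hypothetical function names. It scans every diagonal of the dot matrix for the longest run of matches and averages that over random sequence pairs.

```python
import math
import random


def longest_diagonal_run(a, b):
    """Longest run of consecutive matches over all diagonals (all shifts)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1   # extend the run along the diagonal
                best = max(best, cur[j])
        prev = cur
    return best


def predicted_run(m, n, p):
    """Estimate R = log_{1/p}(m * n) from the slides."""
    return math.log(m * n) / math.log(1.0 / p)


def simulate_mean_run(m, n, trials=1000, seed=0):
    """Average longest run for random DNA pairs (uniform composition assumed)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = "".join(rng.choice("ACGT") for _ in range(m))
        b = "".join(rng.choice("ACGT") for _ in range(n))
        total += longest_diagonal_run(a, b)
    return total / trials


print(round(predicted_run(10, 10, 0.25), 2))   # 3.32
print(simulate_mean_run(10, 10))               # simulated average, for comparison
```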
  • 11. Heuristic Methods: FASTA and BLAST • FASTA: the first fast sequence-searching algorithm for comparing a query sequence against a database. • BLAST: Basic Local Alignment Search Tool, an improvement on FASTA in search speed, ease of use, and statistical rigor.
  • 12. FASTA and BLAST • Basic idea: a good alignment contains subsequences of absolute identity (short lengths of exact matches): – First, identify very short exact matches. – Next, extend the best short hits from the first step to longer regions of similarity. – Finally, optimize the best hits.
  • 13. FASTA • Derived from the logic of the dot plot: compute the best diagonals from all frames of alignment. • The method looks for exact matches between words in the query and test sequences: – DNA words are usually 6 nucleotides long – protein words are 2 amino acids long
  • 15. Makes Longest Diagonal • After all diagonals are found, FASTA tries to join diagonals by adding gaps. • It then computes alignments in the regions of the best diagonals.
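The word-matching and diagonal steps described above can be sketched as follows. This is a simplified illustration, not the FASTA program's implementation: it only counts exact k-letter word hits per diagonal (offset between target and query position) and leaves out rescoring, joining diagonals with gaps, and the final optimization. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def word_index(seq, k):
    """Map every k-letter word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, target, k):
    """Count exact word hits on each diagonal (target position minus query position)."""
    index = word_index(query, k)
    hits = Counter()
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            hits[j - i] += 1
    return hits


# Word size 6 for DNA, as on the slide; the sequences are made-up examples.
hits = diagonal_hits("ACGTACGT", "TTACGTACGTTT", k=6)
print(hits.most_common(1))   # [(2, 3)]
```

The diagonal with the most hits is the natural starting point for the region in which an alignment would then be computed.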
  • 17. FASTA Results - Histogram
      !!SEQUENCE_LIST 1.0
      (Nucleotide) FASTA of: b2.seq from: 1 to: 693 December 9, 2002 14:02
      TO: /u/browns02/Victor/Search-set/*.seq Sequences: 2,050 Symbols: 913,285 Word Size: 6
      Searching with both strands of the query.
      Scoring matrix: GenRunData:fastadna.cmp
      Constant pamfactor used
      Gap creation penalty: 16 Gap extension penalty: 4
      Histogram Key:
       Each histogram symbol represents 4 search set sequences
       Each inset symbol represents 1 search set sequences
       z-scores computed from opt scores
      z-score obs exp
              (=) (*)
      < 20      0   0:
        22      0   0:
        24      3   0:=
        26      2   0:=
        28      5   0:==
        30     11   3:*==
        32     19  11:==*==
        34     38  30:=======*==
        36     58  61:===============*
        38     79 100:==================== *
        40    134 140:==================================*
        42    167 171:==========================================*
        44    205 189:===============================================*====
        46    209 192:===============================================*=====
        48    177 184:=============================================*
  • 18. FASTA Results - List
      The best scores are:                          init1 initn   opt    z-sc E(1018780)..
      SW:PPI1_HUMAN Begin: 1 End: 269
      ! Q00169 homo sapiens (human). phosph...       1854  1854  1854  2249.3 1.8e-117
      SW:PPI1_RABIT Begin: 1 End: 269
      ! P48738 oryctolagus cuniculus (rabbi...       1840  1840  1840  2232.4 1.6e-116
      SW:PPI1_RAT Begin: 1 End: 270
      ! P16446 rattus norvegicus (rat). pho...       1543  1543  1837  2228.7 2.5e-116
      SW:PPI1_MOUSE Begin: 1 End: 270
      ! P53810 mus musculus (mouse). phosph...       1542  1542  1836  2227.5 2.9e-116
      SW:PPI2_HUMAN Begin: 1 End: 270
      ! P48739 homo sapiens (human). phosph...       1533  1533  1533  1861.0 7.7e-96
      SPTREMBL_NEW:BAC25830 Begin: 1 End: 270
      ! Bac25830 mus musculus (mouse). 10, ...       1488  1488  1522  1847.6 4.2e-95
      SP_TREMBL:Q8N5W1 Begin: 1 End: 268
      ! Q8n5w1 homo sapiens (human). simila...       1477  1477  1522  1847.6 4.3e-95
      SW:PPI2_RAT Begin: 1 End: 269
      ! P53812 rattus norvegicus (rat). pho...       1482  1482  1516  1840.4 1.1e-94
  • 19. FASTA Results - Alignment
      SCORES Init1: 1515 Initn: 1565 Opt: 1687 z-score: 1158.1 E(): 2.3e-58
      >>GB_IN3:DMU09374 (2038 nt)
      initn: 1565 init1: 1515 opt: 1687 Z-score: 1158.1 expect(): 2.3e-58
      66.2% identity in 875 nt overlap (83-957:151-1022)

                          60        70        80        90       100       110
      u39412.gb_pr  CCCTTTGTGGCCGCCATGGACAATTCCGGGAAGGAAGCGGAGGCGATGGCGCTGTTGGCC
                                                  || |||  | ||||| |    ||| |||||
      DMU09374      AGGCGGACATAAATCCTCGACATGGGTGACAACGAACAGAAGGCGCTCCAACTGATGGCC
                           130       140       150       160       170       180

                         120       130       140       150       160       170
      u39412.gb_pr  GAGGCGGAGCGCAAAGTGAAGAACTCGCAGTCCTTCTTCTCTGGCCTCTTTGGAGGCTCA
                    |||||||||   ||  |||   |   | ||  |||  |       || || ||||| ||
      DMU09374      GAGGCGGAGAAGAAGTTGACCCAGCAGAAGGGCTTTCTGGGATCGCTGTTCGGAGGGTCC
                           190       200       210       220       230       240

                         180       190       200       210       220       230
      u39412.gb_pr  TCCAAAATAGAGGAAGCATGCGAAATCTACGCCAGAGCAGCAAACATGTTCAAAATGGCC
                      |||  | ||||| ||   |||   ||||    | || |  |||||||| || ||| ||
      DMU09374      AACAAGGTGGAGGACGCCATCGAGTGCTACCAGCGGGCGGGCAACATGTTTAAGATGTCC
                           250       260       270       280       290       300

                         240       250       260       270       280       290
      u39412.gb_pr  AAAAACTGGAGTGCTGCTGGAAACGCGTTCTGCCAGGCTGCACAGCTGCACCTGCAGCTC
                    ||||||||||     |||||  |    |||||| |||| |||   || |||  || |
      DMU09374      AAAAACTGGACAAAGGCTGGGGAGTGCTTCTGCGAGGCGGCAACTCTACACGCGCGGGCT
                           310       320       330       340       350       360
  • 20. FASTA on the Web • Many websites offer FASTA searches. • Each server has its limits. • Be aware that you depend “on the kindness of strangers.”
  • 21.
      Institut de Génétique Humaine, Montpellier, France, GeneStream server: http://www2.igh.cnrs.fr/bin/fasta-guess.cgi
      Oak Ridge National Laboratory, GenQuest server: http://avalon.epm.ornl.gov/
      European Bioinformatics Institute, Cambridge, UK: http://www.ebi.ac.uk/htbin/fasta.py?request
      EMBL, Heidelberg, Germany: http://www.embl-heidelberg.de/cgi/fasta-wrapper-free
      Munich Information Center for Protein Sequences (MIPS) at Max-Planck-Institut, Germany: http://speedy.mips.biochem.mpg.de/mips/programs/fasta.html
      Institute of Biology and Chemistry of Proteins, Lyon, France: http://www.ibcp.fr/serv_main.html
      Institut Pasteur, France: http://central.pasteur.fr/seqanal/interfaces/fasta.html
      GenQuest at The Johns Hopkins University: http://www.bis.med.jhmi.edu/Dan/gq/gq.form.html
      National Cancer Center of Japan: http://bioinfo.ncc.go.jp
  • 22. FASTA Format • Simple format used by almost all programs. • >header line with a [return] at the end. • Sequence (no specific requirements for line length, characters, etc.)
      >URO1 uro1.seq Length: 2018 November 9, 2000 11:50 Type: N Check: 3854 ..
      CGCAGAAAGAGGAGGCGCTTGCCTTCAGCTTGTGGGAAATCCCGAAGATGGCCAAAGACA
      ACTCAACTGTTCGTTGCTTCCAGGGCCTGCTGATTTTTGGAAATGTGATTATTGGTTGTT
      GCGGCATTGCCCTGACTGCGGAGTGCATCTTCTTTGTATCTGACCAACACAGCCTCTACC
      CACTGCTTGAAGCCACCGACAACGATGACATCTATGGGGCTGCCTGGATCGGCATATTTG
      TGGGCATCTGCCTCTTCTGCCTGTCTGTTCTAGGCATTGTAGGCATCATGAAGTCCAGCA
      GGAAAATTCTTCTGGCGTATTTCATTCTGATGTTTATAGTATATGCCTTTGAAGTGGCAT
      CTTGTATCACAGCAGCAACACAACAAGACTTTTTCACACCCAACCTCTTCCTGAAGCAGA
      TGCTAGAGAGGTACCAAAACAACAGCCCTCCAAACAATGATGACCAGTGGAAAAACAATG
      GAGTCACCAAAACCTGGGACAGGCTCATGCTCCAGGACAATTGCTGTGGCGTAAATGGTC
      CATCAGACTGGCAAAAATACACATCTGCCTTCCGGACTGAGAATAATGATGCTGACTATC
      CCTGGCCTCGTCAATGCTGTGTTATGAACAATCTTAAAGAACCTCTCAACCTGGAGGCTT
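Because the format is just a `>` header line followed by sequence lines of any length, a reader for it fits in a few lines. The following is a minimal sketch with a hypothetical function name `parse_fasta`; it assumes plain text input and does not validate the sequence alphabet.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)           # sequence may span many lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">seq1 demo\nACGT\nTTGA\n>seq2\nGGCC\n"
print(parse_fasta(example))   # [('seq1 demo', 'ACGTTTGA'), ('seq2', 'GGCC')]
```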
  • 23. Assessing Alignment Significance • Generate random alignments and calculate their scores. • Compute the mean and the standard deviation (SD) of the random scores. • Compute the deviation of the actual score X from the mean of the random scores: Z = (X − mean)/SD. • Evaluate the significance of the alignment. • The expected number of chance alignments with a Z value at least this large is called the E score (E value).
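The procedure on this slide can be sketched with a shuffling test. Everything below is an assumption made for illustration: the stand-in scoring function `identity_score` (matches at identical positions, not FASTA's opt score), the use of shuffled copies of the subject as the "random alignments", and the number of trials.

```python
import random
import statistics


def identity_score(a, b):
    """Stand-in score: number of identical positions in an ungapped overlay."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_z_score(query, subject, score=identity_score, trials=200, seed=0):
    """Z = (X - mean) / SD, with mean and SD taken from shuffled subjects."""
    rng = random.Random(seed)
    actual = score(query, subject)
    letters = list(subject)
    random_scores = []
    for _ in range(trials):
        rng.shuffle(letters)              # keeps composition, destroys order
        random_scores.append(score(query, "".join(letters)))
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)
    if sd == 0:
        raise ValueError("random scores do not vary; Z is undefined")
    return (actual - mean) / sd


seq = "ACGTACGTACGTACGTACGT"
print(shuffle_z_score(seq, seq))   # a large positive Z for identical sequences
```

Shuffling preserves base composition, so the random scores reflect what the same letters would score by chance in a different order.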
  • 24. E scores or E values • E scores are not equivalent to p values, for which p < 0.05 is generally considered statistically significant.
  • 25. E values (rules of thumb) • E values below 10^-6 are most probably statistically significant. • E values above 10^-6 but below 10^-3 deserve a second look. • E values above 10^-3 should not be tossed aside lightly; they should be thrown out with great force.
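The difference between E values and p values, and the rules of thumb above, can be made concrete with a short sketch. The conversion assumes that the number of chance hits is Poisson-distributed with mean E, as in BLAST-style statistics, so that p = 1 − e^(−E); the function names and the handling of values exactly at a threshold are illustrative assumptions.

```python
import math


def e_to_p(e_value):
    """P(at least one chance hit) when chance hits are Poisson with mean E."""
    return -math.expm1(-e_value)          # numerically stable 1 - exp(-E)


def rule_of_thumb(e_value):
    """Classify an E value using the thresholds from the slide."""
    if e_value < 1e-6:
        return "most probably significant"
    if e_value < 1e-3:
        return "deserves a second look"
    return "throw out"


for e in (4e-71, 1e-4, 0.05, 10.0):
    print(e, round(e_to_p(e), 4), rule_of_thumb(e))
```

For small E the two quantities nearly coincide, while for large E the p value saturates near 1 and the E value keeps growing, which is why the two should not be read interchangeably.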
  • 26. BLAST • Basic Local Alignment Search Tool (Altschul et al. 1990, 1994, 1997). • Heuristic method for local alignment. • Designed specifically for database searches. • Based on the same assumption as FASTA: good alignments contain short lengths of exact matches.
  • 27. BLAST • Both BLAST and FASTA search for local sequence similarity; indeed they have exactly the same goals, though they use somewhat different algorithms and statistical approaches. • BLAST benefits: – Speed – User friendly – Statistical rigor – More sensitive
  • 28. Input/Output • Input: – Query sequence Q – Database of sequences DB – Minimal score S • Output: – Sequences Seq from DB such that the alignment of Q and Seq scores > S
  • 29. BLAST Searches GenBank (BLAST = Basic Local Alignment Search Tool) • The NCBI BLAST web server lets you compare your query sequence to various sections of GenBank: – nr = non-redundant (main sections) – month = new sequences from the past few weeks – refseq_rna = RNA entries from NCBI's Reference Sequence project – refseq_genomic = genomic entries from NCBI's Reference Sequence project – ESTs – Taxon = e.g., human, Drosophila, yeast, E. coli – proteins (by automatic translation) – pdb = sequences derived from the 3-dimensional structures in the Brookhaven Protein Data Bank
  • 30. BLAST • Uses word matching like FASTA. • Similarity matching of words (3 amino acids, 11 bases): does not require identical words. • If no words are similar, then there is no alignment, so BLAST will not find matches for very short sequences. • Does not handle gaps well; “gapped BLAST” is somewhat better.
  • 32. BLAST Word Matching • Break the query into words: MEAAVKEEISVEDEAVDKNI → MEA, EAA, AAV, AVK, VKE, KEE, EEI, EIS, ISV, ... • Break the database sequences into words.
  • 33. Find locations of matching words in database sequences • Words: MEA, EAA, AAV, AVK, KLV, KEE, EEI, EIS, ISV. • Database sequences: ELEPRRPRYRVPDVLVADPPIARLSVSGRDENSVELT MEAT; TDVRWMSETGIIDVFLLLGPSISDVFRQYASLTGTQALPPLFSLGYHQSRWNY; IWLDIEEIHADGKRYFTWDPSRFPQPRTMLERLASKRRV KLVAIVDPH.
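The two word-matching steps can be sketched as follows, using the query from the slide and a fragment of one of the database sequences shown. This is a deliberately simplified illustration: it matches only identical words, whereas the slides note that BLAST also accepts similar words; the function names and the database label `db1` are hypothetical.

```python
def words(seq, w=3):
    """All overlapping words of length w, in order of position."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def find_word_hits(query, database, w=3):
    """Return (name, word, query_pos, db_pos) for every exact word match."""
    query_positions = {}
    for i, word in enumerate(words(query, w)):
        query_positions.setdefault(word, []).append(i)
    hits = []
    for name, seq in database.items():
        for j, word in enumerate(words(seq, w)):
            for i in query_positions.get(word, ()):
                hits.append((name, word, i, j))
    return hits


query = "MEAAVKEEISVEDEAVDKNI"
print(words(query)[:9])
# ['MEA', 'EAA', 'AAV', 'AVK', 'VKE', 'KEE', 'EEI', 'EIS', 'ISV']

database = {"db1": "IWLDIEEIHADGKRYF"}      # fragment of a sequence from the slide
print(find_word_hits(query, database))      # [('db1', 'EEI', 6, 5)]
```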
  • 34. Extend hits one base at a time
  • 35. Seq_XYZ: HVTGRSAF_FSYYGYGCYCGLGTGKGLPVDATDRCCWA Query: QSVFDYIYYGCYCGWGLG_GK__PRDA E-val = 10^-13 • Use two word matches as anchors to build an alignment between the query and a database sequence. • Then score the alignment.
  • 36. HSPs are Aligned Regions • The results of the word matching and the attempts to extend the alignment are segments called HSPs (High-Scoring Segment Pairs). • BLAST often produces several short HSPs rather than a single aligned region.
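Extending a word hit one position at a time into a scored segment can be sketched as below. This is an ungapped, single-hit illustration, not BLAST's actual code: the scoring values, the drop-off rule (stop once the running score falls more than `x_drop` below the best seen), and all function names are assumptions. It also assumes the seed word is an exact match.

```python
def extend_one_way(query, subject, q, s, step, match, mismatch, x_drop):
    """Walk from (q, s) in direction step; return (best_gain, best_length)."""
    gain = best_gain = 0
    length = best_length = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        gain += match if query[q] == subject[s] else mismatch
        length += 1
        if gain > best_gain:
            best_gain, best_length = gain, length
        elif best_gain - gain > x_drop:
            break                          # score dropped too far: give up
        q += step
        s += step
    return best_gain, best_length


def extend_hit(query, subject, q_start, s_start, w,
               match=1, mismatch=-1, x_drop=2):
    """Extend an exact word hit of length w; return (score, q_begin, q_end)."""
    right_gain, right_len = extend_one_way(
        query, subject, q_start + w, s_start + w, 1, match, mismatch, x_drop)
    left_gain, left_len = extend_one_way(
        query, subject, q_start - 1, s_start - 1, -1, match, mismatch, x_drop)
    score = w * match + left_gain + right_gain
    return score, q_start - left_len, q_start + w + right_len


query, subject = "GGACGTACTT", "CCACGTACAA"
score, begin, end = extend_hit(query, subject, 3, 3, 3)   # seed word "CGT"
print(score, query[begin:end])   # 6 ACGTAC
```

The returned segment plays the role of an HSP: the seed word grown in both directions for as long as the score keeps improving.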
  • 37.
      >gb|BE588357.1|BE588357 194087 BARC 5BOV Bos taurus cDNA 5'.
      Length = 369
      Score = 272 bits (137), Expect = 4e-71
      Identities = 258/297 (86%), Gaps = 1/297 (0%)
      Strand = Plus / Plus

      Query: 17  aggatccaacgtcgctccagctgctcttgacgactccacagataccccgaagccatggca 76
                 |||||||||||||||| | ||| | ||| || ||| | ||||  ||||| |||||||||
      Sbjct: 1   aggatccaacgtcgctgcggctacccttaaccact-cgcagaccccccgcagccatggcc 59

      Query: 77  agcaagggcttgcaggacctgaagcaacaggtggaggggaccgcccaggaagccgtgtca 136
                 |||||||||||||||||||||||| | || ||||||||| | ||||||||||| ||| ||
      Sbjct: 60  agcaagggcttgcaggacctgaagaagcaagtggagggggcggcccaggaagcggtgaca 119

      Query: 137 gcggccggagcggcagctcagcaagtggtggaccaggccacagaggcggggcagaaagcc 196
                  |||||||| | || | ||||||||||||||| ||||||||||| || ||||||||||||
      Sbjct: 120 tcggccggaacagcggttcagcaagtggtggatcaggccacagaagcagggcagaaagcc 179

      Query: 197 atggaccagctggccaagaccacccaggaaaccatcgacaagactgctaaccaggcctct 256
                 ||||||||| | |||||||| |||||||||||||||||| ||||||||||||||||||||
      Sbjct: 180 atggaccaggttgccaagactacccaggaaaccatcgaccagactgctaaccaggcctct 239

      Query: 257 gacaccttctctgggattgggaaaaaattcggcctcctgaaatgacagcagggagac 313
                 || || ||||| ||  ||||||||||| | |||||||||||||||||| ||||||||
      Sbjct: 240 gagactttctcgggttttgggaaaaaacttggcctcctgaaatgacagaagggagac 296
  • 54. Choosing the right parameters
  • 63. More on BLAST
      NCBI Blast Glossary: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html
      Education: Blast Information: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
      Steve Altschul's Blast Course: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
