The Smith-Waterman algorithm

                    Dr Avril Coghlan
                   alc@sanger.ac.uk

Note: this talk contains animations which can only be seen by
downloading and using ‘View Slide show’ in Powerpoint
Global versus Local Alignment
• A global alignment covers the entire lengths of the
  sequences involved
  The Needleman-Wunsch algorithm finds the best global alignment
  between 2 sequences
• A local alignment only covers parts of the sequences
  The Smith-Waterman algorithm finds the best local alignment
  between 2 sequences


  Global alignment       Q K E S G P S S S Y C
                         |   | | |           |
                       V Q Q E S G L V R T T C
  Local alignment              E S G
                               | | |
                               E S G
Local alignment
• The concept of ‘local alignment’ was introduced by
  Smith & Waterman in 1981
• A local alignment of 2 sequences is an alignment
  between parts of the 2 sequences
  Two proteins may share one stretch of high sequence
  similarity, but be very dissimilar outside that region
  A global (N-W) alignment of such sequences would have:
   (i) lots of matches in the region of high sequence similarity
  (ii) lots of mismatches & gaps (insertions/deletions) outside the region
          of similarity
  It makes sense to find the best local alignment instead
Real data: fruitfly & human Eyeless
• This is a global alignment of human & fruitfly Eyeless
  Do you think it’s sensible to make a global alignment of
  these two sequences?
Real data: fruitfly & human Eyeless
  There are 2 short regions of high similarity
  Outside those regions, there are many mismatches and gaps
  It might be more sensible to make local alignments of one or
  both of the regions of high similarity
Real data: fruitfly & human Eyeless
• This is a local alignment of human & fruitfly Eyeless
  What parts of the sequences were used in the local alignment?
The Smith-Waterman algorithm
• S-W is mathematically proven to find the best
  (highest-scoring) local alignment of 2 sequences
  The best local alignment is the best alignment of all possible
  subsequences (parts) of sequences S1 and S2
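  S-W fills a table T, where T(i, j) is the score of the best local
  alignment ending at letters S1(i) and S2(j)
  The 0th row and 0th column of T are first filled with zeroes
  The recurrence relation used to fill table T is:
    T(i, j) = max { T(i-1, j-1) + σ(S1(i), S2(j)),
                    T(i-1, j) + gap penalty,
                    T(i, j-1) + gap penalty,
                    0 }
  where σ(a, b) is the score for aligning letter a with letter b (match or mismatch)
  The 0 is a 4th possibility (unlike N-W)
  The traceback starts at the highest-scoring cell in the matrix T, and follows
  the recorded previous cells (up, left or diagonally) while the score is still
  positive, stopping when it reaches a cell with score 0
  (In N-W, by contrast, the traceback starts at the bottom right and ends at the
  top left, which ensures it’s a global alignment)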
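
The recurrence above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used for the slides: the function name `fill_matrix` and its parameter names are assumptions made here, and the default scores (+2 match, -1 mismatch, -2 gap) and the two sequences are taken from the worked example that follows in the deck.

```python
def fill_matrix(s1, s2, match=2, mismatch=-1, gap=-2):
    """Fill the Smith-Waterman table T, with s1 along the rows and s2 along the columns.

    Hypothetical helper for illustration; uses a simple linear gap penalty.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    # The 0th row and 0th column are filled with zeroes.
    T = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            # sigma(S1(i), S2(j)): strings are 0-based, the table is 1-based.
            sigma = match if s1[i - 1] == s2[j - 1] else mismatch
            T[i][j] = max(
                T[i - 1][j - 1] + sigma,  # align S1(i) with S2(j)
                T[i - 1][j] + gap,        # gap, coming from the cell above
                T[i][j - 1] + gap,        # gap, coming from the cell to the left
                0,                        # 4th possibility: start afresh here
            )
    return T


T = fill_matrix("ACCTAAGG", "GGCTCAATCA")
best = max(max(row) for row in T)
print(best)  # 6
```

Because no cell can drop below 0, a poorly matching prefix never drags down the score of a good region that follows it, which is what makes the alignment local.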
• e.g., to find the best local alignment of sequences
  “ACCTAAGG” and “GGCTCAATCA”, using +2 for a
  match, -1 for a mismatch, and -2 for a gap:
  We first make matrix T (as in N-W):
  The 0th row and 0th column of T are filled with zeroes
  The recurrence relation is then used to fill the matrix T
                     G   G   C   T   C   A   A   T   C   A
                0    0   0   0   0   0   0   0   0   0   0
            A   0
            C   0
            C   0
            T   0
            A   0
            A   0
            G   0
            G   0
We first calculate T(1,1) using the recurrence relation, with S1(1) = A and S2(1) = G (a mismatch):
    T(1, 1) = max { T(0, 0) + σ(S1(1), S2(1)) = 0 - 1 = -1,
                    T(0, 1) + gap penalty = 0 - 2 = -2,
                    T(1, 0) + gap penalty = 0 - 2 = -2,
                    0 }
    The maximum value is 0, so we set T(1,1) to 0
            G   G   C   T   C   A   A   T   C   A
        0   0   0   0   0   0   0   0   0   0   0
    A   0   0   ?
    C   0   ?
    C   0
    T   0
    A   0
    A   0
    G   0
    G   0
    We next calculate T(2,1)…
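
The single-cell calculation above can be reproduced as a small sketch. The helper name `candidates` and the dictionary keys are assumptions made for this illustration; the scores are those of the example (+2 match, -1 mismatch, -2 gap).

```python
def candidates(T, s1, s2, i, j, match=2, mismatch=-1, gap=-2):
    """Return the four values the recurrence chooses between for T(i, j)."""
    sigma = match if s1[i - 1] == s2[j - 1] else mismatch
    return {
        "diagonal": T[i - 1][j - 1] + sigma,  # T(i-1, j-1) + sigma(S1(i), S2(j))
        "up": T[i - 1][j] + gap,              # T(i-1, j) + gap penalty
        "left": T[i][j - 1] + gap,            # T(i, j-1) + gap penalty
        "zero": 0,                            # the 4th possibility
    }


s1, s2 = "ACCTAAGG", "GGCTCAATCA"
# Start from a table of zeroes (only the 0th row and column matter so far).
T = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]

c11 = candidates(T, s1, s2, 1, 1)
T[1][1] = max(c11.values())
print(c11)      # {'diagonal': -1, 'up': -2, 'left': -2, 'zero': 0}
print(T[1][1])  # 0

c21 = candidates(T, s1, s2, 2, 1)  # C against G is also a mismatch
T[2][1] = max(c21.values())
print(T[2][1])  # 0
```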
You fill in the whole of T, recording the previous cell (if any) used
to calculate the value of each T(i, j):
                 G   G   C   T   C   A   A   T   C   A
             0   0   0   0   0   0   0   0   0   0   0
         A   0   0   0   0   0   0   2   2   0   0   2
         C   0   0   0   2   0   2   0   1   1   2   0
         C   0   0   0   2   1   2   1   0   0   3   1
         T   0   0   0   0   4   2   1   0   2   1   2
         A   0   0   0   0   2   3   4   3   1   1   3
         A   0   0   0   0   0   1   5   6   4   2   3
         G   0   2   2   0   0   0   3   4   5   3   1
         G   0   2   4   2   0   0   1   2   3   4   2

You work out the best local alignment from the traceback (just like in N-W).
The traceback starts at the highest-scoring cell, T(6,7) = 6, and passes through
T(5,6) = 4, T(4,5) = 2, T(4,4) = 4 and T(3,3) = 2 before reaching a 0:
                             C T C A A
                             | |   | |
                             C T - A A
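
The fill-and-traceback procedure can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code behind the slides: the function name `smith_waterman`, the `back` table of recorded previous cells, and the tie-breaking rule (diagonal first, then up, then left) are all choices made here. With other tie-breaking rules a different, equally high-scoring alignment could be reported for some inputs.

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned s1, aligned s2) for one best local alignment."""
    rows, cols = len(s1) + 1, len(s2) + 1
    T = [[0] * cols for _ in range(rows)]
    # back[i][j] is the previous cell used for T(i, j), or None if T(i, j) is 0.
    back = [[None] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sigma = match if s1[i - 1] == s2[j - 1] else mismatch
            options = [
                (T[i - 1][j - 1] + sigma, (i - 1, j - 1)),  # diagonal
                (T[i - 1][j] + gap, (i - 1, j)),            # up
                (T[i][j - 1] + gap, (i, j - 1)),            # left
            ]
            score, prev = max(options, key=lambda option: option[0])
            if score > 0:  # otherwise the cell keeps its 0 and has no previous cell
                T[i][j], back[i][j] = score, prev
            if T[i][j] > best:
                best, best_pos = T[i][j], (i, j)

    # Traceback: start at the highest-scoring cell, stop at a cell with score 0.
    aligned1, aligned2 = [], []
    i, j = best_pos
    while back[i][j] is not None:
        prev_i, prev_j = back[i][j]
        aligned1.append(s1[i - 1] if prev_i < i else "-")
        aligned2.append(s2[j - 1] if prev_j < j else "-")
        i, j = prev_i, prev_j
    return best, "".join(reversed(aligned1)), "".join(reversed(aligned2))


score, row_seq, col_seq = smith_waterman("ACCTAAGG", "GGCTCAATCA")
print(col_seq)  # CTCAA
print(row_seq)  # CT-AA
print(score)    # 6
```

The column sequence is printed first so that the output matches the orientation of the alignment shown on the slide.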
Software for making alignments
• For Smith-Waterman pairwise alignment
  pairwiseAlignment() in the “Biostrings” R library
  the EMBOSS (emboss.sourceforge.net/) water program
Problem
• Find the best local alignment between
  “TCAGTTGCC” & “AGGTTG”, with +1 for a match, -2
  for a mismatch, and -2 for a gap.
Answer
• Find the best local alignment between
  “TCAGTTGCC” & “AGGTTG”, with +1 for a match, -2
  for a mismatch, and -2 for a gap
  Matrix T looks like this:
               T   C   A   G   T   T   G   C   C
           0   0   0   0   0   0   0   0   0   0
       A   0   0   0   1   0   0   0   0   0   0
       G   0   0   0   0   2   0   0   1   0   0
       G   0   0   0   0   1   0   0   1   0   0
       T   0   1   0   0   0   2   1   0   0   0
       T   0   1   0   0   0   1   3   1   0   0
       G   0   0   0   0   1   0   1   4   2   0
  Traceback (shown in pink on the slide): T(6,7) = 4, T(5,6) = 3, T(4,5) = 2, T(3,4) = 1
  Alignment:
       G T T G
       | | | |
       G T T G
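
One way to check a hand-filled matrix such as this one is to recompute it and print it in the same layout. The sketch below does that for the exercise; the names `fill_matrix` and `format_matrix` are hypothetical helpers introduced here, and the column width of 4 is an arbitrary display choice.

```python
def fill_matrix(row_seq, col_seq, match, mismatch, gap):
    """Fill the Smith-Waterman table with row_seq down the side and col_seq across the top."""
    T = [[0] * (len(col_seq) + 1) for _ in range(len(row_seq) + 1)]
    for i in range(1, len(row_seq) + 1):
        for j in range(1, len(col_seq) + 1):
            sigma = match if row_seq[i - 1] == col_seq[j - 1] else mismatch
            T[i][j] = max(T[i - 1][j - 1] + sigma, T[i - 1][j] + gap, T[i][j - 1] + gap, 0)
    return T


def format_matrix(T, row_seq, col_seq):
    """Lay the table out as text, labelling rows and columns with the sequence letters."""
    # The leading " " stands for the unlabelled 0th column / 0th row.
    lines = ["  " + "".join(f"{letter:>4}" for letter in " " + col_seq)]
    for label, row in zip(" " + row_seq, T):
        lines.append(f"{label} " + "".join(f"{value:>4}" for value in row))
    return "\n".join(lines)


T = fill_matrix("AGGTTG", "TCAGTTGCC", match=1, mismatch=-2, gap=-2)
print(format_matrix(T, "AGGTTG", "TCAGTTGCC"))

best = max(max(row) for row in T)
print(best)  # 4
```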
Further Reading
•   Chapter 3 in Cristianini & Hahn, Introduction to Computational Genomics
•   Chapter 6 in Deonier et al, Computational Genome Analysis
•   Practical on pairwise alignment in R in the Little Book of R for
    Bioinformatics:
    https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter4.html


Editor's Notes

  1. Image credit (Temple Smith): http://www.modulargenetics.com/Temple%20Smith.jpg Image credit (Michael Waterman): http://www.iscb.org/cms_addon/conferences/ismb2003/images/watterman.jpg
  2. Made alignment of human.fa and fly.fa using Needleman-Wunsch with default parameters at: http://emboss.bioinformatics.nl/cgi-bin/emboss/needle (EMBOSS needle). Human Eyeless (PAX6) from: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENST00000379111.1 D. melanogaster Eyeless from: http://www.treefam.org/cgi-bin/TFseq.pl?id=FBtr0100396.5 Viewed in Jalview, and saved as humanfly_needlemanwunsch.png
  3. Made alignment of human.fa and fly.fa using Smith-Waterman with default parameters at: http://emboss.bioinformatics.nl/cgi-bin/emboss/water (EMBOSS water). Human Eyeless (PAX6) from: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENST00000379111.1 D. melanogaster Eyeless from: http://www.treefam.org/cgi-bin/TFseq.pl?id=FBtr0100396.5 Viewed in Jalview, and saved as humanfly_smithwaterman.png
  4. In R:
     > library("Biostrings")
     > seq1 <- "GGCTCAATCA"
     > seq2 <- "ACCTAAGG"
     > sigma <- nucleotideSubstitutionMatrix(match = 2, mismatch = -1, baseOnly = TRUE)
     > pairwiseAlignment(seq1, seq2, substitutionMatrix = sigma, gapOpening = 0, gapExtension = -2, scoreOnly = FALSE, type = "local")
     Local PairwiseAlignedFixedSubject (1 of 1)
     pattern: [3] CTCAA
     subject: [3] CT-AA
     score: 6
     Also:
     > source("C:/Documents and Settings/Avril Coughlan/My Documents/Rfunctions.R")
     > dnasmithwaterman(seq1, seq2, gapopen=0, gapextend=-2, mymatch=2, mymismatch=-1)
     [1] "maxT= 6"
        NA  G     G     C     T     C     A     A     T     C     A
     NA NA  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA
     A  NA  "0 +" "0 +" "0 +" "0 +" "0 +" "2 >" "2 >" "0 -" "0 +" "2 >"
     C  NA  "0 +" "0 +" "2 >" "0 -" "2 >" "0 L" "1 >" "1 >" "2 >" "0 L"
     C  NA  "0 +" "0 +" "2 >" "1 >" "2 >" "1 >" "0 +" "0 >" "3 >" "1 Z"
     T  NA  "0 +" "0 +" "0 |" "4 >" "2 -" "1 >" "0 >" "2 >" "1 |" "2 >"
     A  NA  "0 +" "0 +" "0 +" "2 |" "3 >" "4 >" "3 >" "1 -" "1 >" "3 >"
     A  NA  "0 +" "0 +" "0 +" "0 |" "1 V" "5 >" "6 >" "4 -" "2 -" "3 >"
     G  NA  "2 >" "2 >" "0 -" "0 +" "0 +" "3 |" "4 V" "5 >" "3 Z" "1 *"
     G  NA  "2 >" "4 >" "2 -" "0 -" "0 +" "1 |" "2 V" "3 V" "4 >" "2 Z"
     NOTE: there seems to be a mistake in this example on page 157 of the Deonier book: it has "... 2 3 4 3 2 1 3" on one row, but should have "... 2 3 4 3 1 1 3" on that row (row i = 5).
  5. In R:
     > library("Biostrings")
     > seq1 <- "TCAGTTGCC"
     > seq2 <- "AGGTTG"
     > sigma <- nucleotideSubstitutionMatrix(match = 1, mismatch = -2, baseOnly = TRUE)
     > pairwiseAlignment(seq1, seq2, substitutionMatrix = sigma, gapOpening = 0, gapExtension = -2, scoreOnly = FALSE, type = "local")
     Local PairwiseAlignedFixedSubject (1 of 1)
     pattern: [4] GTTG
     subject: [3] GTTG
     score: 4
     Also:
     > source("C:/Documents and Settings/Avril Coughlan/My Documents/Rfunctions.R")
     > dnasmithwaterman(seq1, seq2, gapopen=0, gapextend=-2, mymatch=1, mymismatch=-2)
     [1] "maxT= 4"
        NA  T     C     A     G     T     T     G     C     C
     NA NA  NA    NA    NA    NA    NA    NA    NA    NA    NA
     A  NA  "0 +" "0 +" "1 >" "0 +" "0 +" "0 +" "0 +" "0 +" "0 +"
     G  NA  "0 +" "0 +" "0 +" "2 >" "0 -" "0 +" "1 >" "0 +" "0 +"
     G  NA  "0 +" "0 +" "0 +" "1 >" "0 >" "0 +" "1 >" "0 +" "0 +"
     T  NA  "1 >" "0 +" "0 +" "0 +" "2 >" "1 >" "0 +" "0 +" "0 +"
     T  NA  "1 >" "0 +" "0 +" "0 +" "1 >" "3 >" "1 -" "0 +" "0 +"
     G  NA  "0 +" "0 +" "0 +" "1 >" "0 +" "1 |" "4 >" "2 -" "0 -"