[I0D51A] Bioinformatics: High-Throughput Analysis
 Next-generation sequencing. Part 1: Technologies
Prof Jan Aerts
Faculty of Engineering - ESAT/SCD
jan.aerts@esat.kuleuven.be

TA: Alejandro Sifrim (alejandro.sifrim@esat.kuleuven.be)




Announcements

May 27th (9am-noon): evaluation


open book




Note to self...

Upload s_1_sequence.txt and s_2_sequence.txt to Galaxy first...




Overview

• Linux refresher (6/5)


• next-generation sequencing technologies and applications (6/5)


• sequence mapping (13/5)


• variant calling - SNPs (20/5)


• variant calling - structural variation (20/5)




Linux Refresher...




Next-generation sequencing technologies




General principle




Big data...




First vs second generation sequencing
Sanger sequencing (1st gen)   2nd/next gen sequencing




                                                 Shendure & Ji, 2008




Paired-end sequencing




                        Korbel et al, 2007




General approaches

• 2nd generation: clonally amplified single molecules


  • Roche 454 pyrosequencing


  • Illumina Genome Analyzer -> HiSeq: reversible terminator technology


  • ABI SOLiD: ligation-based extension


• Next-next-generation/3rd generation: true single molecule


  • Helicos: HeliScope


  • Pacific Biosciences: SMRT

Mardis, 2011

Steps


   1. genome enrichment


   2. template preparation


   3. sequencing and imaging


   4. data analysis

A. Genome enrichment




Sequencing costs




What?

Only sequence relevant parts of the genome instead of whole genome, e.g.:


• specific Mb-scale regions known to be involved in particular disease (e.g.
  based on GWAS)


• specific candidate genes belonging to disease pathway


• exome (= all exons)


 => how to isolate these from non-target sequence? “pulldown”




Pulldown: on-array




                     Turner et al, 2009




Pulldown: in-solution




                        Turner et al, 2009




Performance metrics

• fold-enrichment: ratio of abundance of target sequences post-enrichment vs
  pre-enrichment


• capture specificity: fraction of sequence reads that map to target


• uniformity: relative abundance of individual targets after enrichment


• completeness: fraction of target bases detectably captured
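
A minimal sketch of how the four metrics above could be computed once reads have been mapped. The function name `enrichment_metrics`, its arguments and the example numbers are assumptions made for illustration. In particular, pre-enrichment abundance is approximated by the target's share of the genome, and a base counts as captured when its depth is at least `min_depth`.

```python
def enrichment_metrics(on_target_reads, total_reads, target_bp, genome_bp,
                       target_depths, min_depth=1):
    """Summarise a capture experiment (hypothetical helper, for illustration).

    target_depths maps each target name to its list of per-base read depths.
    """
    # Capture specificity: fraction of sequence reads that map to target.
    specificity = on_target_reads / total_reads

    # Fold-enrichment: target abundance after enrichment vs before.
    # Assumption: before enrichment, targets make up target_bp / genome_bp.
    fold_enrichment = specificity / (target_bp / genome_bp)

    all_depths = [d for depths in target_depths.values() for d in depths]

    # Completeness: fraction of target bases detectably captured.
    completeness = sum(d >= min_depth for d in all_depths) / len(all_depths)

    # Uniformity: mean depth of each target relative to the overall mean.
    overall_mean = sum(all_depths) / len(all_depths)
    uniformity = {name: (sum(depths) / len(depths)) / overall_mean
                  for name, depths in target_depths.items()}

    return {
        "specificity": specificity,
        "fold_enrichment": fold_enrichment,
        "completeness": completeness,
        "uniformity": uniformity,
    }


# Made-up numbers: a 30 Mb target in a 3 Gb genome, two small targets.
metrics = enrichment_metrics(
    on_target_reads=600_000, total_reads=1_000_000,
    target_bp=30_000_000, genome_bp=3_000_000_000,
    target_depths={"exon1": [10, 10, 0, 20], "exon2": [5, 5, 5, 5]},
)
for name, value in metrics.items():
    print(name, value)
```

Other definitions of uniformity are in use (e.g. the fraction of targets within a given range of the mean); the ratio used here is only one simple choice.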




B. Template preparation




Problem: most imaging systems are not designed to detect a single fluorescent event
=> need amplified templates


Aim: to produce a representative, non-biased source of nucleic acid material
from the genome under investigation => population of identical templates


Steps:


   1. shear DNA


   2. amplify templates


 Options: emulsion PCR (emPCR) or solid-phase amplification

Amplification by emulsion PCR

emulsion = mixture of two or more immiscible (unblendable) liquids; e.g.
mayonnaise, vinaigrette


emPCR: thousands of microreactors/micro-eppendorfs


one bead + one DNA molecule per microreactor => PCR to 1000s of copies




Williams et al, 2006




 Metzker, 2010

Solid-phase amplification




                                             http://bit.ly/6JYIUz




http://www.youtube.com/watch?v=77r5p8IBwJk&NR=1
Metzker, 2010

C. Sequencing and imaging




Sequencing and imaging

Technologies:


1. cyclic reversible termination


2. sequencing by ligation


3. pyrosequencing


4. real-time sequencing




Cyclic reversible termination

DNA synthesis is terminated after adding a single nucleotide


start/stop/start/stop/start/stop/...

Illumina: 4-colour

[Figure: sequencing steps and sequencing result. Metzker, 2010]

Helicos: 1-colour

[Figure: sequencing steps and sequencing result. Metzker, 2010]

Sequencing by ligation




[Figure: sequencing steps. http://bit.ly/fPh22X]

[Figure: sequencing result. http://bit.ly/fPh22X]

Pyrosequencing




Metzker, 2010

Real-time sequencing




[Figure labels: DNA polymerase; “ZMW” zero-mode waveguide; “strobe sequencing”]

Platform     Run time   Gb/run

Roche 454    8.5 hr     0.45

Illumina     9 days     35

SOLiD        14 days    50

Helicos      8 days     37

PacBio       ?          ?

Accuracy - base calling error

• base quality drops along read


        Sanger > SOLiD > Illumina > 454 > Helicos


        (“dephasing” within clusters)




• base calling errors




Accuracy - homopolymer runs

 Issue for Roche 454:


   39% of errors occur in homopolymer runs


      A5 motifs: 3.3% error rate


      A8 motifs: 50% error rate


   Reason: signal intensity is used as a measure of homopolymer length
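
A toy model of why longer runs are harder to call, assuming (for illustration only) that the light signal is proportional to homopolymer length and carries Gaussian noise whose spread is a fixed fraction `cv` of the signal. The names `call_homopolymer_length` and `simulated_error_rate` and the value `cv=0.08` are hypothetical, not measured 454 parameters.

```python
import random


def call_homopolymer_length(intensity, unit=1.0):
    """Call the run length as the nearest whole multiple of the unit signal."""
    return max(0, int(intensity / unit + 0.5))


def simulated_error_rate(true_length, cv=0.08, trials=10_000, seed=0):
    """Fraction of simulated flows whose called length differs from the truth."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        # Assumed noise model: standard deviation grows with the signal.
        intensity = rng.gauss(true_length, cv * true_length)
        if call_homopolymer_length(intensity) != true_length:
            errors += 1
    return errors / trials


error_rates = {n: simulated_error_rate(n) for n in (2, 5, 8)}
for n, rate in error_rates.items():
    print(f"run of {n}: {rate:.1%} miscalled")
```

The gap between neighbouring lengths stays at one unit of signal while the noise grows with the run, so longer homopolymers are miscalled more often. The simulated rates are not meant to reproduce the A5 and A8 figures above.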




Ronaghi, Genome Res 11:3-11 (2001)




http://mammoth.psu.edu/labPhotos/imageOfFlowgram.jpg




Is it 4? Is it 5? Is it 4?




      http://mammoth.psu.edu/labPhotos/imageOfFlowgram.jpg




Consensus accuracy

Increase accuracy for SNP calling by increasing coverage:


   Illumina: 20X


   SOLiD: 12X


   454: 7.4X


   Sanger: 3X


Factors: raw accuracy + read length


How deep do you have to sequence? => Poisson distribution: “If you sequence at an
average of 10X, how much of the genome will be covered at least 5X?”
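
The question above can be worked out directly, assuming per-base depth follows a Poisson distribution with mean equal to the average coverage (i.e. reads land uniformly at random, which real data only approximate). The function names are hypothetical.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson-distributed depth X with the given mean."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i)
               for i in range(k + 1))


def fraction_covered_at_least(min_depth, mean_depth):
    """Expected fraction of bases with depth >= min_depth."""
    return 1.0 - poisson_cdf(min_depth - 1, mean_depth)


fraction = fraction_covered_at_least(5, 10)
print(f"{fraction:.3f}")
```

Under these assumptions about 97% of the genome is covered at least 5X at an average of 10X. Real coverage is usually more uneven than Poisson, so this is an optimistic estimate.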

Bentley et al, Nature 456:53-59 (2008)




FASTQ file format

Lines of a FASTQ entry:

   1. “@” + identifier

   2. sequence

   3. “+” + identifier (optional)

   4. phred-based quality scores

[Figures: example FASTA entries (n=2); example FASTQ entries (n=2); phred quality score encoding. Source: Wikipedia]
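
A minimal sketch of reading this format, assuming the common layout of exactly four lines per entry. `read_fastq`, `phred_scores`, `error_probability` and the example read are hypothetical. An `offset` of 33 corresponds to Sanger-style encoding and 64 to older Illumina encoding.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (identifier, sequence, quality_string) for 4-line FASTQ entries."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or an empty line)
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("not a 4-line FASTQ entry: " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ: " + header)
        yield header[1:], sequence, quality


def phred_scores(quality_string, offset=33):
    """Convert quality characters to numeric phred scores."""
    return [ord(char) - offset for char in quality_string]


def error_probability(score):
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-score / 10)


# Made-up entry, only to show the mechanics.
example = StringIO("@read1\nACGT\n+\nIIH#\n")
records = list(read_fastq(example))
for identifier, sequence, quality in records:
    print(identifier, sequence, phred_scores(quality))
```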

Sequence quality control

Is this good sequence? (essential!)


E.g.: using the FastQC tool (Babraham Institute, UK;
http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/)




Sequence quality control

FastQC plots, each shown as a good and a bad example:

• per base sequence quality


• per sequence quality scores


• per base sequence content


• per base GC content


• per sequence GC content


• k-mer content

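
A small sketch of what lies behind two of these plots, per base sequence content and per sequence GC content. This is not FastQC's implementation; the function names and the three example reads are hypothetical.

```python
from collections import Counter


def per_base_content(sequences):
    """Return one Counter (base -> count) for each read position."""
    longest = max(len(seq) for seq in sequences)
    columns = [Counter() for _ in range(longest)]
    for seq in sequences:
        for position, base in enumerate(seq.upper()):
            columns[position][base] += 1
    return columns


def gc_fraction(sequence):
    """Fraction of G and C bases in a single read."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Made-up reads, only to show the mechanics.
reads = ["ACGT", "ACGG", "TCGA"]
columns = per_base_content(reads)
gc_per_read = [gc_fraction(read) for read in reads]

for position, counts in enumerate(columns, start=1):
    print(position, dict(counts))
print(gc_per_read)
```

In a good run the base proportions stay roughly constant along the read; strong position-specific shifts hint at adapters or other biases.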
Intermezzo: Galaxy




Online genome analysis

http://galaxy.psu.edu/


“Galaxy allows you to do analyses you cannot do anywhere else without the
need to install or download anything. You can analyze multiple alignments,
compare genomic annotations, profile metagenomic samples and much much
more...”




Applications of next-generation sequencing




Kahvejian et al, 2008


[Figure: DNA-seq, ChIP-seq and RNA-seq, with applications such as “identify sequence variations” and “identify pathogens”. Kahvejian et al, 2008]

Exercises




Try to log in to the server mentioned on Toledo with the username and password
provided there.



There are 2 FASTQ files in /mnt/homes/jaerts/: s_1_sequence.txt and
s_2_sequence.txt (= paired ends)



  • How many sequences are in s_1_sequence.txt?


  • What encoding was used for the quality score? Illumina? Sanger?


  • What are the numerical quality scores for the first sequence in
    s_1_sequence.txt (i.e. 7172283/1)?
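
One possible way to approach the first two questions in Python, sketched on made-up lines rather than the course files. The function names are hypothetical, and the encoding check is only a heuristic based on which ASCII characters appear. Counting lines that start with “@” is unreliable, because “@” is itself a valid quality character.

```python
def count_fastq_records(lines):
    """Count entries, assuming exactly four lines per entry."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of four")
    return n_lines // 4


def guess_quality_offset(quality_strings):
    """Return 33, 64 or None (undecided) from the observed characters."""
    codes = [ord(char) for quality in quality_strings for char in quality]
    if min(codes) < 59:
        return 33  # characters below ';' are not used by the +64 encodings
    if max(codes) > 74:
        return 64  # characters above 'J' exceed the usual +33 range
    return None    # every character is valid in both encodings


# Made-up entries, only to show the mechanics.
example_lines = ["@r1", "ACGT", "+", "hhhh",
                 "@r2", "ACGT", "+", "hgfB"]
n_records = count_fastq_records(example_lines)
offset = guess_quality_offset(example_lines[3::4])
print(n_records, offset)
```

For a real file, an open file handle can be passed to `count_fastq_records` in place of the list, and every fourth line supplies the quality strings.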




• Create an account on the Galaxy server



• Download s_1_sequence.txt and s_2_sequence.txt from Toledo and upload
  them into Galaxy. These files are also available on the Linux server.



• Have a look at the contents of s_1_sequence.txt.



• Convert quality scores to numeric values for s_1_sequence.txt (“FASTQ
  Groomer”)



• Draw the quality score boxplot for s_1_sequence.txt



• Draw the nucleotide distribution chart for s_1_sequence.txt
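
Outside Galaxy, the numbers behind a quality score boxplot can be approximated with a few lines of Python. This sketch reports only minimum, median and maximum per read position; the function name, the example quality strings and the default `offset=33` are assumptions.

```python
import statistics


def per_position_quality(quality_strings, offset=33):
    """Return (min, median, max) phred score for each read position."""
    longest = max(len(quality) for quality in quality_strings)
    summary = []
    for position in range(longest):
        scores = [ord(quality[position]) - offset
                  for quality in quality_strings
                  if len(quality) > position]
        summary.append((min(scores), statistics.median(scores), max(scores)))
    return summary


# Made-up quality strings, only to show the mechanics.
qualities = ["IIH#", "IIII", "II5#"]
summary = per_position_quality(qualities)
for position, stats in enumerate(summary, start=1):
    print(position, stats)
```

A drop in the median towards the end of the read is the pattern the boxplot is meant to reveal.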

References

Bentley DR et al. Accurate whole human genome sequencing using reversible
terminator chemistry. Nature 456:53-59 (2008)
Kahvejian A, Quackenbush J & Thompson JF. What would you do if you could
sequence everything? Nature Biotechnology 26:1125-1133 (2008)
Korbel JO et al. Paired-end mapping reveals extensive structural variation in the
human genome. Science 318:420-426 (2007)
Mardis ER. A decade’s perspective on DNA sequencing technology. Nature
470:198-203 (2011)
Metzker ML. Sequencing technologies - the next generation. Nature Reviews
Genetics 11:31-46 (2010)
Shendure J & Ji H. Next-generation DNA sequencing. Nature Biotechnology
26:1135-1145 (2008)
Turner EH et al. Methods for genomic partitioning. Annual Review of Genomics
and Human Genetics 10 (2009)

                                                                                61

More Related Content

What's hot

Next-generation genomics: an integrative approach
Next-generation genomics: an integrative approachNext-generation genomics: an integrative approach
Next-generation genomics: an integrative approachHong ChangBum
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiomejukais
 
Next generation sequencing methods (final edit)
Next generation sequencing methods (final edit)Next generation sequencing methods (final edit)
Next generation sequencing methods (final edit)Mrinal Vashisth
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencingDenis C. Bauer
 
Expanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGSExpanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGSIntegrated DNA Technologies
 
Correlagen next gen presentation 042711
Correlagen next gen presentation 042711Correlagen next gen presentation 042711
Correlagen next gen presentation 042711algunduz28
 
Next generation sequencing methods
Next generation sequencing methods Next generation sequencing methods
Next generation sequencing methods Mrinal Vashisth
 
BioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing ProductsBioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing Productsbiochain
 
Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation SequencingSajad Rafatiyan
 
New Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewNew Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewPaolo Dametto
 
ECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialThomas Keane
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Yaoyu Wang
 
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent DataIonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent DataAdrian Baez-Ortega
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...VHIR Vall d’Hebron Institut de Recerca
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingTapish Goel
 
next generation sequencing
next generation sequencingnext generation sequencing
next generation sequencingPeter Egorov
 
The Next, Next Generation of Sequencing - From Semiconductor to Single Molecule
The Next, Next Generation of Sequencing - From Semiconductor to Single MoleculeThe Next, Next Generation of Sequencing - From Semiconductor to Single Molecule
The Next, Next Generation of Sequencing - From Semiconductor to Single MoleculeJustin Johnson
 
2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngsDin Apellidos
 
Next Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome AnalysisNext Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome AnalysisBastian Greshake
 

What's hot (20)

Next-generation genomics: an integrative approach
Next-generation genomics: an integrative approachNext-generation genomics: an integrative approach
Next-generation genomics: an integrative approach
 
Ngs microbiome
Ngs microbiomeNgs microbiome
Ngs microbiome
 
Next generation sequencing methods (final edit)
Next generation sequencing methods (final edit)Next generation sequencing methods (final edit)
Next generation sequencing methods (final edit)
 
New generation Sequencing
New generation Sequencing New generation Sequencing
New generation Sequencing
 
Introduction to second generation sequencing
Introduction to second generation sequencingIntroduction to second generation sequencing
Introduction to second generation sequencing
 
Expanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGSExpanding Your Research Capabilities Using Targeted NGS
Expanding Your Research Capabilities Using Targeted NGS
 
Correlagen next gen presentation 042711
Correlagen next gen presentation 042711Correlagen next gen presentation 042711
Correlagen next gen presentation 042711
 
Next generation sequencing methods
Next generation sequencing methods Next generation sequencing methods
Next generation sequencing methods
 
BioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing ProductsBioChain Next Generation Sequencing Products
BioChain Next Generation Sequencing Products
 
Next Generation Sequencing
Next Generation SequencingNext Generation Sequencing
Next Generation Sequencing
 
New Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overviewNew Generation Sequencing Technologies: an overview
New Generation Sequencing Technologies: an overview
 
ECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing TutorialECCB 2010 Next-gen sequencing Tutorial
ECCB 2010 Next-gen sequencing Tutorial
 
Rnaseq basics ngs_application1
Rnaseq basics ngs_application1Rnaseq basics ngs_application1
Rnaseq basics ngs_application1
 
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent DataIonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
IonGAP - an Integrated Genome Assembly Platform for Ion Torrent Data
 
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
NGS Introduction and Technology Overview (UEB-UAT Bioinformatics Course - Ses...
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
next generation sequencing
next generation sequencingnext generation sequencing
next generation sequencing
 
The Next, Next Generation of Sequencing - From Semiconductor to Single Molecule
The Next, Next Generation of Sequencing - From Semiconductor to Single MoleculeThe Next, Next Generation of Sequencing - From Semiconductor to Single Molecule
The Next, Next Generation of Sequencing - From Semiconductor to Single Molecule
 
2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs2011 jeroen vanhoudt_ngs
2011 jeroen vanhoudt_ngs
 
Next Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome AnalysisNext Generation Sequencing & Transcriptome Analysis
Next Generation Sequencing & Transcriptome Analysis
 

Viewers also liked

Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingDayananda Salam
 
Next generation sequencing course - part 2: sequence mapping
Next generation sequencing course - part 2: sequence mappingNext generation sequencing course - part 2: sequence mapping
Next generation sequencing course - part 2: sequence mappingJan Aerts
 
Quality Control of NGS Data Solutions
Quality Control of NGS Data  SolutionsQuality Control of NGS Data  Solutions
Quality Control of NGS Data SolutionsSurya Saha
 
Quality Control of Sequencing Data
Quality Control of Sequencing DataQuality Control of Sequencing Data
Quality Control of Sequencing DataSurya Saha
 
Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015Surya Saha
 
Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015Li Shen
 
Systems biology & Approaches of genomics and proteomics
 Systems biology & Approaches of genomics and proteomics Systems biology & Approaches of genomics and proteomics
Systems biology & Approaches of genomics and proteomicssonam786
 
Introduction to systems biology
Introduction to systems biologyIntroduction to systems biology
Introduction to systems biologylemberger
 
High-Throughput Sequencing
High-Throughput SequencingHigh-Throughput Sequencing
High-Throughput SequencingMark Pallen
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applicationsAGRF_Ltd
 
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...QIAGEN
 

Viewers also liked (17)

Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
Ngs intro_v6_public
 Ngs intro_v6_public Ngs intro_v6_public
Ngs intro_v6_public
 
Introduction to next generation sequencing
Introduction to next generation sequencingIntroduction to next generation sequencing
Introduction to next generation sequencing
 
Next generation sequencing course - part 2: sequence mapping
Next generation sequencing course - part 2: sequence mappingNext generation sequencing course - part 2: sequence mapping
Next generation sequencing course - part 2: sequence mapping
 
Quality Control of NGS Data Solutions
Quality Control of NGS Data  SolutionsQuality Control of NGS Data  Solutions
Quality Control of NGS Data Solutions
 
Quality Control of Sequencing Data
Quality Control of Sequencing DataQuality Control of Sequencing Data
Quality Control of Sequencing Data
 
Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015Sequencing: The Next Generation 2015
Sequencing: The Next Generation 2015
 
ChIP-seq - Data processing
ChIP-seq - Data processingChIP-seq - Data processing
ChIP-seq - Data processing
 
Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015Next-generation sequencing data format and visualization with ngs.plot 2015
Next-generation sequencing data format and visualization with ngs.plot 2015
 
Systems biology & Approaches of genomics and proteomics
 Systems biology & Approaches of genomics and proteomics Systems biology & Approaches of genomics and proteomics
Systems biology & Approaches of genomics and proteomics
 
Introduction to systems biology
Introduction to systems biologyIntroduction to systems biology
Introduction to systems biology
 
High throughput sequencing
High throughput sequencingHigh throughput sequencing
High throughput sequencing
 
High-Throughput Sequencing
High-Throughput SequencingHigh-Throughput Sequencing
High-Throughput Sequencing
 
NGS technologies - platforms and applications
NGS technologies - platforms and applicationsNGS technologies - platforms and applications
NGS technologies - platforms and applications
 
Ngs ppt
Ngs pptNgs ppt
Ngs ppt
 
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
Next-Generation Sequencing an Intro to Tech and Applications: NGS Tech Overvi...
 
Clinical Applications of Next Generation Sequencing
Clinical Applications of Next Generation SequencingClinical Applications of Next Generation Sequencing
Clinical Applications of Next Generation Sequencing
 

Similar to Next-generation sequencing course, part 1: technologies

New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...Eastern Pennsylvania Branch ASM
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomicsPawan Kumar
 
DNA Markers Techniques for Plant Varietal Identification
DNA Markers Techniques for Plant Varietal Identification DNA Markers Techniques for Plant Varietal Identification
DNA Markers Techniques for Plant Varietal Identification Senthil Natesan
 
Part 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw dataPart 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw dataJoachim Jacob
 
Generations of sequencing technologies.
Generations of sequencing technologies. Generations of sequencing technologies.
Generations of sequencing technologies. ShadenAlharbi
 
03_Microbio590B_sequencing_2022.pdf
03_Microbio590B_sequencing_2022.pdf03_Microbio590B_sequencing_2022.pdf
03_Microbio590B_sequencing_2022.pdfKristen DeAngelis
 
Aug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansAug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansGenomeInABottle
 
Gene sequencing technique
Gene sequencing techniqueGene sequencing technique
Gene sequencing techniqueDarshan Patel
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencingARUNDHATI MEHTA
 
nextgenerationsequencing-170606100132.pdf
nextgenerationsequencing-170606100132.pdfnextgenerationsequencing-170606100132.pdf
nextgenerationsequencing-170606100132.pdfAkhileshPathak33
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...GenomeInABottle
 
DNA SEQUENCING (1).pptx
DNA SEQUENCING (1).pptxDNA SEQUENCING (1).pptx
DNA SEQUENCING (1).pptxDeenaRahul
 
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...Thermo Fisher Scientific
 
2014 09 30_t1_bioinformatics_wim_vancriekinge
2014 09 30_t1_bioinformatics_wim_vancriekinge2014 09 30_t1_bioinformatics_wim_vancriekinge
2014 09 30_t1_bioinformatics_wim_vancriekingeProf. Wim Van Criekinge
 
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA Sequencing
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA SequencingEVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA Sequencing
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA SequencingJonathan Eisen
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomicsAjit Shinde
 

Similar to Next-generation sequencing course, part 1: technologies (20)

New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
New Molecular Approaches to Identify 21st Century Microbes - Dr Melissa Mille...
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 
DNA Markers Techniques for Plant Varietal Identification
DNA Markers Techniques for Plant Varietal Identification DNA Markers Techniques for Plant Varietal Identification
DNA Markers Techniques for Plant Varietal Identification
 
Part 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw dataPart 2 of RNA-seq for DE analysis: Investigating raw data
Part 2 of RNA-seq for DE analysis: Investigating raw data
 
Generations of sequencing technologies.
Generations of sequencing technologies. Generations of sequencing technologies.
Generations of sequencing technologies.
 
03_Microbio590B_sequencing_2022.pdf
03_Microbio590B_sequencing_2022.pdf03_Microbio590B_sequencing_2022.pdf
03_Microbio590B_sequencing_2022.pdf
 
Aug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plansAug2014 abrf interlaboratory study plans
Aug2014 abrf interlaboratory study plans
 
Gene sequencing technique
Gene sequencing techniqueGene sequencing technique
Gene sequencing technique
 
Next generation sequencing
Next generation sequencingNext generation sequencing
Next generation sequencing
 
nextgenerationsequencing-170606100132.pdf
nextgenerationsequencing-170606100132.pdfnextgenerationsequencing-170606100132.pdf
nextgenerationsequencing-170606100132.pdf
 
NGS.pptx
NGS.pptxNGS.pptx
NGS.pptx
 
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
Genome in a Bottle - Towards new benchmarks for the “dark matter” of the huma...
 
12 arrays
12 arrays12 arrays
12 arrays
 
12 arrays
12 arrays12 arrays
12 arrays
 
Hamas 1
Hamas 1Hamas 1
Hamas 1
 
DNA SEQUENCING (1).pptx
DNA SEQUENCING (1).pptxDNA SEQUENCING (1).pptx
DNA SEQUENCING (1).pptx
 
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...
Massively Parallel Sequencing - integrating the Ion PGM™ sequencer into your ...
 
2014 09 30_t1_bioinformatics_wim_vancriekinge
2014 09 30_t1_bioinformatics_wim_vancriekinge2014 09 30_t1_bioinformatics_wim_vancriekinge
2014 09 30_t1_bioinformatics_wim_vancriekinge
 
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA Sequencing
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA SequencingEVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA Sequencing
EVE161: Microbial Phylogenomics - Class 2 - Evolution of DNA Sequencing
 
Functional genomics
Functional genomicsFunctional genomics
Functional genomics
 



Next-generation sequencing course, part 1: technologies

  • 1. [I0D51A] Bioinformatics: High-Throughput Analysis. Next-generation sequencing. Part 1: Technologies. Prof Jan Aerts, Faculty of Engineering - ESAT/SCD (jan.aerts@esat.kuleuven.be). TA: Alejandro Sifrim (alejandro.sifrim@esat.kuleuven.be)
  • 2. Announcements: May 27th (9am-noon): evaluation, open book
  • 3. Note to self... Upload s_1_sequence.txt and s_2_sequence.txt to Galaxy first...
  • 4. Overview
      • linux refresher (6/5)
      • next-generation sequencing technologies and applications (6/5)
      • sequence mapping (13/5)
      • variant calling - SNPs (20/5)
      • variant calling - structural variation (20/5)
  • 9. First vs second generation sequencing: Sanger sequencing (1st gen) vs 2nd/next gen sequencing (Shendure & Ji, 2008)
  • 10. Paired-end sequencing (Korbel et al, 2007)
  • 11. General approaches
      • 2nd generation: clonally amplified single molecules
          • Roche 454: pyrosequencing
          • Illumina Genome Analyzer -> HiSeq: reversible terminator technology
          • ABI SOLiD: ligation-based extension
      • Next-next-generation/3rd generation: true single molecule
          • Helicos: HeliScope
          • Pacific Biosciences: SMRT
  • 13. Steps
      • genome enrichment
      • template preparation
      • sequencing and imaging
      • data analysis
  • 16. What? Only sequence relevant parts of the genome instead of the whole genome, e.g.:
      • specific Mb-scale regions known to be involved in a particular disease (e.g. based on GWAS)
      • specific candidate genes belonging to a disease pathway
      • exome (= all exons)
    => how to isolate these from non-target sequence? “pulldown”
  • 17. Pulldown: on-array (Turner et al, 2009)
  • 18. Pulldown: in-solution (Turner et al, 2009)
  • 19. Performance metrics
      • fold-enrichment: ratio of abundance of target sequences post-enrichment vs pre-enrichment
      • capture specificity: fraction of sequence reads that map to target
      • uniformity: relative abundance of individual targets after enrichment
      • completeness: fraction of target bases detectably captured
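The four metrics on slide 19 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `enrichment_metrics`, its arguments, and the example numbers are hypothetical, the pre-enrichment abundance of the target is approximated as target size divided by genome size, and "detectably captured" is taken to mean a read depth of at least `min_depth`.

```python
def enrichment_metrics(target_depths, on_target_reads, total_reads,
                       target_bp, genome_bp, min_depth=1):
    """Compute the four capture metrics from slide 19.

    target_depths: dict mapping a target name to a list of per-base read depths.
    All names and thresholds here are illustrative assumptions.
    """
    # capture specificity: fraction of reads that map to the target
    specificity = on_target_reads / total_reads

    # fold-enrichment: post-enrichment abundance vs pre-enrichment abundance,
    # the latter approximated by the target's share of the genome
    fold = specificity / (target_bp / genome_bp)

    all_depths = [d for depths in target_depths.values() for d in depths]

    # completeness: fraction of target bases covered at least min_depth times
    completeness = sum(d >= min_depth for d in all_depths) / len(all_depths)

    # uniformity: mean depth of each target relative to the overall mean depth
    overall_mean = sum(all_depths) / len(all_depths)
    relative = {name: (sum(depths) / len(depths)) / overall_mean
                for name, depths in target_depths.items()}

    return {
        "fold_enrichment": fold,
        "capture_specificity": specificity,
        "completeness": completeness,
        "relative_abundance": relative,
    }


# Toy example: two small targets, 600 of 1000 reads on target,
# a 30 Mb target in a 3 Gb genome (made-up numbers).
metrics = enrichment_metrics(
    {"exon1": [10, 12, 8, 0], "exon2": [30, 30, 30, 30]},
    on_target_reads=600, total_reads=1000,
    target_bp=30_000_000, genome_bp=3_000_000_000,
)
print(metrics)
```

A relative abundance well below 1 for a target (here exon1) signals that it was captured less efficiently than the others.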
  • 21. Problem: most imaging systems are not designed to detect a single fluorescent event => need amplified templates
    Aim: to produce a representative, non-biased source of nucleic acid material from the genome under investigation => population of identical templates
    Steps:
      1. shear DNA
      2. amplify templates
    Options: emulsion PCR (emPCR) or solid-phase amplification
  • 22. Amplification by emulsion PCR
      • emulsion = mixture of two or more immiscible (unblendable) liquids; e.g. mayonnaise, vinaigrette
      • emPCR: thousands of microreactors/micro-eppendorfs
      • one bead + one DNA molecule per microreactor => PCR to 1000s of copies
  • 23. (figures: Williams et al, 2006; Metzker et al, 2010)
  • 24. Solid-phase amplification: http://bit.ly/6JYIUz, http://www.youtube.com/watch?v=77r5p8IBwJk&NR=1 (Metzker et al, 2010)
  • 25. C. Sequencing and imaging
  • 26. Sequencing and imaging. Technologies:
      1. cyclic reversible termination
      2. sequencing by ligation
      3. pyrosequencing
      4. real-time sequencing
  • 27. Cyclic reversible termination: DNA synthesis is terminated after adding a single nucleotide (start/stop/start/stop/start/stop/...). Illumina: 4-colour (figure: sequencing steps, sequencing result; Metzker et al, 2010)
  • 28. Helicos: 1-colour (figure: sequencing steps, sequencing result; Metzker et al, 2010)
  • 29. Sequencing by ligation: http://bit.ly/fPh22X (figure: sequencing steps)
  • 31. Pyrosequencing (figures: Metzker et al, 2010)
  • 32. Real-time sequencing: “ZMW” = zero-mode waveguide; DNA polymerase; “strobe sequencing”
  • 33. Platform     Run time   Gb/run
        Roche 454    8.5 hr     45
        Illumina     9 days     35
        SOLiD        14 days    50
        Helicos      8 days     37
        PacBio       ?          ?
  • 34. Accuracy - base calling error
      • base quality drops along read (“dephasing” within clusters)
      • base calling errors
      • Sanger > SOLiD > Illumina > 454 > Helicos
  • 35. Accuracy - homopolymer runs
      • Issue for Roche 454: 39% of errors are homopolymers
      • A5 motifs: 3.3% error rate
      • A8 motifs: 50% error rate
      • Reason: signal intensity is used as a measure for homopolymer length
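Why intensity-based length calling degrades for long homopolymers can be illustrated with a toy noise model. The sketch below is an assumption, not the 454 base caller: it supposes the flow signal for a run of n identical bases is normally distributed around n with a standard deviation proportional to n (the coefficient `cv` is a made-up value), and that the called length is the signal rounded to the nearest integer. It is not fitted to the error rates quoted on the slide.

```python
import math


def miscall_probability(length, cv=0.1):
    """Probability that rounding a noisy signal gives the wrong run length.

    Hypothetical model: signal ~ Normal(length, (cv * length)^2).
    The call is wrong when the signal deviates by more than 0.5 from `length`.
    """
    sigma = cv * length
    # P(|X - length| > 0.5) for a normal distribution with std dev sigma
    return math.erfc(0.5 / (sigma * math.sqrt(2)))


for n in range(1, 9):
    print(n, round(miscall_probability(n), 4))
```

Under this model the gap between neighbouring run lengths stays at one signal unit while the noise grows with the run length, so telling 4 from 5 is much harder than telling 1 from 2.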
  • 37. Ronaghi, Genome Res 11:3-11 (2001)
  • 39. Is it 4? Is it 5? Is it 4? (image: http://mammoth.psu.edu/labPhotos/imageOfFlowgram.jpg)
  • 40. Consensus accuracy
      • Increase accuracy for SNP calling by increasing coverage: Illumina: 20X; SOLiD: 12X; 454: 7.4X; Sanger: 3X
      • Factors: raw accuracy + read length
      • How deep do you have to sequence? => Poisson distribution: “If you sequence at an average of 10X, how much of the genome will be covered at least 5X?”
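The question on slide 40 can be answered directly if per-base coverage is assumed to follow a Poisson distribution with the average depth as its mean (an idealisation that ignores capture and GC bias). The function name `fraction_covered_at_least` is only illustrative.

```python
import math


def fraction_covered_at_least(mean_coverage, min_depth):
    """Expected fraction of bases with depth >= min_depth under a Poisson model."""
    # P(depth < min_depth) = sum over k = 0 .. min_depth - 1 of e^-m * m^k / k!
    p_below = sum(
        math.exp(-mean_coverage) * mean_coverage ** k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - p_below


# Sequencing at an average of 10X: fraction of the genome covered at least 5X
print(round(fraction_covered_at_least(10, 5), 4))  # about 0.97
```

So at 10X average depth roughly 97% of bases are expected to reach 5X under this model; real data usually does somewhat worse because coverage is more dispersed than Poisson.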
  • 41. Bentley et al, Nature 456:53-59 (2008)
  • 42. FASTQ file format (figures: example fasta entries (n=2); example fastq entries (n=2); phred quality score encoding; source: Wikipedia)
      Each fastq entry has four lines:
      1. “@” + identifier
      2. sequence
      3. “+” + identifier (optional)
      4. phred-based quality scores
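A minimal sketch of reading the four-line FASTQ layout and decoding the quality string. It assumes each record occupies exactly four lines (no wrapped sequences) and defaults to the Sanger offset of 33; older Illumina files use 64. The helper names are illustrative, not taken from any library.

```python
def read_fastq(lines):
    """Yield (identifier, sequence, quality_string) from an iterable of lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ: " + header)
        yield header[1:], seq, qual


def phred_scores(quality, offset=33):
    """Convert a quality string to numeric Phred scores."""
    return [ord(char) - offset for char in quality]


def error_probability(score):
    """Phred definition: Q = -10 * log10(P), so P = 10^(-Q/10)."""
    return 10 ** (-score / 10)


example = ["@read1\n", "GATTACA\n", "+\n", "IIIIII!\n"]
for name, seq, qual in read_fastq(example):
    print(name, seq, phred_scores(qual))
```

With offset 33 the character `I` decodes to 40 (error probability 0.0001) and `!` decodes to 0.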
  • 43. Sequence quality control: Is this good sequence? (essential!) E.g.: using the FastQC tool (Babraham Institute, UK; http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/)
  • 44. Sequence quality control: per base sequence quality (figure: good vs bad)
  • 45. Sequence quality control: per sequence quality scores (figure: good vs bad)
  • 46. Sequence quality control: per base sequence content (figure: good vs bad)
  • 47. Sequence quality control: per base GC content (figure: good vs bad)
  • 48. Sequence quality control: per sequence GC content (figure: good vs bad)
  • 49. Sequence quality control: k-mer content (figure: good vs bad)
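The quantities behind these QC plots are simple summaries over reads. The sketch below shows how they could be computed from lists of sequences and quality strings; it is not FastQC's implementation, the function names are made up, and the quality offset of 33 is an assumption.

```python
from collections import Counter


def per_base_mean_quality(quality_strings, offset=33):
    """Mean Phred score at each read position (reads may differ in length)."""
    totals, counts = [], []
    for quality in quality_strings:
        for pos, char in enumerate(quality):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def per_sequence_mean_quality(quality_strings, offset=33):
    """Mean Phred score of each read."""
    return [sum(ord(c) - offset for c in q) / len(q) for q in quality_strings if q]


def per_base_content(sequences):
    """One Counter of nucleotide counts per read position."""
    columns = []
    for seq in sequences:
        for pos, base in enumerate(seq.upper()):
            if pos == len(columns):
                columns.append(Counter())
            columns[pos][base] += 1
    return columns


def gc_fraction(seq):
    """Fraction of G and C bases in one read (per sequence GC content)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def kmer_counts(sequences, k=5):
    """Count every overlapping k-mer across all reads."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k].upper()] += 1
    return counts


seqs = ["ACGT", "AGGT"]
quals = ["IIII", "II5!"]
print(per_base_mean_quality(quals))
print(per_base_content(seqs)[1])
print([gc_fraction(s) for s in seqs])
print(kmer_counts(seqs, k=2).most_common(1))
```

A per base quality list that falls towards the end of the read, or a nucleotide composition that changes with position, corresponds to the "bad" panels on the slides.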
  • 51. Online genome analysis: http://galaxy.psu.edu/ “Galaxy allows you to do analyses you cannot do anywhere else without the need to install or download anything. You can analyze multiple alignments, compare genomic annotations, profile metagenomic samples and much much more...”
  • 55. Kahvejian et al, 2008
  • 56. DNA-seq, ChIP-seq, RNA-seq (Kahvejian et al, 2008)
  • 57. DNA-seq, ChIP-seq, RNA-seq: identify sequence variations; identify pathogens (Kahvejian et al, 2008)
  • 58. Exercises
  • 59. Try to log in to the server mentioned on Toledo with the username and password provided there. There are 2 FASTQ files in /mnt/homes/jaerts/: s_1_sequence.txt and s_2_sequence.txt (= paired ends)
      • How many sequences are in s_1_sequence.txt?
      • What encoding was used for the quality score? Illumina? Sanger?
      • What are the numerical quality scores for the first sequence in s_1_sequence.txt (i.e. 7172283/1)?
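One possible way to approach these three questions in Python, as a sketch rather than the intended solution. It assumes four lines per record, and the encoding guess is a common heuristic, not a guarantee: any quality character below ASCII 59 can only occur with the Sanger offset of 33, otherwise the older Illumina/Solexa offset of 64 is assumed. The function names are illustrative.

```python
def count_records(path):
    """Number of FASTQ records, assuming exactly four lines per record."""
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


def guess_offset(path, max_records=10000):
    """Guess the quality offset (33 or 64) from the lowest quality character seen."""
    lowest = None
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i >= 4 * max_records:
                break
            if i % 4 == 3:  # every fourth line holds the quality string
                codes = [ord(char) for char in line.rstrip("\n")]
                if codes:
                    low = min(codes)
                    lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        raise ValueError("no quality lines found")
    # Heuristic: characters below ';' (ASCII 59) cannot occur with offset 64
    return 33 if lowest < 59 else 64


def first_record_scores(path, offset):
    """Header and numeric quality scores of the first record."""
    with open(path) as handle:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
    return lines[0], [ord(char) - offset for char in lines[3]]


if __name__ == "__main__":
    fastq = "s_1_sequence.txt"  # file name from the exercise
    offset = guess_offset(fastq)
    print(count_records(fastq), offset, first_record_scores(fastq, offset))
```

A file of uniformly high-quality Sanger-encoded reads could be misclassified by this heuristic, so it is worth checking a large number of records.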
  • 60. Exercises (continued)
      • Create an account on the Galaxy server
      • Download s_1_sequence.txt and s_2_sequence.txt from Toledo and upload them into Galaxy. These files are also available on the linux server
      • Have a look at the contents of s_1_sequence.txt
      • Convert quality scores to numeric values for s_1_sequence.txt (“FASTQ Groomer”)
      • Draw the quality score boxplot for s_1_sequence.txt
      • Draw the nucleotide distribution chart for s_1_sequence.txt
  • 61. References
      • Bentley DR et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456: 53-59 (2008)
      • Kahvejian A, Quackenbush J & Thompson JF. What would you do if you could sequence everything? Nature Biotechnology 26: 1125-1133 (2008)
      • Korbel JO et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318: 420-426 (2007)
      • Mardis ER. A decade’s perspective on DNA sequencing technology. Nature 470: 198-203 (2011)
      • Metzker ML. Sequencing technologies - the next generation. Nature Reviews Genetics 11: 31-46 (2010)
      • Shendure J & Ji H. Next-generation DNA sequencing. Nature Biotechnology 26: 1135-1145 (2008)
      • Turner EH et al. Methods for genomic partitioning. Annual Review of Genomics and Human Genetics 10 (2009)