Thu, Mar 16th 2017 Bioinformatics meeting - Claire Rioualen 1
Long-read: assets and challenges of a
(not so) emerging technology
Summary
1. Second generation sequencing
2. Long-read technology
3. Error rates & error correction
4. Alternative splicing & isoforms
Second generation sequencing
● Sequencing by synthesis
➢ Cyclic reversible termination (Illumina)
  ● GeneReader (Qiagen)
➢ Single-nucleotide addition
  ● 454 pyrosequencing (Roche)
  ● IonTorrent (ThermoFisher)
● Sequencing by ligation
➢ SOLiD (ThermoFisher)
➢ Complete Genomics (Beijing Genomics Institute)
Second generation sequencing
Sequencing by synthesis (Illumina)
(a) Goodwin et al., 2016
(b) Metzker, 2010
1. Amplification of fragments
2. Addition of four types of reversible terminator bases
3. Removal of non-incorporated nucleotides
4. Imaging of the fluorescently labeled nucleotides
5. Removal of dye and terminal 3' blocker
6. New cycle
Second generation sequencing
Sequencing by synthesis (454 pyrosequencing)
1. Amplification by emulsion PCR
2. Beads are deposited in wells
3. Addition of a single type of dNTP
4. Emission of pyrophosphate if dNTP is incorporated
5. Pyrophosphate is converted to ATP, which luciferase uses to produce light
6. Light is detected and recorded in a flowgram
7. New cycle with another dNTP
(a) Goodwin et al., 2016
(b) Metzker, 2010
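The flow cycle above can be sketched as a small simulation. This is an idealized toy model, not the instrument's signal processing: the flow order `TACG`, the function names and the noise-free integer signal are assumptions made for illustration, and the input is treated as the strand being synthesized rather than its template.

```python
from itertools import cycle

def flowgram(synthesized, flow_order="TACG", max_flows=400):
    """Simulate idealized flows: one dNTP type per flow.

    The signal of a flow is the number of bases incorporated, so a
    homopolymer run gives a proportionally stronger signal.
    """
    signals = []
    pos = 0
    for flow_count, base in enumerate(cycle(flow_order)):
        if pos >= len(synthesized) or flow_count >= max_flows:
            break
        run = 0
        # Incorporate every consecutive base that matches the flowed dNTP.
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals

def call_bases(signals):
    """Turn (base, signal) pairs back into a sequence."""
    return "".join(base * n for base, n in signals)

flows = flowgram("TTAGC")
print(flows)
print(call_bases(flows))
```

Flows with a signal of 0 are the ones where the flowed dNTP was not incorporated. In real data the signal is noisy, which is why long homopolymers are hard to call with this chemistry.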
Second generation sequencing
Sequencing by ligation (SOLiD)
(a) Metzker, 2010
1. Target molecule to be sequenced: single strand of
unknown DNA sequence
2. Flanked on at least one end by a known sequence.
3. Addition of short "anchor" strand to bind the
known sequence
4. Addition of mixed pool of labeled probe
oligonucleotides
5. DNA ligase preferentially joins the molecule to the
anchor when bases match the target
6. Cycle repeated
7. Anchor is shifted
...
Long-read technology: characteristics
● Single-molecule real-time sequencing (Eid et al., 2009)
– Main technologies:
➢ PacBio
➢ Oxford Nanopore
– No library prep, no amplification
– Requires expensive equipment
● Synthetic long reads (McCoy et al., 2014)
– Main technologies:
➢ Illumina
➢ 10X Genomics
– Relies on typical short read sequencing
– No new equipment required
– Long reads are constructed in silico using barcodes
Long-read technology
Single-Molecule Real-Time sequencing
(a) Eid et al., 2009
(b) Goodwin et al., 2016
1. Flowcells made of zero-mode waveguides (ZMW)
anchored on a glass substrate
2. Polymerase fixed at the bottom of well/waveguide,
hence the single-molecule focus
3. dNTP incorporation visualized continuously by
laser
4. Labelled nucleotide pauses during incorporation,
fluorophore is removed
5. Circular templates allow several passes for a single
target sequence
Long-read technology
Single-Molecule Real-Time nanopore sequencing
(a) Goodwin et al., 2016
1. Single-stranded DNA is passed through a pore
thanks to a secondary motor protein
2. Current passes through the pore
3. The voltage shifts depending on the k-mers passing
through
4. The resulting signal is represented in so-called “squiggle space”
5. More than 1,000 possible levels of signal
corresponding to as many different k-mers
6. Hairpin templates allow two passes, forward and
reverse
7. Consensus sequence is computed
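One way to read the "more than 1,000 levels" figure: if the pore senses k bases at a time, there are 4^k possible k-mers, and k = 5 gives 1,024. The value k = 5 is an assumption here, since the slide does not state it, and `all_kmers` is a helper name chosen for this sketch.

```python
from itertools import product

def all_kmers(k, alphabet="ACGT"):
    """Enumerate every possible k-mer over the given alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

kmers = all_kmers(5)      # k = 5 is an assumption, see the note above
print(len(kmers))         # 4**5 = 1024 distinct k-mers
print(kmers[:3])
```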
Long-read technology
Synthetic long-reads (Illumina)
1. Large DNA fragments are partitioned into wells
2. Within each well, fragments are sheared into short
reads and barcoded
3. DNA from each well is pooled, and short reads are
sequenced using standard library preparation and
instrumentation
4. Resulting data is split according to barcodes and
reassembled
(a) Goodwin et al., 2016
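Step 4, splitting the pooled data by barcode, can be sketched as follows. The read layout (a fixed-length barcode at the start of each read) and the 8-base barcode length are assumptions made for illustration and are not the actual library design. Reassembly of each group is left out.

```python
from collections import defaultdict

def split_by_barcode(reads, barcode_len=8):
    """Group pooled short reads by their leading well barcode.

    Hypothetical layout: the first `barcode_len` bases identify the well,
    the rest is the sequenced insert.
    """
    wells = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        wells[barcode].append(insert)
    return dict(wells)

pooled = ["AAAACCCCGATTACA", "AAAACCCCTTGGA", "GGGGTTTTACGT"]
for barcode, inserts in split_by_barcode(pooled).items():
    print(barcode, inserts)
```

Each group of inserts would then be assembled on its own, which is what turns short reads from one well into a synthetic long read.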
Long-read technology
Synthetic long-reads (10X Genomics)
(a) Goodwin et al., 2016
1. Large fragments of DNA partitioned into micelles
called GEMs using emulsion
2. Each GEM has its own barcode
3. Each large fragment is amplified into smaller
fragments
4. DNA is pooled and sequenced
5. Reads are aligned and linked together
6. Alignment doesn't have to be continuous
7. Coverage is achieved by using a so-called “read
cloud”
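The "read cloud" idea in steps 5 to 7 can be sketched by grouping aligned reads that share a barcode and lie close together on the reference. This is a simplified illustration and not the vendor's software: the input tuples, the `read_clouds` name and the 50 kb `max_gap` are assumptions.

```python
from collections import defaultdict

def read_clouds(alignments, max_gap=50_000):
    """Group aligned reads into clouds.

    alignments: iterable of (barcode, chrom, pos).
    Reads with the same barcode on the same chromosome join one cloud as
    long as consecutive positions are at most `max_gap` apart; the reads do
    not need to cover the region continuously.
    Returns a list of (barcode, chrom, start, end, n_reads).
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in alignments:
        by_key[(barcode, chrom)].append(pos)

    clouds = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current cloud, start a new one.
                clouds.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        clouds.append((barcode, chrom, start, prev, count))
    return clouds

alignments = [
    ("BX1", "chr1", 100), ("BX1", "chr1", 20_000),
    ("BX1", "chr1", 400_000), ("BX2", "chr1", 150),
]
for cloud in read_clouds(alignments):
    print(cloud)
```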
Long-read technology: summary
Goodwin et al., 2016
Long-read technology: SMRT
● PacBio RS II (most widely used)
➢ Average read length ~ 10-15 kb
➢ Single-pass error rate ~ 15%
➢ Errors are mostly indels
➢ Errors are randomly distributed, so they can be overcome with higher coverage
➢ Limited throughput, high cost
● MinION
➢ Small, USB-based device (+ library prep equipment)
➢ Single-pass error rate up to 30%
➢ Base-calling algorithms have improved accuracy recently
Long-read technology: synthetic reads
● Illumina
➢ Relies on standard equipment, hence affordable
➢ Throughput and error profiles similar to those of standard short-read technology
➢ Requires more coverage due to additional level of partitioning
● 10X Genomics
➢ Additional but affordable equipment
➢ Works with as little as 1 ng of starting material
➢ Inefficient DNA partitioning / limited number of barcodes
Long-read technology: applications
● De novo assembly
● Sequencing of "challenging" genomes (repetitive regions...)
● Genome finishing
● Genome phasing
– Analyze compound heterozygotes
– Measure allele-specific expression
– Identify variant linkage
– Phase de novo mutations
...
Error rates & correction
● Sequencing errors lead to weaker alignments
– Mismatches
– Shorter alignments
● Second-generation sequencing: mostly substitutions
● Long-read technology (SMRT): mostly insertions/deletions
– Median accuracy of 99.3% with 15-fold coverage (Eid et al., 2009)
– Accuracy 82.1%–84.6% (Koren et al., 2013)
– Error rate 15% (Salmela and Rivals, 2014)
– Error rate 13% single pass, <1% circular template (Goodwin et al., 2016)
➢ Errors are considered unbiased and uniformly distributed
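Why unbiased, independent errors matter can be illustrated with a simple majority-vote model. This is a deliberately crude sketch: it treats each pass as simply right or wrong at a position, ignores the indel nature of the errors, and is not the consensus method used by the instrument. The function name and the 15% per-pass error are illustrative choices.

```python
from math import comb

def consensus_error(e, n):
    """Probability that a majority of n independent passes is wrong
    at one position, given a per-pass error probability e (n odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of passes to avoid ties")
    need = n // 2 + 1  # smallest number of wrong passes that forms a majority
    return sum(comb(n, k) * e**k * (1 - e)**(n - k) for k in range(need, n + 1))

for n in (1, 3, 5, 15):
    print(n, f"{consensus_error(0.15, n):.5f}")
```

Under this model the consensus error drops quickly as passes are added, which is the intuition behind both circular templates and higher coverage. A systematic (biased) error would not average out this way.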
Error rates & correction
● Self correction (e.g. HGAP)
➢ Computing local alignment between long reads
➢ Building multiple alignments
➢ Calling a consensus sequence
● Hybrid correction (e.g. AHA)
➢ Aligning short reads on long reads
➢ Correcting long reads using short reads' better accuracy
➢ Computationally costly
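The last step of self correction, calling a consensus from a multiple alignment, can be sketched as a column-wise majority vote. This is a minimal illustration and not how HGAP is implemented: it assumes the reads are already aligned and padded with `-` to the same length, and `call_consensus` is a name chosen for this example.

```python
from collections import Counter

def call_consensus(aligned_reads, gap="-"):
    """Column-wise majority vote over equal-length, pre-aligned reads.

    Columns where the gap symbol wins are dropped from the consensus.
    Ties go to the symbol seen first in the column.
    """
    lengths = {len(r) for r in aligned_reads}
    if len(lengths) != 1:
        raise ValueError("aligned reads must all have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _ = Counter(column).most_common(1)[0]
        if base != gap:
            consensus.append(base)
    return "".join(consensus)

aligned = [
    "ACG-TAC",   # error-free read
    "ACGGTAC",   # insertion
    "AC--TAC",   # deletion
    "ACG-TTC",   # substitution
]
print(call_consensus(aligned))
```

Hybrid correction follows the same logic, except that the votes come from short reads aligned to the long read.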
Error rates & correction
● Spectral alignment-based methods (2nd gen.)
– “With a sufficient coverage, it is possible to compute a minimal threshold such that, with high probability, each error-free k-mer appears at least that number of times in the read set.” (Salmela and Rivals, 2014)
– A k-mer above/below the threshold is qualified as solid or weak, respectively
– A de Bruijn graph (DBG) can be constructed using the solid k-mers as nodes
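The three points above can be sketched in a few lines: count k-mers, keep those that reach the threshold, and link solid k-mers that overlap by k-1 bases. This is a toy version under stated assumptions: the threshold is passed in rather than computed from coverage, reverse complements are ignored, and all function names are illustrative.

```python
from collections import Counter, defaultdict

def count_kmers(reads, k):
    """Count every k-mer occurrence across the read set."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(counts, threshold):
    """k-mers seen at least `threshold` times are solid; the rest are weak."""
    return {kmer for kmer, n in counts.items() if n >= threshold}

def build_dbg(solid):
    """Nodes are solid k-mers; edge u -> v when v extends u by one base."""
    graph = defaultdict(set)
    for kmer in solid:
        for base in "ACGT":
            nxt = kmer[1:] + base
            if nxt in solid:
                graph[kmer].add(nxt)
    return dict(graph)

reads = ["ACGTA", "ACGTA", "ACGTT"]   # the last read ends with an error
counts = count_kmers(reads, k=3)
solid = solid_kmers(counts, threshold=2)
print(sorted(solid))
print(build_dbg(solid))
```

The erroneous k-mer `GTT` occurs only once, falls below the threshold and never enters the graph.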
LoRDEC: long-read error correction
● Mix of the hybrid and the spectral approaches
● Construction of a DBG using short-read data
● Correction of long reads by searching an optimal path in
the graph
Salmela and Rivals, 2014
● Any k-mer that occurs fewer than s times within the short reads is filtered out
● So-called “solid k-mers” are kept
LoRDEC: long-read error correction
● Long reads are partitioned into weak and solid regions
according to the short read DBG
● Several pairs of source/target solid k-mers are
investigated to find optimal path over weak region
Salmela and Rivals, 2014
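The two bullets above can be sketched on a toy example. This is a simplified illustration and not LoRDEC's implementation: the graph is kept implicit as a set of solid k-mers, paths are enumerated exhaustively with a length cap, and "optimal" is taken to mean the smallest edit distance to the weak region, which is an assumption of this sketch. All function names and the `max_extra` parameter are hypothetical.

```python
def kmers_of(seq, k):
    """All k-mers of seq, in order of their start position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def solid_positions(long_read, solid, k):
    """Start positions in the long read whose k-mer is solid."""
    return [i for i, km in enumerate(kmers_of(long_read, k)) if km in solid]

def edit_distance(a, b):
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def walks(solid, source, target, max_steps):
    """Yield sequences spelled by graph walks from source to target."""
    stack = [(source, source)]
    while stack:
        node, spelled = stack.pop()
        if node == target and len(spelled) > len(source):
            yield spelled
            continue
        if len(spelled) - len(source) >= max_steps:
            continue  # length cap keeps the search finite
        for base in "ACGT":
            nxt = node[1:] + base
            if nxt in solid:
                stack.append((nxt, spelled + base))

def bridge(long_read, solid, k, src, tgt, max_extra=3):
    """Correct long_read[src:tgt+k] using the best path between the
    solid k-mers starting at src and tgt; keep the original if none."""
    original = long_read[src:tgt + k]
    max_steps = (tgt - src) + max_extra
    candidates = list(walks(solid, long_read[src:src + k],
                            long_read[tgt:tgt + k], max_steps))
    if not candidates:
        return original
    return min(candidates, key=lambda p: edit_distance(p, original))

# Toy data: the solid k-mers stand in for those of a short-read DBG.
reference = "ACGTTGCAGG"
solid = set(kmers_of(reference, 4))
long_read = "ACGTAGCAGG"                      # one substitution
print(solid_positions(long_read, solid, 4))   # gaps mark the weak region
print(bridge(long_read, solid, 4, 0, 5))
```

Trying several source/target pairs, as the slide describes, would mean calling `bridge` for more than one pair of solid positions around the same weak region and keeping the best result.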
LoRDEC: long-read error correction
● Effects of parameters on accuracy of correction
● Sensitivity = TP/(TP + FN)
➢ How well does the tool recognize erroneous positions?
● Gain = (TP – FP)/(TP + FN)
➢ How well does the tool remove errors without introducing
new ones?
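Both metrics are simple ratios, sketched below with made-up counts. The reading of the counts in the comments (TP as errors that were corrected, FP as correct positions that were altered, FN as errors left in place) is the usual one for correction tools and is stated here as an assumption, as are the numbers.

```python
def sensitivity(tp, fn):
    """Share of erroneous positions that were corrected: TP / (TP + FN)."""
    return tp / (tp + fn)

def gain(tp, fp, fn):
    """Net share of errors removed: (TP - FP) / (TP + FN).

    Negative when the tool introduces more errors than it removes.
    """
    return (tp - fp) / (tp + fn)

# Hypothetical counts, for illustration only.
tp, fp, fn = 900, 50, 100
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"gain        = {gain(tp, fp, fn):.2f}")
```

Gain can never exceed sensitivity, and the gap between the two shows how many new errors the correction introduced.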
LoRDEC: long-read error correction
Salmela and Rivals, 2014
Comparative study of LR and SR
● Comparative study of alternative splicing in
strawberry development using SMRT
sequencing & Illumina short reads
● After filtering and correction (LoRDEC) of
SMRT data, 96.4% of consensus transcripts
could be aligned to the genome (GMAP)
● Removal of redundant transcripts
➢ 33,236 full-length isoforms/transcripts
➢ 26,737 known transcripts
➢ 5,501 novel transcripts
Li et al., 2016
Comparative study of LR and SR
● Long-read data (PacBio SMRT)
➢ novel transcripts are shorter on average
➢ new introns are longer than previously annotated introns
➢ more isoforms identified
● Short-read data (Illumina)
➢ more annotated genes are found
➢ more novel genes are discovered
● Distributions of splice junctions and splice sites are similar
Quality assessment of LR data
Li et al., 2016
Comparison of SMRT and Illumina
in AS events detection
Li et al., 2016
● Identification of genes
undergoing alternative splicing
➢ Illumina: 33.48%
➢ SMRT: 57.67%
➢ Only a few genes have more
than 30 isoforms
➢ AS events have different
profiles depending on tissue &
stage of development
References
➢ Eid et al., 2009 - Real-Time DNA Sequencing from Single Polymerase Molecules. Science 02 Jan 2009: Vol. 323, Issue 5910, pp. 133-138
➢ Goodwin et al., 2016 - Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics 17, 333–351 (2016)
➢ Koren et al., 2013 - Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol. 2012 Jul; 30(7): 693–700
➢ Li et al., 2016 - Global identification of alternative splicing via comparative analysis of SMRT- and Illumina-based RNA-seq in strawberry. Plant J, 90: 164–176. doi:10.1111/tpj.13462
➢ McCoy et al., 2014 - Illumina TruSeq Synthetic Long-Reads Empower De Novo Assembly and Resolve Complex, Highly-Repetitive Transposable Elements. PLoS ONE 9(9): e106689.
➢ Metzker, 2010 - Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46
➢ Salmela and Rivals, 2014 - LoRDEC: accurate and efficient long read error correction. Bioinformatics (2014) 30 (24): 3506-3514
➢ Stöcker, Köster and Rahmann, 2016 - SimLoRD: Simulation of Long Read Data. Bioinformatics (2016) 32 (17): 2704-2706
Thank you!
Other work related to long-read techno
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Servosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by PetrovicServosystem Theory / Cybernetic Theory by Petrovic
Servosystem Theory / Cybernetic Theory by Petrovic
 
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptxThermodynamics ,types of system,formulae ,gibbs free energy .pptx
Thermodynamics ,types of system,formulae ,gibbs free energy .pptx
 
basic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomybasic entomology with insect anatomy and taxonomy
basic entomology with insect anatomy and taxonomy
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 

Long-read: assets and challenges of a (not so) emerging technology

  • 1. Thu, Mar 16th 2017 Bioinformatics meeting - Claire Rioualen 1 Long-read: assets and challenges of a (not so) emerging technology
  • 2. Summary: 1. Second generation sequencing 2. Long-read technology 3. Error rates & error correction 4. Alternative splicing & isoforms
  • 3. Second generation sequencing ● Sequencing by synthesis ➢ Cyclic reversible termination (Illumina) ● GeneReader (Qiagen) ➢ Single-nucleotide addition ● 454 pyrosequencing (Roche) ● IonTorrent (ThermoFisher) ● Sequencing by ligation ➢ SOLiD (ThermoFisher) ➢ Complete Genomics (Beijing Genomics Institute)
  • 4. Second generation sequencing: Sequencing by synthesis (Illumina) 1. Amplification of fragments 2. Addition of four types of reversible terminator bases 3. Removal of non-incorporated nucleotides 4. Imaging of the fluorescently labeled nucleotides 5. Removal of dye and terminal 3' blocker 6. New cycle. Figures: (a) Goodwin et al., 2016; (b) Metzker, 2010
  • 5. Second generation sequencing: Sequencing by synthesis (454 pyrosequencing) 1. Amplification by emulsion PCR 2. Beads are deposited in wells 3. Addition of a single type of dNTP 4. Release of pyrophosphate if the dNTP is incorporated 5. Pyrophosphate drives a luciferase reaction that emits light 6. Light is detected and recorded in a flowgram 7. New cycle with another dNTP. Figures: (a) Goodwin et al., 2016; (b) Metzker, 2010
  • 6. Second generation sequencing: Sequencing by ligation (SOLiD) 1. Target molecule to be sequenced: single strand of unknown DNA sequence 2. Flanked on at least one end by a known sequence 3. Addition of short "anchor" strand to bind the known sequence 4. Addition of mixed pool of labeled probe oligonucleotides 5. DNA ligase preferentially joins the molecule to the anchor when bases match the target 6. Cycle repeated 7. Anchor is shifted ... Figure: Metzker, 2010
  • 7. Long-read technology: characteristics ● Single-molecule real-time sequencing (Eid et al., 2009) – Main techno: ➢ PacBio ➢ Oxford Nanopore – No library prep, no amplification – Requires expensive equipment ● Synthetic long reads (McCoy et al., 2014) – Main techno: ➢ Illumina ➢ 10X Genomics – Relies on typical short read sequencing – No new equipment required – Long reads are constructed in silico using barcodes
  • 8. Long-read technology: Single-Molecule Real-Time sequencing 1. Flowcells made of zero-mode waveguides (ZMW) anchored on a glass substrate 2. Polymerase fixed at the bottom of well/waveguide, hence the single-molecule focus 3. dNTP incorporation visualized continuously by laser 4. Labelled nucleotide pauses during incorporation, fluorophore is removed 5. Circular templates allow several passes for a single target sequence. Figures: (a) Eid et al., 2009; (b) Goodwin et al., 2016
  • 9. Long-read technology: Single-Molecule Real-Time nanopore sequencing 1. Single-stranded DNA is passed through a pore thanks to a secondary motor protein 2. Current passes through the pore 3. The voltage shifts depending on the k-mers passing through 4. System called “squiggle space” 5. More than 1,000 possible levels of signal corresponding to as many different k-mers 6. Hairpin templates allow two passes, forward and reverse 7. Consensus sequence is computed. Figure: Goodwin et al., 2016
  • 10. Long-read technology: Synthetic long-reads (Illumina) 1. Large DNA fragments are partitioned into wells 2. Within each well, fragments are sheared into short reads and barcoded 3. DNA from each well is pooled, and short reads are sequenced using standard library preparation and instrumentation 4. Resulting data is split according to barcodes and reassembled. Figure: Goodwin et al., 2016
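Step 4 of the synthetic long-read workflow (splitting pooled reads by barcode before reassembly) can be sketched as below. This is only an illustration: the `(barcode, sequence)` input format, the barcode names and the function name `group_reads_by_barcode` are assumptions made for the example, not part of any Illumina tool.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Group (barcode, sequence) pairs so that the short reads from
    each well can later be assembled independently of other wells."""
    wells = defaultdict(list)
    for barcode, sequence in reads:
        wells[barcode].append(sequence)
    return dict(wells)


# Hypothetical pooled reads: two wells, identified by made-up barcodes.
pooled = [("BC01", "ACGT"), ("BC02", "TTGA"), ("BC01", "GTCA")]
wells = group_reads_by_barcode(pooled)
```

Each value in `wells` would then be handed to a short-read assembler to rebuild the long fragment(s) of that well.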
  • 11. Long-read technology: Synthetic long-reads (10X Genomics) 1. Large fragments of DNA partitioned into micelles called GEMs using emulsion 2. Each GEM has its own barcode 3. Each large fragment is amplified into smaller fragments 4. DNA is pooled and sequenced 5. Reads are aligned and linked together 6. Alignment doesn't have to be continuous 7. Coverage is achieved by using a so-called “read cloud”. Figure: Goodwin et al., 2016
  • 12. Long-read technology: summary (Goodwin et al., 2016)
  • 13. Long-read technology: SMRT ● PacBio RS II (most widely used) ➢ Average read length ~ 10-15 kb ➢ Single-pass error rate ~ 15% ➢ Mostly indels ➢ Random distribution → overcome with higher coverage ➢ Limited throughput, high cost ● MinION ➢ Small, USB-based device (+ library prep equipment) ➢ Single-pass error rate up to 30% ➢ Base-calling algorithms have improved accuracy recently
  • 14. Long-read technology: synthetic reads ● Illumina ● Relies on standard equipment → affordable ● Throughput and error profiles similar to those of standard techno ● Requires more coverage due to additional level of partitioning ● 10X Genomics ● Additional but affordable equipment ● Works with as little as 1 ng of starting material ● Inefficient DNA partitioning / limited number of barcodes
  • 15. Long-read technology: applications ● De novo assembly ● Sequencing of "challenging" genomes (repetitive regions...) ● Genome finishing ● Genome phasing – Analyze compound heterozygotes – Measure allele-specific expression – Identify variant linkage – Phase de novo mutations ...
  • 16. Error rates & correction ● Sequencing errors lead to weaker alignments – Mismatches – Shorter alignments ● Second generation sequencing → mostly substitutions ● Long-read techno (SMRT) → mostly insertions/deletions – Median accuracy of 99.3% with 15-fold coverage (Eid et al., 2009) – Accuracy 82.1%–84.6% (Koren et al., 2013) – Error rate 15% (Salmela et Rivals, 2014) – Error rate 13% single pass, <1% circular template (Goodwin et al., 2016) ➢ Errors are considered unbiased and uniformly distributed
  • 17. Error rates & correction ● Self correction (e.g. HGAP) ➢ Computing local alignments between long reads ➢ Building multiple alignments ➢ Calling a consensus sequence ● Hybrid correction (e.g. AHA) ➢ Aligning short reads on long reads ➢ Correcting long reads using the short reads' better accuracy ➢ Computationally costly
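The last step of self correction, calling a consensus from a multiple alignment, can be illustrated with a simple per-column majority vote. This is a deliberately simplified sketch and not how HGAP works internally: it assumes the reads are already aligned to equal length with `-` as the gap symbol, and the function name `majority_consensus` is made up for the example.

```python
from collections import Counter


def majority_consensus(aligned_reads, gap="-"):
    """Call a consensus from pre-aligned, equal-length reads by taking
    the most frequent symbol in each column; columns where the gap
    symbol wins are dropped from the output."""
    consensus = []
    for column in zip(*aligned_reads):
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Four noisy copies of the same region, each with a different random error.
aligned = ["ACGT-A", "ACCT-A", "ACGTTA", "A-GT-A"]
corrected = majority_consensus(aligned)
```

Because errors are assumed to be randomly distributed, each individual error is outvoted by the other reads, which is why higher coverage improves consensus accuracy.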
  • 18. Error rates & correction ● Spectral alignment-based methods (2nd gen.) – “With a sufficient coverage, it is possible to compute a minimal threshold such that, with high probability, each error-free k-mer appears at least that number of times in the read set.” (Salmela et Rivals, 2014) – A k-mer above/below the threshold is qualified as solid or weak, respectively – A de Bruijn graph (DBG) can be constructed using the solid k-mers as nodes
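The solid/weak classification can be sketched as a k-mer count followed by a threshold filter. This is a minimal illustration under assumptions: the toy reads, `k = 4` and the threshold of 2 are arbitrary, the function names are hypothetical, and a k-mer is treated as solid when its count is at least the threshold.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def solid_kmers(counts, threshold):
    """Keep the k-mers seen at least `threshold` times; the rest are weak."""
    return {kmer for kmer, n in counts.items() if n >= threshold}


# Toy short reads: the third read carries one substitution error.
short_reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(short_reads, k=4)
solid = solid_kmers(counts, threshold=2)
```

The k-mers introduced by the erroneous read (`CGTT`, `GTTC`) occur only once and fall below the threshold, so they would not become nodes of the de Bruijn graph.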
  • 19. LoRDEC: long-read error correction ● Mix of the hybrid and the spectral approaches ● Construction of a DBG using short-read data ● Correction of long reads by searching an optimal path in the graph ● Any k-mer that occurs fewer than s times within the short reads is filtered out ● So-called “solid k-mers” are kept (Salmela et Rivals, 2014)
  • 20. LoRDEC: long-read error correction ● Long reads are partitioned into weak and solid regions according to the short-read DBG ● Several pairs of source/target solid k-mers are investigated to find an optimal path over each weak region (Salmela et Rivals, 2014)
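The partitioning of a long read into solid and weak regions can be sketched as follows. This covers only the labelling step, not the path search in the graph, and it is not LoRDEC's actual code: the set of solid k-mers is assumed to be given, and the function name and the `(label, start, end)` output format are choices made for this example.

```python
def partition_long_read(read, solid, k):
    """Label each k-mer start position of `read` as solid or weak and
    merge consecutive positions with the same label into runs.
    Returns (label, start, end) tuples with `end` exclusive."""
    regions = []
    for i in range(len(read) - k + 1):
        label = "solid" if read[i:i + k] in solid else "weak"
        if regions and regions[-1][0] == label:
            # Extend the current run by one position.
            regions[-1] = (label, regions[-1][1], i + 1)
        else:
            regions.append((label, i, i + 1))
    return regions


# Hypothetical solid k-mers (k = 4) and a long read with an inserted T.
solid = {"ACGT", "CGTA", "GTAC"}
regions = partition_long_read("ACGTTACGTAC", solid, k=4)
```

Here the weak run is flanked by solid k-mers on both sides; those flanking k-mers are the kind of source/target pair between which a correcting path would be searched.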
  • 21. LoRDEC: long-read error correction ● Effects of parameters on accuracy of correction ● Sensitivity = TP/(TP + FN) ➢ How well does the tool recognize erroneous positions? ● Gain = (TP – FP)/(TP + FN) ➢ How well does the tool remove errors without introducing new ones?
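The two metrics can be computed directly from the position counts. In this sketch TP is assumed to be the number of erroneous positions that were corrected, FP the number of correct positions that were wrongly changed, and FN the number of erroneous positions left uncorrected; the numbers in the example are invented.

```python
def sensitivity(tp, fn):
    """Sensitivity = TP / (TP + FN)."""
    return tp / (tp + fn)


def gain(tp, fp, fn):
    """Gain = (TP - FP) / (TP + FN); it goes negative when the tool
    introduces more errors than it removes."""
    return (tp - fp) / (tp + fn)


# Invented counts: 100 erroneous positions, 90 fixed, 5 new errors introduced.
sens = sensitivity(tp=90, fn=10)
g = gain(tp=90, fp=5, fn=10)
```

With these counts sensitivity is 0.9 and gain is 0.85: the five newly introduced errors lower the gain but leave sensitivity unchanged.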
  • 22. LoRDEC: long-read error correction (Salmela et Rivals, 2014)
  • 23. Comparative study of LR and SR ● Comparative study of alternative splicing in strawberry development using SMRT sequencing & Illumina short reads ● After filtering and correction (LoRDEC) of SMRT data, 96.4% of consensus transcripts could be aligned to the genome (GMAP) ● Removal of redundant transcripts ➢ 33,236 full-length isoforms/transcripts ➢ 26,737 known transcripts ➢ 5,501 novel transcripts (Li et al., 2016)
  • 24. Comparative study of LR and SR ● Long-read data (PacBio SMRT) ● novel transcripts are shorter on average ● new introns are longer than previously annotated introns ● more isoforms identified ● Short-read data (Illumina) ● more annotated genes are found ● more novel genes are discovered ● Distributions of splicing junctions and splice sites are similar
  • 25. Quality assessment of LR data (Li et al., 2016)
  • 26. Comparison of SMRT and Illumina in AS events detection (Li et al., 2016)
  • 27. Comparison of SMRT and Illumina in AS events detection ● Identification of genes undergoing alternative splicing ➢ Illumina: 33.48% ➢ SMRT: 57.67% ➢ Only a few genes have more than 30 isoforms ➢ AS events have different profiles depending on tissue & stage of development (Li et al., 2016)
  • 28. References ➢ Eid et al., 2009 - Real-Time DNA Sequencing from Single Polymerase Molecules. Science 02 Jan 2009: Vol. 323, Issue 5910, pp. 133-138 ➢ Goodwin et al., 2016 - Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics 17, 333–351 (2016) ➢ Koren et al., 2013 - Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat Biotechnol. 2012 Jul; 30(7): 693–700 ➢ Li et al., 2016 - Global identification of alternative splicing via comparative analysis of SMRT- and Illumina-based RNA-seq in strawberry. Plant J, 90: 164–176. doi:10.1111/tpj.13462 ➢ McCoy et al., 2014 - Illumina TruSeq Synthetic Long-Reads Empower De Novo Assembly and Resolve Complex, Highly-Repetitive Transposable Elements. PLoS ONE 9(9): e106689 ➢ Metzker, 2010 - Sequencing technologies - the next generation. Nat Rev Genet. 2010 Jan;11(1):31-46 ➢ Salmela et Rivals, 2014 - LoRDEC: accurate and efficient long read error correction. Bioinformatics (2014) 30 (24): 3506-3514 ➢ Stöcker, Köster et Rahmann, 2016 - SimLoRD: Simulation of Long Read Data. Bioinformatics (2016) 32 (17): 2704-2706 Thank you!
  • 29. Other work related to long-read techno
  • 30. Other work related to long-read techno