Building a platinum human genome assembly from single haplotype human genomes generated from long molecule sequencing
Karyn Meltz Steinberg
ASHG 2015
@KMS_Meltzy
Last year…
[Figure 1 from Steinberg et al, 2014: contig number and contig N50 for the CHM1_1.1, HuRef, ALLPATHS, and YH_2.0 assemblies; y-axis 0 to 400,000]
This year…
[Figure: contig number and contig N50 for the CHM13 Draft, CHM1 PB_2, CHM1 PB_1, CHM1_1.1, HuRef, ALLPATHS, and YH_2.0 assemblies; y-axis 0 to 30,000,000]
This year… (log scale)
[Figure: contig number and contig N50 on a log scale for the CHM13 Draft, CHM1 PB_2, CHM1 PB_1, CHM1_1.1, HuRef, ALLPATHS, and YH_2.0 assemblies; y-axis 1 to 100,000,000]
We combine PacBio with other technologies to construct the assembly
How do we define platinum and gold standards?
Metric                                          GRCh38     Platinum (CHM1)   Gold (NA19240)
% Reference genome covered                      100        98.40             90.80
% Assigned chromosomes                          99.60      98.40             90.80
% gene models covered (>95% id, >90% length)    99.96      98.78             94.26
Contig N50                                      67.8 Mb    26.9 Mb           6.0 Mb
Number of gaps                                  875        3,640             3,568
Total assembled size                            3.067 Gb   2.996 Gb          2.745 Gb
% haplotype blocks (>1kb) resolved              NA         >95               >80
http://genome.wustl.edu/projects/detail/reference-genomes-improvement/
CHM13 Draft Assembly (GCA_000983455.1)
•  60X PacBio (P5 and P6 chemistry)
•  Average read length ~11kb
•  Daligner/Falcon v 0.2
Total sequence length 2,851,367,788
Number of contigs 2,873
Contig N50 12,981,785
Contig L50 68
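The contig N50 and L50 reported above follow the usual definitions: sort contigs from longest to shortest and accumulate their lengths until at least half of the total assembly length is covered. N50 is the length of the contig that crosses that threshold, and L50 is the number of contigs it took. A minimal Python sketch of that calculation, using made-up toy lengths rather than the actual CHM13 contigs (the function name `n50_l50` is illustrative only):

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the contig that pushes coverage to >= 50%
            return length, count


# Toy lengths (hypothetical, not from the slides): total is 300, half is 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_contigs))  # → (70, 2)
```

Read this way, the slide's figures say that the 68 longest CHM13 contigs, each at least 12,981,785 bp long, together span at least half of the 2,851,367,788 bp assembly.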
Gene Model (RefSeq) Analysis
                                    GRCh38   CHM1_1.1   CHM1_PB1   CHM1_PB2   CHM13
Number of sequences not aligning    21       88         67         67         125
Split transcripts                   8        35         1,245      1,131      285
CDS coverage <95%                   17       266        1,339      1,212      265
Total sequences retrieved from Entrez: 49,680
Short read sequence analysis
•  100X Illumina sequence
•  Align with BWA-MEM to ordered and oriented assembly
•  Variant calling via SpeedSeq (Chiang et al, 2015)
•  SNVs, indels: FreeBayes
•  SVs: LUMPY, SVTyper
•  CNV: CNVnator
CHM13 Illumina data aligned to CHM13 assembly
202,016 SNVs/indels on unplaced scaffolds
SV_TYPES        >10kb   5-10kb   1-5kb   <1kb
DELETIONS       174     131      430     2582
INVERSIONS      5       0        2       7
DUPLICATIONS    151     112      309     113
TOTAL           330     243      741     2702
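A table like the one above can be produced by assigning each SV call to a size bin and counting per type. The sketch below is one way to do that in Python. The exact bin edges are an assumption (the slide does not say which bin a call of exactly 1, 5, or 10 kb falls in), and `size_bin`, `tally`, and `column_totals` are hypothetical helper names, not part of SpeedSeq or LUMPY. The last part simply re-adds the counts shown on the slide to confirm the TOTAL row.

```python
from collections import Counter

BINS = ["<1kb", "1-5kb", "5-10kb", ">10kb"]


def size_bin(length_bp):
    """Map an SV length in bp to a size bin (bin edges are assumed)."""
    length_bp = abs(length_bp)  # lengths may be reported with a sign
    if length_bp < 1_000:
        return "<1kb"
    if length_bp < 5_000:
        return "1-5kb"
    if length_bp < 10_000:
        return "5-10kb"
    return ">10kb"


def tally(calls):
    """Count (sv_type, size_bin) pairs from (sv_type, length_bp) tuples."""
    counts = Counter()
    for sv_type, length_bp in calls:
        counts[(sv_type, size_bin(length_bp))] += 1
    return counts


def column_totals(table):
    """Sum each size bin over all SV types."""
    return {b: sum(row[b] for row in table.values()) for b in BINS}


# Counts copied from the slide.
SLIDE_COUNTS = {
    "DELETIONS":    {">10kb": 174, "5-10kb": 131, "1-5kb": 430, "<1kb": 2582},
    "INVERSIONS":   {">10kb": 5,   "5-10kb": 0,   "1-5kb": 2,   "<1kb": 7},
    "DUPLICATIONS": {">10kb": 151, "5-10kb": 112, "1-5kb": 309, "<1kb": 113},
}

print(column_totals(SLIDE_COUNTS))
```

The per-bin sums match the TOTAL row on the slide (330, 243, 741, 2702).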
  
BioNano SV calls can be used to identify misassembly
[Figure: PacBio Assembly aligned against BioNano Map, illustrating collapse in assembly, expansion in assembly, and gap in sequence]
BioNano alignment to CHM13
SV_TYPES
DELETIONS     41
INVERSIONS    10
INSERTIONS    15
TOTAL         66
BioNano reveals collapse in PacBio assembly
[Figure: PacBio Assembly aligned to BioNano Map]
Illumina data aligned to PacBio assembly also shows collapse
BioNano reveals collapse in PacBio assembly due to highly homologous segmental duplications
SD = 96%
CHR1    46746040    46857004    40    W    LBHZ01000938.1    110965
CHR1    46857005    47034202    41    N    177198            gap
CHR1    47034203    52157695    42    W    LBHZ01000245.1    5123493
[Figure: PacBio Assembly aligned to BioNano Map]
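The three rows on this slide resemble an abbreviated AGP listing: two sequence components (type W) on either side of a gap (type N) on CHR1. As a rough sanity check, the sketch below parses the rows as shown and confirms that each span's length matches its coordinates and that the spans are contiguous. The column interpretation is an assumption based on what the slide shows (in particular, the last number on a W row is treated as the component length), and `parse_rows` and `spans_consistent` are hypothetical helper names.

```python
# Rows copied from the slide.
ROWS = """\
CHR1 46746040 46857004 40 W LBHZ01000938.1 110965
CHR1 46857005 47034202 41 N 177198 gap
CHR1 47034203 52157695 42 W LBHZ01000245.1 5123493
"""


def parse_rows(text):
    """Parse the slide's abbreviated rows into dictionaries."""
    parts = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        is_gap = fields[4] == "N"
        parts.append({
            "start": int(fields[1]),
            "end": int(fields[2]),
            "kind": fields[4],
            # Gap rows list the length before the label; W rows the reverse.
            "label": fields[6] if is_gap else fields[5],
            "length": int(fields[5]) if is_gap else int(fields[6]),
        })
    return parts


def spans_consistent(parts):
    """True if lengths match 1-based inclusive coordinates and spans abut."""
    lengths_ok = all(p["end"] - p["start"] + 1 == p["length"] for p in parts)
    adjacent_ok = all(b["start"] == a["end"] + 1
                      for a, b in zip(parts, parts[1:]))
    return lengths_ok and adjacent_ok


parts = parse_rows(ROWS)
print(spans_consistent(parts))
print([p["length"] for p in parts if p["kind"] == "N"])
```

Under that reading, the rows are internally consistent, and the gap between the two PacBio contigs spans 177,198 bp.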
This region is rich in medically relevant genes
[Figure: chr1 (p33) browser view with tracks SegDups, Genes (CYP4Z2P, CYP4A11, CYP4X1, CYP4Z1, CYP4A22), and CHM13 PacBio contigs (LBHZ010000938.1, LBHZ010000245.1)]
CHM13 Hybrid Scaffold
[Figure: Hybrid Scaffold aligned with PacBio Contigs and BioNano Contigs]
CHM13 Hybrid Scaffolds
                        BioNano Map   PacBio Assembly   Hybrid Scaffold
# of Contigs            3593          1590 *            254
Min Contig Length       0.08 Mb       0                 0.27 Mb
Median Contig Length    0.61 Mb       0.06 Mb           4.35 Mb
Mean Contig Length      0.78 Mb       1.78 Mb           9.68 Mb
Contig N50              1.02 Mb       13.46 Mb          20.79 Mb
Max Contig Length       5.27 Mb       63.15 Mb          82.83 Mb
Total Contig Length     2.812 Gb      2.824 Gb          2.458 Gb
*Number of contigs used in hybrid scaffolding
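The mean contig lengths in this table can be cross-checked from the other rows, since the mean is the total contig length divided by the number of contigs. A small Python sketch of that arithmetic, using only the values on the slide (the dictionary layout and the name `mean_contig_mb` are illustrative):

```python
# (number of contigs, total contig length in Gb, reported mean in Mb)
HYBRID_TABLE = {
    "BioNano Map":     (3593, 2.812, 0.78),
    "PacBio Assembly": (1590, 2.824, 1.78),
    "Hybrid Scaffold": (254,  2.458, 9.68),
}


def mean_contig_mb(n_contigs, total_gb):
    """Mean contig length in Mb from a contig count and a total in Gb."""
    return total_gb * 1000 / n_contigs


for name, (n_contigs, total_gb, reported_mb) in HYBRID_TABLE.items():
    computed = round(mean_contig_mb(n_contigs, total_gb), 2)
    print(f"{name}: computed {computed} Mb, reported {reported_mb} Mb")
```

All three computed means agree with the reported values to two decimal places.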
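Combining CHM1 and CHM13
[Figure: workflow diagram with the labels "reference", "mapping", "CHM1", "CHM13", "Pipeline analysis", and "Variant Evaluation"; the values 97 and 13 appear in the diagram]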
Acknowledgements
The McDonnell Genome Institute at Washington University in St. Louis: Rick Wilson, Bob Fulton, Wes Warren, Tina Graves-Lindsay, Vince Magrini, Sean McGrath, Derek Albracht, Milinn Kremitzki, Susan Rock, Debbie Scheer, Aye Wollam, The Finishing and Bioinformatics Teams at The Genome Institute
University of Washington: Evan Eichler, John Huddleston, Archana Raja
NCBI: Valerie Schneider
University of Pittsburgh School of Medicine (CHM13 cell line): Urvashi Surti
Personalis: Deanna Church
BioNano Genomics: Palak Sheth
Pacific Biosciences: Jason Chin, Nick Sisneros