Review of:
“A draft human
pangenome reference”
Presented by:
Stuart MacGowan
Liao, WW., Asri, M., Ebler, J. et al.
A draft human pangenome reference.
Nature 617, 312–324 (2023).
https://doi.org/10.1038/s41586-023-05896-x
Source: Nature Vol. 617 Issue 7960 (Image: Darryl Leja/NHGRI)
Why isn’t one human genome enough?
Source: The Human Pangenome – NHGRI/YouTube
A draft human
pangenome reference
• First draft of human pangenome reference!
• 47 phased, diploid assemblies from
genetically diverse individuals
• Coverage and accuracy:
• Over 99% of expected genome sequence
• Over 99% structural and base pair accuracy
• Contributions:
• Reveals new alleles at structurally complex loci
• Adds 119M base pairs of euchromatic polymorphic sequences
• Identifies 1,115 new gene duplications
• Benefits:
• Reduced small variant discovery errors by 34%
• Increased detection of structural variants by 104% compared
to GRCh38 workflows
Figure: Variant Discovery in GRCh38 vs. Pangenome. Bars for GRCh38 and Pangenome show Small Variant Discovery Errors (%) and Detected Structural Variants (%); y-axis: percent relative to GRCh38 (%).
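The two headline percentages can be turned into the values such a chart would plot. The following is a minimal sketch, assuming GRCh38 is normalised to 100% on both measures; the function and variable names are illustrative and not taken from the paper.

```python
# Headline changes quoted on this slide, relative to GRCh38-based workflows.
ERROR_CHANGE_PCT = -34   # small variant discovery errors reduced by 34%
SV_CHANGE_PCT = 104      # detected structural variants increased by 104%

GRCH38_BASELINE = 100.0  # assumption: GRCh38 is plotted as 100%


def relative_to_baseline(change_pct, baseline=GRCH38_BASELINE):
    """Value on the 'percent relative to GRCh38' axis after a percentage change."""
    return baseline + baseline * change_pct / 100


pangenome_errors = relative_to_baseline(ERROR_CHANGE_PCT)
pangenome_svs = relative_to_baseline(SV_CHANGE_PCT)

print(f"Errors: {pangenome_errors:.0f}% of GRCh38")
print(f"SVs detected: {pangenome_svs:.0f}% of GRCh38")
```

Under that assumption the pangenome bars sit at roughly two thirds of the GRCh38 error count and roughly double the GRCh38 structural variant count.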
© 2023 Stuart A. MacGowan, CC BY 4.0
Introduction
• Limitations of current human genome reference (GRCh38):
• Contains ~210 Mb of unknown or simulated
sequences, limiting study scope.
• Achievements with T2T-CHM13 genome sequencing:
• Uncovered 3.7 million additional SNPs.
• Better representation of true copy number variants
(CNVs).
• Shortcomings of a single reference genome:
• Can't capture full human genetic diversity.
• Overlooks many structural variant (SV) regions,
which significantly impact gene function.
• The solution: Transition to a pangenomic reference:
• Overcomes reference bias.
• Current study presents a draft human pangenome.
• Ultimate goal: Capture global genomic diversity with
a panel of 700 haplotypes from 350 individuals.
Credit: Darryl Leja, NHGRI, from http://www.ensembl.info/
© 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license as indicated.
The manuscript
• Assembling 47 diverse human genomes
• Assembly assessment
• Regional assembly assessment
• Completeness and CNV
• Annotating 47 diverse genomes
• Constructing a draft pangenome
• Measuring pangenome variation
• Pangenomes represent complex loci
• Applications of the pangenome
• Pangenome-based short variant discovery
• A pangenome variant resource
• SV genotyping
• Improved tandem repeat representation
• Improved RNA sequencing mapping
• Improved chromatin immunoprecipitation and sequencing analysis
Credit: NHGRI and Massive Science
© 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
Assembling 47 diverse
human genomes
• 29 HPRC samples and 18 samples from other
efforts.
• Multimodal sequencing for all samples
• PacBio HiFi, ONT long-read, Bionano, Hi-C
Illumina.
• On average, 39.7x HiFi sequence depth of
coverage
• N50 value averaged at 19.6 kb for the HiFi reads.
• Core assembler: Trio-Hifiasm, which uses PacBio
HiFi long reads and parental Illumina short reads to
produce phased contig assemblies.
Supplementary Figure 1. Trio-Hifiasm assembly pipeline.
© 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
Assembly assessment
• Fixed misassemblies:
• three large duplication errors,
• one large phasing error,
• 217 putative interchromosomal joins.
• Haploid assemblies with an X chromosome average 3.04 Gb
(99.3% of CHM13); those with a Y chromosome average 2.93 Gb.
• Average NG50 value was 40 Mb (cf. 56 Mb for GRCh38)
• Yak k-mer QV of 53.57 ≈ 1 error per 227,509 bases.
• QVs for two samples confirmed against Genome in a Bottle
• 32% of indel errors were in homopolymers >5 bp
• 48% were in tandem repeats and low-complexity regions
• Average haplotype switch error rate of 0.67%
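The QV bullet above follows from the standard Phred relationship, QV = -10 · log10(error rate). The sketch below shows that conversion; the function names are illustrative, and the calculation is generic rather than a reproduction of Yak's own code.

```python
import math


def qv_to_error_rate(qv):
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Phred-scaled quality value for a per-base error probability."""
    return -10 * math.log10(error_rate)


qv = 53.57  # average Yak k-mer QV quoted on the slide
bases_per_error = 1 / qv_to_error_rate(qv)
print(f"QV {qv} is about 1 error per {bases_per_error:,.0f} bases")
```

The result agrees with the slide's figure of one error per 227,509 bases, up to rounding of the final digit.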
Fig 1c. Total assembled sequence per assembly.
Fig 1d. Assembly contiguity NGx plot.
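As a reminder of what an NGx plot reports: NGx is the contig length at which the longest contigs, added in descending order, first cover x% of an assumed genome size (N50 uses the assembly's own total length instead). The following is a minimal sketch with toy contig lengths, not data from the paper.

```python
def ngx(contig_lengths, genome_size, x=50):
    """Smallest contig length L such that contigs of length >= L
    cover at least x% of genome_size; None if the assembly is too short."""
    target = genome_size * x / 100
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


toy_contigs = [40, 30, 20, 10]  # toy lengths (e.g. in Mb), purely illustrative

n50 = ngx(toy_contigs, genome_size=sum(toy_contigs))  # N50: relative to assembly size
ng50 = ngx(toy_contigs, genome_size=160)              # NG50: relative to a fixed genome size
print(n50, ng50)
```

Because NG50 is measured against a fixed genome size, it allows assemblies of different total length to be compared on the same scale.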
© 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
Ensembl GENCODE annotation
• Reference Gene Set (GENCODE)
• Identify Gene Clusters (100kb windows)
• Define Anchor Points (3 per cluster)
• Map Anchors to Target Genome (minimap2)
• Determine High-Confidence Regions
• Align GRCh38 Region with Target Region (MAFFT)
• Reconstruct Transcripts
• Check Mapping Quality and Resolve Inconsistencies
• Look for Recent Duplications
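To make the first two steps of the flowchart concrete, here is a rough sketch of grouping genes into 100 kb windows and picking three anchor points per cluster. The slide does not say how clusters or anchors are actually defined, so the fixed-window binning and the left/middle/right anchor choice below are assumptions for illustration only, and all names are hypothetical.

```python
from collections import defaultdict

WINDOW = 100_000  # 100 kb windows, as named in the flowchart


def cluster_genes(genes, window=WINDOW):
    """Group (name, chrom, start, end) records by the fixed window
    containing the gene start. Assumption: fixed, non-overlapping windows."""
    clusters = defaultdict(list)
    for name, chrom, start, end in genes:
        clusters[(chrom, start // window)].append((name, start, end))
    return dict(clusters)


def anchor_points(cluster):
    """Three anchors per cluster: leftmost start, midpoint, rightmost end.
    Assumption: the real pipeline may choose its anchors differently."""
    left = min(start for _, start, _ in cluster)
    right = max(end for _, _, end in cluster)
    return [left, (left + right) // 2, right]


# Toy gene coordinates, not real annotation.
genes = [
    ("geneA", "chr1", 10_000, 25_000),
    ("geneB", "chr1", 60_000, 90_000),
    ("geneC", "chr1", 250_000, 300_000),
]
clusters = cluster_genes(genes)
for key, cluster in clusters.items():
    print(key, [name for name, _, _ in cluster], anchor_points(cluster))
```

In the flowchart, anchors like these are what get mapped to the target genome with minimap2 to locate the corresponding region before the finer MAFFT alignment.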
Ensembl GENCODE annotation statistics
• Transcriptome mapping
• Median of 99.07% of protein-coding genes and 99.42% of protein-coding transcripts annotated
• Median of 98.16% of noncoding genes and 98.96% of noncoding transcripts annotated
• Median of 25 nonsense and 72 frameshifts per assembly
• Within expected range of loss-of-function mutations
• Over 80% supported by independent Illumina variant
callsets
• Suggests an upper bound of 18 transcript-altering errors per
transcriptome, or 1 per 1.7 million assembled transcriptome bases.
• Identified 1,115 protein-coding gene families with
copy number gain in at least one genome
Fig 2a. Percentages of coding and noncoding genes and transcripts
annotated from the reference set in each assembly.
© 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
Building A Draft
Human Pangenome
• Pangenomes can be visualized as sequence
graphs in which nodes are DNA segments and
edges join segments in any combination of
forward and reverse orientations.
• Haplotype sequences are walks in the
graph and are implicitly aligned.
• Graph Construction:
• Minigraph: Reference-based,
gradually adds assemblies.
• Minigraph-Cactus (MC): Enhances
Minigraph with further alignments.
• PanGenome Graph Builder (PGGB):
All-to-all assembly alignments.
• Includes GRCh38 and T2T-CHM13
• Three samples held out for benchmarking.
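The graph model described above can be sketched in a few lines: segments are nodes, edges join oriented segments, and each haplotype is a walk whose spelled-out sequence is the haplotype. This is a toy illustration with made-up sequences, not the data structures used by Minigraph, Minigraph-Cactus or PGGB.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


# Toy segments: 2 and 3 are alternative alleles between shared segments 1 and 4.
segments = {1: "ACG", 2: "T", 3: "G", 4: "TTA"}

# Edges join oriented segments ("+" forward, "-" reverse).
edges = {
    ((1, "+"), (2, "+")), ((1, "+"), (3, "+")),
    ((2, "+"), (4, "+")), ((3, "+"), (4, "+")),
}


def flip(step):
    node, orient = step
    return (node, "-" if orient == "+" else "+")


def has_edge(a, b):
    """An edge a->b can also be traversed as flip(b)->flip(a)."""
    return (a, b) in edges or (flip(b), flip(a)) in edges


def spell_walk(walk):
    """Concatenate segment sequences along a walk, checking every step is an edge."""
    for a, b in zip(walk, walk[1:]):
        if not has_edge(a, b):
            raise ValueError(f"no edge between {a} and {b}")
    return "".join(
        segments[n] if o == "+" else reverse_complement(segments[n])
        for n, o in walk
    )


hap1 = [(1, "+"), (2, "+"), (4, "+")]
hap2 = [(1, "+"), (3, "+"), (4, "+")]
print(spell_walk(hap1), spell_walk(hap2))
```

The two walks share segments 1 and 4 and differ only at segments 2 and 3, which is the sense in which haplotypes embedded in the graph are implicitly aligned.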
Source: Li, H., Feng, X. & Chu, C. Genome Biol 21, 265 (2020).
Fig 3a. A pangenome variation graph. Source: Liao et al., Nature, 2023.
© 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license as indicated.
Exploring Human Genetic Variation with
Pangenome Reference Draft
• ~22 million small variants
• ~70k structural variants
• ~175-190 Mb of novel euchromatic
autosomal sequence
• High concordance of variants with
conventional genotyping
• > 97% HiFi reads align to the MC graph
• Annotated nearly 99.1% of protein-coding
transcripts per assembly
Fig 3g. Pangenome growth curves for PGGB. Depth measures how often a
segment is contained in any haplotype sequence. Core is present in ≥95% of
haplotypes, common is ≥5%.
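The depth categories in the caption can be expressed as a simple classification of graph segments by the fraction of haplotypes that contain them. The sketch below uses the caption's thresholds; since the caption's "common" range (≥5%) overlaps "core" (≥95%), the code assigns the most specific label, and the "rare" label for everything else is a placeholder name, not one taken from the caption.

```python
def classify_segments(segment_haplotypes, n_haplotypes,
                      core_min=0.95, common_min=0.05):
    """Label each segment by the fraction of haplotypes that contain it."""
    labels = {}
    for segment, haplotypes in segment_haplotypes.items():
        fraction = len(set(haplotypes)) / n_haplotypes
        if fraction >= core_min:
            labels[segment] = "core"
        elif fraction >= common_min:
            labels[segment] = "common"
        else:
            labels[segment] = "rare"  # placeholder label for everything below 5%
    return labels


# Toy data: which of 40 haplotypes traverse each segment.
N_HAPLOTYPES = 40
toy = {
    "s1": range(40),   # in every haplotype
    "s2": range(10),   # in a quarter of haplotypes
    "s3": [0],         # in a single haplotype
}
labels = classify_segments(toy, N_HAPLOTYPES)
print(labels)
```

A growth curve then tracks how the total sequence in each category changes as haplotypes are added one at a time.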
© 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
Applications of the pangenome to
downstream analysis workflows
• Pangenome-based short variant
discovery
• A pangenome variant resource
• SV genotyping
• Tandem repeat representation
• RNA sequencing mapping
• Chromatin immunoprecipitation and
sequencing analysis
Credit: Darryl Leja, NHGRI.
© 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023 and as indicated.
Improved Variant Calling Accuracy
with Pangenomic Approach
• Methods:
• Giraffe alignment to MC pangenome graph
• cf. alignments to GRCh38 and Dragen Graph.
• Pangenomic approach (Giraffe + DeepVariant)
outperformed others in calling small variants.
• Error comparison:
• 21,700 errors vs. 36,144 (GRCh38) and
26,852 (Dragen pipeline).
• Even better in complex medically relevant genes.
• Additive improvements with DeepTrio
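The error counts above can be converted into relative reductions. This is plain arithmetic on the three numbers quoted on this slide; it applies to this benchmark only and is not necessarily the same calculation as the headline percentage quoted earlier in the deck, which may summarise over different samples.

```python
# Error counts quoted on this slide.
errors = {
    "Giraffe + DeepVariant (pangenome)": 21_700,
    "GRCh38": 36_144,
    "Dragen": 26_852,
}


def relative_reduction(new, old):
    """Percentage by which `new` is lower than `old`."""
    return 100 * (old - new) / old


pangenome = errors["Giraffe + DeepVariant (pangenome)"]
vs_grch38 = relative_reduction(pangenome, errors["GRCh38"])
vs_dragen = relative_reduction(pangenome, errors["Dragen"])

print(f"Fewer errors than GRCh38 alignment: {vs_grch38:.1f}%")
print(f"Fewer errors than Dragen: {vs_dragen:.1f}%")
```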
Fig 6a. GIAB (v.4.2.1) HG005 benchmark
Fig 6b. CMRG (v.1.0) benchmark
© 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
Pangenome Variant
Resource
• Applied Giraffe + DeepVariant pipeline to
high-coverage short reads from 1KG
• Mendelian consistency across 100 trios
comparable to samples from GIAB
• On average 64,000 more variants per sample
compared to 1KG catalogue
• Improved performance in challenging regions
yields better allele frequencies at complex,
medically relevant loci
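Mendelian consistency, used above as a quality check across trios, asks whether a child's genotype can be formed from one allele of each parent. The following is a minimal sketch, assuming diploid autosomal genotypes written as tuples of allele indices and ignoring missing calls and genuine de novo mutations; it is not the evaluation code used in the paper.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be built from one maternal
    and one paternal allele. Genotypes are tuples such as (0, 1)."""
    return any(
        sorted((m, f)) == sorted(child)
        for m, f in product(mother, father)
    )


def mendelian_consistency(trio_genotypes):
    """Fraction of (child, mother, father) sites that are consistent."""
    if not trio_genotypes:
        raise ValueError("no sites to evaluate")
    consistent = sum(is_mendelian_consistent(*site) for site in trio_genotypes)
    return consistent / len(trio_genotypes)


# Toy sites: (child, mother, father)
sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent
    ((1, 1), (0, 0), (0, 1)),  # inconsistent: mother has no allele 1
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 1), (0, 0), (0, 0)),  # inconsistent: neither parent has allele 1
]
rate = mendelian_consistency(sites)
print(rate)
```

Sites that fail this check in a real callset are a mixture of genotyping errors and rare true de novo events, which is why the rate is used as a proxy for accuracy.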
Link: Google Cloud Bucket
Accessing draft
pangenome
resources
• UCSC Genome Browser
(http://hprc-browser.ucsc.edu)
• Ensembl Rapid Release Genome Browser
(https://rapid.ensembl.org)
• Ensembl HPRC project page
(https://projects.ensembl.org/hprc/)
• AnVIL_HPRC workspace
(https://anvilproject.org/)
• AWS Open Data Program in human-pangenomics S3 bucket
(https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html)
• Various BioProject, Zenodo and GitHub repos
FLG assemblies in
Jalview
• Extracted from protein FASTAs on the Ensembl HPRC
project page (https://projects.ensembl.org/hprc/)
• N.B. Ensembl IDs were mangled…
• Alignment properties:
• Sequences: 97
• Minimum Sequence Length: 2390
• Maximum Sequence Length: 4710
• Average Length: 3680
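Summary statistics like the alignment properties above can be recomputed directly from a FASTA file. The sketch below parses FASTA text and reports the same four values; it assumes lengths are counted without gap characters, which may or may not match how Jalview reports them, and the toy sequences are made up.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (name, sequence) pairs.
    A list is used so that duplicate or mangled IDs are not merged."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].split()[0] if line[1:].split() else "", []])
        elif records:
            records[-1][1].append(line)
    return [(name, "".join(parts)) for name, parts in records]


def alignment_properties(records):
    """Sequence count and min/max/average length (gap characters excluded)."""
    lengths = [len(seq.replace("-", "")) for _, seq in records]
    return {
        "sequences": len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "average_length": sum(lengths) / len(lengths),
    }


toy_fasta = """>hapA
MSTLL
ENI
>hapB
MST
"""
records = read_fasta(toy_fasta.splitlines())
props = alignment_properties(records)
print(props)
```

For a real file, the lines of an opened FASTA file can be passed to `read_fasta` in place of `toy_fasta.splitlines()`.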
Conclusion
• A Draft Human Pangenome: 94 diverse, high-quality de novo haplotype
assemblies.
• New Genetic Insights: Uncovered novel genetic variations and
mutational processes.
• Pangenomes are Powerful Tools: Enhanced mapping workflows and
error reduction.
• Future of SVs: Pangenome + long-reads = comprehensive SV
genotyping.
• Globalising genomics: Pangenomic workflows improve genotype
detection across diverse individuals and ancestries, and help mitigate
detection bias.
• Challenges Ahead: Assembly reliability, sequencing errors, and the need
for more diversity.
• Implications: Promises to improve understanding of genomics and
ability to predict, diagnose, and treat disease. Sets new standards for
capturing variant diversity.
• Towards a Global Reference: Anticipated rapid pangenome
improvements and many applications.
Source: Nature Vol. 617 Issue 7960 (Image: Darryl Leja/NHGRI)
© 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license as indicated.
Sources
• Unless otherwise indicated, the figures used in this presentation are
sourced from the article:
• "A draft human pangenome reference" by Liao, WW., Asri, M., Ebler, J. et al.,
published in Nature, 2023.
• https://doi.org/10.1038/s41586-023-05896-x
• The figures are used in accordance with the Creative Commons Attribution
4.0 International License, which permits use, sharing, adaptation,
distribution and reproduction in any medium or format, as long as
appropriate credit is given to the original author(s) and the source.
• The Creative Commons license can be viewed here:
http://creativecommons.org/licenses/by/4.0/
• Figures were resized and cropped to fit the slide format.
License
• This presentation, including all original figures, is created by Stuart A.
MacGowan and is licensed under a Creative Commons Attribution 4.0
International License.
• You are free to share (copy and redistribute the material in any
medium or format) and adapt (remix, transform, and build upon the
material) for any purpose, even commercially, provided you give
appropriate credit, provide a link to the license, and indicate if
changes were made.
• For full details of the license, visit:
http://creativecommons.org/licenses/by/4.0/
• © 2023 Stuart A. MacGowan

Review of Liao et al - A draft human pangenome reference - Nature (2023)

  • 1. Review of: “A draft human pangenome reference” Presented by: Stuart MacGowan Liao, WW., Asri, M., Ebler, J. et al. A draft human pangenome reference. Nature 617, 312–324 (2023). https://doi.org/10.1038/s41586-023-05896-x Source: Nature Vol. 617 Issue 7960 (Image: Darryl Leja/NHGRI)
  • 2. Why isn’t one human genome enough? Source: The Human Pangenome – NHGRI/ YouTube
  • 3. A draft human pangenome reference • First draft of human pangenome reference! • 47 phased, diploid assemblies from genetically diverse individuals • Coverage and accuracy: • Over 99% of expected genome sequence • over 99% structural and base pair accuracy • Contributions: • Reveals new alleles at structurally complex loci • Adds 119M base pairs of euchromatic polymorphic sequences • Identifies 1,115 new gene duplications • Benefits: • Reduced small variant discovery errors by 34% • increased detection of structural variants by 104% compared to GRCh38 workflows 0 50 100 150 200 250 Small Variant Discovery Errors (%) Detected Structural Variants (%) Percent relative to GRCh38 (%) Variant Discovery in GRCh38 vs. Pangenome GRCh38 Pangenome © 2023 Stuart A. MacGowan, CC BY 4.0
  • 4. Introduction • Limitations of current human genome reference (GRCh38): • Contains ~210 Mb of unknown or simulated sequences, limiting study scope. • Achievements with T2T-CHM13 genome sequencing: • Uncovered 3.7 million additional SNPs. • Better representation of true copy number variants (CNVs). • Shortcomings of a single reference genome: • Can't capture full human genetic diversity. • Overlooks many structurally variant (SV) regions, which significantly impact gene function. • The solution: Transition to a pangenomic reference: • Overcomes reference bias. • Current study presents a draft human pangenome. • Ultimate goal: Capture global genomic diversity with a panel of 700 haplotypes from 350 individuals. Credit: Darryl Leja, NHGRI, from http://www.ensembl.info/ © 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license as indicated.
  • 5. The manuscript • Assembling 47 diverse human genomes • Assembly assessment • Regional assembly assessment • Completeness and CNV • Annotating 47 diverse genomes • Constructing a draft pangenome • Measuring pangenome variation • Pangenomes represent complex loci • Applications of the pangenome • Pangenome-based short variant discovery • A pangenome variant resource • SV genotyping • Improved tandem repeat representation • Improved RNA sequencing mapping • Improved chromatin immunoprecipitation and sequencing analysis Credit: NHGRI and Massive Science © 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
  • 6. Assembling 47 diverse human genomes • 29 HPRC samples and 18 samples from other efforts. • Multimodal sequencing for all samples • PacBio HiFi, ONT long-read, Bionano, Hi-C Illumina. • On average, 39.7x HiFi sequence depth of coverage • N50 value averaged at 19.6 kb for the HiFi reads. • Core assembler: Trio-Hifiasm, which uses PacBio HiFi long-read and parental Illumina short-read for phased contig assemblies. Supplementary Figure 1. Trio-Hifiasm assembly pipeline. © 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
  • 7. Assembly assessment • Fixed misassemblies • three large duplication errors • one large phasing error, • 217 putative interchromosomal joins. • Haploid assemblies with X chromosome average 3.04 Gb, (99.3% of CHM13). With Y chromosome average 2.93 Gb. • Average NG50 value was 40 Mb (cf. 56 Mb for GRCh38) • Yak k-mer QV of 53.57 ≈ 1 error per 227,509 bases. • QVs for two samples confirmed against Genome in a Bottle • 32% of indel errors were in homopolymers >5 bp • 48% were in tandem repeats and low-complexity regions • Average haplotype switch error rate of 0.67% Fig 1c. Total assembled sequence per assembly. Fig 1d. Assembly contiguity NGx plot. © 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
  • 8. Ensembl GENCODE annotation Reference Gene Set (GENCODE) Identify Gene Clusters (100kb windows) Define Anchor Points (3 per cluster) Map Anchors to Target Genome (minimap2) Determine High- Confidence Regions Align GRCh38 Region with Target Region (MAFFT) Reconstruct Transcripts Check Mapping Quality and Resolve Inconsistencies Look for Recent Duplications © 2023 Stuart A. MacGowan, CC BY 4.0
  • 9. Ensembl GENCODE annotation statistics • Transcriptome mapping • Median of 99.07% protein-coding genes and 99.42% transcripts • Median of 98.16% noncoding genes and 98.96% transcripts • Median of 25 nonsense and 72 frameshifts per assembly • Within expected range of loss-of-function mutations • Over 80% supported by independent Illumina variant callsets • Suggest upper bound of 18 transcript-altering errors per transcriptome or 1 per 1.7 million assembled transcriptome bases. • Identified 1,115 protein-coding gene families with copy number gain in 1+ genome Fig 2a. Percentages of coding and noncoding genes and transcripts annotated from the reference set each assembly. © 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
  • 10. Building A Draft Human Pangenome • Pangenomes can be visualized as sequence graphs with DNA segments as nodes and combinations of orientations as edges. • Haplotype sequences are walks in the graph and are implicitly aligned. • Graph Construction: • Minigraph: Reference-based, gradually adds assemblies. • Minigraph-Cactus (MC): Enhances Minigraph with further alignments. • PanGenome Graph Builder (PGGB): All-to-all assembly alignments. • Includes GRCh38 and T2T-CHM13 • Three samples held out for benchmarking. Source: Li, H., Feng, X. & Chu, C. Genome Biol 21, 265 (2020). Fig 3a. A pangenome variation graph. Source: Liao et al., Nature, 2023. © 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license as indicated.
  • 11. Exploring Human Genetic Variation with Pangenome Reference Draft • ~22 million small variants • ~70k structural variants • Novel ~175-190 Mb of euchromatic autosomal sequence • High concordance of variants with conventional genotyping • > 97% HiFi reads align to the MC graph • Annotated nearly 99.1% of protein-coding transcripts per assembly Fig 3g. Pangenome growth curves for PGGB. Depth measures how often a segment is contained in any haplotype sequence. Core is present in ≥95% of haplotypes, common is ≥5%. © 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
  • 12. Applications of the pangenome to downstream analysis workflows • Pangenome-based short variant discovery • A pangenome variant resource • SV genotyping • Tandem repeat representation • RNA sequencing mapping • Chromatin immunoprecipitation and sequencing analysis Credit: Darryl Leja, NHGRI. © 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023 and as indicated.
  • 13. Improved Variant Calling Accuracy with Pangenomic Approach • Methods: • Giraffe alignment to MC pangenome graph • cf. alignments to GRCh38 and Dragen Graph. • Pangenomic approach (Giraffe + DeepVariant) outperformed others in calling small variants. • Error comparison: • 21,700 errors vs. 36,144 (GRCh38) and 26,852 (Dragen pipeline). • Even better in complex medically relevant genes. • Additive improvements with with DeepTrio Fig 6a. GIAB (v.4.2.1) HG005 benchmark Fig 6b. CMRG (v.1.0) benchmark © 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license from Liao et al., Nature, 2023.
  • 14. Pangenome Variant Resource • Applied Giraffe + DeepVariant pipeline to high-coverage short reads from 1KG • Mendelian consistency across 100 trios comparable to samples from GIAB • On average 64,000 more variants per sample compared to 1KG catalogue • Improved performance in challenging regions yields better allele frequencies at complex, medically relevant loci Link: Google Cloud Bucket © 2023 Stuart A. MacGowan, CC BY 4.0
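Mendelian consistency, the trio metric mentioned on slide 14, can be sketched for a single autosomal site as follows. This is a generic illustration rather than the evaluation used in the paper; genotypes are assumed to be unphased pairs of allele indices (0 = reference), and the example trios are invented.

```python
def is_mendelian_consistent(child, mother, father) -> bool:
    """True if the child's two alleles can be split one per parent."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


def consistency_rate(trio_genotypes) -> float:
    """Fraction of (child, mother, father) genotype triples that are consistent."""
    if not trio_genotypes:
        raise ValueError("no genotypes supplied")
    consistent = sum(is_mendelian_consistent(*g) for g in trio_genotypes)
    return consistent / len(trio_genotypes)


# Hypothetical calls at three sites for one trio.
sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from mother, 1 from father
    ((1, 1), (0, 0), (0, 1)),  # inconsistent: mother carries no 1 allele
    ((0, 0), (0, 1), (0, 1)),  # consistent: 0 from each parent
]
print(consistency_rate(sites))
```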
  • 15. Accessing draft pangenome resources • UCSC Genome Browser (http://hprc-browser.ucsc.edu) • Ensembl Rapid Release Genome Browser (https://rapid.ensembl.org) • Ensembl HPRC project page (https://projects.ensembl.org/hprc/) • AnVIL_HPRC workspace (https://anvilproject.org/) • AWS Open Data Program in human-pangenomics S3 bucket (https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html) • Various BioProject, Zenodo and GitHub repos © 2023 Stuart A. MacGowan, CC BY 4.0
  • 16. FLG assemblies in Jalview • Extracted from protein FASTAs on the Ensembl HPRC project page (https://projects.ensembl.org/hprc/) • N.B. Ensembl IDs were mangled… • Alignment properties: • Sequences: 97 • Minimum Sequence Length: 2390 • Maximum Sequence Length: 4710 • Average Length: 3680 © 2023 Stuart A. MacGowan, CC BY 4.0
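Summary properties like those on slide 16 (sequence count, minimum, maximum and average length) can be recomputed from a protein FASTA with a few lines of Python. This is a sketch, not the Jalview calculation; the record names and sequences below are placeholders, and records are kept in a list so that duplicate or mangled IDs do not overwrite each other.

```python
from statistics import mean


def fasta_lengths(text: str):
    """Return a list of (header, sequence_length) for each FASTA record."""
    records = []  # each entry is [header, running length]
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), 0])
        elif records:
            records[-1][1] += len(line)
    return [(name, length) for name, length in records]


# Placeholder input; real data would be read from a downloaded FASTA file.
toy_fasta = """>hap1_FLG
MSTLL
ENIF
>hap2_FLG
MSTLLEN
"""

lengths = [length for _, length in fasta_lengths(toy_fasta)]
summary = {
    "sequences": len(lengths),
    "min_length": min(lengths),
    "max_length": max(lengths),
    "average_length": mean(lengths),
}
print(summary)
```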
  • 17. Conclusion • A Draft Human Pangenome: 94 diverse, high-quality de novo haplotype assemblies. • New Genetic Insights: Uncovered novel genetic variations and mutational processes. • Pangenomes are Powerful Tools: Enhanced mapping workflows and error reduction. • Future of SVs: Pangenome + long-reads = comprehensive SV genotyping. • Globalising genomics: Pangenomic workflows improve genotype detection across diverse individuals and ancestries, and help mitigate detection bias. • Challenges Ahead: Assembly reliability, sequencing errors, and the need for more diversity. • Implications: Promises to improve understanding of genomics and the ability to predict, diagnose, and treat disease. Sets new standards for capturing variant diversity. • Towards a Global Reference: Anticipated rapid pangenome improvements and many applications. Source: Nature Vol. 617 Issue 7960 (Image: Darryl Leja/NHGRI) © 2023 Stuart A. MacGowan, CC BY 4.0. Content used under license as indicated.
  • 18. Sources • Unless otherwise indicated, the figures used in this presentation are sourced from the article: • "A draft human pangenome reference" by Liao, WW., Asri, M., Ebler, J. et al., published in Nature, 2023. • https://doi.org/10.1038/s41586-023-05896-x • The figures are used in accordance with the Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as appropriate credit is given to the original author(s) and the source. • The Creative Commons license can be viewed here: http://creativecommons.org/licenses/by/4.0/ • Figures were resized and cropped to fit the slide format.
  • 19. License • This presentation, including all original figures, is created by Stuart A. MacGowan and is licensed under a Creative Commons Attribution 4.0 International License. • You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially, provided you give appropriate credit, provide a link to the license, and indicate if changes were made. • For full details of the license, visit: http://creativecommons.org/licenses/by/4.0/ • © 2023 Stuart A. MacGowan

Editor's Notes

  1. Minigraph figure: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02168-z
  2. https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP/vg/graph_to_grch38;tab=objects?pli=1&prefix=&forceOnObjectsSortingFiltering=false&pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22)) E.g. detected gene conversion event in RHCE and variants within KCNE1 (previously inaccessible due to a false duplication in GRCh38)