GBS: Genotyping by sequencing

Introduction
• Genetic markers
– heritable polymorphisms that can be measured in one or
more populations of individuals
– heart of modern genetics
– enable the study of important questions in population
genetics, ecological genetics and evolution
• Advent of next-generation sequencing (NGS)
– whole genome sequencing
– re-sequencing: discovering, sequencing and genotyping
thousands of markers across almost any genome
• comprehensive genome-wide association studies for any
organism
• genome-wide studies on wild populations

NGS marker discovery and
genotyping methods
• RRL and CRoPS (reduced-representation libraries and
complexity reduction of polymorphic sequences)
• RAD seq (Restriction-site associated DNA sequencing)
• GBS (Genotyping by sequencing)
– the digestion of multiple samples of genomic DNA
– a selection or reduction of the resulting restriction fragments
– NGS of the final set of fragments, which should be less than 1
kb in size
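
The digestion and size-selection steps above can be sketched in silico. This is only an illustration: the recognition site (an ApeKI-like G^CWGC, where W is A or T), the function name `digest` and the toy sequence are assumptions, not taken from the slides.

```python
import re

# Assumption: ApeKI-like site G^CWGC (W = A or T), cutting after the first G.
# The lookahead lets overlapping sites (e.g. GCAGCAGC) all be found.
SITE = re.compile(r"G(?=C[AT]GC)")

def digest(seq, max_len=1000):
    """Cut seq at every recognition site and keep pieces shorter than max_len.

    Note: the first and last pieces end at the sequence boundary, so they are
    not true restriction fragments; a real workflow would discard them.
    """
    cuts = [m.end() for m in SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    fragments = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    return [f for f in fragments if 0 < len(f) < max_len]

toy = "AAAGCAGCTTTTGCTGCCC"
print(digest(toy))             # all pieces are far below 1 kb
print(digest(toy, max_len=5))  # a stricter size selection
```

The `max_len` default mirrors the "less than 1 kb" guideline on the slide.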

[Figure panels: RRL, RAD, GBS] (Davey et al., 2011 Nat Rev Genet)

GBS adapters and primers
(Elshire et al., 2011 PLOS One)

GBS library construction
(Elshire et al., 2011 PLOS One)

GBS results in Maize
• Parental line
– 98% of 1,146,449 high-quality (HQ) reads aligned to the maize genome
– 868,336 reads aligned perfectly to the maize genome
• 276 RILs
– 6 lanes, 48-plex, 2,090 Mbp per lane on average
– Of 145,836,644 raw reads, 83% passed the filtering process (120,438,739
GBS reads)
– 436,372 reads were produced per DNA sample and 95% of samples
– 809,651 sequence tags covering 51.8 Mbp or 2.3% of the maize
genome
– 167,494 of the dominant markers could be placed on a framework
map of 25,185 sequence tags
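
The read counts above are mutually consistent, as a quick check shows. The variable names are illustrative, and the implied genome size is only what the two slide figures (51.8 Mbp, 2.3%) give when divided, not a number stated in the source.

```python
raw_reads = 145_836_644
filtered_reads = 120_438_739
n_rils = 276
tag_coverage_mbp = 51.8
genome_fraction = 0.023  # 2.3% of the maize genome

pass_rate = filtered_reads / raw_reads
reads_per_sample = filtered_reads / n_rils
implied_genome_mbp = tag_coverage_mbp / genome_fraction

print(f"pass rate: {pass_rate:.1%}")                       # 82.6%, i.e. ~83%
print(f"reads per sample: {reads_per_sample:,.0f}")        # 436,372
print(f"implied genome size: {implied_genome_mbp:.0f} Mbp")  # 2252 Mbp
```

So the 436,372 reads per DNA sample is the mean over the 276 RILs.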

TASSEL-GBS
• The new bottleneck is the efficient bioinformatic
analysis of the vast and ever-expanding sea of
data
• TASSEL-GBS (Trait Analysis by aSSociation, Evolution and Linkage)
– Not limited to the specific restriction enzymes
used in published GBS protocols
– Works with nearly any restriction enzyme and
barcoding approach
– Specifically designed to efficiently handle large
quantities of data from large numbers of samples
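
To make "restriction enzyme and barcoding approach" concrete, here is a minimal demultiplexing sketch. It is not TASSEL-GBS code: the function name, the barcodes, the cut-site remnants (`CAGC`/`CTGC`, as an ApeKI-like overhang would leave) and the 64-base tag length are all assumptions for illustration.

```python
def demultiplex(reads, barcodes, cut_remnants=("CAGC", "CTGC"), tag_len=64):
    """Assign reads to samples by leading barcode.

    A read is kept only if the barcode is immediately followed by one of the
    expected cut-site remnants. Returns {sample: [tags]} with tags trimmed
    to tag_len bases (barcode removed, remnant kept).
    """
    by_sample = {sample: [] for sample in barcodes.values()}
    # Try longer barcodes first so a short barcode that is a prefix
    # of a longer one cannot claim the read.
    ordered = sorted(barcodes, key=len, reverse=True)
    for read in reads:
        for bc in ordered:
            rest = read[len(bc):]
            if read.startswith(bc) and rest.startswith(cut_remnants):
                by_sample[barcodes[bc]].append(rest[:tag_len])
                break
    return by_sample

# Hypothetical barcodes and reads
barcodes = {"ACGT": "sample1", "TTGCA": "sample2"}
reads = ["ACGTCAGCAAAA", "TTGCACTGCGGGG", "ACGTGGGGGGGG", "NNNNCAGCAAAA"]
print(demultiplex(reads, barcodes))
```

Passing the enzyme remnants and barcodes as parameters is what keeps such a step independent of any one protocol.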

(Glaubitz et al., 2014 PLOS One)

Population genetic-based filtering of
putative SNPs
• Putative SNPs from GBS may be of low quality
– sequencing error
– paralogous sequence tags from different loci
• To detect and filter out error-prone SNPs
– minor allele frequency (MAF)
– inbreeding coefficient (or "index of panmixia"):
$F_{IT} = 1 - \frac{H_o}{H_e}$, where $H_e = 2q(1 - q)$
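
A minimal sketch of the two statistics, computed from genotype counts at one biallelic SNP. It assumes H_o is the observed heterozygote fraction, H_e the expected one and q the minor allele frequency; the function names and the thresholds in `passes_filter` are hypothetical, not values from the slides.

```python
def snp_stats(n_aa, n_ab, n_bb):
    """Return (maf, h_obs, h_exp, f_it) from genotype counts at one SNP."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = min(p, 1 - p)                 # minor allele frequency
    h_obs = n_ab / n                  # observed heterozygosity
    h_exp = 2 * q * (1 - q)           # expected heterozygosity
    f_it = 1 - h_obs / h_exp if h_exp > 0 else float("nan")
    return q, h_obs, h_exp, f_it

def passes_filter(n_aa, n_ab, n_bb, min_maf=0.01, min_f=0.8):
    """Hypothetical thresholds; suitable values depend on the population."""
    q, _, _, f_it = snp_stats(n_aa, n_ab, n_bb)
    return q >= min_maf and f_it >= min_f

print(snp_stats(25, 50, 25))  # random mating: F = 0
print(snp_stats(50, 0, 50))   # fully inbred lines: F = 1
print(snp_stats(0, 100, 0))   # all heterozygous: F = -1
```

An excess of heterozygotes (strongly negative F) is the pattern one would expect when paralogous tags are merged into a single locus, which is why F is useful as a filter in inbred material.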

Capacity for large numbers of markers
and samples
• 31,978 samples took 495 CPU-hours on a 64-core Linux
machine with 512 GB of RAM
• 383 samples require approximately 1 CPU-hour on a
MacBook Pro with a 2.6 GHz Intel Core i7 processor
and 16 GB of RAM running OS X
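
Expressed per sample, the two benchmarks work out as follows. The helper name is illustrative, and the two runs are not directly comparable since the hardware differs and the datasets may differ too.

```python
def cpu_seconds_per_sample(samples, cpu_hours):
    """Convert a total CPU-hour figure into CPU-seconds per sample."""
    return cpu_hours * 3600 / samples

server = cpu_seconds_per_sample(31_978, 495)
laptop = cpu_seconds_per_sample(383, 1)

print(f"server: {server:.1f} CPU-s per sample")  # 55.7
print(f"laptop: {laptop:.1f} CPU-s per sample")  # 9.4
```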

UNEAK pipeline in TASSEL-GBS
• In the absence of a reference genome:
– SNP calling may be much less accurate with short-read
sequencing technologies
– true SNPs, sequencing errors and apparent SNPs between
paralogs can be difficult to distinguish
• Universal Network-Enabled Analysis Kit (UNEAK)
– To enable genome-wide association studies (GWAS)
and genomic selection (GS)
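
One way to picture reference-free SNP discovery on a tag network is to link tags that differ at exactly one base and keep only isolated pairs. The sketch below is a simplification under that assumption, not the UNEAK implementation; the function names, the `min_count` threshold and the toy tags are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def candidate_pairs(tag_counts, min_count=2):
    """Pairs of sufficiently abundant tags differing at exactly one base."""
    tags = sorted(t for t, c in tag_counts.items() if c >= min_count)
    return [(a, b) for a, b in combinations(tags, 2)
            if len(a) == len(b) and hamming(a, b) == 1]

def reciprocal_pairs(pairs):
    """Keep pairs whose two tags have no other single-mismatch neighbour.

    Tags sitting in larger networks (possible paralogs, repeats or
    sequencing errors) are dropped.
    """
    degree = {}
    for a, b in pairs:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return [(a, b) for a, b in pairs if degree[a] == 1 and degree[b] == 1]

# Toy tag counts; real tags would be much longer
tag_counts = {"ACGTAC": 10, "ACGTAT": 8,
              "GGGTTT": 5, "GGGTTA": 4, "GGGTTC": 6,
              "TTTTTT": 1}
pairs = candidate_pairs(tag_counts)
print(reciprocal_pairs(pairs))
```

The all-pairs comparison is quadratic in the number of tags, so it only suits toy inputs; a production pipeline would need an indexed search.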

The analytical framework of UNEAK
(Lu et al., 2013 PLOS Genetics)

SNP discovery in switchgrass
• Full-sib population (n=130): 400,107
• Half-sib population (n=168): 476,005
• 66 diverse populations (n=540): 700,236
• The average coverage of the three data sets was less
than 1X
• Using the most informative markers (0.2 < MAF < 0.3), 3,000
paternal SNPs were grouped into 18 linkage groups
• Paternal linkage map: 41,709 markers; maternal map:
46,508 markers

Strengths and Weaknesses of GBS
• Strengths of GBS and TASSEL-GBS
– The large number of markers potentially produced
– Low cost and minimal startup cost
– Integration of SNP discovery with SNP calling
• Weakness
– The amount of missing data when GBS is conducted at
low coverage
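
A rough feel for the missing-data problem: if read depth per site were Poisson-distributed (an assumption, real GBS depth is usually more uneven), the chance that a sample has no read at a site is exp(-coverage). The function name is illustrative.

```python
import math

def expected_missing_fraction(mean_coverage):
    """P(zero reads at a site) under a Poisson model of read depth."""
    return math.exp(-mean_coverage)

for cov in (0.5, 1.0, 2.0, 5.0):
    print(f"{cov:>4}X -> {expected_missing_fraction(cov):.1%} missing")
```

Under this model, even 1X mean coverage leaves roughly 37% of genotype calls missing, and heterozygotes are additionally under-called because one read can show only one allele.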

References
• Elshire R, Glaubitz J, Sun Q, Poland J, Kawamoto K, et al. (2011) A
robust, simple genotyping-by-sequencing (GBS) approach for
high diversity species. PLoS ONE 6.
• Glaubitz JC, Casstevens TM, Lu F, Harriman J, Elshire RJ, et al.
(2014) TASSEL-GBS: A high capacity genotyping by sequencing
analysis pipeline. PLoS ONE 9.
• Lu F, Lipka AE, Glaubitz J, Elshire R, Cherney JH, et al. (2013)
Switchgrass genomic diversity, ploidy, and evolution: Novel
insights from a network-based SNP discovery protocol. PLoS
Genet 9.
• Davey J, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, et al.
(2011) Genome-wide genetic marker discovery and genotyping
using next-generation sequencing. Nat Rev Genet 12:499-510.
