Introduction (GS) and method of selection: where we come from
Why genomic selection?
Factors contributing to the success of GS
Steps involved in GS
Genomic selection prediction models
GS prediction accuracies
Future directions of GS
Conclusion
Selection has played an important role in human-plant co-evolution. Conditions have changed over time, and resources are constrained. The goal of selection has never changed; only the target has changed. Selection should happen in short intervals of time and with accuracy. To increase yield and productivity, we need to increase genetic gain. Genetic gain (genetic advance) is the amount of increased performance achieved through a breeding program after each cycle of selection. Selection in plant breeding is usually based on estimates of breeding values obtained with pedigree-based mixed models.
Selection should happen in short intervals of time and with accuracy. To increase yield and productivity, we need to increase genetic gain. Selection in GS is usually based on genomic estimated breeding values (GEBVs), so in the near future selection could take place in the laboratory. Selection in plant breeding, by contrast, is usually based on estimates of breeding values obtained with pedigree-based mixed models.
Limitations:
Traits with low heritability
Traits that are expressed late in an individual's life
Traits that cannot be measured easily (e.g. disease resistance and quality traits)
Time-consuming, and the rate of breeding is slow
In addition, the expected increase in inbreeding among elite populations derived from intense prior selection may limit the creation of new genetic combinations for future gain. Intermating source populations for genetic recombination may overcome this problem, but it delays line development.
Limitations of MAS: it captures the effects of major QTLs, but not those of minor QTLs.
Prediction of total genetic value using genome-wide dense marker maps.
A close look at the livestock breed in which GS has been implemented to its full potential: Holstein-Friesian dairy cattle.
In recent years, tremendous advances in plant genomics have dramatically increased the number of genomic tools and technologies available for almost every crop species. This progress has been driven by next-generation sequencing (NGS) technologies and high-throughput (HTP) marker genotyping systems. Cost-effective NGS platforms such as Roche 454 and Illumina have been successfully employed for de novo whole-genome sequencing (WGS), whole-genome resequencing (WGRS), genotyping by sequencing (GBS), restriction-site-associated DNA sequencing (RAD) and NGS-based marker discovery. This progress has provided access to a plethora of genome-wide genetic markers, especially single nucleotide polymorphism (SNP) markers, and to an enormous amount of genomic information.
MAS has been successfully employed for the improvement of monogenic traits and for gene pyramiding programmes with major-effect QTLs, but it is inefficient for quantitative traits (QTs), which are often controlled by many small-effect loci, so a considerable proportion of the genetic variation remains unaccounted for. To overcome these deficiencies of MAS, and to exploit the genome-wide markers now available, a variant of MAS was proposed, variously called whole-genome selection (WGS), genomic selection (GS) or genome-wide selection (GWS).
Genomic selection is an emerging breeding methodology designed to exploit high-throughput, inexpensive DNA marker information to accurately predict the genetic value of breeding candidates for complex traits (Lorenz lab).
Estimated breeding values (EBVs) are calculated from phenotypic records and pedigrees, and from knowledge of the heritability of each trait.
GEBVs are predictions of the genetic merit of an individual based on its genome. They are estimated using the genomic relationship matrix (instead of the pedigree) in combination with the EBVs or phenotypes of individuals.
There is a wide variety of methods for estimating GEBVs; they differ primarily in their assumptions about the genetic architecture of the trait of interest. The GEBV of an individual is the sum of all of its marker effects.
Here p is the number of chromosome segments across the genome, X_i is a design matrix allocating animals to the haplotype effects at segment i, and ĝ_i is the vector of effects of the haplotypes within chromosome segment i. Genomic selection exploits LD: the assumption is that haplotypes or markers within chromosome segments have the same effect across the whole population. Genomic selection avoids bias in the estimation of effects due to multiple testing, because all effects are fitted simultaneously.
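The symbols defined above belong to a prediction equation that is not reproduced in these notes. A plausible reconstruction, assuming the usual form in which the genomic breeding value is the sum of the estimated segment effects (as the notes state in words), is:

```latex
\mathrm{GEBV} = \sum_{i=1}^{p} X_i \hat{g}_i
```

For a single individual this amounts to adding up the estimated effects of the haplotypes or marker alleles it carries at each of the p segments.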
How do we estimate this EBV?
Traditional selection depends strongly on phenotypic observations.
What is the breeding value of this cow for milk production?
How does GS differ from MAS?
GS uses a genome-wide panel of dense markers so that all QTLs are in LD with at least one marker.
MAS concentrates on a small number of QTLs that are tagged by markers with well-verified associations.
Marker-assisted selection strategies increase gain mainly through gain per unit time, rather than gain per cycle (Bernardo and Yu, 2007).
From marker-assisted selection to genomic selection
Markers used for GS must be able to tag all loci that explain some of the phenotypic variation of the trait of interest in the selection population. Genotyping platforms must cover the whole genome sufficiently, which implies that linkage disequilibrium (LD) between neighbouring markers and the accuracy of genotyping must both be high. Lower marker densities suffice when LD is expected to be high, such as in biparental populations, which are common in crop breeding; higher densities are required when LD is low.
Genotyping gives us a ‘snapshot’ of the genetic makeup of an individual: the more SNPs, the clearer the picture.
How? By identifying the markers linked to the trait of interest.
MABC: the goal is to upgrade an established elite genotype with trait(s) controlled by one or a few loci; backcrossing is used to introgress a single gene.
MARC:
Information from a large number of markers can be used to estimate breeding values without precise knowledge of where specific genes are located on the genome.
This new source of information can now (or soon) be used in genetic evaluation by combining genotyping data with traditional pedigree and phenotypic records.
Can the breeder shorten the selection interval?
The 'training population' is genotyped and phenotyped to 'train' the genomic selection (GS) prediction model. In GS, the main role of phenotyping is to estimate marker effects and to support cross-validation. Genotypic information from the breeding material is then fed into the model to calculate genomic estimated breeding values (GEBVs) for these lines. Implementation of GS can be summarized in four steps: (i) designing training populations with complete phenotypic and genotypic data, (ii) estimating marker effects in the training population, (iii) calculating GEBVs of new breeding lines from genotype data, and (iv) selection.
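The four steps above can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: the genotypes, phenotypes, line names and the shrinkage parameter `lam` are hypothetical, markers are coded -1/0/1, and plain ridge regression stands in for the prediction model (it matches RR-BLUP only when `lam` is set to the ratio of residual to marker variance).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_marker_effects(genotypes, phenotypes, lam=1.0):
    """Step (ii): ridge estimates b = (X'X + lam*I)^-1 X'(y - mean)."""
    n_markers = len(genotypes[0])
    mean = sum(phenotypes) / len(phenotypes)
    centred = [y - mean for y in phenotypes]
    xtx = [[sum(row[j] * row[k] for row in genotypes) + (lam if j == k else 0.0)
            for k in range(n_markers)] for j in range(n_markers)]
    xty = [sum(row[j] * y for row, y in zip(genotypes, centred))
           for j in range(n_markers)]
    return mean, solve_linear(xtx, xty)


def gebv(genotype, mean, effects):
    """Step (iii): GEBV = mean + sum of the marker effects carried by the line."""
    return mean + sum(x * b for x, b in zip(genotype, effects))


# Step (i): hypothetical training population with genotypes and phenotypes.
train_geno = [[1, -1, 0], [1, 1, -1], [-1, 0, 1],
              [0, 1, 1], [-1, -1, -1], [1, 0, 1]]
train_pheno = [5.2, 6.1, 4.0, 5.5, 3.1, 6.4]
mean, effects = estimate_marker_effects(train_geno, train_pheno, lam=1.0)

# Step (iii): selection candidates are genotyped only.
candidates = {"line_A": [1, 1, 1], "line_B": [-1, -1, 0], "line_C": [0, 1, -1]}
scores = {name: gebv(g, mean, effects) for name, g in candidates.items()}

# Step (iv): keep the best-ranked candidate.
selected = sorted(scores, key=scores.get, reverse=True)[:1]
print(selected)
```

With real data a numerical library or a dedicated GS package would replace the hand-written solver; the point here is only the flow from training data to marker effects to ranked candidates.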
Validation is not theoretically essential for GS, but in practice it is important to confirm the adequacy of a GS model before moving on to the breeding phase.
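One common way to carry out such a check is to correlate the GEBVs of lines held out of model training with their observed phenotypes. The sketch below assumes that prediction accuracy is measured as this Pearson correlation; the function name and the validation values are hypothetical.

```python
import math


def prediction_accuracy(predicted, observed):
    """Pearson correlation between predicted GEBVs and observed phenotypes."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical validation set: lines that were not used to train the model.
gebvs = [6.5, 3.8, 5.3, 4.9, 5.8]
phenotypes = [6.0, 4.2, 5.9, 4.1, 5.5]
accuracy = prediction_accuracy(gebvs, phenotypes)
print(round(accuracy, 2))
```

In cross-validation the same calculation is repeated over several splits of the training population and the correlations are averaged.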
The choice of marker is therefore very important for accurate estimation of GEBVs. How can we accomplish this in polyploids such as wheat, or in crops with few markers?
A SNP is a biallelic marker.
SNPs are abundant.
SNP markers occur throughout the genome; quoted averages range from one SNP every 1000 bases to one every 100 to 300 bases.
Precise phenotyping for a trait leads to accurate prediction of GEBVs from a GS model.
Genotyping by sequencing (GBS) in any large-genome species requires reduction of genome complexity and an efficient barcoding system.
GBS can be applied to different populations, or even different species, without any prior genomic knowledge, because marker discovery is simultaneous with the genotyping of the population. The use of GBS for GS should therefore be applicable to a range of model and non-model crop species to implement genomics-assisted breeding.
Related methods: restriction-site-associated DNA sequencing (RAD-seq) and low-coverage sequencing for genotyping.
The above methods reduce the proportion of the genome targeted for sequencing so that each marker can be sequenced at high coverage with limited resources, thus enabling markers to be genotyped accurately across many individuals.
Prediction accuracies with GBS were 0.28 to 0.45 for grain yield, an improvement of 0.1 to 0.2 over an established marker platform for wheat. It is unclear why more accurate predictions were observed with GBS than with DArT, even when controlling for marker number. One possibility is that the GBS markers are free of the genotypic ascertainment bias that is found with fixed-array genotyping.
Advantages of GBS: no ascertainment bias; low per-sample cost; polymorphism discovery simultaneous with genotyping, which is very good for wheat, where polyploidy and duplications cause problems with hybridization/PCR assays.
GBS has become an attractive alternative technology for genomic selection. GBS is an NGS approach that reduces genome complexity via restriction enzymes, and GBS technologies have a proven capacity for delivering large numbers of marker genotypes with potentially less ascertainment bias than standard single nucleotide polymorphism (SNP) arrays.
RF, MVN EM
Cornell University, Buckler Group: genotyping by sequencing (GBS), 1 million SNPs, $30/sample; later this year, $10-20/sample plus DNA extraction.
GEBVs more accurate than current EBVs.
The prediction accuracies found in this study are sufficiently high to merit implementation of GS in applied breeding programs. (Jp)
Genomic prediction: basic idea
The choice of statistical method for estimating marker effects can also affect model accuracy. A variety of methods for genomic prediction is currently available. For brevity, we highlight three statistical methods available to train the GS model: ridge regression best linear unbiased prediction (RR-BLUP), BayesA, and BayesB.
(loss of elite breeding lines)
Genomic selection for Fusarium head blight resistance in barley
Genomic selection for winter wheat breeding
Genomic selection for soybean breeding
Genomic selection for maize silage breeding
It is expected that genomic selection will revolutionize breeding in the next decade
Method of selection: where we come from
Selection has played an important role in human-plant co-evolution
Genetic gain (GA) = accuracy of selection × intensity of selection × genetic standard deviation
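The expression above for genetic gain can be turned into a small worked example. The sketch below uses hypothetical values, and the division by cycle length (to express gain per year) is a standard extension that is assumed here rather than stated on the slide.

```python
def genetic_gain_per_cycle(accuracy, intensity, genetic_sd):
    """Gain per cycle = accuracy x selection intensity x genetic standard deviation."""
    return accuracy * intensity * genetic_sd


def genetic_gain_per_year(accuracy, intensity, genetic_sd, cycle_years):
    """Assumed extension: divide the gain per cycle by the cycle length in years."""
    return genetic_gain_per_cycle(accuracy, intensity, genetic_sd) / cycle_years


# Hypothetical comparison: phenotypic selection is more accurate per cycle,
# but a genomic selection cycle is assumed to be much shorter.
per_cycle = genetic_gain_per_cycle(accuracy=0.6, intensity=1.75, genetic_sd=2.0)
phenotypic = genetic_gain_per_year(0.6, 1.75, 2.0, cycle_years=5)
genomic = genetic_gain_per_year(0.4, 1.75, 2.0, cycle_years=1)
print(per_cycle, phenotypic, genomic)
```

Under these assumed numbers the shorter cycle more than compensates for the lower accuracy, which is the argument usually made for selecting on GEBVs.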
Selection in GS is usually based on genomic estimated breeding values (GEBVs).
Selection can take place in the laboratory
Method of selection: where we come from
Limitations:
Traits with low heritability
Traits that are expressed late in an individual's life
Traits that cannot be measured easily (e.g. disease resistance and quality traits)
Time-consuming, and the rate of breeding is slow
Limitations of MAS
“Picking the low-hanging fruit”: the genes with big QTL effects
Major success has been achieved only with qualitative traits
The biparental mapping populations used in most QTL studies do not readily translate to breeding
The term ‘GS’ was first introduced by Haley and Visscher at the 6th World Congress on Genetics Applied to Livestock Production at Armidale, Australia in 1998.
Dr. Theo Meuwissen
GS was first propounded by Meuwissen et al (2001).
Seminal paper: Meuwissen et al (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819-29.
Whole Genome Selection
Genomic selection: an emerging breeding methodology designed to exploit high-throughput, inexpensive DNA marker information to accurately predict the genetic value of breeding candidates for complex traits.
EBV: an estimate of the additive genetic merit for a particular trait that an individual will pass on to its descendants.
GEBV: a prediction of the genetic merit of an individual based on its genome.
Trace all segments of the genome with markers
-Capture all QTL = all genetic variance
Predict genomic breeding values as sum of effects over all
Genomic selection exploits LD.
Genomic selection avoids bias in the estimation of effects due to multiple testing, as all effects are fitted simultaneously.
How to estimate breeding value?
What is the breeding value of this cow for milk production?
Breeding value = h² × (milk production − average)
= 4.35 litres
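The rule above (heritability times the deviation of the animal's own record from the average) can be written as a short function. The inputs behind the 4.35 litres on the slide are not given in these notes, so the numbers below are hypothetical and are not meant to reproduce that figure.

```python
def estimated_breeding_value(heritability, own_record, population_mean):
    """EBV from a single own record: h2 x (record - population mean)."""
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    return heritability * (own_record - population_mean)


# Hypothetical cow: 30 litres/day against a herd average of 20, with h2 = 0.3.
ebv = estimated_breeding_value(heritability=0.3, own_record=30.0,
                               population_mean=20.0)
print(ebv)
```

The lower the heritability, the more the observed superiority is regressed towards zero, which is why low-heritability traits are hard to improve from phenotype alone.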
GS: genome-wide panel of dense markers so that all QTLs are in LD with at least one marker
MAS concentrates on a small number of QTLs that are tagged by markers with well-verified associations
Why is it important to turn to genomic selection now?
Relatively slow progress via phenotypic selection
Large cost of phenotyping
Limited throughput (plot area, time, people)
QTs with small effects
Decreasing cost of genotyping
Promising results from simulation and cross-validation of GS
Meet the challenge of feeding 9.5 billion people by 2050
Prerequisites for the introduction of GS
The need for adequate and affordable genotyping
Relatively simple breeding schemes in which selection of additive genetic effects will generate
How can we do that where crops are concerned?
Training Population (genotypes + phenotypes)
Selection Candidates (genotypes)
Heffner et al (2009)
Biparental vs. multi-family
Biparental:
Reduced number of markers required
Smaller training populations required
Balanced allele frequencies
Best for introgression of exotic
Multi-family:
Allows prediction across a broader range of adapted germplasm
Allows sampling of more E
Cycle duration is reduced because retraining of the model is ongoing
Allows larger training populations
Greater genetic diversity
Cardinal points for success of GS
1. Population type & size of training population
2. Genotyping Platforms & marker densities.
3. Availability of HD genome wide markers.
4. Appropriate statistical methods for accurate
5. Epistasis & G x E.
6. Linkage disequilibrium
7. Long term selection
SNP chip in Genomic selection
Single markers (gene) predict in very small
Abundant in nature. 1kb-2SNP.
Predicting differences in BVs.
Which sequences can we call haplotypes?
Similar haplotypes form a haplotype block, within which there is high LD and
Is GBS a suitable marker platform for genomic selection?
Elshire et al (2010)
GBS accesses regulatory regions and sequence tag mapping.
Flexibility and low cost.
GBS markers led to higher genomic prediction accuracies.
Impute missing data.
Even for a species with a genome as challenging as wheat
(Absence of a reference genome)
Poland et al (2011)
Statistical model used
GBS markers are more uniformly distributed across the genome than the DArT markers
GS Prediction Accuracies
Number and size of QTLs.
LD between marker and QTLs.
Marker density, marker type, and training population
As the number of lines increases, the accuracy of GEBVs increases
GS Prediction Accuracies
Heritability of the trait.
Genetic structure of the trait.
Simulation study results.
Cross-validation; How close is the simulated data to
Genomic selection prediction models
Meuwissen et al (2001) Prediction of total genetic value using genome-wide dense
marker maps. Genetics 157: 1819-29.
Stepwise Regression (SR)
Select the most significant markers on the basis of arbitrary significance thresholds; the effects of non-significant markers are set to zero
(Lande and Thompson, 1990)
Estimate the effects of the significant markers using multiple regression. Since only a portion of the genetic variance will be
Detects only large effects, which causes overestimation of
(Goddard and Hayes, 2007; Beavis, 1998)
SR resulted in low accuracy of GEBVs due to limited detection of
(Meuwissen et al 2001)
Ridge Regression BLUP (RR-BLUP)
Simultaneously estimates all marker effects rather than categorizing markers as significant or as having no effect
Ridge regression shrinks all marker effects towards zero.
The method assumes that markers are random effects with a common variance. (Meuwissen et al 2001)
RR-BLUP incorrectly treats all effects equally which is
(Xu et al 2003)
RR-BLUP is superior to SR
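The contrast between the two approaches above can be illustrated on a handful of marker estimates. This sketch makes two simplifying assumptions that are not in the source: marker columns are treated as orthonormal, so ridge regression reduces to dividing each least-squares estimate by (1 + lam), and a fixed effect-size cut-off stands in for the significance test used in stepwise regression. All values are hypothetical.

```python
def stepwise_like(ols_effects, threshold):
    """Keep effects that pass the cut-off; set all others to exactly zero."""
    return [b if abs(b) >= threshold else 0.0 for b in ols_effects]


def ridge_shrink(ols_effects, lam):
    """Ridge estimates under orthonormal markers: every effect is shrunk
    by the same factor 1 / (1 + lam), and none is discarded."""
    return [b / (1.0 + lam) for b in ols_effects]


# Hypothetical least-squares marker effects: two large, three small.
ols = [2.0, -0.3, 0.1, -1.2, 0.05]

sr = stepwise_like(ols, threshold=1.0)
rr = ridge_shrink(ols, lam=1.0)

print(sr)
print(rr)
```

The thresholded set drops the three small effects entirely, whereas the ridge estimates retain all five at reduced size, which is how RR-BLUP keeps the contribution of small-effect QTLs in the GEBV.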
Bayesian Regression (BR)
Marker variances are treated more realistically by assuming a specified prior distribution.
BayesA: uses an inverted chi-square prior to regress the marker variances towards zero.
All marker effects are nonzero (BayesA)
BayesB: assumes a prior mass at zero, thereby allowing for markers with no effect.
Some marker effects can be = 0 (BayesB)
(Meuwissen et al 2001)
Other potential Genomic selection
i. Least absolute shrinkage and selection operator (LASSO)
ii. Reproducing kernel Hilbert spaces (RKHS) and support vector machine regression. Gianola et al (2006)
iii. Partial least squares regression & principal component
iv. RF (R package random forest)
v. MVN EM algorithm
R-Package for GS
A genome of 1000 cM was simulated with a marker spacing of 1 cM
Modeling epistasis and dominance
Accurate prediction of dominance and epistatic effects fetch
Lorenz et al pointed out that including epistatic effects in prediction models will improve accuracy, on condition that epistasis is present and can be modelled accurately.
Blanc et al (2006) reported that epistasis will contribute to
Empirical studies harnessing data are illuminating for this topic.
GS in relation to strong subpopulation
In GWAS studies, subpopulation structure (SPS) can cause spurious long-distance or unlinked associations between marker alleles and phenotype.
In GS, the concern shifts to maintaining predictive ability despite a structured training data set; spurious association will not be an important cause of loss of predictive ability.
If LD is not consistent across subpopulations, allelic effects estimated in one subpopulation will not be predictive for another subpopulation.
Improving gain in the long term necessarily requires a trade-off with short-term gain.
This trade-off is often explicit, as in quantitative genetic models that maximize immediate predicted gain subject to a constraint on the rate of inbreeding. Meuwissen (1997).
1. Select individuals or groups
2. Analytical prediction, deterministic simulation using
Numerical approaches to optimization, and stochastic
GS has proved its value in animal breeding, particularly dairy cattle
(Hayes and Goddard, 2010)
GS has still to prove its value over generations in crop plants
Simulation studies in plants suggest potential for improved gain per
(Jannink et al 2010)
GS has seldom been implemented in the field
Where to apply GS in the breeding cycle
How many lines to select for genotyping.
Where and how do we place our training population in
comparison to the selection candidates?
How many markers are required, determined by the
extent of LD.
How can we implement non additive effects into our
models to allow predictions across multiple generations?
How do non-additive effects affect the accuracy of
How often to re-estimate the chromosome segment
Outstanding questions that remain
How much gain do we expect when using GS?
How much potential loss can a breeding program absorb?
GS future perspectives
Training population design.
Epistatic modelling in GS.
Strength of different statistical methods.
Managing short & long term gain.
Department of Agronomy & Horticulture
University of Nebraska-Lincoln
Department of Agronomy and Plant Genetics
University of Minnesota
Ongoing projects on GS
[Table of ongoing GS projects; surviving fragments: Univ. of Minnesota; yield and silage; State Plant Breeding]
“Nothing In Science Has Any Value To
Society If It Is Not Communicated”-Anne