Genomic selection (GS) is a method for predicting an individual's genetic merit based on its genome-wide marker data. It allows for selection to take place in the laboratory based on genomic estimated breeding values. Key factors for the success of GS include the size and type of the training population, marker density and type, availability of high-density genome-wide markers, and appropriate statistical prediction models. Ridge regression BLUP and Bayesian regression methods are commonly used prediction models. Future directions for improving GS include determining optimal training population design, modeling non-additive genetic effects, and managing long-term genetic gain.
3. Method of Selection: Where Do We Come From?
Genetic gain / genetic advance (GA):
Selection has played an important role in human-plant co-evolution.
$$\Delta G = \frac{r \times i \times \sigma_A}{L}$$
where r is the accuracy of selection, i the intensity of selection, σ_A the genetic standard deviation and L the generation interval.
Selection in GS is usually based on genomic estimated breeding values (GEBVs).
Selection can take place in the laboratory.
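The genetic gain equation can be evaluated directly. The sketch below computes ΔG per year for two scenarios; every input value (accuracy, intensity, genetic standard deviation, cycle length) is hypothetical and chosen only to show how a shorter generation interval can outweigh a somewhat lower accuracy.

```python
def genetic_gain_per_year(accuracy: float, intensity: float,
                          sigma_a: float, generation_interval: float) -> float:
    """Breeder's equation: delta_G = (r * i * sigma_A) / L, per unit of L."""
    if generation_interval <= 0:
        raise ValueError("generation interval must be positive")
    return accuracy * intensity * sigma_a / generation_interval


# Hypothetical scenarios (values invented for illustration, not from the slides)
# Phenotypic selection: higher accuracy, but a 4-year cycle
pheno = genetic_gain_per_year(accuracy=0.6, intensity=1.4, sigma_a=2.0,
                              generation_interval=4.0)
# Genomic selection: somewhat lower accuracy, but a 1-year cycle
genomic = genetic_gain_per_year(accuracy=0.5, intensity=1.4, sigma_a=2.0,
                                generation_interval=1.0)
print(f"phenotypic: {pheno:.2f} per year, genomic: {genomic:.2f} per year")
# prints: phenotypic: 0.42 per year, genomic: 1.40 per year
```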
4. Method of Selection: Where Do We Come From?
[Figure: diagram linking the breeder, phenotype, genotype, population, markers and QTL/gene to selection and genetic gain]
5. Traditional Selection
Limitations:
Traits with low heritability
Traits that are expressed late in an individual’s life
Traits that cannot be measured easily (e.g. disease resistance and quality traits)
Time consuming, and the rate of breeding progress is slow
7. Limitations of MAS
“Picking the low-hanging fruit”
Targets only the genes with large QTL effects
Major successes have been achieved mainly with qualitative traits
The biparental mapping populations used in most QTL studies do not readily translate to breeding applications
8. The term ‘GS’ was first introduced by Haley and Visscher at the
6th World Congress on Genetics Applied to Livestock Production
at Armidale, Australia in 1998.
Dr. Theo Meuwissen
GS was first proposed by Meuwissen et al (2001) in the seminal paper:
Meuwissen et al (2001) ‘Prediction of total genetic value using genome-wide dense marker maps.’ Genetics 157: 1819-29.
9. Whole Genome Selection
Genomic selection: an emerging breeding methodology designed to exploit high-throughput, inexpensive DNA marker information to accurately predict the genetic value of breeding candidates for complex traits.
EBV: an estimate of the additive genetic merit for a particular trait that an individual will pass on to its descendants.
GEBV: a prediction of the genetic merit of an individual based on its genome.
10. Genomic Selection
Trace all segments of the genome with markers
- Capture all QTL = all genetic variance
Predict genomic breeding values as the sum of effects over all segments
Genomic selection exploits LD.
Genomic selection avoids bias in the estimation of effects due to multiple testing, as all effects are fitted simultaneously.
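Predicting a GEBV as the sum of effects over all segments is a matrix-vector product. This sketch assumes individual SNPs as the segments, genotypes coded 0/1/2 as allele counts, and marker effects that have already been estimated; the genotype matrix and the effects are hypothetical numbers.

```python
import numpy as np

# Hypothetical genotypes coded 0/1/2 (copies of the alternative allele);
# rows are individuals, columns are markers.
X = np.array([
    [0, 1, 2, 1],
    [2, 2, 0, 1],
    [1, 0, 1, 2],
])
# Hypothetical marker effects, e.g. taken from an already trained GS model
marker_effects = np.array([0.5, -0.2, 0.1, 0.3])

# GEBV of individual j = sum over markers k of x_jk * effect_k
gebv = X @ marker_effects
print(gebv)
```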
11. How to Estimate a Breeding Value?
What is the breeding value of this cow for milk production?
Milk records: 0.5 litre, 8 litre, 10 litre, 12 litre (average = 7.625 litres)
Breeding value = h2 × (milk production - average)
= h2 × (12 - 7.625)
= 4.375 × h2 litres
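The calculation above can be checked in a few lines. This sketch uses the four milk records from the slide and an assumed heritability of h2 = 0.3, a hypothetical value since the slide does not give one.

```python
from statistics import mean


def breeding_value(own_record: float, herd_records: list, h2: float) -> float:
    """Single-record estimate: EBV = h2 * (own record - herd average)."""
    return h2 * (own_record - mean(herd_records))


herd = [0.5, 8, 10, 12]                  # milk records in litres, from the slide
deviation = 12 - mean(herd)              # 12 - 7.625 = 4.375 litres
ebv = breeding_value(12, herd, h2=0.3)   # h2 = 0.3 is an assumed value
print(f"deviation = {deviation} L, EBV = {ebv:.2f} L")
```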
12. GS vs. MAS
GS: a genome-wide panel of dense markers, so that all QTLs are in LD with at least one marker.
MAS: concentrates on a small number of QTLs that are tagged by markers with well-verified associations.
14. Why Is Genomic Selection Important Now?
Relatively slow progress via phenotypic selection
Large cost of phenotyping
Limited throughput (plot area, time, people)
Quantitative traits (QTs) controlled by many small-effect genes
Decreasing cost of genotyping
Promising results from simulation and cross-validation of GS
Meeting the challenge of feeding 9.5 billion people by 2050
15. Prerequisites for the Introduction of GS
Adequate and affordable genotyping platforms
Relatively simple breeding schemes in which selection on additive genetic effects will generate useful results
Suitable statistical methods
17. How Can We Do That? (Crops)
Prerequisites:
Training population (genotypes + phenotypes)
Selection candidates (genotypes)
Accurate phenotypes
Inexpensive, high-density genotypes
Heffner et al (2009)
18. Training Population
Biparental vs. Multi-Family
Biparental:
1. Population specific
2. Reduced epistasis
3. Reduced number of markers required
4. Smaller training populations required
5. Balanced allele frequencies
6. Best for introgression of exotic germplasm
Multi-Family:
1. Allows prediction across a broader range of adapted germplasm
2. Allows sampling of more E
3. Cycle duration is reduced because retraining of the model is ongoing
4. Allows larger training populations
5. Greater genetic diversity
20. Cardinal Points for the Success of GS
1. Population type and size of the training population
2. Genotyping platforms and marker densities
3. Availability of high-density (HD) genome-wide markers
4. Appropriate statistical methods for accurate GEBVs
5. Epistasis and G × E
6. Linkage disequilibrium
7. Long-term selection
22. SNP Chips in Genomic Selection
Single markers (genes) each account for only very small differences.
SNPs are abundant in nature (about 2 SNPs per kb).
SNP chips are used for predicting differences in BVs.
23. Which Sequences Can We Call Haplotypes?
A haplotype is the combination of alleles at linked loci carried on one chromosome. Similar haplotypes form haplotype blocks, regions with high LD and little recombination.
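High LD within a haplotype block means that alleles at neighbouring loci are strongly correlated. A common LD measure is r², computed from haplotype and allele frequencies as r² = D² / [p_A(1 - p_A) p_B(1 - p_B)], with D = p_AB - p_A p_B. The sketch below implements that formula; the example frequencies are hypothetical.

```python
def ld_r2(p_AB: float, p_A: float, p_B: float) -> float:
    """LD r^2 between two biallelic loci.

    p_AB: frequency of the haplotype carrying allele A and allele B
    p_A, p_B: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_AB - p_A * p_B                       # coefficient of LD, D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


print(ld_r2(0.5, 0.5, 0.5))    # only AB and ab haplotypes: complete LD, 1.0
print(ld_r2(0.25, 0.5, 0.5))   # p_AB = p_A * p_B: linkage equilibrium, 0.0
```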
24. Is GBS a Suitable Marker Platform for Genomic Selection?
Obviously!
25. GBS
Elshire et al (2011)
GBS accesses regulatory regions and supports sequence tag mapping.
Flexibility and low cost.
GBS markers led to higher genomic prediction accuracies.
Missing data can be imputed.
Highly multiplexed.
Works even for species with a genome as challenging as wheat, and in the absence of a reference genome.
26. Poland et al (2011)
Statistical models used:
i. Random forest (RF)
ii. Multivariate normal expectation-maximization (MVN-EM)
GBS markers are more uniformly distributed across the genome than DArT markers.
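GBS data typically contain many missing calls, which must be filled in before a GS model can be fitted. The sketch below uses per-marker mean imputation, the simplest baseline; it is not the RF or MVN-EM approach named on this slide, and the genotype calls are hypothetical.

```python
import numpy as np


def mean_impute(genotypes: np.ndarray) -> np.ndarray:
    """Replace missing calls (np.nan) with that marker's mean dosage."""
    g = genotypes.astype(float).copy()
    col_means = np.nanmean(g, axis=0)          # per-marker mean, ignoring NaN
    rows, cols = np.where(np.isnan(g))         # positions of missing calls
    g[rows, cols] = col_means[cols]
    return g


# Hypothetical GBS calls coded 0/1/2; np.nan marks a missing call
raw = np.array([
    [0, 2, np.nan],
    [2, np.nan, 1],
    [np.nan, 2, 1],
    [1, 0, 1],
])
imputed = mean_impute(raw)
print(imputed)
```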
27. GS Prediction Accuracies
Number and size of QTLs
LD between markers and QTLs
Marker density, marker type, and training population size
Accuracy of GEBVs increases as the number of lines increases
28. GS Prediction Accuracies
Heritability of the trait
Genetic structure of the trait
Simulation study results
Cross-validation: how close are the simulated results to real data?
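Cross-validation estimates prediction accuracy by hiding part of the phenotyped lines, predicting them from the rest, and correlating predictions with observations. The sketch below runs 5-fold cross-validation with a ridge-regression marker model on simulated data; the data, the penalty value and the choice of ridge regression are assumptions for illustration.

```python
import numpy as np


def ridge_effects(X, y, lam):
    """Ridge marker effects on column-centred genotypes and centred phenotypes."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)


def cv_accuracy(X, y, lam, k=5, seed=0):
    """k-fold CV: mean Pearson correlation between predicted and observed values."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        beta = ridge_effects(X[train], y[train], lam)
        # Centre the test genotypes with the training-set marker means
        pred = (X[test] - X[train].mean(axis=0)) @ beta
        accs.append(np.corrcoef(pred, y[test])[0, 1])
    return float(np.mean(accs))


# Simulated data (assumption): 200 lines, 100 SNPs, additive trait, h2 about 0.5
rng = np.random.default_rng(1)
X = rng.binomial(2, 0.5, size=(200, 100)).astype(float)
g = X @ rng.normal(0, 0.1, 100)
y = g + rng.normal(0, g.std(), 200)
acc = cv_accuracy(X, y, lam=50.0)
print(f"5-fold CV accuracy: {acc:.2f}")
```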
29. Genomic selection prediction models
Meuwissen et al (2001) Prediction of total genetic value using genome-wide dense
marker maps. Genetics 157: 1819-29.
30. Stepwise Regression (SR)
Selects the most significant markers on the basis of arbitrary significance thresholds; non-significant markers are given an effect of zero.
(Lande and Thompson, 1990)
Estimates the effects of the significant markers using multiple regression, so only a portion of the genetic variance is captured.
Limitations:
Detects only large effects and overestimates the significant effects.
(Goddard and Hayes, 2007; Beavis, 1998)
SR resulted in low GEBV accuracy due to limited detection of QTLs.
(Meuwissen et al 2001)
31. Ridge Regression BLUP (RR-BLUP)
Estimates all marker effects simultaneously rather than classifying markers as significant or as having no effect.
Ridge regression shrinks all marker effects towards zero.
The method assumes that marker effects are random effects with equal variance. (Meuwissen et al 2001)
Limitations:
RR-BLUP treats all effects as having equal variance, which is unrealistic.
(Xu et al 2003)
RR-BLUP is superior to SR.
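The shrinkage in RR-BLUP has a closed form. Treating all marker effects as random with a common variance σ²_β gives β̂ = (Z'Z + λI)⁻¹ Z'(y - ȳ) with λ = σ²_e / σ²_β, where Z is the column-centred genotype matrix. This sketch assumes both variances are known (in practice they are estimated, e.g. by REML) and uses simulated data with more markers than lines, a case ordinary least squares cannot handle. A smaller σ²_β (larger λ) shrinks the effects more strongly.

```python
import numpy as np


def rr_blup_effects(X, y, sigma2_e, sigma2_beta):
    """Marker effects under RR-BLUP: all markers random with a common variance.

    Solves (Z'Z + lambda*I) beta = Z'(y - mean(y)), lambda = sigma2_e / sigma2_beta,
    where Z is X centred by its column means (this absorbs the intercept).
    """
    Z = X - X.mean(axis=0)
    lam = sigma2_e / sigma2_beta
    p = X.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ (y - y.mean()))


# Simulated data (assumption): 100 lines, 500 SNPs, so p > n
rng = np.random.default_rng(42)
n, p = 100, 500
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
y = X @ rng.normal(0, 0.05, p) + rng.normal(0, 1, n)

weak = rr_blup_effects(X, y, sigma2_e=1.0, sigma2_beta=0.01)     # lambda = 100
strong = rr_blup_effects(X, y, sigma2_e=1.0, sigma2_beta=0.001)  # lambda = 1000
print(np.linalg.norm(weak), np.linalg.norm(strong))  # stronger penalty, smaller norm
```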
32. Bayesian Regression (BR)
Marker variances are treated more realistically by assuming a specified prior distribution.
BayesA: uses a scaled inverted chi-square prior for each marker variance, which regresses the marker variances towards zero.
All marker effects are non-zero (BayesA).
BayesB: assumes a prior mass at zero, thereby allowing for markers with no effect.
Some marker effects can be = 0 (BayesB).
(Meuwissen et al 2001)
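The difference between the BayesA and BayesB priors shows up when marker effects are drawn from them. This sketch assumes a scaled inverted chi-square prior for each marker's variance and, for the BayesB-type prior, a hypothetical probability of 0.95 that a marker has zero effect. The hyperparameter values are invented, and this only simulates the priors; fitting either model to data requires MCMC, which is not shown.

```python
import numpy as np


def draw_prior_effects(n_markers, nu, s2, pi0=0.0, seed=0):
    """Draw marker effects from a BayesA-type prior; with pi0 > 0, BayesB-type.

    Each marker gets its own variance sigma2_j ~ scaled-inv-chi2(nu, s2),
    drawn as nu * s2 / chi2(nu); then beta_j ~ N(0, sigma2_j).
    With pi0 > 0, roughly a fraction pi0 of markers is set exactly to zero.
    """
    rng = np.random.default_rng(seed)
    sigma2 = nu * s2 / rng.chisquare(nu, size=n_markers)
    beta = rng.normal(0.0, np.sqrt(sigma2))
    if pi0 > 0:
        beta[rng.random(n_markers) < pi0] = 0.0
    return beta


# Hypothetical hyperparameters, for illustration only
bayes_a = draw_prior_effects(1000, nu=4.0, s2=0.01)
bayes_b = draw_prior_effects(1000, nu=4.0, s2=0.01, pi0=0.95)
print("zero effects, BayesA-type:", (bayes_a == 0).mean())
print("zero effects, BayesB-type:", (bayes_b == 0).mean())
```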
33. Other Potential Genomic Selection Prediction Models
i. Least absolute shrinkage and selection operator (LASSO)
ii. Reproducing kernel Hilbert spaces (RKHS) (Gianola et al 2006) and support vector machine regression
iii. Partial least squares regression and principal component regression
iv. Random forest (RF) (R package randomForest)
v. MVN-EM algorithm
R packages for GS: http://www.r-project.org
34. A genome of 1000 cM was simulated with a marker spacing of 1 cM
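With 1 cM spacing, a 1000 cM genome contains 1000 marker intervals, and adjacent markers rarely recombine. Assuming Haldane's mapping function (no crossover interference, an assumption not stated on the slide), the recombination fraction between adjacent markers is r = 0.5(1 - e^(-2d)) with d in Morgans, about 1% here. Low recombination between neighbouring markers and QTLs is what keeps their LD high.

```python
import math


def haldane_r(distance_cM: float) -> float:
    """Haldane mapping function: recombination fraction from map distance."""
    d_morgans = distance_cM / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


genome_cM, spacing_cM = 1000, 1
intervals = genome_cM // spacing_cM      # number of marker intervals
r_adjacent = haldane_r(spacing_cM)       # recombination between adjacent markers
print(intervals, round(r_adjacent, 5))
```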
35. Modeling Epistasis and Dominance
Accurate prediction of dominance and epistatic effects would be advantageous.
Lorenz et al pointed out that including epistatic effects in prediction models will improve accuracy only if:
Epistasis is present and can be modelled accurately.
Blanc et al (2006) reported that epistasis contributes to marker effects.
Empirical studies using real data will be illuminating on this topic.
36. GS in Relation to Strong Subpopulation Structure
In GWAS, strong subpopulation structure (SPS) can cause spurious long-distance or unlinked associations between marker alleles and the phenotype.
In GS, the focus shifts to maintaining predictive ability despite a structured training data set, so spurious associations are not an important cause of lost predictive ability.
Where LD is not consistent across subpopulations, allelic effects estimated in one subpopulation will not be predictive for another.
37. Long-Term Selection
Improving long-term gain necessarily requires a trade-off with short-term gain.
Concern for long-term gain is often made explicit, as in quantitative genetic models that maximize immediate predicted gain subject to a constraint on the rate of inbreeding (Meuwissen, 1997).
Two approaches:
1. Selection of individuals or groups
2. Analytical prediction, deterministic simulation using numerical approaches to optimization, and stochastic simulation
38. GS has proved its value in animal breeding, particularly in dairy cattle.
(Hayes and Goddard, 2010)
It has still to prove its value over generations in crop plants.
Simulation studies in plants suggest potential for improved gain per unit time.
(Jannink et al 2010)
39. Future Directions
GS has seldom been implemented in the field.
Where in the breeding cycle (which generations) should GS be applied?
How many lines should be selected for genotyping?
Where and how should the training population be placed in relation to the selection candidates?
40. Future Directions
How many markers are required? (Determined by the extent of LD.)
How can we incorporate non-additive effects into our models to allow predictions across multiple generations?
How do non-additive effects affect the accuracy of genomic selection?
How often should the chromosome segment effects be re-estimated?
41. Outstanding Questions That Remain Unanswered
How much gain do we expect when using GS?
How much potential loss can a breeding program absorb?
42. GS future perspectives
Training population design.
Epistatic modelling in GS.
Strength of different statistical methods.
Managing short- and long-term gain.
44. Further Interest?
Visit:
Lorenz Lab
Department of Agronomy & Horticulture
University of Nebraska-Lincoln
http://www.lorenzlab.net
Rex Bernardo
Department of Agronomy and Plant Genetics
University of Minnesota
45. Ongoing Projects on GS
Crop | Trait | Markers | Funding agency | Project duration
Tomato | Quality, shape, shelf life | SNP | USDA/AFRI | 2009-2013
Barley | FHB resistance | SNP | Univ. of Minnesota | 2013
Trifolium | Yield | SNP | Danish plant research and for Aarhus University | 2010-2015
Wheat | Winter wheat | Genotyping-by-sequencing | Wheat Breeding Presidential Chair | 2014
Maize | Drought | SNP | CIMMYT | 2014
Maize | Total biomass yield and silage quality | SNP | USDA-AFRI | 2014
Sugar beet | White sugar yield, sugar content | SNP | State Plant Breeding Institute, University of Hohenheim | 2013
Introduction (GS) and method of selection: where do we come from?
Why genomic selection?
Factors contributing to the success of GS
Steps involved in GS
Genomic selection prediction models
GS prediction accuracies
Future directions of GS
Conclusion
Selection has played an important role in human-plant co-evolution. Conditions have changed over time, and resources are constrained. The goal of selection has never changed; the target has. Selection should happen in short intervals of time and with accuracy. If we need to increase yield and productivity, we need to increase genetic gain. Genetic gain (genetic advance) is the increase in performance achieved through a breeding program after each cycle of selection. Selection in plant breeding is usually based on estimates of breeding values obtained with pedigree-based mixed models.
Selection in GS is usually based on genomic estimated breeding values. In the near future, selection can take place in the laboratory.
Limitations: traits with low heritability; traits that are expressed late in an individual’s life; traits that cannot be measured easily (e.g. disease resistance and quality traits); selection is time consuming and the rate of breeding is slow. In addition, the expected increase in inbreeding among elite populations derived from intense prior selection may limit the creation of new genetic combinations for future gain. Intermating source populations for genetic recombination may overcome this problem, but delays line development.
Limitations: MAS captures the effects of major QTLs, but not of minor QTLs.
Prediction of total genetic value using genome-wide dense marker maps. A close look at the livestock breed in which GS has been implemented to its full potential: Holstein–Friesian dairy cattle.
In recent years, tremendous advances in plant genomics have led to a dramatic increase in the number of genomic tools and technologies for almost every crop species. This progress has been driven by next-generation sequencing (NGS) technologies and high-throughput (HTP) marker genotyping systems that have truly revolutionized plant genomics. Cost-effective NGS platforms such as Roche 454 and Illumina have been used successfully for de novo whole-genome sequencing and whole-genome resequencing, GBS, RAD sequencing and NGS-based marker discovery. This progress has provided access to a plethora of genome-wide genetic markers, especially single nucleotide polymorphism (SNP) markers, together with abundant genomic information. MAS has been used successfully to improve monogenic traits and in gene-pyramiding programmes with major-effect QTLs, but it is inefficient for quantitative traits (QTs), which are often controlled by many small-effect genes, so a considerable proportion of genetic variation remains unexplained. To overcome these deficiencies of MAS by exploiting the plethora of genome-wide markers, particularly SNPs, a variant of MAS was proposed, called whole-genome selection (WGS), genomic selection (GS) or genome-wide selection (GWS). Genomic selection is an emerging breeding methodology designed to exploit high-throughput, inexpensive DNA marker information to accurately predict the genetic value of breeding candidates for complex traits (Lorenz lab). Estimated breeding values (EBVs) are calculated from phenotypic records and pedigrees, together with knowledge of the heritability of each trait. GEBVs are predictions of the genetic merit of an individual based on its genome; they are estimated using the genomic relationship matrix (instead of the pedigree) in combination with the EBVs or phenotypes of individuals.
There is a wide variety of methods to estimate GEBVs; they differ mainly in their assumptions about the genetic architecture of the trait of interest. A GEBV is the sum of all marker (chromosome-segment) effects for an individual:
$$\mathrm{GEBV} = \sum_{i=1}^{p} \mathbf{X}_i \hat{\mathbf{g}}_i$$
Here p is the number of chromosome segments across the genome, $\mathbf{X}_i$ is a design matrix allocating animals to the haplotype effects at segment i, and $\hat{\mathbf{g}}_i$ is the vector of estimated effects of the haplotypes within chromosome segment i. Genomic selection exploits LD. The assumption is that the effect of haplotypes or markers within a chromosome segment is the same across the whole population. Genomic selection avoids bias in the estimation of effects due to multiple testing, as all effects are fitted simultaneously.
How do we estimate this EBV? Traditional selection depends strongly on phenotypic observations. What is the breeding value of this cow for milk production?
Well, how is GS different from MAS? GS uses a genome-wide panel of dense markers, so that all QTLs are in LD with at least one marker. MAS concentrates on a small number of QTLs that are tagged by markers with well-verified associations. Marker-assisted selection strategies increase gain mainly through gain per unit time, rather than gain per cycle (Bernardo and Yu, 2007).
From Marker-Assisted Selection to Genomic Selection
Markers used for GS must be able to tag all loci that explain some of the phenotypic variation of the trait of interest in the selection population. Genotyping platforms must cover the whole genome sufficiently, implying that linkage disequilibrium (LD) between neighbouring markers and genotyping accuracy must be high. Higher marker densities are required where LD is expected to be low, as in diverse multi-family germplasm; fewer markers are needed in biparental populations, which are common in crop breeding and show extensive LD.
Genotyping gives us a ‘picture’ or snapshot of the genetic makeup of an individual: the more SNPs, the clearer the picture. How? By identifying the markers linked to the trait of interest.
MABC: the goal is to upgrade an established elite genotype with trait(s) controlled by one or a few loci; backcrossing is used to introgress a single gene.
MARC: information from a large number of markers can be used to estimate breeding values without precise knowledge of where specific genes are located in the genome.
This new source of information can now (or soon) be used in genetic evaluation by combining genotyping data with traditional pedigree and phenotypic records.
Can breeders cut the interval between selection cycles?
The 'training population' is genotyped and phenotyped to 'train' the genomic selection (GS) prediction model. In GS, the main role of phenotyping is to estimate marker effects and to enable cross-validation. Genotypic information from the breeding material is then fed into the model to calculate genomic estimated breeding values (GEBVs) for these lines. Implementation of GS can be summarized in four basic steps: (i) designing training populations with complete phenotypic and genotypic data, (ii) estimating marker effects in the training population, (iii) calculating GEBVs of new breeding lines from their genotype data, and (iv) selection.
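The four steps can be sketched end to end on simulated data. This is a minimal sketch, assuming an additive trait, SNPs coded 0/1/2, ridge regression (RR-BLUP-style with an assumed variance ratio) as the marker model, and selection of the top 10% of candidates; none of these numbers come from the notes. The final correlation with true genetic values is only available because the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
n_train, n_cand, p = 300, 100, 200

# (i) Training population with genotypes and phenotypes (simulated, hypothetical)
true_beta = rng.normal(0, 0.1, p)
X_train = rng.binomial(2, 0.4, size=(n_train, p)).astype(float)
y_train = X_train @ true_beta + rng.normal(0, 1.0, n_train)
X_cand = rng.binomial(2, 0.4, size=(n_cand, p)).astype(float)  # genotypes only

# (ii) Estimate marker effects (ridge; lambda = sigma2_e / sigma2_beta, assumed)
mu = X_train.mean(axis=0)
Z = X_train - mu
lam = 1.0 / 0.01
beta_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(p),
                           Z.T @ (y_train - y_train.mean()))

# (iii) GEBVs of the selection candidates from their genotypes alone
gebv = (X_cand - mu) @ beta_hat

# (iv) Select the top 10% of candidates by GEBV
n_sel = n_cand // 10
selected = np.argsort(gebv)[::-1][:n_sel]

# Simulation only: accuracy = correlation of GEBVs with true genetic values
accuracy = np.corrcoef(gebv, X_cand @ true_beta)[0, 1]
print(f"selected: {sorted(selected.tolist())}, accuracy = {accuracy:.2f}")
```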
Validation is not theoretically essential for a GS, although it is practically important to confirm the adequacy of a GS model before moving onto the breeding phase.
Since the choice of marker is very important for accurately estimating GEBVs, how can we accomplish GS in polyploids such as wheat, or in crops with few available markers?
SNPs are biallelic markers and are abundant throughout the genome: on average one SNP every 1000 bases (1 in 100 to 300 bases by some estimates).
Precise phenotyping of a trait is needed for accurate prediction of GEBVs from a GS model.
Genotyping-by-sequencing (GBS) in any species with a large genome requires reduction of genome complexity and an efficient barcoding system. Because marker discovery is simultaneous with the genotyping of the population, GBS can be applied to different populations or even different species without any prior genomic knowledge. The use of GBS for GS should therefore be applicable to a range of model and non-model crop species to implement genomics-assisted breeding. Related approaches include restriction-site-associated DNA sequencing (RAD-seq) and low-coverage sequencing for genotyping. These methods reduce the proportion of the genome targeted for sequencing so that each marker can be sequenced at high coverage with limited resources, enabling markers to be genotyped accurately across many individuals.
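Complexity reduction in GBS starts with restriction digestion: only sequence next to enzyme sites is sequenced. Elshire et al. used the enzyme ApeKI, which recognises the five-base site GCWGC (W = A or T) and cuts after the first G. The sketch below finds ApeKI sites in a hypothetical sequence and splits it at the cut points. It is an in-silico illustration only; a real protocol also involves adapter ligation, barcoding, PCR and fragment-size effects. Because GCWGC is palindromic, scanning one strand finds every site.

```python
import re

# ApeKI site G^CWGC (W = A or T); the lookahead also catches overlapping sites
APEKI = re.compile(r"(?=GC[AT]GC)")


def cut_positions(seq: str) -> list[int]:
    """0-based start positions of ApeKI recognition sites in seq."""
    return [m.start() for m in APEKI.finditer(seq.upper())]


def fragments(seq: str) -> list[str]:
    """Split seq after the first G of each site, as ApeKI cuts G^CWGC."""
    cuts = [pos + 1 for pos in cut_positions(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


toy = "AAGCAGCTTTTTGCTGCAAAA"   # hypothetical sequence with two sites
print(cut_positions(toy))      # [2, 12]
print(fragments(toy))          # ['AAG', 'CAGCTTTTTG', 'CTGCAAAA']
```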
Prediction accuracies with GBS were 0.28 to 0.45 for grain yield, an improvement of 0.1 to 0.2 over an established marker platform for wheat. It is unclear why GBS gave more accurate predictions than DArT, even when controlling for marker number; one possibility is that GBS markers are free of the genotypic ascertainment bias found with fixed-array genotyping. Advantages of GBS: no ascertainment bias; low per-sample cost; polymorphism discovery simultaneous with genotyping, which is very good for wheat, where polyploidy and duplications cause problems with hybridization and PCR assays. GBS technologies have proven capacity for delivering large numbers of marker genotypes with potentially less ascertainment bias than standard single nucleotide polymorphism (SNP) arrays, so GBS has become an attractive alternative technology for genomic selection. GBS is an NGS approach that reduces genome complexity via restriction enzymes. Models: RF, MVN-EM. Cornell University Buckler Group: genotyping-by-sequencing (GBS), 1 million SNPs, $30/sample; later this year, $10-20/sample + DNA extraction. GEBVs are more accurate than current EBVs. The prediction accuracies found in this study are sufficiently high to merit implementation of GS in applied breeding programs. (Jp)
Genomic prediction: basic idea. The choice of statistical method for estimating marker effects can also affect model accuracy. A variety of methods for genomic prediction is currently available. For brevity, we highlight three statistical methods available to train the GS model: ridge regression best linear unbiased prediction (RR-BLUP), BayesA, and BayesB.
(loss of elite breeding lines)
Genomic selection for Fusarium head blight resistance in barley
Genomic selection for winter wheat breeding
Genomic selection for soybean breeding
Genomic selection for maize silage breeding
Ongoing projects
It is expected that genomic selection will revolutionize breeding in the next decade