Whole Genome Selection

Whole Genome Selection: Theoretical Considerations

Raghavendra N.R

Ph.D Scholar
Plant Breeding & Genetics

Presentation Overview
1. Introduction (GS)
2. Why Genomic Selection?
3. Steps involved in GS
4. Factors contributing to the success of GS
5. Future directions of GS
6. Conclusion

Methods of Selection: Where Do We Come From?
Selection has played an important role in human-plant co-evolution.
Genetic gain (ΔG), also called genetic advance (GA):

ΔG = (accuracy of selection × intensity of selection × genetic standard deviation) / generation interval

Selection in GS is usually based on genomic estimated breeding values (GEBVs).
Selection can take place in the laboratory.
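The genetic gain formula can be evaluated directly. The sketch below is only an illustration: the function name and all input numbers are hypothetical, chosen to show how a shorter generation interval can outweigh a lower selection accuracy.

```python
def genetic_gain_per_year(accuracy, intensity, genetic_sd, generation_interval):
    """Expected genetic gain per unit time: (r * i * sigma_A) / L."""
    if generation_interval <= 0:
        raise ValueError("generation_interval must be positive")
    return accuracy * intensity * genetic_sd / generation_interval


# Hypothetical inputs (not taken from the slides): phenotypic selection is
# assumed more accurate per cycle, genomic selection is assumed to cycle faster.
phenotypic = genetic_gain_per_year(
    accuracy=0.8, intensity=1.4, genetic_sd=2.0, generation_interval=4.0
)
genomic = genetic_gain_per_year(
    accuracy=0.6, intensity=1.4, genetic_sd=2.0, generation_interval=1.0
)

print(f"phenotypic selection: {phenotypic:.2f} trait units per year")
print(f"genomic selection:    {genomic:.2f} trait units per year")
```

With these assumed values the gain per year is higher for the shorter cycle even though accuracy per cycle is lower.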

Methods of Selection: Where Do We Come From? (continued)
[Figure: selection for genetic gain. Labels: breeder, phenotype, genotype, QTL/gene, find population, find markers]

Traditional Selection
Limitations :
Traits with low heritability
Traits that are expressed late in an individual's life
Traits that cannot be measured easily (e.g. disease resistance & quality traits)
Time-consuming, and the rate of breeding progress is slow

Limitations of MAS
"Picking the low-hanging fruit": MAS captures only the genes with large QTL effects.

Major success has been achieved only with qualitative traits.

The biparental mapping populations used in most QTL studies do not readily translate to breeding applications.

The term ‘GS’ was first introduced by Haley and Visscher at the
6th World Congress on Genetics Applied to Livestock Production
at Armidale, Australia in 1998.

Dr. Theo Meuwissen

GS was first proposed by Meuwissen et al (2001) in the seminal paper:
Meuwissen et al (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819-29.

Whole Genome Selection

Genomic selection (GS): an emerging breeding methodology designed to exploit high-throughput, inexpensive DNA marker information to accurately predict the genetic value of breeding candidates for complex traits.

EBV (estimated breeding value): an estimate of the additive genetic merit for a particular trait that an individual will pass on to its descendants.

GEBV (genomic estimated breeding value): a prediction of the genetic merit of an individual based on its genome.

Genomic Selection
Trace all segments of the genome with markers
- Capture all QTLs = all genetic variance

Predict genomic breeding values as the sum of effects over all segments.

Genomic selection exploits LD.
Genomic selection avoids the bias in effect estimates caused by multiple testing, because all effects are fitted simultaneously.
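The idea of predicting a genomic breeding value as a sum of effects over all segments can be sketched as follows. This is a minimal illustration, assuming markers are coded as allele dosages (0, 1 or 2); the marker effects, line names and genotypes are hypothetical.

```python
def gebv(genotype, marker_effects):
    """Sum the marker effects weighted by allele dosage (0, 1 or 2 copies)."""
    if len(genotype) != len(marker_effects):
        raise ValueError("one effect is needed per marker")
    return sum(dosage * effect for dosage, effect in zip(genotype, marker_effects))


# Hypothetical estimated effects for four markers (trait units per allele copy).
marker_effects = [0.5, -0.25, 0.125, 0.0]

# Hypothetical selection candidates with their marker dosages.
candidates = {
    "line_A": [2, 0, 1, 2],
    "line_B": [0, 2, 2, 1],
    "line_C": [1, 1, 0, 0],
}

gebvs = {name: gebv(dosages, marker_effects) for name, dosages in candidates.items()}
ranking = sorted(gebvs, key=gebvs.get, reverse=True)

for name in ranking:
    print(name, gebvs[name])
```

In practice the effects come from a model fitted on the training population; here they are simply given.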

How to Estimate Breeding Value?

[Figure: cattle labelled with milk yields of 10, 0.5, 8, 10 and 12 litres]

What is the breeding value of this cow for milk production?

Breeding value = h² × (milk production − average)
= h² × (12 − 7.625)
= h² × 4.375
≈ 4.35 litres
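The same calculation can be written as a short function. This is a sketch under assumptions: the herd average of 7.625 litres is taken to come from the four yields 0.5, 8, 10 and 12 litres, and because the slide does not state the heritability used, several hypothetical h² values are shown.

```python
def estimated_breeding_value(phenotype, population_mean, heritability):
    """Breeding value estimated from own performance: h2 * (P - mean)."""
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    return heritability * (phenotype - population_mean)


# Assumption: these four yields (litres) give the herd average of 7.625.
herd_yields = [0.5, 8.0, 10.0, 12.0]
herd_mean = sum(herd_yields) / len(herd_yields)
deviation = 12.0 - herd_mean

# Hypothetical heritabilities; the slide does not say which value was used.
ebv_by_h2 = {
    h2: estimated_breeding_value(12.0, herd_mean, h2) for h2 in (0.25, 0.5, 1.0)
}

print(f"herd mean = {herd_mean} litres, deviation = {deviation} litres")
for h2, ebv in ebv_by_h2.items():
    print(f"h2 = {h2}: breeding value = {ebv} litres")
```

The lower the heritability, the more the phenotypic deviation is shrunk towards zero.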

GS
GS uses a genome-wide panel of dense markers so that all QTLs are in LD with at least one marker.

MAS
MAS concentrates on a small number of QTLs that are tagged by markers with well-verified associations.

[Figure labels: 120 cms, 15 cms]

Why Is Genomic Selection Important Now?
Relatively slow progress via phenotypic selection
Large cost of phenotyping
Limited throughput (plot area, time, people)
Quantitative traits (QTs) governed by many small effects
Decreasing cost of genotyping
Promising results from simulation and cross-validation of GS

Meeting the challenge of feeding 9.5 billion people by 2050.

Prerequisites for the Introduction of GS
Adequate and affordable genotyping platforms.
Relatively simple breeding schemes in which selection on additive genetic effects will generate useful results.
Appropriate statistical methods.

How Can We Do That in Crops?

Prerequisites
Training population (genotypes + phenotypes)
Selection candidates (genotypes)
Accurate phenotypes
Inexpensive, high-density genotypes

Heffner et al (2009)

Training Population: Biparental vs. Multi-Family

Biparental
1. Population specific
2. Reduced epistasis
3. Reduced number of markers required
4. Smaller training populations required
5. Balanced allele frequencies
6. Best for introgression of exotic germplasm

Multi-Family
1. Allows prediction across a broader range of adapted germplasm
2. Allows sampling of more environments (E)
3. Cycle duration is reduced because model retraining is ongoing
4. Allows larger training populations
5. Greater genetic diversity

Cardinal points for success of GS
1. Population type & size of training population
2. Genotyping Platforms & marker densities.
3. Availability of HD genome wide markers.
4. Appropriate statistical methods for accurate GEBVs.
5. Epistasis & G x E.
6. Linkage disequilibrium
7. Long term selection

Marker Types & Marker Density
SNP
DArT
GBS

SNP Chips in Genomic Selection
A single marker (gene) predicts only very small differences.
SNPs are abundant in the genome (about 2 SNPs per kb).
Together they are used to predict differences in BVs.

Which sequences can we call haplotypes?

Similar haplotypes form a haplotype block, within which LD is high and recombination is rare.
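How strongly two markers are in LD is commonly summarised by r². The sketch below computes it for two biallelic loci from phased haplotypes; the coding (0/1 alleles) and the two example data sets are hypothetical and only meant to show the two extremes.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes [(a, b), ...] coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical haplotype samples.
complete_ld = [(1, 1)] * 4 + [(0, 0)] * 4                # alleles always travel together
equilibrium = [(1, 1), (1, 0), (0, 1), (0, 0)] * 2       # all combinations equally frequent

print(ld_r_squared(complete_ld))   # 1.0
print(ld_r_squared(equilibrium))   # 0.0
```

Within a haplotype block, pairs of markers would be expected to give values near the first case.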

Is GBS a suitable marker platform for genomic selection?

Yes, clearly!

GBS
Elshire et al (2010)

GBS accesses regulatory regions and allows sequence tag mapping.
Flexible and low cost.
GBS markers led to higher genomic prediction accuracies.
Missing data can be imputed.
Highly multiplexed.

It works even for a species with a genome as challenging as wheat (no reference genome available).

Poland et al (2011)
Statistical models used:
i. RF (random forest)
ii. MVN EM (multivariate normal expectation maximization)

GBS markers are more uniformly distributed across the genome than DArT markers.

GS Prediction Accuracies
Number and size of QTLs.
LD between markers and QTLs.
Marker density, marker type, and training population size.

As the number of lines increases, the accuracy of GEBVs increases.

GS Prediction Accuracies (continued)
Heritability of the trait.
Genetic structure of the trait.
Simulation study results.
Cross-validation: how close are the simulated data to real data?
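Cross-validation of a prediction model can be sketched as below. This is an illustration on simulated data, not a result from any study: the ridge penalty, the number of folds, the population size and the simulated heritability (about 0.5) are all assumptions, and NumPy is assumed to be available. The value reported is the correlation between predicted and observed phenotypes of held-out lines.

```python
import numpy as np


def ridge_effects(X, y, lam):
    """Ridge estimate of marker effects from centred genotypes and phenotypes."""
    n_markers = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_markers), X.T @ y)


def cross_validated_accuracy(X, y, lam=10.0, k=5, seed=0):
    """Correlation between observed phenotypes and k-fold held-out predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    predicted = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        # Centre with training-set means only, so the test fold stays unseen.
        x_mean, y_mean = X[train_idx].mean(axis=0), y[train_idx].mean()
        effects = ridge_effects(X[train_idx] - x_mean, y[train_idx] - y_mean, lam)
        predicted[test_idx] = y_mean + (X[test_idx] - x_mean) @ effects
    return np.corrcoef(predicted, y)[0, 1]


# Simulated (hypothetical) data: 200 lines, 50 markers coded 0/1/2.
rng = np.random.default_rng(1)
n_lines, n_markers = 200, 50
X = rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
true_effects = rng.normal(0.0, 1.0, n_markers)
genetic_value = X @ true_effects
# Noise with the same spread as the genetic values: heritability around 0.5.
y = genetic_value + rng.normal(0.0, genetic_value.std(), n_lines)

accuracy = cross_validated_accuracy(X, y)
print(f"cross-validated correlation: {accuracy:.2f}")
```

Rerunning with fewer lines or a noisier phenotype gives a feel for how training population size and heritability affect the result.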

Genomic selection prediction models

Meuwissen et al (2001) Prediction of total genetic value using genome-wide dense
marker maps. Genetics 157: 1819-29.

Stepwise Regression (SR)
Select the most significant markers on the basis of arbitrary significance thresholds; the effects of non-significant markers are set to zero.
(Lande and Thompson, 1990)
Estimate the effects of the significant markers using multiple regression. As a result, only a portion of the genetic variance is captured.
Limitations :
Detects only large effects, which causes overestimation of the significant effects.
(Goddard and Hayes, 2007; Beavis, 1998)
SR resulted in low GEBV accuracy due to limited detection of QTLs.
(Meuwissen et al 2001)

Ridge Regression BLUP (RR-BLUP)
Simultaneously estimates all marker effects rather than categorizing markers as significant or as having no effect.

Ridge regression shrinks all marker effects towards zero.
The method assumes that markers are random effects with a common variance. (Meuwissen et al 2001)
Limitations :
RR-BLUP incorrectly treats all effects equally, which is unrealistic.
(Xu et al 2003)
RR-BLUP is superior to SR.
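The shrinkage behaviour of ridge regression can be sketched with a few lines of linear algebra. This is a simplified illustration on simulated data, assuming NumPy is available: the penalty lam is fixed by hand here, whereas in RR-BLUP it corresponds to the ratio of residual variance to marker-effect variance, and all sizes and effect values are hypothetical.

```python
import numpy as np


def rr_blup_effects(X, y, lam):
    """Solve (X'X + lam * I) b = X'y on centred genotypes and phenotypes."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)


# Simulated (hypothetical) data with more markers than lines.
rng = np.random.default_rng(42)
n_train, n_candidates, n_markers = 100, 20, 300
train_geno = rng.integers(0, 3, size=(n_train, n_markers)).astype(float)
cand_geno = rng.integers(0, 3, size=(n_candidates, n_markers)).astype(float)
true_effects = rng.normal(0.0, 0.1, n_markers)
train_pheno = train_geno @ true_effects + rng.normal(0.0, 1.0, n_train)

# Every marker gets an effect; a larger penalty shrinks them all harder.
weak_shrinkage = rr_blup_effects(train_geno, train_pheno, lam=1.0)
strong_shrinkage = rr_blup_effects(train_geno, train_pheno, lam=1000.0)
print(np.linalg.norm(weak_shrinkage), np.linalg.norm(strong_shrinkage))

# GEBVs of genotyped-only candidates, then keep the five highest.
gebv = (cand_geno - train_geno.mean(axis=0)) @ strong_shrinkage
selected = np.argsort(gebv)[::-1][:5]
print("selected candidates:", selected)
```

Because the penalty makes the system solvable even with more markers than lines, no marker has to be dropped beforehand, in contrast to stepwise regression.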

Bayesian Regression (BR)
Marker variances are treated more realistically by assuming a specified prior distribution.

BayesA: uses an inverted chi-square prior to regress the marker variances towards zero.
All marker effects are non-zero (BayesA).
BayesB: assumes a prior mass at zero, thereby allowing for markers with no effect.
Some marker effects can be = 0 (BayesB).
(Meuwissen et al 2001)
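The difference between the two priors can be illustrated by drawing marker effects from them. This sketch only samples from the priors; it is not the full BayesA or BayesB estimation procedure, and the degrees of freedom, scale and proportion of zero-effect markers (pi) are hypothetical values.

```python
import random


def scaled_inv_chi2(rng, df, scale):
    """Draw from a scaled inverse chi-square: df * scale / chi-square(df)."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) is Gamma(df/2, scale 2)
    return df * scale / chi2


def draw_effect_bayes_a(rng, df=4.0, scale=0.01):
    """BayesA-style prior: every marker has its own non-zero variance."""
    variance = scaled_inv_chi2(rng, df, scale)
    return rng.gauss(0.0, variance ** 0.5)


def draw_effect_bayes_b(rng, pi=0.9, df=4.0, scale=0.01):
    """BayesB-style prior: with probability pi the marker has no effect at all."""
    if rng.random() < pi:
        return 0.0
    return draw_effect_bayes_a(rng, df, scale)


rng = random.Random(2001)
n = 10_000
effects_a = [draw_effect_bayes_a(rng) for _ in range(n)]
effects_b = [draw_effect_bayes_b(rng) for _ in range(n)]

share_zero_a = sum(e == 0.0 for e in effects_a) / n
share_zero_b = sum(e == 0.0 for e in effects_b) / n
print(f"share of exactly-zero effects: BayesA {share_zero_a:.2f}, BayesB {share_zero_b:.2f}")
```

Under the first prior essentially no effect is exactly zero, while under the second most markers carry no effect and only a minority contribute.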

Other Potential Genomic Selection Prediction Models
i. Least absolute shrinkage and selection operator (LASSO)
ii. Reproducing kernel Hilbert spaces (RKHS) and support vector machine regression. Gianola et al (2006)
iii. Partial least squares regression & principal component regression.
iv. RF (R package random forest)
v. MVN EM algorithm

R-Package for GS

http://www.r-project.org

A genome of 1000 cM was simulated with a marker spacing of 1 cM

Modeling Epistasis and Dominance
Accurate prediction of dominance and epistatic effects would be advantageous.
Lorenz et al pointed out that including epistatic effects in prediction models will improve accuracy on the condition that epistasis is present and can be modelled accurately.
Blanc et al (2006) reported that epistasis contributes to marker effects.

Empirical studies using real data will be illuminating on this topic.

GS in Relation to Strong Subpopulation Structure
In GWAS, subpopulation structure (SPS) can cause spurious long-distance or unlinked associations between marker alleles and the phenotype.

In GS, the emphasis shifts to maintaining predictive ability despite a structured training data set; spurious associations will not be an important cause of loss of predictive ability.

Where LD is not consistent across subpopulations, allelic effects estimated in one subpopulation will not be predictive for another.

Long-Term Selection
Improving gain in the long term necessarily requires a trade-off with short-term gain.

This trade-off is often explicit, as in quantitative genetic models that maximize immediate predicted gain subject to a constraint on the rate of inbreeding. Meuwissen (1997).
Two approaches:
1. Select individuals or groups
2. Analytical prediction, deterministic simulation using numerical approaches to optimization, and stochastic simulation

GS has proved its value in animal breeding, particularly in dairy cattle.
(Hayes and Goddard, 2010)
It has still to prove its value over generations in crop plants.

Simulation studies in plants suggest potential for improved gain per unit time.
(Jannink et al 2010)

Future Directions
GS has seldom been implemented in the field.

Where should GS be applied in the breeding cycle (which generations)?
How many lines should be selected for genotyping?
Where and how do we place our training population in comparison to the selection candidates?

Future Directions (continued)

How many markers are required? This is determined by the extent of LD.
How can we implement non-additive effects in our models to allow predictions across multiple generations?
How do non-additive effects affect the accuracy of genomic selection?
How often should the chromosome segment effects be re-estimated?

Outstanding Questions That Remain Unanswered

How much gain do we expect when using GS?
How much potential loss can a breeding program absorb?

GS future perspectives
Training population design.

Epistatic modelling in GS.
Strength of different statistical methods.
Managing short & long term gain.

Further Interest? Visit:
Lorenz Lab
Department of Agronomy & Horticulture
University of Nebraska-Lincoln
http://www.lorenzlab.net
Rex Bernardo
Department of Agronomy and Plant Genetics
University of Minnesota

Ongoing Projects on GS
Crop | Trait | Markers | Funding agency | Project duration
Tomato | Quality, shape, shelf life | SNP | USDA/AFRI | 2009-2013
Barley | FHB resistance | SNP | Univ. of Minnesota | 2013
Trifolium | Yield | SNP | Danish plant research and for Aarhus University | 2010-2015
Wheat | Winter wheat | Genotyping-by-sequencing | Wheat Breeding Presidential Chair | 2014
Maize | Drought | SNP | CIMMYT | 2014
Maize | Total biomass yield and silage quality | SNP | USDA-AFRI | 2014
Sugar beet | White sugar yield, sugar content | SNP | State Plant Breeding Institute, University of Hohenheim | 2013

Conclusion

“Nothing in science has any value to society if it is not communicated.” - Anne Roe
