1. Genomic selection on rice
Early generation selection in a recurrent
selection breeding program within a
synthetic population
Since 1967 / Science to cultivate change
Cécile Grenier
Tuong-Vi Cao
2. Genomic selection
• Decreased genotyping costs and
new statistical methods enable
simultaneous estimation of all
marker effects!
• GS – a new form of MAS that
estimates all marker effects across
the whole genome to calculate
genomic estimated breeding values
(GEBVs)
• Markers are not tested for
significance – all markers are used
in selection
3. Genomic selection on Rice
Can genomic selection (GS) be applied to a rice synthetic population (SP)
managed through recurrent selection (RS)?
Can GS be adapted to Recurrent Genomic Selection (RGS)?
Theme 2 – Varietal Development
4. The breeding scheme
Recombination
Candidate
units
Evaluation
Phenotype in target
environments
Synthetic Population
3000 S0 plants
Varieties
Selected
units
Evaluations
5. The SP derived training and
breeding population
Fixation through
SSD for ~350 lines
Synthetic Population
3000 S0 plants
343 S2:4 and S3:5 families
Extraction of
400 S0 plants
Recombination
(35 plants)
Training Population
Breeding Population
6. Testing the Feasibility of Genomic Selection through Cross-Validations
Phenotypes (Y) Genotypes (X)
343 families
(from a SP with 10 cycles of recombination)
Whole Genome Regression Model
7. GBS technology
6,874 SNP with MAF ≥ 2.5% (1 marker every ~ 57 kb)
4,098 SNP with MAF ≥ 10.0% (1 marker every ~ 95 kb)
LD decay curve for chromosome 1 and MAF ≥ 10%
For ½ initial r², the average extent of LD is ~ 0.639 Mb, i.e. at least 610
markers are required to cover the whole genome
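The marker-count estimate above can be reproduced with a short calculation. This is only a sketch: it assumes a genome length of about 390 Mb, a value that is not stated on the slide but is implied by its marker densities (6,874 × 57 kb ≈ 392 Mb; 4,098 × 95 kb ≈ 389 Mb). The function name `min_markers` is illustrative.

```python
def min_markers(genome_mb: float, ld_extent_mb: float) -> int:
    """Number of evenly spaced markers needed so that consecutive markers
    are no further apart than the average extent of LD."""
    return int(genome_mb / ld_extent_mb)

GENOME_MB = 390.0      # assumption: inferred from the slide's marker densities
LD_EXTENT_MB = 0.639   # from the slide: distance at which r² falls to half its initial value

n_min = min_markers(GENOME_MB, LD_EXTENT_MB)
print(n_min)
```

Under this assumption, both marker sets on the slide (6,874 and 4,098 SNP) exceed the minimum by a wide margin.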
8. The genetic material
Heatmap (G matrix of 343 individuals with 6874 SNP)
Un-rooted Neighbor Joining (dissimilarity matrix among 343 individuals with 6874 SNP)
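A G matrix like the one shown in the heatmap could be computed along the following lines. The slides do not say which formula was used, so this sketch assumes VanRaden's first method with genotypes coded as allele counts (0, 1, 2); the genotype matrix is simulated and `vanraden_g` is an illustrative name.

```python
import numpy as np

def vanraden_g(M: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix from an (individuals x SNP) matrix
    of allele counts coded 0, 1, 2 (assumed: VanRaden's first method)."""
    p = M.mean(axis=0) / 2.0               # allele frequency of each SNP
    Z = M - 2.0 * p                        # centre each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))    # expected variance summed over SNP
    return Z @ Z.T / scale

rng = np.random.default_rng(0)
M = rng.integers(0, 3, size=(20, 500)).astype(float)  # simulated genotypes, not the real data
G = vanraden_g(M)
print(G.shape)
```

With the real data, `M` would have 343 rows and 6,874 columns, giving a 343 × 343 matrix.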
9. The genetic material
Evaluation of the 343 families (301 S2:4 and 42 S3:5) under a Lattice Design with 2 repetitions
Panicle weight (h2=0.19)
Grain yield (h2=0.30)
Flowering date (h2=0.86)
Plant height (h2=0.61)
10. Testing the Feasibility of Genomic Selection through Cross-Validations
Phenotypes (Y) Genotypes (X)
343 families
(from a SP with 10 cycles of recombination)
Whole Genome Regression Model
k-folds cross-validation:
100 samplings of Training Population (TP) and Validation Population (VP)
100 correlations cor(ŷ, y) between predicted and observed values in the VP
Mean of correlations: ‘Predictive ability of genomic selection’
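The cross-validation procedure above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a ridge regression on marker genotypes with a fixed penalty `lam`, uses simulated data, and the names `ridge_fit` and `predictive_ability` are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimates of marker effects on centred data."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    # Dual form: solve an n x n system, cheaper when SNP >> individuals
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(len(y)), y - mu_y)
    return mu_x, mu_y, Xc.T @ alpha

def predictive_ability(X, y, k=3, n_rep=100, lam=1.0, seed=0):
    """Mean cor(y_hat, y) in the validation set over n_rep random splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    cors = []
    for _ in range(n_rep):
        idx = rng.permutation(n)
        tst = idx[: n // k]      # validation population (VP): 1/k of the individuals
        trn = idx[n // k:]       # training population (TP): the rest
        mu_x, mu_y, beta = ridge_fit(X[trn], y[trn], lam)
        y_hat = mu_y + (X[tst] - mu_x) @ beta
        cors.append(np.corrcoef(y_hat, y[tst])[0, 1])
    return float(np.mean(cors))

# Simulated example (assumption: 20 causal SNP out of 300)
rng = np.random.default_rng(42)
n, p = 120, 300
X = rng.integers(0, 3, size=(n, p)).astype(float)
beta_true = np.zeros(p)
beta_true[:20] = rng.normal(size=20)
y = X @ beta_true + rng.normal(scale=2.0, size=n)

acc = predictive_ability(X, y, k=3, n_rep=100, lam=50.0)
print(round(acc, 3))
```

Each replicate draws a new random TP/VP split, so the mean over 100 correlations corresponds to the "predictive ability" defined on the slide.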
11. GS in Rice synthetic populations
Regression models: G-BLUP, Ridge Regression, Bayesian LASSO, Bayesian RR
Incidence matrix: choice of SNP markers based on LD and MAF
Limit for r² | MAF (%) | No. SNP
r² <= 0.75 | 2.5 | 1758
r² <= 0.75 | 5 | 1158
r² <= 0.75 | 10 | 678
r² <= 0.90 | 2.5 | 4314
r² <= 0.90 | 5 | 3268
r² <= 0.90 | 10 | 2152
r² <= 1.00 | 2.5 | 6874
r² <= 1.00 | 5 | 5605
r² <= 1.00 | 10 | 4098
k-folds cross-validation: fraction 1/k of the population (n=343) used for validation
k | No. ind. [tst]
3 | 114
6 | 57
9 | 38
Traits
FD (Flowering date) {h2 = 0.86}
PH (Plant height) {h2 = 0.61}
PW (Panicle weight) {h2 = 0.19}
GY (Grain yield) {h2 = 0.30}
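Building an incidence matrix from MAF and LD thresholds might look like the sketch below. The slides do not describe the pruning algorithm, so this assumes a simple greedy rule that compares each candidate SNP with all SNP already kept (real pipelines usually restrict the comparison to a sliding window). The data are simulated and `select_snp` is an illustrative name.

```python
import numpy as np

def select_snp(M, maf_min=0.025, r2_max=0.75):
    """Return the column indices kept after a MAF filter and greedy LD pruning.
    M: (individuals x SNP) matrix of allele counts coded 0, 1, 2."""
    p = M.mean(axis=0) / 2.0
    maf = np.minimum(p, 1.0 - p)
    candidates = np.flatnonzero(maf >= maf_min)
    kept = []
    for j in candidates:
        in_ld = False
        for i in kept:
            r = np.corrcoef(M[:, i], M[:, j])[0, 1]
            if r * r > r2_max:       # too strongly correlated with a kept SNP
                in_ld = True
                break
        if not in_ld:
            kept.append(j)
    return np.array(kept, dtype=int)

# Simulated genotypes: 30 independent SNP plus exact copies of the first 5
rng = np.random.default_rng(3)
base = rng.integers(0, 3, size=(50, 30)).astype(float)
M = np.hstack([base, base[:, :5]])

kept = select_snp(M, maf_min=0.025, r2_max=0.75)
print(len(kept))
```

Repeating the call over the three MAF thresholds and three r² limits in the table would give the 9 X matrices used in the later slides.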
12. Statistical models for GS
Penalized regressions
– Parametric linear regression models (frequentist and Bayesian)
• Ridge Regression (RR), Best Linear Unbiased Predictors (BLUP), Least Absolute Shrinkage and
Selection Operator (LASSO): G-BLUP, RR-BLUP, LASSO, Bayesian RR, Bayesian LASSO…
– Non-parametric nonlinear models
• RKHS, NN, RBFNN
Criteria | rrBLUP | B-RR | B-LASSO
Variable selection | No | No | Yes
Marker effects | All markers with same effect | All markers have an effect | Some markers have null effect
Parameter shrinkage of estimated effects | Same extent of shrinkage | Same extent of shrinkage | Marker-specific shrinkage
Hyper-parameters | No | σβ² | σ², λ²
Distribution of effects | Gaussian | Gaussian | Double exponential
Best for… | Trait controlled by many loci w. small effects | Trait controlled by many loci w. small effects | Trait controlled by few loci varying in effect size
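The contrast between "same extent of shrinkage" and "marker-specific shrinkage" can be illustrated with the frequentist counterparts of the two priors. This sketch assumes an orthonormal marker matrix (an assumption made only so that closed forms exist) and shows the posterior-mode behaviour, not the full Bayesian samplers used in the slides; function names are illustrative.

```python
import numpy as np

def ridge_shrink(b_ols, lam):
    """Gaussian prior (ridge): minimises 0.5*||y - Xb||^2 + 0.5*lam*||b||^2.
    With orthonormal X, every effect is scaled by the same factor."""
    return b_ols / (1.0 + lam)

def lasso_shrink(b_ols, lam):
    """Double-exponential prior (LASSO): minimises 0.5*||y - Xb||^2 + lam*||b||_1.
    With orthonormal X this is soft-thresholding: small effects become zero."""
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

b_ols = np.array([-2.0, -0.3, 0.1, 0.5, 3.0])   # hypothetical least-squares effects
b_ridge = ridge_shrink(b_ols, lam=1.0)
b_lasso = lasso_shrink(b_ols, lam=0.5)
print(b_ridge)
print(b_lasso)
```

Ridge halves every effect here, whereas the LASSO removes the three small effects and leaves the two large ones nearly intact, which is why it suits traits controlled by few loci.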
13. Regression model and marker effects
Bayesian LASSO (MAF2.5 - r2≤0.75) with 9-fold CV -- Grain yield
cor(ŷ[tst], y[tst]) = 0.25
cor(ŷ, y) = 0.84
Marker Effects
14. Accuracy is a function of trait genetic architecture,
heritability and, for FD, of the choice of markers
Bayesian LASSO (9 X matrices) with 9-fold CV
[Figure: four panels, one per trait: Grain yield {h2 = 0.30}, Panicle weight {h2 = 0.19}, Flowering date {h2 = 0.86}, Plant height {h2 = 0.61}. x-axis: No. SNP (0 to 8000); y-axis: Accuracy (cor(ŷ, y)) (0.000 to 0.600)]
15. Selection of markers (predictors) based on LD
improved the accuracy for oligogenic traits
Flowering date (9 X matrices, 3-fold CV, Ridge Regression)
[Figure: accuracy versus number of SNP. x-axis: No. SNP (0 to 7000); y-axis: Accuracy (cor(ŷ, y)) (-0.10 to 0.50); series: r² <= 0.75, r² <= 0.80, r² <= 0.90, r² <= 1.00; point labels: 2.5%, 5.0%, 7.5%, 10%]
16. Slight superiority of the Bayesian Statistics
Grain Yield (9 X matrices with 9-fold CV)
[Figure: accuracy versus number of SNP for four models. x-axis: No. SNP (0 to 8000); y-axis: Accuracy (cor(ŷ, y)) (0.000 to 0.350); series: BL, BRR, GBLUP, RR]
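When comparing these four models, it may help to recall that ridge regression on markers and G-BLUP are two parameterizations of the same predictor when the relationship matrix is built from the same centred markers and the penalty is matched. The sketch below checks this numerically on simulated data; it uses an unscaled G = XX' and an arbitrary penalty `lam`, both assumptions for illustration, and says nothing about how the models on the slide were fitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 30, 200, 5.0
X = rng.integers(0, 3, size=(n, p)).astype(float)   # simulated genotypes
X -= X.mean(axis=0)                                  # centre each SNP
y = rng.normal(size=n)
y -= y.mean()

# Ridge regression: estimate p marker effects, then sum them per individual
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
gebv_rr = X @ beta

# G-BLUP form: work directly with the n x n relationship kernel
K = X @ X.T
gebv_gblup = K @ np.linalg.solve(K + lam * np.eye(n), y)

max_diff = float(np.max(np.abs(gebv_rr - gebv_gblup)))
print(max_diff < 1e-8)
```

In practice the two can still differ slightly if the variance components are estimated separately or G is scaled differently.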
17. The RS breeding scheme
Recombination
Candidate
units
Evaluation
Phenotype in target
environments
Synthetic Population
3000 S0 plants
Varieties
Selected
units
Evaluations
18. The RGS breeding scheme
Recombination
Candidate
units
Whole
Genome
Genotyping
Breeding Population
3000 S0 plants
Promising lines
Selected
units
GEBVs
Genomic
prediction
MET Evaluations
GS models
Evaluations
Training Population
New varieties
19. Conclusions
• Yes, GS on rice synthetic population is feasible!
• Accuracies were modest, but the data come from 1 site and 1 year, and this is
a promising first result
• Even a low accuracy may be worthwhile given the cost of field evaluation,
the time gained by selecting during the off-season, and the possibility of applying
a stronger selection intensity
• Soon to come:
– More data, more sites, and more adequate statistics (experimental
design and multi-site evaluations accounted for in the model) for
nonparametric nonlinear models
– GS on the breeding population, using the entire training population
to develop the genomic prediction model
– Maximizing the benefit of GS on an earlier generation of the RS
scheme (S0 generation)
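The argument in the conclusions, that a low accuracy can be offset by shorter cycles and a stronger selection intensity, can be made concrete with the breeder's equation expressed per unit time, R = i · r · σA / L. Every number in this sketch is hypothetical and chosen only to illustrate the trade-off; none comes from the study.

```python
def gain_per_year(intensity, accuracy, sigma_a, years_per_cycle):
    """Expected genetic gain per year: R = i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / years_per_cycle

# Hypothetical phenotypic selection: higher accuracy, 2-year cycle,
# about 20% of candidates selected (i ~ 1.40)
ps = gain_per_year(intensity=1.40, accuracy=0.55, sigma_a=1.0, years_per_cycle=2.0)

# Hypothetical genomic selection: lower accuracy, 1-year cycle thanks to
# off-season selection, about 10% of candidates selected (i ~ 1.76)
gs = gain_per_year(intensity=1.76, accuracy=0.25, sigma_a=1.0, years_per_cycle=1.0)

print(ps, gs)
```

Under these assumed values the genomic scheme gives the larger gain per year despite less than half the accuracy; with other cycle lengths or intensities the ranking could reverse.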