Genomic selection in Livestock
Raphael Mrode, ILRI
Essential Knowledge for Effective Improvement and Dissemination of
Genetics in Sheep and Goats
3 – 5 November 2020
Addis Ababa , Ethiopia
2
The basic goal: genetic progress and
increased productivity
• Identifying animals with the best genetic merit as parents of the next generation → genetic improvement
[Figure: distribution of phenotypes in the parental generation, the animals selected to be parents, and the distribution of offspring phenotypes, with markers P_P, P_S and P_O; the gap between the distributions is labelled "Genetic improvement".]
3
The basic goal: genetic progress and
increased productivity
• To achieve this goal we need accurate estimated breeding values (EBVs)
• However, what is available are phenotypes (Y), which are influenced by genetic and environmental effects
• Y = Genetic (G) + Environment (E)
• Thus Var(Y) = Var(G) + Var(E), assuming G and E are uncorrelated
• Sources of Var(E) could be environmental and management factors
• Var(G) arises from different forms of inheritance, leading to different components of genetic variance (e.g. additive genetic variance, additive maternal genetic variance and so on)
• Accurate estimation of G (EBVs) is our challenge
4
Reality of Field data
• Often we deal with field data → a variety of environmental factors, animals with different degrees of relatedness, many generations and unbalanced data
• We need a framework to model phenotypic observations that accounts for non-genetic systematic sources of variation (Var(E)), so that EBVs are estimated accurately
• The linear mixed model provides such a framework
5
Example data and pedigree files
• Data file Pedigree file
6
Linear mixed model
In matrix notation, a mixed linear model may be represented as
y = Xb + Za + e
where
y = n x 1 vector of observations; n = number of records.
b = p x 1 vector of fixed effects; p = number of levels for fixed (lactation
number) effects
a = q x 1 vector of random animal effects; q = number of levels for
random effects
e = n x 1 vector of random residual effects
X = design matrix of order n x p, that relates records to fixed effects
Z = design matrix of order n x q, that relates records to random animal
effects
X and Z are both termed design or incidence matrices.
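As a minimal sketch of how the two incidence matrices are filled in, the snippet below builds X and Z for a small data set. The animal IDs, lactation numbers and records are hypothetical values chosen for illustration; they are not the contents of the example data and pedigree files.

```python
import numpy as np

# Hypothetical records: (animal id, lactation number, observation).
records = [(4, 1, 4.5), (5, 2, 2.9), (6, 2, 3.9), (7, 1, 3.5), (4, 2, 5.0)]
animals = [1, 2, 3, 4, 5, 6, 7]   # all animals in the pedigree, q = 7
lactations = [1, 2]               # levels of the fixed effect, p = 2

n = len(records)                  # number of records
y = np.array([rec[2] for rec in records])

X = np.zeros((n, len(lactations)))  # n x p, relates records to fixed effects
Z = np.zeros((n, len(animals)))     # n x q, relates records to animal effects
for row, (animal, lactation, _) in enumerate(records):
    X[row, lactations.index(lactation)] = 1.0
    Z[row, animals.index(animal)] = 1.0

print(X.shape, Z.shape)
```

Each row of X and of Z contains a single 1, and the columns of Z for animals without records (here animals 1 to 3) stay at zero; such animals still receive a solution through the relationship matrix.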
7
Assumptions of the linear mixed model
• The residual effects are assumed to be independently distributed with variance $\sigma^2_e$; therefore $\mathrm{var}(e) = I\sigma^2_e = R$.
• $\mathrm{var}(a) = G = I\sigma^2_a$ or $A\sigma^2_a$, where $A$ is the numerator relationship matrix. Thus
\[
\mathrm{Var}\begin{bmatrix} a \\ e \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}
\]
8
MME for breeding values
• The mixed model equations (MME) with the relationship matrix incorporated are
\[
\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A^{-1}\alpha \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y \end{bmatrix}
\]
• with $\alpha = \sigma^2_e/\sigma^2_a = (1-h^2)/h^2$
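The MME can be set up and solved directly for a small case. The sketch below is an illustration under stated assumptions: the four-animal pedigree (two unrelated founders and their two full-sib offspring), the three records and the heritability of 0.25 are all hypothetical, and the helper `numerator_relationship` is a plain implementation of the tabular method for A written for this example.

```python
import numpy as np

def numerator_relationship(sires, dams):
    """Tabular method for A. Animals are numbered 1..q in birth order;
    an unknown parent is coded 0."""
    q = len(sires)
    A = np.zeros((q, q))
    for i in range(q):
        s, d = sires[i] - 1, dams[i] - 1          # -1 means unknown parent
        both_known = s >= 0 and d >= 0
        A[i, i] = 1.0 + (0.5 * A[s, d] if both_known else 0.0)
        for j in range(i):
            rel = 0.0
            if s >= 0:
                rel += 0.5 * A[j, s]
            if d >= 0:
                rel += 0.5 * A[j, d]
            A[i, j] = A[j, i] = rel
    return A

# Hypothetical pedigree: animals 3 and 4 are full sibs out of 1 x 2.
sires = [0, 0, 1, 1]
dams = [0, 0, 2, 2]
A = numerator_relationship(sires, dams)

y = np.array([10.0, 12.0, 9.0])       # hypothetical records of animals 2, 3, 4
X = np.ones((3, 1))                   # overall mean as the only fixed effect
Z = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
h2 = 0.25                             # hypothetical heritability
alpha = (1 - h2) / h2                 # = sigma2_e / sigma2_a

lhs = np.block([[X.T @ X, X.T @ Z],
                [Z.T @ X, Z.T @ Z + np.linalg.inv(A) * alpha]])
rhs = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(lhs, rhs)
b_hat, a_hat = sol[:1], sol[1:]
print(b_hat, a_hat)
```

Animal 1 has no record, yet it gets a breeding value because A links it to its offspring. For real pedigrees A⁻¹ is normally built directly rather than by inverting A; the explicit inverse is used here only to keep the sketch short.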
9
Limitations of A matrix
• The relationship matrix A based on pedigree is an average relationship which assumes an infinite number of loci.
• Real relationships are a bit different because the genome is of finite size
• Therefore A is the expectation of the realized relationships
• Two half-sibs, with an expected relationship of 0.25, might have a realized relationship of 0.3 or 0.2
10
Use of microsatellites as markers and
limitations
• Microsatellites were the genetic markers initially used, in the 1980s and 1990s.
• Microsatellites are sets of short repeated DNA sequences at a particular locus on a chromosome, which vary in number between individuals and so can be used as markers
• The most significant marker can be 10 cM or more from the QTL, so QTL are not mapped precisely.
• The association between marker and QTL may not persist across the population.
• The phase between marker and QTL may have to be estimated for each family
11
Single Nucleotide Polymorphism (SNP).
• A SNP is a DNA sequence variation occurring when a
single nucleotide (A, T, C or G) in the genome
differs between paired chromosomes in an individual.
• For example, two sequenced DNA fragments from an
individual, AAGCCTA and AAGCTTA, differ in
a single nucleotide.
• In this case we say that there are two alleles: C and T.
Almost all common SNPs have only two alleles.
12
SNP
Whole Genome Sequence & Genotyping chips
 Began with Human (2001) and mice, WGS of the chicken (2004), the dog
(2005), bovine (2006), horse (2007), pig (2009), ...
 New technologies for genotyping and sequencing
 Simultaneous genotyping of many SNP
 From few dozens up to several million SNP
 Two main technology providers, Illumina and Affymetrix
 Illumina products in cattle
 3000 (7000 1000 20000=« LD »)
 54 000=« 50k »
 777 000=« HD »
14
Genomic Selection (GS)
• GS is the use of genomic estimated breeding values (GEBVs) for the selection of animals.
• Genomic selection requires that markers (SNPs) are in linkage disequilibrium (LD) with the QTL across the whole population
• Thus the use of SNPs as markers enables all QTL in the genome to be traced, by tracing the chromosome segments defined by adjacent SNPs.
15
Steps in Genomic Selection (GS)
• Genotype animals with phenotypes (for sex-limited traits, sires with daughter records)
• Estimate SNP solutions (the SNP key) in the reference population
• Validate in another data set, with its records excluded, to determine the accuracy of the SNP key
• Genotype animals at birth or at a young age (no phenotypes), use the SNP key to predict their GEBVs and select on them
The three groups of animals:
• Reference population: genotyped and phenotyped animals
• Validation candidates: genotyped and phenotyped, but phenotypes excluded
• Selection candidates: genotyped but no phenotypes
16
Main advantages of Genomics
 Young bulls can be genotyped early in life and breeding values
computed
 Can be used to select young bulls to be progeny tested, thereby
reducing cost
 Higher accuracy of about 20-40% for young bulls above parent
average
 Reduction in generation interval
17
Genomic Selection: efficiency
• Two main factors:
  – Accuracy of SNP effect estimation
    · size of reference population
    · heritability of the trait
    · statistical methodology used
  – Linkage disequilibrium (LD) between markers and QTL
    · marker density
    · effective size of the population => number of « independent » segments
• Relationship between candidates and reference population
18
Size of the Reference population
• Greatly influences the accuracy of genomic evaluations (Goddard, 2008)
Validation test
[Figure: two scatter plots, each with both axes running from 20 to 45. Training set: DYD and estimated BV. Validation set: DYD (proxy of true BV) and GEBV. Annotation: overestimation = "inflation".]
Two important parameters:
• R² or r(DYD, GEBV) (should be « large enough »)
• slope of the regression (should be close to 1)
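The two validation parameters are simple to compute once DYD and GEBV are available for the validation animals. The sketch below uses the six validation bulls of Example 1 in this deck (their fat DYD and the DGV reported for them); with only six animals the numbers are purely illustrative, and treating the DGV as the GEBV is an assumption made for this example.

```python
import numpy as np

# Validation animals 21 to 26 of Example 1: fat DYD and predicted DGV.
dyd = np.array([7.6, 8.8, 9.8, 9.2, 11.5, 13.3])
gebv = np.array([0.027, 0.114, -0.240, 0.143, 0.054, 0.354])

# Correlation between the proxy of the true BV (DYD) and the prediction.
r = np.corrcoef(dyd, gebv)[0, 1]

# Regression of DYD on GEBV; a slope well below 1 indicates that the
# predictions are too spread out ("inflation").
slope, intercept = np.polyfit(gebv, dyd, 1)

print(f"r = {r:.3f}, R2 = {r ** 2:.3f}, slope = {slope:.2f}")
```

The DGV used here exclude the overall mean, which shifts the intercept but leaves both the correlation and the slope unchanged.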
20
Increasing the size of the Reference
populations
• Genotype as many progeny-tested sires as possible
• International collaborations
• Holstein: 2 big consortia
  – USA + Canada ~>35000 bulls + UK + Italy
  – Eurogenomics: France, The Netherlands, (Germany), Nordic countries, Spain, Poland ~34000 bulls (?)
• For small breeds or other species (goats, sheep, beef cattle): not enough sires
  – combine with many genotyped cows
  – About 4-5 cow records provide information equivalent to one proven sire (Goddard (2009) and Daetwyler et al. (2013))
21
General linear model
The general linear model underlying genomic evaluation is of the form
\[
y = Xb + \sum_{i=1}^{m} M_i g_i + e
\]
where m is the number of SNPs, y is the data vector, b is the vector of the mean or fixed effects, $g_i$ is the genetic effect of the ith SNP genotype and e is the error.
The matrix M has dimension n (number of animals) by m, and $M_i$ relates the ith SNP to the data.
It is assumed that all the additive genetic variance is explained by the marker effects, so that the estimate of an animal's total genetic merit or breeding value (a) is
\[
a = \sum_{i=1}^{m} M_i g_i
\]
22
Data types used for genomic evaluation
• y = YD (yield deviation) = individual record corrected for
all fixed and non-genetic random effects
• y = DYD (daughter yield deviation) = twice the average, for
a bull, of the YD of all his daughters, corrected for ½ the genetic
merit of their dams (with associated weight = EDC,
the equivalent daughter contribution)
• y = de-regressed proofs, obtained by solving the MME
to get the right-hand side
• EBVs: NO
23
Coding and scaling genotypes
• The genotypes of animals (elements of M) are
commonly coded as 2 and 0 for the two homozygotes
(AA and BB) and 1 for the heterozygote (AB).
• Or if alleles are expressed in terms of nucleotides, and
reference allele at a locus is G and the alternative allele
is C, then code 0 = GG , 1 = GC and 2 = CC.
• The diagonal elements of MM’ then indicate the
individual relationship with itself (inbreeding) and the
off-diagonal indicate the number of alleles shared by
relatives
24
Scaling of genotypes
• Each SNP has two alleles (A/B) but only one effect is defined: the allele substitution effect
• Commonly the elements of M are scaled
– to set the mean value of the allele effects to zero
– to account for differences in allele frequencies between SNPs
– Let the frequency of the second or alternative allele at locus j be $p_j$
– Elements of M can be scaled by subtracting $2p_j$.
– If the elements of column j of a matrix P equal $2p_j$, then the matrix Z,
which contains the scaled elements of M, is Z = M - P.
• Furthermore, the elements of Z can be normalised by dividing
the column for marker j by its standard deviation, assumed to
be $\sqrt{2p_j(1-p_j)}$.
25
Mixed linear model for computing SNP
effect
• The most common random model assumes that
– the effects of the SNPs are normally distributed,
– all SNP effects come from a common normal distribution (i.e. the same genetic variance for all SNPs).
• There are two equivalent models with these assumptions
• (1) SNP-BLUP: a model fitting individual SNP effects simultaneously.
– The direct genomic values (DGV) of selection candidates are calculated as DGV = Zĝ, where ĝ are the estimates of the random SNP effects.
– It assumes that $\sigma^2_g$ is known, but this may not be the case in practice, and $\sigma^2_g$ may be approximated from $\sigma^2_a$.
• (2) GBLUP: a model that estimates DGV directly, with a (co)variance among
breeding values of $G\sigma^2_a$, where G is the genomic relationship matrix, the
realised proportion of the genome that animals share, estimated
from the SNPs.
26
SNP BLUP model
• In matrix form, the model is
\[
y = Xb + Zg + e
\]
• y = vector of observations; these can be de-regressed EBVs or phenotypes corrected for all fixed effects
• g = vector of additive genetic effects corresponding to the allele substitution effect of each SNP, and Z = scaled matrix of genotypes
• The MME, with $\alpha = \sigma^2_e/\sigma^2_g$, are
\[
\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + I\alpha \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{g} \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y \end{bmatrix}
\]
27
SNP-BLUP
• If y in the MME = de-regressed breeding values of bulls,
then
– each observation may be associated with a different reliability;
– thus a weighted analysis may be required to account for these
differences in bull reliabilities.
– Weight ($wt_i$) = effective daughter contribution, or $wt_i = (1/rel_{dtr}) - 1$,
where $rel_{dtr}$ is the bull's reliability from daughters
with parent information excluded
28
SNP-BLUP
• The MME then are
\[
\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + I\alpha \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{g} \end{bmatrix}
=
\begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}
\]
• where R = D and D is a diagonal matrix with diagonal element i = $wt_i$.
• In practice, the value of $\sigma^2_g$ may not be known, and $\sigma^2_g$ could be obtained
• either as $\sigma^2_g = \sigma^2_a/m$, with m = the number of markers
• or as $\sigma^2_g = \sigma^2_a / \left(2\sum_j p_j(1-p_j)\right)$
• and then $\alpha = 2\sum_j p_j(1-p_j)\,[\sigma^2_e/\sigma^2_a]$
29
Example 1
Animal  Sire  Dam  Mean  EDC  Fat DYD  SNP genotype (SNP 1 to 10)
13      0     0    1     558  9.0      2 0 1 1 0 0 0 2 1 2
14      0     0    1     722  13.4     1 0 0 0 0 2 0 2 1 0
15      13    4    1     300  12.7     1 1 2 1 1 0 0 2 1 2
16      15    2    1     73   15.4     0 0 2 1 0 1 0 2 2 1
17      15    5    1     52   5.9      0 1 1 2 0 0 0 2 1 2
18      14    6    1     87   7.7      1 1 0 1 0 2 0 2 2 1
19      14    9    1     64   10.2     0 0 1 1 0 2 0 2 2 0
20      14    9    1     103  4.8      0 1 1 0 0 1 0 2 2 0
21      1     3    1     13   7.6      2 0 0 0 0 1 2 2 1 2
22      14    8    1     125  8.8      0 0 0 1 1 2 0 2 0 0
23      14    11   1     93   9.8      0 1 1 0 0 1 0 2 2 1
24      14    10   1     66   9.2      1 0 0 0 1 1 0 2 0 0
25      14    7    1     75   11.5     0 0 0 1 1 2 0 2 1 0
26      14    12   1     33   13.3     1 0 1 1 0 2 0 1 0 0
30
Example 1
• The observations are the daughter yield deviations (DYD) for fat yield; the effective
daughter contribution (EDC) of each bull is also given.
• The EDC can be used as weights in the analysis but are ignored in this presentation
• The genetic variance for fat yield is assumed to be 35.241 kg² and the residual variance
245 kg²
• Animals 13 to 20 are assumed to be the reference population and animals 21 to 26 the
validation candidates.
• SNP effects are predicted using all 10 SNPs.
• The incidence matrix X = 1_q, a column vector of ones, with q = 8, the number of
animals in the reference population
31
Computing the matrices we need
• With X = 1_q, a column vector of ones, and q = 8,
X’ = [ 1 1 1 1 1 1 1 1]
• The computation of Z requires calculating the allele
frequency for each SNP.
32
Computing Matrices
• The allele frequency for the ith SNP was computed as
\[
p_i = \frac{\sum_{j=1}^{n} m_{ij}}{2n}
\]
with n = 14, the number of animals with genotypes, and $m_{ij}$ the
elements of M.
• The allele frequencies are 0.321, 0.179, 0.357, 0.357, 0.143,
0.607, 0.071, 0.964, 0.571 and 0.393 respectively.
• Using those frequencies, 2Σpj(1 – pj) = 3.5383. Thus α =
3.5383*(245/35.242) = 24.598
33
Z matrix
• Z = M – P; for the reference animals (rows are animals 13 to 20, columns are SNPs 1 to 10) it is
\[
Z = \begin{bmatrix}
1.357 & -0.357 & 0.286 & 0.286 & -0.286 & -1.214 & -0.143 & 0.071 & -0.143 & 1.214 \\
0.357 & -0.357 & -0.714 & -0.714 & -0.286 & 0.786 & -0.143 & 0.071 & -0.143 & -0.786 \\
0.357 & 0.643 & 1.286 & 0.286 & 0.714 & -1.214 & -0.143 & 0.071 & -0.143 & 1.214 \\
-0.643 & -0.357 & 1.286 & 0.286 & -0.286 & -0.214 & -0.143 & 0.071 & 0.857 & 0.214 \\
-0.643 & 0.643 & 0.286 & 1.286 & -0.286 & -1.214 & -0.143 & 0.071 & -0.143 & 1.214 \\
0.357 & 0.643 & -0.714 & 0.286 & -0.286 & 0.786 & -0.143 & 0.071 & 0.857 & 0.214 \\
-0.643 & -0.357 & 0.286 & 0.286 & -0.286 & 0.786 & -0.143 & 0.071 & 0.857 & -0.786 \\
-0.643 & 0.643 & 0.286 & -0.714 & -0.286 & -0.214 & -0.143 & 0.071 & 0.857 & -0.786
\end{bmatrix}
\]
• We have now computed X and Z.
• The remaining matrices X’Z, Z’X and Z’Z are computed by
multiplication. Iα is then added to Z’Z and the MME are formed.
• When solved, we get these solutions:
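The allele frequencies, the scaling constant and Z can be recomputed from the genotype table of Example 1. This is a sketch rather than the author's own program: the variable names are invented here, and α is computed with the genetic variance of 35.241 kg² and the residual variance of 245 kg² stated for the example.

```python
import numpy as np

# SNP genotypes of animals 13 to 26, copied from the Example 1 table
# (rows = animals, columns = SNPs 1 to 10).
GENOTYPES = """
2 0 1 1 0 0 0 2 1 2
1 0 0 0 0 2 0 2 1 0
1 1 2 1 1 0 0 2 1 2
0 0 2 1 0 1 0 2 2 1
0 1 1 2 0 0 0 2 1 2
1 1 0 1 0 2 0 2 2 1
0 0 1 1 0 2 0 2 2 0
0 1 1 0 0 1 0 2 2 0
2 0 0 0 0 1 2 2 1 2
0 0 0 1 1 2 0 2 0 0
0 1 1 0 0 1 0 2 2 1
1 0 0 0 1 1 0 2 0 0
0 0 0 1 1 2 0 2 1 0
1 0 1 1 0 2 0 1 0 0
"""
M = np.array([[float(x) for x in row.split()]
              for row in GENOTYPES.strip().splitlines()])

n = M.shape[0]                      # 14 genotyped animals
p = M.sum(axis=0) / (2 * n)         # frequency of the allele counted in M
two_pq = 2 * np.sum(p * (1 - p))    # 2 * sum over SNPs of p_j (1 - p_j)
alpha = two_pq * 245.0 / 35.241     # sigma2_e / sigma2_g for SNP-BLUP

Z_all = M - 2 * p                   # Z = M - P; every row of P equals 2p
Z = Z_all[:8]                       # reference animals 13 to 20

print(np.round(p, 3))
print(round(two_pq, 4), round(alpha, 3))
print(np.round(Z, 3))
```

Note that the frequencies are computed from all 14 genotyped animals, whereas only the eight reference rows of Z enter the MME.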
34
Computing GEBVs
Solutions
Mean effect: 9.944
SNP effects (ĝ):
SNP   Solution
1      0.087
2     -0.311
3      0.262
4     -0.080
5      0.110
6      0.139
7      0.000
8      0.000
9     -0.061
10    -0.016
• The SNP solutions are also called the SNP key
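The solutions above can be reproduced with a few lines of NumPy. The sketch assumes what the example states: an unweighted analysis (EDC ignored), an overall mean as the only fixed effect, animals 13 to 20 as the reference population, and variances of 35.241 and 245 kg². Variable names are chosen for this illustration.

```python
import numpy as np

# SNP genotypes of animals 13 to 26 from the Example 1 table.
GENOTYPES = """
2 0 1 1 0 0 0 2 1 2
1 0 0 0 0 2 0 2 1 0
1 1 2 1 1 0 0 2 1 2
0 0 2 1 0 1 0 2 2 1
0 1 1 2 0 0 0 2 1 2
1 1 0 1 0 2 0 2 2 1
0 0 1 1 0 2 0 2 2 0
0 1 1 0 0 1 0 2 2 0
2 0 0 0 0 1 2 2 1 2
0 0 0 1 1 2 0 2 0 0
0 1 1 0 0 1 0 2 2 1
1 0 0 0 1 1 0 2 0 0
0 0 0 1 1 2 0 2 1 0
1 0 1 1 0 2 0 1 0 0
"""
M = np.array([[float(x) for x in row.split()]
              for row in GENOTYPES.strip().splitlines()])
# Fat DYD of the reference animals 13 to 20.
y = np.array([9.0, 13.4, 12.7, 15.4, 5.9, 7.7, 10.2, 4.8])

p = M.sum(axis=0) / (2 * M.shape[0])             # allele frequencies, n = 14
alpha = 2 * np.sum(p * (1 - p)) * 245.0 / 35.241
Z = (M - 2 * p)[:8]                              # scaled genotypes, reference
X = np.ones((8, 1))                              # overall mean
m = Z.shape[1]                                   # number of SNPs

# MME of the SNP-BLUP model: add I * alpha to Z'Z, then solve.
lhs = np.block([[X.T @ X, X.T @ Z],
                [Z.T @ X, Z.T @ Z + alpha * np.eye(m)]])
rhs = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(lhs, rhs)
mean_hat, g_hat = sol[0], sol[1:]

print(round(mean_hat, 3))
print(np.round(g_hat, 3))
```

SNPs 7 and 8 do not vary among the reference animals, so their columns of Z are constant and confounded with the mean; their solutions are shrunk to zero.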
35
GEBVs for Validation animals
• The DGV for the reference animals (animals 13- 20) is
then computed as Zĝ.
• For the validation animals (animals 21 -26) , DGV = Z2ĝ
where Z2 contains the centralised genotypes for the
validation candidates
36
Solutions for validation animals
\[
\begin{bmatrix} \hat{a}_{21} \\ \hat{a}_{22} \\ \hat{a}_{23} \\ \hat{a}_{24} \\ \hat{a}_{25} \\ \hat{a}_{26} \end{bmatrix}
=
\begin{bmatrix}
1.357 & -0.357 & -0.714 & -0.714 & -0.286 & -0.214 & 1.857 & 0.071 & -0.143 & 1.214 \\
-0.643 & -0.357 & -0.714 & 0.286 & 0.714 & 0.786 & -0.143 & 0.071 & -1.143 & -0.786 \\
-0.643 & 0.643 & 0.286 & -0.714 & -0.286 & -0.214 & -0.143 & 0.071 & 0.857 & 0.214 \\
0.357 & -0.357 & -0.714 & -0.714 & 0.714 & -0.214 & -0.143 & 0.071 & -1.143 & -0.786 \\
-0.643 & -0.357 & -0.714 & 0.286 & 0.714 & 0.786 & -0.143 & 0.071 & -0.143 & -0.786 \\
0.357 & -0.357 & 0.286 & 0.286 & -0.286 & 0.786 & -0.143 & -0.929 & -1.143 & -0.786
\end{bmatrix}
\begin{bmatrix} 0.087 \\ -0.311 \\ 0.262 \\ -0.080 \\ 0.110 \\ 0.139 \\ 0.000 \\ 0.000 \\ -0.061 \\ -0.016 \end{bmatrix}
=
\begin{bmatrix} 0.027 \\ 0.114 \\ -0.240 \\ 0.143 \\ 0.054 \\ 0.354 \end{bmatrix}
\]
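Applying the SNP key to new genotypes needs only the key and the allele frequencies used for scaling. The sketch below takes both as printed in this deck (rounded to three decimals), so its results agree with the reported DGV only to about the second or third decimal place.

```python
import numpy as np

# SNP key (solutions for SNPs 1 to 10) and allele frequencies from the slides.
snp_key = np.array([0.087, -0.311, 0.262, -0.080, 0.110,
                    0.139, 0.000, 0.000, -0.061, -0.016])
p = np.array([0.321, 0.179, 0.357, 0.357, 0.143,
              0.607, 0.071, 0.964, 0.571, 0.393])

# Genotypes of the validation animals from the Example 1 table.
M2 = np.array([
    [2, 0, 0, 0, 0, 1, 2, 2, 1, 2],   # animal 21
    [0, 0, 0, 1, 1, 2, 0, 2, 0, 0],   # animal 22
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 1],   # animal 23
    [1, 0, 0, 0, 1, 1, 0, 2, 0, 0],   # animal 24
    [0, 0, 0, 1, 1, 2, 0, 2, 1, 0],   # animal 25
    [1, 0, 1, 1, 0, 2, 0, 1, 0, 0],   # animal 26
], dtype=float)

Z2 = M2 - 2 * p        # scale with the same frequencies as the reference
dgv = Z2 @ snp_key     # DGV = Z2 * g_hat

for animal, value in zip(range(21, 27), dgv):
    print(animal, round(float(value), 3))
```

The candidates must be scaled with the same allele frequencies that were used when the key was estimated; otherwise the key no longer applies to their scaled genotypes.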
37
GBLUP
• An equivalent model to SNP-BLUP
• Uses the BLUP MME but with $A^{-1}$ replaced by $G^{-1}$
• The DGV is computed directly as the sum of the SNP effects (a = Zg)
• The model is
• y = Xb + Wa + e
• where a = vector of DGVs and W is the design matrix linking records to animals
• Matrix X is as defined before and W is an identity matrix (a diagonal matrix with all diagonal elements = 1)
38
GBLUP
• Given that a = Zg,
• then $\mathrm{var}(a) = ZZ'\sigma^2_g$.
• Note that
\[
\sigma^2_g = \frac{\sigma^2_a}{2\sum_j p_j(1-p_j)}
\]
• so the matrix ZZ’ can be scaled such that
\[
G = \frac{ZZ'}{2\sum_j p_j(1-p_j)}
\]
• and $\mathrm{var}(a) = G\sigma^2_a$.
• Division by $2\sum_j p_j(1-p_j)$ makes G analogous to A.
39
G matrix from 42K SNPs
Gall (lower triangle shown; the matrix is symmetric) =
13 0.957
14 -0.108 0.973
15 0.452 -0.116 1.182
16 0.209 -0.058 0.424 1.025
17 0.234 -0.083 0.425 0.312 1.037
18 -0.040 0.438 0.097 -0.047 -0.043 1.151
19 -0.089 0.458 0.039 -0.067 -0.070 0.426 1.175
20 -0.093 0.460 0.053 -0.058 -0.063 0.432 0.707 1.183
21 0.077 -0.082 0.064 0.104 0.082 -0.071 -0.069 -0.069 1.031
22 -0.056 0.418 0.093 -0.046 -0.038 0.408 0.355 0.342 -0.044 1.139
23 -0.005 0.464 -0.038 -0.035 -0.038 0.206 0.223 0.215 0.011 0.280 0.993
24 -0.070 0.468 0.075 -0.027 -0.053 0.403 0.521 0.550 -0.079 0.424 0.260 1.198
25 -0.052 0.416 0.098 -0.009 -0.031 0.386 0.363 0.342 -0.038 0.370 0.219 0.419 1.125
26 -0.070 0.493 -0.084 -0.039 -0.044 0.258 0.241 0.270 -0.072 0.253 0.178 0.259 0.214 1.009
40
A matrix for the same individuals
13 1.008
14 0.033 1.037
15 0.545 0.021 1.041
16 0.288 0.021 0.536 1.016
17 0.285 0.031 0.541 0.293 1.020
18 0.047 0.580 0.036 0.028 0.032 1.062
19 0.033 0.613 0.021 0.021 0.031 0.365 1.095 symmetric
20 0.033 0.613 0.021 0.021 0.031 0.365 0.613 1.095
21 0.099 0.031 0.082 0.118 0.074 0.028 0.031 0.031 1.021
22 0.046 0.586 0.032 0.031 0.039 0.351 0.373 0.373 0.044 1.068
23 0.096 0.569 0.067 0.043 0.047 0.329 0.357 0.357 0.042 0.338 1.050
24 0.041 0.574 0.027 0.019 0.026 0.331 0.406 0.406 0.028 0.335 0.335 1.056
25 0.033 0.548 0.035 0.039 0.039 0.315 0.336 0.336 0.037 0.321 0.310 0.310 1.029
26 0.035 0.588 0.023 0.024 0.039 0.337 0.376 0.376 0.036 0.347 0.341 0.348 0.325 1.070
41
GBLUP
• The MME are
\[
\begin{bmatrix} X'R^{-1}X & X'R^{-1}W \\ W'R^{-1}X & W'R^{-1}W + G^{-1}\alpha \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix}
=
\begin{bmatrix} X'R^{-1}y \\ W'R^{-1}y \end{bmatrix}
\]
• where α now equals $\sigma^2_e/\sigma^2_a$. The solutions for the example data are listed on the next slide.
• Advantages:
– Existing software for genetic evaluation can be used by replacing A with G
– The system of equations is of the size of the number of animals, which tends to be smaller
than the number of SNPs.
– In pedigreed populations G discriminates among sibs and other relatives, and so
captures information on Mendelian sampling.
– The method is attractive for populations without good pedigree, as G will
capture this information among the genotyped individuals
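The equivalence of GBLUP and SNP-BLUP can be checked numerically on Example 1. One caveat: a G built from only 10 SNPs for 14 animals has rank of at most 10 and cannot be inverted, so this sketch avoids G⁻¹ and uses the equivalent form â = G_{·r}(G_rr + Iα)⁻¹(y − 1b̂), where r indexes the reference animals. That choice, and all variable names, are assumptions of this illustration and not necessarily how the slide's solutions were obtained.

```python
import numpy as np

GENOTYPES = """
2 0 1 1 0 0 0 2 1 2
1 0 0 0 0 2 0 2 1 0
1 1 2 1 1 0 0 2 1 2
0 0 2 1 0 1 0 2 2 1
0 1 1 2 0 0 0 2 1 2
1 1 0 1 0 2 0 2 2 1
0 0 1 1 0 2 0 2 2 0
0 1 1 0 0 1 0 2 2 0
2 0 0 0 0 1 2 2 1 2
0 0 0 1 1 2 0 2 0 0
0 1 1 0 0 1 0 2 2 1
1 0 0 0 1 1 0 2 0 0
0 0 0 1 1 2 0 2 1 0
1 0 1 1 0 2 0 1 0 0
"""
M = np.array([[float(x) for x in row.split()]
              for row in GENOTYPES.strip().splitlines()])
y = np.array([9.0, 13.4, 12.7, 15.4, 5.9, 7.7, 10.2, 4.8])  # DYD, animals 13-20
n_ref = 8

p = M.sum(axis=0) / (2 * M.shape[0])
Z = M - 2 * p
k = 2 * np.sum(p * (1 - p))
alpha = 245.0 / 35.241                 # sigma2_e / sigma2_a for GBLUP
G = Z @ Z.T / k                        # genomic relationship matrix, 14 x 14

# GBLUP without inverting G.
ones = np.ones(n_ref)
V = G[:n_ref, :n_ref] + alpha * np.eye(n_ref)
b_hat = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
dgv_gblup = G[:, :n_ref] @ np.linalg.solve(V, y - b_hat)

# SNP-BLUP on the same data, with alpha_snp = k * sigma2_e / sigma2_a.
Zr, X = Z[:n_ref], np.ones((n_ref, 1))
lhs = np.block([[X.T @ X, X.T @ Zr],
                [Zr.T @ X, Zr.T @ Zr + k * alpha * np.eye(Z.shape[1])]])
rhs = np.concatenate([X.T @ y, Zr.T @ y])
sol = np.linalg.solve(lhs, rhs)
dgv_snp = Z @ sol[1:]

print(np.round(dgv_gblup, 3))
print(np.allclose(dgv_gblup, dgv_snp))
```

The first eight values are the DGV of the reference animals and the last six those of the validation candidates. Small differences from the GBLUP solutions listed in this deck would be expected if G was adjusted there to make it invertible; the slides do not say, so that remains a guess.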
42
Solutions for the example data
• Reference Animals
• 13 0.069
• 14 0.116
• 15 0.049
• 16 0.260
• 17 -0.500
• 18 -0.359
• 19 0.146
• 20 -0.231
•
• Selection or validation candidates
• 21 0.028
• 22 0.115
• 23 -0.240
• 24 0.143
• 25 0.054
• 26 0.353
43
Single Step Method
• GBLUP computes genomic breeding values only for
genotyped animals.
• How can non-genotyped animals benefit from genomic
information?
• Let g2 be the genetic (genomic) values of genotyped animals and
g1 the genetic values of non-genotyped animals
• An estimate of g1 based on genomic information is obtained by
regressing g1 on g2, and is added to the information from BLUP through
the usual MME
44
Single Step Method
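• We define the variance of the vector of g1 (non-genotyped) and
g2 (genotyped):
\[
H = \mathrm{Var}\begin{bmatrix} g_1 \\ g_2 \end{bmatrix}
= \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}
= \begin{bmatrix}
A_{11} + A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\
GA_{22}^{-1}A_{21} & G
\end{bmatrix}
\]
where subscript 1 refers to non-genotyped and subscript 2 to genotyped animals.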
45
Single Step Method
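• The model is just as before but uses all data (genotyped
and ungenotyped):
\[
y = Xb + Za + e
\]
• The MME are the usual ones but with $A^{-1}$ replaced by $H^{-1}$:
\[
\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + H^{-1}\alpha \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y \end{bmatrix}
\]
• Surprisingly, $H^{-1}$ has a simple form:
\[
H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}
\]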
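The standard single-step result is that H⁻¹ equals A⁻¹ with G⁻¹ − A₂₂⁻¹ added to the block of the genotyped animals. The sketch below checks this numerically on a toy case. The pedigree (two unrelated, non-genotyped founders and their two genotyped full-sib offspring) and the 2 × 2 matrix G are hypothetical values chosen only to make the check runnable.

```python
import numpy as np

# A for animals 1, 2 (non-genotyped founders) and 3, 4 (genotyped full sibs).
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
# Hypothetical genomic relationships among the genotyped animals 3 and 4.
G = np.array([[1.02, 0.45],
              [0.45, 0.98]])

non, gen = [0, 1], [2, 3]
A11 = A[np.ix_(non, non)]
A12 = A[np.ix_(non, gen)]
A21 = A12.T
A22 = A[np.ix_(gen, gen)]
A22_inv = np.linalg.inv(A22)

# H assembled block by block from the definition.
H11 = A11 + A12 @ A22_inv @ (G - A22) @ A22_inv @ A21
H12 = A12 @ A22_inv @ G
H = np.block([[H11, H12],
              [H12.T, G]])

# Simple form of the inverse: only the genotyped block of A^-1 changes.
H_inv = np.linalg.inv(A)
H_inv[np.ix_(gen, gen)] += np.linalg.inv(G) - A22_inv

print(np.allclose(np.linalg.inv(H), H_inv))
```

In practice this is what makes the method workable: A⁻¹ can be built directly from the pedigree, and only matrices of the size of the genotyped group (G and A₂₂) have to be inverted.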
better lives through livestock
ilri.org
ILRI thanks all donors and organizations who globally supported its work through their contributions to the CGIAR Trust Fund
CRP and CG logos

More Related Content

What's hot

Animal genetic resource conservation and biotechnology
Animal genetic resource conservation and biotechnologyAnimal genetic resource conservation and biotechnology
Animal genetic resource conservation and biotechnology
Bruno Mmassy
 
Genetics of animal breeding 9
Genetics of animal breeding 9Genetics of animal breeding 9
Genetics of animal breeding 9
zerdon
 
Genomic selection
Genomic  selectionGenomic  selection
Genomic selection
pandadebadatta
 

What's hot (20)

Mating System and Livestock Breeding Policy
Mating System and Livestock Breeding Policy Mating System and Livestock Breeding Policy
Mating System and Livestock Breeding Policy
 
Breeding Approaches Towards Disease Resistance In Livestocks
Breeding Approaches Towards Disease Resistance In LivestocksBreeding Approaches Towards Disease Resistance In Livestocks
Breeding Approaches Towards Disease Resistance In Livestocks
 
Animal breeding and selection
Animal breeding and selectionAnimal breeding and selection
Animal breeding and selection
 
Correlations
CorrelationsCorrelations
Correlations
 
Sire evaluation
Sire evaluationSire evaluation
Sire evaluation
 
Animal genetic resource conservation and biotechnology
Animal genetic resource conservation and biotechnologyAnimal genetic resource conservation and biotechnology
Animal genetic resource conservation and biotechnology
 
Sire evaluation
Sire evaluationSire evaluation
Sire evaluation
 
Genetics of animal breeding 9
Genetics of animal breeding 9Genetics of animal breeding 9
Genetics of animal breeding 9
 
Practical application of advanced molecular techniques in the improvement of ...
Practical application of advanced molecular techniques in the improvement of ...Practical application of advanced molecular techniques in the improvement of ...
Practical application of advanced molecular techniques in the improvement of ...
 
Advanced genetics
Advanced geneticsAdvanced genetics
Advanced genetics
 
Basis of selection in animal genetics and breeding
Basis of selection in animal genetics and breeding Basis of selection in animal genetics and breeding
Basis of selection in animal genetics and breeding
 
Progeny testing
Progeny testingProgeny testing
Progeny testing
 
Major economic traits of cattle and buffalo
Major economic traits of cattle and buffaloMajor economic traits of cattle and buffalo
Major economic traits of cattle and buffalo
 
Breeding better sheep
Breeding better sheepBreeding better sheep
Breeding better sheep
 
Forces changing gene frequency
Forces changing gene frequencyForces changing gene frequency
Forces changing gene frequency
 
Presentation on Heritability
 Presentation on Heritability Presentation on Heritability
Presentation on Heritability
 
Bases of selection family
Bases of selection  familyBases of selection  family
Bases of selection family
 
Selection to disease resistance
Selection to disease resistance Selection to disease resistance
Selection to disease resistance
 
Genomic selection with weighted GBLUP and APY single step
Genomic selection with weighted GBLUP and APY single stepGenomic selection with weighted GBLUP and APY single step
Genomic selection with weighted GBLUP and APY single step
 
Genomic selection
Genomic  selectionGenomic  selection
Genomic selection
 

Similar to Genomic selection in Livestock

allele distributionIn population genetics, allele frequencies are.pdf
allele distributionIn population genetics, allele frequencies are.pdfallele distributionIn population genetics, allele frequencies are.pdf
allele distributionIn population genetics, allele frequencies are.pdf
aparnaagenciestvm
 
Genetic markers in characterization2
Genetic markers in characterization2Genetic markers in characterization2
Genetic markers in characterization2
Bruno Mmassy
 
Microarray Data Analysis
Microarray Data AnalysisMicroarray Data Analysis
Microarray Data Analysis
yuvraj404
 
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
Satish Khadia
 
Microarray Statistics
Microarray StatisticsMicroarray Statistics
Microarray Statistics
A Roy
 
Genomic selection, prediction models, GEBV values, genomic selection in plant...
Genomic selection, prediction models, GEBV values, genomic selection in plant...Genomic selection, prediction models, GEBV values, genomic selection in plant...
Genomic selection, prediction models, GEBV values, genomic selection in plant...
Mahesh Biradar
 

Similar to Genomic selection in Livestock (20)

Early generation selection in an intra population recurrent selection breedin...
Early generation selection in an intra population recurrent selection breedin...Early generation selection in an intra population recurrent selection breedin...
Early generation selection in an intra population recurrent selection breedin...
 
Association mapping
Association mapping Association mapping
Association mapping
 
Genomic Selection in Plants
Genomic Selection in PlantsGenomic Selection in Plants
Genomic Selection in Plants
 
QTL mapping for crop improvement
QTL mapping for crop improvementQTL mapping for crop improvement
QTL mapping for crop improvement
 
Methods for High Dimensional Interactions
Methods for High Dimensional InteractionsMethods for High Dimensional Interactions
Methods for High Dimensional Interactions
 
allele distributionIn population genetics, allele frequencies are.pdf
allele distributionIn population genetics, allele frequencies are.pdfallele distributionIn population genetics, allele frequencies are.pdf
allele distributionIn population genetics, allele frequencies are.pdf
 
Genetic markers in characterization2
Genetic markers in characterization2Genetic markers in characterization2
Genetic markers in characterization2
 
Microarray Data Analysis
Microarray Data AnalysisMicroarray Data Analysis
Microarray Data Analysis
 
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
Analysis of Variance (ANOVA), MANOVA: Expected variance components, Random an...
 
Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...Combining co-expression and co-location for gene network inference in porcine...
Combining co-expression and co-location for gene network inference in porcine...
 
Quantitative trait loci (QTL) analysis and its applications in plant breeding
Quantitative trait loci (QTL) analysis and its applications in plant breedingQuantitative trait loci (QTL) analysis and its applications in plant breeding
Quantitative trait loci (QTL) analysis and its applications in plant breeding
 
ROLE OF INHERITANCE IN CROP IMPROVEMENT
ROLE OF INHERITANCE IN CROP IMPROVEMENTROLE OF INHERITANCE IN CROP IMPROVEMENT
ROLE OF INHERITANCE IN CROP IMPROVEMENT
 
16 bink
16 bink16 bink
16 bink
 
Quantitative genetics
Quantitative geneticsQuantitative genetics
Quantitative genetics
 
Microarray Statistics
Microarray StatisticsMicroarray Statistics
Microarray Statistics
 
RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5
 
wheat association mapping LTN
wheat association mapping LTNwheat association mapping LTN
wheat association mapping LTN
 
Selection system: Biplots and Mapping genotyoe
Selection system: Biplots and Mapping genotyoeSelection system: Biplots and Mapping genotyoe
Selection system: Biplots and Mapping genotyoe
 
A Note On Exact Tests Of Hardy-Weinberg Equilibrium
A Note On Exact Tests Of Hardy-Weinberg EquilibriumA Note On Exact Tests Of Hardy-Weinberg Equilibrium
A Note On Exact Tests Of Hardy-Weinberg Equilibrium
 
Genomic selection, prediction models, GEBV values, genomic selection in plant...
Genomic selection, prediction models, GEBV values, genomic selection in plant...Genomic selection, prediction models, GEBV values, genomic selection in plant...
Genomic selection, prediction models, GEBV values, genomic selection in plant...
 

More from ILRI

More from ILRI (20)

How the small-scale low biosecurity sector could be transformed into a more b...
How the small-scale low biosecurity sector could be transformed into a more b...How the small-scale low biosecurity sector could be transformed into a more b...
How the small-scale low biosecurity sector could be transformed into a more b...
 
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
 
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
Small ruminant keepers’ knowledge, attitudes and practices towards peste des ...
 
A training, certification and marketing scheme for informal dairy vendors in ...
A training, certification and marketing scheme for informal dairy vendors in ...A training, certification and marketing scheme for informal dairy vendors in ...
A training, certification and marketing scheme for informal dairy vendors in ...
 
Milk safety and child nutrition impacts of the MoreMilk training, certificati...
Milk safety and child nutrition impacts of the MoreMilk training, certificati...Milk safety and child nutrition impacts of the MoreMilk training, certificati...
Milk safety and child nutrition impacts of the MoreMilk training, certificati...
 
Preventing the next pandemic: a 12-slide primer on emerging zoonotic diseases
Preventing the next pandemic: a 12-slide primer on emerging zoonotic diseasesPreventing the next pandemic: a 12-slide primer on emerging zoonotic diseases


Genomic selection in Livestock

  • 1. Genomic selection in Livestock. Raphael Mrode, ILRI. Essential Knowledge for Effective Improvement and Dissemination of Genetics in Sheep and Goats, 3–5 November 2020, Addis Ababa, Ethiopia
  • 2. The basic goal: genetic progress and increased productivity • Identifying animals with the best genetic merit as parents of the next generation → genetic improvement • [Figure: distribution of phenotypes in the parental generation, with the animals selected to be parents marked, and the distribution of offspring phenotypes showing the genetic improvement]
  • 3. The basic goal: genetic progress and increased productivity • To achieve this goal we need accurate estimated breeding values (EBVs) • However, what is available are phenotypes (Y), which are influenced by genetic and environmental effects • Y = Genetic (G) + Environment (E) • Thus Var(Y) = Var(G) + Var(E), assuming G and E are uncorrelated • Sources of Var(E) could be environmental and management factors • Sources of Var(G) are the different forms of inheritance, leading to different components of genetic variance (e.g. additive genetic variance, additive maternal genetic variance and so on) • Accurate estimation of G (EBVs) is our challenge
  • 4. Reality of field data • Often we deal with field data → a variety of environmental factors, animals with different degrees of relatedness, many generations and unbalanced data • We need a framework to model phenotypic observations, accounting for non-genetic systematic sources of variation (Var(E)), in order to estimate EBVs accurately • The linear mixed model provides such a framework
  • 5. Example data and pedigree files • Data file • Pedigree file
  • 6. Linear mixed model • In matrix notation, a mixed linear model may be represented as y = Xb + Za + e, where • y = n × 1 vector of observations; n = number of records • b = p × 1 vector of fixed effects; p = number of levels of the fixed effects (e.g. lactation number) • a = q × 1 vector of random animal effects; q = number of levels of the random effects • e = n × 1 vector of random residual effects • X = design matrix of order n × p that relates records to fixed effects • Z = design matrix of order n × q that relates records to random animal effects • X and Z are both termed design or incidence matrices
  • 7. Assumptions of the linear mixed model • It is assumed that residual effects are independently distributed with variance $\sigma^2_e$; therefore $\mathrm{var}(e) = I\sigma^2_e = R$ and $\mathrm{var}(a) = G = I\sigma^2_a$ or $A\sigma^2_a$, where $A$ is the numerator relationship matrix. Thus $\mathrm{var}\begin{bmatrix} a \\ e \end{bmatrix} = \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}$
  • 8. MME for breeding values • The mixed model equations (MME) with the relationship matrix incorporated are $\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A^{-1}\alpha \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$ • with $\alpha = \sigma^2_e/\sigma^2_a = (1-h^2)/h^2$
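
A minimal sketch of how these equations could be set up and solved with NumPy. The four-animal pedigree, the records and the heritability are hypothetical values chosen only for illustration, and `relationship_matrix` is an illustrative helper (the tabular method), not something taken from the slides.

```python
import numpy as np

def relationship_matrix(sires, dams):
    """Numerator relationship matrix A by the tabular method.

    Parents are 0-based indices (None = unknown) and must precede offspring.
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            a_sj = A[s, j] if s is not None else 0.0
            a_dj = A[d, j] if d is not None else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_sj + a_dj)
        both_known = s is not None and d is not None
        A[i, i] = 1.0 + (0.5 * A[s, d] if both_known else 0.0)
    return A

# Hypothetical pedigree: animals 0 and 1 are founders, 2 = 0 x 1, 3 = 0 x 2.
sires = [None, None, 0, 0]
dams = [None, None, 1, 2]
A = relationship_matrix(sires, dams)

# Hypothetical records, one per animal, with an overall mean as the only fixed effect.
y = np.array([4.5, 2.9, 3.9, 3.5])
X = np.ones((4, 1))
Z = np.eye(4)

h2 = 0.3                      # assumed heritability
alpha = (1 - h2) / h2         # alpha = sigma2_e / sigma2_a

# Left- and right-hand sides of the MME, with A^-1 * alpha added to Z'Z.
lhs = np.block([[X.T @ X, X.T @ Z],
                [Z.T @ X, Z.T @ Z + np.linalg.inv(A) * alpha]])
rhs = np.concatenate([X.T @ y, Z.T @ y])

solutions = np.linalg.solve(lhs, rhs)
b_hat, a_hat = solutions[:1], solutions[1:]
print("mean:", b_hat, "EBVs:", a_hat)
```

Inverting A directly is only sensible for a toy pedigree like this one; the sketch is meant to show the structure of the equations, not an efficient implementation.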
  • 9. Limitations of the A matrix • The relationship matrix A based on pedigree gives an average relationship, which assumes an infinite number of loci • Real relationships are a bit different because the genome is of finite size • Therefore A is the expectation of the realized relationships • Two half-sibs might have a realized relationship of 0.3 or 0.2 rather than the expected 0.25
  • 10. Use of microsatellites as markers and limitations • Initially, microsatellites were used as genetic markers in the 1980s and 1990s • Microsatellites are sets of short repeated DNA sequences at a particular locus on a chromosome, which vary in number between individuals and so can be used as markers • The most significant genetic marker can be 10 cM or more from the QTL, so QTL are not mapped precisely • The association between marker and QTL may not persist across the population • The phase between marker and QTL may have to be estimated for each family
  • 11. Single Nucleotide Polymorphism (SNP) • A SNP is a DNA sequence variation occurring when a single nucleotide (A, T, C or G) in the genome differs between paired chromosomes in an individual • For example, two sequenced DNA fragments from an individual, AAGCCTA and AAGCTTA, differ in a single nucleotide • In this case we say that there are two alleles: C and T. Almost all common SNPs have only two alleles
  • 13. Whole Genome Sequence & Genotyping chips • Whole-genome sequencing (WGS) began with human (2001) and mouse, followed by the chicken (2004), the dog (2005), bovine (2006), horse (2007), pig (2009), ... • New technologies for genotyping and sequencing • Simultaneous genotyping of many SNPs • From a few dozen up to several million SNPs • Two main technology providers, Illumina and Affymetrix • Illumina products in cattle: 3000 (7000 1000 20000 = «LD»); 54 000 = «50k»; 777 000 = «HD»
  • 14. Genomic Selection (GS) • GS is the use of genomic breeding values (GEBVs) for the selection of animals • Genomic selection requires that markers (SNPs) are in linkage disequilibrium (LD) with the QTL across the whole population • Thus the use of SNPs as markers enables all QTL in the genome to be traced, by tracing the chromosome segments defined by adjacent SNPs
  • 15. Steps in Genomic Selection (GS) • Genotype animals with phenotypes (sires with daughter records for sex-limited traits) • Estimate SNP solutions (the SNP key) in the reference population • Validate in another data set of genotyped and phenotyped animals, with their records excluded, to determine the accuracy of the SNP key • Genotype animals at birth or at a young age (no phenotypes), use the SNP key to predict their GEBVs and carry out selection • [Diagram: reference population = genotyped and phenotyped animals; validation candidates = genotyped and phenotyped, but phenotypes excluded; selection candidates = genotyped but no phenotypes]
  • 16. Main advantages of genomics • Young bulls can be genotyped early in life and their breeding values computed • Can be used to select the young bulls to be progeny tested, thereby reducing cost • Accuracy for young bulls about 20–40% higher than that of parent average • Reduction in generation interval
  • 17. Genomic Selection: efficiency • Two main factors: • Accuracy of SNP effect estimation, which depends on – size of the reference population – heritability of the trait – statistical methodology used • Linkage disequilibrium (LD) between markers and QTL, which depends on – marker density – effective size of the population => number of «independent» segments • Relationship between candidates and the reference population
  • 18. Size of the reference population • Greatly influences the accuracy of genomic evaluations (Goddard, 2008)
  • 19. Validation test • [Figures: estimated BV plotted against DYD for the training set; GEBV plotted against DYD (proxy of true BV) for the training and validation sets] • Two important parameters: • R² or r(DYD, GEBV) (should be «large enough») • slope of the regression (should be close to 1) • Overestimation = “inflation”
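
The two validation parameters can be sketched in a few lines of plain Python. The numbers are the DYD and the DGV of validation animals 21 to 26 from the worked example later in these slides; with only six animals and ten SNPs they illustrate the calculation, not a realistic validation. The function name `validation_stats` is hypothetical, and regressing DYD on GEBV (rather than the reverse) is an assumption about the direction meant on the slide.

```python
def validation_stats(observed, predicted):
    """Return (correlation, slope of the regression of observed on predicted)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sxy = sum((p - mean_pred) * (o - mean_obs) for o, p in zip(observed, predicted))
    sxx = sum((p - mean_pred) ** 2 for p in predicted)
    syy = sum((o - mean_obs) ** 2 for o in observed)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    return r, slope

# DYD (proxy of true BV) and DGV for validation animals 21-26 of the worked example.
dyd = [7.6, 8.8, 9.8, 9.2, 11.5, 13.3]
gebv = [0.027, 0.114, -0.240, 0.143, 0.054, 0.354]

r, slope = validation_stats(dyd, gebv)
print(f"r(DYD, GEBV) = {r:.3f}, R2 = {r * r:.3f}, slope = {slope:.3f}")
```

Under this convention a slope below 1 means the predictions are more spread out than the observations support, which is the inflation mentioned on the slide.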
  • 20. Increasing the size of the reference populations • Genotype as many progeny-tested sires as possible • International collaborations • Holstein: 2 big consortia: USA + Canada, ~>35000 bulls, + UK + Italy; Eurogenomics (France, The Netherlands, (Germany), Nordic countries, Spain, Poland), ~34000 bulls (?) • For small breeds or other species (goats, sheep, beef cattle): not enough sires → combine with many genotyped cows • Records on about 4–5 cows provide information equivalent to one proven sire (Goddard (2009) and Daetwyler et al. (2013))
  • 21. General linear model • The general linear model underlying genomic evaluation is of the form $y = Xb + \sum_{i=1}^{m} M_i g_i + e$, where $m$ is the number of SNPs, $y$ is the data vector, $b$ the vector of the mean or fixed effects, $g_i$ the genetic effect of the $i$th SNP genotype and $e$ the error. The matrix $M$ is of dimension $n \times m$ ($n$ = number of animals), and $M_i$ relates the $i$th SNP to the data • It is assumed that all the additive genetic variance is explained by the marker effects, such that the estimate of an animal's total genetic merit or breeding value ($a$) is $a = \sum_{i=1}^{m} M_i g_i$
  • 22. Data types used for genomic evaluation • y = YD (yield deviation) = individual record corrected for all fixed and non-genetic random effects • y = DYD (daughter yield deviation) = for a bull, twice the average of the YD of all his daughters, corrected for ½ the genetic merit of their dams (with associated weight = EDC, the effective daughter contribution) • y = de-regressed proofs, obtained by solving the MME to get the right-hand side • EBVs: no
  • 23. Coding and scaling genotypes • The genotypes of animals (elements of M) are commonly coded as 2 and 0 for the two homozygotes (AA and BB) and 1 for the heterozygote (AB) • Or, if alleles are expressed in terms of nucleotides and the reference allele at a locus is G and the alternative allele is C, then code 0 = GG, 1 = GC and 2 = CC • The diagonal elements of MM′ then indicate the relationship of an individual with itself (inbreeding), and the off-diagonal elements indicate the number of alleles shared by relatives
  • 24. Scaling of genotypes • SNPs → 2 alleles (A/B), but only one effect is defined: the substitution effect $m_i$ • Commonly the elements of M are scaled – to set the mean value of the allele effects to zero – to account for differences in allele frequencies between the SNPs – Let the frequency of the second or alternative allele at locus $j$ be $p_j$ – Elements of M can be scaled by subtracting $2p_j$ – If the elements of column $j$ of a matrix P equal $2p_j$, then the matrix Z, which contains the scaled elements of M, is $Z = M - P$ • Furthermore, the elements of Z can be normalised by dividing the column for marker $j$ by its standard deviation, assumed to be $\sqrt{2p_j(1-p_j)}$
  • 25. Mixed linear model for computing SNP effects • The most common random model assumes that – the effects of the SNPs are normally distributed – all SNP effects come from a common normal distribution (i.e. the same genetic variance for all SNPs) • There are two equivalent models with these assumptions • (1) SNP-BLUP: a model fitting individual SNP effects simultaneously – Direct genomic values (DGV) for selection candidates are calculated as DGV = Zĝ, where ĝ are the estimates of the random SNP effects – Assumes $\sigma^2_g$ is known, but this may not be the case in practice, and $\sigma^2_g$ may be approximated from $\sigma^2_a$ • (2) GBLUP: a model that estimates DGV directly, with (co)variance among breeding values of $G\sigma^2_a$, where G is the genomic relationship matrix, i.e. the realised proportion of the genome that animals share, estimated from the SNPs
  • 26. SNP-BLUP model • In matrix form, the model is $y = Xb + Zg + e$ • $y$ = vector of observations: these can be de-regressed EBVs or phenotypes corrected for all fixed effects • $g$ = vector of additive genetic effects corresponding to allele substitution effects for each SNP, and $Z$ = scaled matrix of genotypes • The MME, with $\alpha = \sigma^2_e/\sigma^2_g$, are $\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + I\alpha \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{g} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$
  • 27. SNP-BLUP • If y in the MME = de-regressed breeding values of bulls, then – Each observation may be associated with a different reliability – Thus a weighted analysis may be required to account for these differences in bull reliabilities – Weight ($wt_i$) = effective daughter contribution, or $wt_i = (1/rel_{dtr}) - 1$, where $rel_{dtr}$ is the bull's reliability from daughters with parent information excluded
  • 28. SNP-BLUP • The MME then are $\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + I\alpha \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{g} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$ • where $R = D$ and $D$ is a diagonal matrix with diagonal element $i$ = $wt_i$ • In practice, the value of $\sigma^2_g$ may not be known, and $\sigma^2_g$ could be obtained • either as $\sigma^2_g = \sigma^2_a/m$, with $m$ = the number of markers • or as $\sigma^2_g = \sigma^2_a / [2\sum p_j(1-p_j)]$ • in which case $\alpha = 2\sum p_j(1-p_j) \cdot [\sigma^2_e/\sigma^2_a]$
  • 29. Example 1
      Animal  Sire  Dam  Mean  EDC  Fat DYD  SNP genotype
      13        0    0    1    558     9.0   2 0 1 1 0 0 0 2 1 2
      14        0    0    1    722    13.4   1 0 0 0 0 2 0 2 1 0
      15       13    4    1    300    12.7   1 1 2 1 1 0 0 2 1 2
      16       15    2    1     73    15.4   0 0 2 1 0 1 0 2 2 1
      17       15    5    1     52     5.9   0 1 1 2 0 0 0 2 1 2
      18       14    6    1     87     7.7   1 1 0 1 0 2 0 2 2 1
      19       14    9    1     64    10.2   0 0 1 1 0 2 0 2 2 0
      20       14    9    1    103     4.8   0 1 1 0 0 1 0 2 2 0
      21        1    3    1     13     7.6   2 0 0 0 0 1 2 2 1 2
      22       14    8    1    125     8.8   0 0 0 1 1 2 0 2 0 0
      23       14   11    1     93     9.8   0 1 1 0 0 1 0 2 2 1
      24       14   10    1     66     9.2   1 0 0 0 1 1 0 2 0 0
      25       14    7    1     75    11.5   0 0 0 1 1 2 0 2 1 0
      26       14   12    1     33    13.3   1 0 1 1 0 2 0 1 0 0
  • 30. Example 1 • The observations are the daughter yield deviations (DYD) for fat yield; the effective daughter contribution (EDC) for each bull is also given • The EDC can be used as weights in the analysis but are ignored in this presentation • The genetic variance for fat yield is assumed to be 35.241 kg² and the residual variance 245 kg² • Animals 13 to 20 are assumed to be the reference population and animals 21 to 26 the validation candidates • SNP effects are predicted using all 10 SNPs • The incidence matrix X = 1q, a column vector of ones, with q = 8, the number of animals in the reference population
  • 31. Computing the matrices we need • The incidence matrix X = 1q, a column vector of ones, with q = 8, the number of animals in the reference population • X′ = [1 1 1 1 1 1 1 1] • The computation of Z requires calculating the allele frequency for each SNP
  • 32. Computing matrices • The allele frequency for the $i$th SNP was computed as $p_i = \dfrac{\sum_{j=1}^{n} m_{ij}}{2n}$, with $n$ = 14, the number of animals with genotypes, and $m_{ij}$ the elements of M • The allele frequencies are 0.321, 0.179, 0.357, 0.357, 0.143, 0.607, 0.071, 0.964, 0.571 and 0.393, respectively • Using those frequencies, $2\sum p_j(1-p_j)$ = 3.5383. Thus $\alpha$ = 3.5383 × (245/35.241) = 24.598
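
The arithmetic on this slide can be checked with a short sketch in plain Python. The genotypes are copied from the example table for animals 13 to 26; the variable names are illustrative only.

```python
# SNP genotypes (0/1/2) for animals 13-26, one row per animal, from the example table.
M = [
    [2, 0, 1, 1, 0, 0, 0, 2, 1, 2],  # 13
    [1, 0, 0, 0, 0, 2, 0, 2, 1, 0],  # 14
    [1, 1, 2, 1, 1, 0, 0, 2, 1, 2],  # 15
    [0, 0, 2, 1, 0, 1, 0, 2, 2, 1],  # 16
    [0, 1, 1, 2, 0, 0, 0, 2, 1, 2],  # 17
    [1, 1, 0, 1, 0, 2, 0, 2, 2, 1],  # 18
    [0, 0, 1, 1, 0, 2, 0, 2, 2, 0],  # 19
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 0],  # 20
    [2, 0, 0, 0, 0, 1, 2, 2, 1, 2],  # 21
    [0, 0, 0, 1, 1, 2, 0, 2, 0, 0],  # 22
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 1],  # 23
    [1, 0, 0, 0, 1, 1, 0, 2, 0, 0],  # 24
    [0, 0, 0, 1, 1, 2, 0, 2, 1, 0],  # 25
    [1, 0, 1, 1, 0, 2, 0, 1, 0, 0],  # 26
]
n = len(M)  # 14 genotyped animals

# Frequency of the counted allele at each SNP: column sum divided by 2n.
p = [sum(column) / (2 * n) for column in zip(*M)]

# Scaling constant 2 * sum(p_j * (1 - p_j)) and the resulting variance ratio.
two_pq_sum = 2 * sum(pj * (1 - pj) for pj in p)
sigma2_a, sigma2_e = 35.241, 245.0
alpha = two_pq_sum * sigma2_e / sigma2_a

# Centred genotypes: Z = M - P, where column j of P holds 2 * p_j.
Z = [[m - 2 * pj for m, pj in zip(row, p)] for row in M]

print([round(pj, 3) for pj in p])
print(round(two_pq_sum, 4), round(alpha, 3))
```

The frequencies are computed from all 14 genotyped animals, reference and validation alike, which is what the slide's n = 14 implies.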
  • 33. Z matrix • Z = M − P which, for the reference animals 13 to 20 (one row per animal, one column per SNP), is
    $$Z = \begin{bmatrix}
    1.357 & -0.357 & 0.286 & 0.286 & -0.286 & -1.214 & -0.143 & 0.071 & -0.143 & 1.214 \\
    0.357 & -0.357 & -0.714 & -0.714 & -0.286 & 0.786 & -0.143 & 0.071 & -0.143 & -0.786 \\
    0.357 & 0.643 & 1.286 & 0.286 & 0.714 & -1.214 & -0.143 & 0.071 & -0.143 & 1.214 \\
    -0.643 & -0.357 & 1.286 & 0.286 & -0.286 & -0.214 & -0.143 & 0.071 & 0.857 & 0.214 \\
    -0.643 & 0.643 & 0.286 & 1.286 & -0.286 & -1.214 & -0.143 & 0.071 & -0.143 & 1.214 \\
    0.357 & 0.643 & -0.714 & 0.286 & -0.286 & 0.786 & -0.143 & 0.071 & 0.857 & 0.214 \\
    -0.643 & -0.357 & 0.286 & 0.286 & -0.286 & 0.786 & -0.143 & 0.071 & 0.857 & -0.786 \\
    -0.643 & 0.643 & 0.286 & -0.714 & -0.286 & -0.214 & -0.143 & 0.071 & 0.857 & -0.786
    \end{bmatrix}$$
    • We have computed X and Z • The remaining matrices X′Z, Z′X and Z′Z are computed by multiplication. Then Iα is added to Z′Z and the MME are formed • When solved, we obtain the solutions that follow
  • 34. Computing GEBVs • Solutions • Mean effect: 9.944 • SNP effects (ĝ): SNP 1 = 0.087; SNP 2 = −0.311; SNP 3 = 0.262; SNP 4 = −0.080; SNP 5 = 0.110; SNP 6 = 0.139; SNP 7 = 0.000; SNP 8 = 0.000; SNP 9 = −0.061; SNP 10 = −0.016 • The SNP solutions are also called the SNP key
  • 35. GEBVs for validation animals • The DGV for the reference animals (animals 13–20) is then computed as Zĝ • For the validation animals (animals 21–26), DGV = Z₂ĝ, where Z₂ contains the centred genotypes of the validation candidates
  • 36. Solutions for validation animals
    $$\begin{bmatrix} \hat{a}_{21} \\ \hat{a}_{22} \\ \hat{a}_{23} \\ \hat{a}_{24} \\ \hat{a}_{25} \\ \hat{a}_{26} \end{bmatrix} = Z_2\hat{g} = \begin{bmatrix}
    1.357 & -0.357 & -0.714 & -0.714 & -0.286 & -0.214 & 1.857 & 0.071 & -0.143 & 1.214 \\
    -0.643 & -0.357 & -0.714 & 0.286 & 0.714 & 0.786 & -0.143 & 0.071 & -1.143 & -0.786 \\
    -0.643 & 0.643 & 0.286 & -0.714 & -0.286 & -0.214 & -0.143 & 0.071 & 0.857 & 0.214 \\
    0.357 & -0.357 & -0.714 & -0.714 & 0.714 & -0.214 & -0.143 & 0.071 & -1.143 & -0.786 \\
    -0.643 & -0.357 & -0.714 & 0.286 & 0.714 & 0.786 & -0.143 & 0.071 & -0.143 & -0.786 \\
    0.357 & -0.357 & 0.286 & 0.286 & -0.286 & 0.786 & -0.143 & -0.929 & -1.143 & -0.786
    \end{bmatrix} \begin{bmatrix} 0.087 \\ -0.311 \\ 0.262 \\ -0.080 \\ 0.110 \\ 0.139 \\ 0.000 \\ 0.000 \\ -0.061 \\ -0.016 \end{bmatrix} = \begin{bmatrix} 0.027 \\ 0.114 \\ -0.240 \\ 0.143 \\ 0.054 \\ 0.354 \end{bmatrix}$$
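
The whole SNP-BLUP example can be sketched with NumPy as follows. It assumes the unweighted analysis described in the slides (EDC ignored), allele frequencies computed from all 14 genotyped animals, and σ²a = 35.241; the variable names are illustrative. Under those assumptions the solutions should come out close to the mean effect, SNP key and validation DGV tabulated above.

```python
import numpy as np

# SNP genotypes (0/1/2) for animals 13-26, copied from the example table.
M = np.array([
    [2, 0, 1, 1, 0, 0, 0, 2, 1, 2],  # 13
    [1, 0, 0, 0, 0, 2, 0, 2, 1, 0],  # 14
    [1, 1, 2, 1, 1, 0, 0, 2, 1, 2],  # 15
    [0, 0, 2, 1, 0, 1, 0, 2, 2, 1],  # 16
    [0, 1, 1, 2, 0, 0, 0, 2, 1, 2],  # 17
    [1, 1, 0, 1, 0, 2, 0, 2, 2, 1],  # 18
    [0, 0, 1, 1, 0, 2, 0, 2, 2, 0],  # 19
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 0],  # 20
    [2, 0, 0, 0, 0, 1, 2, 2, 1, 2],  # 21
    [0, 0, 0, 1, 1, 2, 0, 2, 0, 0],  # 22
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 1],  # 23
    [1, 0, 0, 0, 1, 1, 0, 2, 0, 0],  # 24
    [0, 0, 0, 1, 1, 2, 0, 2, 1, 0],  # 25
    [1, 0, 1, 1, 0, 2, 0, 1, 0, 0],  # 26
], dtype=float)

# Fat DYD of the reference animals 13-20.
dyd = np.array([9.0, 13.4, 12.7, 15.4, 5.9, 7.7, 10.2, 4.8])
n_ref = 8
sigma2_a, sigma2_e = 35.241, 245.0

# Allele frequencies from all genotyped animals, then centred genotypes Z = M - P.
p = M.sum(axis=0) / (2 * M.shape[0])
Z_all = M - 2 * p
Z, Z2 = Z_all[:n_ref], Z_all[n_ref:]      # reference and validation rows

# alpha = 2 * sum(p(1-p)) * sigma2_e / sigma2_a, as on the SNP-BLUP slide.
alpha = 2 * np.sum(p * (1 - p)) * sigma2_e / sigma2_a

# Mixed model equations with a single fixed effect (the mean).
X = np.ones((n_ref, 1))
n_snp = Z.shape[1]
lhs = np.block([[X.T @ X, X.T @ Z],
                [Z.T @ X, Z.T @ Z + alpha * np.eye(n_snp)]])
rhs = np.concatenate([X.T @ dyd, Z.T @ dyd])

sol = np.linalg.solve(lhs, rhs)
mean_hat, g_hat = sol[0], sol[1:]         # mean effect and SNP key
dgv_ref = Z @ g_hat                       # DGV of reference animals
dgv_val = Z2 @ g_hat                      # DGV of validation candidates

print(np.round(mean_hat, 3), np.round(g_hat, 3))
print(np.round(dgv_val, 3))
```

SNPs 7 and 8 show no variation among the eight reference animals, so their columns of Z are constant and their estimated effects are zero, which matches the two zeros in the SNP key.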
  • 37. GBLUP • Equivalent model to SNP-BLUP • The usual BLUP MME, but with $A^{-1}$ replaced by $G^{-1}$ • The DGV is computed directly as the sum of the SNP effects ($a = Zg$) • The model is $y = Xb + Wa + e$ • where $a$ = vector of DGVs and $W$ is the design matrix linking records to animals • The matrix $X$ is as defined before, and here $W$ is an identity matrix (a diagonal matrix with all diagonal elements = 1)
  • 38. GBLUP • Given that $a = Zg$, then $\mathrm{var}(a) = ZZ'\sigma^2_g$ • Note that $\sigma^2_g = \dfrac{\sigma^2_a}{2\sum p_j(1-p_j)}$; the matrix $ZZ'$ can then be scaled such that $G = \dfrac{ZZ'}{2\sum p_j(1-p_j)}$ and $\mathrm{var}(a) = G\sigma^2_a$ • Division by $2\sum p_j(1-p_j)$ makes $G$ analogous to $A$
  • 39. G matrix from 42K SNPs. Gall (lower triangle shown; the matrix is symmetric) =
      13   0.957
      14  -0.108  0.973
      15   0.452 -0.116  1.182
      16   0.209 -0.058  0.424  1.025
      17   0.234 -0.083  0.425  0.312  1.037
      18  -0.040  0.438  0.097 -0.047 -0.043  1.151
      19  -0.089  0.458  0.039 -0.067 -0.070  0.426  1.175
      20  -0.093  0.460  0.053 -0.058 -0.063  0.432  0.707  1.183
      21   0.077 -0.082  0.064  0.104  0.082 -0.071 -0.069 -0.069  1.031
      22  -0.056  0.418  0.093 -0.046 -0.038  0.408  0.355  0.342 -0.044  1.139
      23  -0.005  0.464 -0.038 -0.035 -0.038  0.206  0.223  0.215  0.011  0.280  0.993
      24  -0.070  0.468  0.075 -0.027 -0.053  0.403  0.521  0.550 -0.079  0.424  0.260  1.198
      25  -0.052  0.416  0.098 -0.009 -0.031  0.386  0.363  0.342 -0.038  0.370  0.219  0.419  1.125
      26  -0.070  0.493 -0.084 -0.039 -0.044  0.258  0.241  0.270 -0.072  0.253  0.178  0.259  0.214  1.009
  • 40. A matrix for the same individuals (lower triangle shown; the matrix is symmetric)
      13  1.008
      14  0.033 1.037
      15  0.545 0.021 1.041
      16  0.288 0.021 0.536 1.016
      17  0.285 0.031 0.541 0.293 1.020
      18  0.047 0.580 0.036 0.028 0.032 1.062
      19  0.033 0.613 0.021 0.021 0.031 0.365 1.095
      20  0.033 0.613 0.021 0.021 0.031 0.365 0.613 1.095
      21  0.099 0.031 0.082 0.118 0.074 0.028 0.031 0.031 1.021
      22  0.046 0.586 0.032 0.031 0.039 0.351 0.373 0.373 0.044 1.068
      23  0.096 0.569 0.067 0.043 0.047 0.329 0.357 0.357 0.042 0.338 1.050
      24  0.041 0.574 0.027 0.019 0.026 0.331 0.406 0.406 0.028 0.335 0.335 1.056
      25  0.033 0.548 0.035 0.039 0.039 0.315 0.336 0.336 0.037 0.321 0.310 0.310 1.029
      26  0.035 0.588 0.023 0.024 0.039 0.337 0.376 0.376 0.036 0.347 0.341 0.348 0.325 1.070
  • 41. GBLUP • The MME are $\begin{bmatrix} X'R^{-1}X & X'R^{-1}W \\ W'R^{-1}X & W'R^{-1}W + G^{-1}\alpha \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ W'R^{-1}y \end{bmatrix}$ • where $\alpha$ now equals $\sigma^2_e/\sigma^2_a$. Solutions for the example data in the earlier table are listed on the next slide • Advantages: – Existing software for genetic evaluation can be used by replacing A with G – The system of equations is of the size of the number of animals, which tends to be smaller than the number of SNPs – In pedigreed populations G discriminates among sibs and other relatives, capturing information on Mendelian sampling – The method is attractive for populations without good pedigree, as G will capture this information among the genotyped individuals
  • 42. Solutions for the example data • Reference animals: 13 = 0.069; 14 = 0.116; 15 = 0.049; 16 = 0.260; 17 = −0.500; 18 = −0.359; 19 = 0.146; 20 = −0.231 • Selection or validation candidates: 21 = 0.028; 22 = 0.115; 23 = −0.240; 24 = 0.143; 25 = 0.054; 26 = 0.353
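
A sketch of GBLUP on the same example, written to show its equivalence with SNP-BLUP. The slides do not say how G was inverted for this small example, and with only 10 SNPs a G built from them is not of full rank for all 14 animals, so this sketch deliberately avoids $G^{-1}$: it uses the equivalent form $\hat{a} = \sigma^2_a G V^{-1}(y - X\hat{b})$ with $V = G\sigma^2_a + I\sigma^2_e$. That substitution, the unweighted analysis and the variable names are assumptions of the sketch, so small differences from the tabulated solutions are possible.

```python
import numpy as np

# SNP genotypes (0/1/2) for animals 13-26, copied from the example table.
M = np.array([
    [2, 0, 1, 1, 0, 0, 0, 2, 1, 2],  # 13
    [1, 0, 0, 0, 0, 2, 0, 2, 1, 0],  # 14
    [1, 1, 2, 1, 1, 0, 0, 2, 1, 2],  # 15
    [0, 0, 2, 1, 0, 1, 0, 2, 2, 1],  # 16
    [0, 1, 1, 2, 0, 0, 0, 2, 1, 2],  # 17
    [1, 1, 0, 1, 0, 2, 0, 2, 2, 1],  # 18
    [0, 0, 1, 1, 0, 2, 0, 2, 2, 0],  # 19
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 0],  # 20
    [2, 0, 0, 0, 0, 1, 2, 2, 1, 2],  # 21
    [0, 0, 0, 1, 1, 2, 0, 2, 0, 0],  # 22
    [0, 1, 1, 0, 0, 1, 0, 2, 2, 1],  # 23
    [1, 0, 0, 0, 1, 1, 0, 2, 0, 0],  # 24
    [0, 0, 0, 1, 1, 2, 0, 2, 1, 0],  # 25
    [1, 0, 1, 1, 0, 2, 0, 1, 0, 0],  # 26
], dtype=float)
dyd = np.array([9.0, 13.4, 12.7, 15.4, 5.9, 7.7, 10.2, 4.8])  # animals 13-20
n_ref = 8
sigma2_a, sigma2_e = 35.241, 245.0

# Genomic relationship matrix G = ZZ' / (2 * sum(p(1-p))) for all 14 animals.
p = M.sum(axis=0) / (2 * M.shape[0])
Z = M - 2 * p
k = 2 * np.sum(p * (1 - p))
G = Z @ Z.T / k

G_rr = G[:n_ref, :n_ref]      # reference x reference
G_vr = G[n_ref:, :n_ref]      # validation x reference

# Phenotypic covariance of the reference records and GLS estimate of the mean.
X = np.ones((n_ref, 1))
V = G_rr * sigma2_a + np.eye(n_ref) * sigma2_e
b_hat = np.linalg.solve(X.T @ np.linalg.solve(V, X), X.T @ np.linalg.solve(V, dyd))

# BLUP of genomic values without inverting G.
v_inv_resid = np.linalg.solve(V, dyd - X @ b_hat)
dgv_ref = sigma2_a * G_rr @ v_inv_resid
dgv_val = sigma2_a * G_vr @ v_inv_resid

print(np.round(b_hat, 3))
print(np.round(dgv_ref, 3))
print(np.round(dgv_val, 3))
```

Because $G\sigma^2_a = ZZ'\sigma^2_g$ when $\sigma^2_g = \sigma^2_a / 2\sum p_j(1-p_j)$, these DGV are algebraically the same as $Z\hat{g}$ from SNP-BLUP, which is the equivalence stated on the GBLUP slides.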
  • 43. Single Step Method • GBLUP computes genomic breeding values only for genotyped animals. How can non-genotyped animals benefit from genomic information? • Let $g_2$ be the genetic (genomic) values of genotyped animals and $g_1$ the genetic values of non-genotyped animals • An estimate of $g_1$ based on genomic information is obtained by regression of $g_1$ on $g_2$, and is added to the information from BLUP through the usual MME
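  • 44. Single Step Method • We define the variance of the vector of $g_1$ (non-genotyped) and $g_2$ (genotyped) as $H = \mathrm{var}\begin{bmatrix} g_1 \\ g_2 \end{bmatrix} = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix} = \begin{bmatrix} A_{11} + A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\ GA_{22}^{-1}A_{21} & G \end{bmatrix}$, where subscript 1 refers to non-genotyped and subscript 2 to genotyped animals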
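  • 45. Single Step Method • The model is just as before but uses all data (genotyped and ungenotyped animals): $y = Xb + Za + e$ • The MME are the usual ones but with $A^{-1}$ replaced by $H^{-1}$: $\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + H^{-1}\alpha \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$ • Surprisingly, $H^{-1}$ has a simple form: $H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}$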
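
A small numerical check of the single-step matrices, as a sketch. The pedigree relationship matrix (two unrelated founders and their two full-sib offspring) and the 2 × 2 genomic matrix G are hypothetical values chosen for illustration. The sketch builds H block by block as defined for the single-step method and compares its inverse with the well-known shortcut $A^{-1}$ plus $G^{-1} - A_{22}^{-1}$ in the genotyped block.

```python
import numpy as np

# Hypothetical pedigree relationships: animals 0 and 1 are unrelated founders
# (not genotyped); animals 2 and 3 are their full-sib offspring (genotyped).
A = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])

# Hypothetical genomic relationship matrix for the two genotyped animals.
G = np.array([[1.02, 0.45],
              [0.45, 0.98]])

non, gen = [0, 1], [2, 3]
A11 = A[np.ix_(non, non)]
A12 = A[np.ix_(non, gen)]
A21 = A12.T
A22 = A[np.ix_(gen, gen)]
A22_inv = np.linalg.inv(A22)

# H built block by block (non-genotyped animals first, then genotyped).
H = np.block([
    [A11 + A12 @ A22_inv @ (G - A22) @ A22_inv @ A21, A12 @ A22_inv @ G],
    [G @ A22_inv @ A21, G],
])

# Shortcut: only the genotyped block of A^-1 changes.
H_inv = np.linalg.inv(A)
H_inv[np.ix_(gen, gen)] += np.linalg.inv(G) - A22_inv

print(np.round(H, 3))
print(np.allclose(np.linalg.inv(H), H_inv))
```

The practical appeal of the shortcut is that H itself never has to be formed or inverted: only $A^{-1}$, which is sparse and cheap to build from the pedigree, and the two matrices for the genotyped animals are needed.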
  • 46. better lives through livestock ilri.org ILRI thanks all donors and organizations who globally supported its work through their contributions to the CGIAR Trust Fund