Genome-wide Association Mapping
Avjinder Singh Kaler
PhD Candidate
Department of Crop, Soil, and Environmental Sciences
University of Arkansas
Nov-15-2016
Plant Breeding Lecture
Identify genomic regions associated with
phenotypes
Phenotypic Data
• Flowering time
• Plant height
• Yield
• Phenotype Variation
• Phenotypes are response
variables
Genotypic Data
• Genomic markers that span the
entire genome
• Single nucleotide
polymorphisms (SNPs) are
commonly used as markers
• Markers are explanatory
variables
Functional Diversity: Phenotype
[Figure: phenotypic variation in plant height and seed color]
Genetic Architecture of Complex Traits
Phenotype (P) = Genotype (G) + Environment (E) + Genotype × Environment interaction (GE)
P = G + E + GE
How do we connect genotype to phenotype?
Functional Diversity: Phenotype Variation
• Linkage mapping: few recombination events, resulting in relatively low mapping resolution
• Association mapping: historical recombination events and natural genetic diversity, resulting in high mapping resolution
GWAS based on Linkage Disequilibrium (LD)
• LD is the non-random correlation or association of alleles at
two loci
• D, D′ (normalized D), and r² are commonly used summary statistics for pairwise LD
• r² is preferred in association studies because it better indicates how well markers correlate with QTL
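The three LD statistics above can be computed from haplotype and allele frequencies. The sketch below is only an illustration: the function name `ld_stats` and the example frequencies are assumptions, not taken from the lecture, and both loci are assumed to be biallelic and polymorphic.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B (both assumed strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it can reach given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the alleles at the two loci
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies, chosen only for illustration
d, d_prime, r2 = ld_stats(p_ab=0.4, p_a=0.5, p_b=0.5)
print(d, d_prime, r2)
```

With these made-up frequencies D′ is larger than r², which is typical: D′ reflects recombination history, while r² reflects how well one locus predicts the other.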
Visualize extent of LD between pairs of loci
[Figure panels: LD decay; LD block (haplotype view)]
Genome-wide association study (GWAS)
• Identify genomic regions associated with a phenotype
• Fit a statistical model at each SNP in genome
• Use fitted models to test H0: no association between SNP and phenotype
Associating SNPs with phenotypes
• At each SNP: Conduct a test of association with trait
• Significant SNP/trait association suggests:
– SNP has direct biological function (functional polymorphism)
– SNP in LD with functional polymorphism(s)
[Figure: six lines (Line 1 to Line 6) genotyped at five SNPs: A/C, T/C, G/A, A/G, G/T]
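As a rough sketch of a single-SNP test of association, the code below regresses a phenotype on allele dosage and returns the slope and its t statistic. It is a naive model with no correction for population structure or relatedness; the function name, the 0/2 dosage coding for inbred lines, and the plant heights are all assumptions made for illustration.

```python
import math


def snp_t_statistic(genotypes, phenotypes):
    """Simple linear regression of phenotype on allele dosage.

    Returns (slope, t statistic for H0: slope = 0).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    # Residual sum of squares and the standard error of the slope
    rss = sum((y - intercept - slope * g) ** 2 for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


# Six hypothetical inbred lines: dosage 0 or 2 of one allele, made-up plant heights (cm)
dosage = [0, 0, 0, 2, 2, 2]
height = [150.0, 152.0, 151.0, 160.0, 162.0, 161.0]
slope, t_stat = snp_t_statistic(dosage, height)
print(slope, t_stat)
```

A p-value would come from comparing the t statistic with a t distribution on n − 2 degrees of freedom, which needs a statistics library and is left out here.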
Genetic diversity can lead to false positives in a GWAS
• Two sources for false positives:
– Population structure: allele frequency differences among individuals due to local adaptation or diversifying selection
– Familial relatedness: allele frequency differences among individuals due to recent co-ancestry
Genetic Diversity of 2,815 Maize Inbreds
[Figure: Principal Coordinate 1 vs. Principal Coordinate 2; Romay et al. (2013)]
Controlling False Positives due to Population Structure
• STRUCTURE (Q)
– Identifies subpopulations within a sample of individuals collected from a population of unknown structure
– Estimates the Q-matrix
– Time-consuming
• Principal Component Analysis
– Fast and effective approach to diagnose population structure
– PCA summarizes variation observed across all markers into a smaller number of underlying component variables
– Estimates the PC matrix
Principal Component Analysis
• Scree plot: shows the fraction of total variance in the data explained by each PC
• PCs are selected based on the L-curve (the elbow of the scree plot)
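The quantity plotted in a scree plot is each PC's share of the total variance. A minimal sketch, assuming the eigenvalues of the marker covariance matrix have already been computed elsewhere (the values below are hypothetical):

```python
from itertools import accumulate


def variance_explained(eigenvalues):
    """Fraction of total variance explained by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Hypothetical eigenvalues, sorted from largest to smallest
eigenvalues = [8.0, 4.0, 2.0, 1.0, 1.0]
fractions = variance_explained(eigenvalues)
cumulative = list(accumulate(fractions))

for pc, (frac, cum) in enumerate(zip(fractions, cumulative), start=1):
    print(f"PC{pc}: {frac:.3f} of variance, {cum:.3f} cumulative")
```

Plotting `fractions` against the PC number gives the scree plot; the point where the curve flattens is where additional PCs stop adding much.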
Controlling False Positives due to Familial Relatedness
• A kinship coefficient (F) is the probability that two homologous genes are identical by descent
• Kinship from genetic markers is an estimate of relative kinship based on probabilities of identity by state
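One simple way to see what a marker-based relatedness matrix looks like is a raw identity-by-state (IBS) similarity between every pair of individuals. This is a sketch under assumptions (0/1/2 dosage coding, `None` for missing calls, made-up genotypes); GWAS packages apply their own kinship estimators and scaling, which may differ from this.

```python
def ibs_similarity(g1, g2):
    """Mean identity by state between two individuals.

    Genotypes are allele dosages (0, 1, 2); None marks a missing call.
    """
    shared = 0.0
    used = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip markers missing in either individual
        shared += 1.0 - abs(a - b) / 2.0
        used += 1
    return shared / used


def ibs_matrix(genotypes):
    """Symmetric matrix of pairwise IBS similarities."""
    n = len(genotypes)
    return [[ibs_similarity(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]


# Three hypothetical inbred lines scored at five SNPs
lines = [
    [0, 2, 2, 0, 2],
    [0, 2, 0, 0, 2],
    [2, 0, 0, 2, 0],
]
K = ibs_matrix(lines)
for row in K:
    print(row)
```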
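Mixed models reduce false positives in GWAS
$$Y_i = \mu + \sum_{j} \beta_j \, \mathrm{PC}_{ij} + \alpha \, x_i + \mathrm{Line}_i + \varepsilon_i$$
• $Y_i$: phenotype of the $i$th individual
• $\mu$: grand mean
• $\sum_{j} \beta_j \, \mathrm{PC}_{ij}$: fixed effects; account for population structure
• $\alpha$: marker effect
• $x_i$: observed SNP alleles of the $i$th individual
• $\mathrm{Line}_i$: random effects; account for familial relatedness
• $\varepsilon_i$: random error term
• $(\mathrm{Line}_1, \ldots, \mathrm{Line}_n) \sim \mathrm{MVN}(0, 2K\sigma^2_G)$
• $K$ = kinship matrix; measures relatedness between individuals
• $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2_E)$
Yu et al. (2006)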
Association Mapping Pipeline
Germplasm Selection
• Choice of germplasm is critical to the success of the association analysis
Phenotyping
• Design the experiment
• Collect high-quality phenotypic data
Phenotypic Outliers
•Outliers are “unusual” data points that substantially
deviate from the mean and strongly influence
parameter estimates
•Should ALWAYS check for outliers in our data sets
• Do NOT ignore outliers if detected
Phenotypic Outliers
• Outliers can
• increase error variance
• reduce the power of statistical tests
• distort estimates
• decrease normality if non-randomly distributed
• Potential Causes of Outliers
• Human errors in data collection, recording, or entry
• Technical errors from faulty or non-calibrated phenotyping equipment
• Intentional or motivated mis-reporting such as “speed” phenotyping in a
hot field environment
Evaluate Data for Outliers
•Histogram
•Box-plot (Box and Whisker plot)
•Quantile-Quantile plot – graphical method for
comparing two probability distributions to assess
goodness-of-fit
Get to know your data!
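The box-plot check can also be done numerically with the usual 1.5 × IQR whisker rule. A minimal sketch, assuming made-up plant heights and the default quartile method of Python's `statistics.quantiles` (other software may compute quartiles slightly differently):

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the box-plot whiskers (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical plant heights (cm); the last value mimics a data-entry error
heights = [150.0, 152.0, 151.0, 153.0, 149.0, 150.0, 152.0, 151.0, 190.0]
outliers = iqr_outliers(heights)
print(outliers)
```

A flagged value is only a candidate: check the field book or raw records before deciding what to do with it.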
Statistical Identification of Outliers
• Cook’s distance: measures the influence of a data point; flags data points that substantially change effect estimates.
• Deleted studentized residuals: measure how far a data point lies from the least squares fit obtained without it; flags data points with outlying responses.
Two of several possible methods
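Both diagnostics have closed forms for a simple linear regression, so they can be sketched without a statistics package. The function name, the single covariate `x`, and the data below are assumptions for illustration; a real phenotypic analysis would compute these from the full experimental-design model.

```python
import math


def regression_diagnostics(x, y):
    """Cook's distance and deleted studentized residuals for the fit y = a + b*x."""
    n, p = len(x), 2  # p = number of fitted coefficients (intercept and slope)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rss = sum(e * e for e in resid)
    mse = rss / (n - p)
    cooks, deleted = [], []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mean_x) ** 2 / sxx  # leverage (hat value) of this point
        cooks.append(e * e / (p * mse) * h / (1 - h) ** 2)
        deleted.append(e * math.sqrt((n - p - 1) / (rss * (1 - h) - e * e)))
    return cooks, deleted


# Hypothetical data: the last observation departs from an otherwise linear trend
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 20.0]
cooks, deleted = regression_diagnostics(x, y)
print(cooks)
print(deleted)
```

In this made-up data set the last point has by far the largest Cook's distance and deleted studentized residual, so both methods point to the same observation.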
Removal of Outliers
•Removing anomalous data points from data sets is
controversial to some folks.
•If outliers are not removed, inferences made from the
fitted model may not be representative of the
population under study.
•If you remove outliers, then be sure to report it in the
manuscript.
Non-Normal Trait Data
•When fitting a mixed model, two very important
assumptions are that the error terms follow a normal
distribution and that there is a constant variance.
•When data are non-normal, these two assumptions in
particular could be violated.
Analysis of Non-Normal Trait Data
•Generalized linear mixed models can be used to
analyze non-normal data
•The Box-Cox procedure can be used to find the most
appropriate transformation that corrects for non-
normality of the error terms and unequal variances.
Box-Cox Transformation
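A minimal sketch of the Box-Cox procedure: transform the data for each candidate λ and keep the λ with the highest profile log-likelihood. For simplicity this version assumes a mean-only model applied directly to strictly positive trait values, not to the error terms of a fitted mixed model; the grid of λ values and the data are hypothetical.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def profile_log_likelihood(values, lam):
    """Profile log-likelihood of lambda under a normal, mean-only model."""
    n = len(values)
    t = box_cox(values, lam)
    mean_t = sum(t) / n
    var_t = sum((ti - mean_t) ** 2 for ti in t) / n  # maximum-likelihood variance
    return -n / 2.0 * math.log(var_t) + (lam - 1.0) * sum(math.log(v) for v in values)


def best_box_cox_lambda(values, grid):
    """Lambda on the grid that maximizes the profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))


# Hypothetical right-skewed trait values (symmetric on the log scale)
trait = [math.exp(z) for z in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
grid = [i / 10 for i in range(-20, 21)]  # lambda from -2.0 to 2.0 in steps of 0.1
best_lambda = best_box_cox_lambda(trait, grid)
print(best_lambda)
```

A selected λ near 0 suggests a log transformation, near 0.5 a square root, and near 1 no transformation.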
Association Mapping Pipeline
Genotyping
• SNPs most commonly used in association mapping
Genotype-Quality Control
• Remove monomorphic markers
• Remove markers with low minor allele frequency (e.g., MAF < 5% or < 3%)
• Remove markers with a high missing rate (e.g., > 10%)
• Impute remaining missing data (LD-kNNi, FILLIN, FSHAP, BEAGLE)
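The filtering steps above can be sketched as a single per-marker check. The function name, the 0/1/2 dosage coding with `None` for missing calls, and the example markers are assumptions; the thresholds are the example values from the slide.

```python
def marker_passes_qc(calls, maf_min=0.05, missing_max=0.10):
    """Return True if a marker survives the missing-rate and MAF filters.

    calls: allele dosages (0, 1, 2) per individual; None marks a missing call.
    """
    observed = [c for c in calls if c is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed or missing_rate > missing_max:
        return False
    freq = sum(observed) / (2 * len(observed))  # frequency of the counted allele
    maf = min(freq, 1 - freq)
    # Monomorphic markers have MAF = 0, so this filter removes them as well
    return maf >= maf_min


# Four hypothetical markers scored on 20 individuals
markers = {
    "snp1": [0, 2] * 10,                 # common, complete data
    "snp2": [0] * 20,                    # monomorphic
    "snp3": [None] * 3 + [0, 2] * 8 + [0],  # 15% missing
    "snp4": [1] + [0] * 19,              # MAF = 2.5%
}
kept = [name for name, calls in markers.items() if marker_passes_qc(calls)]
print(kept)
```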
Controlling False Positives
• Population structure: allele frequency differences among individuals due to local adaptation or diversifying selection
• Familial relatedness: allele frequency differences among individuals due to recent co-ancestry
• If not properly controlled, both can cause spurious associations in GWAS
Controlling False Positives
• Population structure
• STRUCTURE (Q-matrix)
• Principal Component Analysis (PCs-matrix)
• Familial relatedness
• Kinship matrix
Association Mapping Pipeline
What is a significant association?
• Bonferroni correction: procedure to control the family-wise error rate (FWER), i.e., the probability of making one or more type I errors
– Simplest and most conservative method to control FWER
– The per-test threshold is α/n, where n is the number of hypotheses (i.e., SNPs tested)
• False Discovery Rate (FDR): procedure to control the expected proportion of false discoveries
– Less stringent than Bonferroni
– The q-value is the FDR analogue of the p-value; e.g., q = 0.10 means about 10 expected false discoveries per 100 SNPs declared significant
• Use the list of p-values from ALL SNP tests as input to the R function p.adjust or to packages such as qvalue, fdrtool, and others
Slide adapted from Prof. Jim Holland
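To make the two corrections concrete, here is a small sketch of the Bonferroni threshold and the Benjamini-Hochberg adjustment, one common FDR procedure. The p-values are made up, and the function names are illustrative assumptions.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the FWER at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five SNP tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [p for p in p_values if p < threshold]
bh_adjusted = benjamini_hochberg(p_values)
fdr_hits = [p for p, q in zip(p_values, bh_adjusted) if q <= 0.10]
print(threshold, bonferroni_hits)
print(bh_adjusted, fdr_hits)
```

With these made-up values Bonferroni keeps two SNPs while the FDR procedure at 0.10 keeps four, which illustrates why FDR is described as less stringent.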
Genome-wide Association Mapping Results
Manhattan plot: summarize GWAS results
Genome-wide Association Mapping Results
QQ-plot: assess performance of the statistical model
[Figure panels: simple model without correction for population structure; mixed linear model]
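The points of a QQ-plot pair each observed −log10(p) with the value expected if no SNP were associated, i.e., if p-values were uniform. A minimal sketch, assuming expected quantiles of i/(n + 1) (one of several common conventions) and made-up p-values:

```python
import math


def qq_points(p_values):
    """(expected, observed) pairs of -log10(p), sorted in ascending order."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Under the null hypothesis the i-th smallest p-value is expected near i / (n + 1)
    expected = sorted(-math.log10(i / (n + 1)) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Hypothetical p-values that match the null expectation exactly
points = qq_points([0.2, 0.4, 0.6, 0.8])
for exp_val, obs_val in points:
    print(f"{exp_val:.3f} {obs_val:.3f}")
```

Points on the diagonal indicate a well-calibrated model; observed values rising above the diagonal across most of the range suggest inflation, for example from uncorrected population structure.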
Genome-wide Association Mapping Results
GWAS results for all SNPs that were analyzed
Software for GWAS
• TASSEL
• GAPIT
• PLINK
• GEMMA
• FarmCPU
• JMP Genomics
• https://omictools.com/gwas-category
• Tutorials
– http://www.slideshare.net/AvjinderSingh/basic-tutorial-of-association-mapping-by-avjinder-kaler
– http://www.slideshare.net/AvjinderSingh/tutorial-for-association-mapping-with-farm-cpu

How to Add Barcode on PDF Report in Odoo 17
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 

Genome wide association mapping

  • 1. Genome-wide Association Mapping. Avjinder Singh Kaler, PhD Candidate, Department of Crop, Soil, and Environmental Sciences, University of Arkansas. Nov-15-2016. Plant Breeding Lecture.
  • 2. Identify genomic regions associated with phenotypes. Phenotypic Data: • Flowering time • Plant height • Yield • Phenotype variation • Phenotypes are response variables. Genotypic Data: • Genomic markers that span the entire genome • Single nucleotide polymorphisms (SNPs) are commonly used as markers • Markers are explanatory variables.
  • 3.
  • 5. Genetic Architecture of Complex Traits: Phenotype = Genotype + Environment + Genotype × Environment interaction (P = G + E + GE)
  • 6. How do we connect genotype to phenotype? Functional Diversity: Phenotype Variation
  • 7. • Few recombination events, resulting in relatively low mapping resolution
  • 8. • Historical recombination events and natural genetic diversity, resulting in high mapping resolution
  • 9.
  • 10. GWAS based on Linkage Disequilibrium (LD) • LD is the non-random association (correlation) of alleles at two loci • D, D′ (normalized D), and r² are commonly used summary statistics for pairwise LD • r² is preferred in association studies because it is more indicative of how markers might correlate with QTL
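The three LD statistics named on this slide can be computed directly from phased haplotypes. The following is a minimal sketch, assuming two biallelic loci with alleles coded 0/1; the function name `ld_stats` and the coding are illustrative choices, not part of the lecture. D′ is returned with its sign; many tools report its absolute value.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D, D', r^2) for two biallelic loci.

    `haplotypes` is a list of (allele_at_locus_1, allele_at_locus_2) tuples,
    with alleles coded 0/1 (a hypothetical coding chosen for this sketch).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at locus 2
    p_ab = counts[(1, 1)] / n                 # frequency of the 1-1 haplotype
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic")

    # D: deviation of the haplotype frequency from its expectation
    # under independence of the two loci.
    d = p_ab - p_a * p_b

    # D': D scaled by the largest magnitude it can reach
    # given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max

    # r^2: squared correlation between the alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Illustrative data: 10 haplotypes, mostly 1-1 and 0-0 combinations.
example = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0), (0, 1)]
d, d_prime, r2 = ld_stats(example)
print(round(d, 3), round(d_prime, 3), round(r2, 3))
```

Because r² depends on allele frequencies at both loci, it can be well below 1 even when D′ is high, which is one reason it tracks marker-QTL correlation more closely.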
  • 11. Visualize extent of LD between pairs of loci: LD decay plot and LD block (haplotype view)
  • 12.
  • 13. Genome-wide association study (GWAS) • Identify genomic regions associated with a phenotype • Fit a statistical model at each SNP in the genome • Use the fitted models to test H0: no association between SNP and phenotype
  • 14. Associating SNPs with phenotypes • At each SNP: conduct a test of association with the trait • A significant SNP/trait association suggests: – SNP has direct biological function (functional polymorphism) – SNP is in LD with functional polymorphism(s) [Figure: SNP alleles (A/C, T/C, G/A, A/G, G/T) scored across Lines 1 to 6]
  • 15. Genetic diversity can lead to false positives in a GWAS • Two sources of false positives: – Population structure: allele frequency differences among individuals due to local adaptation or diversifying selection – Familial relatedness: allele frequency differences among individuals due to recent co-ancestry [Figure: Genetic diversity of 2,815 maize inbreds, Principal Coordinate 1 vs. Principal Coordinate 2; Romay et al. (2013)]
  • 16. Controlling False Positives due to Population Structure • STRUCTURE (Q) • Identifies subpopulations within a sample of individuals collected from a population of unknown structure • Estimates the Q-matrix • Time-consuming • Principal Component Analysis (PCA) • Fast and effective approach to diagnose population structure • Summarizes variation observed across all markers into a smaller number of underlying component variables • Estimates the PCs-matrix
  • 17. Principal Component Analysis • Scree plot: shows the fraction of total variance in the data explained by each PC • PCs are selected based on the L-curve (the elbow of the scree plot)
  • 18. Controlling False Positives due to Familial Relatedness • A kinship coefficient (F) is the probability that two homologous genes are identical by descent (IBD) • Kinship from genetic markers is an estimate of relative kinship based on the probability that alleles are identical by state (IBS)
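A marker-based similarity of the kind this slide describes can be sketched as the average proportion of alleles two lines share identical by state. This is only one simple variant, assuming genotypes coded as 0/1/2 copies of one allele and `None` for missing calls; it is not the estimator used by any particular GWAS package, and the names `ibs_similarity` and `ibs_matrix` are illustrative.

```python
def ibs_similarity(g1, g2):
    """Mean identity-by-state between two individuals.

    Genotypes are coded as 0, 1, or 2 copies of one allele; None marks a
    missing call (assumed coding). Per marker, the shared proportion is
    1 - |g1 - g2| / 2, so identical homozygotes score 1 and opposite
    homozygotes score 0.
    """
    shared = [
        1 - abs(a - b) / 2
        for a, b in zip(g1, g2)
        if a is not None and b is not None
    ]
    if not shared:
        raise ValueError("no markers called in both individuals")
    return sum(shared) / len(shared)


def ibs_matrix(genotypes):
    """Pairwise IBS similarity for a dict {line_name: genotype_list}."""
    names = list(genotypes)
    return {
        a: {b: ibs_similarity(genotypes[a], genotypes[b]) for b in names}
        for a in names
    }


# Illustrative genotypes for three inbred lines at four SNPs.
lines = {
    "Line1": [0, 2, 2, 0],
    "Line2": [0, 2, 0, 0],
    "Line3": [2, 0, 0, 2],
}
for name, row in ibs_matrix(lines).items():
    print(name, row)
```

A matrix like this plays the role of K in the mixed model: pairs of lines with high similarity are expected to have correlated phenotypes regardless of the SNP being tested.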
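  • 19. Mixed models reduce false positives in GWAS (Yu et al., 2006) • $Y_i = \mu + \sum_j \beta_j Q_{ij} + \alpha x_i + \mathrm{Line}_i + \varepsilon_i$, where $Y_i$ is the phenotype of the $i$th individual, $\mu$ the grand mean, $\sum_j \beta_j Q_{ij}$ the fixed effects that account for population structure, $\alpha$ the marker effect, $x_i$ the observed SNP alleles of the $i$th individual, $\mathrm{Line}_i$ the random effects that account for familial relatedness, and $\varepsilon_i$ the random error term • $(\mathrm{Line}_1, \ldots, \mathrm{Line}_n) \sim \mathrm{MVN}(0, 2K\sigma^2_G)$ • $K$ = kinship matrix, which measures relatedness between individuals • $\varepsilon_i \sim$ i.i.d. $N(0, \sigma^2_E)$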
  • 21. Germplasm Selection • Choice of germplasm is critical to the success of the association analysis • Phenotyping • Experimental design • Collection of high-quality phenotypic data
  • 22. Phenotypic Outliers •Outliers are “unusual” data points that substantially deviate from the mean and strongly influence parameter estimates •Should ALWAYS check for outliers in our data sets • Do NOT ignore outliers if detected
  • 23. Phenotypic Outliers • Outliers can • increase error variance • reduce the power of statistical tests • distort estimates • decrease normality if non-randomly distributed • Potential Causes of Outliers • Human errors in data collection, recording, or entry • Technical errors from faulty or non-calibrated phenotyping equipment • Intentional or motivated mis-reporting such as “speed” phenotyping in a hot field environment
  • 24. Evaluate Data for Outliers •Histogram •Box-plot (Box and Whisker plot) •Quantile-Quantile plot – graphical method for comparing two probability distributions to assess goodness-of-fit Get to know your data!
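The box-plot check can also be applied numerically. The sketch below flags values beyond the usual whisker limits (1.5 × IQR outside the quartiles); the multiplier, the quartile method, and the plant-height numbers are assumptions made for illustration, and a flagged value is a prompt to inspect the record, not an automatic deletion.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Lower and upper box-plot fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the box-plot fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical plant heights (cm); 160 might be a data-entry error.
heights = [98, 101, 102, 104, 105, 107, 109, 110, 160]
print(iqr_fences(heights))
print(flag_outliers(heights))
```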
  • 25. Statistical Identification of Outliers • Cook's distance: measures the influence of a data point, i.e., how much the effect estimates change when that point is removed • Deleted studentized residuals: measure how far a data point lies from the least-squares fit obtained without it • These are two of several possible methods
  • 26. Removal of Outliers •Removing anomalous data points from data sets is controversial to some folks. •If outliers are not removed, inferences made from the fitted model may not be representative of the population under study. •If you remove outliers, then be sure to report it in the manuscript.
  • 27. Non-Normal Trait Data •When fitting a mixed model, two very important assumptions are that the error terms follow a normal distribution and that there is a constant variance. •When data are non-normal, these two assumptions in particular could be violated.
  • 28. Analysis of Non-Normal Trait Data • Generalized linear mixed models can be used to analyze non-normal data • The Box-Cox procedure can be used to find the most appropriate transformation that corrects for non-normality of the error terms and unequal variances.
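The Box-Cox procedure searches a family of power transformations for the exponent λ that makes the data look most normal. A minimal sketch follows, assuming strictly positive trait values, a plain i.i.d. sample rather than residuals from a fitted mixed model, and a coarse grid of candidate λ values; all function names are illustrative.

```python
import math


def boxcox(values, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def boxcox_loglik(values, lam):
    """Profile log-likelihood of lam under a normal model (up to a constant)."""
    n = len(values)
    transformed = boxcox(values, lam)
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n  # ML estimate
    # The second term is the log-Jacobian of the transformation.
    return -0.5 * n * math.log(variance) + (lam - 1) * sum(
        math.log(v) for v in values
    )


def best_lambda(values, grid=None):
    """Pick the lam on the grid with the highest profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0 to 2.0 in steps of 0.1
    return max(grid, key=lambda lam: boxcox_loglik(values, lam))


# Illustrative right-skewed data; a log-like transform (lam near 0) fits best.
trait = [1, 2, 4, 8, 16, 32, 64]
lam = best_lambda(trait)
print(lam, boxcox(trait, lam))
```

In practice λ is usually rounded to an interpretable value (for example 0 for log or 0.5 for square root), and the transformation should be reported alongside the results.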
  • 31. Genotyping • SNPs most commonly used in association mapping
  • 32. Genotype Quality Control • Remove monomorphic markers • Remove markers with minor allele frequency (MAF) below a threshold (e.g., < 5% or < 3%) • Remove markers with a high missing rate (e.g., > 10%) • Impute remaining missing data (LD-kNNi, FILLIN, FSHAP, BEAGLE)
  • 33. Controlling False Positives • Population structure: allele frequency differences among individuals due to local adaptation or diversifying selection • Familial relatedness: allele frequency differences among individuals due to recent co-ancestry • If not properly controlled, both can cause spurious associations in GWAS
  • 34. Controlling False Positives • Population structure • STRUCTURE (Q-matrix) • Principal Component Analysis (PCs-matrix) • Familial relatedness • Kinship matrix
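The filtering steps on this slide can be sketched as a single pass over the markers. The example assumes genotypes coded as 0/1/2 copies of one allele with `None` for missing calls, and uses the 5% MAF and 10% missing-rate thresholds from the slide as defaults; the function names and marker names are illustrative, and imputation is left to the dedicated tools listed above.

```python
def marker_stats(genotypes):
    """Return (missing_rate, minor_allele_frequency) for one marker."""
    called = [g for g in genotypes if g is not None]
    missing_rate = 1 - len(called) / len(genotypes)
    if not called:
        return missing_rate, 0.0
    # Each diploid individual carries two alleles at the marker.
    freq = sum(called) / (2 * len(called))
    return missing_rate, min(freq, 1 - freq)


def filter_markers(markers, min_maf=0.05, max_missing=0.10):
    """Split markers into kept names and a {name: reason} dict of dropped ones."""
    kept, dropped = [], {}
    for name, genotypes in markers.items():
        missing_rate, maf = marker_stats(genotypes)
        if missing_rate > max_missing:
            dropped[name] = "high missing rate"
        elif maf == 0:
            dropped[name] = "monomorphic"
        elif maf < min_maf:
            dropped[name] = "low MAF"
        else:
            kept.append(name)
    return kept, dropped


# Hypothetical markers scored on 20 individuals.
markers = {
    "snp_ok": [0, 2] * 10,
    "snp_mono": [0] * 20,
    "snp_rare": [1] + [0] * 19,
    "snp_missing": [None] * 3 + [0, 2] * 8 + [0],
}
kept, dropped = filter_markers(markers)
print(kept)
print(dropped)
```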
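  • 36. Mixed models reduce false positives in GWAS (Yu et al., 2006) • $Y_i = \mu + \sum_j \beta_j Q_{ij} + \alpha x_i + \mathrm{Line}_i + \varepsilon_i$, where $Y_i$ is the phenotype of the $i$th individual, $\mu$ the grand mean, $\sum_j \beta_j Q_{ij}$ the fixed effects that account for population structure, $\alpha$ the marker effect, $x_i$ the observed SNP alleles of the $i$th individual, $\mathrm{Line}_i$ the random effects that account for familial relatedness, and $\varepsilon_i$ the random error term • $(\mathrm{Line}_1, \ldots, \mathrm{Line}_n) \sim \mathrm{MVN}(0, 2K\sigma^2_G)$ • $K$ = kinship matrix, which measures relatedness between individuals • $\varepsilon_i \sim$ i.i.d. $N(0, \sigma^2_E)$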
  • 37. What is a significant association? • Bonferroni correction: procedure to control the family-wise error rate (FWER, i.e., the probability of making one or more type I errors) – Simplest and most conservative method to control FWER – Threshold calculated as α/n, where n is the number of hypotheses (i.e., SNPs tested) • False Discovery Rate (FDR): procedure to control the expected proportion of false discoveries – Less stringent than Bonferroni – The q-value is the FDR analogue of the p-value, e.g., q = 0.10 means about 10 expected false discoveries per 100 tests declared significant • Use the list of p-values from ALL SNP tests as input to the R function p.adjust or the packages qvalue, fdrtool, and others. Slide adapted from Prof. Jim Holland
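Both corrections are short enough to write out. The sketch below computes the Bonferroni threshold α/n and Benjamini-Hochberg adjusted p-values, which is the adjustment R's `p.adjust` performs with `method = "BH"`; it is not the q-value estimator of the qvalue package, and the p-values are made up for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the FWER at alpha."""
    return alpha / n_tests


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * n / rank over this rank and all larger ranks.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five SNP tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
threshold = bonferroni_threshold(0.05, len(pvals))
print([p < threshold for p in pvals])          # Bonferroni decisions
print([round(q, 5) for q in bh_adjust(pvals)]) # BH adjusted p-values
```

SNPs whose adjusted p-value falls below the chosen FDR level (for example 0.10) are declared significant; the full list of tests must be supplied, because the adjustment depends on n.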
  • 38. Genome-wide Association Mapping Results Manhattan plot: summarize GWAS results
  • 39. Genome-wide Association Mapping Results • QQ-plot: assess performance of the statistical model [Figure panels: simple model without correcting for population structure; mixed linear model]
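The coordinates behind a GWAS QQ-plot can be sketched as follows: observed p-values are sorted and paired with the quantiles expected if no SNP were associated (uniform p-values), both on the −log10 scale. The plotting position rank/(n + 1) is one common convention and an assumption here; the function name is illustrative and the plotting itself is left to any graphics library.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs for a QQ-plot.

    Pairs are ordered from the smallest p-value (largest -log10 value)
    to the largest. All p-values must be greater than 0.
    """
    n = len(pvalues)
    points = []
    for rank, p in enumerate(sorted(pvalues), start=1):
        expected_p = rank / (n + 1)  # expected quantile under the null
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Hypothetical p-values from three SNP tests.
for expected, observed in qq_points([0.5, 0.01, 0.2]):
    print(round(expected, 3), round(observed, 3))
```

Points that follow the diagonal indicate a well-calibrated model; a systematic rise above it across most SNPs is the pattern expected when population structure or relatedness is left uncorrected.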
  • 40. Genome-wide Association Mapping Results GWAS results for all SNPs that were analyzed
  • 41. Software for GWAS • TASSEL • GAPIT • PLINK • GEMMA • FARMCPU • JMP Genomics • https://omictools.com/gwas-category • Tutorials – http://www.slideshare.net/AvjinderSingh/basic-tutorial-of-association-mapping-by-avjinder-kaler – http://www.slideshare.net/AvjinderSingh/tutorial-for-association-mapping-with-farm-cpu