Comparative analysis including
phylogeny in R
AE Zanne et al. Nature 000, 1-4 (2013) doi:10.1038/nature12872
Time-calibrated maximum-likelihood estimate of the molecular phylogeny for 31,749 species of seed plants.
Bianca A. Santini
@myoldowlisdead
b.santini@sheffield.ac.uk
What is a phylogeny?
• A hypothesis that explains the evolutionary relationships among taxa*
*taxa: species or higher taxonomic levels; also genes or sequences
[Figure: rooted tree with tips A, B, C and D, labelling the root, a node, internal branches, external branches and the terminals/tips/leaves.]
Modified from Nature Scitable:
http://www.nature.com/scitable/topicpage/reading-a-phylogenetic-tree-the-meaning-of-41956#
Why use phylogenies in comparative analysis?
• Comparative analyses are used to assess the ecological significance of a
particular trait
• However, because species share an evolutionary history:
a) they are not statistically independent data points
b) the feature under study might exist
because of shared ancestry
One example of data analyzed without a
phylogeny
• Salisbury’s data (1927, yes, 88 years ago!)
• Observations
– Stomatal density (SD) differs between
sun (higher) and shade (lower) leaves
• Measured stomatal density and related
it to life form and habitat type
• Conclusion: SD increases with exposure
[Figure: SD by life form (trees, shrubs, herbs; woody plants vs. herbs) and by habitat (marginal herbs vs. understory herbs).]
Photo by A. Vazquez-Lobo
Re-analysis of Salisbury’s data
• Independent contrasts (Felsenstein,
1985)
– Introduces a phylogeny
– Changes in the trait along the branches of the
tree should be associated with changes in the
explanatory variable
– Compare the no. of times the trait changes in concert
with the environmental variable
(agreements)
vs.
the no. of times it does not (disagreements)
– Sign test
[Figure: SD by life form (trees, shrubs, herbs; woody plants vs. herbs) and by habitat (marginal herbs vs. understory herbs), re-analysed.]
Kelly and Beerling, 1995 (68 years later)
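The contrasts-plus-sign-test idea can be sketched in R with the ape package. This is only an illustration on a simulated tree and simulated traits (the names `sd.trait` and `exposure` are made up here), not a reproduction of Kelly and Beerling's analysis.

```r
library(ape)

set.seed(1)
tree     <- rtree(30)          # random, fully dichotomous 30-tip tree (illustrative only)
sd.trait <- rTraitCont(tree)   # simulated "stomatal density"
exposure <- rTraitCont(tree)   # simulated "exposure" score

# one contrast per internal node, computed in the same direction for both variables
pic.sd  <- pic(sd.trait, tree)
pic.exp <- pic(exposure, tree)

# a contrast "agrees" when both variables change in the same direction
agreements    <- sum(sign(pic.sd) == sign(pic.exp))
disagreements <- length(pic.sd) - agreements

# sign test: do agreements depart from the 50:50 split expected by chance?
sign.test <- binom.test(agreements, agreements + disagreements, p = 0.5)
sign.test$p.value
```

With real data, `sd.trait` and `exposure` would be named vectors whose names match `tree$tip.label`.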
How to get your own phylogeny?
phylomatic (uses a megatree)
http://phylodiversity.net/phylomatic/
Use and trim an already published
phylogeny:
- Dryad: http://datadryad.org/
- Ecological Archives (from the ESA)
phyloGenerator (uses GenBank sequences)
http://willpearse.github.io/phyloGenerator/
These options are for when you don’t have the sequences and are
not planning to get them.
Package caper: pgls()
Similar approach to independent contrasts, but uses the matrix
of variances and covariances implied by the tree
N.B. If you are interested in phylogenetic and evolutionary
analyses, see also: geiger, adephylo, picante, phylolm, ape…
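To make the "matrix of variances and covariances" concrete, here is a small sketch using `vcv()` from ape on a made-up three-tip tree. The toy tree is an assumption for illustration only.

```r
library(ape)

# toy tree: A and B share one unit of history; C splits off at the root
toy <- read.tree(text = "((A:1,B:1):1,C:2);")

# expected trait (co)variances among tips under Brownian motion
V <- vcv(toy)
V
# diagonal     = root-to-tip distance of each species
# off-diagonal = branch length shared by a pair of species
```

Closely related species (A and B) get a positive covariance, while species that share no history below the root (A and C) get zero.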
pgls: phylogenetic generalized least squares
What do you need?
1. Phylogeny
2. Data
– Make sure the species names in your data frame match the tip labels of your tree
e.g. your data: Juncus bufonius
tree: Juncus_bufonius
> my.data$underscore.name <- gsub(" ", "_", my.data$underscore.name)
– Make sure you have one observation per species per trait:
Species names     Leaf area   Seed mass
Juncus_bufonius   120.2       0.24
Setaria_pumila    91.2        6.91
3. Put them into a comparative.data()
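The two data checks above can be sketched in base R. The raw values and the `tip.labels` vector below are hypothetical stand-ins (in practice you would use `tree$tip.label`), and averaging repeated measurements is just one possible way to get one row per species.

```r
# hypothetical raw data: one species measured twice, names separated by spaces
my.data <- data.frame(
  species   = c("Juncus bufonius", "Juncus bufonius", "Setaria pumila"),
  leaf.area = c(118.4, 122.0, 91.2),
  seed.mass = c(0.22, 0.26, 6.91)
)
# stand-in for tree$tip.label
tip.labels <- c("Juncus_bufonius", "Setaria_pumila", "Cyperus_fuscus")

# match the naming style of the tree
my.data$underscore.name <- gsub(" ", "_", my.data$species)

# collapse to one row per species by taking trait means
dat <- aggregate(cbind(leaf.area, seed.mass) ~ underscore.name,
                 data = my.data, FUN = mean)

setdiff(dat$underscore.name, tip.labels)  # species in the data but not in the tree
setdiff(tip.labels, dat$underscore.name)  # tips without data
anyDuplicated(dat$underscore.name)        # 0 means one row per species
```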
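#1) PHYLOGENY
> library(ape) ##read.tree(), drop.tip(), plot.phylo()
> library(pez) ##congeneric.merge()
> tree <- read.tree("Vascular_Plants_rooted.dated.tre") ##or read.nexus()
> tree <- congeneric.merge(tree, my.data$underscore.name) ##adds missing species as polytomies with their congeners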
Number of species in tree before: 401
Number of species in tree now: 550
> tree
Phylogenetic tree with 550 tips and 393 internal nodes.
Tip labels:
Gladiolus_italicus, Juncus_squarrosus, Juncus_bufonius, Bolboschoenus_maritimus, Isolepis_setacea,
Cyperus_fuscus, ...
Node labels:
, , , , , , …
Rooted; includes branch lengths.
pgls()
#You can always check for synonyms (taxize) and replace them
> my.data$underscore.name <- recode(my.data$underscore.name, "'Aegilops_geniculata' = 'Aegilops_ovata'") ##recode() from the car package
> plot.phylo(tree, cex=0.45, type="radial", edge.color=c("red", "orange", "blue"))
##trim your tree
> tree <- drop.tip(tree, setdiff(tree$tip.label, my.data$underscore.name))
#2) YOUR DATA
> dat <- read.csv("mydata.csv", header=TRUE)
#3) PUT THEM together in comparative.data(), which will drop rows with NAs for you
#and match the rows to the tips of the phylogeny :D
> library(caper)
> cdat <- comparative.data(data = dat, phy = tree, names.col = "underscore.name",
scope = leaf.area~seed.mass, vcv=TRUE) #optional: na.omit=FALSE, warn.dropped=TRUE
> cdat$dropped #to see what has been dropped.
#to estimate the phylogenetic signal by maximum likelihood: lambda='ML'
#lambda = 0 corresponds to a star phylogeny (no phylogenetic signal); lambda = 1 means the covariance among
#species is exactly as expected from the tree (Brownian motion).
> fit <- pgls(leaf.area~seed.mass, cdat, lambda='ML')
> summary(fit)
Call:
pgls(formula = leaf.area ~ seed.mass, data = dat,
lambda = "ML")
Residuals:
Min 1Q Median 3Q Max
-0.176405 -0.046501 0.003632 0.047885 0.227434
Branch length transformations:
kappa [Fix] : 1.000
lambda [ ML] : 0.863
lower bound : 0.000, p = < 2.22e-16
upper bound : 1.000, p = < 2.22e-16
95.0% CI : (0.771, 0.919)
delta [Fix] : 1.000
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.730721 0.318898 8.563 4.441e-16 ***
seed.mass 0.442324 0.042318 10.452 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.07308 on 373 degrees of freedom
Multiple R-squared: 0.2265, Adjusted R-squared: 0.2245
F-statistic: 109.2 on 1 and 373 DF, p-value: < 2.2e-16
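What lambda does to the tree can be sketched in base R: Pagel's lambda rescales the off-diagonal elements of the variance-covariance matrix and leaves the diagonal alone. The matrix `V` and the helper `lambda.transform()` below are hypothetical illustrations, not caper's internal code.

```r
# hypothetical 3-species covariance matrix implied by a tree
V <- matrix(c(2, 1, 0,
              1, 2, 0,
              0, 0, 2), nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

lambda.transform <- function(V, lambda) {
  Vl <- V * lambda      # scale every element...
  diag(Vl) <- diag(V)   # ...then restore the diagonal (species variances)
  Vl
}

lambda.transform(V, 0)      # star phylogeny: no shared history left
lambda.transform(V, 0.863)  # the ML estimate reported in the summary above
lambda.transform(V, 1)      # the original tree
```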
But what if you want to analyze
factors?
#caper has some bugs…
> cdat <- comparative.data(data = dat, phy = tree, names.col = "underscore.name",
scope = leaf.area~nitro.class, vcv=TRUE)
> fit <- pgls(leaf.area~nitro.class, cdat, lambda='ML')
> anova(fit)
Error in terms.formula(formula,data=data):
invalid model formula in ExtractVars
#work around it by comparing against an intercept-only model (below)
> fit <- pgls(leaf.area~nitro.class, cdat, lambda='ML')
> fit1 <- pgls(leaf.area~1, cdat, lambda='ML')
> anova(fit, fit1)
Error in anova.pglslist(object, ...) :
models were fitted with different branch length transformations.
##If you call summary() on each model, you’ll see that the ML estimates of lambda differ (0.867 vs. 0.896):
Call:
pgls(formula = leaf.area~ nitro.class, data = dat,
lambda = "ML")
Residuals:
Min 1Q Median 3Q Max
-0.200441 -0.049287 -0.002017 0.051002 0.200019
Branch length transformations:
kappa [Fix] : 1.000
lambda [ ML] : 0.867
lower bound : 0.000, p = < 2.22e-16
upper bound : 1.000, p = < 2.22e-16
95.0% CI : (0.772, 0.926)
delta [Fix] : 1.000
Call:
pgls(formula = leaf.area ~ 1, data = dat, lambda = "ML")
Residuals:
Min 1Q Median 3Q Max
-0.274585 -0.061252 0.003683 0.053449 0.253368
Branch length transformations:
kappa [Fix] : 1.000
lambda [ ML] : 0.896
lower bound : 0.000, p = < 2.22e-16
upper bound : 1.000, p = < 2.22e-16
95.0% CI : (0.825, 0.940)
delta [Fix] : 1.000
#refit both models with lambda fixed at the same value
> fit <- pgls(leaf.area~nitro.class, cdat, lambda=0.885)
> fit1 <- pgls(leaf.area~1, cdat, lambda=0.885)
> anova(fit, fit1)
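The same workaround can be tried end to end on the `shorebird` example data that ships with caper. This is a sketch under assumptions: taking the common lambda from the fuller model is one possible choice (the slides use a hand-picked value instead), and `Mat.Syst` stands in for the factor predictor.

```r
library(caper)

data(shorebird)  # example tree and data shipped with caper
shorebird <- comparative.data(shorebird.tree, shorebird.data, Species, vcv = TRUE)

# 1) let the fuller model estimate lambda by maximum likelihood
full.ml <- pgls(log(Egg.Mass) ~ Mat.Syst, data = shorebird, lambda = "ML")
common.lambda <- full.ml$param[["lambda"]]

# 2) refit both models with lambda fixed at that value
full <- pgls(log(Egg.Mass) ~ Mat.Syst, data = shorebird, lambda = common.lambda)
null <- pgls(log(Egg.Mass) ~ 1,        data = shorebird, lambda = common.lambda)

# 3) the branch length transformations now match, so the comparison can run
anova(null, full)
```

Whichever value is chosen, the point is that both fits must carry identical kappa, lambda and delta values before `anova()` will compare them.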
• You can also use gls() (nlme package): instead of lambda='ML', set method='ML'
• Visualize your tree, always exciting!
• In R
> plot(tree)
> help(plot.phylo) #install ape
#and: http://www.r-phylo.org/wiki/Main_Page
• Use FigTree (drop the file in and it will draw the phylogeny for you)
• phytools
> plot.phylo(tree, cex=0.3)
> plot.phylo(tree, cex=0.45, type="cladogram", show.tip.label=FALSE)
> plot.phylo(tree, cex=0.45, type="fan", edge.color=c("red", "orange", "green","blue"), edge.lty=5)
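The gls() route mentioned above could look like the sketch below. It assumes the tree enters through ape's `corPagel()` correlation structure, which the slides do not spell out, and it runs on a simulated tree and simulated traits, so the numbers mean nothing.

```r
library(ape)
library(nlme)

set.seed(42)
tree <- rcoal(40)   # simulated ultrametric tree (illustrative only)
sim <- data.frame(
  species   = tree$tip.label,
  seed.mass = rTraitCont(tree),
  leaf.area = rTraitCont(tree)
)

# corPagel() carries the tree; gls() estimates lambda together with the coefficients
fit.gls <- gls(leaf.area ~ seed.mass, data = sim,
               correlation = corPagel(1, phy = tree, form = ~species),
               method = "ML")
summary(fit.gls)
```

The `form = ~species` argument tells `corPagel()` which column holds the tip labels. If the optimiser fails to converge on real data, a different starting value for lambda is a common first thing to try.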
Thanks
Bianca A. Santini
@myoldowlisdead
b.santini@sheffield.ac.uk
From Freckleton et al. 2002.

Phylogeny in R - Bianca Santini Sheffield R Users March 2015


Editor's Notes

  1. These are different trees representing the same relationships. Tips are our data, and nodes represent common ancestors and speciation events. External branches connect a tip and a node. Internal branches (or internodes) connect two nodes. Internal nodes represent putative ancestors of the sampled viruses.