Bayesian Phylogenetic Inference: Introduction
Fred(rik) Ronquist
Swedish Museum of Natural History, Stockholm, Sweden
BIG 4 Workshop, October 9-19, Tovetorp, Sweden

Topics
• Probability 101
• Bayesian Phylogenetic Inference
• Markov chain Monte Carlo
• Bayesian Model Choice and Model Averaging
1. Probability 101

"Do not worry about your difficulties in Mathematics. I can assure you that mine are still greater."
— Albert Einstein

Continuous probability distributions
[Figure: example density curves for the uniform, beta, gamma, Dirichlet, and exponential distributions]
Discrete Probability
Sample space: {ω1, ω2, ω3, ω4, ω5, ω6}
Event (a subset of outcomes, e.g., die faces with an odd number): E = {ω1, ω3, ω5}
Distribution function (e.g., uniform): m(ω)
Probability: $\Pr(E) = \sum_{\omega \in E} m(\omega)$
Random variable: X

Continuous Probability
Sample space: an interval (e.g., a disc with circumference 1)
Probability density function (pdf): f(x), e.g., Uniform(0,1)
Event (a subspace of the sample space): E = [a, b)
Probability: $\Pr(E) = \int_{x \in E} f(x)\, dx$
Random variable: X
Probability Distributions: Continuous
• Uniform distribution
• Beta distribution
• Gamma distribution
• Dirichlet distribution
• Exponential distribution
• Normal distribution
• Lognormal distribution
• Multivariate normal distribution

Probability Distributions: Discrete
• Bernoulli distribution
• Categorical distribution
• Binomial distribution
• Multinomial distribution
• Poisson distribution

Stochastic Processes
• Markov chain
• Poisson process
• Birth-death process
• Coalescence
• Dirichlet Process Mixture
Exponential Distribution
X ~ Exp(λ)
Parameters: λ = rate (of decay)
Probability density function: $f(x) = \lambda e^{-\lambda x}$
Mean: 1/λ

Gamma Distribution
X ~ Gamma(α, β)
Parameters: α = shape, β = inverse scale (rate)
Probability density function: $f(x) \propto x^{\alpha - 1} e^{-\beta x}$
Mean: α/β
Scaled gamma: α = β

Beta Distribution
X ~ Beta(α1, α2)
Parameters: α1, α2 = shape parameters
Probability density function: $f(x) \propto x^{\alpha_1 - 1}(1 - x)^{\alpha_2 - 1}$
Mode: $\dfrac{\alpha_1 - 1}{\sum_i (\alpha_i - 1)}$
Defined on two proportions of a whole (a simplex)

Dirichlet Distribution
X ~ Dir(α), α = {α1, α2, ..., αk}
Parameters: α = vector of k shape parameters
Probability density function: $f(x) \propto \prod_i x_i^{\alpha_i - 1}$
Defined on k proportions of a whole
[Figure: densities for Dir(1,1,1,1) and Dir(300,300,300,300)]
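The stated means and modes are easy to verify numerically. A minimal sketch (my addition, not part of the slides) using scipy.stats; the parameter values are arbitrary examples:

```python
# Checking the stated properties of these distributions with scipy.stats.
import numpy as np
from scipy import stats

lam = 2.0                                     # exponential rate
print(stats.expon(scale=1/lam).mean())        # 1/lambda = 0.5

alpha, beta = 3.0, 1.5                        # gamma shape and rate
print(stats.gamma(a=alpha, scale=1/beta).mean())  # alpha/beta = 2.0

a1, a2 = 4.0, 2.0                             # beta shape parameters
x = np.linspace(0, 1, 10001)
print(x[np.argmax(stats.beta(a1, a2).pdf(x))])    # mode = 3/(3+1) = 0.75

alphas = [1, 1, 1, 1]                         # flat Dirichlet on the 4-simplex
print(stats.dirichlet(alphas).mean())         # [0.25, 0.25, 0.25, 0.25]
```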
Conditional Probability
Pr(H) = 4/10 = 0.4
Pr(D) = 3/10 = 0.3
Joint probability: Pr(D, H) = 2/10 = 0.2
Conditional probability: Pr(D | H) = 2/4 = 0.5

Bayes' Rule
Reverend Thomas Bayes (1701-1761)
Pr(A | B) ⇒ Pr(B | A)?

Bayes' Rule
Pr(D, H) = Pr(D) Pr(H | D) = Pr(H) Pr(D | H)
Pr(D) Pr(H | D) = Pr(H) Pr(D | H)
Bayes' rule: $\Pr(H \mid D) = \dfrac{\Pr(H)\Pr(D \mid H)}{\Pr(D)}$
Check on the example above: $\tfrac{3}{10} \times \tfrac{2}{3} = \tfrac{2}{10} = 0.2$ and $\tfrac{4}{10} \times \tfrac{2}{4} = \tfrac{2}{10} = 0.2$
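The 10-outcome example can be spelled out as explicit sets. A minimal sketch (my addition; the particular outcome labels are arbitrary, only the set sizes match the slide):

```python
# The slide's example as sets of equiprobable outcomes, verifying Bayes' rule.
omega = set(range(1, 11))          # sample space of 10 outcomes
H = {1, 2, 3, 4}                   # 4 of 10 outcomes
D = {3, 4, 5}                      # 3 of 10 outcomes

pr_H  = len(H) / len(omega)                 # 0.4
pr_D  = len(D) / len(omega)                 # 0.3
pr_DH = len(D & H) / len(omega)             # joint: 0.2
pr_D_given_H = pr_DH / pr_H                 # 0.5
pr_H_given_D = pr_H * pr_D_given_H / pr_D   # Bayes' rule: 2/3
assert abs(pr_H_given_D - len(D & H) / len(D)) < 1e-12
```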
Maximum Likelihood Inference
Data D; model M with parameters θ
We can calculate Pr(D | θ) or f(D | θ)
Define the likelihood function: L(θ) ∝ f(D | θ)
Maximum likelihood: find the value of θ that maximizes L(θ)
Confidence: asymptotic behavior, more samples, bootstrapping

Bayesian Inference
Data D; model M with parameters θ
We can calculate Pr(D | θ) or f(D | θ), but we are actually interested in Pr(θ | D) or f(θ | D)
Bayes' rule: $f(\theta \mid D) = \dfrac{f(\theta)\, f(D \mid \theta)}{f(D)}$
(posterior density = prior density × "likelihood" / normalizing constant)
The normalizing constant is the marginal likelihood of the data (model likelihood): $f(D) = \int f(\theta)\, f(D \mid \theta)\, d\theta$
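To make the rule concrete before the coin-tossing example that follows, here is a minimal grid-approximation sketch (my own example, not from the deck): θ is the probability of heads, the prior is flat, and the data are 6 heads in 10 tosses.

```python
# Bayes' rule on a grid: posterior = prior x likelihood / normalizer.
import numpy as np
from scipy import stats

theta = np.linspace(0, 1, 1001)
prior = np.ones_like(theta)                  # f(theta): Uniform(0,1)
like  = stats.binom.pmf(6, 10, theta)        # f(D|theta)
post  = prior * like
post /= post.sum() * (theta[1] - theta[0])   # divide by f(D) (Riemann sum)
print(theta[np.argmax(post)])                # posterior mode ~ 0.6
```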
Coin Tossing Example

What is the probability of your favorite team winning the next ice hockey World Championships?

World Championship Medalists
[Table: gold, silver, and bronze medalists 2007-2016 (teams shown as flags); total medal counts per team: 2, 5, 8, 1, 5, 5, 3, 1, plus an "other" category]

[Figure: the prior f(θ) is proportional to past medal counts (2, 5, 8, 1, 5, 5, 3, 1, other). Data 1 records which teams are in or out of the tournament, and posterior 1, f(θ|D1), zeroes out the eliminated teams (2, 0, 8, 0, 0, 5, 3, 0). Data 2 records which team won, and posterior 2, f(θ|D1+D2), puts all probability on the winner (0, 0, 0, 0, 0, 5, 0, 0)]

Learn more:
• Wikipedia (good texts on most statistical distributions, sometimes a little difficult)
• Grinstead & Snell: Introduction to Probability. American Mathematical Society. Free pdf available from: http://www.dartmouth.edu/~chance/teaching_aids/books_art
• Team up with a statistician or a computational / theoretical evolutionary biologist!
2. Bayesian Phylogenetic Inference

Infer relationships among three species, plus an outgroup.

Three possible trees (topologies) for A, B, C
[Figure: the three rooted topologies]

[Figure: the prior distribution over the three trees (bars summing to probability 1.0) is combined with the data (observations) and the model to yield the posterior distribution]

D: the data
Taxon  Characters
A      ACG TTA TTA AAT TGT CCT CTT TTC AGA
B      ACG TGT TTC GAT CGT CCT CTT TTC AGA
C      ACG TGT TTA GAC CGA CCT CGG TTA AGG
D      ACA GGA TTA GAT CGT CCG CTT TTC AGA
Model: topology AND branch lengths
Parameters θ = (τ, v): the topology (τ) and the branch lengths (v_i), measured as the expected amount of change
[Figure: unrooted tree for taxa A-D with branch lengths v1-v5]

Model: molecular evolution
Parameters θ: instantaneous rate matrix (Jukes-Cantor), rows and columns ordered [A] [C] [G] [T]:
$$Q = \begin{pmatrix} - & \mu & \mu & \mu \\ \mu & - & \mu & \mu \\ \mu & \mu & - & \mu \\ \mu & \mu & \mu & - \end{pmatrix}$$

Model: molecular evolution
Probabilities are calculated using the transition probability matrix $P(v) = e^{Qv}$. For Jukes-Cantor:
$$\Pr(\text{no change}) = \tfrac{1}{4} + \tfrac{3}{4} e^{-4v/3}, \qquad \Pr(\text{change}) = \tfrac{1}{4} - \tfrac{1}{4} e^{-4v/3}$$
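The closed forms can be checked against the matrix exponential directly. A minimal sketch (my addition), with Q scaled so that branch length equals the expected number of substitutions per site:

```python
# Jukes-Cantor transition probabilities: P(v) = exp(Qv) vs. the closed forms.
import numpy as np
from scipy.linalg import expm

mu = 1.0 / 3.0                       # off-diagonal rate; with diagonal -3*mu,
Q = np.full((4, 4), mu)              # the total rate away from a state is 1,
np.fill_diagonal(Q, -3 * mu)         # so v = expected substitutions per site

v = 0.5                              # branch length
P = expm(Q * v)
same = 0.25 + 0.75 * np.exp(-4 * v / 3)   # Pr(no change)
diff = 0.25 - 0.25 * np.exp(-4 * v / 3)   # Pr(change)
assert np.allclose(P.diagonal(), same) and np.isclose(P[0, 1], diff)
```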
Priors on parameters
• Topology: all unique topologies have equal probability
• Branch lengths: exponential prior (puts more weight on small branch lengths)

Scale matters in priors
Under Jukes-Cantor, a branch length v translates into a probability of substitution $p = \tfrac{3}{4}\left(1 - e^{-4v/3}\right)$, so a prior such as v ~ Exp(1) is far from flat on p
[Figure: probability of substitution as a function of branch length under an Exp(1) prior]
The effect on the data likelihood is most important; Jeffreys' uninformative priors formalize this
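A minimal sketch (my addition) of how the Exp(1) branch-length prior maps onto substitution probabilities; the transformation is the Jukes-Cantor formula above:

```python
# The prior on branch lengths induces a (non-flat) prior on the
# probability of substitution p = (3/4)(1 - exp(-4v/3)).
import numpy as np

rng = np.random.default_rng(1)
v = rng.exponential(scale=1.0, size=100_000)     # v ~ Exp(1)
p = 0.75 * (1.0 - np.exp(-4.0 * v / 3.0))        # induced prior on p
print(p.mean(), np.quantile(p, [0.025, 0.975]))  # concentrated, not uniform
```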
Bayes' theorem
$$f(\theta \mid D) = \frac{f(\theta)\, f(D \mid \theta)}{\int f(\theta)\, f(D \mid \theta)\, d\theta}$$
Posterior distribution = prior distribution × "likelihood" / normalizing constant
D = data; θ = model parameters

Posterior probability distribution
[Figure: posterior density f(θ|X) over the parameter space, with regions corresponding to tree 1, tree 2, and tree 3]

tree 1: 20%   tree 2: 48%   tree 3: 32%
We can focus on any parameter of interest (there are no nuisance parameters) by marginalizing the posterior over the other parameters (integrating out the uncertainty in the other parameters). The percentages denote the marginal probability distribution on trees.

Why is it called marginalizing?
Joint probabilities of trees (columns) and branch-length vectors (rows), with marginal probabilities in the margins:

          τ1      τ2      τ3    | marginal
ν1        0.10    0.07    0.12  | 0.29
ν2        0.05    0.22    0.06  | 0.33
ν3        0.05    0.19    0.14  | 0.38
marginal  0.20    0.48    0.32  |
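Marginalizing this table is just summing rows and columns. A minimal numpy sketch (my addition; values copied from the table above):

```python
# Marginal probabilities from a joint table: rows = branch-length
# vectors (nu1..nu3), columns = trees (tau1..tau3).
import numpy as np

joint = np.array([[0.10, 0.07, 0.12],
                  [0.05, 0.22, 0.06],
                  [0.05, 0.19, 0.14]])
print(joint.sum(axis=0))   # marginal over branch lengths: [0.20 0.48 0.32]
print(joint.sum(axis=1))   # marginal over trees:          [0.29 0.33 0.38]
```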
Markov chain Monte Carlo
1. Start at an arbitrary point
2. Make a small random move
3. Calculate the height ratio (r) of the new state to the old state:
   • r > 1: the new state is accepted
   • r < 1: the new state is accepted with probability r; if it is not accepted, stay in the old state
4. Go to step 2
The proportion of time the MCMC procedure samples from a particular parameter region is an estimate of that region's posterior probability density.
[Figure: a chain exploring the posterior over the trees, with marginal probabilities 20%, 48%, 32%]

Metropolis algorithm
Assume that the current state has parameter values θ. Consider a move to a state with parameter values θ*. The height ratio r is (prior ratio × likelihood ratio):
$$r = \frac{f(\theta^* \mid D)}{f(\theta \mid D)} = \frac{f(\theta^*)\, f(D \mid \theta^*) / f(D)}{f(\theta)\, f(D \mid \theta) / f(D)} = \frac{f(\theta^*)}{f(\theta)} \times \frac{f(D \mid \theta^*)}{f(D \mid \theta)}$$
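A minimal, generic sketch of this update (my own illustration, not the deck's code): `log_post` stands for any function returning the log of the unnormalized posterior f(θ)f(D|θ); note that f(D) cancels in the ratio, which is the whole point.

```python
# Metropolis sampler for a one-dimensional parameter.
import numpy as np

def metropolis(log_post, theta0, n_steps, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta, samples = theta0, []
    for _ in range(n_steps):
        prop = theta + rng.normal(0.0, step)   # small random move
        log_r = log_post(prop) - log_post(theta)  # prior x likelihood ratio
        if np.log(rng.uniform()) < log_r:      # accept with prob min(1, r)
            theta = prop                       # otherwise stay put
        samples.append(theta)
    return np.array(samples)

# Example: target proportional to a standard normal density
draws = metropolis(lambda t: -0.5 * t**2, theta0=0.0, n_steps=20_000)
print(draws[2000:].mean(), draws[2000:].std())  # ~0, ~1 after burn-in
```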
MCMC Sampling Strategies
• Great freedom of strategies:
  • Typically one or a few related parameters are changed at a time
  • You can cycle through parameters systematically or choose randomly
  • One "generation" (or "iteration", or "cycle") can include a single randomly chosen proposal (or move, operator, kernel), one proposal for each parameter, or a block of randomly chosen proposals

Trace Plot
[Figure: trace plot showing the burn-in phase followed by the stationary phase, which is sampled with thinning (rapid mixing essential)]

Majority rule consensus tree
Frequencies represent the posterior probability of the clades: the probability of the clade being true given the data and the model.

Summarizing Trees
• Maximum posterior probability tree (MAP tree)
  • can be difficult to estimate precisely
  • can have low probability
• Majority rule consensus tree
  • easier to estimate clade probabilities exactly
  • branch length distributions can be summarized across all trees containing the branch
  • can hide complex topological dependence
  • branch length distributions can be multimodal
• Credible sets of trees
  • include trees in order of decreasing probability to obtain, e.g., a 95% credible set
• "Median" or "central" tree
Adding Model Complexity
Topology (τ), branch lengths (v_i), and the General Time Reversible substitution model (rows and columns ordered A, C, G, T):
$$Q = \begin{pmatrix} - & r_{AC}\pi_C & r_{AG}\pi_G & r_{AT}\pi_T \\ r_{AC}\pi_A & - & r_{CG}\pi_G & r_{CT}\pi_T \\ r_{AG}\pi_A & r_{CG}\pi_C & - & r_{GT}\pi_T \\ r_{AT}\pi_A & r_{CT}\pi_C & r_{GT}\pi_G & - \end{pmatrix}$$

Adding Model Complexity
Gamma-shaped rate variation across sites

Priors on Parameters
• Stationary state frequencies: flat Dirichlet, Dir(1,1,1,1)
• Exchangeability parameters: flat Dirichlet, Dir(1,1,1,1,1,1)
• Shape parameter of the scaled gamma distribution of rate variation across sites: uniform, Uni(0,50)

[Figure: mean and 95% credibility interval for model parameters]

Summarizing Variables
• Mean, median, and variance are common summaries
• 95% credible interval: discard the lowest 2.5% and highest 2.5% of sampled values
• 95% region of highest posterior density (HPD): find the smallest region containing 95% of the probability
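Both summaries are straightforward to compute from MCMC samples. A minimal sketch (my addition; the gamma draws stand in for a skewed posterior):

```python
# Equal-tailed 95% credible interval vs. 95% HPD interval from samples.
import numpy as np

def credible_interval(x, mass=0.95):
    lo = (1.0 - mass) / 2.0
    return np.quantile(x, [lo, 1.0 - lo])    # discard 2.5% in each tail

def hpd_interval(x, mass=0.95):
    x = np.sort(x)
    k = int(np.ceil(mass * len(x)))          # points inside the interval
    widths = x[k - 1:] - x[:len(x) - k + 1]  # width of every such window
    i = np.argmin(widths)                    # smallest window wins
    return x[i], x[i + k - 1]

x = np.random.default_rng(0).gamma(2.0, size=50_000)  # skewed "posterior"
print(credible_interval(x))   # equal-tailed
print(hpd_interval(x))        # shifted left for a right-skewed density
```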
Credible intervals and HPDs
[Figure: credible interval vs. HPD regions for symmetric, skewed, and multimodal densities]

Other Sampling Methods
• Gibbs sampling: sample from the conditional posterior (a variant of the Metropolis algorithm)
• Metropolized Gibbs sampling: a more efficient variant of Gibbs sampling of discrete characters
• Slice sampling: less prone to getting stuck in local optima than the Metropolis algorithm
• Hamiltonian sampling: a technique for decreasing the problem of sampling correlated parameters
• Simulated annealing: increase the "greediness" during the burn-in phase of MCMC sampling
• Data augmentation techniques: add parameters to facilitate probability calculations
• Sequential Monte Carlo techniques: generate a sample of complete states by building sets of particles from incomplete states

Sequential Monte Carlo Algorithm for Phylogenetics
Bouchard et al. 2012. Syst. Biol.
3. Markov chain Monte Carlo

Convergence and Mixing
• Convergence is the degree to which the chain has converged onto the target distribution
• Mixing is the speed with which the chain covers the region of interest in the target distribution

Assessing Convergence
• Plateau in the trace plot
• Look at sampling behavior within the run (autocorrelation time, effective sample size)
• Compare independent runs with different, randomly chosen starting points

Convergence within Run
Autocorrelation time: $t = 1 + 2 \sum_{k=1}^{\infty} \rho_k(\theta)$, where ρ_k(θ) is the autocorrelation in the MCMC samples for a lag of k generations
Effective sample size (ESS): $e = n / t$, where n is the total sample size (number of generations)
Good mixing when t is small and e is large
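A minimal sketch (my addition) of these two quantities; the infinite sum is truncated at the first non-positive autocorrelation, a common rule of thumb rather than the only possible choice:

```python
# Autocorrelation time t and effective sample size e = n/t.
import numpy as np

def ess(x):
    x = np.asarray(x) - np.mean(x)
    n = len(x)
    # acf[k]: autocorrelation at lag k, acf[0] == 1
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.arange(n, 0, -1) * x.var())
    t = 1.0
    for k in range(1, n):
        if acf[k] <= 0:           # truncate once rho_k drops to zero
            break
        t += 2.0 * acf[k]
    return n / t

rng = np.random.default_rng(0)    # example: a strongly autocorrelated chain
x = np.empty(5000); x[0] = 0.0
for i in range(1, 5000):
    x[i] = 0.9 * x[i - 1] + rng.normal()
print(ess(x))                     # well below the 5000 raw samples
```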
Convergence among Runs
• Tree topology:
  • Compare clade probabilities (split frequencies)
  • Average standard deviation of split frequencies above some cut-off (min. 10% in at least one run); should go to 0 as runs converge
• Continuous variables:
  • Potential scale reduction factor (PSRF): compares variance within and between runs; should approach 1 as runs converge (see the sketch below)
  • Assumes overdispersed starting points
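A minimal sketch (my addition) of the PSRF in its standard Gelman-Rubin form; the exact variant implemented in phylogenetics software may differ in details:

```python
# Potential scale reduction factor for m chains of equal length n.
import numpy as np

def psrf(chains):
    chains = np.asarray(chains)              # shape (m, n)
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    B = n * means.var(ddof=1)                # between-chain variance
    var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_hat / W)              # -> 1 as runs converge
```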
[Figure: two scatterplots of clade probability in analysis 1 vs. clade probability in analysis 2, showing poor and good agreement between independent runs]

Proposal boldness and mixing
• Too modest proposals: acceptance rate too high, poor mixing
• Too bold proposals: acceptance rate too low, poor mixing
• Moderately bold proposals: intermediate acceptance rate, good mixing
[Figure: sampled values against the target distribution for each of the three cases]

Tuning Proposals
• Manually, by changing tuning parameters
  • Increase the boldness of a proposal if the acceptance rate is too high
  • Decrease the boldness of a proposal if the acceptance rate is too low
• Auto-tuning
  • Tuning parameters are adjusted automatically by the MCMC procedure to reach a target acceptance rate
Metropolis-coupled Markov chain Monte Carlo
a.k.a. MCMCMC, a.k.a. (MC)³
[Figure: a cold chain and a heated chain exploring the same posterior landscape]

[Figure sequence: the cold chain and hot chain run in parallel, and swaps between them are proposed repeatedly; some swaps are unsuccessful, some successful, and successful swaps let the cold chain escape local optima]

Incremental Heating
$$T_i = \frac{1}{1 + \lambda i}, \qquad i \in \{0, 1, \ldots, n-1\}$$
where T is the temperature and λ is the heating coefficient; chain i samples from the distribution $f(\theta \mid X)^{T_i}$.
Example for λ = 0.2:
i = 0 (cold chain):    T = 1.00, distribution f(θ|X)^1.00
i = 1 (heated chain):  T = 0.83, distribution f(θ|X)^0.83
i = 2 (heated chain):  T = 0.71, distribution f(θ|X)^0.71
i = 3 (heated chain):  T = 0.62, distribution f(θ|X)^0.62
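The temperature schedule is easy to verify directly; a one-line sketch (my addition) reproducing the λ = 0.2 example:

```python
# Incremental heating: chain i samples f(theta|X)^(1/(1 + lambda*i)).
lam, n_chains = 0.2, 4
temps = [1.0 / (1.0 + lam * i) for i in range(n_chains)]
print([round(T, 2) for T in temps])   # [1.0, 0.83, 0.71, 0.62]
```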
4. Bayesian Model Choice

Bayesian Model Sensitivity

Models, models, models
• Alignment-free models
• Heterogeneity in substitution rates and stationary frequencies across sites and lineages
• Relaxed clock models
• Models for morphology and biogeography
• Sampling across model space, e.g., GTR space and partition space
• Models of dependence across sites according to the 3D structure of proteins
• Positive selection models
• Amino acid models
• Models for population genetics and phylogeography

Bayes' Rule
$$f(\theta \mid D) = \frac{f(\theta)\, f(D \mid \theta)}{\int f(\theta)\, f(D \mid \theta)\, d\theta} = \frac{f(\theta)\, f(D \mid \theta)}{f(D)}$$
where f(D) is the marginal likelihood (of the data). We have implicitly conditioned on a model:
$$f(\theta \mid D, M) = \frac{f(\theta \mid M)\, f(D \mid \theta, M)}{f(D \mid M)}$$

Bayesian Model Choice
Posterior model odds: $\dfrac{f(M_1)\, f(D \mid M_1)}{f(M_0)\, f(D \mid M_0)}$
Bayes factor: $B_{10} = \dfrac{f(D \mid M_1)}{f(D \mid M_0)}$

Bayesian Model Choice
• The normalizing constant in Bayes' theorem, the marginal likelihood of the data, f(D) or f(D|M), can be used for model choice
• f(D|M) can be estimated by taking the harmonic mean of the likelihood values from the MCMC run; thermodynamic integration and stepping-stone sampling are computationally more complex but more accurate methods (see the sketch below)
• Any models can be compared: nested, non-nested, data-derived; it is just a probability comparison
• No correction for the number of parameters
• Can prefer a simpler model over a more complex model
• Critical values in Kass and Raftery (1995)
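A minimal sketch (my addition) of the harmonic-mean estimator mentioned above, computed stably in log space, and the resulting Bayes factor; as the slide notes, this estimator is simple but inaccurate (high variance), which is why stepping-stone sampling is preferred in practice:

```python
# Harmonic-mean estimate of log f(D|M) from posterior log-likelihood samples.
import numpy as np
from scipy.special import logsumexp

def log_marginal_harmonic(loglikes):
    loglikes = np.asarray(loglikes)
    # log f(D|M) ~= -log( mean( 1/L_i ) ), done in log space for stability
    return -(logsumexp(-loglikes) - np.log(len(loglikes)))

def bayes_factor(loglikes_m1, loglikes_m0):
    # B10 = f(D|M1) / f(D|M0)
    return np.exp(log_marginal_harmonic(loglikes_m1) -
                  log_marginal_harmonic(loglikes_m0))
```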
Simple Model Wins (from Lewis, 2008)

Bayes Factor Comparisons
Interpretation of the Bayes factor:

2ln(B10)   B10         Evidence against M0
0 to 2     1 to 3      Not worth more than a bare mention
2 to 6     3 to 20     Positive
6 to 10    20 to 150   Strong
> 10       > 150       Very strong

Bayesian Software
• Model testing: ModelTest, MrModelTest, MrAIC
• Convergence diagnostics: AWTY, Tracer
• Phylogenetic inference: MrBayes, BEAST, BayesPhylogenies, PhyloBayes, Phycas, BAMBE, RevBayes
• Specialized inference: PHASE, BAliPhy, BayesTraits, Badger, BEST, *BEAST, CoEvol
• Tree drawing: TreeView, FigTree

"Listening to lectures, after a certain age, diverts the mind too much from its creative pursuits. Any scientist who attends too many lectures and uses her own brain too little falls into lazy habits of thinking."
— after Albert Einstein