7. Bayesian phylogenetic analysis using MrBAYES




Page 1: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7. Bayesian phylogenetic analysis using MrBAYES

UST, Jeong Dageum, 2010.05.24

Thomas Bayes (1702-1761)

The Phylogenetic Handbook – Section III, Phylogenetic inference

Posterior = (Prior * Likelihood) / Normalizing constant

Page 2: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.1 Introduction
7.2 Bayesian phylogenetic inference
7.3 Markov chain Monte Carlo sampling
7.4 Burn-in, mixing and convergence
7.5 Metropolis coupling
7.6 Summarizing the results
7.7 An introduction to phylogenetic models
7.8 Bayesian model choice and model averaging
7.9 Prior probability distributions

Page 3: 7. Bayesian  phylogenetic  analysis using  MrBAYES

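7.1 Introduction

What is the probability that Sweden wins next year’s world championships in ice hockey?

* 7 countries compete for the gold (Russia, Canada, Finland, Czech Republic, Sweden, Slovakia, United States), and Sweden is 1 of 7: 1:7 or 0.14
* In 15 years Sweden won 2 gold medals: 2:15 or 0.13
* Further information changes the estimate: does Sweden reach the semifinal? The final?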

Page 4: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.1 Introduction

Bayesian approach: Bayesian inference is just a mathematical formalization of a decision process that most of us use without reflecting on it.

Posterior = (Prior * Likelihood) / Normalizing constant

Page 5: 7. Bayesian  phylogenetic  analysis using  MrBAYES

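7.1 Introduction

An urn holds white (W) and black (B) balls.

* Proportion of W balls: p
* Proportion of B balls: 1 - p
* Draws: a white balls and b black balls

Forward probability: if p is known (50:50?), what is the probability of drawing a and b?

The converse: a and b are known, and p is not.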

Page 6: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.1 Introduction

f(p | a,b) = ? : the reverse probability problem

We know a and b. What, then, is the probability of a particular value of p?

This needs prior beliefs about the value of p.

Page 7: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.1 Introduction

Box 7.1 Probability distributions – [Considering Prior]

[Probability mass function]: a function describing the probability of a discrete random variable (e.g. a die)

[Probability density function]: the equivalent function for a continuous variable. The value of this function is not a probability.

Exponential distribution: a better choice for a vague prior on branch lengths
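A small Python sketch of why a density value is not a probability. It assumes an exponential density with a hypothetical rate of 10 (chosen only for illustration): the density at 0 exceeds 1, while the area under the curve, which is what gives probabilities, stays at 1.

```python
import math

def exponential_pdf(x, rate):
    """Density of the exponential distribution at x (zero for x < 0)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def integrate(f, lower, upper, steps=100_000):
    """Midpoint-rule numerical integration of f over [lower, upper]."""
    width = (upper - lower) / steps
    return sum(f(lower + (i + 0.5) * width) for i in range(steps)) * width

rate = 10.0  # hypothetical rate, not taken from the slides

# The density itself can be larger than 1 ...
density_at_zero = exponential_pdf(0.0, rate)

# ... but probabilities are areas under the density.
total_area = integrate(lambda x: exponential_pdf(x, rate), 0.0, 5.0)
prob_below_0_1 = integrate(lambda x: exponential_pdf(x, rate), 0.0, 0.1)

print(density_at_zero, total_area, prob_below_0_1)
```

The integration range stops at 5 because the remaining tail is negligible for this rate.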

Page 8: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.1 Introduction

Box 7.1 Probability distributions – [Considering Prior]

Gamma distribution: 2 parameters (shape parameter α, scale parameter β)

* Small value of α: the distribution is L-shaped and the variance is large
* High value of α: similar to a normal distribution

The beta distribution, denoted Beta(α1, α2), describes the probability on two proportions, which are associated with the weight parameters α1 and α2.

Page 9: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.1 Introduction

Posterior probability distribution

Bayes’ theorem:

$$f(p \mid a,b) = \frac{f(a,b \mid p)\, f(p)}{f(a,b)}$$

* We can calculate f(a,b|p)
* We can specify f(p)

How do we calculate f(a,b)? Integrate over all possible values of p:

$$f(a,b) = \int_0^1 f(a,b \mid p)\, f(p)\, dp$$

-> The denominator is a normalizing constant
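A minimal Python sketch of the posterior for the ball example. The binomial likelihood, the uniform prior f(p) = 1 and the counts a = 3, b = 7 are assumptions made for illustration. The normalizing constant f(a,b) is obtained by integrating likelihood times prior numerically over all values of p.

```python
import math

def likelihood(p, a, b):
    """f(a,b | p): probability of a white and b black balls given p."""
    return math.comb(a + b, a) * p**a * (1 - p)**b

def prior(p):
    """f(p): uniform prior on [0, 1] (an assumption for this sketch)."""
    return 1.0

def normalizing_constant(a, b, steps=10_000):
    """f(a,b): integrate likelihood * prior over all p (midpoint rule)."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * width
        total += likelihood(p, a, b) * prior(p)
    return total * width

def posterior(p, a, b):
    """f(p | a,b) = f(a,b | p) f(p) / f(a,b)."""
    return likelihood(p, a, b) * prior(p) / normalizing_constant(a, b)

a, b = 3, 7  # hypothetical counts of white and black balls
evidence = normalizing_constant(a, b)
density_at_0_3 = posterior(0.3, a, b)
print(evidence, density_at_0_3)
```

With a uniform prior the integral has the closed form 1 / (a + b + 1), which gives a quick check on the numerical result.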

Page 10: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.2 Bayesian phylogenetic inference

$$P(\text{Tree} \mid \text{Data}) = \frac{P(\text{Data} \mid \text{Tree})\, P(\text{Tree})}{P(\text{Data})}$$

Posterior = (Likelihood * Prior) / Normalizing constant

Page 11: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.2 Bayesian phylogenetic inference

* X: the matrix of aligned sequences
* θ: topology, branch lengths, model, ...
* θ = (τ, υ), with τ the topology parameter and υ the branch lengths on the tree (Jukes Cantor substitution model); under other models the substitution model parameters must be considered as well

X (data): fixed; θ (parameters): random

Page 12: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.2 Bayesian phylogenetic inference

Parameter space

* Each cell of the table holds the joint probability of one topology and a particular set of branch lengths on that topology.
* Summing all joint probabilities along one axis of the table gives the marginal probabilities for the corresponding parameter.

[Bayesian inference: there is no need to decide on the parameters of interest before performing the analysis]
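The marginalization over the table can be sketched in a few lines of Python. The joint probabilities below are invented for illustration; only the summing along each axis reflects the slide.

```python
# Hypothetical joint posterior probabilities (invented numbers):
# one row per topology, one column per set of branch lengths.
joint = {
    "tree_A": [0.10, 0.05, 0.05],
    "tree_B": [0.05, 0.40, 0.15],
    "tree_C": [0.05, 0.10, 0.05],
}

# Sum along the rows: marginal probability of each topology.
topology_marginals = {tree: sum(row) for tree, row in joint.items()}

# Sum along the columns: marginal probability of each branch-length set.
branch_length_marginals = [sum(column) for column in zip(*joint.values())]

print(topology_marginals)
print(branch_length_marginals)
```

Both sets of marginals come from the same table, which is why the parameter of interest does not have to be chosen before the analysis.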

Page 13: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.3 Markov chain Monte Carlo sampling

[For parameter sampling]

Markov chain Monte Carlo steps

1. Start at an arbitrary point (θ)
2. Make a small random move (to θ*)
3. Calculate the height ratio (r) of the new state (θ*) to the old state (θ)
   (a) r > 1: new state accepted
   (b) r < 1: new state accepted with probability r; if the new state is rejected, stay in the old state
4. Go to step 2

$$r = \frac{f(\theta^*)}{f(\theta)} \times \frac{f(X \mid \theta^*)}{f(X \mid \theta)} \times \frac{f(\theta \mid \theta^*)}{f(\theta^* \mid \theta)}$$

(prior ratio × likelihood ratio × proposal ratio)
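The four steps can be sketched as a small Metropolis sampler in Python. This is an illustration, not the MrBayes implementation: the target is the ball example with a uniform prior and hypothetical counts a = 3, b = 7, the move is a symmetric window of assumed width 0.3 (so the proposal ratio is 1), and the function names are made up for the sketch.

```python
import math
import random

def log_unnormalized_posterior(p, a, b):
    """log(prior * likelihood) for the ball example, uniform prior on (0, 1)."""
    if not 0.0 < p < 1.0:
        return -math.inf  # prior probability is zero outside (0, 1)
    return a * math.log(p) + b * math.log(1.0 - p)

def metropolis(a, b, n_steps, window=0.3, seed=1):
    rng = random.Random(seed)
    theta = 0.5  # step 1: start at an arbitrary point
    samples = []
    for _ in range(n_steps):
        # Step 2: make a small random move (symmetric, proposal ratio = 1).
        theta_star = theta + window * (rng.random() - 0.5)
        # Step 3: log of the height ratio r (prior ratio * likelihood ratio).
        log_r = (log_unnormalized_posterior(theta_star, a, b)
                 - log_unnormalized_posterior(theta, a, b))
        # (a) r >= 1: accept; (b) r < 1: accept with probability r.
        if log_r >= 0 or rng.random() < math.exp(log_r):
            theta = theta_star
        # If rejected, the chain stays in the old state.
        samples.append(theta)
        # Step 4: go to step 2.
    return samples

samples = metropolis(a=3, b=7, n_steps=20_000)
kept = samples[2_000:]  # drop the early part of the chain
posterior_mean = sum(kept) / len(kept)
print(posterior_mean)
```

The normalizing constant never appears: it cancels in the ratio r, which is what makes the sampler practical. For this target the exact posterior mean is (a + 1) / (a + b + 2), so the sampled mean can be checked against it.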

Page 14: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.3 Markov chain Monte Carlo sampling

Box 7.2 Proposal mechanism – To change continuous variables

1) Proposal step
2) Acceptance/Rejection step

[Sliding window proposal]
ω: tuning parameter
* Large: more radical proposals & lower acceptance rates
* Small: more modest changes & higher acceptance rates

[Normal proposal] (similar to the sliding window proposal)
σ²: determines how drastic the new proposals are and how often they will be accepted

[Multiplier proposal]

[The beta and Dirichlet proposal]
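A hedged Python sketch of three of these proposals. Each function returns the proposed value together with its proposal ratio. The multiplier form m = exp(λ(u - 0.5)) and its tuning parameter λ (`lam`) do not appear on the slide and are assumptions here, as are the function names and example values.

```python
import math
import random

def sliding_window_proposal(x, omega, rng):
    """Uniform draw from a window of width omega centered on x.
    The move is symmetric, so the proposal ratio is 1."""
    x_star = x + omega * (rng.random() - 0.5)
    return x_star, 1.0

def normal_proposal(x, variance, rng):
    """Normal draw centered on x with variance sigma^2.
    Also symmetric, so the proposal ratio is 1."""
    x_star = rng.gauss(x, math.sqrt(variance))
    return x_star, 1.0

def multiplier_proposal(x, lam, rng):
    """Multiply x by m = exp(lam * (u - 0.5)), u uniform on (0, 1).
    Keeps positive values positive; the proposal ratio is m."""
    m = math.exp(lam * (rng.random() - 0.5))
    return m * x, m

rng = random.Random(0)
x_sw, ratio_sw = sliding_window_proposal(0.5, 0.2, rng)
x_norm, ratio_norm = normal_proposal(0.5, 0.01, rng)
x_mult, ratio_mult = multiplier_proposal(0.5, 1.0, rng)
print(x_sw, x_norm, x_mult)
```

The multiplier move suits parameters that must stay positive, such as branch lengths, because a positive value multiplied by m is still positive.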

Page 15: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.4 Burn-in, mixing and convergence – [about performance of an MCMC run]

* Trace plot
* To confirm convergence
* Mixing behavior
* Burn-in: discard the samples taken before the chain converges

Page 16: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.4 Burn-in, mixing and convergence

The mixing behavior of a Metropolis sampler can be adjusted using its tuning parameter.

* Poor mixing: ω is too small. Proposals are accepted often, but it takes a long time to cover the whole region.
* Poor mixing: ω is too large. Proposals are rejected often, so it also takes a long time to cover the whole region.
* Good mixing: ω is an intermediate value, giving moderate acceptance rates.
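The effect of ω on the acceptance rate can be sketched with a toy Metropolis sampler in Python. The standard normal target and the three window widths are assumptions picked for illustration, not values from the slides.

```python
import math
import random

def acceptance_rate(omega, n_steps=5_000, seed=0):
    """Run a sliding-window Metropolis sampler on a standard normal
    target and return the fraction of accepted proposals."""
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n_steps):
        x_star = x + omega * (rng.random() - 0.5)
        # log of the density ratio for a standard normal target
        log_r = 0.5 * (x * x - x_star * x_star)
        if log_r >= 0 or rng.random() < math.exp(log_r):
            x, accepted = x_star, accepted + 1
    return accepted / n_steps

# Too small, intermediate, too large (hypothetical widths).
rates = {omega: acceptance_rate(omega) for omega in (0.1, 3.0, 100.0)}
for omega, rate in rates.items():
    print(f"omega = {omega:6.1f}  acceptance rate = {rate:.2f}")
```

A high acceptance rate is not a goal in itself: with the smallest window nearly every move is accepted, yet each move is so short that the chain explores the target slowly.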

Page 17: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.4 Burn-in, mixing and convergence

Convergence diagnostics help determine the quality of a sample from the posterior. There are 3 different types of diagnostics:

(1) Examining autocorrelation times, effective sample sizes, and other measures of the behavior of single chains

(2) Comparing samples from successive time segments of a single chain

(3) Comparing samples from different runs

=> In Bayesian MCMC sampling of phylogenetic problems, the tree topology is typically the most difficult parameter to sample from

=> The approach to solve this problem is to focus on split frequencies instead.
• A split is a partition of the tips of the tree into two non-overlapping sets.
• Calculate the average standard deviation of the split frequencies.

=> Potential Scale Reduction Factor (PSRF)
• PSRF compares the variance among runs with the variance within runs.
• As the chains converge, the variances will become more similar and the PSRF will approach 1.0.
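Both diagnostics can be sketched in Python. The PSRF below follows the common among-run versus within-run variance form, and the split-frequency function uses the sample standard deviation across runs; both choices, along with the split labels and frequencies, are assumptions for illustration rather than the exact MrBayes calculation.

```python
import random
import statistics

def psrf(chains):
    """Potential scale reduction factor for one parameter,
    given equally long samples from two or more runs."""
    n = len(chains[0])
    within = statistics.mean([statistics.variance(c) for c in chains])
    between_over_n = statistics.variance([statistics.mean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5

def average_sd_of_split_frequencies(runs):
    """runs: one dict per run, mapping a split to its sampled frequency.
    A split missing from a run counts as frequency 0."""
    splits = set().union(*runs)
    sds = [statistics.stdev([run.get(s, 0.0) for run in runs]) for s in splits]
    return statistics.mean(sds)

rng = random.Random(0)
run_1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
run_2 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
run_shifted = [rng.gauss(3.0, 1.0) for _ in range(1000)]

psrf_converged = psrf([run_1, run_2])            # runs agree
psrf_not_converged = psrf([run_1, run_shifted])  # runs disagree

# Hypothetical split frequencies from two runs on taxa A-E.
freqs_run_1 = {"AB|CDE": 0.95, "ABC|DE": 0.60}
freqs_run_2 = {"AB|CDE": 0.93, "ABC|DE": 0.64, "ABD|CE": 0.02}
asdsf = average_sd_of_split_frequencies([freqs_run_1, freqs_run_2])

print(psrf_converged, psrf_not_converged, asdsf)
```

The PSRF looks at one continuous parameter at a time, while the split-frequency measure summarizes agreement on topology, so the two are usually read together.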

Page 18: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.5 Metropolis coupling – [To improve mixing]

Cold chain, hot chain
• When: it is difficult or impossible to achieve convergence
• Metropolis coupling: a general technique to improve mixing

* An incremental heating scheme: T = 1 / (1 + λi)

where i ∈ {0, 1, …, k} for k heated chains, with i = 0 for the cold chain, and λ is the temperature factor (an intermediate value of λ works best)
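A short Python sketch of the heating scheme. The temperature factor 0.2 and the three heated chains are hypothetical values. The second function reflects the usual convention that a heated chain samples the posterior raised to the power T, which the slide does not state and is an assumption here.

```python
def chain_temperatures(n_heated, lam):
    """T_i = 1 / (1 + lam * i) for i = 0..n_heated; i = 0 is the cold chain."""
    return [1.0 / (1.0 + lam * i) for i in range(n_heated + 1)]

def heated_log_posterior(log_posterior, temperature):
    """Log of the posterior raised to the power T (assumed convention).
    T = 1 leaves the cold chain unchanged; T < 1 flattens the surface."""
    return temperature * log_posterior

temps = chain_temperatures(n_heated=3, lam=0.2)  # hypothetical settings
flattened = [heated_log_posterior(-100.0, t) for t in temps]
print(temps)
print(flattened)
```

Flattening lowers the valleys between peaks in relative terms, so the heated chains cross them more easily than the cold chain does.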

Page 19: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.6 Summarizing the results

Stationary phase of the chain / adequate sample
> To compute an estimate of the marginal posterior distribution
> Summarized using statistics
* Bayesian statisticians: 95% credibility interval

The posterior distribution on topology and branch lengths is more difficult to summarize efficiently.

* To illustrate the topological variance in the posterior
-> Estimated number of topologies in various credible sets
* To give the frequencies of the most common splits
=> A majority rule consensus tree
* The sampled branch lengths are even more difficult to summarize adequately.
=> To display the distribution of branch length values separately
=> To pool the branch length samples that correspond to the same split
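A 95% credibility interval can be read directly off the sorted samples. The Python sketch below uses an equal-tailed interval, which is one common choice and an assumption here; the samples are simulated stand-ins for real MCMC output.

```python
import random

def credibility_interval(samples, mass=0.95):
    """Equal-tailed interval: cut (1 - mass) / 2 of the samples
    off each end of the sorted list."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - mass) / 2.0
    lower = ordered[round(tail * n)]
    upper = ordered[round((1.0 - tail) * n) - 1]
    return lower, upper

rng = random.Random(0)
# Simulated samples of one parameter, taken after burn-in.
samples = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

lower, upper = credibility_interval(samples)
inside = sum(lower <= s <= upper for s in samples) / len(samples)
print(lower, upper, inside)
```

The interval is a direct statement about the parameter given the data and the prior: 95% of the sampled values fall inside it.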

Page 20: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.7 An introduction to phylogenetic models

Phylogenetic model:

1) A tree model
• Unrooted / rooted model
• Strict / relaxed clock tree model

2) A substitution model
• The substitution model, Q matrices
• The general time-reversible (GTR) model

* Factor πi: corresponds to the stationary state frequency of the receiving state
* Factor rij: determines the intensity of the exchange between pairs of states, controlling for the stationary state frequencies
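A minimal Python sketch of how the two factors combine into a GTR rate matrix, assuming the usual form q_ij = r_ij * π_j for i ≠ j with diagonals set so that each row sums to zero. The exchangeabilities and state frequencies are invented for illustration, and the matrix is left unscaled.

```python
def gtr_rate_matrix(exchange, freqs):
    """Build Q with q_ij = r_ij * pi_j for i != j.
    exchange maps an index pair (i, j) with i < j to r_ij."""
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r_ij = exchange[(min(i, j), max(i, j))]
                q[i][j] = r_ij * freqs[j]  # pi of the receiving state j
        q[i][i] = -sum(q[i])  # diagonal: rows sum to zero
    return q

# States in the order A, C, G, T (hypothetical parameter values).
freqs = [0.3, 0.2, 0.2, 0.3]
exchange = {
    (0, 1): 1.0, (0, 2): 4.0, (0, 3): 1.0,
    (1, 2): 1.0, (1, 3): 4.0, (2, 3): 1.0,
}
Q = gtr_rate_matrix(exchange, freqs)
for row in Q:
    print(["%.2f" % value for value in row])
```

Because r_ij is shared by both directions of an exchange, π_i q_ij equals π_j q_ji, which is the time-reversibility property the model is named for.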

Page 21: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.8 Bayesian model choice and model averaging

* Bayes’ theorem, applied to two models M0 and M1:

$$\frac{f(M_1 \mid X)}{f(M_0 \mid X)} = \frac{f(M_1)}{f(M_0)} \times \frac{f(X \mid M_1)}{f(X \mid M_0)}$$

(posterior odds = prior odds × Bayes factor)

Model likelihood: the probability of the data given the chosen model after we have integrated out all parameters, i.e. the normalizing constant.

* Bayes factor comparisons are truly flexible.
- Unlike likelihood ratio tests, there is no requirement for the models to be nested.
- Unlike the Akaike Information Criterion and the (confusingly named) Bayesian Information Criterion, there is no need to correct for the number of parameters in the model.

To estimate the model likelihood
-> Use the harmonic mean of the likelihood values sampled in the MCMC run.
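A hedged Python sketch of the harmonic mean estimate and the resulting Bayes factor, worked on the log scale to avoid numerical underflow. The log-likelihood values are invented; in practice they would be the values sampled during the stationary phase of each run.

```python
import math

def log_harmonic_mean(log_likelihoods):
    """Log of the harmonic mean of the likelihoods:
    log(n) - log(sum(1 / L_i)), using the log-sum-exp trick."""
    n = len(log_likelihoods)
    negated = [-value for value in log_likelihoods]
    largest = max(negated)
    log_sum = largest + math.log(sum(math.exp(v - largest) for v in negated))
    return math.log(n) - log_sum

# Hypothetical sampled log likelihoods under two models.
model_1 = [-1000.2, -1001.5, -999.8, -1000.9]
model_0 = [-1005.1, -1006.3, -1004.7, -1005.8]

log_ml_1 = log_harmonic_mean(model_1)
log_ml_0 = log_harmonic_mean(model_0)
log_bayes_factor = log_ml_1 - log_ml_0  # log of f(X|M1) / f(X|M0)
print(log_ml_1, log_ml_0, log_bayes_factor)
```

A positive log Bayes factor favors M1 and a negative one favors M0; here the two models need not be nested.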

Page 22: 7. Bayesian  phylogenetic  analysis using  MrBAYES

7.9 Prior probability distributions – cautionary notes

* The priors: typically a negligible influence on the posterior distribution
* The Bayesian approach typically handles weak data quite well.
* But when the data are weak, regions of parameter space with extremely low likelihoods can attract the chain.

Page 23: 7. Bayesian  phylogenetic  analysis using  MrBAYES

Schematic overview of the models implemented in MrBayes 3. Each box gives the available settings in normal font and then the program commands and command options needed to invoke those settings in italics.

Page 24: 7. Bayesian  phylogenetic  analysis using  MrBAYES

PRACTICE

7.10 Introduction to MrBayes
7.10.1 Acquiring and installing the program
7.10.2 Getting started
7.10.3 Changing the size of the MrBayes window
7.10.4 Getting help

Page 25: 7. Bayesian  phylogenetic  analysis using  MrBAYES

PRACTICE

7.11 A simple analysis
7.11.1 Quick start version
7.11.2 Getting data into MrBayes
7.11.3 Specifying a model
7.11.4 Setting the priors
7.11.5 Checking the model
7.11.6 Setting up the analysis
7.11.7 Running the analysis
7.11.8 When to stop the analysis
7.11.9 Summarizing samples of substitution model parameters
7.11.10 Summarizing samples of trees and branch lengths

Page 26: 7. Bayesian  phylogenetic  analysis using  MrBAYES

PRACTICE

7.12 Analyzing a partitioned data set
7.12.1 Getting mixed data into MrBayes
7.12.2 Dividing the data into partitions
7.12.3 Specifying a partitioned model
7.12.4 Running the analysis
7.12.5 Some practical advice