there are two principal philosophies in statistical data...

Post on 22-Jul-2020

Page 1: There are two principal philosophies in statistical data ...ocw.nctu.edu.tw/course/molecular_evolution/W17.pdf · There are two principal philosophies in statistical data analysis:

National Chiao Tung University, Institute of Bioinformatics and Systems Biology; instructor: 林勇欣

There are two principal philosophies in statistical data analysis: the classical or frequentist and the Bayesian.

The frequentist defines the probability of an event as the expected frequency of occurrence of that event in repeated draws from a real or imaginary population.

The performance of an inference procedure is judged by its properties in repeated sampling from the data-generating model, with the parameters fixed.

Important concepts include bias and variance of an estimator, confidence intervals, and p values.

Yang (2006) Computational Molecular Evolution

Bayesian statistics is not mentioned in most biostatistics courses.

The key feature of Bayesian methods is the notion of a probability distribution for the parameter.

$$P(A \mid B) = \frac{P(A) \times P(B \mid A)}{P(B)}$$

$$P(B) \times P(A \mid B) = P(A) \times P(B \mid A)$$

Here probability cannot be interpreted as the frequency in random draws from a population but instead is used to represent uncertainty about the parameter.

In classical statistics, parameters are unknown constants and cannot have distributions.

Bayesian proponents argue that since the value of the parameter is unknown, it is sensible to specify a probability distribution to describe its possible values.

The distribution of the parameter before the data are analyzed is called the prior distribution.

This can be specified by using either an objective assessment of prior evidence concerning the parameter or the researcher’s subjective opinion.

Bayes’s theorem is then used to calculate the posterior distribution of the parameter, that is, the conditional distribution of the parameter given the data.

All inferences about the parameter are based on the posterior.

Suppose the occurrence of a certain event B may depend on whether another event A has occurred. Then the probability that B occurs is given by the law of total probability:

$$P(B) = P(A) \times P(B \mid A) + P(\bar{A}) \times P(B \mid \bar{A})$$

Here Ā stands for ‘non A’ or ‘A does not occur’.

Bayes’s theorem, also known as the inverse-probability theorem, gives the conditional probability that A occurs given that B occurs:

$$P(A \mid B) = \frac{P(A) \times P(B \mid A)}{P(B)} = \frac{P(A) \times P(B \mid A)}{P(A) \times P(B \mid A) + P(\bar{A}) \times P(B \mid \bar{A})}$$

Example. (False positives of a clinical test)

Suppose a new test has been developed to screen for an infection in the population. If a person has the infection, the test accurately reports a positive 99% of the time, and if a person does not have the infection, the test falsely reports a positive only 2% of the time.

Suppose that 0.1% of the population have the infection. What is the probability that a person who has tested positive actually has the infection?

Let A be the event that a person has the infection, and Ā no infection. Let B stand for test-positive. Then

P(A) = 0.001, P(Ā) = 0.999, P(B|A) = 0.99, P(B|Ā) = 0.02

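The probability that a random person from the population tests positive is

$$P(B) = P(A) \times P(B \mid A) + P(\bar{A}) \times P(B \mid \bar{A}) = 0.001 \times 0.99 + 0.999 \times 0.02 = 0.02097$$

This is close to the proportion of positives (2%) among the noninfected individuals of the population, because nearly everyone in the population is noninfected.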

The probability that a person who has tested positive has the infection is

$$P(A \mid B) = \frac{P(A) \times P(B \mid A)}{P(B)} = \frac{0.001 \times 0.99}{0.02097} = 0.0472$$
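The arithmetic of this example can be reproduced with a few lines of Python. This is a minimal sketch; the function name `posterior_given_positive` and its argument names are illustrative choices and do not come from the slides.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return (P(B), P(A|B)) for infection A and a positive test B."""
    # Law of total probability: P(B) = P(A)P(B|A) + P(not A)P(B|not A)
    p_positive = prior * sensitivity + (1.0 - prior) * false_positive_rate
    # Bayes's theorem: P(A|B) = P(A)P(B|A) / P(B)
    p_infected = prior * sensitivity / p_positive
    return p_positive, p_infected


# Values from the example: P(A) = 0.001, P(B|A) = 0.99, P(B|not A) = 0.02
p_b, p_a_given_b = posterior_given_positive(0.001, 0.99, 0.02)
print(f"P(B) = {p_b:.5f}, P(A|B) = {p_a_given_b:.4f}")
```

Even with a test that detects 99% of infections, fewer than 5% of positive results correspond to true infections, because the infection is rare and false positives among the noninfected dominate.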

When Bayes’s theorem is used in Bayesian statistics, A and Ā correspond to different hypotheses H1 and H2, while B corresponds to the observed data (X).

Bayes’s theorem then specifies the conditional probability of hypothesis H1 given the data as

$$P(H_1 \mid X) = \frac{P(H_1) \times P(X \mid H_1)}{P(X)} = \frac{P(H_1) \times P(X \mid H_1)}{P(H_1) \times P(X \mid H_1) + P(H_2) \times P(X \mid H_2)}$$

$$P(H_2 \mid X) = 1 - P(H_1 \mid X)$$

Here P(H1) and P(H2) are called prior probabilities. They are probabilities assigned to the hypotheses before the data are observed or analyzed.

The conditional probabilities P(H1|X) and P(H2|X) are called posterior probabilities.

P(X|H1) and P(X|H2) are the likelihoods under each hypothesis.

Note that in the disease testing example discussed above, P(A) and P(Ā) are frequencies of infected and noninfected individuals in the population.

There is no controversy concerning the use of Bayes’s theorem in such problems.

However, in Bayesian statistics, the prior probabilities P(H1) and P(H2) often do not have such a frequentist interpretation. The use of Bayes’s theorem in such a context is controversial.

When the hypothesis concerns unknown continuous parameters, probability densities are used instead of probabilities.

Bayes’s theorem then takes the following form:

$$f(\theta \mid X) = \frac{f(\theta) f(X \mid \theta)}{f(X)} = \frac{f(\theta) f(X \mid \theta)}{\int f(\theta) f(X \mid \theta)\, d\theta}$$

Here f(θ) is the prior distribution, f(θ|X) is the posterior distribution, and f(X|θ) is the likelihood (the probability of data X given parameter θ).

The marginal probability of the data, f(X), is a normalizing constant, to make f(θ|X) integrate to 1.

An important strength of the Bayesian approach is that it provides a natural way of dealing with nuisance parameters through integration or marginalization.

Let θ = (λ, η), with λ the parameters of interest and η the nuisance parameters. The joint posterior density of λ and η is

$$f(\lambda, \eta \mid X) = \frac{f(\lambda, \eta) f(X \mid \lambda, \eta)}{f(X)} = \frac{f(\lambda, \eta) f(X \mid \lambda, \eta)}{\int f(\lambda, \eta) f(X \mid \lambda, \eta)\, d\lambda\, d\eta}$$

from which the (marginal) posterior density of λ can be obtained as

$$f(\lambda \mid X) = \int f(\lambda, \eta \mid X)\, d\eta$$

Example:

Consider the use of the JC69 model to estimate the distance θ between the human and orangutan 12s rRNA genes from the mitochondrial genome.

The data are summarized as x = 90 differences out of n = 948 sites.

To apply the Bayesian approach, one has to specify a prior. Uniform priors are commonly used in Bayesian analysis. In this case, one could specify, say, U(0, 100), with a large upper bound. However, the uniform prior is not very reasonable since sequence distances estimated from real data are often small (say, < 1).

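We instead use an exponential prior

$$f(\theta) = \frac{1}{\mu} e^{-\theta/\mu}$$

with mean µ = 0.2.

The posterior distribution of θ is:

$$f(\theta \mid x) = \frac{f(\theta) f(x \mid \theta)}{f(x)} = \frac{f(\theta) f(x \mid \theta)}{\int f(\theta) f(x \mid \theta)\, d\theta}$$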

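We consider the data to have a binomial distribution, with probability

$$p = \frac{3}{4} - \frac{3}{4} e^{-4\theta/3}$$

for a difference and 1 – p for an identity. The likelihood is thus:

$$f(x \mid \theta) = p^{x} (1 - p)^{n - x} = \left(\frac{3}{4} - \frac{3}{4} e^{-4\theta/3}\right)^{x} \left(\frac{1}{4} + \frac{3}{4} e^{-4\theta/3}\right)^{n - x}$$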

With the exponential prior

$$f(\theta) = \frac{1}{0.2} e^{-\theta/0.2}$$

the marginal probability of the data is

$$f(x) = \int f(\theta) f(x \mid \theta)\, d\theta = 5.16776 \times 10^{-131}$$

The mean of the posterior distribution is found by numerical integration to be

$$E(\theta \mid x) = \int \theta f(\theta \mid x)\, d\theta = 0.10213$$

which is very similar to the MLE of θ̂ = 0.1015, despite their different interpretations.
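The numerical integration in this example can be sketched in plain Python. The block below is an illustration under stated assumptions, not the procedure used in the source: it applies composite Simpson's rule on the interval (0, 2), on the assumption that the posterior mass beyond θ = 2 is negligible, and the names `prior`, `likelihood`, `simpson`, and `UPPER` are made up for this sketch.

```python
import math

X_DIFF, N_SITES, PRIOR_MEAN = 90, 948, 0.2  # data and prior mean from the example


def prior(theta):
    # Exponential prior with mean PRIOR_MEAN
    return math.exp(-theta / PRIOR_MEAN) / PRIOR_MEAN


def likelihood(theta):
    # JC69 probability that a site differs, then p^x (1 - p)^(n - x)
    p = 0.75 - 0.75 * math.exp(-4.0 * theta / 3.0)
    return p ** X_DIFF * (1.0 - p) ** (N_SITES - X_DIFF)


def simpson(func, lower, upper, intervals=20000):
    # Composite Simpson's rule; `intervals` must be even
    h = (upper - lower) / intervals
    total = func(lower) + func(upper)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * func(lower + i * h)
    return total * h / 3.0


UPPER = 2.0  # assumed truncation point for the integrals
marginal = simpson(lambda t: prior(t) * likelihood(t), 0.0, UPPER)
posterior_mean = (
    simpson(lambda t: t * prior(t) * likelihood(t), 0.0, UPPER) / marginal
)
print(f"f(x) = {marginal:.5e}, E(theta|x) = {posterior_mean:.5f}")
```

The printed values should agree closely with the f(x) and E(θ|x) quoted in the example. The likelihood here omits the binomial coefficient, as the formula in the slides does, so f(x) is on the same scale as the quoted value.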

Criticisms of frequentist statistics:

A major Bayesian criticism of classical statistics is that it does not answer the right question.

Classical methods provide probability statements about the data or the method for analyzing the data, but not about the parameter, even though the data have already been observed and our interest is in the parameter.

Suppose x = 9 heads and r = 3 tails are observed in n = 12 independent tosses of a coin, and we wish to test the null hypothesis H0: θ = ½ against the alternative H1: θ > ½, where θ is the true probability of heads.

Suppose the number of tosses n is fixed, so that x has a binomial distribution, with probability

$$f(x \mid \theta) = \binom{n}{x} \theta^{x} (1 - \theta)^{n - x}$$

The probability of the observed data x = 9, which is 0.05371, is not the p value. The p value, which sums the probabilities of all outcomes at least as extreme as the one observed (x ≥ 9) under H0, is 0.075.

The p values are also criticized for violating the likelihood principle, which says that the likelihood function contains all information in the data about θ and the same inference should be made from two experiments that have the same likelihood.

Consider a different experimental design, in which the number of tails r was fixed beforehand; in other words, the coin was tossed until r = 3 tails were observed, at which point x = 9 heads were observed. The data x then have a negative binomial distribution, with probability

$$f(x \mid \theta) = \binom{r + x - 1}{x} \theta^{x} (1 - \theta)^{n - x}$$

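If we use this model, the p value becomes

$$p = \sum_{j=9}^{\infty} \binom{3 + j - 1}{j} \left(\frac{1}{2}\right)^{j} \left(\frac{1}{2}\right)^{3} = 0.0325$$

Both designs give a likelihood proportional to θ⁹(1 − θ)³, yet the two p values differ.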
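The two p values can be checked with exact sums. The sketch below uses only `math.comb` from the Python standard library; the variable names are illustrative. Summed exactly, the tail probabilities come out at about 0.0730 for the binomial design and about 0.0327 for the negative binomial design, slightly different from the rounded 0.075 and 0.0325 quoted in the slides, but on the same sides of the 5% level.

```python
from math import comb

HEADS, TAILS = 9, 3
N_TOSSES = HEADS + TAILS

# Design 1: n fixed, x ~ binomial(n, 1/2) under H0.
prob_observed = comb(N_TOSSES, HEADS) / 2 ** N_TOSSES
# P value = P(x >= 9): sum over outcomes at least as extreme as observed.
p_binomial = sum(comb(N_TOSSES, j) for j in range(HEADS, N_TOSSES + 1)) / 2 ** N_TOSSES

# Design 2: r fixed, x ~ negative binomial under H0.
# P value = P(x >= 9) = 1 - P(x <= 8), which avoids the infinite sum.
p_negative_binomial = 1.0 - sum(
    comb(TAILS + j - 1, j) * 0.5 ** (j + TAILS) for j in range(HEADS)
)

print(f"P(x = 9)              = {prob_observed:.5f}")
print(f"binomial p value      = {p_binomial:.4f}")
print(f"neg. binomial p value = {p_negative_binomial:.4f}")
```

The same data and the same likelihood therefore lead to rejection of H0 at the 5% level under one stopping rule but not under the other, which is the conflict with the likelihood principle described above.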

Criticisms of Bayesian methods:

All criticisms of Bayesian methods are levied on the prior or the need for it.

Bayesians come in two flavors: the objective and the subjective.

The objective Bayesians consider the prior to be a representation of prior objective information about the parameter.

The approach runs into trouble when no prior information is available about the parameter and the prior is supposed to represent total ignorance.

For a continuous parameter, a uniform distribution over the range of the parameter might be assigned.

However, such so-called flat or noninformative priors lead to contradictions.

If x has a uniform distribution, x² cannot have a uniform distribution.

Similarly, a uniform prior for the probability of different sites p is very different from a uniform prior for sequence distance θ under the JC69 model.
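The point about flat priors can be made concrete with a change of variables. The sketch below assumes θ ~ U(0, 100), the uniform prior mentioned in the JC69 example, and derives the prior this induces on p; the function names are illustrative and not taken from the source.

```python
import math


def p_from_theta(theta):
    # JC69: expected proportion of different sites at distance theta
    return 0.75 - 0.75 * math.exp(-4.0 * theta / 3.0)


def theta_from_p(p):
    # Inverse of p_from_theta, valid for 0 <= p < 0.75
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def induced_density_of_p(p, upper=100.0):
    # theta ~ U(0, upper); change of variables gives
    # f_p(p) = f_theta(theta) * |d theta / d p| = (1 / upper) / (1 - 4p/3)
    if not 0.0 <= p < p_from_theta(upper):
        return 0.0
    return (1.0 / upper) / (1.0 - 4.0 * p / 3.0)


for p in (0.0, 0.3, 0.6, 0.74):
    print(f"p = {p:.2f}: induced prior density = {induced_density_of_p(p):.4f}")

# Prior probability that p < 0.5 when theta ~ U(0, 100)
mass_below = theta_from_p(0.5) / 100.0
print(f"P(p < 0.5) under the uniform prior on theta = {mass_below:.4f}")
```

The induced density rises steeply as p approaches 0.75, so a prior that is flat in θ puts under 1% of its mass on p < 0.5, whereas a prior that is flat in p over (0, 0.75) would put two thirds of its mass there. The two "noninformative" choices express very different beliefs.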

Such difficulties in representing total ignorance have caused the objective Bayesian approach to fall out of favor.

The subjective Bayesians consider the prior to represent the researcher’s subjective belief about the parameter before seeing or analyzing the data.

One cannot really argue against somebody else’s subjective beliefs, but ‘classical’ statisticians reject the notion of subjective probabilities and of letting personal prejudices influence scientific inference.

Even though the choice of the likelihood model involves certain subjectivity as well, the model can nevertheless be checked against the data, but no such validation is possible for the prior.

Bayesian phylogenetics

It is straightforward to formulate the problem of phylogeny reconstruction in the general framework of Bayesian inference.

Let X be the sequence data. Let θ include all parameters in the substitution model, with a prior distribution f(θ). Let τi be the ith tree topology, i = 1, 2, …, Ts, where Ts is the total number of tree topologies for s species.

Usually a uniform prior f(τi) = 1/Ts is assumed, although this means nonuniform prior probabilities for clades.

Let bi be the vector of branch lengths on tree τi, with prior probability f(bi). MrBayes 3 assumes that branch lengths have independent uniform or exponential priors with the parameter (upper bound for the uniform or mean for the exponential) set by the user.

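The posterior probability of tree τi is

$$P(\tau_i \mid X) = \frac{\iint f(\theta) f(\tau_i \mid \theta) f(b_i \mid \theta, \tau_i) f(X \mid \theta, \tau_i, b_i)\, db_i\, d\theta}{\sum_{j=1}^{T_s} \iint f(\theta) f(\tau_j \mid \theta) f(b_j \mid \theta, \tau_j) f(X \mid \theta, \tau_j, b_j)\, db_j\, d\theta}$$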

This is a direct application of

$$f(\lambda \mid X) = \int f(\lambda, \eta \mid X)\, d\eta,$$

treating τ as the parameter of interest and all other parameters as nuisance parameters.

Note that the denominator, the marginal probability of the data f(X), is a sum over all possible tree topologies and, for each tree topology τj, an integral over all branch lengths bj and substitution parameters θ.
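The structure of this calculation can be shown on a toy case. The sketch below assumes that the integral over branch lengths and substitution parameters has already been evaluated for each topology, giving one log marginal likelihood per tree; the three numbers used are hypothetical, and the function name `tree_posteriors` is made up for this illustration. For realistic data sets these integrals and the sum over topologies are generally not computed directly, which is why sampling methods such as MCMC are used instead.

```python
import math


def tree_posteriors(log_marginal_likelihoods, log_priors=None):
    """Posterior probability of each topology from per-tree log marginal likelihoods."""
    n_trees = len(log_marginal_likelihoods)
    if log_priors is None:
        # Uniform prior f(tau_i) = 1 / T_s
        log_priors = [-math.log(n_trees)] * n_trees
    log_joint = [lp + ll for lp, ll in zip(log_priors, log_marginal_likelihoods)]
    # Log-sum-exp for the denominator f(X), to avoid underflow of tiny likelihoods
    shift = max(log_joint)
    log_fx = shift + math.log(sum(math.exp(v - shift) for v in log_joint))
    return [math.exp(v - log_fx) for v in log_joint]


# Hypothetical log marginal likelihoods for the three rooted trees of three species
posteriors = tree_posteriors([-1000.0, -1002.0, -1005.0])
print([round(p, 4) for p in posteriors])
```

Working on the log scale matters here because likelihoods of sequence alignments are far too small to represent directly, as the value of f(x) in the JC69 example already suggests.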

Bayesian versus Likelihood

In terms of computational efficiency, stochastic tree search using the program MrBayes appears to be more efficient than heuristic tree search under likelihood using the PAUP program.

Nevertheless, the running time of the MCMC algorithm is proportional to the number of iterations the algorithm is run for. In general, longer chains are needed to achieve convergence in larger data sets due to the increased number of parameters to be averaged over.

However, many users run shorter chains for larger data sets because larger trees require more computation per iteration. As a result, it is not always clear whether the MCMC algorithm has converged in analyses of large data sets.
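To make the ideas of iterations and convergence concrete, here is a toy Metropolis sampler for the single-parameter JC69 posterior from the earlier example (x = 90, n = 948, exponential prior with mean 0.2). It is a sketch under stated assumptions and is not how MrBayes is implemented: the sliding-window proposal, the window width, the burn-in length, and the seeds are all arbitrary choices made for illustration. Two chains are started from different values, and agreement of their means is used as a crude convergence check.

```python
import math
import random

X_DIFF, N_SITES, PRIOR_MEAN = 90, 948, 0.2  # JC69 example values


def log_posterior(theta):
    """Unnormalized log posterior: exponential prior times JC69 likelihood."""
    if theta <= 0.0:
        return -math.inf
    p = -0.75 * math.expm1(-4.0 * theta / 3.0)  # same as 3/4 - 3/4 exp(-4 theta / 3)
    log_prior = -theta / PRIOR_MEAN
    log_lik = X_DIFF * math.log(p) + (N_SITES - X_DIFF) * math.log(1.0 - p)
    return log_prior + log_lik


def metropolis(n_iter, start, window=0.05, burn_in=2000, seed=1):
    """Sliding-window Metropolis sampler; returns the post-burn-in samples."""
    rng = random.Random(seed)
    theta, logp = start, log_posterior(start)
    samples = []
    for i in range(n_iter):
        proposal = theta + rng.uniform(-window / 2.0, window / 2.0)
        logp_new = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio)
        if logp_new >= logp or rng.random() < math.exp(logp_new - logp):
            theta, logp = proposal, logp_new
        if i >= burn_in:
            samples.append(theta)
    return samples


# Two chains from different starting points; similar means suggest convergence
chain_a = metropolis(22000, start=0.05, seed=1)
chain_b = metropolis(22000, start=0.5, seed=2)
mean_a = sum(chain_a) / len(chain_a)
mean_b = sum(chain_b) / len(chain_b)
print(f"chain A mean = {mean_a:.4f}, chain B mean = {mean_b:.4f}")
```

Both chain means should land near the posterior mean obtained by numerical integration. With one parameter a few thousand iterations suffice; with a tree topology, branch lengths, and substitution parameters all being sampled, far more iterations are needed and agreement between independent runs becomes the main practical evidence of convergence.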

Furthermore, significant improvements to heuristic tree search under likelihood are being made.

It seems that for obtaining a point estimate, likelihood heuristic search using numerical optimization can be faster than Bayesian stochastic search using MCMC.

However, no one knows how to use the information in the likelihood tree search to attach a confidence interval or some other measure of the sampling error in the ML tree.

As a result, one must currently resort to bootstrapping.

Bootstrapping under likelihood is an expensive procedure, and appears slower than Bayesian MCMC.

Yang (2006) Computational Molecular Evolution
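
The cost of bootstrapping comes from repeating the whole tree search on every pseudo-replicate. A minimal sketch of the resampling step, assuming the alignment is held as a dictionary of equal-length sequences (the toy sequences and the name `bootstrap_replicate` are hypothetical):

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns (sites) with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


# Toy alignment (assumption, not real data).
alignment = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAT",
    "mouse": "ACGAACTTAT",
}

rng = random.Random(42)
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]

# Each replicate would now need its own full ML tree search, which is
# why the procedure is expensive: 100 replicates means 100 searches.
print(replicates[0])
```

The bootstrap support for a clade is then the proportion of replicate trees that contain it.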


Bayesian versus Likelihood

To many, Bayesian inference of molecular phylogenies enjoys a theoretical advantage over ML with bootstrapping.

Posterior probability for a tree or clade has an easy interpretation: it is the probability that the tree or clade is correct given the data, model, and prior.

In contrast, the interpretation of the bootstrap in phylogenetics has been controversial.

Yang (2006) Computational Molecular Evolution
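
In practice, the posterior probability of a clade is usually estimated as the proportion of trees in the MCMC sample that contain it. A minimal sketch, assuming each sampled tree is represented simply as a set of clades (the toy sample and the name `clade_posteriors` are hypothetical):

```python
from collections import Counter


def clade_posteriors(sampled_trees):
    """Estimate clade posteriors as frequencies in a sample of trees."""
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))        # count each clade once per tree
    n = len(sampled_trees)
    return {clade: c / n for clade, c in counts.items()}


# Toy MCMC sample (assumption): each tree is a set of clades,
# and each clade is a frozenset of taxon names.
sample = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("BC"), frozenset("ABC")},
]

posteriors = clade_posteriors(sample)
for clade, p in sorted(posteriors.items(), key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), p)
```

Here the clade (A, B) appears in three of four sampled trees, so its estimated posterior probability is 0.75.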


Bayesian versus Likelihood

It has been noted that Bayesian posterior probabilities calculated from real data sets are often extremely high.

One may observe that while bootstrap support values are published only if they are > 50% (as otherwise the relationships may not be considered trustworthy), posterior clade probabilities are sometimes reported only if they are < 100% (as most of them are 100%!).

Yang (2006) Computational Molecular Evolution


Bayesian versus Likelihood

The difference between the two measures of support does not itself suggest anything inappropriate about the Bayesian probabilities, especially given the difficulties in the interpretation of the bootstrap.

However, it has been observed that different models may produce conflicting trees when applied to the same data, each with high posterior probabilities.

Similarly, different genes for the same set of species can produce conflicting trees or clades, again each with high posterior probabilities.

Yang (2006) Computational Molecular Evolution


Bayesian versus Likelihood

Bayesian posterior probability for a tree or clade is the probability that the tree or clade is true given the data, the likelihood model and the prior.

Thus there can be only three possible reasons for spuriously high clade probabilities:

1. Computer program bugs or problems in running the MCMC algorithms

2. Misspecification of the likelihood (substitution) model

3. Misspecification and sensitivity of the prior

Yang (2006) Computational Molecular Evolution


Bayesian versus Likelihood

Note that high posterior probabilities were observed in simulated data sets where the substitution model is correct and in analyses of small data sets that did not use MCMC.

In those cases, the first two factors do not apply.

The third factor, the sensitivity of Bayesian inference to prior specification, is more fundamental and difficult to deal with.

Yang (2006) Computational Molecular Evolution


Bayesian versus Likelihood

Assume independent exponential priors with means µ0 and µ1 for internal and external branch lengths, respectively.

The posterior probabilities of trees might be unduly influenced by the prior mean µ0 on the internal branch lengths.

It is easy to see that high posterior probabilities for trees will decrease if µ0 is small; if µ0 = 0, all trees and clades will have posterior probabilities near zero.

Yang (2006) Computational Molecular Evolution
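
The limiting case can be written out as follows. This is a sketch in notation that is not on the slide: $t$ is an internal branch length, $\tau_i$ is the $i$-th of $N_s$ binary trees on $s$ species, $b$ is the vector of branch lengths, $X$ is the data, and the prior over tree topologies is assumed uniform.

```latex
% Exponential prior with mean \mu_0 on each internal branch length
f(t \mid \mu_0) = \frac{1}{\mu_0}\, e^{-t/\mu_0}, \qquad t > 0 .

% Posterior probability of tree \tau_i, integrating over branch lengths
P(\tau_i \mid X) =
  \frac{P(\tau_i) \int f(X \mid \tau_i, b)\, f(b \mid \tau_i)\, \mathrm{d}b}
       {\sum_{j=1}^{N_s} P(\tau_j) \int f(X \mid \tau_j, b)\, f(b \mid \tau_j)\, \mathrm{d}b} .

% As \mu_0 \to 0 the prior forces every internal branch length to zero,
% so every binary tree collapses to the same star tree, all integrals
% become equal, and under the uniform topology prior
\lim_{\mu_0 \to 0} P(\tau_i \mid X) = \frac{1}{N_s} .
```

Because $N_s$ is enormous for even a moderate number of species, $1/N_s$ is near zero, which is the statement on the slide.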


Bayesian versus Likelihood

It was observed that in large data sets, the posterior clade probabilities are sensitive to µ0 only if µ0 is very small.

In an analysis of 40 land plant species, the sensitive region was found to be (10⁻⁵, 10⁻³). Such branch lengths seem unrealistically small if we consider estimated internal branch lengths in published trees.

Yang (2006) Computational Molecular Evolution


Bayesian versus Likelihood

However, branch lengths in wrong or poorly supported trees are typically small and often zero.

As the prior is specified to represent our prior knowledge of internal branch lengths in all binary trees, the majority of which are wrong or poor trees, a very small µ0 appears necessary.

Yang (2006) Computational Molecular Evolution
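
To give a sense of how many binary trees the prior has to cover, the number of unrooted binary trees on n species is (2n - 5)!! = 1 × 3 × 5 × ... × (2n - 5). A short sketch of that count follows; the function name is hypothetical, and the assumption that the trees are unrooted is ours rather than the slide's.

```python
def num_unrooted_binary_trees(n_taxa):
    """Number of unrooted binary tree topologies: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):   # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


for n in (4, 5, 6, 10, 40):
    print(n, num_unrooted_binary_trees(n))
```

With 4, 5 and 6 species there are 3, 15 and 105 trees; by 40 species the count is astronomically large, so almost every tree covered by the prior is a wrong tree.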


Bayesian versus Likelihood

While posterior clade probabilities are sensitive to the mean of the prior for internal branch lengths, it is in general unclear how to formulate sensible priors that are acceptable to most biologists.

The problem merits further investigation.

Yang (2006) Computational Molecular Evolution