Post on 18-Apr-2022
A 250-Year Argument
Belief, Behavior, and the Bootstrap
Bradley Efron
Stanford University
The Greater World of Mathematics and Science
[Figure: diagram with regions labeled Mathematics, Statistics, A.I., Applied Sciences, and Terra Incognita.]
The Physicist’s Twins
• Sonogram: “Twin boys on the way!”
• Physicist: “What’s the probability my twins will be identical rather than fraternal?”
• Doctor: “One third of twins are identical.”
Bayes Rule for the Twins
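• Prior odds: $\frac{\Pr\{\text{identical}\}}{\Pr\{\text{fraternal}\}} = \frac{1/3}{2/3} = \frac{1}{2}$ (past experience)
• Likelihood ratio: $\frac{\Pr\{\text{same sex} \mid \text{identical}\}}{\Pr\{\text{same sex} \mid \text{fraternal}\}} = \frac{1}{1/2} = 2$ (current evidence)
• Posterior odds: $\frac{\Pr\{\text{identical} \mid \text{same sex}\}}{\Pr\{\text{fraternal} \mid \text{same sex}\}} = \;?$ (updated beliefs)
• Bayes rule:
Posterior odds = Prior odds · Likelihood ratio = $\frac{1}{2} \cdot 2 = 1$
• My answer: “50/50”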
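The odds arithmetic on this slide can be checked in a few lines. This is a minimal sketch using exact fractions; the function name `posterior_odds` is an illustrative choice, not something from the talk.

```python
from fractions import Fraction

def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

# Doctor's figure: one third of twins are identical.
prior = Fraction(1, 3) / Fraction(2, 3)   # Pr{identical} / Pr{fraternal} = 1/2
# Identical twins are always the same sex; fraternal twins are half the time.
lr = Fraction(1) / Fraction(1, 2)         # likelihood ratio = 2

odds = posterior_odds(prior, lr)          # posterior odds
prob_identical = odds / (1 + odds)        # convert odds to a probability
print(odds, prob_identical)
```

Odds of 1 correspond to a probability of 1/2, which is the "50/50" answer.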
If All Twins Were Sonogrammed:
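[Figure: 2 × 2 table of all twin births, with cells labeled a, b, c, d.]
                          Sonogram shows:
                          Same sex    Different    Doctor (row total)
Twins are:   Identical      1/3          0            1/3
             Fraternal      1/3         1/3           2/3
Physicist: the sonogram places her twins in the “Same sex” column.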
Belief and Inference
• θ: unknown state of nature (identical or fraternal?)
• π(θ): prior beliefs for θ (1/3, 2/3)
• x: current evidence (sonogram)
• f_θ(x): probability model for x given θ
• Question: What is π(θ|x)? (posterior beliefs given x)
Bayes Rule (1763)
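• π(θ|x) = c · π(θ) · f_θ(x)
(posterior beliefs = c · prior beliefs · likelihood function)
• “c” makes π(θ|x) sum to 1
• Likelihood function: f_θ(x) with x fixed, θ varying, e.g.,
$f_\theta(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(\theta - x)^2}$
[Figure: this likelihood plotted as a function of θ (horizontal axis 2 to 8, vertical axis 0.0 to 0.4), with the observed x marked on the θ axis.]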
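To make the role of the constant c concrete, here is a small sketch that applies Bayes rule on a grid of θ values with the normal likelihood shown on this slide. The grid, the flat prior, and the observed value x = 5 are assumptions chosen for illustration; the talk does not specify them.

```python
import math

def normal_likelihood(theta, x):
    """f_theta(x): the N(theta, 1) density at x, viewed as a function of theta."""
    return math.exp(-0.5 * (theta - x) ** 2) / math.sqrt(2 * math.pi)

def grid_posterior(thetas, prior, x):
    """Return pi(theta | x) on a grid: c * pi(theta) * f_theta(x)."""
    unnormalized = [p * normal_likelihood(t, x) for t, p in zip(thetas, prior)]
    c = 1.0 / sum(unnormalized)           # "c" makes the posterior sum to 1
    return [c * u for u in unnormalized]

# Hypothetical setup: theta on a grid from 2 to 8, flat prior, observed x = 5.
thetas = [2 + 0.1 * k for k in range(61)]
prior = [1.0 / len(thetas)] * len(thetas)
posterior = grid_posterior(thetas, prior, x=5.0)

mode = thetas[posterior.index(max(posterior))]
print(round(sum(posterior), 6), round(mode, 1))
```

With a flat prior the posterior is just the rescaled likelihood, so its peak sits at the observed x.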
Bayes Inference without Prior Experience
“Objective Bayes”
• “p”: population proportion of identical twins [Doctor: p = 1/3]
• Principle of insufficient reason (Laplace, Bernoulli): “In the absence of prior experience, assume p equally likely to have any value between 0 and 1.” [opposed by Venn, Keynes, Fisher]
• Invariant prior (Harold Jeffreys, 1930s):
$\pi(p) = c\,p^{-1/2}(1-p)^{-1/2}$
[Figure: Possible prior densities π(p) for p, the population proportion of identical twins, and the corresponding predictions for the Physicist. Doctor (p = 1/3): Prob = .5; Jeffreys: Prob = .67; Laplace: Prob = .67.]
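One way to reproduce the three predictions in the figure is sketched below. It assumes the Physicist's twins are treated as one more draw from the population, so that Pr{identical and same sex} = E[p] and Pr{same sex} = E[p + (1 − p)/2]; it also assumes the Laplace and Jeffreys priors are the Beta(1, 1) and Beta(1/2, 1/2) densities. The function names are illustrative.

```python
from fractions import Fraction

def prob_identical_given_same_sex(prior_mean_p):
    """Pr{identical | same sex} when p has a prior with the given mean.

    Pr{identical and same sex} = E[p]
    Pr{same sex}               = E[p + (1 - p)/2] = (1 + E[p]) / 2
    """
    return prior_mean_p / ((1 + prior_mean_p) / 2)

def beta_mean(a, b):
    """Mean of a Beta(a, b) density, a / (a + b)."""
    return a / (a + b)

half = Fraction(1, 2)
predictions = {
    # Doctor: all prior mass at p = 1/3
    "Doctor": prob_identical_given_same_sex(Fraction(1, 3)),
    # Laplace: uniform prior on (0, 1), i.e. Beta(1, 1)
    "Laplace": prob_identical_given_same_sex(beta_mean(Fraction(1), Fraction(1))),
    # Jeffreys: prior proportional to p^(-1/2) (1 - p)^(-1/2), i.e. Beta(1/2, 1/2)
    "Jeffreys": prob_identical_given_same_sex(beta_mean(half, half)),
}
for name, prob in predictions.items():
    print(name, round(float(prob), 2))
```

Under these assumptions only the prior mean of p matters, which is why the Laplace and Jeffreys priors, both with mean 1/2, give the same prediction.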
Frequentist Statistics (Behaviorism)
• θ = unknown parameter, x = observed data,
f_θ(x) probability model (but no prior beliefs π(θ))
• “t(x)”: some statistical procedure
(test, estimate, confidence interval, . . . )
• Inference based on behavior of t(x) in repeated use
• Optimality: find the best t(x) (R. A. Fisher, 1920s; J. Neyman, 1930s)
[Figure: Scatterplot of the scores of 22 students on two tests, 'mechanics' and 'vectors' (horizontal axis: mechanics score; vertical axis: vectors score). Sample correlation coefficient is .498 ± ??; means: mechanics 38.9, vectors 50.6.]
Student Score Data
• n = 22 students’ scores on two tests: mechanics, vectors
• Data y = (y_1, y_2, . . . , y_22) with y_i = (mec_i, vec_i)
• Parameter of interest: θ = correlation(mec, vec)
• Sample correlation coefficient: θ̂ = 0.498 ± ??
R. A. Fisher
• 1915: probability density f_θ(θ̂) (hypergeometric series)
• 1922–30: θ̂ is the maximum likelihood estimate (MLE)
• Frequentist optimality of the MLE: minimizes expected squared error E{(θ̂ − θ)²}
• Bivariate normal models
Jerzy Neyman (1930s)
• Optimal frequentist tests and confidence intervals
• 90% confidence interval for θ:
θ ∈ [0.164, 0.717]
• Neyman’s construction covers true θ 90% of the time, in
repeated use
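For comparison, a common textbook shortcut gives a similar interval. The sketch below uses Fisher's z-transform, which is an approximation and not Neyman's exact construction from Fisher's density; the function name `fisher_z_interval` is illustrative. It assumes atanh(θ̂) is roughly normal with standard deviation 1/√(n − 3).

```python
import math
from statistics import NormalDist

def fisher_z_interval(r, n, level=0.90):
    """Approximate confidence interval for a correlation via Fisher's z-transform.

    atanh(r) is treated as roughly normal with standard deviation 1/sqrt(n - 3).
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 for a 90% interval
    return math.tanh(z - q * se), math.tanh(z + q * se)

lo, hi = fisher_z_interval(r=0.498, n=22)
print(round(lo, 3), round(hi, 3))
```

The result is roughly [0.17, 0.73], close to but not identical to the exact limits [0.164, 0.717] quoted on the slide.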
[Figure: Neyman's 90% confidence interval for the student score correlation, .164 < θ < .717. Fisher's density f(θ̂* | θ) is plotted against θ̂* for θ = .164 and θ = .717; marked values are .164, .498, and .717, with tail areas .05 and .05.]
Jeffreys’ Invariant Prior
• Jeffreys’ objective (or “uninformative”) prior for correlation:
π(θ) = 1/(1 − θ²)
• General formula: one over the square root of Fisher’s information bound for the variance of the MLE (transforms correctly under change of variables)
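As a worked instance of the general formula, consider the proportion p from the twins example. This is a standard calculation, sketched under the assumption of n independent Bernoulli(p) observations; the symbols n and I(p) (the Fisher information) are introduced here and do not appear in the talk. The information bound for the variance is 1/I, so "one over the square root of the bound" is √I:

```latex
\pi(\theta) \;\propto\; \sqrt{I(\theta)}, \qquad
I(p) = \frac{n}{p(1-p)}
\;\Longrightarrow\;
\pi(p) \;\propto\; p^{-1/2}\,(1-p)^{-1/2}.
```

This recovers the invariant prior for p quoted in the "Objective Bayes" slide.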
[Figure: Bayes posterior density π(θ | θ̂) for the 22 students, plotted against θ. 90% credible limits = [.164, .718], with 5% in each tail; Neyman limits [.164, .717].]
More Students
n      θ̂
22     .498
44     .663
66     .621
88     .553
∞      [.415, .662]
[Figure: Galton's 1886 distribution of child's height vs. parents' height (horizontal axis: parents' height, 64 to 74; vertical axis: child's height, 60 to 75). Ellipses are contours of the best-fit bivariate normal density; red dot at the bivariate average (68.3, 68.1).]
Bivariate Normal Distribution
• “y ∼ N_2(µ, Σ)” (y, µ ∈ R², Σ 2 × 2 positive definite):
$f_{\mu,\Sigma}(y) = \frac{1}{2\pi}\,|\Sigma|^{-1/2}\, e^{-\frac{1}{2}(y-\mu)^{t}\,\Sigma^{-1}(y-\mu)}$
• µ: center of the ellipses, Σ: their shape
• 5 parameters: 2 means, 2 variances, 1 correlation
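The density formula on this slide can be evaluated directly in the 2 × 2 case, where the determinant and inverse of Σ have closed forms. This is a minimal sketch in plain Python; the function name and the example values of µ and Σ are illustrative assumptions.

```python
import math

def bivariate_normal_density(y, mu, sigma):
    """Density of N_2(mu, Sigma) at the point y, for a 2 x 2 covariance matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c                    # |Sigma|
    if det <= 0 or a <= 0:
        raise ValueError("sigma must be positive definite")
    u0, u1 = y[0] - mu[0], y[1] - mu[1]
    # (y - mu)^t Sigma^{-1} (y - mu), using Sigma^{-1} = (1/det) [[d, -b], [-c, a]]
    quad = (d * u0 * u0 - (b + c) * u0 * u1 + a * u1 * u1) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

# At the center of a standard bivariate normal the density is 1 / (2 pi).
peak = bivariate_normal_density((0.0, 0.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
# With correlation 0.5 the peak is higher: 1 / (2 pi sqrt(0.75)).
peak_corr = bivariate_normal_density((0.0, 0.0), (0.0, 0.0), ((1.0, 0.5), (0.5, 1.0)))
print(peak, peak_corr)
```

Points with equal values of the quadratic form have equal density, which is why the contours are ellipses centered at µ.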
A More Difficult Problem
• θ = “eigenratio” = $\frac{\lambda_1}{\lambda_1 + \lambda_2}$ (λ_1 > λ_2 the eigenvalues of Σ)
• Student score data y (22 × 2) gives MLEs µ̂, Σ̂, and θ̂ = 0.793 ± ?
• Not true: f_{µ,Σ}(θ̂) depends only on θ
• There are 4 “nuisance parameters”
[Figure: Posterior density of the eigenratio under the Jeffreys prior for the bivariate normal model, plotted against the eigenratio. 90% credible limits [.68, .89]; bootstrap CI [.63, .88]. Red dots are the bootstrap 90% confidence limits.]
Bootstrap Methods (Automatic Frequentist Inference)
• Original data y_i ∼ N_2(µ, Σ), i = 1, 2, . . . , 22
– gives MLEs µ̂, Σ̂, and θ̂ = 0.793
• Bootstrap data y*_i ∼ N_2(µ̂, Σ̂), i = 1, 2, . . . , 22
– gives θ̂* = bootstrap eigenratio
• 10,000 θ̂*s
• 58% exceed θ̂ (upward bias)
• Reweighting formula puts bigger weights on smaller θ̂*s
• Confidence limits are the weighted bootstrap percentiles
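The resampling steps on this slide can be sketched as a parametric bootstrap in plain Python. Two caveats: the covariance values below are made up for illustration (the slides do not list the fitted Σ̂), and the sketch reports plain, unweighted percentiles because the reweighting formula is not given here, so these are not the talk's weighted confidence limits. All function names are illustrative.

```python
import math
import random

def eigenratio(s11, s12, s22):
    """lambda1 / (lambda1 + lambda2) for the symmetric matrix [[s11, s12], [s12, s22]]."""
    trace = s11 + s22                         # equals lambda1 + lambda2
    gap = math.sqrt((s11 - s22) ** 2 + 4 * s12 ** 2)
    return ((trace + gap) / 2) / trace        # larger eigenvalue over the trace

def mle_cov(data):
    """MLE of the covariance entries (divides by n, not n - 1)."""
    n = len(data)
    m1 = sum(p[0] for p in data) / n
    m2 = sum(p[1] for p in data) / n
    s11 = sum((p[0] - m1) ** 2 for p in data) / n
    s22 = sum((p[1] - m2) ** 2 for p in data) / n
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in data) / n
    return s11, s12, s22

def draw_bivariate_normal(rng, mu, s11, s12, s22, n):
    """Draw n points from N_2(mu, Sigma) using the Cholesky factor of Sigma."""
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        points.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return points

def bootstrap_eigenratios(mu_hat, sigma_hat, n, B, seed=0):
    """Resample from the fitted normal model and recompute the eigenratio B times."""
    rng = random.Random(seed)
    return [eigenratio(*mle_cov(draw_bivariate_normal(rng, mu_hat, *sigma_hat, n=n)))
            for _ in range(B)]

# Means are taken from the scatterplot caption; the covariance entries are HYPOTHETICAL.
mu_hat = (38.9, 50.6)
sigma_hat = (170.0, 60.0, 110.0)              # s11, s12, s22 (made up)
theta_hat = eigenratio(*sigma_hat)
reps = sorted(bootstrap_eigenratios(mu_hat, sigma_hat, n=22, B=2000))
frac_above = sum(r > theta_hat for r in reps) / len(reps)
limits = (reps[int(0.05 * len(reps))], reps[int(0.95 * len(reps)) - 1])
print(round(theta_hat, 3), round(frac_above, 2), limits)
```

The printed fraction of replicates above θ̂ plays the role of the slide's 58% figure; a value above one half signals upward bias, which is what the reweighting is meant to correct.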
[Figure: Histogram of 10000 bootstrap eigenratio values from the student score data (bivariate normal model), plotted as frequency against bootstrap eigenratios; the red line shows the confidence weights. 58% of the bootstrap values exceed the MLE, .793.]
Gibbs Sampling (Automatic Bayes Inference)
• Given: prior π(θ), data x, model f_θ(x)
• Approximates: π(θ|x) by Markov chain random walk
• “MCMC”, “Metropolis-Hastings”, . . . (A-Bomb?)
• Most often used with convenient “uninformative” priors
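To illustrate the "Markov chain random walk" idea, here is a minimal random-walk Metropolis sketch. Note that this is Metropolis sampling, one member of the MCMC family named on the slide, and not a Gibbs sampler. The model (x ∼ N(θ, 1) with a flat prior), the observed x = 5, and the tuning values are assumptions chosen so that the exact posterior, N(5, 1), is known.

```python
import math
import random

def log_posterior(theta, x):
    """log of pi(theta) * f_theta(x) up to a constant: flat prior, x ~ N(theta, 1)."""
    return -0.5 * (x - theta) ** 2

def metropolis(x, n_steps, step_sd=1.0, start=0.0, seed=0):
    """Random-walk Metropolis chain whose stationary distribution is pi(theta | x)."""
    rng = random.Random(seed)
    theta = start
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0, step_sd)
        log_ratio = log_posterior(proposal, x) - log_posterior(theta, x)
        # Accept with probability min(1, ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < log_ratio:
            theta = proposal
        chain.append(theta)
    return chain

chain = metropolis(x=5.0, n_steps=20000)
kept = chain[2000:]                           # discard burn-in
post_mean = sum(kept) / len(kept)
post_var = sum((t - post_mean) ** 2 for t in kept) / len(kept)
print(round(post_mean, 2), round(post_var, 2))
```

Only the product π(θ) · f_θ(x) is needed, never the normalizing constant c, which is what makes the approach "automatic".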
Prostate Cancer Study
(Singh et al. 2002)
• 102 men: 52 prostate cancer, 50 healthy controls
• Each man assessed for activity of 6033 genes
• Statistic x_i measures differences in activity, patients minus controls, for gene i, i = 1, 2, . . . , 6033
• Probability model:
x_i ∼ N(δ_i, 1) (normal, mean δ_i, variance 1)
δ_i: the true difference or effect size
[Figure: Prostate study (Singh et al. 2002): histogram of the difference estimates x[i] comparing cancer patients with normal controls, 6033 genes (horizontal axis: difference estimates x[i], −4 to 4; vertical axis: frequency). Hash marks show the 10 largest x values; gene 610 has x = 5.29. A reference curve is labeled “if all x[i]=0”.]
Bayesian Analysis (for one gene)
• Assume δ has prior density π(δ)
• Probability model: $f_\delta(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-\delta)^2}$
• Marginal density: $m(x) = \int_{-\infty}^{\infty} f_\delta(x)\,\pi(\delta)\,d\delta$
(overall density of x taking account of randomness in δ)
• Bayes posterior expectation (“Tweedie’s formula”):
$E\{\delta \mid x\} = x + \frac{d}{dx}\log m(x)$
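Tweedie's formula can be checked numerically in a case where the answer is known. The sketch below assumes a normal prior δ ∼ N(0, A), which the talk does not specify; under it the marginal is N(0, A + 1) and the posterior mean is x · A/(A + 1). The derivative of log m(x) is taken by a central difference.

```python
import math

def log_marginal(x, prior_var):
    """log m(x) when delta ~ N(0, prior_var) and x | delta ~ N(delta, 1).

    Under this assumed prior the marginal density of x is N(0, prior_var + 1).
    """
    v = prior_var + 1.0
    return -0.5 * math.log(2 * math.pi * v) - x * x / (2 * v)

def tweedie(x, prior_var, h=1e-5):
    """E{delta | x} = x + d/dx log m(x), with the derivative taken numerically."""
    slope = (log_marginal(x + h, prior_var) - log_marginal(x - h, prior_var)) / (2 * h)
    return x + slope

x, A = 2.0, 1.0
estimate = tweedie(x, A)
exact = x * A / (A + 1)                   # known posterior mean under this prior
print(estimate, exact)
```

The formula needs only the marginal density m(x), not the prior itself.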
Empirical Bayes Analysis
• We don’t know the prior π(δ), but the histogram provides a smooth estimate m̂(x) for m(x)
• Empirical Bayes estimate:
$\hat{E}\{\delta_i \mid x_i\} = x_i + \frac{d}{dx}\log \hat{m}(x)\Big|_{x_i}$
• Frequentist estimation of a Bayesian inference
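The empirical Bayes recipe can be sketched on synthetic data. The slide only says that the histogram provides a smooth estimate m̂(x); the Gaussian kernel density estimate used below is a stand-in smoother chosen for simplicity, not necessarily the method used in the talk. The data are simulated with δ_i ∼ N(0, 1), an assumption made so the true Bayes answer (x/2) is known; they are not the prostate data.

```python
import math
import random

def simulate(n, seed=0):
    """Synthetic data: delta_i ~ N(0, 1), then x_i ~ N(delta_i, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(rng.gauss(0, 1), 1) for _ in range(n)]

def empirical_bayes_estimate(x0, xs, bandwidth=0.4):
    """x0 + d/dx log m_hat(x) at x0, with m_hat a Gaussian kernel density estimate.

    Normalizing constants cancel in the ratio m_hat'(x0) / m_hat(x0).
    """
    m_hat = 0.0         # proportional to the density estimate at x0
    m_hat_prime = 0.0   # proportional to its derivative at x0
    for xi in xs:
        u = (x0 - xi) / bandwidth
        k = math.exp(-0.5 * u * u)
        m_hat += k
        m_hat_prime += -(u / bandwidth) * k
    return x0 + m_hat_prime / m_hat

xs = simulate(5000)
est = empirical_bayes_estimate(2.0, xs)
print(round(est, 2))    # the true Bayes rule for this simulated prior gives 2.0 / 2 = 1.0
```

The estimate uses only the observed x values, which is the sense in which a frequentist procedure estimates a Bayesian quantity. The kernel smoothing adds some bias, so the result lands near, not exactly at, the true value.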
[Figure: Empirical Bayes estimates of E{δ | x}, the expected true difference δ[i] given the observed difference x[i] (horizontal axis: difference value x[i]; vertical axis: E{δ[i] | x[i]}). Estimates are near 0 for the 93% of genes in [−2, 2]; for x[610] = 5.29 the estimate is 4.07.]
Score Sheet
Bayes                          Frequentist
1. Belief (prior)              1. Behavior (method)
2. Principled                  2. Opportunistic
3. One distribution            3. Many distributions (bootstrap?)
4. Dynamic                     4. Static
5. Individual (subjective)     5. Community (objective)
6. Aggressive                  6. Defensive