Bayesian and Frequentist Issues
in Modern Inference
Bradley Efron
Stanford University
Small Data
Classical statistics Direct tests and estimates of individual
parameters within well-defined models (MLE, Neyman–Pearson)
Not much:
I data-based model selection
I Bayesian combination of related problems
Today Methodology (not Philosophy)
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 2 / 32
Bayesian Inference
Parameter: µ ∈ Ω
Observed data: x
Prior: g(µ)
Probability distributions:fµ(x), µ ∈ Ω
Parameter of interest: θ = t(µ)
Eθ|x =
∫Ω
t(µ)fµ(x)g(µ) dµ/∫
Ω
fµ(x)g(µ) dµ
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 3 / 32
Jeffreysonian Bayes Inference“Uninformative Priors”
What if we don’t know prior g?
Jeffreys: g(µ) =∣∣∣I(µ)
∣∣∣1/2 where I(µ) = cov∇µ log fµ(x)
(the Fisher information matrix)
Can still use Bayes theorem but how accurate are the estimates?
Frequentist variability of Et(µ)|x
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 4 / 32
General Accuracy Formula
µ and x ∈ Rp
Vµ = covµ(x)
αx(µ) = ∇x log fµ(x) =(. . . ,
∂ log fµ(x)
∂xi, . . .
)T
LemmaE = E
t(µ)|x
has gradient ∇xE = cov
t(µ), αx(µ)|x
.
TheoremThe delta-method standard deviation of E is
sd(E) =[cov
t(µ), αx(µ)|x
T Vx covt(µ), αx(µ)|x
]1/2.
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 5 / 32
Implementation
Posterior sample from µ|x
µ1, µ2, . . . , µB (MCMC?)
Each µi gives ti = t(µi) and
αi = αx(µi)
E =∑
ti/B Et(µ)|x
cov =∑B
i=1 (αi − α)(ti − t
) /B
sd =[covT Vx cov
]1/2
No additional sampling for cov
−4 −2 0 2
02
46
810
12
(mu[i], alpha[i], t[i])
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 6 / 32
Diabetes DataEfron et al. (2004), “LARS”
n = 442 subjects
p = 10 predictors: age, sex, bmi, glu,. . .
Response: y = disease progression at one year
Model: yn×1
= Xn×p
βp×1
+ en×1
[e ∼ Nn(0, I)]
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 7 / 32
Bayesian LassoPark and Casella (2008)
Model: y ∼ Nn(Xβ, I)
Prior: g(β) = e−γL1(β) [γ = 0.37]
Then posterior mode at Lasso βγ
Subject 125: θ125 = xT125β
How accurate are Bayes posterior inferences for θ125?
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 8 / 32
Bayesian Analysis
MCMC: posterior sampleβi for i = 1,2, . . . ,10,000
Gives
θ125,i = xT
125βi , i = 1,2, . . . ,10,000
θ125,i ∼ 0.248 ± 0.072
General accuracy formula frequentist sd 0.071 for E = 0.248[αx(µ) = XT Xβ
]
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 9 / 32
Posterior CDF for Subject 125
cdfy(c) = Prθ125 ≤ c |y
si =
1 as ti ≤ c
0 as ti > c
cdfy(c) =∑B
1 si/B
For c = 0.3:
cdfy(c)= 0.762 ± 0.304
Bayes frequentistestimate sd
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 10 / 32
0.1 0.2 0.3 0.4
0.0
0.2
0.4
0.6
0.8
1.0
Posterior cdf for mu125, Diabetes data, 10000 MCMC draws,Prior exp−.37*L1(beta); verts are +− One Frequentist Standard Dev
Upper 95% credible limit is .342 +− .069c value
Pro
bm
u125
< c
| da
ta
][
.342
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 11 / 32
Exponential Families
fα(β)
= eαT β−ψ(α)f0
(β), with α, β in Rp
Natural parameter α, sufficient statistic β, expectation β = Eαβ[
Poisson : fµ(x) = e−µµx/x! : x = β, µ = β, α = log(µ)]
General accuracy formula For E = Et(β)|β
,
sd(E) =cov
(t , α
∣∣∣β)TVα cov
(t , α
∣∣∣β)1/2
with Vα = covα=α
(β).
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 12 / 32
Better Frequentist Inferences
For E = Et(β)
∣∣∣β in exfam fα(β)
= eαT β−ψ(α)f0
(β)
Parametric bootstrap fα(·)→[β∗1, β
∗
2, . . . , β∗
j , . . . , β∗
J
]−→
[· · ·E∗j = E
t(β)
∣∣∣∣β∗j · · · ] −→ bootstrap conf int for E
Trouble Need new MCMC sample for each β∗j
Shortcut Reweight original MCMC sample (importance
sampling)
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 13 / 32
Digression: Posterior Exponential Family
fα(β)
= eαT β−ψ(α)f0
(β): natural param α, suff stat β, “carrier” f0
(β)
Posterior exponential family
g(α|suff stat b) = e(b−β)Tα−φ(b)g
(α∣∣∣β)
natural param b, suff stat α, carrier g(α∣∣∣β)
Importance sampling Reweight the original MCMC realizations:
Eti |b =
B∑i=1
tiWi(b)
/ B∑i=1
Wi(b)[Wi(b) = e(b−β)
Tαi
]
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 14 / 32
Bootstrap Intervals Without BootstrappingDiCiccio and Efron (1992)
“abc” Investigate Et |b for b near β
Requires p + 2 numerical 2nd derivatives of E function
Next: Applied to posterior cdf for mu125
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 15 / 32
0.1 0.2 0.3 0.4
0.0
0.2
0.4
0.6
0.8
1.0
Heavy curve is posterior cdf for mu125, diabetes data.Vertical bars frequentist central 68% abc confidence intervals.
(Light lines show +− one frequentist standard error)
c value
Pro
bm
u125
< c
| da
ta
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 16 / 32
Estimation After Model Selection
Usually:
(a) look at data
(b) choose model (linear, quad, cubic . . . ?)
(c) fit estimates using chosen model
(d) analyze as if pre-chosen
Today Include model selection process in the analysis
Question Effects on standard errors, confidence intervals, etc.?
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 17 / 32
Cholesterol Data
n = 164 men took Cholestyramine for ∼ 7 years
x = compliance measure (adjusted: x ∼ N(0,1))
y = cholesterol decrease
Wish to estimate regression values
µj = Ey |x = xj for j = 1,2, . . . ,164
µ = (µ1, µ2, . . . , µ164)T
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 18 / 32
−2 −1 0 1 2
050
100
Cholesterol data, n=164 subjects: cholesterol decrease plottedversus adjusted compliance; Green curve is OLS cubic regression;
Red points indicate 5 featured subjects
compliance
chol
este
rol d
ecre
ase
1
2
3
4
5
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 19 / 32
Cp Selection Criterion
Regression model yn×1
= Xn×m
βm×1
+ en×1
[ei ∼ (0, σ2)
]Cp criterion
∥∥∥y − X β∥∥∥2
+ 2mσ2
β = OLS estimate, m = “degrees of freedom”
Model selection From possible models X1,X2,X3, . . .
choose the one minimizing Cp.
Then use OLS estimate from chosen model.
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 20 / 32
Cp for Cholesterol Data
Model df Cp − 80000 (Boot %)
M1 (linear) 2 1132 (19%)
M2 (quad) 3 1412 (12%)
M3 (cubic) 4 667 (34%)
M4 (quartic) 5 1591 (8%)
M5 (quintic) 6 1811 (21%)
M6 (sextic) 7 2758 (6%)
(σ = 22 from “full model”M6)
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 21 / 32
Nonparametric Bootstrap Analysis
data = (xi , yi), i = 1,2, . . . ,n = 164 gave original estimate
µ = X3β3
Bootstrap data set: data∗ =(xj , yj)
∗, j = 1,2, . . . ,n
where
(xj , yj)∗ drawn randomly and with replacement from data:
data∗ −→Cp
m∗ −→OLS
β∗m∗ −→ µ∗ = Xm∗ β∗
m∗
I did this all B = 4000 times.
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 22 / 32
B=4000 nonparametric bootstrap replications for the model−selectedregression estimate of Subject 1; boot (m,stdev)=(−2.63,8.02);
76% of the replications less than original estimate 2.71
Red triangles are 2.5th and 97.5th boot percentilesbootstrap estimates for subject 1
Fre
quen
cy
−30 −20 −10 0 10 20
050
100
150
200
250
^ ^2.71
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 23 / 32
−3 −2 −1 0 1 2 3
−3
−2
−1
01
23
Smooth Estimation Model: ' y ' is observed data; Ellipses indicate bootstrap distribution for ' y* ';
Red curves level surfaces of equal estimation for thetahat=t(y)
thetahat=t(y)
y
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 24 / 32
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 25 / 32
1 2 3 4 5 6
−40
−30
−20
−10
010
2030
Boxplot of Cp boot estimates for Subject 1; B=4000 bootreps;Red bars indicate selection proportions for Models 1−6
only 1/3 of the bootstrap replications chose Model 3selected model
subj
ect 1
est
imat
es
Model3
*********
MODEL 3
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 26 / 32
Bootstrap Smoothing
Idea Replace original estimator t(y) with bootstrap average
s(y) =
B∑i=1
t(y∗i
) /B
Model averaging
Same as bagging (“bootstrap aggregation,” Breiman)
Removes discontinuities, reduces variance
Approximate confidence interval: s(y) ± 1.96 · sd
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 27 / 32
Accuracy Theorem
Notation s0 = s(y), t ∗i = t(y∗i ), i = 1,2, . . .B
Y ∗ij = # of times jth data point appears in ith boot sample
covj =∑B
i=1 Y ∗ij ·(t ∗i − s0
) /B
[covariance Y ∗ij with t ∗i
]TheoremThe delta method standard deviation estimate for s0 is
sd =
n∑j=1
cov2j
1/2
,
always ≤
B∑i=1
(t ∗i − s0
)2 /B
1/2
, the boot stdev for t(y).
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 28 / 32
Projection Interpretation
0
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 29 / 32
1 2 3 4 5
0.0
0.2
0.4
0.6
0.8
1.0
1.2
Standard Deviation of smoothed estimate relative to original (Red)for five subjects; green line is stdev Naive Cubic Model
bottom numbers show original standard deviationsSubject number
Rel
ativ
e st
dev
7.9 3.9 4.1 4.7 6.8
*
* *
*
*
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 30 / 32
Model Probability Estimates
34% of the 4000 bootreps chose the cubic model
Poor man’s Bayes posterior prob for “cubic”
How accurate is that 34%?
Apply accuracy theorem to indicator function for choosing “cubic”
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 31 / 32
Model Boot % ± Standard Error
M1 (linear) 19% ±24
M2 (quad) 12% ±18
M3 (cubic) 34% ±24
M4 (quartic) 8% ±14
M5 (quintic) 21% ±27
M6 (sextic) 6% ±6
Bradley Efron (Stanford University) Bayesian and Frequentist Issues 32 / 32