Download - Bayesian and Frequentist Issues in Modern Inference

Bayesian and Frequentist Issues

in Modern Inference

Bradley Efron

Stanford University

Small Data

Classical statistics Direct tests and estimates of individual

parameters within well-defined models (MLE, Neyman–Pearson)

Not much:

I data-based model selection

I Bayesian combination of related problems

Today Methodology (not Philosophy)

Bradley Efron (Stanford University) Bayesian and Frequentist Issues 2 / 32

Bayesian Inference

Parameter: µ ∈ Ω

Observed data: x

Prior: g(µ)

Probability distributions:fµ(x), µ ∈ Ω

Parameter of interest: θ = t(µ)

Eθ|x =

∫Ω

t(µ)fµ(x)g(µ) dµ/∫

Ω

fµ(x)g(µ) dµ


Jeffreysonian Bayes Inference“Uninformative Priors”

What if we don’t know prior g?

Jeffreys: g(µ) =∣∣∣I(µ)

∣∣∣1/2 where I(µ) = cov∇µ log fµ(x)

(the Fisher information matrix)

Can still use Bayes theorem but how accurate are the estimates?

Frequentist variability of Et(µ)|x


General Accuracy Formula

µ and x ∈ Rp

Vµ = covµ(x)

αx(µ) = ∇x log fµ(x) =(. . . ,

∂ log fµ(x)

∂xi, . . .

)T

LemmaE = E

t(µ)|x

has gradient ∇xE = cov

t(µ), αx(µ)|x

.

TheoremThe delta-method standard deviation of E is

sd(E) =[cov

t(µ), αx(µ)|x

T Vx covt(µ), αx(µ)|x

]1/2.


Implementation

Posterior sample from µ|x

µ1, µ2, . . . , µB (MCMC?)

Each µi gives ti = t(µi) and

αi = αx(µi)

E =∑

ti/B Et(µ)|x

cov =∑B

i=1 (αi − α)(ti − t

) /B

sd =[covT Vx cov

]1/2

No additional sampling for cov

−4 −2 0 2

02

46

810

12

(mu[i], alpha[i], t[i])


Diabetes DataEfron et al. (2004), “LARS”

n = 442 subjects

p = 10 predictors: age, sex, bmi, glu,. . .

Response: y = disease progression at one year

Model: yn×1

= Xn×p

βp×1

+ en×1

[e ∼ Nn(0, I)]


Bayesian LassoPark and Casella (2008)

Model: y ∼ Nn(Xβ, I)

Prior: g(β) = e−γL1(β) [γ = 0.37]

Then posterior mode at Lasso βγ

Subject 125: θ125 = xT125β

How accurate are Bayes posterior inferences for θ125?


Bayesian Analysis

MCMC: posterior sampleβi for i = 1,2, . . . ,10,000

Gives

θ125,i = xT

125βi , i = 1,2, . . . ,10,000

θ125,i ∼ 0.248 ± 0.072

General accuracy formula frequentist sd 0.071 for E = 0.248[αx(µ) = XT Xβ

]


Posterior CDF for Subject 125

cdfy(c) = Prθ125 ≤ c |y

si =

1 as ti ≤ c

0 as ti > c

cdfy(c) =∑B

1 si/B

For c = 0.3:

cdfy(c)= 0.762 ± 0.304

Bayes frequentistestimate sd


0.1 0.2 0.3 0.4

0.0

0.2

0.4

0.6

0.8

1.0

Posterior cdf for mu125, Diabetes data, 10000 MCMC draws,Prior exp−.37*L1(beta); verts are +− One Frequentist Standard Dev

Upper 95% credible limit is .342 +− .069c value

Pro

bm

u125

< c

| da

ta

][

.342


Exponential Families

fα(β)

= eαT β−ψ(α)f0

(β), with α, β in Rp

Natural parameter α, sufficient statistic β, expectation β = Eαβ[

Poisson : fµ(x) = e−µµx/x! : x = β, µ = β, α = log(µ)]

General accuracy formula For E = Et(β)|β

,

sd(E) =cov

(t , α

∣∣∣β)TVα cov

(t , α

∣∣∣β)1/2

with Vα = covα=α

(β).


Better Frequentist Inferences

For E = Et(β)

∣∣∣β in exfam fα(β)


(β)

Parametric bootstrap fα(·)→[β∗1, β

∗

2, . . . , β∗

j , . . . , β∗

J

]−→

[· · ·E∗j = E

t(β)

∣∣∣∣β∗j · · · ] −→ bootstrap conf int for E

Trouble Need new MCMC sample for each β∗j

Shortcut Reweight original MCMC sample (importance

sampling)


Digression: Posterior Exponential Family

fα(β)


(β): natural param α, suff stat β, “carrier” f0

(β)

Posterior exponential family

g(α|suff stat b) = e(b−β)Tα−φ(b)g

(α∣∣∣β)

natural param b, suff stat α, carrier g(α∣∣∣β)

Importance sampling Reweight the original MCMC realizations:

Eti |b =

B∑i=1

tiWi(b)

/ B∑i=1

Wi(b)[Wi(b) = e(b−β)

Tαi

]


Bootstrap Intervals Without BootstrappingDiCiccio and Efron (1992)

“abc” Investigate Et |b for b near β

Requires p + 2 numerical 2nd derivatives of E function

Next: Applied to posterior cdf for mu125


0.1 0.2 0.3 0.4

0.0

0.2

0.4

0.6

0.8

1.0

Heavy curve is posterior cdf for mu125, diabetes data.Vertical bars frequentist central 68% abc confidence intervals.

(Light lines show +− one frequentist standard error)

c value

Pro

bm

u125

< c

| da

ta


Estimation After Model Selection

Usually:

(a) look at data

(b) choose model (linear, quad, cubic . . . ?)

(c) fit estimates using chosen model

(d) analyze as if pre-chosen

Today Include model selection process in the analysis

Question Effects on standard errors, confidence intervals, etc.?


Cholesterol Data

n = 164 men took Cholestyramine for ∼ 7 years

x = compliance measure (adjusted: x ∼ N(0,1))

y = cholesterol decrease

Wish to estimate regression values

µj = Ey |x = xj for j = 1,2, . . . ,164

µ = (µ1, µ2, . . . , µ164)T


−2 −1 0 1 2

050

100

Cholesterol data, n=164 subjects: cholesterol decrease plottedversus adjusted compliance; Green curve is OLS cubic regression;

Red points indicate 5 featured subjects

compliance

chol

este

rol d

ecre

ase

1

2

3

4

5


Cp Selection Criterion

Regression model yn×1

= Xn×m

βm×1

+ en×1

[ei ∼ (0, σ2)

]Cp criterion

∥∥∥y − X β∥∥∥2

+ 2mσ2

β = OLS estimate, m = “degrees of freedom”

Model selection From possible models X1,X2,X3, . . .

choose the one minimizing Cp.

Then use OLS estimate from chosen model.


Cp for Cholesterol Data

Model df Cp − 80000 (Boot %)

M1 (linear) 2 1132 (19%)

M2 (quad) 3 1412 (12%)

M3 (cubic) 4 667 (34%)

M4 (quartic) 5 1591 (8%)

M5 (quintic) 6 1811 (21%)

M6 (sextic) 7 2758 (6%)

(σ = 22 from “full model”M6)


Nonparametric Bootstrap Analysis

data = (xi , yi), i = 1,2, . . . ,n = 164 gave original estimate

µ = X3β3

Bootstrap data set: data∗ =(xj , yj)

∗, j = 1,2, . . . ,n

where

(xj , yj)∗ drawn randomly and with replacement from data:

data∗ −→Cp

m∗ −→OLS

β∗m∗ −→ µ∗ = Xm∗ β∗

m∗

I did this all B = 4000 times.


B=4000 nonparametric bootstrap replications for the model−selectedregression estimate of Subject 1; boot (m,stdev)=(−2.63,8.02);

76% of the replications less than original estimate 2.71

Red triangles are 2.5th and 97.5th boot percentilesbootstrap estimates for subject 1

Fre

quen

cy

−30 −20 −10 0 10 20

050

100

150

200

250

^ ^2.71


−3 −2 −1 0 1 2 3

−3

−2

−1

01

23

Smooth Estimation Model: ' y ' is observed data; Ellipses indicate bootstrap distribution for ' y* ';

Red curves level surfaces of equal estimation for thetahat=t(y)

thetahat=t(y)

y


1 2 3 4 5 6

−40

−30

−20

−10

010

2030

Boxplot of Cp boot estimates for Subject 1; B=4000 bootreps;Red bars indicate selection proportions for Models 1−6

only 1/3 of the bootstrap replications chose Model 3selected model

subj

ect 1

est

imat

es

Model3

*********

MODEL 3


Bootstrap Smoothing

Idea Replace original estimator t(y) with bootstrap average

s(y) =

B∑i=1

t(y∗i

) /B

Model averaging

Same as bagging (“bootstrap aggregation,” Breiman)

Removes discontinuities, reduces variance

Approximate confidence interval: s(y) ± 1.96 · sd


Accuracy Theorem

Notation s0 = s(y), t ∗i = t(y∗i ), i = 1,2, . . .B

Y ∗ij = # of times jth data point appears in ith boot sample

covj =∑B

i=1 Y ∗ij ·(t ∗i − s0

) /B

[covariance Y ∗ij with t ∗i

]TheoremThe delta method standard deviation estimate for s0 is

sd =

n∑j=1

cov2j

1/2

,

always ≤

B∑i=1

(t ∗i − s0

)2 /B

1/2

, the boot stdev for t(y).


Projection Interpretation

0


1 2 3 4 5

0.0

0.2

0.4

0.6

0.8

1.0

1.2

Standard Deviation of smoothed estimate relative to original (Red)for five subjects; green line is stdev Naive Cubic Model

bottom numbers show original standard deviationsSubject number

Rel

ativ

e st

dev

7.9 3.9 4.1 4.7 6.8

*

* *

*

*


Model Probability Estimates

34% of the 4000 bootreps chose the cubic model

Poor man’s Bayes posterior prob for “cubic”

How accurate is that 34%?

Apply accuracy theorem to indicator function for choosing “cubic”


Model Boot % ± Standard Error

M1 (linear) 19% ±24

M2 (quad) 12% ±18

M3 (cubic) 34% ±24

M4 (quartic) 8% ±14

M5 (quintic) 21% ±27

M6 (sextic) 6% ±6


Download - Bayesian and Frequentist Issues in Modern Inference

Top Related