
Identifying Robust Activation in fMRI

Thomas Nichols, Ph.D., Assistant Professor

Department of Biostatistics, University of Michigan

http://www.sph.umich.edu/~nichols

FBIRN, March 13, 2006


Are Robust Activations a Problem?

• Robust activation
  – Proposed definition: an effect that is detected regardless of the specific model or methods used

• Shouldn’t we be worried about non-robust activations?


Robustness Overview

• 1 voxel | Univariate
  – Validity
  – Sensitivity
• Images | Mass-univariate
  – Validity, with respect to some multiple-testing Type I error metric
  – Sensitivity, depending on the metric


Robustness & Test Validity

• Parametric, two-sample t-test
  – Famously robust
    • False positive rate ≈ α, even under non-normality or heterogeneous variance
    • Most robust with balanced data
  – Can have problems with outliers
  – False positive rate may be < α
• Impact for imaging
  – Simple block designs are probably very safe
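A small Monte Carlo sketch of why balance matters for the pooled two-sample t-test. Both groups have mean zero, so every rejection is a false positive. The group sizes (15/15 vs. 5/25), standard deviations (3 and 1) and simulation count are illustrative assumptions, not values from the talk; the test itself is SciPy's `scipy.stats.ttest_ind` with `equal_var=True` (the classical pooled-variance t-test).

```python
import numpy as np
from scipy import stats

def pooled_t_fpr(n1, n2, sd1, sd2, n_sim=20_000, alpha=0.05, seed=0):
    """Monte Carlo false positive rate of the pooled-variance two-sample t-test
    when both groups have mean zero (null true) but unequal variances."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, sd1, size=(n_sim, n1))   # group 1, one row per simulated study
    b = rng.normal(0.0, sd2, size=(n_sim, n2))   # group 2
    # equal_var=True gives the classical pooled-variance t-test
    p = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
    return np.mean(p <= alpha)

if __name__ == "__main__":
    print("balanced   (15 vs 15, sd 3 vs 1):", pooled_t_fpr(15, 15, 3.0, 1.0))
    print("unbalanced ( 5 vs 25, sd 3 vs 1):", pooled_t_fpr(5, 25, 3.0, 1.0))
```

With equal group sizes the false positive rate should stay close to α; putting the larger variance in the smaller group inflates it well above α.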

Univariate



• Non-parametric tests
  – “Exact” by construction
    • False positive rate is precisely α
    • NB: Due to discreteness, your chosen α may not be attainable
  – Not a generic modeling framework
    • No “permutation GLM”
    • Autocorrelation is challenging
• Impact for imaging
  – Within subject, must account for autocorrelation
  – Between subject, simple models are easy
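A minimal sketch of an exact between-subject permutation test, assuming a one-sided difference-in-means statistic (the choice of statistic is an illustration, not taken from the talk). It also shows the discreteness caveat: with 4 subjects per group there are C(8,4) = 70 relabellings, so attainable p-values are multiples of 1/70, and α = 0.05 itself cannot be hit exactly (3/70 ≈ 0.043, 4/70 ≈ 0.057).

```python
import itertools
import numpy as np

def exact_two_sample_perm_test(a, b):
    """Exact one-sided two-sample permutation test of 'no group difference',
    using mean(a) - mean(b) as the statistic. Every relabelling of the
    pooled data is enumerated, so the test is exact under exchangeability."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    n, n_a = pooled.size, a.size
    observed = a.mean() - b.mean()
    null_stats = []
    for idx in itertools.combinations(range(n), n_a):
        in_a = np.zeros(n, dtype=bool)
        in_a[list(idx)] = True                    # this relabelling's "group a"
        null_stats.append(pooled[in_a].mean() - pooled[~in_a].mean())
    null_stats = np.array(null_stats)
    # p-value: fraction of relabellings at least as extreme as the observed one;
    # the small tolerance guards against floating-point round-off in the comparison
    p = np.mean(null_stats >= observed - 1e-12)
    return p, null_stats.size
```

For perfectly separated groups such as `[10, 11, 12, 13]` vs. `[0, 1, 2, 3]`, only the observed labelling is that extreme, so the p-value is the smallest attainable one, 1/70.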


Robustness & Test Power

• Parametric, two-sample t-test
  – Reduced sensitivity
    • From outliers, or, with unbalanced data, from non-normality or heteroscedasticity
• Impact for imaging
  – Safe, but possibly conservative, approach
  – Not getting the most out of the data



• Non-parametric tests
  – Sensitivity varies with the test!
    • Just because all tests are “exact” doesn’t mean all have the same sensitivity to Ha
  – When normality holds, or nearly holds, the t-test is optimal
    • Indicates that the permutation t-test is a good choice
  – When data are very non-normal, other tests are better
    • E.g. the median
    • “Robust” methods: Iteratively Reweighted Least Squares (Wager et al., NeuroImage, 2005)
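A minimal sketch of iteratively reweighted least squares for robust regression. It assumes Huber weights with the conventional tuning constant c = 1.345 and a MAD scale estimate; these are illustrative choices and may differ from the weighting scheme used by Wager et al. Each pass downweights observations with large residuals and refits by weighted least squares.

```python
import numpy as np

def irls_huber(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Robust regression by IRLS with Huber weights (an assumed weighting choice).
    Returns the coefficient estimates and the per-observation weights used in the last fit."""
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # start from ordinary least squares
    w = np.ones_like(y)
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745   # robust (MAD) scale estimate
        if scale == 0:
            break
        u = np.abs(r) / scale
        w = c / np.maximum(u, c)                          # 1 for |u| <= c, c/|u| beyond
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta, w
```

For a group mean, `X` is a single column of ones. With 11 subjects near 1 and one outlying subject at -10, the OLS mean is dragged toward 0, while the IRLS estimate stays near 1 and the outlier receives the smallest weight.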



• Non-parametric tests: in-flight Monte Carlo simulation
  – One-sample test on differences, 12 subjects
    • 11 subjects have effect size 1
    • 1 subject has effect size -2
  – Compare the power of two permutation tests
    • Median & t-test
  – Conclusion
    • Both tests are “exact”, but the median is more sensitive in the presence of outliers

Test Statistic    Power (α = 0.05)
t-test            60.9%
Median            68.7%
(Normal data, 1,000 realizations)
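A sketch of a simulation along the lines described above. Assumptions not stated in the talk: noise is standard normal (so "effect size" is the mean in SD units), the tests are one-sided, and both use the exact sign-flip permutation distribution over all 2^12 = 4,096 sign patterns. The power estimates will vary with the seed and need not reproduce the 60.9% and 68.7% above.

```python
import itertools
import numpy as np

def sign_flip_matrix(n):
    """All 2**n sign patterns; the first row is all +1 (the observed data)."""
    return np.array(list(itertools.product([1.0, -1.0], repeat=n)))

def t_stat(x):
    """One-sample t statistic along the last axis."""
    n = x.shape[-1]
    return x.mean(axis=-1) / (x.std(axis=-1, ddof=1) / np.sqrt(n))

def median_stat(x):
    return np.median(x, axis=-1)

def perm_pvalue(x, stat, signs):
    """One-sided sign-flip p-value: fraction of flips with statistic >= observed."""
    null = stat(signs * x)            # statistic under every sign pattern
    return np.mean(null >= null[0])   # row 0 is the unflipped data

def simulate_power(effects, n_sim=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    signs = sign_flip_matrix(len(effects))
    rejections = {"t-test": 0, "median": 0}
    for _ in range(n_sim):
        x = effects + rng.standard_normal(len(effects))   # assumed unit-variance normal noise
        rejections["t-test"] += int(perm_pvalue(x, t_stat, signs) <= alpha)
        rejections["median"] += int(perm_pvalue(x, median_stat, signs) <= alpha)
    return {k: v / n_sim for k, v in rejections.items()}

if __name__ == "__main__":
    effects = np.array([1.0] * 11 + [-2.0])   # 11 subjects with effect 1, one with -2
    print(simulate_power(effects))
```

Because each test uses its own exact permutation distribution, both control the false positive rate; any difference in the printed rejection rates reflects sensitivity only.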



• Implications for imaging
  – Non-normality (group heterogeneity) can reduce sensitivity
  – Alternative test statistics can outperform standard methods


Mass-Univariate Inference
• Interesting result?
  – t = 5.446
  – P = 4.3×10⁻⁵ (uncorrected)!
• Look at the data
  – Contrast unremarkable
  – Standard deviation low
  – White matter!
• Must account for multiple tests!
(Figure: FIAC group data, 15 subjects, block design; Different Speaker & Sentence effect)


Mass-Univariate


• 100,000 tests: anywhere from 0 to 100,000 false positives!
  – No unique measure of false positives
• Just two of the possible measures:
  – Familywise Error Rate (FWER)
    • Chance of one or more false positives
  – False Discovery Rate (FDR)
    • Expected fraction of false positives among all detections
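A minimal sketch contrasting the two error metrics: Bonferroni adjustment, which controls the FWER, and Benjamini-Hochberg adjustment, which controls the FDR (for independent or positively dependent tests). The 100,000 uniform null p-values in the demo are simulated, not imaging data.

```python
import numpy as np

def bonferroni(p):
    """Bonferroni-adjusted p-values; rejecting where adjusted p <= alpha controls the FWER."""
    p = np.asarray(p, dtype=float)
    return np.minimum(p * p.size, 1.0)

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values; rejecting where adjusted p <= q controls the FDR at q."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)         # p_(i) * m / i
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]   # enforce monotonicity from the largest p down
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p_null = rng.uniform(size=100_000)   # 100,000 tests with every null hypothesis true
    print("uncorrected p <= 0.05:", np.sum(p_null <= 0.05))   # roughly 5% of the tests
    print("Bonferroni  p <= 0.05:", np.sum(bonferroni(p_null) <= 0.05))
    print("BH          p <= 0.05:", np.sum(benjamini_hochberg(p_null) <= 0.05))
```

BH-adjusted p-values are never larger than Bonferroni-adjusted ones, which is why FDR control is typically more sensitive when many true effects are present.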


Robustness & Test Validity: FWER Methods

• Parametric, Random Field Theory (RFT)
  – Provides thresholds that control FWER
  – Assumes the data are a smooth random field
• Very flexible framework
  – Closed-form results for t, Z, F, ...
• Can be conservative
  – Low DF
  – Low smoothness
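A simplified sketch of how an RFT threshold is obtained: for high thresholds u, the FWER is approximated by the expected Euler characteristic, here R·ρ₃(u) with R the number of resels and ρ₃ the 3D Euler characteristic density of a Gaussian (Z) field. Assumptions: only the 3D term is kept (full implementations such as SPM's also add lower-dimensional terms and use t- or F-field densities), and the resel and voxel counts in the demo are illustrative. It compares the result with a Bonferroni Z threshold.

```python
import math
from statistics import NormalDist

def ec_density_3d(u):
    """3D Euler characteristic density (per resel) of a unit-variance Gaussian field."""
    return (4 * math.log(2)) ** 1.5 * (u**2 - 1) * math.exp(-u**2 / 2) / (2 * math.pi) ** 2

def rft_z_threshold(resels, alpha=0.05, hi=20.0, tol=1e-10):
    """Solve resels * rho_3(u) = alpha by bisection on u >= sqrt(3), where rho_3 is decreasing."""
    def excess(u):
        return resels * ec_density_3d(u) - alpha
    lo = math.sqrt(3)
    if excess(lo) <= 0:
        raise ValueError("search region too small for this 3D-term-only approximation")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bonferroni_z_threshold(n_voxels, alpha=0.05):
    """Z threshold with one-sided tail probability alpha / n_voxels."""
    return -NormalDist().inv_cdf(alpha / n_voxels)

if __name__ == "__main__":
    n_voxels = 100_000
    for fwhm in (1.0, 5.0):                  # smoothness in voxels (illustrative)
        resels = n_voxels / fwhm**3
        print(fwhm, rft_z_threshold(resels), bonferroni_z_threshold(n_voxels))
```

With FWHM of 1 voxel the resel count equals the voxel count and the RFT threshold exceeds the Bonferroni threshold, which mirrors the conservativeness at low smoothness noted above; with smoother data the RFT threshold falls below Bonferroni.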


Robustness & Test Validity: FWER Methods

• Non-parametric
  – Use permutation to find the null distribution of the maximum statistic
  – No smoothness assumptions
  – “Exact” control of FWER
• Not very flexible
  – But can get a lot of mileage out of one-sample t, two-sample t, and correlation
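A minimal sketch of permutation-based FWER control for the one-sample case, assuming each subject's contrast image is symmetric about zero under the null so that whole-subject sign flips are valid relabellings. The flips are a random (Monte Carlo) subset rather than a full enumeration, and the function names are hypothetical. Comparing each voxel's t with the permutation distribution of the image-wide maximum t gives FWER-corrected p-values without any smoothness assumption.

```python
import numpy as np

def one_sample_t(x):
    """Voxelwise one-sample t statistics; x has shape (n_subjects, n_voxels)."""
    n = x.shape[0]
    return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n))

def max_t_fwer_pvalues(data, n_perm=1000, seed=0):
    """FWER-corrected p-values from the sign-flip permutation distribution of the maximum t."""
    rng = np.random.default_rng(seed)
    n_sub = data.shape[0]
    t_obs = one_sample_t(data)
    max_null = np.empty(n_perm)
    max_null[0] = t_obs.max()                              # the observed labelling counts as one permutation
    for i in range(1, n_perm):
        signs = rng.choice([-1.0, 1.0], size=(n_sub, 1))   # flip whole subjects, all voxels together
        max_null[i] = one_sample_t(signs * data).max()
    max_null.sort()
    # corrected p = fraction of permutation maxima >= each voxel's observed t
    n_ge = n_perm - np.searchsorted(max_null, t_obs, side="left")
    return n_ge / n_perm
```

Flipping all voxels of a subject together preserves the spatial correlation of the data, which is what lets the maximum distribution adapt to smoothness automatically.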


Robustness & Test Power: FWER Methods

• Parametric, Random Field Theory
  – Can be conservative with...
    • Low DF
    • Low smoothness
• Nonparametric permutation
  – More powerful when RFT has problems


FWER Thresholds: RFT vs. Permutation
• RFT & Perm adapt to smoothness
• Perm & truth are close
• Bonferroni is close to truth for low smoothness
(Figure: FWER thresholds for RFT, Bonferroni, permutation and truth; panels for 9 df and 19 df)

Real Data – Threshold: RFT vs. Bonf. vs. Perm.

t threshold (0.05 corrected)

Study                 df   RFT       Bonf.   Perm.
Verbal Fluency         4   4701.32   42.59   10.14
Location Switching     9   11.17     9.07    5.83
Task Switching         9   10.79     10.35   5.10
Faces: Main Effect    11   10.43     9.07    7.92
Faces: Interaction    11   10.70     9.07    8.26
Item Recognition      11   9.87      9.80    7.67
Visual Motion         11   11.07     8.92    8.40
Emotional Pictures    12   8.48      8.41    7.15
Pain: Warning         22   5.93      6.05    4.99
Pain: Anticipation    22   5.87      6.05    5.05

Real Data – Number of Voxels Found: RFT vs. Bonf. vs. Perm.

No. significant voxels (0.05 corrected)

Study                 df   RFT (t)   Bonf. (t)   Perm. (t)   Perm. (SmVar t)
Verbal Fluency         4   0         0           0           0
Location Switching     9   0         0           158         354
Task Switching         9   4         6           2241        3447
Faces: Main Effect    11   127       371         917         4088
Faces: Interaction    11   0         0           0           0
Item Recognition      11   5         5           58          378
Visual Motion         11   626       1260        1480        4064
Emotional Pictures    12   0         0           0           7
Pain: Warning         22   127       116         221         347
Pain: Anticipation    22   74        55          182         402

(SmVar t: smoothed-variance t statistic)


Mass-Univariate Inference
• FWER-corrected P-value: 0.9878
• FDR-corrected P-value: 0.1122
• Interpretation
  – This result is totally consistent with the null hypothesis when searching 26,000 voxels
(Figure: FIAC group data, 15 subjects, block design; Different Speaker & Sentence effect)

Robustness Conclusions
• Consider validity and sensitivity separately
• Validity
  – Most methods are fairly robust
  – Event-related fMRI is probably the least robust
• Sensitivity
  – Standard univariate methods suffer under non-normality and heterogeneity
  – RFT FWER thresholds can lack sensitivity under low DF and low smoothness
  – Nonparametric methods, while not fully general, provide good power in these problematic settings


Permutation for fMRI: BOLD vs. ASL

• Temporal autocorrelation
  – BOLD fMRI has it
  – Makes the permutation test difficult
• Differenced ASL data
  – Differenced ASL data are white (Aguirre et al.)
  – Permutation test is now easy
    • Though Aguirre found that regressing out movement parameters was necessary to obtain nominal false positive rates
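A sketch of why temporal autocorrelation makes a naive permutation test difficult. Assumptions (not from the talk): pure-noise AR(1) time series with no task effect, a 20-scan on/off block regressor, and a test that shuffles time points as if they were exchangeable, using |correlation| with the regressor as the statistic. For white noise the false positive rate should sit near α; for strongly autocorrelated noise it is inflated.

```python
import numpy as np

def ar1_noise(n, rho, rng):
    """Stationary AR(1) series with unit innovation variance."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0] / np.sqrt(1 - rho**2)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + e[t]
    return x

def abs_corr(Y, reg):
    """|correlation| between each row of Y (or a single series) and a regressor."""
    Yc = Y - Y.mean(axis=-1, keepdims=True)
    rc = reg - reg.mean()
    return np.abs(Yc @ rc) / (np.linalg.norm(Yc, axis=-1) * np.linalg.norm(rc))

def naive_perm_fpr(rho, n_scans=200, block_len=20, n_sim=200, n_perm=500, alpha=0.05, seed=0):
    """False positive rate of a time-point-shuffling permutation test on null AR(1) data."""
    rng = np.random.default_rng(seed)
    reg = ((np.arange(n_scans) // block_len) % 2).astype(float)   # on/off block design
    rejections = 0
    for _ in range(n_sim):
        y = ar1_noise(n_scans, rho, rng)                            # no true effect
        perms = np.argsort(rng.random((n_perm, n_scans)), axis=1)   # random reorderings of time
        null = abs_corr(y[perms], reg)
        p = (1 + np.sum(null >= abs_corr(y, reg))) / (1 + n_perm)
        rejections += int(p <= alpha)
    return rejections / n_sim
```

Shuffling destroys the serial correlation in the permuted series, so the permutation null is too narrow for the observed, autocorrelated data; this is the exchangeability problem that motivates whitening or differencing.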


BOLD vs. ASL: My Stance: Don’t Difference!

• Model the control/label effect
  – Differenced data have length n/2
  – Using only ½ the data is suboptimal
  – Gauss-Markov theorem
    • Optimally precise estimates come from the full, whitened model
• Advantages
  – Uses standard BOLD fMRI modeling tools
• Reference
  – Mumford, Hernandez & Nichols, “Estimation Efficiency and Statistical Power in Arterial Spin Labeling fMRI.” Provisionally accepted, NeuroImage.


ASL without Differencing

• Model all n observations
• Predictors
  – Baseline BOLD
  – Baseline perfusion
  – BOLD (task effect)
  – Perfusion (task effect)
(Figure: full-data design matrix columns)
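The four predictors listed above can be sketched as columns of a full-data design matrix. Assumptions for illustration only: scans alternate control, label, control, label, ... starting with control; the tag regressor is coded +1/2 for control and -1/2 for label (so its coefficient is the control-minus-label difference); and the task is a boxcar with no haemodynamic convolution. The function name is hypothetical.

```python
import numpy as np

def asl_full_design(n_scans, block_len=10):
    """Hypothetical full-data ASL design with four columns:
    baseline BOLD, baseline perfusion, BOLD task effect, perfusion task effect."""
    t = np.arange(n_scans)
    tag = np.where(t % 2 == 0, 0.5, -0.5)          # +1/2 control, -1/2 label (assumed order)
    task = ((t // block_len) % 2).astype(float)     # 0 = rest, 1 = task (boxcar, unconvolved)
    return np.column_stack([
        np.ones(n_scans),   # baseline BOLD: overall signal level
        tag,                # baseline perfusion: control/label difference
        task,               # BOLD task effect: shared by control and label scans
        task * tag,         # perfusion task effect: task-dependent control/label difference
    ])
```

All n scans enter the model, so this design can be fitted with the same whitened GLM machinery used for BOLD fMRI.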



• Two key aspects
  – Model all data
  – Account for autocorrelation
• Theoretical result
  – Better power!
(Figure: difference in power relative to modeling full data and autocorrelation)
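A sketch of the Gauss-Markov argument in numbers: the sampling variance of the perfusion task-effect estimate from GLS on all n scans (with the true noise covariance) versus OLS on the n/2 control-minus-label differences. Assumptions: AR(1) noise with correlation ρ, control/label alternation starting with control, a boxcar task with an even block length (so no pair straddles a task transition and the differenced OLS estimator is unbiased), and illustrative sizes.

```python
import numpy as np

def ar1_cov(n, rho):
    """AR(1) correlation matrix, V[i, j] = rho**|i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def perfusion_effect_variances(n_scans=120, block_len=10, rho=0.3):
    """Variance of the perfusion task-effect estimate: (a) full-data GLS, (b) differenced-data OLS."""
    assert n_scans % 2 == 0 and block_len % 2 == 0
    t = np.arange(n_scans)
    tag = np.where(t % 2 == 0, 0.5, -0.5)                 # +1/2 control, -1/2 label
    task = ((t // block_len) % 2).astype(float)
    X = np.column_stack([np.ones(n_scans), tag, task, task * tag])
    V = ar1_cov(n_scans, rho)

    # (a) GLS with the true covariance: Var(beta_hat) = (X' V^-1 X)^-1
    var_gls = np.linalg.inv(X.T @ np.linalg.solve(V, X))[3, 3]

    # (b) pairwise differencing: each row of D forms (control - label) for one pair
    m = n_scans // 2
    D = np.zeros((m, n_scans))
    D[np.arange(m), 2 * np.arange(m)] = 1.0
    D[np.arange(m), 2 * np.arange(m) + 1] = -1.0
    Xd = np.column_stack([np.ones(m), task[0::2]])       # baseline perfusion, perfusion task effect
    A = np.linalg.solve(Xd.T @ Xd, Xd.T)                  # OLS estimator matrix
    var_diff = (A @ (D @ V @ D.T) @ A.T)[1, 1]            # sandwich variance under the true noise
    return var_gls, var_diff

if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6):
        print(rho, perfusion_effect_variances(rho=rho))
```

Because the differenced OLS estimate is also linear and unbiased, the GLS variance can never exceed it. In this idealized design the two coincide for white noise (ρ = 0), since pair sums and differences then separate exactly; the advantage of the full model comes from accounting for autocorrelation.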



• Two key aspects
  – Model all data
  – Account for autocorrelation
• Real data result
  – Bigger Z’s! (on average)
(Figure: difference in Z score, full-model GLS vs. differenced-data OLS)


ASL Conclusion

• Intrasubject inferences with ASL
  – Differenced ASL data are white only when the noise is 1/f
  – Worry about the validity of the intrasubject permutation test
• Group inferences with ASL
  – Data then look just like BOLD fMRI
  – Permutation test is easy again
