![Page 1: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/1.jpg)
Biostatistics in Practice
Session 5: Associations and Confounding
Peter D. Christenson, Biostatistician
http://research.LABioMed.org/Biostat
![Page 2: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/2.jpg)
Session 5 Preparation #1
1. We often hear news reports of "seasonally adjusted unemployment rates". Can you think of a logical way that this adjustment could be made?
![Page 3: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/3.jpg)
Session 5 Preparation #2
Unadjusted
Adjusted
What does “adjusted” mean?
How is it done?
From Table 3
![Page 4: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/4.jpg)
Goal One of Session 5
Earlier: Compare means for a single measure among groups.
Use t-test, ANOVA.
Session 5: Relate two or more measures.
Use correlation or regression.
Qu et al. (2005), JCEM 90:1563-1569.
ΔΔY/ΔX
![Page 5: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/5.jpg)
Goal Two of Session 5
Try to isolate the effects of different characteristics on an outcome.
Previous slide:
Gender
BMI
GH Peak
![Page 6: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/6.jpg)
Correlation
Visualize Y (vertical) by X (horizontal) scatter plot.
Pearson correlation, r, is used to measure association between two measures X and Y.
Ranges from -1 (perfect inverse association) to 1 (perfect direct association).
Value of r does not depend on:
• scales (units) of X and Y
• which role X and Y assume, as in an X-Y plot
Value of r does depend on:
• the ranges of X and Y
• values chosen for X, if X is fixed & Y is measured
![Page 7: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/7.jpg)
Graphs and Values of Correlations
![Page 8: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/8.jpg)
Logic for Value of Correlation
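$$ r = \frac{\sum (X - X_{\text{mean}})(Y - Y_{\text{mean}})}{\sqrt{\sum (X - X_{\text{mean}})^2 \; \sum (Y - Y_{\text{mean}})^2}} $$
[Figure: scatter plot divided into four quadrants at (Xmean, Ymean). A point contributes + to the numerator when X and Y are both above or both below their means, and - when one is above and the other below.]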
Statistical software gives r.
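As a minimal sketch of what the software does, the formula above can be computed directly. The data values and the function name `pearson_r` are made up for illustration. The last lines also check the two properties stated earlier: r is unchanged when X and Y swap roles and when X is expressed in different units.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed directly from the slide's formula."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: sum of products of deviations from the two means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
r_swapped = pearson_r(y, x)                      # roles of X and Y exchanged
r_rescaled = pearson_r([100 * v for v in x], y)  # X in different units
print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
```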
![Page 9: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/9.jpg)
Correlation Depends on Ranges of X & Y
Graph B contains only the graph A points in the ellipse.
Correlation is reduced in graph B.
Thus: correlations for the same quantities X and Y may be quite different in different study populations.
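The range effect can be reproduced with a small sketch. The data are artificial (a straight line plus a deterministic wobble standing in for noise), and the restriction keeps a band of X values rather than an ellipse, which is a simplification of the slide's picture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical "graph A": X spans 0 to 10, Y follows X plus a wobble.
x_all = [i / 10 for i in range(101)]
y_all = [x + 2 * math.sin(1.7 * i) for i, x in enumerate(x_all)]

# Hypothetical "graph B": the same points, restricted to 4 <= X <= 6.
subset = [(x, y) for x, y in zip(x_all, y_all) if 4 <= x <= 6]
x_sub = [p[0] for p in subset]
y_sub = [p[1] for p in subset]

r_full = pearson_r(x_all, y_all)
r_sub = pearson_r(x_sub, y_sub)
print(f"full range r = {r_full:.2f}, restricted range r = {r_sub:.2f}")
```

The underlying X-Y relationship is identical in both sets; only the spread of X differs, which is why r is smaller in the restricted set.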
[Figure: scatter plots A (all points) and B (only the points of A inside the ellipse).]
![Page 10: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/10.jpg)
Correlation and Measurement Precision
A lack of correlation for the subpopulation with 5<x<6 may be due to inability to measure x and y well.
Lack of evidence of association is not evidence of lack of association.
[Figure: panels A and B showing the overall scatter plot and the subpopulation with 5 < x < 6, for which r = 0.]
![Page 11: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/11.jpg)
Regression
Again: Y (vertical) by X (horizontal) scatterplot, as with correlation. See next slide.
X and Y now assume unique roles:
• Y is an outcome, response, output, dependent variable.
• X is an input, predictor, independent variable.
Regression analysis is used to:
• Measure X-Y association, as with correlation.
• Fit a straight line through the scatter plot, for:
  • Prediction of Y from X.
  • Estimation of Δ in Y for a unit change in X (slope = “effect” of X on Y).
![Page 12: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/12.jpg)
Regression Example
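[Figure: scatter plot with fitted regression line. Each residual ei is the vertical distance from a point to the line. The fitted line minimizes Σei². Two bands around the line show the range for individuals and the range for the mean.]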
Statistical software gives all this info.
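A minimal sketch of the least-squares idea, using made-up data: the fitted intercept and slope come from the usual closed-form expressions, and any other line (here, one slightly steeper and one slightly shifted) gives a larger sum of squared residuals. The function names are illustrative, not from any package.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for Y = a + b*X."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def sum_sq_resid(xs, ys, intercept, slope):
    """Sum of squared residuals e_i = y_i - (intercept + slope * x_i)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
sse_fit = sum_sq_resid(x, y, a, b)
sse_steeper = sum_sq_resid(x, y, a, b + 0.1)  # same intercept, steeper line
sse_shifted = sum_sq_resid(x, y, a + 0.1, b)  # same slope, shifted line
print(a, b, sse_fit, sse_steeper, sse_shifted)
```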
![Page 13: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/13.jpg)
X-Y Association
If slope=0 then X and Y are not associated.
But the slope measured from a sample will never be 0. How different from 0 does a measured slope need to be in order to claim X and Y are associated?
[ Side note: It turns out that slope=0 is equivalent to correlation r = 0. ]
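The side note follows from a standard identity linking the least-squares slope and the Pearson correlation. The symbols b, s_X and s_Y (slope and sample standard deviations) are notation introduced here, not taken from the slides:

$$ b = \frac{\sum (X - X_{\text{mean}})(Y - Y_{\text{mean}})}{\sum (X - X_{\text{mean}})^2} = r \, \frac{s_Y}{s_X} $$

Since s_X and s_Y are positive whenever X and Y vary at all, b = 0 exactly when r = 0, and b and r always have the same sign.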
![Page 14: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/14.jpg)
X-Y Association
Test slope=0 vs. slope≠0, with the rule:
Claim association (slope≠0) if
tc=|slope/SE(slope)| > t ≈ 2.
With this rule, if X and Y are truly not associated, there is about a 5% chance of claiming an X-Y association anyway.
Note similarity to t-test for means:
tc=|mean/ SE(mean)|
Formula for SE(slope) is in statistics books.
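For reference, the textbook formula for simple regression is SE(slope) = s / √Σ(X-Xmean)², where s² = Σei²/(n-2). The sketch below applies it to made-up data; the function name is illustrative. Note that the cutoff of about 2 applies to moderately large samples; with very few points the t cutoff is larger (about 3.18 for n = 5).

```python
import math

def slope_t_statistic(xs, ys):
    """Return (slope, SE(slope), t = slope / SE(slope)) for simple regression."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual standard deviation s, with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_slope = s / math.sqrt(sxx)
    return slope, se_slope, slope / se_slope

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, se_slope, t_c = slope_t_statistic(x, y)
print(f"slope = {slope:.3f}, SE = {se_slope:.4f}, t = {t_c:.1f}")
```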
![Page 15: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/15.jpg)
Example Software Output
The regression equation is: Y = 81.6 + 2.16 X

| Predictor | Coeff | StdErr | T | P |
|---|---|---|---|---|
| Constant | 81.64 | 11.47 | 7.12 | <0.0001 |
| X | 2.1557 | 0.1122 | 19.21 | <0.0001 |

S = 21.72, R-Sq = 79.0%

Predicted Values:

| X | Fit | SE(Fit) | 95% CI | 95% PI |
|---|---|---|---|---|
| 100 | 297.21 | 2.17 | 292.89 - 301.52 | 253.89 - 340.52 |

Notes on the output:
• "Constant" refers to the intercept.
• T for X: 19.21 = 2.16/0.112. It should be between ~ -2 and 2 if the “true” slope = 0.
• Fit: predicted y = 81.6 + 2.16(100).
• Range of Ys with 95% assurance: the 95% CI is for the mean of all subjects with x=100; the 95% PI is for an individual with x=100.
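The arithmetic behind this output can be checked with a short sketch. The coefficient, standard errors, S and fit come from the output above. The critical value `t_crit = 1.984` is an assumption: it corresponds to roughly 100 degrees of freedom, and the sample size is not stated on the slide. The interval formulas are the standard ones for simple regression.

```python
import math

# Values read from the software output above.
intercept, coef_x, se_x = 81.64, 2.1557, 0.1122
s_resid = 21.72      # S, the residual standard deviation
se_fit = 2.17        # SE(Fit) at x = 100
x_new = 100

t_slope = coef_x / se_x              # the T column for X
fit = intercept + coef_x * x_new     # predicted y at x = 100

# ASSUMPTION: two-sided 95% t critical value for about 100 degrees of freedom.
t_crit = 1.984

# 95% CI: range for the mean of all subjects with x = 100.
ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)

# 95% PI: range for one individual; adds individual scatter S to SE(Fit).
se_pred = math.sqrt(s_resid ** 2 + se_fit ** 2)
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)

print(f"t = {t_slope:.2f}, fit = {fit:.2f}")
print(f"95% CI: {ci[0]:.2f} - {ci[1]:.2f}")
print(f"95% PI: {pi[0]:.2f} - {pi[1]:.2f}")
```

Under that assumed critical value, the results agree with the printed intervals to within rounding. The PI is much wider than the CI because it includes person-to-person scatter around the line.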
![Page 16: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/16.jpg)
Multiple Regression
We now generalize to prediction from multiple characteristics.
The next slide gives a geometric view of prediction from two factors simultaneously.
![Page 17: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/17.jpg)
Multiple Regression: Geometric View
Suppose multiple predictors are continuous.
Geometrically, this is fitting a slanted plane to a cloud of points:
LHCY is the Y (homocysteine) to be predicted from the two X’s: LCLC (folate) and LB12 (B12).
LHCY = b0 + b1·LCLC + b2·LB12 is the equation of the plane.
www.StatisticalPractice.com
![Page 18: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/18.jpg)
Multiple Regression: Software
![Page 19: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/19.jpg)
Multiple Regression: Software
Output: Values of b0, b1, and b2 for
LHCY = b0 + b1LCLC + b2LB12
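As a hedged sketch of what such software computes, the plane can be fitted by least squares with NumPy. The numbers below are invented (not the homocysteine data), and the outcome is generated from a known plane without noise so that the recovered b0, b1 and b2 can be checked by eye.

```python
import numpy as np

# Hypothetical predictor values, for illustration only.
lclc = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
lb12 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

# Outcome generated from a known plane (no noise): b0=3.0, b1=-0.4, b2=-0.2.
lhcy = 3.0 - 0.4 * lclc - 0.2 * lb12

# Design matrix: a column of ones for b0, then one column per predictor.
X = np.column_stack([np.ones_like(lclc), lclc, lb12])
coef = np.linalg.lstsq(X, lhcy, rcond=None)[0]
b0, b1, b2 = coef

# Prediction for a new subject uses the entire equation.
predicted = b0 + b1 * 2.5 + b2 * 3.5
print(np.round(coef, 3), round(float(predicted), 3))
```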
![Page 20: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/20.jpg)
How Are Coefficients Interpreted?
LHCY = b0 + b1LCLC + b2LB12
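Outcome: LHCY. Predictors: LCLC and LB12.
[Diagram: LCLC and LB12 are correlated with each other, and each points to LHCY, with coefficients b1 ? and b2 ? respectively.]
LB12 may have both an independent and an indirect (via LCLC) association with LHCY.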
![Page 21: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/21.jpg)
Coefficients: Meaning of their Values
LHCY = b0 + b1LCLC + b2LB12
Outcome: LHCY. Predictors: LCLC and LB12.
LHCY increases by b2 for a 1-unit increase in LB12
… if other factors (LCLC) remain constant, or
… adjusting for other factors in the model (LCLC)
May be physiologically impossible to maintain one predictor constant while changing the other by 1 unit.
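The difference between an adjusted coefficient and a one-predictor slope can be seen in a small sketch. The data are invented: the two predictors are correlated, and the outcome is built so that the coefficient of LB12 with LCLC held constant is exactly 0.5. Variable names mirror the slide, but the values are hypothetical.

```python
import numpy as np

# Hypothetical, correlated predictors (illustration only).
lclc = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
lb12 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])

# Outcome built so that, holding LCLC fixed, +1 unit of LB12 adds 0.5.
y = 1.0 + 2.0 * lclc + 0.5 * lb12

# Multiple regression: coefficient of LB12 adjusted for LCLC.
X = np.column_stack([np.ones_like(lclc), lclc, lb12])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Simple regression of y on LB12 alone: mixes the independent association
# with the indirect one that runs through LCLC.
slope_alone = np.polyfit(lb12, y, 1)[0]

print(f"adjusted b2 = {b2:.3f}, unadjusted slope = {slope_alone:.3f}")
```

The unadjusted slope is larger here because subjects with higher LB12 also tend to have higher LCLC in these made-up data.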
![Page 22: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/22.jpg)
Another Example: HDL Cholesterol
Output:

| | Coefficient | Std Error | t | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | 1.16448 | 0.28804 | 4.04 | <.0001 |
| AGE | -0.00092 | 0.00125 | -0.74 | 0.4602 |
| BMI | -0.01205 | 0.00295 | -4.08 | <.0001 |
| BLC | 0.05055 | 0.02215 | 2.28 | 0.0239 |
| PRSSY | -0.00041 | 0.00044 | -0.95 | 0.3436 |
| DIAST | 0.00255 | 0.00103 | 2.47 | 0.0147 |
| GLUM | -0.00046 | 0.00018 | -2.50 | 0.0135 |
| SKINF | 0.00147 | 0.00183 | 0.81 | 0.4221 |
| LCHOL | 0.31109 | 0.10936 | 2.84 | 0.0051 |

The predictors of log(HDL) are age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, and the log of total cholesterol. The equation is:
Log(HDL) = 1.16 - 0.00092(Age) +…+ 0.311(LCHOL)
www.StatisticalPractice.com
![Page 23: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/23.jpg)
HDL Example: Coefficients
Interpretation of coefficients on previous slide:
1. Need to use entire equation for making predictions.
2. Each coefficient measures the difference in expected LHDL between 2 subjects if the factor differs by 1 unit between the two subjects, and if all other factors are the same. E.g., expected LHDL is 0.012 lower in a subject whose BMI is 1 unit greater, but is the same as the other subject on other factors.
Continued …
![Page 24: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/24.jpg)
HDL Example: Coefficients
Interpretation of coefficients two slides back:
3. P-values measure the association of a factor with Log(HDL), if other factors do not change.
This is sometimes expressed as “after accounting for other factors” or “adjusting for other factors”, and is called independent association.
SKINF probably is associated with LogHDL on its own. Its p=0.42 says only that it has no additional info to predict LogHDL, after accounting for other factors such as BMI.
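The three interpretation points can be illustrated with the coefficients and t values from the HDL output. This is only a sketch: the helper name `expected_difference` is made up, and the |t| > 2 screen is the rough rule from the X-Y Association slide, not a replacement for the printed p-values.

```python
# Coefficients and t values copied from the HDL output table.
coef = {
    "AGE": -0.00092, "BMI": -0.01205, "BLC": 0.05055, "PRSSY": -0.00041,
    "DIAST": 0.00255, "GLUM": -0.00046, "SKINF": 0.00147, "LCHOL": 0.31109,
}
t_stat = {
    "AGE": -0.74, "BMI": -4.08, "BLC": 2.28, "PRSSY": -0.95,
    "DIAST": 2.47, "GLUM": -2.50, "SKINF": 0.81, "LCHOL": 2.84,
}

def expected_difference(changes):
    """Difference in expected Log(HDL) between two subjects who differ only
    by the amounts in `changes` (all other factors equal)."""
    return sum(coef[name] * delta for name, delta in changes.items())

# Subject with BMI 1 unit greater, same on everything else.
diff_bmi = expected_difference({"BMI": 1})

# Factors with an independent association by the rough |t| > 2 rule.
flagged = [name for name, t in t_stat.items() if abs(t) > 2]

print(diff_bmi, flagged)
```

SKINF is not flagged, which matches the slide's point: it adds little once BMI and the other factors are in the model, even if it is associated with Log(HDL) by itself.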
![Page 25: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/25.jpg)
Special Cases of Multiple Regression
So far, our predictors were all measured over a continuum, like age or concentration.
This is simply called multiple regression.
When some predictors are grouping factors like gender or ethnicity, regression has other special names:
ANOVA
Analysis of Covariance
![Page 26: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/26.jpg)
Analysis of Variance
• All predictors are grouping factors.
• One-way ANOVA: Only 1 predictor that may have only 2 “levels”, such as gender, or more levels, such as ethnicity.
• Two-way ANOVA: Two grouping predictors, such as decade of age and genotype.
![Page 27: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/27.jpg)
Analysis of Variance
• Interaction in 2-way ANOVA: Measures whether the effect of one factor depends on the other factor. Difference of a difference in outcome. E.g.,
(Female – Male)_Asian – (Female – Male)_Hispanic
• The effect of gender, adjusted for ethnicity, is a weighted average of gender differences within ethnic subgroups, i.e., of:
(Female – Male)_Asian and (Female – Male)_Hispanic
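A small numeric sketch of both quantities, using invented cell means and sample sizes. The slide does not say which weights are meant; the weights used here, n_F·n_M/(n_F + n_M) within each subgroup, are an assumption (they are the ones least squares produces for a model with gender and ethnicity but no interaction term).

```python
# Hypothetical cell means and sample sizes, for illustration only.
cell_means = {
    ("Asian", "Female"): 10.0, ("Asian", "Male"): 8.0,
    ("Hispanic", "Female"): 12.0, ("Hispanic", "Male"): 7.0,
}
cell_sizes = {
    ("Asian", "Female"): 10, ("Asian", "Male"): 10,
    ("Hispanic", "Female"): 20, ("Hispanic", "Male"): 20,
}
groups = ["Asian", "Hispanic"]

def gender_difference(group):
    """(Female - Male) mean difference within one ethnic subgroup."""
    return cell_means[(group, "Female")] - cell_means[(group, "Male")]

def weight(group):
    """ASSUMED weight: n_F * n_M / (n_F + n_M) within the subgroup."""
    n_f = cell_sizes[(group, "Female")]
    n_m = cell_sizes[(group, "Male")]
    return n_f * n_m / (n_f + n_m)

# Interaction: difference of the two within-subgroup differences.
interaction = gender_difference("Asian") - gender_difference("Hispanic")

# Gender effect adjusted for ethnicity: weighted average of the differences.
total_weight = sum(weight(g) for g in groups)
adjusted_effect = sum(weight(g) * gender_difference(g) for g in groups) / total_weight

print(interaction, adjusted_effect)
```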
![Page 28: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/28.jpg)
Analysis of Covariance
• At least one predictor is a grouping factor, such as ethnicity, and at least one predictor is continuous, such as age, called a “covariate”.
• Interest is often on comparing the groups.
• The covariate is often a nuisance.
Confounder: A covariate that both co-varies with the outcome and is distributed differently in the groups.
![Page 29: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/29.jpg)
A Study Using Analysis of Covariance
J Clin Endocrin Metab 2006 Nov; 91(11):4424-32.
Potential doping test for athletes.
![Page 30: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/30.jpg)
Study Goal: Outcomes are IGF-1 and Collagen Markers
Determine the relative and combined explanatory power of age, gender, BMI, ethnicity, and sport type on the markers.
*
* for age, gender, and BMI.
Figure 2.
One conclusion is the lack of differences between ethnic IGF-1 means, after adjustment for age, gender, and BMI (Fig 2).
How are these adjustments made?
![Page 31: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/31.jpg)
Adjustment: For a Single Continuous Characteristic
We simulate data for Caucasians and Africans only for simplicity, to demonstrate attenuation of a large 155-140=15 μg/L ethnic difference to a small 160-157=3 μg/L age-adjusted ethnic difference.
[Figure: unadjusted group means 155 and 140 (Diff = 15); age-adjusted group means 160 and 157 (Adj Diff = 3).]
![Page 32: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/32.jpg)
Adjustment: For a Single Continuous Characteristic
Problem:
Want to compare groups on IGF-1.
Groups to be compared (ethnicities) have different mean ages, and IGF-1 tends to decrease with age.
Solution:
Make groups appear to have the same mean age.
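One standard way to write this adjustment is the analysis-of-covariance adjusted mean. The symbols are introduced here for illustration: b is the common within-group slope of IGF-1 on age, the bar over age_g is the mean age in group g, and the bar over age is the overall mean age.

$$ \bar{Y}_g^{\,\text{adj}} = \bar{Y}_g - b \, (\overline{\text{age}}_g - \overline{\text{age}}) $$

$$ \bar{Y}_1^{\,\text{adj}} - \bar{Y}_2^{\,\text{adj}} = (\bar{Y}_1 - \bar{Y}_2) - b \, (\overline{\text{age}}_1 - \overline{\text{age}}_2) $$

Each group mean is moved along the fitted age line to the overall mean age, so the adjusted difference removes the part of the raw difference that is explained by the groups' different ages.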
![Page 33: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/33.jpg)
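[Figure: IGF1 Adjustment for Age - Simulated Data. IGF-1 (μg/L, axis 100 to 250) versus Age (years, axis 15 to 30) for the Caucasian and African groups. Unadjusted group means: 140 and 155 (Diff = 15). Adjusted means, evaluated at the mean age of 22.2: 157 and 160 (Diff = 3).]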
![Page 34: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/34.jpg)
Adjustment: Software Output
Unadjusted Group Difference:
Estimated IGF-1 = intercept + b0(indicator)
Age-adjusted Group Difference:
Estimated IGF-1 = intercept + b0(indicator) + b1(age)
Indicator = 0 if African, 1 if Caucasian.
Coefficient of the indicator in the output: 15 in the unadjusted model, 3 in the age-adjusted model.
![Page 35: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/35.jpg)
Adjustment: Software Input
Select:
Regression or Analysis of Covariance.
Usually menu such as
Output:
Values of b0, b1, and b2 (and p-values and CIs) for
IGF1 = b0 + b1(indicator) + b2(age)
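A minimal sketch of this fit with NumPy, using the b0, b1, b2 labelling of the equation above. The data are invented and noise-free: they were chosen so that the raw group difference is 15 and the age-adjusted difference is 3, mimicking the pattern in the simulated example. They are not the study's data, and the groups are labelled only by the 0/1 indicator.

```python
import numpy as np

# Hypothetical ages: the indicator = 1 group is 4 years younger on average.
age_g0 = np.array([20.0, 22.0, 24.0, 26.0, 28.0])   # indicator = 0
age_g1 = np.array([16.0, 18.0, 20.0, 22.0, 24.0])   # indicator = 1
age = np.concatenate([age_g0, age_g1])
indicator = np.concatenate([np.zeros(5), np.ones(5)])

# Invented outcome: IGF-1 falls 3 units per year of age; the true
# group difference at any fixed age is 3.
igf1 = 212.0 + 3.0 * indicator - 3.0 * age

# Unadjusted group difference: simple difference of group means.
unadjusted = igf1[indicator == 1].mean() - igf1[indicator == 0].mean()

# Age-adjusted difference: coefficient b1 in IGF1 = b0 + b1*indicator + b2*age.
X = np.column_stack([np.ones_like(age), indicator, age])
b0, b1, b2 = np.linalg.lstsq(X, igf1, rcond=None)[0]

print(f"unadjusted diff = {unadjusted:.1f}, adjusted diff b1 = {b1:.1f}, "
      f"change per year b2 = {b2:.1f}")
```

Most of the raw difference of 15 in this toy example comes from the 4-year age gap between the groups (4 years × 3 units per year = 12), leaving 3 after adjustment.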
![Page 36: Biostatistics in Practice Session 5: Associations and Confounding Peter D. Christenson Biostatistician](https://reader035.vdocuments.pub/reader035/viewer/2022062407/56649e525503460f94b481ee/html5/thumbnails/36.jpg)
Adjustment: Software Output
Unadjusted Group Difference:
Estimated IGF-1 = intercept + b0(indicator)
Age-adjusted Group Difference:
Estimated IGF-1 = intercept + b0(indicator) + b1(age)
Indicator = 0 if African, 1 if Caucasian.
Coefficient of the indicator in the output: 15 in the unadjusted model, 3 in the age-adjusted model.
Coefficient of age (b1): Δ in IGF-1 per year (weighted average of the 2 groups).