Simmons Comprehensive Cancer Center
TRANSCRIPT
COMPARING GROUPS – PART 1 CONTINUOUS DATA
Min Chen, Ph.D.
Assistant Professor
Quantitative Biomedical Research Center
Department of Clinical Sciences
Bioinformatics Shared Resource
Simmons Comprehensive Cancer Center
Lecture 4
July 9, 2013
Min Chen (QBRC/CCBSR) Comparing groups Continuous Data 1 Lec 4 1 / 38
OUTLINE
1 REVIEW
2 INTRODUCTION
3 COMPARISON OF TWO GROUPS
   Parametric tests
REVIEW: 100(1−α)% CONFIDENCE INTERVAL OF THE MEAN
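Lower limit:
$L = \bar{X} - z_{\alpha/2} \times s/\sqrt{n}$
Upper limit:
$U = \bar{X} + z_{\alpha/2} \times s/\sqrt{n}$
where $z_{\alpha/2}$ is the upper α/2 critical value, i.e. the (1 − α/2) quantile of the standard normal distribution.
Standard normal distribution: $\mu = 0$, $\sigma = 1$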
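As a quick numerical check of the large-sample formula, the sketch below computes $\bar{X} \pm z_{\alpha/2}\, s/\sqrt{n}$ in Python using only the standard library. It borrows the dominant-RP summary statistics from Example 2 (mean 0.85, SD 0.18, n = 62) purely as input; the function name `z_confidence_interval` is hypothetical, and $z_{\alpha/2}$ is taken as the (1 − α/2) quantile of N(0, 1).

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar: float, s: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Large-sample 100(1 - alpha)% CI for the mean: xbar -/+ z_{alpha/2} * s / sqrt(n)."""
    # z_{alpha/2} is the upper alpha/2 critical value, i.e. the (1 - alpha/2) quantile of N(0, 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Dominant-RP group from Example 2 (natural log of ERG): mean 0.85, SD 0.18, n = 62
lower, upper = z_confidence_interval(0.85, 0.18, 62)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # 95% CI: (0.805, 0.895)
```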
REVIEW OF CONFIDENCE INTERVAL FROM SMALL SAMPLE
As a rule of thumb, if the sample size is n < 30, use the formula below.
100(1−α)% confidence interval:
$\bar{X} \pm t_{\alpha/2,\,n-1} \times s/\sqrt{n}$
where $t_{\alpha/2,\,n-1}$ is the upper α/2 critical value, i.e. the (1 − α/2) quantile, of the t-distribution with (n − 1) degrees of freedom.
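A minimal sketch of the small-sample interval, assuming SciPy is available for the t quantile (`scipy.stats.t.ppf`). The input is the Change column (Week 1 − Week 0) from Example 1, with n = 8; the helper name `t_confidence_interval` is illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t  # assumes SciPy is installed


def t_confidence_interval(data: list[float], alpha: float = 0.05) -> tuple[float, float]:
    """Small-sample 100(1 - alpha)% CI for the mean: xbar -/+ t_{alpha/2, n-1} * s / sqrt(n)."""
    n = len(data)
    xbar, s = mean(data), stdev(data)  # stdev uses the n - 1 denominator
    # Upper alpha/2 critical value, i.e. the (1 - alpha/2) quantile of t with n - 1 df
    t_crit = t.ppf(1 - alpha / 2, df=n - 1)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Change column (Week 1 - Week 0) from Example 1, n = 8
changes = [1.74, 22.47, -17.29, 6.84, -4.99, -9.27, -9.51, 0.88]
lower, upper = t_confidence_interval(changes)
print(f"95% CI for the mean change: ({lower:.2f}, {upper:.2f})")
```

With n = 8 the multiplier is $t_{0.025,7} \approx 2.365$ rather than 1.96, so the interval is noticeably wider than a z-based one would be.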
REVIEW: INTERPRETATION OF CI
The CI satisfies
$\Pr(L(X) \le \theta \le U(X)) = 1 - \alpha$.
It is tempting to state "the probability that θ lies between the two numbers L and U is (1−α)".
• This is wrong because θ is a fixed number;
• L(X) and U(X) are random variables, not numbers; once an interval has been computed, it either contains θ or it does not.
• In repeated sampling, on average 100(1−α)% of the calculated intervals (95% when α = 0.05) will contain the true population parameter θ.
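The repeated-sampling interpretation can be checked by simulation. The sketch below draws many samples from a normal population with a hypothetical mean of 10 and SD of 2, builds an interval from each, and counts how often the fixed θ = µ is covered. To keep it short it assumes σ is known (so it uses $z_{\alpha/2}\,\sigma/\sqrt{n}$ rather than s); the seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def ci_coverage(mu: float = 10.0, sigma: float = 2.0, n: int = 10,
                alpha: float = 0.05, reps: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated 100(1 - alpha)% intervals that contain the fixed mean mu.

    Simplifying assumption: sigma is known, so each interval is xbar -/+ z_{alpha/2} * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        lower, upper = xbar - half_width, xbar + half_width  # the random endpoints L(X), U(X)
        if lower <= mu <= upper:                             # mu never changes; only the interval does
            hits += 1
    return hits / reps


coverage = ci_coverage()
print(f"coverage = {coverage:.3f}")  # should be close to 0.95
```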
RELATIONSHIP BETWEEN TYPE I ERROR (α) AND POWER
PARAMETRIC VS NON-PARAMETRIC
Parametric tests
Assume the data follow some known distribution
E.g., normal, t, chi-square, or binomial distribution
Compare means or variances
Non-parametric tests
Do not assume a particular form of distribution
Compare other measures of location (e.g., the median, or a location shift)
Useful for skewed data, small samples, and ordinal data
NOTATION
Quantity             Population parameter   Sample value
Mean                 µ                      X̄
Standard deviation   σ                      s
Variance             σ²                     s²
Sample size                                 n
Sample value                                x_i
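ONE-SAMPLE t-TEST
Recall the one-sample t-test:
$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
This is the test statistic for comparing the mean of one group against a fixed value $\mu_0$.
The general form of a t-statistic is
t = (difference of means) / (standard error).
Under $H_0$, and for normally distributed data, the t-statistic follows a t-distribution with (n − 1) degrees of freedom.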
STUDENT’S t-DISTRIBUTION
Here is how to generate a Student's t random variable:
$T_\nu = \frac{Z}{\sqrt{V/\nu}}$,
where
Z has a standard normal distribution;
V has a chi-squared distribution with ν degrees of freedom (df), i.e.,
$V = \sum_{i=1}^{\nu} Z_i^2$,
where the $Z_i$ are iid standard normal random variables. (Recall $E[Z_i^2] = 1$, so $E[V] = \nu$.)
Z and V are independent.
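The construction above can be simulated directly with Python's standard library. This sketch (a hypothetical helper `draw_t`, with ν = 5 and a fixed seed chosen arbitrarily) builds each draw as $Z/\sqrt{V/\nu}$ and compares the tail probability beyond ±1.96 with the standard normal value of 0.05.

```python
import random
from math import sqrt


def draw_t(nu: int, rng: random.Random) -> float:
    """One Student's t draw built from the definition T = Z / sqrt(V / nu)."""
    z = rng.gauss(0.0, 1.0)                               # Z ~ N(0, 1)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))  # V ~ chi-square(nu), drawn independently of Z
    return z / sqrt(v / nu)


rng = random.Random(2013)
nu, reps = 5, 50_000
draws = [draw_t(nu, rng) for _ in range(reps)]

# Share of draws beyond +/-1.96; for N(0, 1) this would be 0.05
tail = sum(abs(x) > 1.96 for x in draws) / reps
print(f"P(|T_{nu}| > 1.96) is approximately {tail:.3f}")
```

The simulated tail share comes out roughly twice the normal value, which is the heavier-tailed behaviour that makes t critical values larger than z critical values for small df.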
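t: A FAMILY OF DISTRIBUTIONS IDENTIFIED BY df
Recall
$t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{(\bar{X} - \mu_0)/(\sigma/\sqrt{n})}{\sqrt{s^2/\sigma^2}}$.
For normal data under $H_0$, the numerator is standard normal, and $s^2/\sigma^2 = V/(n-1)$ with $V = (n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$ independent of $\bar{X}$, so t has the form $Z/\sqrt{V/\nu}$ with $\nu = n - 1$.
The t-distribution approaches the normal distribution as df increases.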
SMALL SAMPLE VS. LARGE SAMPLE
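Recall that for a CI, as a rule of thumb, if the sample size is n < 30, we use the t-statistic for the 100(1−α)% confidence interval:
$\bar{X} \pm t_{\alpha/2,\,n-1} \times s/\sqrt{n}$,
while for large samples we use
$\bar{X} \pm z_{\alpha/2} \times s/\sqrt{n}$.
The reason is that when the sample size is large,
$t_{\alpha/2,\,n-1} \approx z_{\alpha/2}$.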
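To see how quickly $t_{\alpha/2,n-1}$ approaches $z_{\alpha/2}$, the sketch below tabulates both for a few sample sizes with α = 0.05. It assumes SciPy is installed for `scipy.stats.t.ppf`; the list of sample sizes is arbitrary.

```python
from statistics import NormalDist

from scipy.stats import t  # assumes SciPy is installed

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
sample_sizes = [5, 10, 30, 100, 1000]
# t_{alpha/2, n-1} for each sample size n
t_crits = {n: float(t.ppf(1 - alpha / 2, df=n - 1)) for n in sample_sizes}

for n, t_crit in t_crits.items():
    print(f"n = {n:4d}: t = {t_crit:.3f}  z = {z_crit:.3f}  ratio = {t_crit / z_crit:.3f}")
```

At n = 30 the t critical value is still about 2.045 versus 1.960, which is why n < 30 is only a rule of thumb.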
COMPARING MEANS OF PAIRED SAMPLES
In paired samples, each data point in one sample is matched to a data point in the second sample.
Same subject
• Measured at 2 time points
• Before and after intervention
• Two eyes (left, right)
• Two organs (heart, liver)
Matched subjects
• Experimental animal and its pair-fed match
• Male, female
COMPARING MEANS OF TWO INDEPENDENT SAMPLES
Two independent samples
Subjects are unrelated in two separate groups;
Sample sizes may be different in each group, $(n_1, n_2)$;
Variances in each group may be
• Equal, $\sigma_1^2 = \sigma_2^2$
• Unequal, $\sigma_1^2 \ne \sigma_2^2$
EXAMPLE 1
In a hypertension research study, subjects are given dietary counseling to restrict their sodium intake. Urinary sodium data from 8 subjects at baseline (Week 0) and at Week 1 are shown below; Change = Week 1 − Week 0.
Subject Week 0 Week 1 Change
1 7.85 9.59 1.74
2 12.03 34.5 22.47
3 21.84 4.55 -17.29
4 13.94 20.78 6.84
5 16.68 11.69 -4.99
6 41.78 32.51 -9.27
7 14.97 5.46 -9.51
8 12.07 12.95 0.88
X̄ 17.65 16.50 -1.14
s 10.56 11.63 12.22
EXAMPLE 1 (CONTD.)
Q1: Paired samples or two independent samples?
Q2: Is there a change in mean levels of urinary sodium after 1 week?
PAIRED t-TEST
Example 1 has paired sample data (since the same subject was measured at two time points).
Compute the mean $\bar{X}_d$ and standard deviation $s_d$ of the n within-pair differences.
$H_0: \mu_1 - \mu_2 = c$ vs. $H_a: \mu_1 - \mu_2 \ne c$
$t = \frac{\bar{X}_d - c}{s_d/\sqrt{n}}$,
which, under $H_0$, follows a t-distribution with (n − 1) degrees of freedom.
If $|t| > t^*_{n-1}(1-\alpha/2)$, reject $H_0$. Here $t^*_{n-1}(1-\alpha/2)$ is the $(1-\alpha/2)$ quantile of $T_{n-1}$.
P-value $= 2 \times \Pr(T_{n-1} > |t|)$.
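A sketch of the paired t-test on the Example 1 data, written to mirror the formulas above (c = 0, α = 0.05). It assumes SciPy is available for the t quantile and tail probability, and cross-checks the result against `scipy.stats.ttest_rel`, which tests the mean of the pairwise differences `week1 - week0`.

```python
from math import sqrt
from statistics import mean, stdev

from scipy.stats import t, ttest_rel  # assumes SciPy is installed

# Urinary sodium from Example 1
week0 = [7.85, 12.03, 21.84, 13.94, 16.68, 41.78, 14.97, 12.07]
week1 = [9.59, 34.5, 4.55, 20.78, 11.69, 32.51, 5.46, 12.95]
alpha, c = 0.05, 0.0  # H0: mean difference = c

d = [b - a for a, b in zip(week0, week1)]  # Week 1 - Week 0, as in the Change column
n = len(d)
t_stat = (mean(d) - c) / (stdev(d) / sqrt(n))
t_crit = t.ppf(1 - alpha / 2, df=n - 1)    # t*_{n-1}(1 - alpha/2)
p_value = 2 * t.sf(abs(t_stat), df=n - 1)  # two-sided: 2 * Pr(T_{n-1} > |t|)

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, P-value = {p_value:.3f}")
print("reject H0" if abs(t_stat) > t_crit else "fail to reject H0")

# Cross-check: SciPy's paired test works on the differences week1 - week0
check = ttest_rel(week1, week0)
print(f"scipy.stats.ttest_rel: t = {check.statistic:.3f}, P-value = {check.pvalue:.3f}")
```

Here |t| is about 0.26, well below the critical value of about 2.36 on 7 df, so these data give no evidence of a change in mean urinary sodium after one week.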
REJECTION REGIONS
PAIRED T-TEST USING EXCEL – EXAMPLE 1
Values shown in bold red have been modified from original data.
EXAMPLE 2
A study was performed to compare the mean ERG (electroretinogram) amplitude of patients with different genetic types of retinitis pigmentosa (RP), a genetic eye disease that often results in blindness. Data were collected in patients aged 18 to 29 years with different genetic types.
Genetic type Mean ± SD N
Dominant 0.85 ± 0.18 62
Recessive 0.38 ± 0.21 35
The table shows values of the natural log of ERG amplitude.
EXAMPLE 2 (CONTD.)
Q1: Paired samples or two independent samples?
Q2: Is there a difference in mean log(ERG) amplitude between patients
with dominant RP versus those with the recessive form?
TWO-SAMPLE t-TEST WITH EQUAL VARIANCES
Example 2 has two independent samples.
$H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \ne \mu_2$
$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$,
which, under $H_0$, follows a t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom, where
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
is the pooled variance.
If $|t| > t^*_{n_1+n_2-2}(1-\alpha/2)$, reject $H_0$.
P-value $= 2 \times \Pr(T_{n_1+n_2-2} > |t|)$.
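The pooled-variance test can be computed directly from the Example 2 summary table. The sketch below (hypothetical helper `pooled_t_test`) implements the formulas above and, assuming SciPy is installed, compares the result with `scipy.stats.ttest_ind_from_stats(..., equal_var=True)`.

```python
from math import sqrt

from scipy.stats import t, ttest_ind_from_stats  # assumes SciPy is installed


def pooled_t_test(x1: float, s1: float, n1: int, x2: float, s2: float, n2: int):
    """Two-sample t-test assuming equal variances, computed from summary statistics."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df     # pooled variance s_p^2
    t_stat = (x1 - x2) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))
    p_value = 2 * t.sf(abs(t_stat), df)                  # two-sided P-value
    return t_stat, df, p_value


# Example 2: dominant vs recessive RP, natural log of ERG amplitude
t_stat, df, p_value = pooled_t_test(0.85, 0.18, 62, 0.38, 0.21, 35)
print(f"t = {t_stat:.2f} on {df} df, P-value = {p_value:.2e}")

check = ttest_ind_from_stats(0.85, 0.18, 62, 0.38, 0.21, 35, equal_var=True)
print(f"SciPy: t = {check.statistic:.2f}, P-value = {check.pvalue:.2e}")
```

Here t is about 11.6 on 95 df, so the two-sided P-value is far below 0.05.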
TWO-SAMPLE t-TEST FOR EQUAL VARIANCES USING EXCEL – EXAMPLE 2
COMPARING VARIANCES
In Example 2, the two-sample t-test for independent samples assumed that the variances were equal:
Variance of Group 1 = Variance of Group 2
Note that $s_1^2 = 0.18^2 \approx 0.032$ and $s_2^2 = 0.21^2 \approx 0.044$.
Is the equal-variance assumption true?
COMPARING VARIANCES
To compare variances, we conduct a hypothesis test to examine whether the ratio of the variances equals 1.
$H_0: \sigma_1^2/\sigma_2^2 = 1$ vs. $H_a: \sigma_1^2/\sigma_2^2 \ne 1$
Test statistic: $f = s_1^2/s_2^2$,
which, under $H_0$, follows an F-distribution.
F-DISTRIBUTION
Here is how to generate an F random variable:
$F_{\nu_1,\nu_2} = \frac{V_1/\nu_1}{V_2/\nu_2}$,
where
$V_1$ and $V_2$ have chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom (df), respectively;
$V_1$ and $V_2$ are independent.
Recall $E[V] = \nu$.
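The same kind of simulation works for the F-distribution. In this standard-library sketch (hypothetical helpers `draw_chi2` and `draw_f`; ν1 = 5, ν2 = 10 and the seed are arbitrary choices), each chi-squared variable is a sum of squared standard normals, and the ratio is formed exactly as defined above.

```python
import random


def draw_chi2(nu: int, rng: random.Random) -> float:
    """Chi-square(nu) draw as a sum of nu squared independent N(0, 1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))


def draw_f(nu1: int, nu2: int, rng: random.Random) -> float:
    """F(nu1, nu2) draw from the definition (V1 / nu1) / (V2 / nu2), with V1 and V2 independent."""
    return (draw_chi2(nu1, rng) / nu1) / (draw_chi2(nu2, rng) / nu2)


rng = random.Random(7)
nu1, nu2, reps = 5, 10, 50_000
draws = [draw_f(nu1, nu2, rng) for _ in range(reps)]

mean_f = sum(draws) / reps
median_f = sorted(draws)[reps // 2]
print(f"mean = {mean_f:.3f}, median = {median_f:.3f}")
```

Because $E[V] = \nu$, each ratio $V/\nu$ is centred near 1; the simulated mean should be close to the exact value $\nu_2/(\nu_2 - 2) = 1.25$, and the median falls below the mean, reflecting the right skew.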
F-DISTRIBUTION
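The F-distribution is a family of distributions identified by numerator and denominator degrees of freedom (df).
F-distributions
are always right-skewed;
have numerator and denominator df.
Recall
$f = \frac{s_1^2}{s_2^2} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ under $H_0$ ($\sigma_1^2 = \sigma_2^2$).
For normal data, $(n_i - 1)s_i^2/\sigma_i^2 \sim \chi^2_{n_i - 1}$ independently for the two groups, so f has the form $(V_1/\nu_1)/(V_2/\nu_2)$ with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.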
REJECTION REGIONS FOR THE F-TEST
F-TEST FOR COMPARING VARIANCES
$H_0: \sigma_1^2/\sigma_2^2 = 1$ vs. $H_a: \sigma_1^2/\sigma_2^2 \ne 1$
Test statistic: $f = s_1^2/s_2^2$,
which, under $H_0$, follows an F-distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom.
If $f > F_{n_1-1,\,n_2-1}(1-\alpha/2)$ or $f < F_{n_1-1,\,n_2-1}(\alpha/2)$, reject $H_0$.
If $f \ge 1$, then P-value $= 2 \times \Pr(F_{n_1-1,\,n_2-1} > f)$;
if $f < 1$, then P-value $= 2 \times \Pr(F_{n_1-1,\,n_2-1} < f)$.
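A sketch of the two-sided F-test as stated above, applied to the summary statistics of Examples 2 and 3. It assumes SciPy is installed (`scipy.stats.f.ppf`, `.cdf`, `.sf`); `f_test` is an illustrative name, and group 1 is always the first group listed in each example's table.

```python
from scipy.stats import f  # assumes SciPy is installed


def f_test(s1: float, n1: int, s2: float, n2: int, alpha: float = 0.05):
    """Two-sided F-test of H0: sigma1^2 / sigma2^2 = 1 from sample SDs."""
    f_stat = s1**2 / s2**2
    df1, df2 = n1 - 1, n2 - 1
    lower_crit = f.ppf(alpha / 2, df1, df2)      # F_{n1-1, n2-1}(alpha/2)
    upper_crit = f.ppf(1 - alpha / 2, df1, df2)  # F_{n1-1, n2-1}(1 - alpha/2)
    if f_stat >= 1:
        p_value = 2 * f.sf(f_stat, df1, df2)     # 2 * Pr(F > f)
    else:
        p_value = 2 * f.cdf(f_stat, df1, df2)    # 2 * Pr(F < f)
    reject = f_stat > upper_crit or f_stat < lower_crit
    return f_stat, p_value, reject


f2, p2, r2 = f_test(0.18, 62, 0.21, 35)   # Example 2: dominant vs recessive RP
f3, p3, r3 = f_test(35.6, 100, 17.3, 74)  # Example 3: cases vs historical controls
print(f"Example 2: f = {f2:.3f}, P-value = {p2:.3f}, reject H0: {r2}")
print(f"Example 3: f = {f3:.3f}, P-value = {p3:.2e}, reject H0: {r3}")
```

For Example 2 the ratio is about 0.73 and equal variances are not rejected at α = 0.05; for Example 3 the ratio is about 4.2 and equality is clearly rejected, which motivates the unequal-variance t-test.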
F-TEST FOR EQUALITY OF VARIANCES USING EXCEL – EXAMPLE 2
TWO-SAMPLE t-TEST WITH UNEQUAL VARIANCES
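$H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \ne \mu_2$
$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$,
which, under $H_0$, approximately follows a t-distribution with $d'$ degrees of freedom, where
$d' = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$.
Round $d'$ down to the nearest integer and call it $d''$.
If $|t| > t^*_{d''}(1-\alpha/2)$, reject $H_0$.
P-value $= 2 \times \Pr(T_{d''} > |t|)$.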
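A sketch of the unequal-variance procedure for the Example 3 summary statistics, following the slide's recipe of rounding $d'$ down to $d''$. It assumes SciPy is installed; `welch_t_test` is a hypothetical helper. SciPy's own `ttest_ind_from_stats(..., equal_var=False)` uses the unrounded $d'$, so its t statistic matches but its P-value differs slightly.

```python
from math import floor, sqrt

from scipy.stats import t, ttest_ind_from_stats  # assumes SciPy is installed


def welch_t_test(x1, s1, n1, x2, s2, n2, alpha=0.05):
    """Unequal-variance two-sample t-test from summary statistics, rounding d' down to d''."""
    v1, v2 = s1**2 / n1, s2**2 / n2              # squared standard errors of each mean
    t_stat = (x1 - x2) / sqrt(v1 + v2)
    d_prime = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    d2 = floor(d_prime)                           # d'' in the slide's notation
    reject = abs(t_stat) > t.ppf(1 - alpha / 2, d2)
    p_value = 2 * t.sf(abs(t_stat), d2)           # 2 * Pr(T_{d''} > |t|)
    return t_stat, d_prime, d2, reject, p_value


# Example 3: cholesterol (mg/dL), cases vs historical controls
t_stat, d_prime, d2, reject, p_value = welch_t_test(207.3, 35.6, 100, 193.4, 17.3, 74)
print(f"t = {t_stat:.2f}, d' = {d_prime:.1f}, d'' = {d2}, reject H0: {reject}, P-value = {p_value:.4f}")

# SciPy uses the unrounded d', so its P-value differs slightly from the one above
check = ttest_ind_from_stats(207.3, 35.6, 100, 193.4, 17.3, 74, equal_var=False)
print(f"SciPy: t = {check.statistic:.2f}, P-value = {check.pvalue:.4f}")
```

Here t ≈ 3.40 with $d' \approx 151.4$ (so $d'' = 151$), and the difference in mean cholesterol is significant at α = 0.05.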
EXAMPLE 3
A research study aimed to assess the familial aggregation of cholesterol levels by collecting data on children aged 2 to 14 years. Cholesterol levels (mg/dL) were collected in one group of children (say, "cases") whose fathers died from heart disease. Data were also collected in a historical control group of children of the same age.
Group Mean ± SD N
Cases 207.3 ± 35.6 100
Historical Control 193.4 ± 17.3 74
EXAMPLE 3 (CONTD.)
Paired samples or two independent samples?
Is there a difference in mean cholesterol levels between Cases and
Historical Control group?
Which statistical test should we use?
F-TEST FOR EQUALITY OF VARIANCES USING EXCEL – EXAMPLE 3
TWO-SAMPLE t-TEST FOR UNEQUAL VARIANCES USING EXCEL – EXAMPLE 3
ADVANTAGES OF PAIRED SAMPLES
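Suppose we want to test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \ne \mu_2$.
The test statistic is based on $\bar{X}_1 - \bar{X}_2$, whose variance is
$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) - 2\rho_{12}\sqrt{\mathrm{Var}(\bar{X}_1)\,\mathrm{Var}(\bar{X}_2)}$,
where $\rho_{12}$ is the correlation between $\bar{X}_1$ and $\bar{X}_2$.
The positive correlation $\rho_{12}$ in paired samples reduces the variance of the difference, yielding a more powerful test than the independent-samples design (where $\rho_{12} = 0$).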