two population means hypothesis testing and confidence intervals with unknown standard deviations

Two Population Means

Hypothesis Testing and Confidence Intervals

With Unknown Standard Deviations

The Problem

σ1 or σ2 are unknown

• σ1 and σ2 are not known (the usual case)

OBJECTIVES

• Test whether µ1 > µ2 (by a certain amount)

– or whether µ1 ≠ µ2

• Determine a confidence interval for the difference in the means: µ1 - µ2

KEY ASSUMPTIONS

Sampling is done from two populations.

– Population 1 has mean µ1 and variance σ1².

– Population 2 has mean µ2 and variance σ2².

– A sample of size n1 will be taken from population 1.

– A sample of size n2 will be taken from population 2.

– Sampling is random and both samples are drawn independently.

– Either the sample sizes are large or the populations are assumed to be normally distributed.

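Random variable $\bar{X}_1$: mean $\mu_1$, standard deviation $\dfrac{\sigma_1}{\sqrt{n_1}}$, variance $\dfrac{\sigma_1^2}{n_1}$

Random variable $\bar{X}_2$: mean $\mu_2$, standard deviation $\dfrac{\sigma_2}{\sqrt{n_2}}$, variance $\dfrac{\sigma_2^2}{n_2}$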

Distribution of X̄1 - X̄2

• Since X1 and X2 are both assumed to be normal, or the sample sizes n1 and n2 are assumed to be large, and because σ1 and σ2 are unknown, the random variable X̄1 - X̄2 has a:

– Distribution: t

– Mean = µ1 - µ2

– Standard deviation that depends on whether or not the standard deviations of X1 and X2 (although unknown) can be assumed to be equal

– Degrees of freedom that also depend on whether or not the standard deviations of X1 and X2 can be assumed to be equal

Appropriate Standard Deviation For X̄1 - X̄2 When σ’s Are Known

• Recall the appropriate standard deviation for X̄1 - X̄2 is:

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

• Now if σ1 = σ2, we can simply call it σ and write it as:

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$

• So if the standard deviations are unknown, we need an estimate for the common variance, σ².
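The standard deviation of X̄1 - X̄2 in the known-σ case follows from the key assumption that the two samples are drawn independently, so the variances of the two sample means add. A short derivation, given here as a sketch:

```latex
\begin{aligned}
\operatorname{Var}(\bar{X}_1 - \bar{X}_2)
  &= \operatorname{Var}(\bar{X}_1) + \operatorname{Var}(\bar{X}_2)
     && \text{(independent samples)} \\
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \\
\sigma_{\bar{X}_1 - \bar{X}_2}
  &= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
   = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
     && \text{(last step only if } \sigma_1 = \sigma_2 = \sigma\text{)}
\end{aligned}
```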

Estimating σ²

Degrees of Freedom

• If we can assume that the populations have equal variances, then the estimate of the common variance σ² is the weighted average of s1² and s2², weighted by:

DEGREES OF FREEDOM

• There are n1 - 1 degrees of freedom from the first sample and n2 - 1 degrees of freedom from the second sample, so

• Total Degrees of Freedom for the hypothesis test or confidence interval = (n1 - 1) + (n2 - 1) = n1 + n2 - 2

The Appropriate Standard Deviation For X̄1 - X̄2 When σ’s Are Unknown, but Can Be Assumed to Be Equal

• The best estimate for σ² then is the pooled variance, sp²:

$s_p^2 = \dfrac{DF_1}{\text{Total } DF}\, s_1^2 + \dfrac{DF_2}{\text{Total } DF}\, s_2^2 = \dfrac{n_1 - 1}{n_1 + n_2 - 2}\, s_1^2 + \dfrac{n_2 - 1}{n_1 + n_2 - 2}\, s_2^2$

• Thus the best estimates for the variance and standard deviation of X̄1 - X̄2 are:

$s_{\bar{X}_1 - \bar{X}_2}^2 = s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

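t-Statistic and t-Confidence Interval Assuming Equal Variances

Degrees of Freedom = n1 + n2 - 2

t-Statistic

$t = \dfrac{(\text{Point Estimate}) - v}{\text{Standard Error}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - v}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$

Here v is the hypothesized value of µ1 - µ2.

Confidence Interval

$(\text{Point Estimate}) \pm t_{\alpha/2}\,(\text{Standard Error}) = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$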
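For detailed (raw) data, the equal-variance t-statistic can be computed directly. The following is a minimal sketch using only the Python standard library; the function name `pooled_t` and the two small data sets are hypothetical and serve only to illustrate the pooled-variance calculation.

```python
import math
import statistics


def pooled_t(sample1, sample2, hypothesized_diff=0.0):
    """Equal-variance (pooled) two-sample t-statistic from raw data."""
    n1, n2 = len(sample1), len(sample2)
    x1, x2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.variance divides by n - 1, i.e. it is the sample variance.
    s1_sq = statistics.variance(sample1)
    s2_sq = statistics.variance(sample2)

    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df
    std_error = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    t_stat = ((x1 - x2) - hypothesized_diff) / std_error
    return t_stat, df, sp_sq, std_error


# Hypothetical data, for illustration only.
group_a = [12.0, 14.0, 16.0, 18.0]
group_b = [10.0, 12.0, 14.0]
t_stat, df, sp_sq, std_error = pooled_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}, pooled variance = {sp_sq:.3f}")
```

The t-statistic is then compared with a critical value from a t table (or software) for the returned degrees of freedom.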

The Appropriate Standard Deviation For X̄1 - X̄2 When σ’s Are Unknown, And Cannot Be Assumed to Be Equal

• If we cannot assume that the populations have equal variances, then the best estimate for σ1² is s1² and the best estimate for σ2² is s2².

• Thus the best estimates for the variance and standard deviation of X̄1 - X̄2 are:

$s_{\bar{X}_1 - \bar{X}_2}^2 = \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}$

$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

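t-Statistic and t-Confidence Interval Assuming Unequal Variances

t-Statistic

$t = \dfrac{(\text{Point Estimate}) - v}{\text{Standard Error}} = \dfrac{(\bar{x}_1 - \bar{x}_2) - v}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

Confidence Interval

$(\text{Point Estimate}) \pm t_{\alpha/2}\,(\text{Standard Error}) = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

Total Degrees of Freedom

$DF = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$

Round the resulting value.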
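The degrees-of-freedom formula for the unequal-variance case is tedious by hand, so a small helper can be useful. This is a sketch only; the function name `unequal_variance_df` and the summary statistics below are hypothetical.

```python
def unequal_variance_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the unequal-variance t-test."""
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Hypothetical summary statistics, for illustration only.
df_similar = unequal_variance_df(s1=5.0, n1=10, s2=5.0, n2=10)
df_different = unequal_variance_df(s1=2.0, n1=10, s2=8.0, n2=10)
print(round(df_similar), round(df_different))
```

With equal sample sizes and equal sample standard deviations the result matches n1 + n2 - 2; as the sample variances diverge, the degrees of freedom drop toward the smaller of n1 - 1 and n2 - 1.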

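σ’s are known: z-distribution

– Standard Error: $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

σ’s are unknown: t-distribution

– When σ1 = σ2:

• Standard Error: $\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

• Degrees of Freedom: n1 + n2 - 2

– When σ1 ≠ σ2:

• Standard Error: $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

• Degrees of Freedom: $\dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$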

Testing Whether the Variances Can Be Assumed to Be Equal

• The following hypothesis test tests whether or not equal variances can be assumed:

H0: σ1²/σ2² = 1 (They are equal)

HA: σ1²/σ2² ≠ 1 (They are different)

This is an F-test!

If the larger of s1² and s2² is put in the numerator, then the test is:

Reject H0 if $F = \dfrac{\text{larger } s^2}{\text{smaller } s^2} > F_{\alpha/2,\,DF_1,\,DF_2}$

where DF1 and DF2 are the degrees of freedom of the numerator and denominator sample variances.

Hypothesis Test/Confidence Interval Approach With Unknown σ’s

• Take a sample of size n1 from population 1

– Calculate x̄1 and s1²

• Take a sample of size n2 from population 2

– Calculate x̄2 and s2²

• Perform an F-test to determine if the variances can be assumed to be equal

• Perform the Appropriate Hypothesis Test or Construct the Appropriate Confidence Interval
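The F-test step of this approach decides which t-test to run. A minimal sketch of that decision is shown here; the function name `choose_t_test` and the input values are hypothetical, and the critical value must still be looked up (F table or software) for α/2 and the returned degrees of freedom.

```python
def choose_t_test(s1, n1, s2, n2, f_critical):
    """Run the variance-ratio F-test and report which t-test it points to.

    Returns (F statistic, numerator DF, denominator DF, choice).
    """
    var1, var2 = s1 ** 2, s2 ** 2
    # Put the larger sample variance in the numerator.
    if var1 >= var2:
        f_stat, df_num, df_den = var1 / var2, n1 - 1, n2 - 1
    else:
        f_stat, df_num, df_den = var2 / var1, n2 - 1, n1 - 1
    if f_stat > f_critical:
        choice = "unequal-variance t-test"
    else:
        choice = "equal-variance t-test"
    return f_stat, df_num, df_den, choice


# Hypothetical summary statistics. 2.76 is the table value assumed here
# for F(.025, 20, 15), i.e. alpha = .05 split across two tails.
f_stat, df_num, df_den, choice = choose_t_test(
    s1=4.0, n1=16, s2=6.0, n2=21, f_critical=2.76
)
print(f"F = {f_stat:.2f} with ({df_num}, {df_den}) DF -> {choice}")
```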

Example 1

Based on the following two random samples,

– Can we conclude that women on average score better than men on civil service tests?

– Construct a 95% confidence interval for the difference in average scores between women and men on civil service tests.

• Because the sample sizes are large, we do not have to assume that test scores have a normal distribution to perform our analyses.

Women: Number sampled = 32, Sample Average = 75, Sample St’d Dev. = 13.92

Men: Number sampled = 30, Sample Average = 73, Sample St’d Dev. = 11.79

Example 1 – F-test

Do an F-test to determine if variances can be assumed to be equal.

H0: σW²/σM² = 1 (Equal Variances)

HA: σW²/σM² ≠ 1 (Unequal Variances)

• Select α = .05.

• Reject H0 (Accept HA) if Larger s²/Smaller s² > F.025,DF(Larger s²),DF(Smaller s²) = F.025,31,29 = 2.09

Calculation: sW²/sM² = (13.92)²/(11.79)² = 1.39

Since 1.39 < 2.09, we cannot conclude unequal variances.

Example 1 – The Equal Variance t-Test

H0: µW - µM = 0

HA: µW - µM > 0

• Select α = .05.

• Reject H0 (Accept HA) if t > t.05,60 = 1.671

$s_p^2 = \dfrac{31}{60}(13.92)^2 + \dfrac{29}{60}(11.79)^2 = 167.30$

$t = \dfrac{(75 - 73) - 0}{\sqrt{167.30\left(\dfrac{1}{32} + \dfrac{1}{30}\right)}} = .608$

Since t = .608 < 1.671 (p-value = 0.273 > 0.05), we cannot conclude that women average better than men on the tests.

Example 1 – 95% Confidence Interval

$(\bar{x}_W - \bar{x}_M) \pm t_{.025,60}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

$(75 - 73) \pm 2.000\sqrt{167.30\left(\dfrac{1}{32} + \dfrac{1}{30}\right)}$

2 ± 6.57

-4.57 to 8.57
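The Example 1 figures can be reproduced from the summary data alone. The sketch below uses only the standard library and takes the critical value 2.000 (t.025,60) from the example rather than computing it; the variable names are illustrative.

```python
import math

# Summary statistics from Example 1.
n_w, mean_w, sd_w = 32, 75.0, 13.92  # women
n_m, mean_m, sd_m = 30, 73.0, 11.79  # men

# F-test ratio: the women's sample variance is the larger one.
f_stat = sd_w ** 2 / sd_m ** 2

# Pooled variance and standard error (equal variances assumed).
df = n_w + n_m - 2
sp_sq = ((n_w - 1) * sd_w ** 2 + (n_m - 1) * sd_m ** 2) / df
std_error = math.sqrt(sp_sq * (1 / n_w + 1 / n_m))

# t-statistic for H0: mu_W - mu_M = 0.
t_stat = ((mean_w - mean_m) - 0) / std_error

# 95% confidence interval, using t.025,60 = 2.000 as in the example.
margin = 2.000 * std_error
lcl = (mean_w - mean_m) - margin
ucl = (mean_w - mean_m) + margin

print(f"F = {f_stat:.2f}, sp^2 = {sp_sq:.2f}, t = {t_stat:.3f}")
print(f"95% CI: {lcl:.2f} to {ucl:.2f}")
```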

Example 2

Based on the following random samples of basketball attendances at the Staples Center,

– Can we conclude that the Lakers’ average attendance is more than 2000 more than the Clippers’ average attendance at the Staples Center?

– Construct a 95% confidence interval for the difference in average attendance between Lakers and Clippers games at the Staples Center.

Since sample sizes are small, we must assume that attendances at Lakers and Clippers games have normal distributions to perform the analyses.

LA Lakers: Number sampled = 13, Sample Average = 16,675, Sample St’d Dev. = 1014.97

LA Clippers: Number sampled = 11, Sample Average = 12,009, Sample St’d Dev. = 3276.73

Example 2 – F-test

• Do an F-test to determine if variances can be assumed to be equal.

H0: σC²/σL² = 1 (Equal Variances)

HA: σC²/σL² ≠ 1 (Unequal Variances)

Note: Clipper variance is the larger sample variance

• Choose α = .05.

• Reject H0 (Accept HA) if Larger s²/Smaller s² > F.025,DF(Larger variance),DF(Smaller variance) = F.025,10,12 = 3.37

Calculation: sC²/sL² = (3276.73)²/(1014.97)² = 10.42

Since 10.42 > 3.37, we can conclude unequal variances.

Do the Unequal Variance t-test.

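Degrees of Freedom for the Unequal Variance t-Test

• The degrees of freedom for this test are given by:

$DF = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \dfrac{\left(\dfrac{(3276.73)^2}{11} + \dfrac{(1014.97)^2}{13}\right)^2}{\dfrac{\left((3276.73)^2/11\right)^2}{10} + \dfrac{\left((1014.97)^2/13\right)^2}{12}} = 11.626$

This is rounded to 12 degrees of freedom.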

Example 2 – the t-Test

Proceed to the hypothesis test for the difference in means with unequal variances:

H0: µL - µC = 2000

HA: µL - µC > 2000

• Select α = .05.

• Reject H0 (Accept HA) if t > t.05,12 = 1.782

$t = \dfrac{(16{,}675 - 12{,}009) - 2000}{\sqrt{\dfrac{(1014.97)^2}{13} + \dfrac{(3276.73)^2}{11}}} = 2.595$

Since t = 2.595 > 1.782 (p-value = 0.0117 < 0.05), we can conclude that the Lakers average more than 2000 per game more than the Clippers.
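The degrees of freedom and t-statistic for Example 2 can be checked from the summary data. This sketch uses only the standard library and takes the critical value 1.782 (t.05,12) from the example; the variable names are illustrative.

```python
import math

# Summary statistics from Example 2.
n_l, mean_l, sd_l = 13, 16675.0, 1014.97  # Lakers
n_c, mean_c, sd_c = 11, 12009.0, 3276.73  # Clippers
hypothesized_diff = 2000.0                # H0: mu_L - mu_C = 2000

# Estimated variances of the two sample means.
v_l = sd_l ** 2 / n_l
v_c = sd_c ** 2 / n_c

# Degrees of freedom for the unequal-variance t-test, then rounded.
df_exact = (v_l + v_c) ** 2 / (v_l ** 2 / (n_l - 1) + v_c ** 2 / (n_c - 1))
df = round(df_exact)

# Unequal-variance standard error and t-statistic.
std_error = math.sqrt(v_l + v_c)
t_stat = ((mean_l - mean_c) - hypothesized_diff) / std_error

# Decision rule, using t.05,12 = 1.782 as in the example.
reject_h0 = t_stat > 1.782

print(f"DF = {df_exact:.3f} (rounded to {df}), t = {t_stat:.3f}")
print("Reject H0" if reject_h0 else "Cannot reject H0")
```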

Example 2 – 95% Confidence Interval

$(\bar{x}_L - \bar{x}_C) \pm t_{.025,12}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

$(16{,}675 - 12{,}009) \pm 2.179\sqrt{\dfrac{(1014.97)^2}{13} + \dfrac{(3276.73)^2}{11}}$

4666 ± 2238.47

2427.53 to 6904.47

Excel Approach

• F-test, t-test Assuming Equal Variances, and t-test Assuming Unequal Variances are all found in Data Analysis.

• Excel only performs a one-tail F-test.

– Multiply this 1-tail p-value by 2 to get the p-value for the 2-tail F-test.

• Formulas must be entered for the LCL and UCL of the confidence intervals.

– All values for these formulas can be found in the Equal or Unequal Variance t-test output.

Inputting/Interpreting Results From Hypothesis Tests

• Express H0 and HA so that the number on the right side is positive (or 0)

• The p-value returned for the two-tailed test will always be correct.

• The p-value returned for the one-tail test is usually correct. It is correct if:

– HA is a “> test” and the t-statistic is positive

• This is the usual case

• If t < 0, the true p-value is 1 – (p-value printed by Excel)

– HA is a “< test” and the t-statistic is negative

• This is the usual case

• If t > 0, the true p-value is 1 – (p-value printed by Excel)
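The one-tail rule above can be written as a small helper. This is a sketch of the rule as stated on the slide, not an Excel function; the name `true_one_tail_p` and its arguments are hypothetical.

```python
def true_one_tail_p(printed_p, t_stat, alternative):
    """Adjust the one-tail p-value printed by Excel.

    alternative is ">" or "<", the direction of HA.
    The printed value is correct when the sign of t agrees with HA;
    otherwise the true p-value is 1 - (printed p-value).
    """
    if alternative not in (">", "<"):
        raise ValueError("alternative must be '>' or '<'")
    agrees = (alternative == ">" and t_stat > 0) or (
        alternative == "<" and t_stat < 0
    )
    return printed_p if agrees else 1 - printed_p


# Example 1 values: t = .608 with one-tail p-value 0.273.
p_greater = true_one_tail_p(0.273, 0.608, ">")  # HA agrees with sign of t
p_less = true_one_tail_p(0.273, 0.608, "<")     # HA disagrees with sign of t
print(p_greater, round(p_less, 3))
```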

Excel For Example 1 – F-Test

Go to Data

Select Data Analysis

Select F-Test Two-Sample For Variances

Example 1 – F-Test (Cont’d)

Use Women (Column A) for Variable Range 1

Use Men (Column B) for Variable Range 2

Check Labels

Designate first cell for output.

p-value for one-tail test

=2*D9 Multiply the one-tail p-value by 2 to get the 2-tail p-value.

High p-value (.371671)

Cannot conclude Unequal Variances

Use Equal Variance t-test

Example 1 – t-Test

Go to Data

Select t-Test: Two-Sample Assuming Equal Variances

Example 1 – t-Test (Cont’d)

Since HA is µW - µM > 0, enter

Column A for Range 1

Column B for Range 2

0 for Hypothesized Mean Difference

Check Labels

Designate first cell for output.

Example 1 – t-test (Cont’d)

p-value for the one-tail “>” test

p-value for the two-tail “≠” test

High p-value for 1-tail test!

Cannot conclude average women’s score > average men’s score

Example 1 – 95% Confidence Interval

Excel For Example 2 – F-Test

Go to Data

Select F-Test Two-Sample For Variances

Example 2 – F-Test (Cont’d)

Use Lakers (Column B) for Variable Range 1

Use Clippers (Column D) for Variable Range 2

Check Labels

Designate first cell for output.

p-value for one-tail test

Enter =2*F9 to give the p-value for the two-tailed test

Low p-value (.000352) – Can conclude Unequal Variances

Use Unequal Variance t-test

Example 2 – t-Test

Go to Data

Select t-Test: Two-Sample Assuming Unequal Variances

Example 2 – t-Test (Cont’d)

Since HA is µL - µC > 2000, enter

Column B for Range 1

Column D for Range 2

2000 for Hypothesized Mean Difference

Check Labels

Designate first cell for output.

Example 2 – t-test (Cont’d)

p-value for the one-tail “>” test

p-value for the two-tail “≠” test

Low p-value for 1-tail test (compared to α = .05)!

Can conclude the Lakers average more than 2000 more people per game than the Clippers.

Example 2 – 95% Confidence Interval

=(F15-G15)-TINV(.05,F19)*SQRT(F16/F17+G16/G17)

– F15-G15 is x̄1 - x̄2

– TINV(.05,F19) is t.025,DF

– SQRT(F16/F17+G16/G17) is $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$

Highlight Cell I14

Add $ Signs Using F4 key

Drag to cell I15

Change “-” to “+”
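The lower and upper confidence limits (LCL, UCL) built by the spreadsheet formula can be cross-checked outside Excel. The sketch below mirrors the structure of that formula: difference in means, minus or plus the critical t value times the unequal-variance standard error. The function name `confidence_limits` is hypothetical, and the critical value 2.179 (t.025,12) is taken from Example 2 rather than computed.

```python
import math


def confidence_limits(mean1, mean2, var1, var2, n1, n2, t_crit):
    """Return (LCL, UCL) for mu1 - mu2 with unequal variances."""
    diff = mean1 - mean2
    margin = t_crit * math.sqrt(var1 / n1 + var2 / n2)
    return diff - margin, diff + margin


# Example 2 summary data: Lakers first, Clippers second.
lcl, ucl = confidence_limits(
    mean1=16675.0, mean2=12009.0,
    var1=1014.97 ** 2, var2=3276.73 ** 2,
    n1=13, n2=11,
    t_crit=2.179,  # t.025,12 as used in the example
)
print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```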

Review

• Standard Errors and Degrees of Freedom when:

– Variances are assumed equal

– Variances are not assumed equal

• F-statistic to determine if variances differ

• t-statistic and confidence interval when:

– Variances are assumed equal

– Variances are not assumed equal

• Hypothesis Tests/Confidence Intervals for Differences in Means (Assuming Equal or Unequal Variances)

– Summary Data - By Formula

– Detailed Data - By Data Analysis Tool
