
Post on 21-Dec-2015


Lesson9-1 Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data

Lesson 9: Confidence Intervals and Tests of Hypothesis

Two or more samples

The most important part of hypothesis testing

Suppose we are interested in testing whether the population parameter (θ) is equal to k. H0: θ = k; H1: θ ≠ k

First, we need to get a sample estimate (q) of the population parameter (θ).

Second, we need to identify the sampling distribution of q, including its mean and variance.

Third, we know that in most cases the test statistic will be of the following form: t = (q − k)/σ_q

σ_q is the standard deviation of q under the null. The form of σ_q depends on what q is.

Fourth, given the level of significance, determine the rejection region.
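The four steps can be sketched in a few lines of Python. This is only an illustration under the normal approximation: the function names and the numbers passed in are hypothetical, and the critical value 1.96 corresponds to a two-sided test at the 5% level.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_z_test(q, k, sigma_q, z_crit=1.96):
    """Test H0: theta = k against H1: theta != k.

    q       : sample estimate of the parameter
    sigma_q : standard deviation of q under the null
    Returns (statistic, reject_h0, p_value).
    """
    stat = (q - k) / sigma_q
    p_value = 2.0 * (1.0 - normal_cdf(abs(stat)))
    return stat, abs(stat) > z_crit, p_value

# Hypothetical inputs, chosen only to exercise the function.
stat, reject, p_value = two_sided_z_test(q=10.5, k=10.0, sigma_q=0.2)
print(round(stat, 2), reject, round(p_value, 4))
```

Only the form of sigma_q changes from one parameter to the next; the comparison against the critical value stays the same.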

Testing a two-sided hypothesis at 5% level of significance

z = (q − θ0)/std(q) is approximately normally distributed under the CLT.

[Figure: sampling density of q centred at θ0, with the matching z scale centred at 0. Each tail has probability α/2 = 0.025. Rejection regions: z < −1.96 or z > 1.96, that is, q < θ0 − 1.96·σ_q or q > θ0 + 1.96·σ_q.]

The most important part of constructing confidence intervals

Suppose we are interested in constructing a (1 − α)·100% confidence interval for the unknown population parameter (θ), based on some sampling information.

First, we must have a sample estimate (q) of the population parameter (θ).

Second, we need to identify the sampling distribution of q, including its mean and variance.

Third, we know that in most cases the following statistic will be approximately normal or Student-t distributed: t = (q − θ)/σ_q

σ_q is the standard deviation of q. The form of σ_q depends on what q is.

Fourth, given the confidence level, determine the upper and lower confidence limits for θ: q ± t_{α/2}·σ_q
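As a rough sketch of the fourth step, the limits are the estimate plus or minus a critical value times the standard deviation of the estimate. The function name and inputs below are hypothetical; 1.96 is the normal critical value for 95% confidence.

```python
def confidence_interval(q, sigma_q, crit=1.96):
    """Return (lower, upper) limits q -/+ crit * sigma_q."""
    half_width = crit * sigma_q
    return q - half_width, q + half_width

# Hypothetical estimate and standard deviation, for illustration only.
lower, upper = confidence_interval(q=10.5, sigma_q=0.2)
print(round(lower, 3), round(upper, 3))
```

For small samples the same function can be called with a Student-t critical value in place of 1.96.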

Constructing a 95% confidence interval for θ

z = (q − θ)/std(q) is approximately normally distributed under the CLT.

q*: estimate of θ from a sample.

[Figure: density of q with the matching z scale, probability α/2 = 0.025 in each tail beyond z = ±1.96. The confidence interval runs from the lower limit q* − 1.96·σ_q to the upper limit q* + 1.96·σ_q.]

Examples of the population parameter of interest

Population mean: θ = µ

The difference of two population means: θ = µ1 – µ2

The sum of two population means: θ = µ1 + µ2

The sum of three population means: θ = µ1 + µ2 + µ3

Sampling distribution usually normal, due to CLT.

Population variance: θ = σ²

Sampling distribution usually chi-square.

Ratio of two population variances: θ = σ1²/σ2²

Sampling distribution usually F.

Distribution of linear combinations of random variables

If m1, m2, and m3 are random variables that are independently normally distributed, then for constants a, b and c,

z = a·m1 + b·m2 + c·m3 is also normally distributed.

E(z) = aE(m1) + bE(m2) + cE(m3)

Var(z) = a²Var(m1) + b²Var(m2) + c²Var(m3)
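The two moment formulas translate directly into code. The sketch below assumes independent variables; the helper name and the numbers are made up for illustration. With coefficients 1 and −1 it gives the mean and variance of a difference, which is the case the two-sample tests rely on.

```python
def linear_combination_moments(coeffs, means, variances):
    """Mean and variance of sum(c_i * m_i) for independent m_i."""
    mean = sum(c * m for c, m in zip(coeffs, means))
    variance = sum(c * c * v for c, v in zip(coeffs, variances))
    return mean, variance

# Hypothetical case: z = m1 - m2, so a = 1 and b = -1.
mean_z, var_z = linear_combination_moments([1, -1], [5.0, 3.0], [0.4, 0.6])
print(mean_z, var_z)
```

Note that the variances add even though the means are subtracted, because the coefficients enter the variance squared.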

Distribution of sample variance

Let x1, x2, . . . , xn be a random sample from a population. The sample variance is

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

The sampling distribution of s² has mean σ²:

$$E(s^2) = \sigma^2$$

If the population is normally distributed, the variance of s² is

$$\mathrm{Var}(s^2) = \frac{2\sigma^4}{n-1}$$

and the following statistic

$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$$

has a χ² distribution with n – 1 degrees of freedom.

Distribution of a ratio of sample variances

The random variable

$$F = \frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2}$$

has an F distribution with (nx – 1) numerator degrees of freedom and (ny – 1) denominator degrees of freedom


Hypothesis testing

Two samples

Constructing confidence interval

An example of hypothesis testing

To test the effect of an herbal treatment on improvement of memory you randomly select two samples, one to receive the treatment and one to receive a placebo. Results of a memory test taken one month later are given.

Sample 1 (experimental group, treatment): x̄1 = 77, s1 = 15, n1 = 95

Sample 2 (control group, placebo): x̄2 = 73, s2 = 12, n2 = 105

The resulting test statistic is x̄1 − x̄2 = 77 − 73 = 4. Is this difference significant or is it due to chance (sampling error)?

Two Sample Tests

[Figure: two panels, "Test for equal variances" and "Test for equal means". Each panel sketches the distributions of Population 1 and Population 2 under H0 and under H1.]


Comparing two populations

We wish to know whether the distribution of the differences in sample means has a mean of 0.

If both samples contain at least 30 observations we use the z distribution as the test statistic.

Hypothesis Tests for Two Population Means

Format 1

Two-Tailed Test: H0: µ1 – µ2 = 0.0; HA: µ1 – µ2 ≠ 0.0

Upper One-Tailed Test: H0: µ1 – µ2 ≤ 0.0; HA: µ1 – µ2 > 0.0

Lower One-Tailed Test: H0: µ1 – µ2 ≥ 0.0; HA: µ1 – µ2 < 0.0

Format 2

Two-Tailed Test: H0: µ1 = µ2; HA: µ1 ≠ µ2

Upper One-Tailed Test: H0: µ1 ≤ µ2; HA: µ1 > µ2

Lower One-Tailed Test: H0: µ1 ≥ µ2; HA: µ1 < µ2

Preferred

Two Independent Populations: Examples

1. An economist wishes to determine whether there is a difference in mean family income for households in two socioeconomic groups.

Do HKU students come from families with higher income than CUHK students?

2. An admissions officer of a small liberal arts college wants to compare the mean SAT scores of applicants educated in rural high schools & in urban high schools.

Do students from rural high schools have lower A-level exam scores than students from urban high schools?

Note: The SAT (originally the Scholastic Aptitude Test) is a standardized test for college admissions in the United States.

Two Dependent Populations: Examples

1. An analyst for Educational Testing Service wants to compare the mean GMAT scores of students before & after taking a GMAT review course.

Get HKU graduates to take the A-Level English and Chinese exams again. Do they get higher A-Level English and Chinese exam scores than at the time they entered HKU?

2. Nike wants to see if there is a difference in durability of 2 sole materials. One type is placed on one shoe, the other type on the other shoe of the same pair.

Note: The Graduate Management Admission Test, better known by the acronym GMAT (pronounced G-mat), is a standardized test for determining aptitude to succeed academically in graduate business studies. The GMAT is used as one of the selection criteria by most respected business schools globally, most commonly for admission into an MBA program.

Thinking Challenge

Are they independent or dependent?

1. Miles per gallon ratings of cars before & after mounting radial tires: dependent

2. The life expectancies of light bulbs made in two different factories: independent

3. Difference in hardness between 2 metals: one contains an alloy, one doesn’t: independent

4. Tread life of two different motorcycle tires: one on the front, the other on the back: dependent

Comparing two populations

No assumptions about the shape of the populations are required.

The samples are from independent populations. Values in one sample have no influence on the values in the other sample(s).

Variance formula for independent random variables A and B: V(A − B) = V(A) + V(B)

The formula for computing the value of z is:

$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

EXAMPLE 1

Two cities, Bradford and Kane, are separated only by the Conewango River. There is competition between the two cities. The local paper recently reported that the mean household income in Bradford is $38,000 with a standard deviation of $6,000 for a sample of 40 households. The same article reported the mean income in Kane is $35,000 with a standard deviation of $7,000 for a sample of 35 households. At the .01 significance level can we conclude the mean income in Bradford is higher?

EXAMPLE 1 continued

Step 1: State the null and alternate hypotheses. H0: µB ≤ µK ; H1: µB > µK

Step 2: State the level of significance. The .01 significance level is stated in the problem.

Step 3: Find the appropriate test statistic. Because both sample sizes are more than 30, we can use z as the test statistic.

Example 1 continued

Step 4: State the decision rule. The null hypothesis is rejected if z is greater than 2.33.

[Figure: probability density of the z statistic, N(0,1), for H0: µB ≤ µK against H1: µB > µK. The rejection region z > 2.33 has probability α = 0.01; the acceptance region z ≤ 2.33 has probability 0.99.]

Example 1 continued

Step 5: Compute the value of z and make a decision.

$$z = \frac{\$38{,}000 - \$35{,}000}{\sqrt{\dfrac{(\$6{,}000)^2}{40} + \dfrac{(\$7{,}000)^2}{35}}} = 1.98$$

[Figure: the same N(0,1) density, with the computed value z = 1.98 lying to the left of the critical value 2.33, outside the rejection region.]

Example 1 continued

The decision is to not reject the null hypothesis. We cannot conclude that the mean household income in Bradford is larger.

Example 1 continued

The p-value is: P(z > 1.98) = .5000 − .4761 = .0239

[Figure: N(0,1) density with the area to the right of z = 1.98 marked as the p-value, 0.0239, which is larger than the rejection-region probability 0.01.]
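The arithmetic of Example 1 can be checked with a short script. This is a sketch using only the summary statistics quoted in the example; the function names are illustrative, and the critical value 2.33 is the one stated in the decision rule.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """Large-sample z statistic for the difference of two means."""
    std_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / std_error

# Bradford (sample 1) versus Kane (sample 2).
z = two_sample_z(38000, 6000, 40, 35000, 7000, 35)
p_value = 1.0 - normal_cdf(z)   # upper-tail test, H1: mu_B > mu_K
reject = z > 2.33               # critical value at the .01 level

print(round(z, 2), round(p_value, 3), reject)
```

The unrounded statistic is slightly below 1.98, so the p-value computed here can differ from the table-based value in the last digit.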

Small Sample Tests of Means

The t distribution is used as the test statistic if one or more of the samples have fewer than 30 observations.

The required assumptions are:

1. Both populations must follow the normal distribution.

2. The populations must have equal standard deviations.

3. The samples are from independent populations.

Small sample test of means continued

Finding the value of the test statistic requires two steps.

Step 1: Pool the sample variances.

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$

Why not n1 + n2?

Step 2: Determine the value of t from the following formula.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$

Small sample test of means continued

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$

Why not n1 + n2?

$$s_1^2 = \frac{\sum_{i=1}^{n_1}(x_{1i}-\bar{x}_1)^2}{n_1-1}$$

(n1 – 1) is the degrees of freedom. One df is lost because the sample mean must be fixed before computation of the sample variance. Division by df instead of n1 ensures the unbiasedness of s1² as an estimate of the population variance.

$$(n_1-1)s_1^2 = \sum_{i=1}^{n_1}(x_{1i}-\bar{x}_1)^2$$

$$s_p^2 = \frac{\sum_{i=1}^{n_1}(x_{1i}-\bar{x}_1)^2 + \sum_{i=1}^{n_2}(x_{2i}-\bar{x}_2)^2}{n_1+n_2-2}$$

(n1 + n2 – 2) is the degrees of freedom. Two dfs are lost because two sample means must be fixed before computation of the pooled variance. Division by df instead of (n1 + n2) ensures the unbiasedness of sp² as an estimate of the population variance.

EXAMPLE 2

A recent EPA study compared the highway fuel economy of domestic and imported passenger cars. A sample of 15 domestic cars revealed a mean of 33.7 mpg with a standard deviation of 2.4 mpg. A sample of 12 imported cars revealed a mean of 35.7 mpg with a standard deviation of 3.9 mpg.

At the .05 significance level can the EPA conclude that the mpg is higher on the imported cars?

Example 2 continued

Step 1: State the null and alternate hypotheses. H0: µD ≥ µI ; H1: µD < µI

Step 2: State the level of significance. The .05 significance level is stated in the problem.

Step 3: Find the appropriate test statistic. Both sample sizes are less than 30, so we use the t distribution.

EXAMPLE 2 continued

Step 4: The decision rule is to reject H0 if t < −1.708.

There are 25 degrees of freedom.

[Figure: probability density of the t statistic, t (df = 25), for H0: µD ≥ µI against HA: µD < µI. The rejection region t < −1.708 has probability α = 0.05.]

EXAMPLE 2 continued

Step 5: We compute the pooled variance:

$$s_p^2 = \frac{(n_1-1)(s_1^2) + (n_2-1)(s_2^2)}{n_1+n_2-2} = \frac{(15-1)(2.4)^2 + (12-1)(3.9)^2}{15+12-2} = 9.918$$

Example 2 continued

We compute the value of t as follows.

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}} = \frac{33.7 - 35.7}{\sqrt{9.918\left(\dfrac{1}{15}+\dfrac{1}{12}\right)}} = -1.640$$

Example 2 continued

[Figure: the same t (df = 25) density, with the computed value t = −1.640 lying to the right of the critical value −1.708, outside the rejection region.]

H0 is not rejected. There is insufficient sample evidence to claim a higher mpg on the imported cars.
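The two-step calculation of Example 2 (pool the variances, then form t) can be reproduced as follows. The function names are illustrative, and the critical value −1.708 is taken from the decision rule rather than computed.

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """Pooled variance with n1 + n2 - 2 degrees of freedom."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance t statistic for the difference of two means."""
    sp2 = pooled_variance(s1, n1, s2, n2)
    return (mean1 - mean2) / math.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))

# Domestic cars (sample 1) versus imported cars (sample 2).
sp2 = pooled_variance(2.4, 15, 3.9, 12)
t = pooled_t(33.7, 2.4, 15, 35.7, 3.9, 12)
reject = t < -1.708   # lower-tail test at the .05 level, 25 df

print(round(sp2, 3), round(t, 3), reject)
```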

Hypothesis Testing Involving Paired Observations

Independent samples are samples that are not related in any way.

Dependent samples are samples that are paired or related in some fashion. For example:

If you wished to buy a car you would look at the same car at two (or more) different dealerships and compare the prices.

If you wished to measure the effectiveness of a new diet you would weigh the dieters at the start and at the finish of the program.

Hypothesis Testing Involving Paired Observations

Use the following test when the samples are dependent:

$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$

where d̄ is the mean of the differences, s_d is the standard deviation of the differences, and n is the number of pairs (differences).


EXAMPLE 3

An independent testing agency is comparing the daily rental cost for renting a compact car from Hertz and Avis. A random sample of eight cities revealed the following information. At the .05 significance level can the testing agency conclude that there is a difference in the rental charged?


EXAMPLE 3 continued

City Hertz ($) Avis ($)

Atlanta 42 40

Chicago 56 52

Cleveland 45 43

Denver 48 48

Honolulu 37 32

Kansas City 45 48

Miami 41 39

Seattle 46 50


EXAMPLE 3 continued

Step 1: State the null and alternate hypotheses. H0: µd = 0 ; H1: µd ≠ 0

Step 2: State the level of significance. The .05 significance level is stated in the problem.

Step 3: Find the appropriate test statistic. We can use t as the test statistic.

EXAMPLE 3 continued

Step 4: State the decision rule. H0 is rejected if t < −2.365 or t > 2.365. We use the t distribution with 7 degrees of freedom.

[Figure: probability density of the t statistic, t (df = 7), for H0: µd = 0 against H1: µd ≠ 0. Rejection region I (t < −2.365) and rejection region II (t > 2.365) each have probability α/2 = 0.025; the acceptance region between them has probability 0.95.]

Example 3 continued

City Hertz ($) Avis ($) d d²

Atlanta 42 40 2 4

Chicago 56 52 4 16

Cleveland 45 43 2 4

Denver 48 48 0 0

Honolulu 37 32 5 25

Kansas City 45 48 -3 9

Miami 41 39 2 4

Seattle 46 50 -4 16

Example 3 continued

$$\bar{d} = \frac{\sum d}{n} = \frac{8.0}{8} = 1.00$$

$$s_d = \sqrt{\frac{\sum d^2 - \dfrac{(\sum d)^2}{n}}{n-1}} = \sqrt{\frac{78 - \dfrac{8^2}{8}}{8-1}} = 3.1623$$

$$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{1.00}{3.1623/\sqrt{8}} = 0.894$$

Example 3 continued

Step 5: Because 0.894 is less than the critical value 2.365, do not reject the null hypothesis. There is insufficient evidence of a difference in the mean amount charged by Hertz and Avis.

[Figure: the t (df = 7) density for H0: µd = 0 against H1: µd ≠ 0, with rejection regions t < −2.365 and t > 2.365 (probability 0.025 each). The computed value t = 0.894 lies in the acceptance region.]
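The paired calculation in Example 3 can be reproduced from the rental prices in the table. This sketch uses the definitional form of the standard deviation, which is algebraically equivalent to the shortcut formula; variable names are illustrative and the critical value 2.365 comes from the decision rule.

```python
import math

# Daily rental cost by city, in the order listed in the table.
hertz = [42, 56, 45, 48, 37, 45, 41, 46]
avis = [40, 52, 43, 48, 32, 48, 39, 50]

d = [h - a for h, a in zip(hertz, avis)]   # paired differences
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
t = d_bar / (s_d / math.sqrt(n))
reject = abs(t) > 2.365   # two-sided test at the .05 level, 7 df

print(d_bar, round(s_d, 4), round(t, 3), reject)
```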

Two Sample Tests of Proportions

We investigate whether two independent samples came from populations with an equal proportion of successes.

The two samples are pooled using the following formula.

$$p_c = \frac{X_1 + X_2}{n_1 + n_2}$$

where X1 and X2 refer to the number of successes in the respective samples of n1 and n2.

Two Sample Tests of Proportions continued

The value of the test statistic is computed from the following formula.

$$z = \frac{p_1 - p_2}{\sqrt{\dfrac{p_c(1-p_c)}{n_1} + \dfrac{p_c(1-p_c)}{n_2}}}$$

Note: The form of the standard deviation reflects the assumption of independence of the two samples.


Example 4

Are unmarried workers more likely to be absent from work than married workers? A sample of 250 married workers showed 22 missed more than 5 days last year, while a sample of 300 unmarried workers showed 35 missed more than five days. Use a .05 significance level.

Example 4 continued

The null and the alternate hypothesis are:

H0: πU ≤ πM ; H1: πU > πM

The null hypothesis is rejected if the computed value of z is greater than 1.65.

Example 4 continued

The pooled proportion is

$$p_c = \frac{35 + 22}{300 + 250} = .1036$$

The value of the test statistic is

$$z = \frac{\dfrac{35}{300} - \dfrac{22}{250}}{\sqrt{\dfrac{.1036(1-.1036)}{300} + \dfrac{.1036(1-.1036)}{250}}} = 1.10$$

Example 4 continued

The null hypothesis is not rejected. We cannot conclude that a higher proportion of unmarried workers miss more days in a year than the married workers.

The p-value is: P(z > 1.10) = .5000 − .3643 = .1357
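Example 4 can be verified with a small script that pools the two samples and forms the z statistic. The function names are illustrative; 1.65 is the critical value used in the example.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: equal population proportions."""
    p_c = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(p_c * (1 - p_c) / n1 + p_c * (1 - p_c) / n2)
    return (x1 / n1 - x2 / n2) / std_error, p_c

# Unmarried workers (sample 1) versus married workers (sample 2).
z, p_c = two_proportion_z(35, 300, 22, 250)
p_value = 1.0 - normal_cdf(z)   # upper-tail test
reject = z > 1.65

print(round(p_c, 4), round(z, 2), round(p_value, 3), reject)
```

Because the statistic is not rounded to 1.10 before the tail area is taken, the p-value here may differ from the table-based .1357 in the third decimal.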

Two Sample Tests

[Figure: two panels, "Test for equal variances" and "Test for equal means". Each panel sketches the distributions of Population 1 and Population 2 under H0 and under H1.]

Hypothesis Tests of one Population Variance

If the population is normally distributed,

$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$$

follows a chi-square distribution with (n – 1) degrees of freedom

The test statistic for hypothesis tests about one population variance is

$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma_0^2}$$

where σ0² is the variance under the null hypothesis.

Decision Rules: Variance

Lower-tail test:

H0: σ² ≥ σ0²

H1: σ² < σ0²

Reject H0 if $\chi^2_{n-1} < \chi^2_{n-1,\,1-\alpha}$

Upper-tail test:

H0: σ² ≤ σ0²

H1: σ² > σ0²

Reject H0 if $\chi^2_{n-1} > \chi^2_{n-1,\,\alpha}$

Two-tail test:

H0: σ² = σ0²

H1: σ² ≠ σ0²

Reject H0 if $\chi^2_{n-1} > \chi^2_{n-1,\,\alpha/2}$ or $\chi^2_{n-1} < \chi^2_{n-1,\,1-\alpha/2}$

Hypothesis Tests for Two Variances

Two-tail test:

H0: σx² = σy²

H1: σx² ≠ σy²

Lower-tail test:

H0: σx² ≥ σy²

H1: σx² < σy²

Upper-tail test:

H0: σx² ≤ σy²

H1: σx² > σy²

The two populations are assumed to be independent and normally distributed

Hypothesis Tests for Two Variances (continued)

The random variable

$$F = \frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2}$$

has an F distribution with (nx – 1) numerator degrees of freedom and (ny – 1) denominator degrees of freedom

Under the null that σx² = σy², we have

$$F = \frac{s_x^2}{s_y^2}$$

Decision Rules: Two Variances

Let sx² be the larger of the two sample variances.

Upper-tail test (H0: σx² ≤ σy²; H1: σx² > σy²):

Reject H0 if $F > F_{n_x-1,\,n_y-1,\,\alpha}$

Two-tail test (H0: σx² = σy²; H1: σx² ≠ σy²), the rejection region is:

Reject H0 if $F > F_{n_x-1,\,n_y-1,\,\alpha/2}$

[Figure: two F densities starting at 0. In the upper-tail panel the area α lies to the right of the critical value; in the two-tail panel the area α/2 lies to the right of the critical value. "Do not reject H0" is to the left of the critical value and "Reject H0" to the right.]

Example: F Test

You are a financial analyst for a brokerage firm. You want to compare dividend yields between stocks listed on the NYSE & NASDAQ. You collect the following data:

            NYSE    NASDAQ
Number      21      25
Mean        3.27    2.53
Std dev     1.30    1.16

Is there a difference in the variances between the NYSE & NASDAQ at the α = 0.10 level?

F Test: Example Solution

Form the hypothesis test:

H0: σx² = σy² (there is no difference between variances)

H1: σx² ≠ σy² (there is a difference between variances)

Find the F critical value for α/2 = .10/2:

Degrees of Freedom:

Numerator (NYSE has the larger standard deviation): nx – 1 = 21 – 1 = 20 d.f.

Denominator: ny – 1 = 25 – 1 = 24 d.f.

$$F_{n_x-1,\,n_y-1,\,\alpha/2} = F_{20,\,24,\,0.10/2} = 2.03$$

F Test: Example Solution (continued)

The test statistic is:

$$F = \frac{s_x^2}{s_y^2} = \frac{1.30^2}{1.16^2} = 1.256$$

[Figure: F density with the upper-tail area α/2 = .05 to the right of the critical value 2.03. "Do not reject H0" is to the left of 2.03 and "Reject H0" to the right.]

F = 1.256 is not in the rejection region, so we do not reject H0

Conclusion: There is not sufficient evidence of a difference in variances at α = .10
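A minimal check of the F test example, assuming the critical value 2.03 read from the slide (it is not computed here). The function name is illustrative; the larger sample variance is placed in the numerator, as the decision rule requires.

```python
def variance_ratio(s_a, s_b):
    """F statistic with the larger sample variance in the numerator."""
    var_a, var_b = s_a ** 2, s_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)

F = variance_ratio(1.30, 1.16)   # NYSE versus NASDAQ standard deviations
F_CRITICAL = 2.03                # upper .05 point, 20 and 24 d.f. (from the slide)
reject = F > F_CRITICAL

print(round(F, 3), reject)
```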


Hypothesis testing

Two samples

Constructing confidence interval

Dependent Samples

Tests Means of 2 Related Populations

Paired or matched samples

Repeated measures (before/after)

Use difference between paired values:

di = xi - yi

Mean Difference

The ith paired difference is di , where

di = xi - yi

The point estimate for the population mean paired difference is d̄:

$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$$

The sample standard deviation is:

$$S_d = \sqrt{\frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1}}$$

Confidence Interval for Mean Difference

The confidence interval for the difference between population means, μd , is

$$\bar{d} - t_{n-1,\,\alpha/2}\frac{S_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\,\alpha/2}\frac{S_d}{\sqrt{n}}$$

where n = the sample size (number of matched pairs in the paired sample)

Paired Samples Example

Six people sign up for a weight loss program. You collect the following data:

Weight:
Person   Before (x)   After (y)   Difference, di
1        136          125         11
2        205          195         10
3        157          150         7
4        138          140         −2
5        175          165         10
6        166          160         6
Total                             42

$$\bar{d} = \frac{\sum d_i}{n} = \frac{42}{6} = 7.0$$

$$S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n-1}} = 4.82$$

Paired Samples Example (continued)

For a 95% confidence level, the appropriate t value is t_{n−1,α/2} = t_{5,.025} = 2.571

The 95% confidence interval for the difference between means, μd , is

$$\bar{d} - t_{n-1,\,\alpha/2}\frac{S_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\,\alpha/2}\frac{S_d}{\sqrt{n}}$$

$$7 - (2.571)\frac{4.82}{\sqrt{6}} < \mu_d < 7 + (2.571)\frac{4.82}{\sqrt{6}}$$

$$1.94 < \mu_d < 12.06$$

Since this interval does not contain zero, we can be 95% confident, given this limited data, that the weight loss program helps people lose weight
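The interval for the weight loss data can be recomputed directly from the before and after weights. This is a sketch with illustrative variable names; the t value 2.571 (5 degrees of freedom, 95% confidence) is taken from the example rather than computed.

```python
import math

before = [136, 205, 157, 138, 175, 166]
after = [125, 195, 150, 140, 165, 160]

d = [x - y for x, y in zip(before, after)]   # paired differences
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))

T_CRIT = 2.571   # t with 5 d.f., upper-tail area .025 (from the example)
half_width = T_CRIT * s_d / math.sqrt(n)
lower, upper = d_bar - half_width, d_bar + half_width

print(d_bar, round(s_d, 2), round(lower, 2), round(upper, 2))
```

Both limits come out positive, at roughly 1.94 and 12.06, so zero lies outside the interval.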

Difference Between Two Means

Population means, independent samples:

σx² and σy² known: confidence interval uses z_{α/2}

σx² and σy² unknown: confidence interval uses a value from the Student’s t distribution

    σx² and σy² assumed equal

    σx² and σy² assumed unequal

σx² and σy² Known

Population means, independent samples

Assumptions:

Samples are randomly and independently drawn

Both population distributions are normal

Population variances are known

σx² and σy² Known

Population means, independent samples

When σx and σy are known and both populations are normal, the variance of X̄ – Ȳ is

$$\sigma^2_{\bar{X}-\bar{Y}} = \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}$$

…and the random variable

$$Z = \frac{(\bar{x}-\bar{y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}}$$

has a standard normal distribution

Confidence Interval, σx² and σy² Known

The confidence interval for μx – μy is:

$$(\bar{x}-\bar{y}) - z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n_x} + \frac{\sigma_Y^2}{n_y}} < \mu_X - \mu_Y < (\bar{x}-\bar{y}) + z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n_x} + \frac{\sigma_Y^2}{n_y}}$$

σx² and σy² Unknown, Assumed Equal

Population means, independent samples

Assumptions:

Samples are randomly and independently drawn

Populations are normally distributed

Population variances are unknown but assumed equal

Forming interval estimates:

The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate σ

Use a t value with (nx + ny – 2) degrees of freedom

The pooled variance is

$$s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}$$

Confidence Interval, σx² and σy² Unknown, Equal

The confidence interval for μx – μy is:

$$(\bar{x}-\bar{y}) - t_{n_x+n_y-2,\,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} < \mu_X - \mu_Y < (\bar{x}-\bar{y}) + t_{n_x+n_y-2,\,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

Where

$$s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}$$

Pooled Variance Example

You are testing two computer processors for speed. Form a confidence interval for the difference in CPU speed. You collect the following speed data (in MHz):

                  CPUx    CPUy
Number Tested     17      14
Sample mean       3004    2538
Sample std dev    74      56

Assume both populations are normal with equal variances, and use 95% confidence

Calculating the Pooled Variance

The pooled variance is:

$$S_p^2 = \frac{(n_x-1)S_x^2 + (n_y-1)S_y^2}{(n_x-1)+(n_y-1)} = \frac{(17-1)74^2 + (14-1)56^2}{(17-1)+(14-1)} = 4427.03$$

The t value for a 95% confidence interval is:

$$t_{n_x+n_y-2,\,\alpha/2} = t_{29,\,0.025} = 2.045$$

Calculating the Confidence Limits

The 95% confidence interval is

$$(\bar{x}-\bar{y}) - t_{n_x+n_y-2,\,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} < \mu_X - \mu_Y < (\bar{x}-\bar{y}) + t_{n_x+n_y-2,\,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$

$$(3004-2538) - (2.045)\sqrt{\frac{4427.03}{17} + \frac{4427.03}{14}} < \mu_X - \mu_Y < (3004-2538) + (2.045)\sqrt{\frac{4427.03}{17} + \frac{4427.03}{14}}$$

$$416.89 < \mu_X - \mu_Y < 515.11$$

We are 95% confident that the mean difference in CPU speed is between 416.89 and 515.11 MHz.
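The pooled-variance interval for the CPU data can be recomputed from the summary statistics. This is a sketch with illustrative names; the t value 2.045 (29 degrees of freedom) is the one stated in the example and is not derived here.

```python
import math

def pooled_variance(s_x, n_x, s_y, n_y):
    """Pooled variance with n_x + n_y - 2 degrees of freedom."""
    return ((n_x - 1) * s_x ** 2 + (n_y - 1) * s_y ** 2) / (n_x + n_y - 2)

n_x, mean_x, s_x = 17, 3004, 74   # CPUx
n_y, mean_y, s_y = 14, 2538, 56   # CPUy

sp2 = pooled_variance(s_x, n_x, s_y, n_y)
T_CRIT = 2.045                    # t with 29 d.f., upper-tail area .025
half_width = T_CRIT * math.sqrt(sp2 / n_x + sp2 / n_y)
lower = (mean_x - mean_y) - half_width
upper = (mean_x - mean_y) + half_width

print(round(sp2, 2), round(lower, 2), round(upper, 2))
```

The interval runs from roughly 417 to 515 MHz, well away from zero.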

σx² and σy² Unknown, Assumed Unequal

Population means, independent samples

Assumptions:

Samples are randomly and independently drawn

Populations are normally distributed

Population variances are unknown and assumed unequal

σx² and σy² Unknown, Assumed Unequal

Forming interval estimates:

The population variances are assumed unequal, so a pooled variance is not appropriate

Use a t value with v degrees of freedom, where

$$v = \frac{\left(\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}\right)^2}{\left(\dfrac{s_x^2}{n_x}\right)^2/(n_x-1) + \left(\dfrac{s_y^2}{n_y}\right)^2/(n_y-1)}$$

Confidence Interval, σx² and σy² Unknown, Unequal

The confidence interval for μx – μy is:

$$(\bar{x}-\bar{y}) - t_{v,\,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} < \mu_X - \mu_Y < (\bar{x}-\bar{y}) + t_{v,\,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$$

where v is the degrees of freedom given by the formula above.
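The degrees-of-freedom formula is easy to get wrong by hand, so a small helper may be useful. The sketch below is an illustration only: the function name is made up, and the inputs are the summary statistics from the CPU speed example, treated here as if the variances had not been assumed equal.

```python
def unequal_variance_df(s_x, n_x, s_y, n_y):
    """Approximate degrees of freedom v when variances are not pooled."""
    a = s_x ** 2 / n_x
    b = s_y ** 2 / n_y
    return (a + b) ** 2 / (a ** 2 / (n_x - 1) + b ** 2 / (n_y - 1))

# CPU example statistics, reused purely for illustration.
v = unequal_variance_df(74, 17, 56, 14)
print(round(v, 2))
```

The result is generally not an integer; it always lies between the smaller of (nx – 1) and (ny – 1) and the pooled value (nx + ny – 2).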

Two Population Proportions

Goal: Form a confidence interval for the difference between two population proportions, Px – Py

Assumptions:

Both sample sizes are large (generally at least 40 observations in each sample)

The point estimate for the difference is

$$\hat{p}_x - \hat{p}_y$$

Two Population Proportions (continued)

The random variable

$$Z = \frac{(\hat{p}_x - \hat{p}_y) - (p_x - p_y)}{\sqrt{\dfrac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \dfrac{\hat{p}_y(1-\hat{p}_y)}{n_y}}}$$

is approximately normally distributed

The confidence limits for Px – Py are:

$$(\hat{p}_x - \hat{p}_y) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}}$$

Example: Two Population Proportions

Form a 90% confidence interval for the difference between the proportion of men and the proportion of women who have college degrees.

In a random sample, 26 of 50 men and 28 of 40 women had an earned college degree

Example: Two Population Proportions (continued)

Men: $\hat{p}_x = 26/50 = 0.52$

Women: $\hat{p}_y = 28/40 = 0.70$

$$\sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}} = \sqrt{\frac{0.52(0.48)}{50} + \frac{0.70(0.30)}{40}} = 0.1012$$

For 90% confidence, Z_{α/2} = 1.645

The confidence limits are:

$$(\hat{p}_x - \hat{p}_y) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}} = (.52 - .70) \pm 1.645\,(0.1012)$$

so −0.3465 < Px – Py < −0.0135

Since this interval does not contain zero we are 90% confident that the two proportions are not equal
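The confidence limits for the college degree example can be computed as below. This is a sketch with an illustrative function name; 1.645 is the normal critical value for 90% confidence quoted in the example.

```python
import math

def proportion_difference_interval(x_a, n_a, x_b, n_b, z_crit):
    """Confidence limits for p_a - p_b using unpooled sample proportions."""
    p_a, p_b = x_a / n_a, x_b / n_b
    std_error = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    return diff - z_crit * std_error, diff + z_crit * std_error, std_error

# Men: 26 of 50; women: 28 of 40; 90% confidence.
lower, upper, std_error = proportion_difference_interval(26, 50, 28, 40, 1.645)
print(round(std_error, 4), round(lower, 4), round(upper, 4))
```

Unlike the test of equal proportions, the interval does not pool the two samples, because no null hypothesis of equality is being imposed.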

Confidence Intervals for the Population Variance

The confidence interval is based on the sample variance, s²

Assumed: the population is normally distributed

The random variable

$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$$

follows a chi-square distribution with (n – 1) degrees of freedom

The chi-square value $\chi^2_{n-1,\,\alpha}$ denotes the number for which

$$P(\chi^2_{n-1} > \chi^2_{n-1,\,\alpha}) = \alpha$$

Confidence Intervals for the Population Variance

The 100(1 − α)% confidence interval for the population variance is

$$\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}$$

Example

You are testing the speed of a computer processor. You collect the following data (in MHz):

                  CPUx
Sample size       17
Sample mean       3004
Sample std dev    74

Assume the population is normal. Determine the 95% confidence interval for σx²

Finding the Chi-square Values

n = 17 so the chi-square distribution has (n – 1) = 16 degrees of freedom

α = 0.05, so use the chi-square values with area 0.025 in each tail:

$$\chi^2_{n-1,\,\alpha/2} = \chi^2_{16,\,0.025} = 28.85$$

$$\chi^2_{n-1,\,1-\alpha/2} = \chi^2_{16,\,0.975} = 6.91$$

[Figure: chi-square density with 16 degrees of freedom; probability α/2 = .025 lies below 6.91 and probability α/2 = .025 lies above 28.85.]

Calculating the Confidence Limits

The 95% confidence interval is

$$\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}$$

$$\frac{(17-1)(74)^2}{28.85} < \sigma^2 < \frac{(17-1)(74)^2}{6.91}$$

$$3037 < \sigma^2 < 12683$$

Converting to standard deviation, we are 95% confident that the population standard deviation of CPU speed is between 55.1 and 112.6 MHz
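The variance limits can be reproduced with a short script. This is a sketch with illustrative names. The chi-square critical values are entered by hand: the three-decimal table values 28.845 and 6.908 are an assumption made here, since the two-decimal values printed on the slide give a slightly different upper limit.

```python
import math

def variance_interval(s, n, chi2_upper_tail, chi2_lower_tail):
    """Confidence limits for a population variance (normal population)."""
    sum_sq = (n - 1) * s ** 2
    return sum_sq / chi2_upper_tail, sum_sq / chi2_lower_tail

# Chi-square with 16 d.f.: upper-tail areas .025 and .975 (assumed table values).
var_lower, var_upper = variance_interval(74, 17, 28.845, 6.908)
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)

print(round(var_lower), round(var_upper), round(sd_lower, 1), round(sd_upper, 1))
```

The interval is not symmetric around s² = 5476 because the chi-square distribution is skewed to the right.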


- END -
