ka-fu wong © 2007 econ1003: analysis of economic data lesson9-1 lesson 9: confidence intervals and...
Post on 21-Dec-2015
Lesson9-1 Ka-fu Wong © 2007 ECON1003: Analysis of Economic Data
Lesson 9:
Confidence Intervals and Tests of Hypothesis: Two or more samples
The most important part of testing hypothesis
Suppose we are interested in testing whether the population parameter ($\theta$) is equal to k. $H_0: \theta = k$; $H_1: \theta \neq k$
First, we need to get a sample estimate (q) of the population parameter ($\theta$).
Second, we need to identify the sampling distribution of q, including its mean and variance.
Third, we know that in most cases the test statistic takes the following form: $t = (q - k)/\sigma_q$
$\sigma_q$ is the standard deviation of q under the null. The form of $\sigma_q$ depends on what q is.
Fourth, given the level of significance, determine the rejection region.
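The four steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the estimate q is approximately normal so that a z critical value applies; the function name `two_sided_z_test` and the numbers passed to it are hypothetical and not part of the lesson.

```python
from statistics import NormalDist

def two_sided_z_test(q, k, se_q, alpha=0.05):
    """Return (z, critical value, reject?) for H0: theta = k vs H1: theta != k."""
    z = (q - k) / se_q                            # standardized distance from the null value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z, z_crit, abs(z) > z_crit

# Hypothetical numbers, chosen only for illustration.
z, z_crit, reject = two_sided_z_test(q=10.5, k=10.0, se_q=0.2)
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

Only the standard error `se_q` changes from one kind of parameter to another; the comparison against the critical value stays the same.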
Testing a two-sided hypothesis at 5% level of significance
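[Figure: density of $z = (q - \theta_0)/\mathrm{std}(q)$, which is approximately normally distributed under the CLT. Each tail is a rejection region with probability $\alpha/2$: $z < -1.96$ or $z > 1.96$, that is, $q < \theta_0 - 1.96\,\sigma_q$ or $q > \theta_0 + 1.96\,\sigma_q$.]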
The most important part of constructing confidence intervals
Suppose we are interested in constructing a $(1-\alpha) \times 100\%$ confidence interval for the unknown population parameter ($\theta$), based on some sampling information.
First, we must have a sample estimate (q) of the population parameter ($\theta$).
Second, we need to identify the sampling distribution of q, including its mean and variance.
Third, we know that in most cases the following statistic will be approximately normal or Student-t distributed: $t = (q - \theta)/\sigma_q$
$\sigma_q$ is the standard deviation of q. The form of $\sigma_q$ depends on what q is.
Fourth, given the confidence level, determine the upper and lower confidence limits for $\theta$: $q \pm t_{\alpha/2}\,\sigma_q$
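Constructing a 95% confidence interval for $\theta$
[Figure: density of $z = (q - \theta)/\mathrm{std}(q)$, which is approximately normally distributed under the CLT, with probability $\alpha/2$ in each tail beyond $-1.96$ and $1.96$. With q* the estimate of $\theta$ from a sample, the confidence interval runs from the lower limit $q^* - 1.96\,\sigma_q$ to the upper limit $q^* + 1.96\,\sigma_q$.]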
Examples of the population parameter of interest
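Population mean: $\theta = \mu$
The difference of two population means: $\theta = \mu_1 - \mu_2$
The sum of two population means: $\theta = \mu_1 + \mu_2$
The sum of three population means: $\theta = \mu_1 + \mu_2 + \mu_3$
For means and their sums or differences, the sampling distribution is usually normal, due to CLT.
Population variance: $\theta = \sigma^2$
Sampling distribution usually chi-square.
Ratio of two population variances: $\theta = \sigma_1^2/\sigma_2^2$
Sampling distribution usually F.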
Distribution of linear combinations of random variables
If m1, m2, and m3 are independent, normally distributed random variables, then for constants a, b and c,
$z = a m_1 + b m_2 + c m_3$ is also normally distributed, with
$E(z) = aE(m_1) + bE(m_2) + cE(m_3)$
$Var(z) = a^2 Var(m_1) + b^2 Var(m_2) + c^2 Var(m_3)$
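As a worked instance of this rule, consider the difference of two independent sample means. This is a sketch under the assumption that $\bar{X}_1$ and $\bar{X}_2$ are independent and (approximately) normal with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2/n_1$, $\sigma_2^2/n_2$; these symbols are introduced here only for illustration. Setting a = 1, b = -1 and c = 0 gives:

```latex
\begin{aligned}
E(\bar{X}_1 - \bar{X}_2) &= \mu_1 - \mu_2 \\
Var(\bar{X}_1 - \bar{X}_2) &= (1)^2\,\frac{\sigma_1^2}{n_1} + (-1)^2\,\frac{\sigma_2^2}{n_2}
  = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\end{aligned}
```

The variances add even though the means are subtracted, because the coefficient -1 is squared.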
Distribution of sample variance
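Let x1, x2, . . . , xn be a random sample from a population. The sample variance is
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
The sampling distribution of $s^2$ has mean $\sigma^2$:
$$E(s^2) = \sigma^2$$
For a normally distributed population,
$$Var(s^2) = \frac{2\sigma^4}{n-1}$$
and the following statistic
$$\frac{(n-1)s^2}{\sigma^2}$$
has a $\chi^2$ distribution with n – 1 degrees of freedom.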
Distribution of a ratio of sample variances
The random variable
$$F = \frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2}$$
has an F distribution with (nx – 1) numerator degrees of freedom and (ny – 1) denominator degrees of freedom.
Hypothesis testing
Two samples
Constructing confidence interval
An example of hypothesis testing
To test the effect of an herbal treatment on improvement of memory you randomly select two samples, one to receive the treatment and one to receive a placebo. Results of a memory test taken one month later are given.
Sample 1 (experimental group, treatment): $\bar{x}_1 = 77$, $s_1 = 15$, $n_1 = 95$
Sample 2 (control group, placebo): $\bar{x}_2 = 73$, $s_2 = 12$, $n_2 = 105$
The resulting test statistic is 77 - 73 = 4. Is this difference significant or is it due to chance (sampling error)?
Two Sample Tests
TEST FOR EQUAL VARIANCES and TEST FOR EQUAL MEANS
[Figure: for each test, one panel under H0 and one under H1, each showing the distributions of Population 1 and Population 2.]
Comparing two populations
We wish to know whether the distribution of the differences in sample means has a mean of 0.
If both samples contain at least 30 observations we use the z distribution as the test statistic.
Hypothesis Tests for Two Population Means
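Format 1
Two-Tailed Test: $H_0: \mu_1 - \mu_2 = 0.0$; $H_A: \mu_1 - \mu_2 \neq 0.0$
Upper One-Tailed Test: $H_0: \mu_1 - \mu_2 \leq 0.0$; $H_A: \mu_1 - \mu_2 > 0.0$
Lower One-Tailed Test: $H_0: \mu_1 - \mu_2 \geq 0.0$; $H_A: \mu_1 - \mu_2 < 0.0$
Format 2
Two-Tailed Test: $H_0: \mu_1 = \mu_2$; $H_A: \mu_1 \neq \mu_2$
Upper One-Tailed Test: $H_0: \mu_1 \leq \mu_2$; $H_A: \mu_1 > \mu_2$
Lower One-Tailed Test: $H_0: \mu_1 \geq \mu_2$; $H_A: \mu_1 < \mu_2$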
Preferred
Two Independent Populations: Examples
1. An economist wishes to determine whether there is a difference in mean family income for households in two socioeconomic groups.
Do HKU students come from families with higher income than CUHK students?
2. An admissions officer of a small liberal arts college wants to compare the mean SAT scores of applicants educated in rural high schools & in urban high schools.
Do students from rural high schools have lower A-level exam scores than students from urban high schools?
Note: The SAT (originally the Scholastic Aptitude Test) is a standardized test for college admissions in the United States.
Two Dependent Populations: Examples
1. An analyst for Educational Testing Service wants to compare the mean GMAT scores of students before & after taking a GMAT review course.
Get HKU graduates to take A-Level English and Chinese exam again. Do they get a higher A-Level English and Chinese exam score than at the time they enter HKU?
2. Nike wants to see if there is a difference in durability of 2 sole materials. One type is placed on one shoe, the other type on the other shoe of the same pair.
Note: The Graduate Management Admissions Test, better known by the acronym GMAT (pronounced G-mat), is a standardized test for determining aptitude to succeed academically in graduate business studies. The GMAT is used as one of the selection criteria by most respected business schools globally, most commonly for admission into an MBA program.
Thinking Challenge
Are they independent or dependent?
1. Miles per gallon ratings of cars before & after mounting radial tires: dependent
2. The life expectancies of light bulbs made in two different factories: independent
3. Difference in hardness between 2 metals: one contains an alloy, one doesn’t: independent
4. Tread life of two different motorcycle tires: one on the front, the other on the back: dependent
Comparing two populations
No assumptions about the shape of the populations are required.
The samples are from independent populations. Values in one sample have no influence on the values in the other sample(s).
Variance formula for independent random variables A and B: V(A-B) = V(A) + V(B)
The formula for computing the value of z is:
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
EXAMPLE 1
Two cities, Bradford and Kane are separated only by the Conewango River. There is competition between the two cities. The local paper recently reported that the mean household income in Bradford is $38,000 with a standard deviation of $6,000 for a sample of 40 households. The same article reported the mean income in Kane is $35,000 with a standard deviation of $7,000 for a sample of 35 households. At the .01 significance level can we conclude the mean income in Bradford is more?
EXAMPLE 1 continued
Step 1: State the null and alternate hypotheses. H0: µB ≤ µK ; H1: µB > µK
Step 2: State the level of significance. The .01 significance level is stated in the problem.
Step 3: Find the appropriate test statistic. Because both samples are more than 30, we can use z as the test statistic.
Example 1 continued
Step 4: State the decision rule. The null hypothesis is rejected if z is greater than 2.33.
[Figure: probability density of the z statistic, N(0,1), for H0: µB ≤ µK versus H1: µB > µK. The rejection region is z > 2.33 and has probability 0.01; the acceptance region has probability 0.99.]
Example 1 continued
Step 5: Compute the value of z and make a decision.
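$$z = \frac{\$38{,}000 - \$35{,}000}{\sqrt{\dfrac{(\$6{,}000)^2}{40} + \dfrac{(\$7{,}000)^2}{35}}} = 1.98$$
[Figure: N(0,1) density for H0: µB ≤ µK versus H1: µB > µK. The computed z = 1.98 falls below the critical value 2.33, outside the rejection region (probability 0.01).]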
Example 1 continued
The decision is to not reject the null hypothesis. We cannot conclude that the mean household income in Bradford is larger.
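The five steps of Example 1 can be checked numerically. The following is a minimal sketch using only Python's standard library; the variable names are illustrative, and the critical value is computed from the standard normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

# Sample summaries from Example 1 (Bradford vs. Kane).
mean_b, sd_b, n_b = 38_000, 6_000, 40
mean_k, sd_k, n_k = 35_000, 7_000, 35
alpha = 0.01

# Standard error of the difference in sample means (independent samples).
se_diff = sqrt(sd_b**2 / n_b + sd_k**2 / n_k)
z = (mean_b - mean_k) / se_diff

# Upper-tail critical value and p-value from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)
reject_h0 = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these inputs z is about 1.98 and the critical value about 2.33, so the sketch reaches the same decision as the example: H0 is not rejected at the .01 level.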
Example 1 continued
The p-value is: P(z > 1.98) = .5000 - .4761 = .0239
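[Figure: N(0,1) density for H0: µB ≤ µK versus H1: µB > µK. The area to the right of z = 1.98 is the p-value, 0.0239, which is larger than the 0.01 probability of the rejection region z > 2.33.]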
Small Sample Tests of Means
The t distribution is used as the test statistic if one or more of the samples have fewer than 30 observations.
The required assumptions are:
1. Both populations must follow the normal distribution.
2. The populations must have equal standard deviations.
3. The samples are from independent populations.
Small sample test of means continued
Finding the value of the test statistic requires two steps.
Step 1: Pool the sample standard deviations.
$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$
Why not n1 + n2?
Step 2: Determine the value of t from the following formula.
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$
Small sample test of means continued
$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$
Why not n1 + n2?
$$s_1^2 = \frac{\sum_{i=1}^{n_1}(x_{1i}-\bar{x}_1)^2}{n_1-1}$$
(n1 – 1) is the degrees of freedom. One df is lost because the sample mean must be fixed before computation of the sample variance. Division by df instead of n1 ensures the unbiasedness of $s_1^2$ as an estimate of the population variance.
$$(n_1-1)s_1^2 = \sum_{i=1}^{n_1}(x_{1i}-\bar{x}_1)^2$$
$$s_p^2 = \frac{\sum_{i=1}^{n_1}(x_{1i}-\bar{x}_1)^2 + \sum_{i=1}^{n_2}(x_{2i}-\bar{x}_2)^2}{n_1+n_2-2}$$
(n1 + n2 – 2) is the degrees of freedom. Two dfs are lost because two sample means must be fixed before computation of the sample variance. Division by df instead of (n1 + n2) ensures the unbiasedness of $s_p^2$ as an estimate of the population variance.
EXAMPLE 2
A recent EPA study compared the highway fuel economy of domestic and imported passenger cars. A sample of 15 domestic cars revealed a mean of 33.7 mpg with a standard deviation of 2.4 mpg. A sample of 12 imported cars revealed a mean of 35.7 mpg with a standard deviation of 3.9.
At the .05 significance level can the EPA conclude that the mpg is higher on the imported cars?
Example 2 continued
Step 1: State the null and alternate hypotheses. H0: µD ≥ µI ; H1: µD < µI
Step 2: State the level of significance. The .05 significance level is stated in the problem.
Step 3: Find the appropriate test statistic. Both samples are less than 30, so we use the t distribution.
EXAMPLE 2 continued
Step 4: The decision rule is to reject H0 if t<-1.708.
There are 25 degrees of freedom.
[Figure: probability density of the t statistic, t(df = 25), for H0: µD ≥ µI versus HA: µD < µI. The rejection region is t < -1.708, with probability 0.05.]
EXAMPLE 2 continued
Step 5: We compute the pooled variance:
$$s_p^2 = \frac{(n_1-1)(s_1^2) + (n_2-1)(s_2^2)}{n_1+n_2-2} = \frac{(15-1)(2.4)^2 + (12-1)(3.9)^2}{15+12-2} = 9.918$$
Example 2 continued
We compute the value of t as follows.
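$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}} = \frac{33.7 - 35.7}{\sqrt{9.918\left(\dfrac{1}{15}+\dfrac{1}{12}\right)}} = -1.640$$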
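The pooled-variance t statistic for Example 2 can be reproduced with a short script. This is a sketch with illustrative variable names; the critical value -1.708 is taken from the example (t table, 25 degrees of freedom) rather than computed.

```python
from math import sqrt

# Sample summaries from Example 2.
n_d, mean_d, sd_d = 15, 33.7, 2.4   # domestic cars
n_i, mean_i, sd_i = 12, 35.7, 3.9   # imported cars
t_crit = -1.708                     # lower-tail 5% critical value, 25 df (from the example)

df = n_d + n_i - 2
# Step 1: pool the two sample variances, weighting each by its degrees of freedom.
pooled_var = ((n_d - 1) * sd_d**2 + (n_i - 1) * sd_i**2) / df
# Step 2: standardize the difference in sample means.
t = (mean_d - mean_i) / sqrt(pooled_var * (1 / n_d + 1 / n_i))
reject_h0 = t < t_crit

print(f"df = {df}, pooled variance = {pooled_var:.3f}, t = {t:.3f}, reject H0: {reject_h0}")
```

The pooled variance comes out as 9.918 and t as about -1.640, which is not below -1.708.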
Example 2 continued
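[Figure: t(df = 25) density for H0: µD ≥ µI versus HA: µD < µI. The computed t = -1.640 lies to the right of the critical value -1.708, outside the rejection region (probability 0.05).]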
H0 is not rejected. There is insufficient sample evidence to claim a higher mpg on the imported cars.
Hypothesis Testing Involving Paired Observations
Independent samples are samples that are not related in any way.
Dependent samples are samples that are paired or related in some fashion. For example:
If you wished to buy a car you would look at the same car at two (or more) different dealerships and compare the prices.
If you wished to measure the effectiveness of a new diet you would weigh the dieters at the start and at the finish of the program.
Hypothesis Testing Involving Paired Observations
Use the following test when the samples are dependent:
$$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$
where $\bar{d}$ is the mean of the differences, $s_d$ is the standard deviation of the differences, and n is the number of pairs (differences).
EXAMPLE 3
An independent testing agency is comparing the daily rental cost for renting a compact car from Hertz and Avis. A random sample of eight cities revealed the following information. At the .05 significance level can the testing agency conclude that there is a difference in the rental charged?
EXAMPLE 3 continued
City Hertz ($) Avis ($)
Atlanta 42 40
Chicago 56 52
Cleveland 45 43
Denver 48 48
Honolulu 37 32
Kansas City 45 48
Miami 41 39
Seattle 46 50
EXAMPLE 3 continued
Step 1: State the null and alternate hypotheses. H0: µd = 0 ; H1: µd ≠ 0
Step 2: State the level of significance. The .05 significance level is stated in the problem.
Step 3: Find the appropriate test statistic. We can use t as the test statistic.
EXAMPLE 3 continued
Step 4: State the decision rule. H0 is rejected if t < -2.365 or t > 2.365. We use the t distribution with 7 degrees of freedom.
[Figure: probability density of the t statistic, t(df = 7), for H0: µd = 0 versus H1: µd ≠ 0. Rejection Region I is t < -2.365 and Rejection Region II is t > 2.365, each with probability 0.025; the acceptance region between them has probability 0.95.]
Example 3 continued
City Hertz ($) Avis ($) d d²
Atlanta 42 40 2 4
Chicago 56 52 4 16
Cleveland 45 43 2 4
Denver 48 48 0 0
Honolulu 37 32 5 25
Kansas City 45 48 -3 9
Miami 41 39 2 4
Seattle 46 50 -4 16
Example 3 continued
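$$\bar{d} = \frac{\sum d}{n} = \frac{8.0}{8} = 1.00$$
$$s_d = \sqrt{\frac{\sum d^2 - \dfrac{(\sum d)^2}{n}}{n-1}} = \sqrt{\frac{78 - \dfrac{8^2}{8}}{8-1}} = 3.1623$$
$$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{1.00}{3.1623/\sqrt{8}} = 0.894$$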
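The same paired-difference calculation can be run directly from the Hertz and Avis table. This is a minimal sketch with illustrative names; the critical value 2.365 is the table value quoted in the example for 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# Daily rental cost by city, in the order listed in the table.
hertz = [42, 56, 45, 48, 37, 45, 41, 46]
avis = [40, 52, 43, 48, 32, 48, 39, 50]
t_crit = 2.365  # two-sided 5% critical value, 7 df (from the example)

d = [h - a for h, a in zip(hertz, avis)]  # paired differences
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                            # sample standard deviation (divides by n - 1)
t = d_bar / (s_d / sqrt(n))
reject_h0 = abs(t) > t_crit

print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.4f}, t = {t:.3f}, reject H0: {reject_h0}")
```

Working with the differences turns the two-sample problem into a one-sample test on d, which is why only n - 1 = 7 degrees of freedom are used.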
Example 3 continued
Step 5: Because 0.894 is less than the critical value 2.365, do not reject the null hypothesis. There is insufficient evidence of a difference in the mean amount charged by Hertz and Avis.
[Figure: t(df = 7) density for H0: µd = 0 versus H1: µd ≠ 0. The computed t = 0.894 lies between -2.365 and 2.365, in the acceptance region (probability 0.95); each rejection region has probability 0.025.]
Two Sample Tests of Proportions
We investigate whether two independent samples came from populations with an equal proportion of successes.
The two samples are pooled using the following formula.
$$p_c = \frac{X_1 + X_2}{n_1 + n_2}$$
where X1 and X2 refer to the number of successes in the respective samples of n1 and n2.
Two Sample Tests of Proportions continued
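The value of the test statistic is computed from the following formula.
$$z = \frac{p_1 - p_2}{\sqrt{\dfrac{p_c(1-p_c)}{n_1} + \dfrac{p_c(1-p_c)}{n_2}}}$$
where $p_1 = X_1/n_1$ and $p_2 = X_2/n_2$ are the sample proportions.
Note: The form of standard deviation reflects the assumption of independence of the two samples.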
Example 4
Are unmarried workers more likely to be absent from work than married workers? A sample of 250 married workers showed 22 missed more than 5 days last year, while a sample of 300 unmarried workers showed 35 missed more than five days. Use a .05 significance level.
Example 4 continued
The null and the alternate hypothesis are:
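$H_0: \pi_U \leq \pi_M$; $H_1: \pi_U > \pi_M$, where $\pi_U$ and $\pi_M$ are the population proportions of unmarried and married workers who miss more than five days.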
The null hypothesis is rejected if the computed value of z is greater than 1.65.
Example 4 continued
The pooled proportion is
$$p_c = \frac{35 + 22}{300 + 250} = 0.1036$$
The value of the test statistic is
$$z = \frac{\dfrac{35}{300} - \dfrac{22}{250}}{\sqrt{\dfrac{0.1036(1-0.1036)}{300} + \dfrac{0.1036(1-0.1036)}{250}}} = 1.10$$
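The pooled-proportion test for Example 4 can be sketched as follows. The variable names are illustrative, and the critical value is computed from the standard normal distribution (about 1.645, which the example rounds to 1.65).

```python
from math import sqrt
from statistics import NormalDist

# Counts from Example 4: workers who missed more than five days.
x_u, n_u = 35, 300   # unmarried
x_m, n_m = 22, 250   # married
alpha = 0.05

p_u, p_m = x_u / n_u, x_m / n_m
p_c = (x_u + x_m) / (n_u + n_m)               # pooled proportion under H0
se = sqrt(p_c * (1 - p_c) / n_u + p_c * (1 - p_c) / n_m)
z = (p_u - p_m) / se
z_crit = NormalDist().inv_cdf(1 - alpha)      # upper-tail critical value
p_value = 1 - NormalDist().cdf(z)
reject_h0 = z > z_crit

print(f"p_c = {p_c:.4f}, z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The pooled proportion is used in the standard error because, under the null hypothesis, both samples estimate the same proportion.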
Example 4 continued
The null hypothesis is not rejected. We cannot conclude that a higher proportion of unmarried workers miss more days in a year than the married workers.
The p-value is: P(z > 1.10) = .5000 - .3643 = .1357
Two Sample Tests
TEST FOR EQUAL VARIANCES and TEST FOR EQUAL MEANS
[Figure: for each test, one panel under H0 and one under H1, each showing the distributions of Population 1 and Population 2.]
Hypothesis Tests of one Population Variance
If the population is normally distributed,
$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$$
follows a chi-square distribution with (n – 1) degrees of freedom, where $\sigma^2$ is the population variance.
The test statistic for hypothesis tests about one population variance is
$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma_0^2}$$
where $\sigma_0^2$ is the variance under the null hypothesis.
Decision Rules: Variance
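Population variance
Lower-tail test: $H_0: \sigma^2 \geq \sigma_0^2$; $H_1: \sigma^2 < \sigma_0^2$. Reject H0 if $\chi^2_{n-1} < \chi^2_{n-1,\,1-\alpha}$
Upper-tail test: $H_0: \sigma^2 \leq \sigma_0^2$; $H_1: \sigma^2 > \sigma_0^2$. Reject H0 if $\chi^2_{n-1} > \chi^2_{n-1,\,\alpha}$
Two-tail test: $H_0: \sigma^2 = \sigma_0^2$; $H_1: \sigma^2 \neq \sigma_0^2$. Reject H0 if $\chi^2_{n-1} > \chi^2_{n-1,\,\alpha/2}$ or $\chi^2_{n-1} < \chi^2_{n-1,\,1-\alpha/2}$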
Hypothesis Tests for Two Variances
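Two-tail test: $H_0: \sigma_x^2 = \sigma_y^2$; $H_1: \sigma_x^2 \neq \sigma_y^2$
Lower-tail test: $H_0: \sigma_x^2 \geq \sigma_y^2$; $H_1: \sigma_x^2 < \sigma_y^2$
Upper-tail test: $H_0: \sigma_x^2 \leq \sigma_y^2$; $H_1: \sigma_x^2 > \sigma_y^2$
The two populations are assumed to be independent and normally distributed.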
Hypothesis Tests for Two Variances (continued)
The random variable
$$F = \frac{s_x^2/\sigma_x^2}{s_y^2/\sigma_y^2}$$
has an F distribution with (nx – 1) numerator degrees of freedom and (ny – 1) denominator degrees of freedom.
Under the null that $\sigma_x^2 = \sigma_y^2$, we have
$$F = \frac{s_x^2}{s_y^2}$$
Decision Rules: Two Variances
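Two-tail test: $H_0: \sigma_x^2 = \sigma_y^2$; $H_1: \sigma_x^2 \neq \sigma_y^2$. Let $s_x^2$ be the larger of the two sample variances. Reject H0 if $F > F_{n_x-1,\,n_y-1,\,\alpha/2}$
Upper-tail test: $H_0: \sigma_x^2 \leq \sigma_y^2$; $H_1: \sigma_x^2 > \sigma_y^2$. Reject H0 if $F > F_{n_x-1,\,n_y-1,\,\alpha}$
[Figure: F density for each test; H0 is not rejected to the left of the critical value and is rejected to the right, with tail probability α/2 for the two-tail test and α for the upper-tail test.]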
Example: F Test
You are a financial analyst for a brokerage firm. You want to compare dividend yields between stocks listed on the NYSE & NASDAQ. You collect the following data:
          NYSE    NASDAQ
Number    21      25
Mean      3.27    2.53
Std dev   1.30    1.16
Is there a difference in the variances between the NYSE & NASDAQ at the α = 0.10 level?
F Test: Example Solution
Form the hypothesis test:
$H_0: \sigma_x^2 = \sigma_y^2$ (there is no difference between variances)
$H_1: \sigma_x^2 \neq \sigma_y^2$ (there is a difference between variances)
Degrees of Freedom:
Numerator (NYSE has the larger standard deviation): nx – 1 = 21 – 1 = 20 d.f.
Denominator: ny – 1 = 25 – 1 = 24 d.f.
Find the F critical value for α/2 = .10/2:
$$F_{n_x-1,\,n_y-1,\,\alpha/2} = F_{20,\,24,\,0.10/2} = 2.03$$
F Test: Example Solution (continued)
The test statistic is:
$$F = \frac{s_x^2}{s_y^2} = \frac{1.30^2}{1.16^2} = 1.256$$
[Figure: F density for H0: $\sigma_x^2 = \sigma_y^2$ versus H1: $\sigma_x^2 \neq \sigma_y^2$. H0 is rejected to the right of $F_{20,\,24,\,0.10/2} = 2.03$, where the tail probability is α/2 = .05.]
F = 1.256 is not in the rejection region, so we do not reject H0
Conclusion: There is not sufficient evidence of a difference in variances at α = .10
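The F test for the NYSE and NASDAQ data can be sketched in a few lines. The names are illustrative, and the critical value 2.03 is the table value quoted in the example (20 and 24 degrees of freedom, upper-tail area .05), not something computed here.

```python
# Sample summaries from the F test example.
n_x, s_x = 21, 1.30   # NYSE (larger standard deviation, so it goes in the numerator)
n_y, s_y = 25, 1.16   # NASDAQ
f_crit = 2.03         # F critical value for 20 and 24 df, alpha/2 = .05 (from the example)

df_num, df_den = n_x - 1, n_y - 1
f_stat = s_x**2 / s_y**2      # ratio of sample variances, larger over smaller
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) df, reject H0: {reject_h0}")
```

Putting the larger sample variance in the numerator means only the upper critical value needs to be checked for the two-tail test.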
Hypothesis testing
Two samples
Constructing confidence interval
Dependent Samples
Tests Means of 2 Related Populations
Paired or matched samples
Repeated measures (before/after)
Use difference between paired values: $d_i = x_i - y_i$
Mean Difference
The ith paired difference is $d_i$, where $d_i = x_i - y_i$
The point estimate for the population mean paired difference is $\bar{d}$:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}$$
The sample standard deviation is:
$$S_d = \sqrt{\frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n-1}}$$
Confidence Interval for Mean Difference
The confidence interval for difference between population means, μd, is
$$\bar{d} - t_{n-1,\,\alpha/2}\frac{S_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\,\alpha/2}\frac{S_d}{\sqrt{n}}$$
where n = the sample size (number of matched pairs in the paired sample)
Paired Samples Example
Six people sign up for a weight loss program. You collect the following data:
Weight:
Person   Before (x)   After (y)   Difference, di
1        136          125         11
2        205          195         10
3        157          150         7
4        138          140         -2
5        175          165         10
6        166          160         6
Total                             42
$$\bar{d} = \frac{\sum d_i}{n} = \frac{42}{6} = 7.0$$
$$S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n-1}} = 4.82$$
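Paired Samples Example (continued)
For a 95% confidence level, the appropriate t value is $t_{n-1,\,\alpha/2} = t_{5,\,.025} = 2.571$
The 95% confidence interval for the difference between means, μd, is
$$\bar{d} - t_{n-1,\,\alpha/2}\frac{S_d}{\sqrt{n}} < \mu_d < \bar{d} + t_{n-1,\,\alpha/2}\frac{S_d}{\sqrt{n}}$$
$$7 - (2.571)\frac{4.82}{\sqrt{6}} < \mu_d < 7 + (2.571)\frac{4.82}{\sqrt{6}}$$
$$1.94 < \mu_d < 12.06$$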
Since this interval (1.94 to 12.06) does not contain zero, we can be 95% confident, given this limited data, that the mean weight change is positive, that is, that participants in the program lose weight on average.
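Recomputing the interval from the raw before and after weights is a useful check on the conclusion. The sketch below uses illustrative names and takes the t value 2.571 (5 degrees of freedom, area .025 in each tail) from the example rather than computing it.

```python
from math import sqrt
from statistics import mean, stdev

# Weights from the paired samples example.
before = [136, 205, 157, 138, 175, 166]
after = [125, 195, 150, 140, 165, 160]
t_value = 2.571  # t with 5 df and alpha/2 = .025 (from the example)

d = [b - a for b, a in zip(before, after)]   # paired differences
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                               # divides by n - 1
margin = t_value * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin
contains_zero = lower <= 0 <= upper

print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, interval = ({lower:.2f}, {upper:.2f}), contains zero: {contains_zero}")
```

With these data the interval is about 1.94 to 12.06, which lies entirely above zero.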
Difference Between Two Means
Population means, independent samples:
$\sigma_x^2$ and $\sigma_y^2$ known: confidence interval uses $z_{\alpha/2}$
$\sigma_x^2$ and $\sigma_y^2$ unknown (either assumed equal or assumed unequal): confidence interval uses a value from the Student’s t distribution
Population means, independent samples: $\sigma_x^2$ and $\sigma_y^2$ Known
Assumptions:
Samples are randomly and independently drawn
Both population distributions are normal
Population variances are known
Population means, independent samples: $\sigma_x^2$ and $\sigma_y^2$ Known
When σx and σy are known and both populations are normal, the variance of $\bar{X} - \bar{Y}$ is
$$\sigma^2_{\bar{X}-\bar{Y}} = \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}$$
…and the random variable
$$Z = \frac{(\bar{x} - \bar{y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_x^2}{n_x} + \dfrac{\sigma_y^2}{n_y}}}$$
has a standard normal distribution
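Confidence Interval, $\sigma_x^2$ and $\sigma_y^2$ Known
The confidence interval for μx – μy is:
$$(\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n_x} + \frac{\sigma_Y^2}{n_y}} < \mu_X - \mu_Y < (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n_x} + \frac{\sigma_Y^2}{n_y}}$$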
Population means, independent samples: $\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Equal
Assumptions:
Samples are randomly and independently drawn
Populations are normally distributed
Population variances are unknown but assumed equal
$\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Equal
Forming interval estimates:
The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate σ
Use a t value with (nx + ny – 2) degrees of freedom
The pooled variance is
$$s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}$$
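Confidence Interval, $\sigma_x^2$ and $\sigma_y^2$ Unknown, Equal
The confidence interval for μx – μy is:
$$(\bar{x} - \bar{y}) - t_{n_x+n_y-2,\,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} < \mu_X - \mu_Y < (\bar{x} - \bar{y}) + t_{n_x+n_y-2,\,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$
Where
$$s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2}$$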
Pooled Variance Example
You are testing two computer processors for speed. Form a confidence interval for the difference in CPU speed. You collect the following speed data (in MHz):
                 CPUx    CPUy
Number Tested    17      14
Sample mean      3004    2538
Sample std dev   74      56
Assume both populations are normal with equal variances, and use 95% confidence
Calculating the Pooled Variance
The pooled variance is:
$$S_p^2 = \frac{(n_x-1)S_x^2 + (n_y-1)S_y^2}{(n_x-1)+(n_y-1)} = \frac{(17-1)74^2 + (14-1)56^2}{(17-1)+(14-1)} = 4427.03$$
The t value for a 95% confidence interval is:
$$t_{n_x+n_y-2,\,\alpha/2} = t_{29,\,0.025} = 2.045$$
Calculating the Confidence Limits
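The 95% confidence interval is
$$(\bar{x} - \bar{y}) - t_{n_x+n_y-2,\,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} < \mu_X - \mu_Y < (\bar{x} - \bar{y}) + t_{n_x+n_y-2,\,\alpha/2}\sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}}$$
$$(3004 - 2538) - (2.045)\sqrt{\frac{4427.03}{17} + \frac{4427.03}{14}} < \mu_X - \mu_Y < (3004 - 2538) + (2.045)\sqrt{\frac{4427.03}{17} + \frac{4427.03}{14}}$$
$$416.89 < \mu_X - \mu_Y < 515.11$$
We are 95% confident that the mean difference in CPU speed is between 416.89 and 515.11 MHz.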
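The pooled-variance interval for the CPU example can be recomputed as a check. This is a sketch with illustrative names; it takes the t value 2.045 (29 degrees of freedom, area .025 in each tail) as quoted for this example rather than computing it.

```python
from math import sqrt

# Sample summaries from the CPU speed example (MHz).
n_x, mean_x, s_x = 17, 3004, 74
n_y, mean_y, s_y = 14, 2538, 56
t_value = 2.045  # t with 29 df and alpha/2 = .025 (from the example)

df = n_x + n_y - 2
pooled_var = ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / df
margin = t_value * sqrt(pooled_var / n_x + pooled_var / n_y)
diff = mean_x - mean_y
lower, upper = diff - margin, diff + margin

print(f"pooled variance = {pooled_var:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

With t = 2.045 the limits come out at roughly 416.89 and 515.11 MHz; the result is sensitive to the second decimal of the t value, so a transposed digit such as 2.054 shifts each limit by about 0.2.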
Population means, independent samples: $\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Unequal
Assumptions:
Samples are randomly and independently drawn
Populations are normally distributed
Population variances are unknown and assumed unequal
$\sigma_x^2$ and $\sigma_y^2$ Unknown, Assumed Unequal
Forming interval estimates:
The population variances are assumed unequal, so a pooled variance is not appropriate
Use a t value with v degrees of freedom, where
$$v = \frac{\left(\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}\right)^2}{\left(\dfrac{s_x^2}{n_x}\right)^2/(n_x-1) + \left(\dfrac{s_y^2}{n_y}\right)^2/(n_y-1)}$$
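Confidence Interval, $\sigma_x^2$ and $\sigma_y^2$ Unknown, Unequal
The confidence interval for μx – μy is:
$$(\bar{x} - \bar{y}) - t_{v,\,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} < \mu_X - \mu_Y < (\bar{x} - \bar{y}) + t_{v,\,\alpha/2}\sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$$
Where
$$v = \frac{\left(\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}\right)^2}{\left(\dfrac{s_x^2}{n_x}\right)^2/(n_x-1) + \left(\dfrac{s_y^2}{n_y}\right)^2/(n_y-1)}$$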
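The degrees-of-freedom formula is easier to follow once it is evaluated. The sketch below defines a hypothetical helper, `welch_df`, and applies it to the CPU sample summaries from the pooled variance example purely for illustration (that example assumes equal variances; reusing its numbers here is an assumption of this sketch). The second call uses made-up equal standard deviations and equal sample sizes.

```python
def welch_df(s_x, n_x, s_y, n_y):
    """Degrees of freedom v for the unequal-variance t interval."""
    a = s_x**2 / n_x   # estimated variance of the first sample mean
    b = s_y**2 / n_y   # estimated variance of the second sample mean
    return (a + b) ** 2 / (a**2 / (n_x - 1) + b**2 / (n_y - 1))

# CPU summaries reused for illustration only.
v = welch_df(s_x=74, n_x=17, s_y=56, n_y=14)

# Made-up case: equal spreads and equal sample sizes.
v_equal = welch_df(s_x=10, n_x=20, s_y=10, n_y=20)

print(f"v = {v:.2f}, v_equal = {v_equal:.2f}")
```

In general v is not an integer; it is commonly rounded down before looking up the t value. When the two sample variances and sample sizes are equal, v reduces to nx + ny – 2, the pooled-variance degrees of freedom.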
Two Population Proportions
Goal: Form a confidence interval for the difference between two population proportions, Px – Py
Assumptions:
Both sample sizes are large (generally at least 40 observations in each sample)
The point estimate for the difference is $\hat{p}_x - \hat{p}_y$
Two Population Proportions (continued)
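The random variable
$$Z = \frac{(\hat{p}_x - \hat{p}_y) - (P_x - P_y)}{\sqrt{\dfrac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \dfrac{\hat{p}_y(1-\hat{p}_y)}{n_y}}}$$
is approximately normally distributed
The confidence limits for Px – Py are:
$$(\hat{p}_x - \hat{p}_y) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}}$$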
Example: Two Population Proportions
Form a 90% confidence interval for the difference between the proportion of men and the proportion of women who have college degrees.
In a random sample, 26 of 50 men and 28 of 40 women had an earned college degree
Example: Two Population Proportions
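Men: $\hat{p}_x = \dfrac{26}{50} = 0.52$
Women: $\hat{p}_y = \dfrac{28}{40} = 0.70$
$$\sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}} = \sqrt{\frac{0.52(0.48)}{50} + \frac{0.70(0.30)}{40}} = 0.1012$$
For 90% confidence, $Z_{\alpha/2} = 1.645$
The confidence limits are:
$$(\hat{p}_x - \hat{p}_y) \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}_x(1-\hat{p}_x)}{n_x} + \frac{\hat{p}_y(1-\hat{p}_y)}{n_y}} = (.52 - .70) \pm 1.645\,(0.1012)$$
so the interval is approximately $-0.3465 < P_x - P_y < -0.0135$.
Since this interval does not contain zero we are 90% confident that the two proportions are not equal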
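The interval for the difference in proportions can be evaluated with a short script. This is a sketch using illustrative names, with the z value computed from the standard normal distribution (about 1.645 for 90% confidence).

```python
from math import sqrt
from statistics import NormalDist

# Counts from the college degree example.
x_men, n_men = 26, 50
x_women, n_women = 28, 40
confidence = 0.90

p_x = x_men / n_men
p_y = x_women / n_women
# Unpooled standard error: each sample keeps its own proportion.
se = sqrt(p_x * (1 - p_x) / n_men + p_y * (1 - p_y) / n_women)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
diff = p_x - p_y
lower, upper = diff - z * se, diff + z * se

print(f"se = {se:.4f}, z = {z:.3f}, interval = ({lower:.4f}, {upper:.4f})")
```

Unlike the hypothesis test of equal proportions, the interval does not pool the two samples, because no null hypothesis of equality is being imposed.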
Confidence Intervals for the Population Variance
The confidence interval is based on the sample variance, s2
Assumed: the population is normally distributed
The random variable
$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$$
follows a chi-square distribution with (n – 1) degrees of freedom
The chi-square value $\chi^2_{n-1,\,\alpha}$ denotes the number for which
$$P(\chi^2_{n-1} > \chi^2_{n-1,\,\alpha}) = \alpha$$
Confidence Intervals for the Population Variance
The 100(1 - α)% confidence interval for the population variance is
$$\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}$$
Example
You are testing the speed of a computer processor. You collect the following data (in MHz):
                 CPUx
Sample size      17
Sample mean      3004
Sample std dev   74
Assume the population is normal. Determine the 95% confidence interval for $\sigma_x^2$
Finding the Chi-square Values
n = 17 so the chi-square distribution has (n – 1) = 16 degrees of freedom
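α = 0.05, so use the chi-square values with area 0.025 in each tail:
$$\chi^2_{n-1,\,\alpha/2} = \chi^2_{16,\,0.025} = 28.85$$
$$\chi^2_{n-1,\,1-\alpha/2} = \chi^2_{16,\,0.975} = 6.91$$
[Figure: chi-square density with 16 degrees of freedom; probability α/2 = .025 lies below $\chi^2_{16} = 6.91$ and probability α/2 = .025 lies above $\chi^2_{16} = 28.85$.]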
Calculating the Confidence Limits
The 95% confidence interval is
$$\frac{(n-1)s^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,\,1-\alpha/2}}$$
$$\frac{(17-1)(74)^2}{28.85} < \sigma^2 < \frac{(17-1)(74)^2}{6.91}$$
$$3037 < \sigma^2 < 12683$$
Converting to standard deviation, we are 95% confident that the population standard deviation of CPU speed is between 55.1 and 112.6 MHz
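The variance interval can be recomputed from the sample summary and the two chi-square table values. This is a sketch with illustrative names; the values 28.85 and 6.91 are taken from the example, not computed.

```python
from math import sqrt

# Sample summary from the CPU example (MHz).
n, s = 17, 74
chi2_upper = 28.85   # chi-square value with 16 df and upper-tail area .025 (from the example)
chi2_lower = 6.91    # chi-square value with 16 df and upper-tail area .975 (from the example)

ss = (n - 1) * s**2                    # (n - 1) * s^2
var_low, var_high = ss / chi2_upper, ss / chi2_lower
sd_low, sd_high = sqrt(var_low), sqrt(var_high)

print(f"variance: ({var_low:.0f}, {var_high:.0f}); std dev: ({sd_low:.1f}, {sd_high:.1f})")
```

The larger chi-square value gives the lower limit because it appears in the denominator. With the rounded table value 6.91 the upper variance limit comes out near 12,680, and the standard deviation limits agree with 55.1 and 112.6 MHz to one decimal place.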