hypothesis testing - seoul national universityhosting03.snu.ac.kr/~hokim/int/2018/chap_7.pdf ·...
TRANSCRIPT
7. 통계적 가설검정Hypothesis testing
updated 2018-04-09
7.1 Introduction
• 가설검정(Hypothesis Testing)의 목적
어느 표본을 분석해 표본의 모집단에 관해어떤 결론을 내릴 수 있도록 도와준다
We can make a decision on the population (parameters) based on the results of hypothesis testing
• 가설- 하나 또는 그 이상의 모집단에 대한 기술.
모집단의 모수(parameter)에 관련
Hypothesis-description(s) about the population (parameters)
• A(observed data) -> B (research hypothesis)
• -B -> -A
• Null hypothesis(-B) : No difference (Ho)
• Alternative hypothesis (B): different pop’s (Ha)
• Type I error : prob of rejecting right null hypo= Pr (reject Ho | Ho is true)
• Type II error : prob of accepting wrong null hypo = Pr (Not reject Ho | Ha is true)
• Power = 1- (prob of detecting the diff which is real)
* Basic logic of statistical hypothesis testing
• 4 cases of the hypothesis testing
Results of the Hypo. Test
Truth
H0 : TrueH0 : False(Ha:True)
Accept H0Right(1- )
Type II error( )
Reject H0 Type I error( )
Right(power=1- )
• Procedures of hypothesis testing
1) 자료(Data) -> to understand the characteristics of the data
2) 가정(Assumptions of the model) Normality, homogeneity (same variances), independence,
… etc
3) 가설(Hypotheses) ① null hypotheses, HO- ‘No diff’ we want to reject to
prove the research hypothesis
② alternative hypothesis HA – research hypo to be proven
ex)
We want to have a statistical support to claim the alternative hypothesis
‘we cannot reject the null hypo’=‘no evidence of rejecting the null hypo’ ≠ ‘null hypo is right’
: 50, : 50o AH H
4) 검정통계량(Test Statistic)
depends on observed data
determines rejecting the null hypo
5) 검정통계량의 분포 (dist’n of the test stat under the null dist’n)
6) 결정원칙 (decision rule)
기각역(rejection region)and 수용역(acceptance region)
If the value of the test stat is in the rejection region then we reject the null hypo, otherwise we cannot reject the null hypo.
oxz
n
test stat=
standardized diff between observed stat and the param from the null
observed stat param from the null
SE of the stat
유의수준(lever of significance, )- Porb of rejecting right null hypo.-> Type 1 error
7) 검정통계량의 산출(Calculate Test Statistic)
8) 통계적 결정(Statistical Decision)
9) 결론(Conclusion)
Reject HO-> Evidence that HA is true
Not reject HO -> We don’t have an evidence that HA is true, HO can be right
Ho is
Correct Wrong
decision Fail to reject HO Right decision Type II error
Reject HO Type I error Right decision
Data evaluation
Assumptions check
Generate Hypothesis
Choose Test stat
Dist’n of test stat under Ho
Rule of the test (Rejection region)
Calculate test stat
Stat test
Reject Ho -> conclude that Ha is true
Not reject Ho ->conclude that Ho can be true
1. Sample from normal and known variance
<보기 7.2.1>
mean of age; We want to know that pop mean is very different from 30
1) data: n=10.
2) assumption: normal with
7.2 Hypo test: one sample mean
27x 2 20
oxz
n
: 30, : 30o AH H
oxz
n
0.05 If 1.96, or 1.96 oz z H then reject
7) Calculate test stat:
8) stat’l decision:
-2.121 is in the rejection region-> we reject the null hypo -> TS is significantly different from 0 by
9) conclusion:
p-value: Porb(observing our data or larger difference when Ho is true)
27 30 32.121
1.414220 /10
oxz
n
is not the same as 30
2 pnorm(-2.121) 0.0339p
0.05
• Testing HO using Confidence Interval
-> CI doe not include 30 => we reject HO
If two-sided CI does not include the parameter value => we reject (Ho:parameter= the value) by the level.
If .. include … => we do not reject
CI of 27 1.96 20 /10 24.228, 29.772
100(1 )%
• One-sided test
<Ex 7.2.2>
Age: Can we claim that the pop mean is less than 30?
1) data
2) assumptions
3) Hypothesis :
4) Test stat:
5) Distribution of the test stat
: 30, : 30o AH H
oxz
n
6) Decision rule:
7) Calculating test stat:
8) stat’l decision: -2.12<-1.645 => reject Ho
9) Conclusion: Pop mean is less than 30
(p-value= 0.0169)
0.05
27 302.121
20 /10z
If 1.645, z othen reject H
2. Sampling from normal pop : unknown variance
<Ex 7.2.3>
# days until MRI: Is pop mean different from 15?
1) data:
2) assumptions: random sample from normal pop. Variance is not known.
3) hypo:
4) Test stat:
13.6, 8.3123x s
: 15, : 15o AH H
oxt
s n
oxt
s n
5) dist’n of the test stat: t-dist with df=n-1 under Ho
6) Decision rule:
7) Cal test stat:
8) stat’l decision: -0.753>-2.093 => cannot reject HO
9) conclusion: pop mean is not 15
13.6 150.753
8.3123/ 20t
0.05
3. Random sample from non-normal pop
<Ex 7.2.4>
survey from 157 people. Mean of sbp=146mmHg, sd=27. Is pop mean > 140 ?
1) data:
2) assumption: sample from non-normal pop
3) hypo:
4) Test stat:
146, 27x s
: 140, : 140o AH H
oxz
s n
oxz
s n
5) dist’n of test stat: By CLT, approximately normal dist under Ho
6) Decision rule:
7) Cal test stat:
8) stat’l decision:
2.784>1.645 => Reject HO
9) conclusion: Mean # of sbp is greater than 140. (p=0.0027)
146 1402.784
27 / 157z
0.05
7.3 hypo test: diff of two means
• hypotheses
1. From normal pop : known variance
1 2 1 2
1 2 1 2
1 2 1 2
(1) : 0, : 0
(2) : 0, : 0
(3) : 0, : 0
o A
o A
o A
H H
H H
H H
1 2 1 2
2 21 2
1 2
( ) ( )x xz
n n
<Ex 7.3.1>*
mean diff uric acid* (요산 수치) pts and normal controls
1) data: 12 pts, 15 normal controls
2) assumptions: two independent samples from two independent normal pop with variance=1
3) hypo:
4) Test stat:
1 24.5 mg/dl, 3.4 mg/dlx x
1 2 1 2
1 2 1 2
: 0, :
: 0, :
o o
A A
H H
H H
1 2 1 2
2 21 2
1 2
( ) ( )x xz
n n
5) dist’n of test stat: N(0,1) under HO
6) Decision rule:
7) Cal test stat:
8) stat’l decision:
2.569>1.96 => Reject HO
9) conclusion: means are diff (p=0.0102)
0.05
1.5112 15
(4.5 3.4) 02.569z
2. From normal pop: unknown variance
(1) same variances
<Ex 7.3.2>
sbp from pts and normal controls. Want to know whether sbp of pts is less than that of controls.
1) data:
2) assumptions: two indep random samples from two normal pop with unknown variances
1 2 1 2
2 2
1 2
( ) ( )
p p
x xt
s s
n n
1 1 1
1 2 2
10, 124.8, 22.215
10, 130.0, 30.82
n x s
n x s
3) hypo:
4) Test stat:
5) dist’n of test stat: t dist’n with df= under Ho
6) Decision rule:
7) Cal test stat:
8) stat’l decision: -1.7341<-0.499 => cannot reject HO
9) conclusion: we cannot claim that sbp of pts is less than that of control (p=0.3118)
1 2 1 2: , :o AH H
1 2 1 2
2 2
1 2
( ) ( )
p p
x xt
s s
n n
1 2 2n n
0.05, 1.7341 critical value of t=-
721.73 721.7310 10
(124.8 130.8) 00.499t
2 22 1 1 2 2
1 2
( 1) ( 1)
2
p
n s n ss
n n
> qt(0.05,18)
[1] -1.734064
> pt(-0.499,18)
[1] 0.3119114
(2) diff variances
<Ex 7.3.3>
1) data:
2) assumptions: two indep random samples from normal pop with unknown variances (unequal)
3) hypo:
4) Test stat:
1 2 1 2
2 21 2
1 2
( ) ( )'
x xt
s s
n n
1 1 1 2 2 215, 19.16, 5.29, 30, 9.53, 2.69n x s n x s
1 2 1 2: 0, : 0 o AH H
1 2 1 2
2 21 2
1 2
( ) ( )'
x xt
s s
n n
5) dist’n of TS:
6) Dec rule:
7) Cal TS:
8) stat’l dec: 6.635>2.133 => reject HO
9) conclusion: We claim that the means are different
0.05,
1.8656(2.1448) 0.2412(2.0452)' 2.133
1.8656 0.2412t
2 21 1 2 2 1 1 1 2 2 2
1 2 2 21 2 1 1 2 2
( ) ( )'
( ) ( )
w t w t s n t s n tt
w w s n s n
2 2(5.29) (2.69)
15 30
(19.16 9.53) 0' 6.635t
3. Sample from non-normal pop
<Ex 7.3.4>
We want to know that IgG of group A is higher than that of Group B
1) data:
2) assumptions: Two indep sample, from non-normal pop
3) hypo:
A A A B B B59, 53, sd 45, 46, 54, sd 34x n x n
: 0, :
: 0, :o A B o A B
A A B A A B
H H
H H
4) TS: By CLT,
5) dist’n of TS: asymptotically normal under HO
6) Dec rule:
7) Cal TS:
8) stat’l dec: 1.684<2.326 => cannot reject HO
9) conclusion: we cannot reject that IgG of group A is lower than that of Group B (p=0.0461)
1 2 1 2
2 21 2
1 2
( ) ( )x xz
s s
n n
0.01, One-sided test with critical value of 2.326z
2 2(45) (34)
53 54
(59 46) 01.684z
숙제
• 1-5, 6-7 (SAS and R), 8-11
7.4 짝비교(Paired Comparisons)
• Samples from non-indep observations
ex) observation pre/post trement
• Test stat
<보기 7.4.1>
Measurements of gallbladder function
o
d
dt
s
치료 전 (%) 22 60 96 10 3 50 33 69 64 18 0 34
치료 후 (%) 63 91 59 37 10 19 41 87 86 55 88 40
1)data:𝑑𝑖 = 41 31 − 37 27 7 − 31 8 18 22 37 88 6
2) assumptions: observed diff : random sample from normal pop
3) hypo:
4) TS:
5) dist’n TS: t dist’n with df=n-1 under HO
o o
d d
d dt
s s n
𝐻0 ∶ 𝜇𝑑 ≤ 0 vs 𝐻𝐴 ∶ 𝜇𝑑 > 0
6) Dec rule:
7) Cal TS:
𝑡 =18.083 − 0
1077.0/12=
18.083
9.4736= 1.909
8) stat’l dec: 1.909>1.7959=> reject HO
9) conclusion: The new drug is effective.
10) p=0.0414
If pop var(𝜎𝑑)is known then use 𝑧 = 𝑑−𝜇𝑑
𝜎𝑑/ 𝑛
0.05,
critical value of 1.7959t
7.5 Hypo test: proportion from one pop
<Ex 7.5.1>
25 female (out of 300)is experiencing fasting blood sugar disorder. Any evidence that the proportion of pop is over 6.3%?
1) data: 𝑝 =25
300= 0.08333
2) assum: 𝑛 𝑝 > 5, 𝑛(1 − 𝑝) > 5=> dist’n of is normal by CLT, under HO
3) Hypo:𝐻0 ∶ 𝑝 ≤ 0.063 vs 𝐻𝐴 ∶ 𝑝 > 0.063
0
0 0
p pz
p q n
4) TS:
5) dist’n of TS: asymptotically standard normal under HO
6) Dec rule:
• 7) Cal TS: 𝑧 =0.08333−0.063
(0.063)(0.937)
300
= 1.450
8) sta’l dec: 1.4051<1.645 => cannot reject HO
9) conclusion: We cannot reject that the proportion is less than 6.3% (p=0.0736)
0
0 0
p pz
p q n
0.05, critical vaule of 1.645z
7.6 hypo test: diff of two proportions
<Ex 7.6.1>
compare prevalence of Noonan Syndrome between male and female. male: 12/30, female: 25/50 Is Female’s prevalence higher than Male’s?
1) data:
2) assumption: indep sampling
3) hypo:𝐻0 : 𝑝𝑓 ≤ 𝑝𝑚 vs 𝐻𝐴 ∶ 𝑝𝑓 > 𝑝𝑚
1 2
1 2 1 2( ) ( )
p p
p p p pz
4) Test stat:
5) dist’n TS: 𝑛𝑓𝑝𝑓 > 5, 𝑛𝑓(1 − 𝑝𝑓) > 5, 𝑛𝑚𝑝𝑚 > 5,
𝑛𝑚(1 − 𝑝𝑚) > 5 -> asymptotically standard normal under HO
6) Dec rule:
7) Cal TS: 𝑝𝑓 =25
50= 0.5, 𝑝𝑚 =
12
30= 0.4, 𝑝 =
12+25
30+50=
0.4625, 𝑧 =(0.5−0.4)
0.4625 (0.5375)
50+
0.4625 (0.5375)
30
= 0.868
8) Stat dec: 0.868<1.645=> cannot reject HO
9) conclusion: Female’s prevalence is not higher than Male’s. (p=0.1926)
0.05, Critical value of 1.645z
1 2
2 1 2 1( ) ( )
p p
p p p pz
1 2
1 2
1 2 1 2
(1 ) (1 ),
p p
x x p p p pp
n n n n
<Ex 7.7.1>
n=20, Sample variance=670.81. Is this diff from pop variance=600?
1) data:
2) ass: random sample from normal pop
3) hypo:𝐻0 ∶ σ2 = 600 vs 𝐻𝐴 ∶ σ2 ≠ 600
4) Test stat:
7.7 hypo test: variance of one pop2
2
2
( 1)
n s
2 670.81s
22
2
( 1)
n s
5) dist’n of TS: with df=n-1 under HO
6) Dec rule:
7) Cal TS:
8) stat’l dec: 8.907<21.242<32.852 => cannot reject HO
9) conclusion: No evidence that the variance is not 600(p>0.05) “sample variance is not significantly diff from 600”
2
2
0.05,
Critical value of
(8.907,32.852)
𝜒2 =)19(670.81
600= 21.242
7.8 hypo test: ratio of two variances
• 분산비검정(Variance Ratio Test): Homogeneity of two pop variances
Variance ratio = 1 or not
<Ex 7.8.1>
Sd’s are 30, 12 for groups A and B. Sample size is 6 for both. Is var of A is greater than that of B?
1) data: sd=30 and 12
2) ass: random samples from normal pop
21
22
. s
V Rs
3) hypo:
4) TS:
5) dist’n of TS: F-dist with df= under HO
6) Dec rule:
7) Cal TS: 𝑠12
𝑠22 =
(30)2
(12)2= 6.25
8) Stat dec: 6.25> 5.05 => reject HO
9) con: variance of A is greater than that of B (p=pf(6.25,df1=5, df2=5,lower.tail=F)=0.0329)
2 2 2 21 2 1 2: , : o AH H
2 2 21 1 1
2 2 22 2 2
.
s sV R
s s
1 21, 1 n n
0.05,
critical value of 5.50F
7.9 type II error and power
<Ex 7.9.1>𝛼 = 0.05 𝐻0 ∶ 𝜇 = 17.5, vs 𝐻A ∶ 𝜇 ≠ 17.5
Pop var is assumed to be 3.6, n=100, 𝜇 = 16.5
Calculate the power.
Rule of the test : We reject Ho if | 𝑥−17.5
𝜎
𝑛
| > 1.96
Type I error =P(reject Ho|Ho is true)
Type II error= P(not reject Ho|Ha is true)
=P(| 𝑥−17.5
𝜎
𝑛
| < 1.96 |𝐻𝑎 𝑖𝑠 𝑡𝑟𝑢𝑒)
=P(| 𝑥−17.5
𝜎
𝑛
| < 1.96 |𝐻𝑎 𝑖𝑠 𝑡𝑟𝑢𝑒)
=P −1.96𝜎
𝑛< 𝑥 − 17.5 < 1.96
𝜎
𝑛μ1 = 16.5
=
P −1.96𝜎
𝑛+ 17.5 − μ1 < 𝑥 − μ1 < 1.96
𝜎
𝑛+ 17.5 − μ1 μ1 = 16.5
=P(−1.96 +17.5−μ1
𝜎/ 𝑛<
𝑥−16.5
𝜎/ 𝑛< 1.96 +
17.5−μ1
𝜎/ 𝑛)
= P(−1.96 +1.0
𝜎/ 𝑛<
𝑥−16.5
𝜎/ 𝑛< 1.96 +
1.0
𝜎/ 𝑛)
= P(−1.96 +1.0
3.6/ 100<
𝑥−16.5
𝜎/ 𝑛< 1.96 +
1.0
3.6/ 100)
= P(0.8167 < 𝑍 < 4.7378)
= pnorm(4.737778)-pnorm(0.817778)
[1] 0.2067409
Power=1-0.2067=0.7933
Ha: 𝑍 = 𝑥−16.5
𝜎
𝑛
~𝑁 0,1
m0=17.5;m1=16.5;n=100;s=3.6
1-(pnorm(1.96+(m0-m1)/s*sqrt(n))-pnorm(-1.96+(m0-m1)/s*sqrt(n)))
power.cal <- function(m0,m1,n,s){
1-(pnorm(1.96+(m0-m1)/s*sqrt(n))-pnorm(-1.96+(m0-m1)/s*sqrt(n)))
}
m1=seq(16,19,0.1)
power.cal(17.5, m1,100,3.6)
plot(m1, power.cal(17.5, m1,100,3.6),xlab=expression(mu),ylab=expression(1-beta),type="l")
n=seq(10,200,10)
p=power.cal(17.5, 16.5,n,3.6)
plot(n, p,,ylab=expression(1-beta),type="l")
<Ex 7.9.2>𝑛 = 20, 𝛼 = 0.01, 𝜎 = 15
𝐻0 ∶ 𝜇 ≥ 65, vs 𝐻A ∶ 𝜇 < 65
What is the power when pop mean=55?
Rule of the test : We reject Ho if 𝑥−65𝜎
𝑛
< −2.326
Type I error =P(reject Ho|Ho is true)
Type II error= P(not reject Ho|Ha is true)
=P( 𝑥−65𝜎
𝑛
> −2.326| 𝜇 = 55)
Power curve for one-sided test
Type II error= P(not reject Ho|Ha is true)
=P( 𝑥−65𝜎
𝑛
> −2.326| 𝜇 = 55)
=P( 𝑥 > −2.326𝜎
𝑛+ 65| 𝜇 = 55)
=P( 𝑥−55𝜎
𝑛
> (−2.326𝜎
𝑛+ 65 − 55 )/
𝜎
𝑛)
=P 𝑍 > −2.326 +65−55
15
20
=P 𝑍 > 0.655 = pnorm(0.655,lower=F)
[1] 0.2562339
𝛽, when 𝜇 = 55
Power curve of [Ex 7.9.2]
7.10 2종의 오류를 고려한 표본 크기의 결정(Type II error and sample size)
<Ex 7.10.1 [예제7.9.2]>
𝐻0∶𝜇≥65,vs 𝐻𝐴∶𝜇<65
𝜎=5, 𝛼=0.01,if type II error(𝛽)=0.05,
What is n to guarantee 𝛼 and 𝛽?
Sol) let C be the critical Value, then
𝑃 𝑁(0,1) ≤𝐶−𝜇0
𝜎/ 𝑛= 𝛼, 𝑃 𝑁(0,1) ≥
𝐶−𝜇1
𝜎/ 𝑛= 𝛽
⟹𝐶−𝜇0
𝜎/ 𝑛= 𝑧𝛼 = −𝑧1−𝛼,
𝐶−𝜇1
𝜎/ 𝑛= 𝑧1−𝛽
⟹ C = 𝜇0 − 𝑧1−𝛼𝜎
𝑛= 𝜇1 + 𝑧1−𝛽
𝜎
𝑛
𝑛 =𝑧1−𝛼 + 𝑧1−𝛽 𝜎
𝜇0 − 𝜇1
2
𝑛 =𝑧1−𝛼+𝑧1−𝛽 𝜎
𝜇0−𝜇1
2
= =(2.326+1.645)(15)
(65−55)
2=
35.483
n=36 will guarantee 𝛼 = 0.01 and 𝛽 = 0.05
𝐶 = 65 − 2.32615
36= 59.184
Our test: n=36 and if 𝑥 ≤ 59.184 then reject Ho.
Type I error and Type II error