hypothesis testing - seoul national universityhosting03.snu.ac.kr/~hokim/int/2018/chap_7.pdf ·...

7. 통계적 가설검정Hypothesis testing

updated 2018-04-09

7.1 Introduction

• 가설검정(Hypothesis Testing)의 목적

어느 표본을 분석해 표본의 모집단에 관해어떤 결론을 내릴 수 있도록 도와준다

We can make a decision on the population (parameters) based on the results of hypothesis testing

• 가설- 하나 또는 그 이상의 모집단에 대한 기술.

모집단의 모수(parameter)에 관련

Hypothesis-description(s) about the population (parameters)

• A(observed data) -> B (research hypothesis)

• -B -> -A

• Null hypothesis(-B) : No difference (Ho)

• Alternative hypothesis (B): different pop’s (Ha)

• Type I error : prob of rejecting right null hypo= Pr (reject Ho | Ho is true)

• Type II error : prob of accepting wrong null hypo = Pr (Not reject Ho | Ha is true)

• Power = 1- (prob of detecting the diff which is real)

* Basic logic of statistical hypothesis testing

• 4 cases of the hypothesis testing

Results of the Hypo. Test

Truth

H0 : TrueH0 : False(Ha:True)

Accept H0Right(1- )

Type II error( )

Reject H0 Type I error( )

Right(power=1- )

• Procedures of hypothesis testing

1) 자료(Data) -> to understand the characteristics of the data

2) 가정(Assumptions of the model) Normality, homogeneity (same variances), independence,

… etc

3) 가설(Hypotheses) ① null hypotheses, HO- ‘No diff’ we want to reject to

prove the research hypothesis

② alternative hypothesis HA – research hypo to be proven

ex)

We want to have a statistical support to claim the alternative hypothesis

‘we cannot reject the null hypo’=‘no evidence of rejecting the null hypo’ ≠ ‘null hypo is right’

: 50, : 50o AH H

4) 검정통계량(Test Statistic)

depends on observed data

determines rejecting the null hypo

5) 검정통계량의 분포 (dist’n of the test stat under the null dist’n)

6) 결정원칙 (decision rule)

기각역(rejection region)and 수용역(acceptance region)

If the value of the test stat is in the rejection region then we reject the null hypo, otherwise we cannot reject the null hypo.

oxz

n

test stat=

standardized diff between observed stat and the param from the null

observed stat param from the null

SE of the stat

유의수준(lever of significance, )- Porb of rejecting right null hypo.-> Type 1 error

7) 검정통계량의 산출(Calculate Test Statistic)

8) 통계적 결정(Statistical Decision)

9) 결론(Conclusion)

Reject HO-> Evidence that HA is true

Not reject HO -> We don’t have an evidence that HA is true, HO can be right

Ho is

Correct Wrong

decision Fail to reject HO Right decision Type II error

Reject HO Type I error Right decision

Data evaluation

Assumptions check

Generate Hypothesis

Choose Test stat

Dist’n of test stat under Ho

Rule of the test (Rejection region)

Calculate test stat

Stat test

Reject Ho -> conclude that Ha is true

Not reject Ho ->conclude that Ho can be true

1. Sample from normal and known variance

<보기 7.2.1>

mean of age; We want to know that pop mean is very different from 30

1) data: n=10.

2) assumption: normal with

7.2 Hypo test: one sample mean

27x 2 20

oxz

n

: 30, : 30o AH H

oxz

n

0.05 If 1.96, or 1.96 oz z H then reject

7) Calculate test stat:

8) stat’l decision:

-2.121 is in the rejection region-> we reject the null hypo -> TS is significantly different from 0 by

9) conclusion:

p-value: Porb(observing our data or larger difference when Ho is true)

27 30 32.121

1.414220 /10

oxz

n

is not the same as 30

2 pnorm(-2.121) 0.0339p

0.05

• Testing HO using Confidence Interval

-> CI doe not include 30 => we reject HO

If two-sided CI does not include the parameter value => we reject (Ho:parameter= the value) by the level.

If .. include … => we do not reject

CI of 27 1.96 20 /10 24.228, 29.772

100(1 )%

• One-sided test

<Ex 7.2.2>

Age: Can we claim that the pop mean is less than 30?

1) data

2) assumptions

3) Hypothesis :

4) Test stat:

5) Distribution of the test stat

: 30, : 30o AH H

oxz

n

6) Decision rule:

7) Calculating test stat:

8) stat’l decision: -2.12<-1.645 => reject Ho

9) Conclusion: Pop mean is less than 30

(p-value= 0.0169)

0.05

27 302.121

20 /10z

If 1.645, z othen reject H

2. Sampling from normal pop : unknown variance

<Ex 7.2.3>

# days until MRI: Is pop mean different from 15?

1) data:

2) assumptions: random sample from normal pop. Variance is not known.

3) hypo:

4) Test stat:

13.6, 8.3123x s

: 15, : 15o AH H

oxt

s n

oxt

s n

5) dist’n of the test stat: t-dist with df=n-1 under Ho

6) Decision rule:

7) Cal test stat:

8) stat’l decision: -0.753>-2.093 => cannot reject HO

9) conclusion: pop mean is not 15

13.6 150.753

8.3123/ 20t

0.05

3. Random sample from non-normal pop

<Ex 7.2.4>

survey from 157 people. Mean of sbp=146mmHg, sd=27. Is pop mean > 140 ?

1) data:

2) assumption: sample from non-normal pop

3) hypo:

4) Test stat:

146, 27x s

: 140, : 140o AH H

oxz

s n

oxz

s n

5) dist’n of test stat: By CLT, approximately normal dist under Ho

6) Decision rule:

7) Cal test stat:


2.784>1.645 => Reject HO

9) conclusion: Mean # of sbp is greater than 140. (p=0.0027)

146 1402.784

27 / 157z

0.05

7.3 hypo test: diff of two means

• hypotheses

1. From normal pop : known variance

1 2 1 2

1 2 1 2

1 2 1 2

(1) : 0, : 0

(2) : 0, : 0

(3) : 0, : 0

o A

o A

o A

H H

H H

H H

1 2 1 2

2 21 2

1 2

( ) ( )x xz

n n

<Ex 7.3.1>*

mean diff uric acid* (요산 수치) pts and normal controls

1) data: 12 pts, 15 normal controls

2) assumptions: two independent samples from two independent normal pop with variance=1

3) hypo:

4) Test stat:

1 24.5 mg/dl, 3.4 mg/dlx x

1 2 1 2

1 2 1 2

: 0, :

: 0, :

o o

A A

H H

H H

1 2 1 2

2 21 2

1 2

( ) ( )x xz

n n

5) dist’n of test stat: N(0,1) under HO

6) Decision rule:

7) Cal test stat:


2.569>1.96 => Reject HO

9) conclusion: means are diff (p=0.0102)

0.05

1.5112 15

(4.5 3.4) 02.569z

2. From normal pop: unknown variance

(1) same variances

<Ex 7.3.2>

sbp from pts and normal controls. Want to know whether sbp of pts is less than that of controls.

1) data:

2) assumptions: two indep random samples from two normal pop with unknown variances

1 2 1 2

2 2

1 2

( ) ( )

p p

x xt

s s

n n

1 1 1

1 2 2

10, 124.8, 22.215

10, 130.0, 30.82

n x s

n x s

3) hypo:

4) Test stat:

5) dist’n of test stat: t dist’n with df= under Ho

6) Decision rule:

7) Cal test stat:

8) stat’l decision: -1.7341<-0.499 => cannot reject HO

9) conclusion: we cannot claim that sbp of pts is less than that of control (p=0.3118)

1 2 1 2: , :o AH H

1 2 1 2

2 2

1 2

( ) ( )

p p

x xt

s s

n n

1 2 2n n

0.05, 1.7341 critical value of t=-

721.73 721.7310 10

(124.8 130.8) 00.499t

2 22 1 1 2 2

1 2

( 1) ( 1)

2

p

n s n ss

n n

> qt(0.05,18)

[1] -1.734064

> pt(-0.499,18)

[1] 0.3119114

(2) diff variances

<Ex 7.3.3>

1) data:

2) assumptions: two indep random samples from normal pop with unknown variances (unequal)

3) hypo:

4) Test stat:

1 2 1 2

2 21 2

1 2

( ) ( )'

x xt

s s

n n

1 1 1 2 2 215, 19.16, 5.29, 30, 9.53, 2.69n x s n x s

1 2 1 2: 0, : 0 o AH H

1 2 1 2

2 21 2

1 2

( ) ( )'

x xt

s s

n n

5) dist’n of TS:

6) Dec rule:

7) Cal TS:

8) stat’l dec: 6.635>2.133 => reject HO

9) conclusion: We claim that the means are different

0.05,

1.8656(2.1448) 0.2412(2.0452)' 2.133

1.8656 0.2412t

2 21 1 2 2 1 1 1 2 2 2

1 2 2 21 2 1 1 2 2

( ) ( )'

( ) ( )

w t w t s n t s n tt

w w s n s n

2 2(5.29) (2.69)

15 30

(19.16 9.53) 0' 6.635t

3. Sample from non-normal pop

<Ex 7.3.4>

We want to know that IgG of group A is higher than that of Group B

1) data:

2) assumptions: Two indep sample, from non-normal pop

3) hypo:

A A A B B B59, 53, sd 45, 46, 54, sd 34x n x n

: 0, :

: 0, :o A B o A B

A A B A A B

H H

H H

4) TS: By CLT,

5) dist’n of TS: asymptotically normal under HO

6) Dec rule:

7) Cal TS:

8) stat’l dec: 1.684<2.326 => cannot reject HO

9) conclusion: we cannot reject that IgG of group A is lower than that of Group B (p=0.0461)

1 2 1 2

2 21 2

1 2

( ) ( )x xz

s s

n n

0.01, One-sided test with critical value of 2.326z

2 2(45) (34)

53 54

(59 46) 01.684z

숙제

• 1-5, 6-7 (SAS and R), 8-11

7.4 짝비교(Paired Comparisons)

• Samples from non-indep observations

ex) observation pre/post trement

• Test stat

<보기 7.4.1>

Measurements of gallbladder function

o

d

dt

s

치료 전 (%) 22 60 96 10 3 50 33 69 64 18 0 34

치료 후 (%) 63 91 59 37 10 19 41 87 86 55 88 40

1)data:𝑑𝑖 = 41 31 − 37 27 7 − 31 8 18 22 37 88 6

2) assumptions: observed diff : random sample from normal pop

3) hypo:

4) TS:

5) dist’n TS: t dist’n with df=n-1 under HO

o o

d d

d dt

s s n

𝐻0 ∶ 𝜇𝑑 ≤ 0 vs 𝐻𝐴 ∶ 𝜇𝑑 > 0

6) Dec rule:

7) Cal TS:

𝑡 =18.083 − 0

1077.0/12=

18.083

9.4736= 1.909

8) stat’l dec: 1.909>1.7959=> reject HO

9) conclusion: The new drug is effective.

10) p=0.0414

If pop var(𝜎𝑑)is known then use 𝑧 = 𝑑−𝜇𝑑

𝜎𝑑/ 𝑛

0.05,

critical value of 1.7959t

7.5 Hypo test: proportion from one pop

<Ex 7.5.1>

25 female (out of 300)is experiencing fasting blood sugar disorder. Any evidence that the proportion of pop is over 6.3%?

1) data: 𝑝 =25

300= 0.08333

2) assum: 𝑛 𝑝 > 5, 𝑛(1 − 𝑝) > 5=> dist’n of is normal by CLT, under HO

3) Hypo:𝐻0 ∶ 𝑝 ≤ 0.063 vs 𝐻𝐴 ∶ 𝑝 > 0.063

0

0 0

p pz

p q n

4) TS:

5) dist’n of TS: asymptotically standard normal under HO

6) Dec rule:

• 7) Cal TS: 𝑧 =0.08333−0.063

(0.063)(0.937)

300

= 1.450

8) sta’l dec: 1.4051<1.645 => cannot reject HO

9) conclusion: We cannot reject that the proportion is less than 6.3% (p=0.0736)

0

0 0

p pz

p q n

0.05, critical vaule of 1.645z

7.6 hypo test: diff of two proportions

<Ex 7.6.1>

compare prevalence of Noonan Syndrome between male and female. male: 12/30, female: 25/50 Is Female’s prevalence higher than Male’s?

1) data:

2) assumption: indep sampling

3) hypo:𝐻0 : 𝑝𝑓 ≤ 𝑝𝑚 vs 𝐻𝐴 ∶ 𝑝𝑓 > 𝑝𝑚

1 2

1 2 1 2( ) ( )

p p

p p p pz

4) Test stat:

5) dist’n TS: 𝑛𝑓𝑝𝑓 > 5, 𝑛𝑓(1 − 𝑝𝑓) > 5, 𝑛𝑚𝑝𝑚 > 5,

𝑛𝑚(1 − 𝑝𝑚) > 5 -> asymptotically standard normal under HO

6) Dec rule:

7) Cal TS: 𝑝𝑓 =25

50= 0.5, 𝑝𝑚 =

12

30= 0.4, 𝑝 =

12+25

30+50=

0.4625, 𝑧 =(0.5−0.4)

0.4625 (0.5375)

50+

0.4625 (0.5375)

30

= 0.868

8) Stat dec: 0.868<1.645=> cannot reject HO

9) conclusion: Female’s prevalence is not higher than Male’s. (p=0.1926)

0.05, Critical value of 1.645z

1 2

2 1 2 1( ) ( )

p p

p p p pz

1 2

1 2

1 2 1 2

(1 ) (1 ),

p p

x x p p p pp

n n n n

<Ex 7.7.1>

n=20, Sample variance=670.81. Is this diff from pop variance=600?

1) data:

2) ass: random sample from normal pop

3) hypo:𝐻0 ∶ σ2 = 600 vs 𝐻𝐴 ∶ σ2 ≠ 600

4) Test stat:

7.7 hypo test: variance of one pop2

2

2

( 1)

n s

2 670.81s

22

2

( 1)

n s

5) dist’n of TS: with df=n-1 under HO

6) Dec rule:

7) Cal TS:

8) stat’l dec: 8.907<21.242<32.852 => cannot reject HO

9) conclusion: No evidence that the variance is not 600(p>0.05) “sample variance is not significantly diff from 600”

2

2

0.05,

Critical value of

(8.907,32.852)

𝜒2 =)19(670.81

600= 21.242

7.8 hypo test: ratio of two variances

• 분산비검정(Variance Ratio Test): Homogeneity of two pop variances

Variance ratio = 1 or not

<Ex 7.8.1>

Sd’s are 30, 12 for groups A and B. Sample size is 6 for both. Is var of A is greater than that of B?

1) data: sd=30 and 12

2) ass: random samples from normal pop

21

22

. s

V Rs

3) hypo:

4) TS:

5) dist’n of TS: F-dist with df= under HO

6) Dec rule:

7) Cal TS: 𝑠12

𝑠22 =

(30)2

(12)2= 6.25

8) Stat dec: 6.25> 5.05 => reject HO

9) con: variance of A is greater than that of B (p=pf(6.25,df1=5, df2=5,lower.tail=F)=0.0329)

2 2 2 21 2 1 2: , : o AH H

2 2 21 1 1

2 2 22 2 2

.

s sV R

s s

1 21, 1 n n

0.05,

critical value of 5.50F

=P(| 𝑥−17.5

𝜎

𝑛

| < 1.96 |𝐻𝑎 𝑖𝑠 𝑡𝑟𝑢𝑒)

=P −1.96𝜎

𝑛< 𝑥 − 17.5 < 1.96

𝜎

𝑛μ1 = 16.5

=

P −1.96𝜎

𝑛+ 17.5 − μ1 < 𝑥 − μ1 < 1.96

𝜎

𝑛+ 17.5 − μ1 μ1 = 16.5

=P(−1.96 +17.5−μ1

𝜎/ 𝑛<

𝑥−16.5

𝜎/ 𝑛< 1.96 +

17.5−μ1

𝜎/ 𝑛)

= P(−1.96 +1.0

𝜎/ 𝑛<

𝑥−16.5

𝜎/ 𝑛< 1.96 +

1.0

𝜎/ 𝑛)

= P(−1.96 +1.0

3.6/ 100<

𝑥−16.5

𝜎/ 𝑛< 1.96 +

1.0

3.6/ 100)

= P(0.8167 < 𝑍 < 4.7378)

= pnorm(4.737778)-pnorm(0.817778)

[1] 0.2067409

Power=1-0.2067=0.7933

Ha: 𝑍 = 𝑥−16.5

𝜎

𝑛

~𝑁 0,1

m0=17.5;m1=16.5;n=100;s=3.6

1-(pnorm(1.96+(m0-m1)/s*sqrt(n))-pnorm(-1.96+(m0-m1)/s*sqrt(n)))

power.cal <- function(m0,m1,n,s){

1-(pnorm(1.96+(m0-m1)/s*sqrt(n))-pnorm(-1.96+(m0-m1)/s*sqrt(n)))

}

m1=seq(16,19,0.1)

power.cal(17.5, m1,100,3.6)

plot(m1, power.cal(17.5, m1,100,3.6),xlab=expression(mu),ylab=expression(1-beta),type="l")

n=seq(10,200,10)

p=power.cal(17.5, 16.5,n,3.6)

plot(n, p,,ylab=expression(1-beta),type="l")

<Ex 7.9.2>𝑛 = 20, 𝛼 = 0.01, 𝜎 = 15

𝐻0 ∶ 𝜇 ≥ 65, vs 𝐻A ∶ 𝜇 < 65

What is the power when pop mean=55?

Rule of the test : We reject Ho if 𝑥−65𝜎

𝑛

< −2.326

Type I error =P(reject Ho|Ho is true)


=P( 𝑥−65𝜎

𝑛

> −2.326| 𝜇 = 55)

Power curve for one-sided test


=P( 𝑥−65𝜎

𝑛

> −2.326| 𝜇 = 55)

=P( 𝑥 > −2.326𝜎

𝑛+ 65| 𝜇 = 55)

=P( 𝑥−55𝜎

𝑛

> (−2.326𝜎

𝑛+ 65 − 55 )/

𝜎

𝑛)

=P 𝑍 > −2.326 +65−55

15

20

=P 𝑍 > 0.655 = pnorm(0.655,lower=F)

[1] 0.2562339

𝛽, when 𝜇 = 55

Power curve of [Ex 7.9.2]

7.10 2종의 오류를 고려한 표본 크기의 결정(Type II error and sample size)

<Ex 7.10.1 [예제7.9.2]>

𝐻0∶𝜇≥65,vs 𝐻𝐴∶𝜇<65

𝜎=5, 𝛼=0.01,if type II error(𝛽)=0.05,

What is n to guarantee 𝛼 and 𝛽?

Sol) let C be the critical Value, then

𝑃 𝑁(0,1) ≤𝐶−𝜇0

𝜎/ 𝑛= 𝛼, 𝑃 𝑁(0,1) ≥

𝐶−𝜇1

𝜎/ 𝑛= 𝛽

⟹𝐶−𝜇0

𝜎/ 𝑛= 𝑧𝛼 = −𝑧1−𝛼,

𝐶−𝜇1

𝜎/ 𝑛= 𝑧1−𝛽

⟹ C = 𝜇0 − 𝑧1−𝛼𝜎

𝑛= 𝜇1 + 𝑧1−𝛽

𝜎

𝑛

𝑛 =𝑧1−𝛼 + 𝑧1−𝛽 𝜎

𝜇0 − 𝜇1

2

𝑛 =𝑧1−𝛼+𝑧1−𝛽 𝜎

𝜇0−𝜇1

2

= =(2.326+1.645)(15)

(65−55)

2=

35.483

n=36 will guarantee 𝛼 = 0.01 and 𝛽 = 0.05

𝐶 = 65 − 2.32615

36= 59.184

Our test: n=36 and if 𝑥 ≤ 59.184 then reject Ho.

Type I error and Type II error

hypothesis testing - seoul national universityhosting03.snu.ac.kr/~hokim/int/2018/chap_7.pdf ·...

Documents