Source: tcs.inf.kyushu-u.ac.jp/~kijima/gps20/gps20-10.pdf
Statistics III
July 15, 2020
来嶋 秀治 (Shuji Kijima)
Dept. Informatics,
Graduate School of ISEE
Today's topics
• interval estimation (区間推定)
• hypothesis testing (仮説検定)
• t-test
• χ²-test
確率統計特論 (Probability & Statistics)
Lesson 10
1. Interval estimation
Statistical Inference (統計的推定)
point estimation (点推定)
consistent estimation (一致推定)
unbiased estimation (不偏推定)
maximum likelihood (最尤推定)
interval estimation (区間推定)
Statistical inference
Example 1
A clerk says "Our eggs are big: 70 [g] on average."
You bought 6 eggs in the shop.
How large are the eggs sold in this shop?
egg         1     2     3     4     5     6
weight [g]  64.3  70.4  63.2  67.8  71.3  60.8
$\bar{X} = 66.3$ [g], $s^2 = 17.584$ [g$^2$]
Is the clerk honest?
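Central Limit Theorem (中心極限定理)
Def.
A sequence $Y_n$ with distribution functions $F_n$ converges to $Y$ in distribution ($Y$に分布収束する) if
$$\lim_{n\to\infty} F_n = F,$$
where $F$ is the distribution function of $Y$.
Thm. Central limit theorem
Suppose $X_1, \dots, X_n$ are i.i.d. with expectation $\mu$ and variance $\sigma^2$. Then
$$Z_n := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma}$$
converges to N(0,1) in distribution, i.e.,
$$\lim_{n\to\infty} \Pr[Z_n < z] = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx.$$
Note that
$$Z_n := \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - \mu}{\sigma} = \frac{\sqrt{n}}{\sigma} \sum_{i=1}^{n} \frac{X_i - \mu}{n} = \frac{1}{\sigma/\sqrt{n}} \left(\bar{X} - \mu\right).$$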
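The convergence stated in the theorem can be illustrated with a small simulation. The sketch below is only an illustration under assumptions that are not in the lecture: the $X_i$ are drawn from Uniform(0, 1) (so $\mu = 1/2$ and $\sigma^2 = 1/12$), and the sample size, the number of trials, and the function names are arbitrary choices. It estimates how often $Z_n$ falls in $[-1.960, 1.960]$; for N(0,1) that probability is about 0.95.

```python
import random
from math import sqrt


def standardized_mean(n, rng):
    """Return Z_n = (X_bar - mu) / (sigma / sqrt(n)) for n draws from Uniform(0, 1)."""
    mu, sigma = 0.5, sqrt(1 / 12)  # mean and standard deviation of Uniform(0, 1)
    x_bar = sum(rng.random() for _ in range(n)) / n
    return (x_bar - mu) / (sigma / sqrt(n))


def coverage(n=30, trials=20000, z_star=1.960, seed=0):
    """Fraction of simulated Z_n values that fall inside [-z_star, z_star]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(abs(standardized_mean(n, rng)) <= z_star for _ in range(trials))
    return hits / trials


# The printed fraction should be close to 0.95 if Z_n is roughly N(0, 1).
print(coverage())
```

Increasing n or swapping in another distribution with finite variance is a simple way to probe how quickly the approximation becomes good.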
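Statistical inference
Example 1
A clerk says "Our eggs are big: 70 [g] on average."
$\bar{X} = 66.3$ [g], $s^2 = 17.584$ [g$^2$] for 6 eggs.
Suppose $\sigma^2 = 18.0$ for simplicity.
Let $z^*$ ($> 0$) satisfy
$$\Pr\left[-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right] \ge 0.95.$$
By the central limit theorem,
$$\Pr\left[-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right] \approx \int_{-z^*}^{z^*} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}x^2\right) dx,$$
and we see that $z^* = 1.960$ (see the normal distribution table).
This gives the "two-sided 95% confidence interval" (両側95%信頼区間).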
Normal distribution
Wikipedia: Normal distribution
http://en.wikipedia.org/wiki/Normal_distribution
Standard normal table (標準正規分布表)
Wikipedia: Standard normal table
http://en.wikipedia.org/wiki/Standard_normal_table
Statistical inference
Example 1 (continued)
Suppose $\sigma^2 = 18.0$ for simplicity. With $\bar{X} = 66.3$ [g], $z^* = 1.960$, $\sigma^2 = 18.0$, and $n = 6$, rewrite the event in terms of $\mu$:
$$\Pr\left[-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right] = \cdots = \Pr\left[\,? \le \mu \le ?\,\right]$$
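Statistical inference
Example 1 (continued)
A clerk says "Our eggs are big: 70 [g] on average."
$\bar{X} = 66.3$ [g], $s^2 = 17.584$ [g$^2$] for 6 eggs.
Suppose $\sigma^2 = 18.0$ for simplicity, with $z^* = 1.960$ and $n = 6$.
$$\begin{aligned}
\Pr\left[-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right]
&= \Pr\left[-z^* \frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z^* \frac{\sigma}{\sqrt{n}}\right] \\
&= \Pr\left[-\bar{X} - z^* \frac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{X} + z^* \frac{\sigma}{\sqrt{n}}\right] \\
&= \Pr\left[\bar{X} + z^* \frac{\sigma}{\sqrt{n}} \ge \mu \ge \bar{X} - z^* \frac{\sigma}{\sqrt{n}}\right] \\
&= \Pr\left[66.3 + 1.960 \sqrt{\frac{18}{6}} \ge \mu \ge 66.3 - 1.960 \sqrt{\frac{18}{6}}\right] \\
&= \Pr\left[69.69 \ge \mu \ge 62.91\right]
\end{aligned}$$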
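The interval in the last step can be reproduced numerically. The following is a minimal sketch using only the Python standard library; the function name `z_interval` is a hypothetical helper, and the variance is taken as known ($\sigma^2 = 18.0$), as the example assumes.

```python
from math import sqrt
from statistics import NormalDist, mean

# Egg weights in grams, taken from Example 1.
weights = [64.3, 70.4, 63.2, 67.8, 71.3, 60.8]


def z_interval(sample, sigma2, level=0.95):
    """Two-sided confidence interval for the mean, assuming the variance sigma2 is known."""
    n = len(sample)
    x_bar = mean(sample)
    # z* is the (1 + level) / 2 quantile of N(0, 1); about 1.960 for level = 0.95.
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    half_width = z_star * sqrt(sigma2 / n)
    return x_bar - half_width, x_bar + half_width


low, high = z_interval(weights, sigma2=18.0)
print(f"{low:.2f} <= mu <= {high:.2f}")  # prints 62.91 <= mu <= 69.69
```

Here `NormalDist().inv_cdf` plays the role of the standard normal table.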
2. hypothesis testing (仮説検定)
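Hypothesis testing (仮説検定)
Terminology
• null hypothesis (帰無仮説)
• alternative hypothesis (対立仮説)
Idea
If Pr[null hypo. is true] < α:
reject the null hypothesis at significance level α
(有意水準αで帰無仮説を棄却する)
If Pr[null hypo. is true] ≥ α:
fail to reject the null hypothesis at significance level α
(有意水準αで帰無仮説を棄却しない)
Here Pr[null hypo. is true] is shorthand for the probability, computed under the null hypothesis, of an outcome at least as extreme as the one observed.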
Statistical inference
Example 1
A clerk says "Our eggs are big: 70 [g] on average."
You bought 6 eggs in the shop: $\bar{X} = 66.3$ [g], $s^2 = 17.584$ [g$^2$].
Is the clerk honest?
This time, rewrite the event in terms of $\bar{X}$:
$$\Pr\left[-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right] = \cdots = \Pr\left[\,? \le \bar{X} \le ?\,\right]$$
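Statistical inference
Example 1 (continued)
A clerk says "Our eggs are big: 70 [g] on average."
$\bar{X} = 66.3$ [g], $s^2 = 17.584$ [g$^2$] for 6 eggs.
Assume $\mu = 70.0$, and suppose $\sigma^2 = 18.0$ for simplicity, with $z^* = 1.960$ and $n = 6$.
$$\begin{aligned}
\Pr\left[-z^* \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z^*\right]
&= \Pr\left[-z^* \frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le z^* \frac{\sigma}{\sqrt{n}}\right] \\
&= \Pr\left[\mu - z^* \frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + z^* \frac{\sigma}{\sqrt{n}}\right] \\
&= \Pr\left[70 - 1.960 \sqrt{\frac{18}{6}} \le \bar{X} \le 70 + 1.960 \sqrt{\frac{18}{6}}\right] \\
&= \Pr\left[66.6 \le \bar{X} \le 73.4\right]
\end{aligned}$$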
Statistical inference
Example 1 (continued)
A clerk says "Our eggs are big: 70 [g] on average."
$\bar{X} = 66.3$ [g], $s^2 = 17.584$ [g$^2$] for 6 eggs.
Assume $\mu = 70.0$, and suppose $\sigma^2 = 18.0$ for simplicity, with $z^* = 1.960$ and $n = 6$.
The observed $\bar{X} = 66.3$ [g] lies outside the range $66.6 \le \bar{X} \le 73.4$, so the test rejects the null hypothesis $\mu = 70.0$ at significance level 5%.
(帰無仮説 $\mu = 70.0$ は有意水準5%で棄却される.)
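The same decision can be reached by standardizing the sample mean and comparing it with $z^*$. The sketch below assumes a known variance $\sigma^2 = 18.0$, as in the example; the helper name `z_test` and the two-sided p-value it reports are illustrative additions, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist, mean

# Egg weights in grams, taken from Example 1.
weights = [64.3, 70.4, 63.2, 67.8, 71.3, 60.8]


def z_test(sample, mu0, sigma2, alpha=0.05):
    """Two-sided z-test of the null hypothesis E[X] = mu0 with known variance sigma2."""
    n = len(sample)
    z = (mean(sample) - mu0) / sqrt(sigma2 / n)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.960 for alpha = 0.05
    # Probability under the null hypothesis of a value at least as extreme as z.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, abs(z) > z_star


z, p_value, reject = z_test(weights, mu0=70.0, sigma2=18.0)
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject = {reject}")
```

Rejecting when $|z| > z^*$ is equivalent to checking whether $\bar{X}$ falls outside $[66.6, 73.4]$.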
Exercise
Example 2
The scores of an examination.
What fraction of the material do the students understand?
student  1   2   3   4   5   6   7   8   9   10
score    72  89  64  52  96  64  70  83  56  70
Q1. Compute the two-sided 95% confidence interval.
Q2. Discuss the null hypothesis "the expectation is 80" at significance level 5%.
$\bar{X} = 71.6$, $\sigma^2 \simeq 200$ (unbiased variance)
3. t distribution, χ² distribution
Student's t-statistic (スチューデントのt統計量)
Example 1
A clerk says "Our eggs are big: 70 [g] on average."
$\bar{X} = 66.3$ [g], $s^2 = 17.584$ [g$^2$] for 6 eggs.
Assume $\mu = 70.0$.
Let
$$t := \frac{\bar{X} - \mu}{s/\sqrt{n}}, \quad \text{where } s^2 := \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \text{ (unbiased estimator of } \sigma^2\text{)}.$$
Compare with $Z_n := \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ in the central limit theorem.
Question
Does $t$ follow N(0,1), in a similar way to $Z$?
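Student's t-statistic (スチューデントのt統計量)
Question
Does $t$ follow N(0,1), in a similar way to $Z$?
Let $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ and $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where $s^2 := \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ (unbiased estimator of $\sigma^2$). Then
$$t = \frac{\sigma}{s} Z = \frac{1}{\sqrt{\dfrac{s^2}{\sigma^2}}} Z = \frac{1}{\sqrt{\dfrac{1}{\sigma^2} \cdot \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}} Z = \frac{1}{\sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} \left(\dfrac{X_i - \bar{X}}{\sigma}\right)^2}} Z.$$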
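t-distribution and χ²-distribution
Prop. 1.
If $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ independently, then
$$\sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2$$
follows the $\chi^2$-distribution with $n - 1$ degrees of freedom.
Prop. 2.
$X_1, \dots, X_n \sim N(0,1)$, independently.
Let $Y := X_1^2 + \cdots + X_n^2$; then $Y$ follows $\mathrm{Ga}\left(\frac{1}{2}, \frac{n}{2}\right)$, which is the $\chi^2$-distribution with $n$ degrees of freedom.
Recall that
$$t = \frac{\sigma}{s} Z = \frac{1}{\sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} \left(\dfrac{X_i - \bar{X}}{\sigma}\right)^2}} Z.$$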
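Idea of Prop. 1 (Not a sketch of proof)
$$\begin{aligned}
\sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2
&= \sum_{i=1}^{n} \left(\frac{(X_i - \mu) - (\bar{X} - \mu)}{\sigma}\right)^2 \\
&= \frac{1}{\sigma^2} \sum_{i=1}^{n} \left( (X_i - \mu)^2 - 2 (X_i - \mu)(\bar{X} - \mu) + (\bar{X} - \mu)^2 \right) \\
&= \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 - 2 (\bar{X} - \mu) \frac{\sum_{i=1}^{n} (X_i - \mu)}{\sigma^2} + n \left(\frac{\bar{X} - \mu}{\sigma}\right)^2 \\
&= \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 - 2n \left(\frac{\bar{X} - \mu}{\sigma}\right)^2 + n \left(\frac{\bar{X} - \mu}{\sigma}\right)^2 \\
&= \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 - n \left(\frac{\bar{X} - \mu}{\sigma}\right)^2 \\
&= \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 - \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2
\end{aligned}$$
Rem. If $X \sim N(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma} \sim N(0,1)$.
Rem. If $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ independently, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.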
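The algebraic identity above holds for any data, so it can be sanity-checked numerically. The sketch below plugs in the egg weights from Example 1 together with the illustrative values $\mu = 70.0$ and $\sigma^2 = 18.0$ assumed elsewhere in the lecture; the variable names are hypothetical.

```python
from math import isclose, sqrt
from statistics import mean

# Egg weights in grams, taken from Example 1.
weights = [64.3, 70.4, 63.2, 67.8, 71.3, 60.8]
mu, sigma = 70.0, sqrt(18.0)  # illustrative values, as assumed in Example 1
n = len(weights)
x_bar = mean(weights)

# Left-hand side: sum of squared deviations from the sample mean, scaled by sigma.
lhs = sum(((x - x_bar) / sigma) ** 2 for x in weights)

# Right-hand side: n squared standardized terms minus one squared standardized mean.
rhs = sum(((x - mu) / sigma) ** 2 for x in weights) - ((x_bar - mu) / (sigma / sqrt(n))) ** 2

print(isclose(lhs, rhs, rel_tol=1e-9))  # prints True
```

The right-hand side has n squared standard-normal-like terms with one subtracted, which is the intuition behind the $n - 1$ degrees of freedom.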
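t-distribution and χ²-distribution [William Gosset]
Prop. 3.
$X \sim N(0,1)$, $Y \sim \mathrm{Ga}\left(\frac{1}{2}, \frac{n}{2}\right)$, independently.
Then $\dfrac{X}{\sqrt{Y/n}}$ follows
$$f(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\, \Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}} \quad (-\infty < x < \infty),$$
the $t$-distribution with $n$ degrees of freedom.
$$t = Z \frac{\sigma}{s} = \frac{Z}{\sqrt{\dfrac{s^2}{\sigma^2}}} = \frac{Z}{\sqrt{\dfrac{1}{\sigma^2} \cdot \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}} = \frac{Z}{\sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} \left(\dfrac{X_i - \bar{X}}{\sigma}\right)^2}}$$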
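t-distribution and χ²-distribution [William Gosset]
$$t = Z \frac{\sigma}{s} = \frac{Z}{\sqrt{\dfrac{s^2}{\sigma^2}}} = \frac{Z}{\sqrt{\dfrac{1}{\sigma^2} \cdot \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}} = \frac{Z}{\sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} \left(\dfrac{X_i - \bar{X}}{\sigma}\right)^2}}$$
Thm.
$t$ follows the $t$-distribution with $n - 1$ degrees of freedom, i.e.,
$$f_t(x) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{(n - 1)\pi}\, \Gamma\left(\frac{n - 1}{2}\right)} \left(1 + \frac{x^2}{n - 1}\right)^{-\frac{n}{2}} \quad (-\infty < x < \infty).$$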
Student's t distribution
Wikipedia: Student's t distribution
http://en.wikipedia.org/wiki/Student%27s_t-distribution
χ² distribution (χ²分布)
Wikipedia: Chi-squared distribution
http://en.wikipedia.org/wiki/Chi-squared_distribution
t-test (t検定)
Estimation of $\mu$ (expectation)
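$t$-test ($t$検定)
$t$-test
Given samples $X_1 = a_1, \dots, X_n = a_n$.
Q: Is the value $b$ a plausible estimate of $E[X]$?
Since $t := \frac{\bar{X} - \mu}{s/\sqrt{n}}$ follows the $t$-distribution with $n - 1$ degrees of freedom,
$$\Pr[\text{null hypo.: } E[X] = b] = \Pr\left[\,|\bar{X} - b| \ge |\bar{a} - b| \;\middle|\; E[X] = b\right] = \int_{-\infty}^{-\frac{|\bar{a} - b|}{\sqrt{s^2/n}}} f_t(x) \, dx + \int_{\frac{|\bar{a} - b|}{\sqrt{s^2/n}}}^{\infty} f_t(x) \, dx \tag{1}$$
where $\bar{a} := (a_1 + \cdots + a_n)/n$ is the observed sample mean.
Claim
If (1) $< \alpha$, the test rejects $E[X] = b$.
If (1) $\ge \alpha$, the test fails to reject $E[X] = b$.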
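Student's t-statistic (スチューデントのt統計量)
Example 1
A clerk says "Our eggs are big: 70 [g] on average."
$\bar{X} = 66.3$ [g], $s^2 = 17.584$ [g$^2$] for 6 eggs.
Assume $\mu = 70.0$.
Let
$$t := \frac{\bar{X} - \mu}{s/\sqrt{n}}, \quad \text{where } s^2 := \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \text{ (unbiased estimator of } \sigma^2\text{)}.$$
Then $t$ follows the $t$-distribution with $n - 1$ degrees of freedom:
$$f_t(x) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{(n - 1)\pi}\, \Gamma\left(\frac{n - 1}{2}\right)} \left(1 + \frac{x^2}{n - 1}\right)^{-\frac{n}{2}} \quad (-\infty < x < \infty).$$
Compare with $Z_n := \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ in the central limit theorem.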
Statistical inference
Example 1 (continued)
A clerk says "Our eggs are big: 70 [g] on average."
$\bar{X} = 66.3$ [g], $s^2 = 17.584$ [g$^2$] for 6 eggs.
Assume $\mu = 70.0$.
Let $t^*$ ($> 0$) satisfy
$$\Pr\left[-t^* \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le t^*\right] = \int_{-t^*}^{t^*} f_t(x) \, dx \ge 0.95,$$
and we see that $t^* = 2.571$ (see the $t$-distribution table, $n - 1 = 5$ degrees of freedom).
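The table value can be cross-checked by integrating the $t$ density directly. This is a rough sketch using only the standard library: `t_pdf` implements the density from Prop. 3 with the number of degrees of freedom passed as `dof`, and `central_mass` is a hypothetical helper that applies a plain midpoint rule, so the result is an approximation.

```python
from math import gamma, pi, sqrt


def t_pdf(x, dof):
    """Density of the t-distribution with `dof` degrees of freedom."""
    coef = gamma((dof + 1) / 2) / (sqrt(dof * pi) * gamma(dof / 2))
    return coef * (1 + x * x / dof) ** (-(dof + 1) / 2)


def central_mass(t_star, dof, steps=20000):
    """Midpoint-rule approximation of the integral of t_pdf over [-t_star, t_star]."""
    h = 2 * t_star / steps
    return h * sum(t_pdf(-t_star + (i + 0.5) * h, dof) for i in range(steps))


# Six eggs give n - 1 = 5 degrees of freedom.
mass = central_mass(2.571, dof=5)
print(round(mass, 3))  # prints 0.95
```

Running the same function with 1.960 in place of 2.571 gives a noticeably smaller mass, which is why the normal quantile is too optimistic for so few samples.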
Statistical inference
Example 1 (continued)
A clerk says "Our eggs are big: 70 [g] on average."
$\bar{X} = 66.3$ [g], $s^2 = 17.584$ [g$^2$] for 6 eggs.
Assume $\mu = 70.0$. With $\bar{X} = 66.3$ [g], $s^2 = 17.584$, $n = 6$, and $t^* = 2.571$,
$$\left|\frac{\bar{X} - \mu}{\sqrt{s^2/n}}\right| = \left|\frac{66.3 - 70}{\sqrt{17.584/6}}\right| = 2.161 < t^* = 2.571.$$
The test fails to reject the null hypothesis $\mu = 70.0$ at significance level 5%.
(帰無仮説 $\mu = 70.0$ は有意水準5%で棄却されない.)
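The $t$ statistic for the egg data can be recomputed from the raw weights. The sketch below uses the standard library only; `statistics.variance` returns the unbiased (n − 1) variance, and the critical value 2.571 is simply copied from the lecture rather than computed. The helper name `t_statistic` is a hypothetical choice.

```python
from math import sqrt
from statistics import mean, variance

# Egg weights in grams, taken from Example 1.
weights = [64.3, 70.4, 63.2, 67.8, 71.3, 60.8]
T_STAR = 2.571  # two-sided 5% critical value for 5 degrees of freedom, as quoted in the lecture


def t_statistic(sample, mu0):
    """Student's t statistic for the null hypothesis E[X] = mu0."""
    n = len(sample)
    s2 = variance(sample)  # unbiased sample variance (divides by n - 1)
    return (mean(sample) - mu0) / sqrt(s2 / n)


t = t_statistic(weights, mu0=70.0)
reject = abs(t) > T_STAR
print(f"t = {t:.3f}, reject = {reject}")  # prints t = -2.161, reject = False
```

Note the contrast with the z-test on the same data: once the variance is estimated from only six eggs, the wider $t$ threshold no longer supports rejecting $\mu = 70.0$.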
χ²-test (χ²検定)
Estimation of $\sigma^2$ (variance)
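χ²-test (χ²検定)
χ²-test
Given samples $X_1 = a_1, \dots, X_n = a_n$.
Q: Is the value $c^2$ a plausible estimate of $\mathrm{Var}[X]$?
Since $S := \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}$ follows the $\chi^2$-distribution with $n - 1$ degrees of freedom,
$$\Pr[\text{null hypothesis: } \mathrm{Var}[X] = c^2] = \Pr\left[S \ge s_0 \mid \mathrm{Var}[X] = c^2\right] = \int_{s_0}^{\infty} f_{\chi^2}(x) \, dx \tag{2}$$
where $s_0 := \frac{\sum_{i=1}^{n} (a_i - \bar{a})^2}{c^2}$ is the observed value of $S$ under the null hypothesis.
Claim
If (2) $< \alpha$, the test rejects $\mathrm{Var}[X] = c^2$.
If (2) $\ge \alpha$, the test fails to reject $\mathrm{Var}[X] = c^2$.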
χ² distribution (χ²分布)
Wikipedia: Chi-squared distribution
http://en.wikipedia.org/wiki/Chi-squared_distribution
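χ²-test (χ²検定) Example
χ²-test
Suppose the sample variance of the weights of 10 balls is 0.35.
Is the true variance at most the prescribed value 0.2?
Discuss at significance level 5%.
Null hypothesis (帰無仮説): $\mathrm{Var}[X] \le 0.2$
$$S := \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} = \frac{(n - 1) s^2}{\sigma^2} = \frac{(10 - 1) \times 0.35}{0.2} = 15.75 < 16.919,$$
where 16.919 cuts off the right 5% tail of the $\chi^2$-distribution with 9 degrees of freedom.
Claim
The test fails to reject the null hypothesis at significance level 5%.
(有意水準5%で帰無仮説は棄却されない)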
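χ²-test (χ²検定) Example
χ²-test
Suppose the sample variance of the weights of 100 balls is 0.26.
Is the true variance at most the prescribed value 0.2?
Discuss at significance level 5%.
Null hypothesis (帰無仮説): $\mathrm{Var}[X] \le 0.2$
$$S := \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} = \frac{(n - 1) s^2}{\sigma^2} = \frac{(100 - 1) \times 0.26}{0.2} = 128.7 > 124.34,$$
where 124.34 is the right 5% point of the $\chi^2$-distribution with 100 degrees of freedom; the point for 99 degrees of freedom is about 123.2, so the conclusion is the same.
Claim
The test rejects the null hypothesis at significance level 5%.
(有意水準5%で帰無仮説は棄却される)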
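Both ball examples reduce to computing $(n - 1) s^2 / \sigma^2$ and comparing it with a table value. The sketch below does exactly that; the critical values 16.919 and 124.34 are copied from the lecture rather than computed, and the function name `chi2_statistic` is a hypothetical helper.

```python
def chi2_statistic(sample_variance, n, sigma2_0):
    """S = (n - 1) * s^2 / sigma0^2, computed from the unbiased sample variance s^2."""
    return (n - 1) * sample_variance / sigma2_0


# (number of balls, sample variance, right 5% critical value quoted in the lecture)
cases = [
    (10, 0.35, 16.919),
    (100, 0.26, 124.34),
]

results = []
for n, s2, critical in cases:
    s_value = chi2_statistic(s2, n, sigma2_0=0.2)
    results.append((n, s_value, s_value > critical))
    print(f"n = {n}: S = {s_value:.2f}, reject = {s_value > critical}")
```

The smaller sample shows a larger sample variance (0.35 versus 0.26) yet is not rejected, because with only 10 balls the statistic has far more room to fluctuate.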
Statistical Hypothesis Testing
z-test: normal distribution
t-test: $t$ distribution, e.g., for the expectation
χ²-test: χ² distribution, e.g., for the variance
F-test: $F$ distribution, e.g., for the ratio of variances