Sampling Theory and Some Important Sampling Distributions


Upload: tyler-stuart

Post on 04-Jan-2016


Page 1: Sampling Theory and Some Important Sampling Distributions

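Sampling Theory and Some Important Sampling Distributions

Concept

• The central problem of statistics is how to use sample statistics to estimate or test the parameters of a population.

• A parameter is a numerical value that describes some characteristic of the population. For example, μ, σ, and the population median are all parameters.

• A sample statistic is a quantity that describes a characteristic of the sample. It is a function of the observed sample, so its value changes from one sample to the next. A sample statistic can therefore itself be regarded as a random variable.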

Page 2: Sampling Theory and Some Important Sampling Distributions

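© 蘇國賢 2005, Social Statistics (Part I)

Population Parameters and Sample Statistics

Concept

• A sample statistic (such as the sample mean) is a function of the random sample, and is therefore itself a random variable.

[Figure: a population with parameters μ and σ; different samples drawn from it give the sample means $\bar{x}_1$ and $\bar{x}_2$, which are particular values of the random variable $\bar{X}$.]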

Page 3: Sampling Theory and Some Important Sampling Distributions

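Introduction to Sampling Distributions

Concept

• A sample statistic (such as $\bar{X}$) that is used to estimate a population parameter (such as $\mu$) is called an estimator:

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

• The numerical value obtained by substituting the actually observed sample into the estimator (such as $\bar{x}$) is called an estimate.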

Page 4: Sampling Theory and Some Important Sampling Distributions

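Sampling Distributions

• A sample statistic is a random variable: its value differs from one sample to the next. The probabilities with which its possible values occur form a probability distribution, called the sampling distribution of the statistic. It is the probability distribution of the results of repeated sampling.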
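To make the definition concrete, the sketch below enumerates every possible sample of size 2 from a tiny population and tabulates the exact sampling distribution of the sample mean. The population `[1, 2, 3, 4]`, the function name, and the choice of sampling with replacement are illustrative assumptions, not part of the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sampling_distribution_of_mean(population, n):
    """Exact sampling distribution of the mean of n i.i.d. draws.

    Draws are made with replacement, so every ordered sample
    is equally likely.
    """
    counts = Counter()
    for sample in product(population, repeat=n):
        counts[Fraction(sum(sample), n)] += 1
    total = len(population) ** n
    return {mean: Fraction(c, total) for mean, c in sorted(counts.items())}


# Hypothetical toy population, chosen only for illustration.
population = [1, 2, 3, 4]
dist = sampling_distribution_of_mean(population, n=2)

for mean, prob in dist.items():
    print(f"P(X-bar = {float(mean):.1f}) = {prob}")
```

Each printed row is one possible value of the sample mean together with its probability; the rows together are the sampling distribution.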

Page 5: Sampling Theory and Some Important Sampling Distributions

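Sampling Distributions

Concept

[Figure: the sampling distribution of the sample mean. Repeated samples drawn from the same population (with its population parameters) give the sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \dots$, with densities $f(\bar{x}_1), f(\bar{x}_2), f(\bar{x}_3)$.]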

Page 6: Sampling Theory and Some Important Sampling Distributions

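Very Simple Random Sample (VSRS)

Concept

• Every element of the sample has the same probability distribution.
• The expected value of each element is the population mean μ.
• The standard deviation of each element is the population standard deviation σ.

[Figure: the population and its population parameters]

$$P(x_1) = P(x_2) = \dots = P(x_n) = \text{population distribution } P(x)$$

$$E(X) = \mu, \qquad \mathrm{Var}(X) = \sigma^2$$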

Page 7: Sampling Theory and Some Important Sampling Distributions

Independently and Identically Distributed (i.i.d.)

• When $X_1, X_2, \dots, X_n$ are drawn from the same distribution and are independent of one another, they are said to be independently and identically distributed, or i.i.d.

Page 8: Sampling Theory and Some Important Sampling Distributions

Expected Value and Variance of the Sample Mean

• If $X_1, X_2, \dots, X_n$ are i.i.d., then every $X_i$ has the same mean ($\mu$) and the same variance ($\sigma^2$).

Page 9: Sampling Theory and Some Important Sampling Distributions

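$E(\bar{X}) = \,?$

$$\bar{X} = \frac{1}{n}(x_1 + x_2 + \dots + x_n)$$

$$
\begin{aligned}
E(\bar{X}) &= E\!\left[\frac{1}{n}(x_1 + x_2 + \dots + x_n)\right] \\
&= \frac{1}{n}\,E(x_1 + x_2 + \dots + x_n) \\
&= \frac{1}{n}\,[E(x_1) + E(x_2) + \dots + E(x_n)] \\
&= \frac{1}{n}\,[\mu + \mu + \dots + \mu] \\
&= \frac{1}{n}\,[n\mu] = \mu
\end{aligned}
$$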

Page 10: Sampling Theory and Some Important Sampling Distributions

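$\mathrm{Var}(\bar{X}) = \,?$

$$
\begin{aligned}
\mathrm{Var}(\bar{X}) &= \mathrm{Var}\!\left[\frac{1}{n}(x_1 + x_2 + \dots + x_n)\right] \\
&= \frac{1}{n^2}\,[\mathrm{Var}(x_1) + \mathrm{Var}(x_2) + \dots + \mathrm{Var}(x_n)] \qquad (x_1, x_2, \dots, x_n \text{ are independent}) \\
&= \frac{1}{n^2}\,[\sigma^2 + \sigma^2 + \dots + \sigma^2] \\
&= \frac{1}{n^2}\,[n\sigma^2] = \frac{\sigma^2}{n}
\end{aligned}
$$

$$\text{standard deviation of } \bar{X} = \frac{\sigma}{\sqrt{n}}$$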

Page 11: Sampling Theory and Some Important Sampling Distributions

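Expected Value and Variance of the Sample Mean

$$E(\bar{X}) = \mu_{\bar{X}} = \mu$$

$$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$$

$$\text{Standard error of } \bar{X}: \quad SE = \frac{\sigma}{\sqrt{n}}$$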
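The two results above can be checked empirically. The following sketch draws many samples from a normal population and compares the mean and variance of the resulting sample means with $\mu$ and $\sigma^2/n$. The values `mu=50`, `sigma=10`, `n=25`, the number of repetitions, and the seed are arbitrary assumptions made for the illustration.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from N(mu, sigma^2)
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repetitions)
    ]


# Hypothetical population parameters, for illustration only.
mu, sigma, n = 50, 10, 25
means = simulate_sample_means(mu, sigma, n, repetitions=20000)

print("mean of sample means    :", statistics.fmean(means))      # close to mu
print("variance of sample means:", statistics.pvariance(means))  # close to sigma^2 / n
print("theoretical variance    :", sigma**2 / n)
```

The simulated values only approximate the theoretical ones; the approximation improves as the number of repetitions grows.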

Page 12: Sampling Theory and Some Important Sampling Distributions

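The Central Limit Theorem

• When $X_1, \dots, X_n$ are an i.i.d. sample from a normally distributed population, the sampling distribution of the sample mean is exactly normal, whatever the sample size:

$$X_i \sim N(\mu, \sigma^2) \quad\Longrightarrow\quad \bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$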

Page 13: Sampling Theory and Some Important Sampling Distributions

The Central Limit Theorem

Page 14: Sampling Theory and Some Important Sampling Distributions

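The Central Limit Theorem

Whatever the distribution of the population (provided its variance is finite), if the random sample is large enough (a common rule of thumb is n > 30), the sampling distribution of the sample mean is approximately normal.

If n is large, then approximately

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$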
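A small simulation can illustrate the theorem for a clearly non-normal population. The sketch below uses an exponential population with mean 1 and standard deviation 1 (an assumption chosen for the example, as are `n=40`, the repetition count, and the seed) and checks how often the sample mean falls within 1.96 standard errors of the population mean.

```python
import random
import statistics


def sample_means_exponential(n, repetitions, seed=1):
    """Sample means of n i.i.d. Exp(1) draws (population mean 1, sd 1)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repetitions)
    ]


n = 40
means = sample_means_exponential(n, repetitions=10000)

se = 1 / n ** 0.5  # sigma / sqrt(n), with sigma = 1 for Exp(1)
within = sum(abs(m - 1.0) <= 1.96 * se for m in means) / len(means)

# If the normal approximation is good, this share should be near 0.95.
print(f"share of sample means within 1.96 SE of mu: {within:.3f}")
```

The skewed population does not prevent the sample mean from behaving almost like a normal variable once n is moderately large.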

Page 15: Sampling Theory and Some Important Sampling Distributions

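Standardized Scores and the Standard Normal Distribution

Concept

• Let X be a normally distributed random variable with mean $\mu$ and variance $\sigma^2$.

• $Z = (X - \mu)/\sigma$ is the standardized variable, with $E(Z) = 0$ and $\mathrm{Var}(Z) = 1$.

$$X \sim N(\mu, \sigma^2) \quad\Longrightarrow\quad Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$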

Page 16: Sampling Theory and Some Important Sampling Distributions

The Central Limit Theorem

• If $X_1, X_2, \dots, X_n$ are i.i.d. with $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$, then as $n \to \infty$, approximately

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad \frac{\bar{X} - \mu}{\sqrt{\sigma^2 / n}} \sim N(0, 1)$$

Page 17: Sampling Theory and Some Important Sampling Distributions

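Areas Under the Standard Normal Curve

Concept

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

P(Z < 0) = 0.5

P(Z > 0) = 0.5

P(Z < -z) = P(Z > z)

[Figure: the standard normal density f(x) plotted for x from -4 to 4, with a peak of 0.399 at x = 0 and a height of about 1.338 × 10⁻⁴ at x = ±4.]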
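These properties can be verified numerically with `statistics.NormalDist` from the Python standard library. This is only a quick check of the statements on the slide; the value `z = 1.5` is an arbitrary choice.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Half of the area lies on each side of 0.
print("P(Z < 0) =", Z.cdf(0))
print("P(Z > 0) =", 1 - Z.cdf(0))

# Symmetry: P(Z < -z) equals P(Z > z) for any z (1.5 is arbitrary).
z = 1.5
left_tail = Z.cdf(-z)
right_tail = 1 - Z.cdf(z)
print("P(Z < -1.5) =", left_tail, " P(Z > 1.5) =", right_tail)

# The density at 0 equals 1 / sqrt(2 * pi), the peak shown in the figure.
peak = Z.pdf(0)
print("f(0) =", peak, " 1/sqrt(2*pi) =", 1 / math.sqrt(2 * math.pi))
```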

Page 18: Sampling Theory and Some Important Sampling Distributions

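Interval Estimation

Concept

• The previous chapter introduced various ways of estimating population parameters (point estimators). For example, we found that the sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$.

• Although $\bar{X}$ represents $\mu$ correctly on average, any single observed $\bar{X}$ will not be exactly equal to $\mu$; it will be higher or lower depending on which sample happens to be drawn:

$$\bar{X} = \mu + \text{sampling error}$$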

Page 19: Sampling Theory and Some Important Sampling Distributions

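Interval Estimation

Concept

• So besides the point estimate, we also want to know how reliable the estimate obtained from the sample is. Because the sample estimate is itself a random variable and will not necessarily equal the population parameter, we ask: how close is the estimate to the population parameter?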

Page 20: Sampling Theory and Some Important Sampling Distributions

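Interval Estimation

• In an estimation problem, we want the estimator to have the following two properties:

• 1. The estimator is unbiased (an unbiased estimator): it does not systematically overestimate or underestimate the population parameter.

• 2. The sampling distribution of the estimator is concentrated around the population parameter: the smaller the variance of the estimator, the better.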

Page 21: Sampling Theory and Some Important Sampling Distributions

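Interval Estimation

• In an estimation problem, we want the estimator to have the following two properties:

[Figure: the sampling distribution of the estimator $\hat{\theta}$, centered on the parameter $\theta$. Labels: "Unbiased" and "$\mathrm{Var}(\hat{\theta})$ is small".]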

Page 22: Sampling Theory and Some Important Sampling Distributions

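Interval Estimation

Concept

• From the sampling distribution of an estimator, we can build a systematic way of expressing the precision of the estimator.

Suppose $\hat{\theta}$ is an estimator of $\theta$. Then $\hat{\theta} - \theta$ is the error of the estimate, and is called the sampling error.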

Page 23: Sampling Theory and Some Important Sampling Distributions

Example

• Population: 250,000 high-school seniors in California
• Variable: SAT mathematics score
• A sample of 500 gives a sample mean of 461

• How reliable is this estimate?

Page 24: Sampling Theory and Some Important Sampling Distributions

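Example

• From the sampling distribution of the sample mean, we know that

$$\bar{x} \sim N\!\left(\mu, \frac{\sigma^2}{500}\right)$$

Suppose we know in advance that σ = 100. Then

$$\sigma_{\bar{x}} \ (\text{S.E. of } \bar{x}) = \frac{100}{\sqrt{500}} \approx 4.5$$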

Page 25: Sampling Theory and Some Important Sampling Distributions

• By the empirical rule, with probability of about .95 the sample mean falls within two standard errors on either side of μ:

• (μ - 2 × 4.5, μ + 2 × 4.5) = (μ - 9, μ + 9)

Page 26: Sampling Theory and Some Important Sampling Distributions


Page 27: Sampling Theory and Some Important Sampling Distributions

To say that $\bar{x}$ lies within 9 points of μ is the same as saying that μ is within 9 points of $\bar{x}$.
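The arithmetic of the SAT example can be reproduced in a few lines. The numbers (σ = 100, n = 500, sample mean 461) come from the example; the variable names are just illustrative. Note that the slides round the standard error to 4.5, so their margin of 9 points is a rounded version of the value computed here.

```python
import math

# Values taken from the SAT example in the slides.
sigma = 100   # population standard deviation, assumed known
n = 500       # sample size
x_bar = 461   # observed sample mean

se = sigma / math.sqrt(n)   # standard error of the sample mean
margin = 2 * se             # "two standard errors" from the empirical rule
interval = (x_bar - margin, x_bar + margin)

print(f"standard error  : {se:.3f}")
print(f"margin (2 SE)   : {margin:.3f}")
print(f"interval for mu : ({interval[0]:.1f}, {interval[1]:.1f})")
```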

Page 28: Sampling Theory and Some Important Sampling Distributions

Statistical Confidence

• The language of statistical inference uses this fact about what would happen in the long run to express our confidence in the results of any one sample.

Page 29: Sampling Theory and Some Important Sampling Distributions

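Interval Estimation

Concept

• We usually construct confidence intervals to show how accurate an estimator is.

Use $\hat{\theta}$ to construct two values $\hat{\theta}_1$ and $\hat{\theta}_2$ such that the interval $(\hat{\theta}_1, \hat{\theta}_2)$ contains the population parameter $\theta$ with a specified probability.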

Page 30: Sampling Theory and Some Important Sampling Distributions

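Interval Estimation

Concept

Let $(x_1, x_2, \dots, x_n)$ be a random sample drawn from some population, let $\theta$ be the population parameter to be estimated, and let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two statistics such that

$$P(\hat{\theta}_1 \le \theta \le \hat{\theta}_2) = 1 - \alpha .$$

Then $(\hat{\theta}_1, \hat{\theta}_2)$ is called a $100(1-\alpha)\%$ confidence interval for $\theta$, and $(1-\alpha)$ is called the level of confidence of the interval $(\hat{\theta}_1, \hat{\theta}_2)$.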

Page 31: Sampling Theory and Some Important Sampling Distributions

Confidence Interval

• A level C confidence interval for a parameter is an interval computed from sample data by a method that has probability C of producing an interval containing the true value of the parameter.

• We must find the number z* such that any normal distribution has probability C within ± z* standard deviations of its mean.

Page 32: Sampling Theory and Some Important Sampling Distributions


Page 33: Sampling Theory and Some Important Sampling Distributions

Value of $z_\alpha$

• Let Z be a standard normal random variable and let α be any number such that 0 < α < 1. Then $z_\alpha$ denotes the number for which

• $P(Z \ge z_\alpha) = \alpha$

Page 34: Sampling Theory and Some Important Sampling Distributions

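Value of $z_\alpha$

• Example: α = .025. Find $z_\alpha$.

• $P(Z \ge z_\alpha) = .025$

[Figure: standard normal curve. The area to the right of $z_\alpha$ is .025 and the area to its left is 1 - .025 = 0.975.]

$z_\alpha = 1.96$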

Page 35: Sampling Theory and Some Important Sampling Distributions

Value of $z_\alpha$

• Example: find $z_{.05}$.

• $P(Z \ge z_{.05}) = .05$

[Figure: standard normal curve. The area to the right of $z_{.05}$ is .05 and the area to its left is 1 - .05 = 0.95.]

$z_{.05} = 1.645$

Page 36: Sampling Theory and Some Important Sampling Distributions

Value of $z_\alpha$

• Example: find $z_{.005}$.

• $P(Z \ge z_{.005}) = .005$

[Figure: standard normal curve. The area to the right of $z_{.005}$ is .005 and the area to its left is 1 - .005 = .995.]

$z_{.005} = 2.58$
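The three table look-ups above can be reproduced with the inverse CDF of the standard normal distribution. The helper name `z_critical` is an illustrative choice; `NormalDist.inv_cdf` is part of the Python standard library.

```python
from statistics import NormalDist


def z_critical(alpha):
    """Return z_alpha, the value with right-tail area alpha under N(0, 1)."""
    # The area to the left of z_alpha is 1 - alpha.
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.025, 0.05, 0.005):
    print(f"z_{alpha} = {z_critical(alpha):.3f}")
```

To two or three decimals the results match the table values 1.96, 1.645, and 2.58 used in the slides.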

Page 37: Sampling Theory and Some Important Sampling Distributions


Page 38: Sampling Theory and Some Important Sampling Distributions

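Value of $z_{\alpha/2}$

• $P(Z \ge z_{\alpha/2}) = \alpha/2$
• $P(Z \le -z_{\alpha/2}) = \alpha/2$
• $P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$

[Figure: standard normal density for x from -4 to 4. Each tail beyond $\pm z_{\alpha/2}$ has area α/2, so the central area is 1 - α/2 - α/2 = 1 - α.]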

Page 39: Sampling Theory and Some Important Sampling Distributions

Confidence Intervals for the Mean with Known Population Variance

• Suppose we draw a sample of size n from a population distributed as $N(\mu, \sigma^2)$. The sampling distribution of the sample mean is:

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$

Page 40: Sampling Theory and Some Important Sampling Distributions

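Confidence Intervals for the Mean with Known Population Variance

• From the previous results:

$$
\begin{aligned}
1 - \alpha &= P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) \\
&= P\!\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) \\
&= P\!\left(-z_{\alpha/2}\,\sigma/\sqrt{n} \le \bar{X} - \mu \le z_{\alpha/2}\,\sigma/\sqrt{n}\right) \\
&= P\!\left(-\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le -\mu \le -\bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) \\
&= P\!\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)
\end{aligned}
$$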

Page 41: Sampling Theory and Some Important Sampling Distributions

Confidence Intervals for the Mean with Known Population Variance

• This result tells us that the random interval formed by these two values,

$$\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right),$$

contains the population parameter μ with probability (1-α).

Page 42: Sampling Theory and Some Important Sampling Distributions

Level of Confidence

• The level of confidence (C = 1-α) of a confidence interval measures the probability that the interval constructed from a random sample drawn from the population will contain the population parameter.

• α is the probability that the confidence interval fails to cover the population parameter. For example, if α = .05, the level of confidence is 1-α = .95, which means there is a 5% probability that the confidence interval does not contain the population parameter.

Page 43: Sampling Theory and Some Important Sampling Distributions

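Level of Confidence

• It is common, but wrong, to define the level of confidence as "the probability that the population parameter falls in the confidence interval".
– There is only one population parameter; it does not change.
– The interval that has been constructed is also one specific interval.
– This specific interval either contains the population parameter or it does not. That is not a question of probability.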

Page 44: Sampling Theory and Some Important Sampling Distributions

Confidence Intervals for the Mean with Known Population Variance

• Suppose we take a random sample of n observations from a normal population with mean μ and variance σ². If σ² is known and the observed sample mean is $\bar{x}$, then the confidence interval for the mean with level of confidence 100(1-α)% is given by:

$$\left(\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$$

• where $z_{\alpha/2}$ is the number for which

• $P(Z \ge z_{\alpha/2}) = \alpha/2$

Page 45: Sampling Theory and Some Important Sampling Distributions

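Example

• A school wants to estimate the first-year annual salary of the students who graduated last year. Assume that salaries are normally distributed and that the population standard deviation is $2000. A random sample of 25 alumni gives a mean salary of $19,500. Find the 95% confidence interval.

• σ = $2000, n = 25, $\bar{x}$ = $19500
• 1-α = 95%, α = .05, α/2 = .025, $z_{\alpha/2}$ = 1.96

$$\left(19500 - 1.96 \times 2000/\sqrt{25},\ \ 19500 + 1.96 \times 2000/\sqrt{25}\right) = (18716,\ 20284)$$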
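The calculation in this example can be wrapped in a small helper. The function name `z_confidence_interval` and its parameter names are illustrative assumptions; the numbers passed in are those of the salary example.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known.

    Returns (x_bar - z * sigma / sqrt(n), x_bar + z * sigma / sqrt(n)),
    where z = z_{alpha/2} and alpha = 1 - confidence.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Salary example from the slides: sigma = 2000, n = 25, sample mean 19500.
low, high = z_confidence_interval(x_bar=19500, sigma=2000, n=25)
print(f"95% CI: ({low:.0f}, {high:.0f})")
```

Because the helper uses the unrounded critical value rather than 1.96, the endpoints can differ from a hand calculation in the decimals.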

Page 46: Sampling Theory and Some Important Sampling Distributions

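Example

• We obtained (18,716, 20,284) as the 95% confidence interval only because the sample mean happened to be $19,500. If we drew another sample of 25 people, we could get a different interval.

• If we kept repeating the sampling 1000 times, about 950 (95%) of the confidence intervals constructed would contain the population mean.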
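The long-run interpretation can be illustrated by simulation. In the sketch below the true mean `mu = 19000` is a hypothetical value (the slides do not state the population mean); σ = 2000 and n = 25 follow the example, and the seed is arbitrary. The function counts how many of the repeated 95% intervals contain the true mean.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, repetitions, confidence=0.95, seed=42):
    """Share of repeated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repetitions):
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / repetitions


# mu = 19000 is a made-up "true" mean used only for this illustration.
rate = coverage(mu=19000, sigma=2000, n=25, repetitions=1000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The share will not be exactly 0.95 in any one run of 1000 samples, but it should be close to it.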

Page 47: Sampling Theory and Some Important Sampling Distributions

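[Figure: the sampling distribution of $\bar{X}$, centered on μ, with a series of intervals drawn beneath it. Population parameters: Mean = μ, Variance = σ². Each interval is $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$. Because the mean differs from sample to sample, the interval is a random interval.]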

Page 48: Sampling Theory and Some Important Sampling Distributions
Page 49: Sampling Theory and Some Important Sampling Distributions

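Confidence Intervals for the Mean with Known Population Variance

• The population parameter μ is not a random variable; it does not change from sample to sample.

• $\bar{X}$ is a random variable that changes with the sample, so the confidence interval also differs from sample to sample.

• With probability 100(1-α)%, the random interval above contains the population parameter μ:

$$P\!\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$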

Page 50: Sampling Theory and Some Important Sampling Distributions

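Confidence Intervals for the Mean with Known Population Variance

• A 95% confidence interval estimate means that, among all the interval estimates built from every possible sample of size n, 95% of the intervals contain the true population mean and only 5% do not.

• In practice we usually draw only one sample, and μ is usually unknown, so we cannot know for certain whether the interval from this sample contains μ. What we can say is that the interval was produced by a procedure that captures μ 95% of the time; in that sense we are 95% confident that it contains μ.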

Page 51: Sampling Theory and Some Important Sampling Distributions

Formulas for Commonly Constructed Confidence Intervals

• Frequently used confidence intervals

Level of confidence (1-α)    α       α/2      z_{α/2}    Confidence interval
0.90                         0.10    0.050    1.645      (x̄ - 1.645 σ/√n, x̄ + 1.645 σ/√n)
0.95                         0.05    0.025    1.96       (x̄ - 1.96 σ/√n, x̄ + 1.96 σ/√n)
0.99                         0.01    0.005    2.58       (x̄ - 2.58 σ/√n, x̄ + 2.58 σ/√n)

Page 52: Sampling Theory and Some Important Sampling Distributions

Desirable Properties of Confidence Intervals

• A good confidence interval has two properties:
• The interval should have a high level of confidence (1-α).
• The interval should have a narrow width (precision).

Page 53: Sampling Theory and Some Important Sampling Distributions


Page 54: Sampling Theory and Some Important Sampling Distributions

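Margin of Error: The Width of a Confidence Interval for μ

• The width W of the confidence interval for the population mean:

$$W = 2\, z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}$$

• The width W of the confidence interval depends on several factors:
• (1) the level of confidence (1-α) of the interval
• (2) the population standard deviation σ
• (3) the sample size n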

Page 55: Sampling Theory and Some Important Sampling Distributions

Comparing Widths of Confidence Intervals

• Suppose we take a random sample of size n from a population having known variance σ². Construct the 99%, 95%, and 90% CIs for the population mean and compare their widths.

$$W_1 = 2(2.58)\frac{\sigma}{\sqrt{n}}, \qquad W_2 = 2(1.96)\frac{\sigma}{\sqrt{n}}, \qquad W_3 = 2(1.645)\frac{\sigma}{\sqrt{n}}$$

$$\frac{W_1}{W_2} = 1.32, \qquad \frac{W_2}{W_3} = 1.19$$

W1 is 32% wider than W2.

W2 is 19% wider than W3.
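The width formula and the two ratios can be checked directly. The function name `ci_width` is an illustrative assumption; σ = 2000 and n = 25 are borrowed from the salary example only to give concrete widths, since the ratios do not depend on σ or n.

```python
import math


def ci_width(z, sigma, n):
    """Width W = 2 * z * sigma / sqrt(n) of a z-based confidence interval."""
    return 2 * z * sigma / math.sqrt(n)


sigma, n = 2000, 25            # borrowed from the salary example
w99 = ci_width(2.58, sigma, n)
w95 = ci_width(1.96, sigma, n)
w90 = ci_width(1.645, sigma, n)

print(f"W1 (99%) = {w99:.0f}, W2 (95%) = {w95:.0f}, W3 (90%) = {w90:.0f}")
print(f"W1 / W2 = {w99 / w95:.2f}")   # sigma and n cancel in the ratio
print(f"W2 / W3 = {w95 / w90:.2f}")
```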

Page 56: Sampling Theory and Some Important Sampling Distributions

Comparing Widths of Confidence Intervals

• To decrease the width of a confidence interval, we must either use a smaller level of confidence (1-α) or increase the sample size n.

[Figure: intervals of decreasing width for decreasing confidence coefficients.]

Confidence coefficient    Half-width of CI
99%                       2.58 σ/√n
95%                       1.96 σ/√n
90%                       1.64 σ/√n
80%                       1.28 σ/√n
50%                       0.67 σ/√n

Page 57: Sampling Theory and Some Important Sampling Distributions

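Example

• A school wants to estimate the first-year annual salary of the students who graduated last year. Assume that salaries are normally distributed and that the population standard deviation is $2000. A random sample of 25 alumni gives a mean salary of $19,500. Find the 99% confidence interval and compare it with the 95% CI.

• σ = $2000, n = 25, $\bar{x}$ = $19500
• 1-α = 99%, α = .01, α/2 = .005, $z_{\alpha/2}$ = 2.58

$$\left(19500 - 2.58 \times 2000/\sqrt{25},\ \ 19500 + 2.58 \times 2000/\sqrt{25}\right)$$

99% CI: (18468, 20532), $W_1$ = $2064

95% CI: (18716, 20284), $W_2$ = $1568

$W_1 / W_2 = 1.32$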

Page 58: Sampling Theory and Some Important Sampling Distributions

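Confidence Intervals for Large Samples

• Constructing the CI above requires two conditions:
• (1) The population must be normally distributed.
• (2) The population variance must be known.

• When the sample size n ≥ 30, the central limit theorem says that the sampling distribution of the sample mean is approximately normal, and the sample standard deviation comes closer and closer to the population standard deviation, so conditions (1) and (2) are both approximately satisfied.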

Page 59: Sampling Theory and Some Important Sampling Distributions

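Example

• The personnel department of the post office wants to understand how much sick leave mail carriers take. A sample of 100 carriers is observed. The population distribution and standard deviation are both unknown. Suppose the sample mean is 8.2 days and s = 2.7 days. Construct the 95% CI.

$$n = 100,\quad \bar{X} = 8.2,\quad s = 2.7,\quad (1-\alpha) = .95,\quad \alpha = .05,\quad \alpha/2 = .025,\quad z_{\alpha/2} = 1.96,\quad n \ge 30$$

$$CI = \left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right)$$

$$CI = \left(8.2 - 1.96\,\frac{2.7}{\sqrt{100}},\ \ 8.2 + 1.96\,\frac{2.7}{\sqrt{100}}\right) = (7.6708,\ 8.7292)$$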

Page 60: Sampling Theory and Some Important Sampling Distributions

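Student's t Distribution

• So far we have used the Z-score to construct CIs, which requires either:
• (1) a normally distributed population whose variance is known, or
• (2) n ≥ 30.

• How do we construct a CI when the population standard deviation is unknown and the sample is small?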

Page 61: Sampling Theory and Some Important Sampling Distributions

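• We must use the sample standard deviation s to estimate the population standard deviation σ.

• The confidence interval constructed this way is less accurate than one constructed with a known population standard deviation, so the interval must be widened in order to cover the population parameter at the same level of confidence.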

Page 62: Sampling Theory and Some Important Sampling Distributions

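Student's t Distribution

• If the population is distributed as $N(\mu, \sigma^2)$, then

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

• If the population variance $\sigma^2$ is unknown, we replace σ with S and obtain the t-score:

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

• The larger the sample, the closer S is to σ, and the closer the t distribution is to the standard normal distribution Z.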

Page 63: Sampling Theory and Some Important Sampling Distributions

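Some Properties of the t Distribution

• The t distribution is a symmetric distribution centered at zero, ranging from -∞ to ∞.

• The t distribution is bell-shaped, similar to the standard normal distribution.

• The mean of the t distribution is 0.

• The probability density function of the t distribution is determined by the parameter ν (nu), the degrees of freedom. When constructing a confidence interval for the mean, the degrees of freedom equal the sample size minus one: ν = (n-1).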

Page 64: Sampling Theory and Some Important Sampling Distributions

Characteristics of the t Distribution

• The variance of the t distribution is ν/(ν-2) for ν > 2, which is always greater than 1. The larger ν is (the larger the sample), the closer the variance is to 1 and the closer the shape is to the standard normal distribution.

Page 65: Sampling Theory and Some Important Sampling Distributions

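Characteristics of the t Distribution

• The t distribution is a family of probability distributions: each number of degrees of freedom corresponds to a different t density function. Because its variance is larger than that of the standard normal distribution, its shape is lower and wider.

[Figure: density curves f(x) for x from -4 to 4: the standard normal (d.f. = ∞) and t distributions with d.f. = 4, d.f. = 2, and d.f. = 1.]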

Page 66: Sampling Theory and Some Important Sampling Distributions

Value of $t_{\alpha,\nu}$

• The symbol $t_{\alpha,\nu}$ denotes the value of t such that the area to its right is α and t has ν degrees of freedom. The value $t_{\alpha,\nu}$ satisfies the equation:

• $P(t > t_{\alpha,\nu}) = \alpha$

• where the random variable t has the t distribution with ν degrees of freedom.

Page 67: Sampling Theory and Some Important Sampling Distributions

Value of $t_{\alpha,\nu}$

• $P(t > t_{0.05,13}) = 0.05$. Find the value of t.

Page 68: Sampling Theory and Some Important Sampling Distributions

Example

• Consider the t distribution having ν = 9 degrees of freedom. Find the value $t_{.05,9}$ such that the area in the right tail of the t distribution is .05.

[Figure: t distribution with d.f. = 9; the area to the right of $t_{.05}$ = 1.83 is .05.]

Page 69: Sampling Theory and Some Important Sampling Distributions

Example

• Consider the t distribution having ν = 9 degrees of freedom. Find the values $t_{.025,9}$ and $-t_{.025,9}$ such that each tail of the t distribution contains area .025.

[Figure: t distribution with d.f. = 9; the area to the right of $t_{.025}$ = 2.262 is .025, and the area to the left of $-t_{.025}$ = -2.262 is .025.]

Page 70: Sampling Theory and Some Important Sampling Distributions

Example

• Consider the t distribution having ν = 20 degrees of freedom. Find the value $t_{.025,20}$ such that the right tail of the distribution contains area .025.

[Figure: t distribution with d.f. = 20; the area to the right of $t_{.025}$ = 2.086 is .025.]
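The table look-ups in these examples can be approximated without a statistics package. The sketch below is one possible approach, not the method used in the slides: it integrates the t density with Simpson's rule and finds the critical value by bisection. The function names, the search range [0, 100], and the step counts are all assumptions made for the illustration; it is meant for 0 < α < 0.5.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area


def t_critical(alpha, df):
    """Value t_{alpha, df} whose right-tail area is alpha (bisection)."""
    lo, hi = 0.0, 100.0  # assumes the answer lies below 100
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid   # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


for alpha, df in [(0.05, 9), (0.025, 9), (0.025, 20)]:
    print(f"t_({alpha}, {df}) = {t_critical(alpha, df):.3f}")
```

The printed values can be compared with the table values quoted in the examples (1.83, 2.262, and 2.086).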

Page 71: Sampling Theory and Some Important Sampling Distributions

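Degrees of Freedom

• There are infinitely many possible combinations of values for two random variables X1 and X2. If we impose no restriction, we can assign any values we like to X1 and X2.

• But suppose we require that

$$\bar{X} = \frac{1}{2}(X_1 + X_2) = 5$$

• Then the combination of X1 and X2 must be (3, 7), (2, 8), (5, 5), (6, 4), and so on.
• Once X1 has been chosen, we have no freedom left to choose the value of X2. In other words, we have only (2-1) "degrees of freedom".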

Page 72: Sampling Theory and Some Important Sampling Distributions

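Degrees of Freedom

$$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n), \qquad \sum_{i=1}^{n} (X_i - \bar{X})^2$$

• In the same way, the two statistics above are subject to the same kind of restriction: once n-1 of the values are known, the last value is already determined, so there are only (n-1) degrees of freedom.

• In statistics, the degrees of freedom are the number of variables whose values can be chosen freely. They equal the number of random variables involved in the statistic minus the number of restrictions imposed on the statistic.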

Page 73: Sampling Theory and Some Important Sampling Distributions

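Confidence Intervals for the Mean with Unknown Population Variance

• If the population is distributed as $N(\mu, \sigma^2)$, then

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

• If the population variance $\sigma^2$ is unknown, we replace σ with S and obtain the t-score:

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

which has the t distribution with ν = (n-1) degrees of freedom.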

Page 74: Sampling Theory and Some Important Sampling Distributions

Constructing Confidence Intervals Using the t Distribution

• The area to the right of $t_{\alpha/2,\nu}$ is α/2 for the t distribution having ν degrees of freedom. Similarly, the area to the left of $-t_{\alpha/2,\nu}$ is α/2. Thus, we obtain:

$$
\begin{aligned}
1 - \alpha &= P(-t_{\alpha/2,\nu} \le t \le t_{\alpha/2,\nu}) \\
&= P\!\left(-t_{\alpha/2,\nu} \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le t_{\alpha/2,\nu}\right)
\end{aligned}
$$

Page 75: Sampling Theory and Some Important Sampling Distributions

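Constructing Confidence Intervals Using the t Distribution

$$
\begin{aligned}
1 - \alpha &= P\!\left(-t_{\alpha/2,\nu} \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le t_{\alpha/2,\nu}\right) \\
&= P\!\left(-t_{\alpha/2,\nu}\frac{S}{\sqrt{n}} \le \bar{X} - \mu \le t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}\right) \\
&= P\!\left(\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}\right)
\end{aligned}
$$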

Page 76: Sampling Theory and Some Important Sampling Distributions

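Constructing Confidence Intervals Using the t Distribution

$$1 - \alpha = P\!\left(\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}\right)$$

This result tells us that, if the sample is drawn from a normal distribution, the following random interval contains the population mean μ with probability 1-α:

$$\left(\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}},\ \ \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}\right)$$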

Page 77: Sampling Theory and Some Important Sampling Distributions

Confidence Interval for the Mean of a Normal Population with Unknown Population Variance

Definition

Suppose we take a random sample of n observations from a normal population with mean μ and unknown variance σ². If the observed sample mean is $\bar{x}$ and the observed sample standard deviation is s, the confidence interval for the mean having level of confidence 100(1-α)% is given by

$$\left(\bar{x} - t_{\alpha/2,\nu}\frac{s}{\sqrt{n}},\ \ \bar{x} + t_{\alpha/2,\nu}\frac{s}{\sqrt{n}}\right), \qquad \nu = n - 1$$

Page 78: Sampling Theory and Some Important Sampling Distributions

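Example

An engineer wants to estimate the mean strength of a certain kind of steel bar. Assume that the strength is normally distributed. He runs four tests and obtains the strengths 844, 847, 845, 844. Compute the 95% confidence interval for the mean strength of the steel bar.

$$\bar{X} = \frac{844 + 847 + 845 + 844}{4} = 845$$

$$S^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})^2 = \frac{1}{3}\left[(844 - 845)^2 + \dots + (844 - 845)^2\right] = 2$$

$$\left(845 - t_{0.025,3}\,\frac{\sqrt{2}}{\sqrt{4}},\ \ 845 + t_{0.025,3}\,\frac{\sqrt{2}}{\sqrt{4}}\right)$$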
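The slide stops at the formula. As a sketch of how the interval could be finished numerically, the code below uses the four observations from the example and takes $t_{0.025,3} = 3.182$ from a standard t table; that critical value is not given in the slides and is supplied here as an assumption.

```python
import math
import statistics

# Observations from the steel-strength example.
strengths = [844, 847, 845, 844]
n = len(strengths)

x_bar = statistics.mean(strengths)   # sample mean
s = statistics.stdev(strengths)      # sample standard deviation (divides by n - 1)

# Critical value t_{0.025, 3} from a standard t table (assumption:
# this number does not appear in the slides).
t_crit = 3.182

margin = t_crit * s / math.sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"x-bar = {x_bar}, s^2 = {s**2:.1f}")
print(f"95% CI for the mean strength: ({low:.2f}, {high:.2f})")
```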

Page 79: Sampling Theory and Some Important Sampling Distributions

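Comparing the Two Kinds of Confidence Interval

Concept

A CI constructed from the t value is wider than one constructed from the Z-score, because the population variance has to be estimated and the error is therefore larger.

The larger the sample, the narrower the CI, because

(1) $\sqrt{n}$ is in the denominator;

(2) the t value decreases as the degrees of freedom increase.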

Page 80: Sampling Theory and Some Important Sampling Distributions

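Comparing the Two Kinds of Confidence Interval

Concept

How the CI narrows as the d.f. increase:

Sample size n    d.f. (n-1)    95% CI
5                4             x̄ ± 2.776 (s/√n)
10               9             x̄ ± 2.262 (s/√n)
20               19            x̄ ± 2.093 (s/√n)
30               29            x̄ ± 2.045 (s/√n)
∞                ∞             x̄ ± 1.96 (s/√n)

When the d.f. exceed 120, the CI computed from the t value is almost identical to the CI computed from the standard normal distribution.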

Page 81: Sampling Theory and Some Important Sampling Distributions

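Example

n = 121, $\bar{X}$ = $20,000, S = $4,000. Construct two CIs, one using t and the other using z.

ν = n-1 = 120, $t_{0.025,120}$ = 1.980

$$\left(\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}},\ \ \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}\right)$$

Using the t value:

$$\left(20000 - 1.980\,\frac{4000}{\sqrt{121}},\ \ 20000 + 1.980\,\frac{4000}{\sqrt{121}}\right)$$

Using the z value:

$$\left(20000 - 1.96\,\frac{4000}{\sqrt{121}},\ \ 20000 + 1.96\,\frac{4000}{\sqrt{121}}\right)$$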

Page 82: Sampling Theory and Some Important Sampling Distributions

Example

n = 10; we want to construct a 95% CI using z and using t.

If the variance is known, we use z = 1.96.

If the variance is unknown, we use $t_{.025,9}$ = 2.262.

2.262/1.96 ≈ 1.15, so, for the same standard deviation, the confidence interval based on the t value will be about 15% wider than that based on the z value.

Page 83: Sampling Theory and Some Important Sampling Distributions

One-Sided Confidence Intervals for the Mean

• Suppose that we wish to find the lower confidence limit (LCL) such that the probability is (1-α) that μ exceeds LCL. The one-sided interval (LCL, ∞) is a left-sided confidence interval. The lower confidence limit is given by

$$LCL = \bar{x} - z_{\alpha}\frac{\sigma}{\sqrt{n}}$$

• Suppose that we wish to find the upper confidence limit (UCL) such that the probability is (1-α) that μ is less than UCL. The one-sided interval (-∞, UCL) is a right-sided confidence interval. The upper confidence limit is given by

$$UCL = \bar{x} + z_{\alpha}\frac{\sigma}{\sqrt{n}}$$

Page 84: Sampling Theory and Some Important Sampling Distributions

One-Sided Confidence Intervals for the Mean

• The meaning of a one-sided confidence interval: suppose we repeatedly draw random samples of size n and compute (LCL, ∞) each time. Among the left-sided confidence intervals constructed from all these samples, a proportion 1-α will contain μ.

Page 85: Sampling Theory and Some Important Sampling Distributions

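One-Sided Confidence Intervals for the Mean

• The personnel department of the post office wants to understand how much sick leave mail carriers take. A sample of 100 carriers is observed. The population distribution and standard deviation are both unknown. Suppose the sample mean is 8.2 days and s = 2.7 days. Construct the one-sided (left-sided) 95% confidence interval for the population parameter μ.

$$n = 100,\quad \bar{X} = 8.2,\quad s = 2.7,\quad (1-\alpha) = .95,\quad \alpha = .05,\quad z_{\alpha} = 1.645$$

$$LCL = \bar{x} - z_{\alpha}\frac{s}{\sqrt{n}} = 8.2 - 1.645\,\frac{2.7}{\sqrt{100}} = 7.75585$$

With 95% confidence, the interval (7.7558, ∞) contains the population mean μ.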
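The lower confidence limit in this example can be computed as follows. The function name `lower_confidence_limit` is an illustrative assumption; the inputs are the figures from the post-office example. Because the code uses the unrounded value of $z_{.05}$ instead of 1.645, the result agrees with the slide only to about four decimals.

```python
import math
from statistics import NormalDist


def lower_confidence_limit(x_bar, s, n, confidence=0.95):
    """One-sided (left-sided) limit: LCL = x_bar - z_alpha * s / sqrt(n).

    For a one-sided interval the whole alpha sits in one tail,
    so z_alpha is used instead of z_{alpha/2}.
    """
    z_alpha = NormalDist().inv_cdf(confidence)
    return x_bar - z_alpha * s / math.sqrt(n)


# Post-office example: n = 100, sample mean 8.2 days, s = 2.7 days.
lcl = lower_confidence_limit(x_bar=8.2, s=2.7, n=100)
print(f"left-sided 95% CI: ({lcl:.4f}, infinity)")
```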

Page 86: Sampling Theory and Some Important Sampling Distributions

One-Sided Confidence Intervals for the Mean

Take a random sample of n observations from some normal population having unknown mean μ and unknown standard deviation σ.

Suppose that we wish to find the left-sided confidence interval (LCL, ∞). The lower confidence limit is given by:

$$LCL = \bar{x} - t_{\alpha,\nu}\frac{s}{\sqrt{n}}$$

Suppose that we wish to find the right-sided confidence interval (-∞, UCL). The upper confidence limit is given by:

$$UCL = \bar{x} + t_{\alpha,\nu}\frac{s}{\sqrt{n}}$$

Page 87: Sampling Theory and Some Important Sampling Distributions

One-Sided Confidence Intervals for the Mean

Example

n = 10, σ unknown, $\bar{x}$ = 14.5, s = 2.5. Construct the 95% left-sided CI for the population mean μ.

$$LCL = \bar{x} - t_{\alpha,\nu}\frac{s}{\sqrt{n}} = 14.5 - 1.833\,\frac{2.5}{\sqrt{10}} = 13.051$$

The 95% left-sided confidence interval for μ is (13.051, ∞).

Page 88: Sampling Theory and Some Important Sampling Distributions

Determining the Sample Size

Confidence interval for the mean:

Suppose an individual is interested in estimating the mean of a population having a known variance σ². How large a sample must be taken if the investigator wants the probability to be (1-α) that the sampling error $|\bar{X} - \mu|$ is less than some amount D?

Page 89: Sampling Theory and Some Important Sampling Distributions

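Determining the Sample Size

The confidence interval is centered on $\bar{X}$ and extends the same distance D to the left and to the right:

$$\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$$

$$P\!\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$

$$D = \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}$$

Holding D fixed and solving for n:

$$\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{D} \quad\Longrightarrow\quad n = \frac{z_{\alpha/2}^2\,\sigma^2}{D^2}$$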

Page 90: Sampling Theory and Some Important Sampling Distributions

Example

An economist wants to estimate the mean annual income of households in a particular congressional district. It is assumed that the population standard deviation is σ = $4,000. The economist wants the probability to be .95 that the sample mean will be within D = $500 of the true mean μ. How large a sample is required?

$$1 - \alpha = .95, \quad z_{\alpha/2} = 1.96, \quad \sigma = 4000, \quad D = 500$$

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{D^2} = \frac{1.96^2\,(4{,}000)^2}{500^2} = 245.86$$

Rounding up, a sample of 246 households is required.
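The sample-size formula can be turned into a short helper that always rounds up, since a fractional observation is not possible. The function name `required_sample_size` and its parameters are illustrative assumptions; the inputs below are those of the household-income example.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, D, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= D.

    Implements n = (z_{alpha/2} * sigma / D)^2, rounded up.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / D) ** 2)


# Household-income example: sigma = 4000, D = 500, 95% confidence.
n_required = required_sample_size(sigma=4000, D=500)
print("required sample size:", n_required)
```

Halving the error bound D quadruples the required sample size, because D appears squared in the denominator.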

Page 91: Sampling Theory and Some Important Sampling Distributions

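Review

[Figure: two density curves f(x): the population distribution, and the sampling distribution of $\bar{X}$ with $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$.]

By the central limit theorem, when the sample is large enough, the sampling distribution of the sample mean is approximately normal.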

Page 92: Sampling Theory and Some Important Sampling Distributions

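[Figure: the sampling distribution of $\bar{X}$, centered on μ, with a series of intervals drawn beneath it. Population parameters: Mean = μ, Variance = σ². Each interval is $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$. Because the mean differs from sample to sample, the interval is a random interval.]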

Page 93: Sampling Theory and Some Important Sampling Distributions

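Review

Let $(x_1, x_2, \dots, x_n)$ be a random sample drawn from some population, let θ be a parameter of that population, and let $T_1$ and $T_2$ be two statistics such that

$$P(T_1 \le \theta \le T_2) = 1 - \alpha .$$

Then $(T_1, T_2)$ is called a 100(1-α)% confidence interval for θ, and (1-α) is the level of confidence.

[Figure: the interval from $T_1$ to $T_2$.]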

Page 94: Sampling Theory and Some Important Sampling Distributions

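Review

Let $\hat{\theta}_n$ be an estimator of the parameter θ. If

$$P(|\hat{\theta}_n - \theta| \le D) = 1 - \alpha ,$$

then D is called the 100(1-α)% error bound (precision, sampling error) for estimating θ by $\hat{\theta}_n$.

[Figure: the interval from $T_1$ to $T_2$, centered on $\hat{\theta}_n$, extending a distance D on each side.]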

Page 95: Sampling Theory and Some Important Sampling Distributions

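Review

Interval estimation of the population mean μ:

When the population standard deviation σ is known and n > 30,

$$\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$$

is the 100(1-α)% confidence interval for the population mean μ.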

Page 96: Sampling Theory and Some Important Sampling Distributions

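Confidence Intervals for the Mean with Unknown Population Variance

• If the population is distributed as $N(\mu, \sigma^2)$, then

$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

• If the population variance $\sigma^2$ is unknown, we replace σ with S and obtain the t-score:

$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

which has the t distribution with ν = (n-1) degrees of freedom.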

Page 97: Sampling Theory and Some Important Sampling Distributions

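Review

Interval estimation of the population mean μ:

When the population standard deviation σ is unknown,

$$P\!\left(\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}\right) = 1 - \alpha ,$$

so

$$\left(\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}},\ \ \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}\right)$$

is the 100(1-α)% confidence interval for the population mean μ.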

Page 98: Sampling Theory and Some Important Sampling Distributions

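Review

Point estimation of the population mean μ:

We generally use $\bar{X}$ to estimate μ; that is, $\bar{X}$ is taken as the estimator of μ, and the observed $\bar{x}$ is the point estimate of μ.

When the sample size n is known and n > 30, the 100(1-α)% error bound for estimating μ by $\bar{X}$ is

$$D = \frac{z_{\alpha/2}\,\sigma}{\sqrt{n}}$$

When the sample size has not yet been fixed but n > 30, and the error bound D is given, the required sample size is

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{D^2}$$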

Page 99: Sampling Theory and Some Important Sampling Distributions

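Review

The lifetimes of the tubes produced by a fluorescent-lamp manufacturer are approximately normally distributed, with a standard deviation of 100 hours. A quality-control inspector randomly samples 32 tubes, observes their lifetimes in use, and obtains a mean lifetime of 1200 hours.

(1) Find the estimate of the mean lifetime of the tubes produced by the company. The point estimate of the mean lifetime μ is $\bar{x}$ = 1200 hours.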

Page 100: Sampling Theory and Some Important Sampling Distributions

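Review

(2) Find the 95% error bound of the estimate in (1).

$$D = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.96\,\frac{100}{\sqrt{32}} = 34.648$$

(3) If we want the 95% error bound in (2) to be 20 hours, is the sample in this problem large enough? If not, how many more tubes should be sampled?

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{D^2} = \left(\frac{1.96}{20}\right)^2 (100)^2 = 96.04, \quad \text{so take } n = 97$$

The sample of 32 is not large enough; 97 - 32 = 65 more tubes should be sampled.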

Page 101: Sampling Theory and Some Important Sampling Distributions

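Review

(4) Find the 90% and 95% confidence intervals for the mean lifetime of the tubes produced by the company.

$$\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \ \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right)$$

90% CI for μ:

$$\left(1200 - 1.645 \times 100/\sqrt{32},\ \ 1200 + 1.645 \times 100/\sqrt{32}\right) \approx (1170.92,\ 1229.08)$$

95% CI for μ:

$$\left(1200 - 1.96 \times 100/\sqrt{32},\ \ 1200 + 1.96 \times 100/\sqrt{32}\right) \approx (1165.35,\ 1234.65)$$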