
Chapter 5. Joint Probability Distributions and Random Sample

Weiqi Luo (骆伟祺)

School of Software, Sun Yat-Sen University

Email: [email protected]  Office: #A313

5.1. Jointly Distributed Random Variables

5.2. Expected Values, Covariance, and Correlation

5.3. Statistics and Their Distributions

5.4. The Distribution of the Sample Mean

5.5. The Distribution of a Linear Combination

5.1. Jointly Distributed Random Variables

The Joint Probability Mass Function for Two Discrete Random Variables

Let X and Y be two discrete random variables defined on the sample space S of an experiment. The joint probability mass function p(x,y) is defined for each pair of numbers (x,y) by

\[ p(x,y) = P(X = x \text{ and } Y = y) \]

Let A be any set consisting of pairs of (x,y) values. Then the probability P[(X,Y)∈A] is obtained by summing the joint pmf over pairs in A:

\[ P[(X,Y) \in A] = \sum_{(x,y) \in A} p(x,y) \]

Two requirements for a pmf:

\[ p(x,y) \ge 0 \quad \text{and} \quad \sum_x \sum_y p(x,y) = 1 \]

Example 5.1

A large insurance agency services a number of customers who have purchased both a homeowner’s policy and an automobile policy from the agency. For each type of policy, a deductible amount must be specified. For an automobile policy, the choices are $100 and $250, whereas for a homeowner’s policy the choices are $0, $100, and $200.

Suppose an individual with both types of policy is selected at random from the agency’s files. Let X = the deductible amount on the auto policy and Y = the deductible amount on the homeowner’s policy.

Joint Probability Table

p(x,y)      y = 0    y = 100    y = 200
x = 100     0.20     0.10       0.20
x = 250     0.05     0.15       0.30

Example 5.1 (Cont’)

From the joint probability table:

p(100,100) = P(X = 100 and Y = 100) = 0.10

P(Y ≥ 100) = p(100,100) + p(250,100) + p(100,200) + p(250,200) = 0.75

The Marginal Probability Mass Function

The marginal probability mass functions of X and Y, denoted by pX(x) and pY(y), respectively, are given by

\[ p_X(x) = \sum_y p(x,y); \qquad p_Y(y) = \sum_x p(x,y) \]

In a joint probability table, pX is the column of row totals and pY is the row of column totals:

          Y1        Y2        …    Ym-1        Ym        pX
X1        p1,1      p1,2      …    p1,m-1      p1,m      pX(X1)
X2        p2,1      p2,2      …    p2,m-1      p2,m      pX(X2)
…
Xn-1      pn-1,1    pn-1,2    …    pn-1,m-1    pn-1,m    pX(Xn-1)
Xn        pn,1      pn,2      …    pn,m-1      pn,m      pX(Xn)
pY        pY(Y1)    pY(Y2)    …    pY(Ym-1)    pY(Ym)

Example 5.2 (Ex. 5.1 Cont’)

The possible X values are x = 100 and x = 250, so computing row totals in the joint probability table yields

pX(100) = p(100,0) + p(100,100) + p(100,200) = 0.5

pX(250) = p(250,0) + p(250,100) + p(250,200) = 0.5

\[ p_X(x) = \begin{cases} 0.5, & x = 100, 250 \\ 0, & \text{otherwise} \end{cases} \]



Computing column totals in the joint probability table gives the marginal pmf of Y:

pY(0) = p(100,0) + p(250,0) = 0.2 + 0.05 = 0.25

pY(100) = p(100,100) + p(250,100) = 0.1 + 0.15 = 0.25

pY(200) = p(100,200) + p(250,200) = 0.2 + 0.3 = 0.5

\[ p_Y(y) = \begin{cases} 0.25, & y = 0, 100 \\ 0.5, & y = 200 \\ 0, & \text{otherwise} \end{cases} \]

P(Y ≥ 100) = p(100,100) + p(250,100) + p(100,200) + p(250,200) = pY(100) + pY(200) = 0.75
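The row and column totals of Examples 5.1 and 5.2 can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary layout and the helper names `marginals` and `prob` are assumptions made for this example, not part of the original slides.

```python
# Joint pmf of Example 5.1; keys are (x, y) = (auto deductible, homeowner deductible).
joint = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

def marginals(pmf):
    """Return (p_X, p_Y) by summing the joint pmf over the other variable."""
    p_x, p_y = {}, {}
    for (x, y), p in pmf.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return p_x, p_y

def prob(pmf, event):
    """P[(X, Y) in A], where the set A is described by a predicate event(x, y)."""
    return sum(p for (x, y), p in pmf.items() if event(x, y))

p_x, p_y = marginals(joint)
p_y_ge_100 = prob(joint, lambda x, y: y >= 100)
print(p_x, p_y, round(p_y_ge_100, 2))
```

Describing the set A by a predicate keeps the sketch close to the definition P[(X,Y)∈A] = Σ p(x,y) over pairs in A.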

The Joint Probability Density Function for Two Continuous Random Variables

Let X and Y be two continuous random variables. Then f(x,y) is the joint probability density function for X and Y if for any two-dimensional set A

\[ P[(X,Y) \in A] = \iint_A f(x,y)\,dx\,dy \]

Two requirements for a joint pdf:

1. \( f(x,y) \ge 0 \) for all pairs (x,y) in \( \mathbb{R}^2 \)

2. \( \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1 \)

In particular, if A is the two-dimensional rectangle {(x,y): a ≤ x ≤ b, c ≤ y ≤ d}, then

\[ P[(X,Y) \in A] = P(a \le X \le b,\ c \le Y \le d) = \int_a^b \int_c^d f(x,y)\,dy\,dx \]

[Figure: the surface f(x,y) above the x-y plane; A = shaded rectangle.]

Example 5.3

A bank operates both a drive-up facility and a walk-up window. On a randomly selected day, let X = the proportion of time that the drive-up facility is in use and Y = the proportion of time that the walk-up window is in use. Let the joint pdf of (X,Y) be

\[ f(x,y) = \begin{cases} \frac{6}{5}(x + y^2), & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases} \]

1. Verify that f(x,y) is a joint probability density function.

2. Determine the probability \( P(0 \le X \le \tfrac{1}{4},\ 0 \le Y \le \tfrac{1}{4}) \).
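Both parts of Example 5.3 can be checked numerically. The sketch below uses a plain midpoint rule; the helper `integrate_rect` and the grid size `n` are assumptions chosen for illustration. Integrating by hand gives exactly 1 for the total mass and 7/640 ≈ 0.0109 for the requested probability.

```python
def f(x, y):
    """Joint pdf of Example 5.3; zero outside the unit square."""
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return 1.2 * (x + y * y)
    return 0.0

def integrate_rect(g, a, b, c, d, n=200):
    """Midpoint-rule approximation of the double integral of g over [a,b] x [c,d]."""
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        for j in range(n):
            y = c + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy

total_mass = integrate_rect(f, 0.0, 1.0, 0.0, 1.0)      # should be close to 1
p_quarter = integrate_rect(f, 0.0, 0.25, 0.0, 0.25)     # should be close to 7/640
print(round(total_mass, 4), round(p_quarter, 4))
```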

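Marginal Probability Density Function

The marginal probability density functions of X and Y, denoted by fX(x) and fY(y), respectively, are given by

\[ f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy \quad \text{for } -\infty < x < \infty \]

\[ f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx \quad \text{for } -\infty < y < \infty \]

[Figure: fX(x) is obtained by integrating along the line of fixed x, and fY(y) by integrating along the line of fixed y.]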

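Example 5.4 (Ex. 5.3 Cont’)

The marginal pdf of X, which gives the probability distribution of busy time for the drive-up facility without reference to the walk-up window, is

\[ f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy = \int_0^1 \frac{6}{5}(x + y^2)\,dy = \frac{6}{5}x + \frac{2}{5} \]

for x in (0,1), and 0 otherwise. Similarly, the marginal pdf of Y is

\[ f_Y(y) = \begin{cases} \frac{6}{5}y^2 + \frac{3}{5}, & 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases} \]

Then

\[ P\left(\tfrac{1}{4} \le Y \le \tfrac{3}{4}\right) = \int_{1/4}^{3/4} f_Y(y)\,dy = 0.4625 \]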

Example 5.5

A nut company markets cans of deluxe mixed nuts containing almonds, cashews, and peanuts. Suppose the net weight of each can is exactly 1 lb, but the weight contribution of each type of nut is random. Because the three weights sum to 1, a joint probability model for any two gives all necessary information about the weight of the third type. Let X = the weight of almonds in a selected can and Y = the weight of cashews. The joint pdf for (X,Y) is

\[ f(x,y) = \begin{cases} 24xy, & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0, & \text{otherwise} \end{cases} \]



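To verify that f(x,y) is a legitimate joint pdf, let D be the region of positive density, the triangle with vertices (0,0), (1,0), and (0,1):

1. \( f(x,y) \ge 0 \)

2. \[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\,dy\,dx = \iint_D f(x,y)\,dy\,dx = \int_0^1 \left\{ \int_0^{1-x} 24xy\,dy \right\} dx = \int_0^1 12x(1-x)^2\,dx = 1 \]

[Figure: the triangular region D with corners (0,1) and (1,0); for fixed x the inner integral runs up the vertical line from (x, 0) to (x, 1−x).]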

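Let A be the event that the two types of nuts together make up at most 50% of the can, so A = {(x,y): 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, x + y ≤ 0.5}. Then

\[ P((X,Y) \in A) = \iint_A f(x,y)\,dy\,dx = \int_0^{0.5} \left\{ \int_0^{0.5-x} 24xy\,dy \right\} dx = 0.0625 \]

[Figure: the region A is the part of the triangle with corners (0,1) and (1,0) lying below the line x + y = 0.5.]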
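The two integrals over triangular regions in Example 5.5 can be approximated numerically as a sanity check. This is a sketch under stated assumptions: `integrate_triangle` is a hypothetical helper written for this example, using a midpoint rule whose inner limits depend on x.

```python
def integrate_triangle(g, c, n=400):
    """Midpoint-rule approximation of the integral of g over
    the triangle {x >= 0, y >= 0, x + y <= c}."""
    hx = c / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        hy = (c - x) / n                      # inner range is 0 <= y <= c - x
        inner = sum(g(x, (j + 0.5) * hy) for j in range(n))
        total += inner * hy * hx
    return total

def f(x, y):
    """Density of Example 5.5 on its support (the caller restricts the region)."""
    return 24.0 * x * y

total_mass = integrate_triangle(f, 1.0)   # integral over D, should be close to 1
p_half = integrate_triangle(f, 0.5)       # P(X + Y <= 0.5), should be close to 0.0625
print(round(total_mass, 4), round(p_half, 4))
```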



The marginal pdf for almonds is obtained by holding X fixed at x and integrating f(x,y) along the vertical line through x, from (x, 0) to (x, 1−x):

\[ f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy = \begin{cases} \int_0^{1-x} 24xy\,dy = 12x(1-x)^2, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]

Independent Random Variables

Two random variables X and Y are said to be independent if for every pair of x and y values,

\[ p(x,y) = p_X(x) \cdot p_Y(y) \quad \text{when X and Y are discrete} \]

\[ f(x,y) = f_X(x) \cdot f_Y(y) \quad \text{when X and Y are continuous} \]

Otherwise, X and Y are said to be dependent.

Namely, two variables are independent if their joint pmf or pdf is the product of the two marginal pmf’s or pdf’s.

Example 5.6

In the insurance situation of Examples 5.1 and 5.2,

\[ p(100,100) = 0.10 \ne (0.5)(0.25) = p_X(100) \cdot p_Y(100) \]

so X and Y are not independent.

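Example 5.7 (Ex. 5.5 Cont’)

Because f(x,y) = 24xy has the form of a product, X and Y would appear to be independent. However, the marginal pdf’s are

\[ f_X(x) = \int_0^{1-x} 24xy\,dy = 12x(1-x)^2, \quad 0 \le x \le 1 \]

and, by symmetry,

\[ f_Y(y) = \int_0^{1-y} 24xy\,dx = 12y(1-y)^2, \quad 0 \le y \le 1 \]

so although \( f_X(3/4) = f_Y(3/4) = 9/16 \), we have \( f(3/4, 3/4) = 0 \ne (9/16)(9/16) \). The variables are therefore not independent: the region of positive density is not a rectangle.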

Example 5.8

Suppose that the lifetimes of two components are independent of one another and that the first lifetime, X1, has an exponential distribution with parameter λ1 whereas the second, X2, has an exponential distribution with parameter λ2. Then the joint pdf is

\[ f(x_1, x_2) = f_{X_1}(x_1) \cdot f_{X_2}(x_2) = \begin{cases} \lambda_1 \lambda_2 e^{-\lambda_1 x_1 - \lambda_2 x_2}, & x_1 > 0,\ x_2 > 0 \\ 0, & \text{otherwise} \end{cases} \]

Let λ1 = 1/1000 and λ2 = 1/1200, so that the expected lifetimes are 1000 and 1200 hours, respectively. The probability that both component lifetimes are at least 1500 hours is

\[ P(1500 \le X_1,\ 1500 \le X_2) = P(1500 \le X_1) \cdot P(1500 \le X_2) = e^{-1500\lambda_1} \cdot e^{-1500\lambda_2} \approx (0.2231)(0.2865) \approx 0.0639 \]

More than Two Random Variables

If X1, X2, …, Xn are all discrete rv’s, the joint pmf of the variables is the function

p(x1, x2, …, xn) = P(X1 = x1, X2 = x2, …, Xn = xn)

If the variables are continuous, the joint pdf of X1, X2, …, Xn is the function f(x1, x2, …, xn) such that for any n intervals [a1, b1], …, [an, bn],

\[ P(a_1 \le X_1 \le b_1, \ldots, a_n \le X_n \le b_n) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, \ldots, x_n)\,dx_n \cdots dx_1 \]

Independent

The random variables X1, X2, …, Xn are said to be independent if for every subset Xi1, Xi2, …, Xik of the variables, the joint pmf or pdf of the subset is equal to the product of the marginal pmf’s or pdf’s.

Multinomial Experiment

An experiment consisting of n independent and identical trials, in which each trial can result in any one of r possible outcomes. Let pi = P(Outcome i on any particular trial), and define random variables by Xi = the number of trials resulting in outcome i (i = 1, …, r). The joint pmf of X1, …, Xr is called the multinomial distribution:

\[ p(x_1, \ldots, x_r) = \begin{cases} \dfrac{n!}{(x_1!)(x_2!) \cdots (x_r!)}\, p_1^{x_1} \cdots p_r^{x_r}, & x_i = 0, 1, 2, \ldots \text{ with } x_1 + \cdots + x_r = n \\ 0, & \text{otherwise} \end{cases} \]

Note: the case r = 2 gives the binomial distribution.

Example 5.9

If the allele of each of ten independently obtained pea sections is determined and p1 = P(AA), p2 = P(Aa), p3 = P(aa), X1 = number of AA’s, X2 = number of Aa’s and X3 = number of aa’s, then

\[ p(x_1, x_2, x_3) = \frac{10!}{(x_1!)(x_2!)(x_3!)}\, p_1^{x_1} p_2^{x_2} p_3^{x_3}, \quad x_i = 0, 1, \ldots, 10 \text{ and } x_1 + x_2 + x_3 = 10 \]

If p1 = p3 = 0.25, p2 = 0.5, then

\[ P(X_1 = 2, X_2 = 5, X_3 = 3) = p(2,5,3) = \frac{10!}{2!\,5!\,3!}(0.25)^2 (0.5)^5 (0.25)^3 = 0.0769 \]
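The multinomial pmf is easy to evaluate directly from its formula. The function name `multinomial_pmf` below is an assumption made for this sketch; it uses only the standard library and reproduces the value in Example 5.9.

```python
from math import factorial

def multinomial_pmf(counts, probs):
    """n! / (x1! ... xr!) * p1^x1 ... pr^xr, where n = x1 + ... + xr."""
    n = sum(counts)
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)        # exact integer division at every step
    value = float(coef)
    for x, p in zip(counts, probs):
        value *= p ** x
    return value

# Example 5.9: ten pea sections with P(AA) = 0.25, P(Aa) = 0.5, P(aa) = 0.25.
p_253 = multinomial_pmf((2, 5, 3), (0.25, 0.5, 0.25))
print(round(p_253, 4))
```

With r = 2 the same function returns binomial probabilities, matching the note that the binomial distribution is the special case r = 2.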

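Example 5.10

When a certain method is used to collect a fixed volume of rock samples in a region, there are four resulting rock types. Let X1, X2, and X3 denote the proportion by volume of rock types 1, 2 and 3 in a randomly selected sample. Suppose the joint pdf of X1, X2 and X3 is

\[ f(x_1, x_2, x_3) = \begin{cases} k\,x_1 x_2 (1 - x_3), & 0 \le x_1 \le 1,\ 0 \le x_2 \le 1,\ 0 \le x_3 \le 1,\ x_1 + x_2 + x_3 \le 1 \\ 0, & \text{otherwise} \end{cases} \]

The constant k is determined by the requirement

\[ \iiint_{D_1} f(x_1, x_2, x_3)\,dx_3\,dx_2\,dx_1 = 1, \quad D_1: 0 \le x_i \le 1\ (i = 1, 2, 3),\ x_1 + x_2 + x_3 \le 1 \]

which gives k = 144. The probability that rock types 1 and 2 together account for at most 50% of the sample is

\[ \iiint_{D_2} f(x_1, x_2, x_3)\,dx_3\,dx_2\,dx_1 = 0.6066, \quad D_2: \text{the part of } D_1 \text{ with } x_1 + x_2 \le 0.5 \]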

Example 5.11

If X1, …, Xn represent the lifetimes of n components, the components operate independently of one another, and each lifetime is exponentially distributed with parameter λ, then

\[ f(x_1, x_2, \ldots, x_n) = (\lambda e^{-\lambda x_1})(\lambda e^{-\lambda x_2}) \cdots (\lambda e^{-\lambda x_n}) = \begin{cases} \lambda^n e^{-\lambda \sum x_i}, & x_1 \ge 0,\ x_2 \ge 0,\ \ldots,\ x_n \ge 0 \\ 0, & \text{otherwise} \end{cases} \]

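Example 5.11 (Cont’)

If these n components constitute a system that will fail as soon as a single component fails, then the probability that the system lasts past time t is

\[ P(X_1 > t, X_2 > t, \ldots, X_n > t) = \int_t^{\infty} \cdots \int_t^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n = \left( \int_t^{\infty} \lambda e^{-\lambda x_1}\,dx_1 \right) \cdots \left( \int_t^{\infty} \lambda e^{-\lambda x_n}\,dx_n \right) = (e^{-\lambda t})^n = e^{-n\lambda t} \]

Therefore,

\[ P(\text{system lifetime} \le t) = 1 - e^{-n\lambda t}, \quad \text{for } t \ge 0 \]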

Conditional Distribution

Let X and Y be two continuous rv’s with joint pdf f(x,y) and marginal X pdf fX(x). Then for any X value x for which fX(x) > 0, the conditional probability density function of Y given that X = x is

\[ f_{Y|X}(y \mid x) = \frac{f(x,y)}{f_X(x)}, \quad -\infty < y < \infty \]

If X and Y are discrete, then

\[ p_{Y|X}(y \mid x) = \frac{p(x,y)}{p_X(x)}, \quad -\infty < y < \infty \]

is the conditional probability mass function of Y when X = x.

Example 5.12 (Ex. 5.3 Cont’)

X = the proportion of time that a bank’s drive-up facility is busy and Y = the analogous proportion for the walk-up window. The conditional pdf of Y given that X = 0.8 is

\[ f_{Y|X}(y \mid 0.8) = \frac{f(0.8, y)}{f_X(0.8)} = \frac{1.2(0.8 + y^2)}{1.2(0.8) + 0.4} = \frac{1}{34}(24 + 30y^2), \quad 0 < y < 1 \]

The probability that the walk-up facility is busy at most half the time given that X = 0.8 is then

\[ P(Y \le 0.5 \mid X = 0.8) = \int_{-\infty}^{0.5} f_{Y|X}(y \mid 0.8)\,dy = \int_0^{0.5} \frac{1}{34}(24 + 30y^2)\,dy = 0.39 \]

Homework

Ex. 9, Ex.12, Ex.18, Ex.19


5.2 Expected Values, Covariance, and Correlation

The Expected Value of a Function h(X,Y)

Let X and Y be jointly distributed rv’s with pmf p(x,y) or pdf f(x,y) according to whether the variables are discrete or continuous. Then the expected value of a function h(X,Y), denoted by E[h(X,Y)] or μh(X,Y), is given by

\[ E[h(X,Y)] = \begin{cases} \sum_x \sum_y h(x,y)\,p(x,y), & \text{X and Y discrete} \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x,y)\,f(x,y)\,dx\,dy, & \text{X and Y continuous} \end{cases} \]

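Example 5.13

Five friends have purchased tickets to a certain concert. If the tickets are for seats 1-5 in a particular row and the tickets are randomly distributed among the five, what is the expected number of seats separating any particular two of the five?

Let X and Y denote the seat numbers of the two individuals. Each of the 20 ordered pairs of distinct seats is equally likely, so the joint pmf is

\[ p(x,y) = \begin{cases} \frac{1}{20}, & x = 1, \ldots, 5;\ y = 1, \ldots, 5;\ x \ne y \\ 0, & \text{otherwise} \end{cases} \]

The number of seats separating the two individuals is

h(X,Y) = |X-Y| - 1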



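The values of h(x,y) = |x − y| − 1 for each possible pair are:

h(x,y)    y = 1   y = 2   y = 3   y = 4   y = 5
x = 1     --      0       1       2       3
x = 2     0       --      0       1       2
x = 3     1       0       --      0       1
x = 4     2       1       0       --      0
x = 5     3       2       1       0       --

\[ E[h(X,Y)] = \sum_{(x,y)} h(x,y)\,p(x,y) = \sum_{x=1}^{5} \sum_{\substack{y=1 \\ y \ne x}}^{5} (|x - y| - 1) \cdot \frac{1}{20} = 1 \]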
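The double sum in Example 5.13 is small enough to enumerate exactly. The sketch below is an illustration only; it uses `fractions.Fraction` so that no rounding occurs, and the variable names are assumptions.

```python
from fractions import Fraction
from itertools import permutations

# All ordered pairs (x, y) of distinct seats among seats 1..5; each has probability 1/20.
pairs = list(permutations(range(1, 6), 2))
p = Fraction(1, len(pairs))

# E[h(X, Y)] with h(x, y) = |x - y| - 1, the number of seats between the two friends.
expected_gap = sum((abs(x - y) - 1) * p for x, y in pairs)
print(expected_gap)
```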

Example 5.14

In Example 5.5, the joint pdf of the amount X of almonds and amount Y of cashews in a 1-lb can of nuts was

\[ f(x,y) = \begin{cases} 24xy, & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0, & \text{otherwise} \end{cases} \]

If 1 lb of almonds costs the company $1.00, 1 lb of cashews costs $1.50, and 1 lb of peanuts costs $0.50, then the total cost of the contents of a can is

h(X,Y) = (1)X + (1.5)Y + (0.5)(1-X-Y) = 0.5 + 0.5X + Y

Example 5.14 (Cont’)

The expected total cost is

\[ E[h(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x,y)\,f(x,y)\,dx\,dy = \int_0^1 \int_0^{1-x} (0.5 + 0.5x + y) \cdot 24xy\,dy\,dx = \$1.10 \]

Note: The method of computing E[h(X1,…, Xn)], the expected value of a function h(X1, …, Xn) of n random variables, is similar to that for two random variables.

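Covariance

The covariance between two rv’s X and Y is

\[ \mathrm{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = \begin{cases} \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x,y), & \text{X, Y discrete} \\ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\,f(x,y)\,dx\,dy, & \text{X, Y continuous} \end{cases} \]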

[Figure: three scatter plots illustrating the different possibilities, each with p(x,y) = 1/10 for every plotted point. The lines x = μX and y = μY divide the plane into quadrants marked + or − according to the sign of (x − μX)(y − μY). (a) positive covariance; (b) negative covariance; (c) covariance near zero.]

Example 5.15

The joint and marginal pmf’s for X = automobile policy deductible amount and Y = homeowner policy deductible amount in Example 5.1 were

p(x,y)      y = 0    y = 100    y = 200
x = 100     0.20     0.10       0.20
x = 250     0.05     0.15       0.30

x        100     250
pX(x)    0.5     0.5

y        0       100     200
pY(y)    0.25    0.25    0.5

From which μX = ∑x·pX(x) = 175 and μY = 125. Therefore

\[ \mathrm{Cov}(X,Y) = \sum_{(x,y)} (x - 175)(y - 125)\,p(x,y) = (100 - 175)(0 - 125)(0.2) + \cdots + (250 - 175)(200 - 125)(0.3) = 1875 \]

Proposition

\[ \mathrm{Cov}(X,Y) = E(XY) - \mu_X \mu_Y \]

Note: \( \mathrm{Cov}(X,X) = E(X^2) - \mu_X^2 = V(X) \)

Example 5.16 (Ex. 5.5 Cont’)

The joint and marginal pdf’s of X = amount of almonds and Y = amount of cashews were

\[ f(x,y) = \begin{cases} 24xy, & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0, & \text{otherwise} \end{cases} \]



\[ f_X(x) = \begin{cases} 12x(1-x)^2, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases} \]

fY(y) can be obtained by replacing x by y in fX(x). It is easily verified that μX = μY = 2/5, and

\[ E(XY) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\,f(x,y)\,dx\,dy = \int_0^1 \int_0^{1-x} xy \cdot 24xy\,dy\,dx = 8 \int_0^1 x^2 (1-x)^3\,dx = 2/15 \]

Thus Cov(X,Y) = 2/15 − (2/5)² = 2/15 − 4/25 = −2/75. A negative covariance is reasonable here because more almonds in the can implies fewer cashews.

Correlation

The correlation coefficient of X and Y, denoted by Corr(X,Y), ρX,Y or just ρ, is defined by

\[ \rho_{X,Y} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \cdot \sigma_Y} \]

It is the normalized version of Cov(X,Y).

Example 5.17

It is easily verified that in the insurance problem of Example 5.15, σX = 75 and σY = 82.92. This gives

ρ = 1875 / [(75)(82.92)] = 0.301
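The covariance and correlation of Examples 5.15 and 5.17 can be recomputed from the joint pmf, once from the definition and once from the shortcut formula Cov(X,Y) = E(XY) − μXμY. The helper `expect` is an assumption introduced for this sketch.

```python
from math import sqrt

# Joint pmf of the insurance example (Example 5.1 / 5.15).
joint = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

def expect(h):
    """E[h(X, Y)] for the discrete joint pmf above."""
    return sum(h(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y))
cov_shortcut = expect(lambda x, y: x * y) - mu_x * mu_y
sigma_x = sqrt(expect(lambda x, y: (x - mu_x) ** 2))
sigma_y = sqrt(expect(lambda x, y: (y - mu_y) ** 2))
rho = cov_definition / (sigma_x * sigma_y)
print(mu_x, mu_y, cov_definition, round(sigma_y, 2), round(rho, 3))
```

Both covariance computations agree, which is exactly the content of the shortcut proposition.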

Proposition

1. If a and c are either both positive or both negative, Corr(aX+b, cY+d) = Corr(X,Y).

2. For any two rv’s X and Y, −1 ≤ Corr(X,Y) ≤ 1.

3. If X and Y are independent, then ρ = 0, but ρ = 0 does not imply independence.

4. ρ = 1 or −1 iff Y = aX+b for some numbers a and b with a ≠ 0.

Example 5.18

Let X and Y be discrete rv’s with joint pmf

\[ p(x,y) = \begin{cases} \frac{1}{4}, & (x,y) = (-4,1),\ (4,-1),\ (2,2),\ (-2,-2) \\ 0, & \text{otherwise} \end{cases} \]

It is evident from the figure that the value of X is completely determined by the value of Y and vice versa, so the two variables are completely dependent.

However, by symmetry μX = μY = 0 and E(XY) = (−4)(1/4) + (−4)(1/4) + (4)(1/4) + (4)(1/4) = 0, so Cov(X,Y) = E(XY) − μXμY = 0 and thus ρX,Y = 0.

Although there is perfect dependence, there is also complete absence of any linear relationship!
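Example 5.18 can be checked exactly with rational arithmetic. This sketch assumes the four support points (−4,1), (4,−1), (2,2), (−2,−2), which are the points consistent with the E(XY) computation on the slide; the variable names are illustrative.

```python
from fractions import Fraction

quarter = Fraction(1, 4)
pmf = {(-4, 1): quarter, (4, -1): quarter, (2, 2): quarter, (-2, -2): quarter}

mu_x = sum(x * p for (x, y), p in pmf.items())
mu_y = sum(y * p for (x, y), p in pmf.items())
e_xy = sum(x * y * p for (x, y), p in pmf.items())
cov = e_xy - mu_x * mu_y

# Dependence check: compare p(-4, 1) with the product of the marginals.
p_x_m4 = sum(p for (x, y), p in pmf.items() if x == -4)
p_y_1 = sum(p for (x, y), p in pmf.items() if y == 1)
independent_at_point = pmf[(-4, 1)] == p_x_m4 * p_y_1
print(cov, independent_at_point)
```

The covariance is zero even though the product rule for independence fails at (−4, 1), which is the point of the example.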

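Another Example

Let (X, Y) be uniformly distributed over the unit disk:

\[ f(x,y) = \begin{cases} \frac{1}{\pi}, & x^2 + y^2 \le 1 \\ 0, & \text{otherwise} \end{cases} \]

Obviously, X and Y are dependent, because the range of possible Y values depends on the value of X. However, we have

\[ \mathrm{Cov}(X,Y) = 0 \]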

Homework

Ex. 24, Ex. 26, Ex. 33, Ex. 35


5.3 Statistics and Their Distributions

Example 5.19

Given a Weibull population with α = 2, β = 5: mean μ = 4.4311, median μ̃ = 4.1628, standard deviation σ = 2.316.

[Figure: the Weibull density curve f(x), plotted for x from 0 to 15 with f(x) between 0 and 0.15.]



Six samples of size 10 from this population (columns are samples, rows are observations):

Observation   Sample 1   Sample 2   Sample 3   Sample 4   Sample 5   Sample 6
1             6.1171     5.07611    3.46710    1.55601    3.12372    8.93795
2             4.1600     6.79279    2.71938    4.56941    6.09685    3.92487
3             3.1950     4.43259    5.88129    4.79870    3.41181    8.76202
4             0.6694     8.55752    5.14915    2.49795    1.65409    7.05569
5             1.8552     6.82487    4.99635    2.33267    2.29512    2.30932
6             5.2316     7.39958    5.86887    4.01295    2.12583    5.94195
7             2.7609     2.14755    6.05918    9.08845    3.20938    6.74166
8             10.2185    8.50628    1.80119    3.25728    3.23209    1.75486
9             5.2438     5.49510    4.21994    3.70132    6.84426    4.91827
10            4.5590     4.04525    2.12934    5.50134    4.20694    7.26081



Statistic            Sample 1   Sample 2   Sample 3   Sample 4   Sample 5   Sample 6
Mean                 4.401      5.928      4.229      4.132      3.620      5.761
Median               4.360      6.144      4.608      3.857      3.221      6.342
Standard Deviation   2.642      2.062      1.611      2.124      1.678      2.496

[Figure: from one population, Sample 1, Sample 2, …, Sample k are drawn. The same function of the sample observations is applied to each sample, giving quantity #1, quantity #2, …, quantity #k. These are the observed values of a statistic.]

Statistic

A statistic is any quantity whose value can be calculated from sample data (with a function).

Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable. A statistic will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.

The probability distribution of a statistic is sometimes referred to as its sampling distribution. It describes how the statistic varies in value across all samples that might be selected.



The probability distribution of any particular statistic depends on

1. The population distribution, e.g. the normal, uniform, etc. , and the corresponding parameters

2. The sample size n (refer to Ex. 5.20 & 5.30)

3. The method of sampling, e.g. sampling with replacement or without replacement



Example

Consider selecting a sample of size n = 2 from a population consisting of just the three values 1, 5, and 10, and suppose that the statistic of interest is the sample variance S².

If sampling is done “with replacement”, then S² = 0 will result if X1 = X2.

If sampling is done “without replacement”, then S² cannot equal 0.

Random Sample

The rv’s X1, X2, …, Xn are said to form a (simple) random sample of size n if

1. The Xi’s are independent rv’s.

2. Every Xi has the same probability distribution.

When conditions 1 and 2 are satisfied, we say that the Xi’s are independent and identically distributed (i.i.d.).

Note: Random sampling is one of the most commonly used sampling methods in practice.

Random Sample

Sampling with replacement, or sampling from an infinite population, is random sampling.

Sampling without replacement from a finite population is generally not considered random sampling. However, if the sample size n is much smaller than the population size N (n/N ≤ 0.05), it is approximately random sampling.

Note: The virtue of the random sampling method is that the probability distribution of any statistic can be obtained more easily than for any other sampling method.

Deriving the Sampling Distribution of a Statistic

Method #1: Calculations based on probability rules, e.g. Examples 5.20 & 5.21

Method #2: Carrying out a simulation experiment, e.g. Examples 5.22 & 5.23

Example 5.20

A large automobile service center charges $40, $45, and $50 for a tune-up of four-, six-, and eight-cylinder cars, respectively. If 20% of its tune-ups are done on four-cylinder cars, 30% on six-cylinder cars, and 50% on eight-cylinder cars, then the probability distribution of revenue from a single randomly selected tune-up is given by

x       40     45     50
p(x)    0.2    0.3    0.5

with μ = 46.5 and σ² = 15.25.

Suppose on a particular day only two servicing jobs involve tune-ups. Let X1 = the revenue from the first tune-up and X2 = the revenue from the second, which constitute a random sample with the above probability distribution.


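x1    x2    p(x1,x2)    x̄       s²
40    40    0.04        40      0
40    45    0.06        42.5    12.5
40    50    0.10        45      50
45    40    0.06        42.5    12.5
45    45    0.09        45      0
45    50    0.15        47.5    12.5
50    40    0.10        45      50
50    45    0.15        47.5    12.5
50    50    0.25        50      0

Sampling distribution of the sample mean:

x̄        40      42.5    45      47.5    50
p(x̄)     0.04    0.12    0.29    0.30    0.25

Sampling distribution of the sample variance:

s²       0       12.5    50
p(s²)    0.38    0.42    0.20

Because the population distribution is known, these can be compared with the population parameters:

\[ \mu_{\bar X} = E(\bar X) = 46.5 = \mu, \qquad \sigma^2_{\bar X} = V(\bar X) = 7.625 = \frac{\sigma^2}{2}, \qquad \mu_{S^2} = E(S^2) = 15.25 = \sigma^2 \]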
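Method #1 for Example 5.20 amounts to enumerating every possible sample, multiplying the probabilities, and accumulating them by the value of the statistic. The function `sampling_distribution` below is a hypothetical helper written for this sketch, not something from the slides.

```python
from collections import defaultdict
from itertools import product

population = {40: 0.2, 45: 0.3, 50: 0.5}   # revenue distribution of Example 5.20

def sampling_distribution(statistic, n):
    """Exact pmf of a statistic over all ordered i.i.d. samples of size n."""
    dist = defaultdict(float)
    for sample in product(population, repeat=n):
        prob = 1.0
        for x in sample:
            prob *= population[x]          # independence: multiply the marginals
        dist[statistic(sample)] += prob
    return dict(dist)

def mean(sample):
    return sum(sample) / len(sample)

def variance(sample):
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

xbar_dist = sampling_distribution(mean, 2)
s2_dist = sampling_distribution(variance, 2)
e_xbar = sum(v * p for v, p in xbar_dist.items())
v_xbar = sum((v - e_xbar) ** 2 * p for v, p in xbar_dist.items())
e_s2 = sum(v * p for v, p in s2_dist.items())
print(xbar_dist, s2_dist, e_xbar, v_xbar, e_s2)
```

Calling `sampling_distribution(mean, 4)` gives the n = 4 distribution of the sample mean in the same way.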



Sampling distribution of the sample mean for n = 2:

x̄        40      42.5    45      47.5    50
p(x̄)     0.04    0.12    0.29    0.30    0.25

Sampling distribution of the sample mean for n = 4:

x̄        40        41.25     42.5      43.75     45        46.25     47.5      48.75     50
p(x̄)     0.0016    0.0096    0.0376    0.0936    0.1761    0.2340    0.2350    0.1500    0.0625

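Example 5.21

The time that it takes to serve a customer at the cash register in a minimarket is a random variable having an exponential distribution with parameter λ. Suppose X1 and X2 are service times for two different customers, assumed independent of each other. Consider the total service time To = X1 + X2 for the two customers, also a statistic. What is the pdf of To? The cdf of To is, for t ≥ 0,

\[ F_{T_o}(t) = P(X_1 + X_2 \le t) = \iint_{\{(x_1, x_2):\ x_1 + x_2 \le t\}} f(x_1, x_2)\,dx_1\,dx_2 \]

\[ = \int_0^t \int_0^{t - x_1} \lambda e^{-\lambda x_1} \cdot \lambda e^{-\lambda x_2}\,dx_2\,dx_1 = \int_0^t \left[ \lambda e^{-\lambda x_1} - \lambda e^{-\lambda t} \right] dx_1 = 1 - e^{-\lambda t} - \lambda t e^{-\lambda t} \]

[Figure: the region of integration in the (x1, x2) plane, bounded by the axes and the line x1 + x2 = t; for fixed x1 the inner integral runs up to the point (x1, t − x1).]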

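Example 5.21 (Cont’)

The pdf of To is obtained by differentiating FTo(t):

\[ f_{T_o}(t) = \begin{cases} \lambda^2 t e^{-\lambda t}, & t \ge 0 \\ 0, & t < 0 \end{cases} \]

This is a gamma pdf (α = 2 and β = 1/λ).

The pdf of \( \bar X = T_o/2 \) is obtained from the relation \( \{\bar X \le \bar x\} \) iff \( \{T_o \le 2\bar x\} \) as

\[ f_{\bar X}(\bar x) = \begin{cases} 4\lambda^2 \bar x e^{-2\lambda \bar x}, & \bar x \ge 0 \\ 0, & \bar x < 0 \end{cases} \]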

Simulation Experiments

This method is usually used when a derivation via probability rules is too difficult or complicated to be carried out. Such an experiment is virtually always done with the aid of a computer. The following characteristics of an experiment must be specified:

1. The statistic of interest (e.g. sample mean, S, etc.)

2. The population distribution (normal with μ = 100 and σ = 15, uniform with lower limit A = 5 and upper limit B = 10, etc.)

3. The sample size n (e.g., n = 10 or n = 50)

4. The number of replications k (e.g., k = 500 or 1000); the actual sampling distribution emerges as k → ∞
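The four ingredients listed above map directly onto the arguments of a small simulation routine. The sketch below is illustrative only: `simulate_statistic`, the fixed seed, and the choice of a normal population with μ = 100, σ = 15, n = 10, k = 500 are assumptions taken from the examples in the list, not a prescribed implementation.

```python
import random

def simulate_statistic(statistic, draw, n, k, seed=0):
    """Draw k samples of size n and return the k observed values of the statistic."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    values = []
    for _ in range(k):
        sample = [draw(rng) for _ in range(n)]
        values.append(statistic(sample))
    return values

def sample_mean(sample):
    return sum(sample) / len(sample)

def draw_normal(rng):
    """One observation from a normal population with mu = 100, sigma = 15."""
    return rng.gauss(100.0, 15.0)

means = simulate_statistic(sample_mean, draw_normal, n=10, k=500)
center = sum(means) / len(means)
spread = (sum((m - center) ** 2 for m in means) / (len(means) - 1)) ** 0.5
print(round(center, 2), round(spread, 2))
```

A histogram of `means` approximates the sampling distribution; its center should be near μ = 100 and its standard deviation near σ/√n = 15/√10 ≈ 4.74, with the agreement improving as k grows.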

Example 5.23

Consider a simulation experiment in which the population distribution is quite skewed. The figure shows the density curve for lifetimes of a certain type of electronic control (actually a lognormal distribution with E(ln(X)) = 3 and V(ln(X)) = 0.16, i.e. standard deviation 0.4).

[Figure: the lognormal density curve f(x), plotted for x from 0 to 75 with f(x) between 0 and 0.05.]

E(X) = μ = 21.7584, V(X) = σ² = 82.1449



Observations from the simulation:

1. The center of the sampling distribution remains at the population mean.

2. As n increases, the sampling distribution becomes less skewed (“more normal”) and more concentrated (“smaller variance”).

Homework

Ex.38, Ex.41


5.4 The Distribution of the Sample Mean

Proposition

Let X1, X2, …, Xn be a random sample (i.i.d. rv’s) from a distribution with mean value μ and standard deviation σ. Then

1. \( E(\bar X) = \mu_{\bar X} = \mu \)

2. \( V(\bar X) = \sigma^2_{\bar X} = \sigma^2 / n \) and \( \sigma_{\bar X} = \sigma / \sqrt{n} \)

In addition, with To = X1 + … + Xn (the sample total),

\[ E(T_o) = n\mu, \quad V(T_o) = n\sigma^2, \quad \text{and} \quad \sigma_{T_o} = \sqrt{n}\,\sigma \]

Refer to Section 5.5 for the proof.

Example 5.24

In a notched tensile fatigue test on a titanium specimen, the expected number of cycles to first acoustic emission (used to indicate crack initiation) is μ = 28,000, and the standard deviation of the number of cycles is σ = 5000.

Let X1, X2, …, X25 be a random sample of size 25, where each Xi is the number of cycles on a different randomly selected specimen. Then

\[ E(\bar X) = \mu = 28{,}000, \qquad E(T_o) = n\mu = 25(28{,}000) = 700{,}000 \]

The standard deviations of \( \bar X \) and To are

\[ \sigma_{\bar X} = \frac{\sigma}{\sqrt{n}} = \frac{5000}{\sqrt{25}} = 1000, \qquad \sigma_{T_o} = \sqrt{n}\,\sigma = \sqrt{25}\,(5000) = 25{,}000 \]

Proposition

Let X1, X2, …, Xn be a random sample from a normal distribution with mean μ and standard deviation σ. Then for any n, \( \bar X \) is normally distributed (with mean μ and standard deviation \( \sigma / \sqrt{n} \)), as is To (with mean nμ and standard deviation \( \sqrt{n}\,\sigma \)).

Example 5.25

The time that it takes a randomly selected rat of a certain subspecies to find its way through a maze is a normally distributed rv with μ = 1.5 min and σ = 0.35 min. Suppose five rats are selected. Let X1, X2, …, X5 denote their times in the maze. Assume the Xi’s are a random sample from this normal distribution.

Q #1: What is the probability that the total time To = X1 + X2 + … + X5 for the five is between 6 and 8 min?

Q #2: Determine the probability that the sample average time \( \bar X \) is at most 2.0 min.

Example 5.25 (Cont’)

A #1: To has a normal distribution with μTo = nμ = 5(1.5) = 7.5 min and variance σ²To = nσ² = 5(0.1225) = 0.6125, so σTo = 0.783 min. To standardize To, subtract μTo and divide by σTo:

\[ P(6 \le T_o \le 8) = P\left( \frac{6 - 7.5}{0.783} \le Z \le \frac{8 - 7.5}{0.783} \right) = P(-1.92 \le Z \le 0.64) = \Phi(0.64) - \Phi(-1.92) = 0.7115 \]

A #2: With \( E(\bar X) = \mu = 1.5 \) and \( \sigma_{\bar X} = \sigma / \sqrt{n} = 0.35 / \sqrt{5} = 0.1565 \),

\[ P(\bar X \le 2.0) = P\left( Z \le \frac{2.0 - 1.5}{0.1565} \right) = P(Z \le 3.19) = \Phi(3.19) = 0.9993 \]
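The two answers in Example 5.25 can be reproduced without a normal table using `statistics.NormalDist` from the Python standard library (available from Python 3.8). The variable names are assumptions for this sketch. Because no rounding of z to two decimals takes place, the results differ slightly in the later decimals from the table-based values 0.7115 and 0.9993.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.5, 0.35, 5

# Distribution of the sample total T_o and of the sample mean for a normal population.
total = NormalDist(mu=n * mu, sigma=sqrt(n) * sigma)
average = NormalDist(mu=mu, sigma=sigma / sqrt(n))

p_total = total.cdf(8.0) - total.cdf(6.0)    # P(6 <= T_o <= 8)
p_average = average.cdf(2.0)                 # P(sample mean <= 2.0)
print(round(p_total, 3), round(p_average, 4))
```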

The Central Limit Theorem (CLT)

Let X1, X2, …, Xn be a random sample from a distribution (which may or may not be normal) with mean μ and variance σ².

Then if n is sufficiently large, \( \bar X \) has approximately a normal distribution with

\[ \mu_{\bar X} = \mu, \qquad \sigma^2_{\bar X} = \sigma^2 / n \]

To also has approximately a normal distribution with

\[ \mu_{T_o} = n\mu, \qquad \sigma^2_{T_o} = n\sigma^2 \]

The larger the value of n, the better the approximation.

Usually, if n > 30, the Central Limit Theorem can be used.

[Figure: an example for the uniform distribution.]

[Figure: an example for the triangular distribution.]

Example 5.26

When a batch of a certain chemical product is prepared, the amount of a particular impurity in the batch is a random variable with mean value 4.0 g and standard deviation 1.5 g. If 50 batches are independently prepared, what is the (approximate) probability that the sample average amount of impurity \( \bar X \) is between 3.5 and 3.8 g?

Here n = 50 is large enough for the CLT to be applicable. \( \bar X \) then has approximately a normal distribution with mean value \( \mu_{\bar X} = 4.0 \) and \( \sigma_{\bar X} = 1.5 / \sqrt{50} = 0.2121 \), so

\[ P(3.5 \le \bar X \le 3.8) \approx P\left( \frac{3.5 - 4.0}{0.2121} \le Z \le \frac{3.8 - 4.0}{0.2121} \right) = \Phi(-0.94) - \Phi(-2.36) = 0.1645 \]

Example 5.27

A certain consumer organization customarily reports the number of major defects for each new automobile that it tests. Suppose the number of such defects for a certain model is a random variable with mean value 3.2 and standard deviation 2.4. Among 100 randomly selected cars of this model, how likely is it that the sample average number of major defects exceeds 4?

Let Xi denote the number of major defects for the ith car in the random sample. Notice that Xi is a discrete rv, but the CLT is applicable whether the variable of interest is discrete or continuous.


Using \( \mu_{\bar X} = 3.2 \) and \( \sigma_{\bar X} = 2.4 / \sqrt{100} = 0.24 \),

\[ P(\bar X > 4) \approx P\left( Z > \frac{4 - 3.2}{0.24} \right) = 1 - \Phi(3.33) = 0.0004 \]

Other Applications of the CLT

The CLT can be used to justify the normal approximation to the binomial distribution discussed in Chapter 4. Recall that a binomial variable X is the number of successes in a binomial experiment consisting of n independent success/failure trials with p = P(S) for any particular trial. Define new rv’s X1, X2, …, Xn by


\[ X_i = \begin{cases} 1, & \text{if the ith trial results in a success} \\ 0, & \text{if the ith trial results in a failure} \end{cases} \qquad (i = 1, \ldots, n) \]

Because the trials are independent and P(S) is constant from trial to trial, the Xi’s are i.i.d. (a random sample from a Bernoulli distribution).

The CLT then implies that if n is sufficiently large, both the sum and the average of the Xi’s have approximately normal distributions. Now the binomial rv is X = X1 + … + Xn, and X/n is the sample mean of the Xi’s. That is, both X and X/n are approximately normal when n is large.

The necessary sample size for this approximation depends on the value of p: when p is close to 0.5, the distribution of Xi is reasonably symmetric. The distribution is quite skewed when p is near 0 or 1.

[Figure: the distribution of Xi on the values 0 and 1 for (a) p = 0.4 and (b) p = 0.1.]

Rule: use np ≥ 10 and n(1−p) ≥ 10, rather than n > 30.

Proposition

Let X1, X2, …, Xn be a random sample from a distribution for which only positive values are possible [P(Xi > 0) = 1]. Then if n is sufficiently large, the product Y = X1X2 · … · Xn has approximately a lognormal distribution.

Note that

ln(Y) = ln(X1) + ln(X2) + … + ln(Xn)

is a sum of i.i.d. rv’s, so by the CLT ln(Y) is approximately normal when n is large.

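Supplement: Law of large numbers

Chebyshev's Inequality

Let X be a random variable (continuous or discrete), then

\[ P(|X - E(X)| \ge \varepsilon) \le \frac{D(X)}{\varepsilon^2}, \quad \varepsilon > 0 \]

Proof (continuous case, with pdf p(x)): Let \( B = \{x: x \le E(X) - \varepsilon\} \cup \{x: x \ge E(X) + \varepsilon\} \), so that \( \frac{(x - E(X))^2}{\varepsilon^2} \ge 1 \) on B. Then

\[ P(|X - E(X)| \ge \varepsilon) = P\left( \frac{|X - E(X)|}{\varepsilon} \ge 1 \right) = P\left( \frac{(X - E(X))^2}{\varepsilon^2} \ge 1 \right) = \int_B p(x)\,dx \]

\[ \le \int_B \frac{(x - E(X))^2}{\varepsilon^2}\,p(x)\,dx \le \int_{-\infty}^{\infty} \frac{(x - E(X))^2}{\varepsilon^2}\,p(x)\,dx = \frac{D(X)}{\varepsilon^2} \]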

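Khintchine law of large numbers

Let X1, X2, ... be an infinite sequence of i.i.d. random variables with finite expected value E(Xk) = µ < ∞ and variance D(Xk) = σ² < ∞. Then

\[ \lim_{n \to \infty} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| < \varepsilon \right) = 1, \quad \varepsilon > 0 \]

Proof: Let

\[ \bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad E(\bar X_n) = \mu; \quad D(\bar X_n) = \frac{\sigma^2}{n} \]

According to Chebyshev's inequality,

\[ P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| < \varepsilon \right) = P(|\bar X_n - E(\bar X_n)| < \varepsilon) \ge 1 - \frac{D(\bar X_n)}{\varepsilon^2} = 1 - \frac{\sigma^2}{n \varepsilon^2} \]

and \( \frac{\sigma^2}{n \varepsilon^2} \to 0 \) as \( n \to \infty \). Hence

\[ 1 \ge \lim_{n \to \infty} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| < \varepsilon \right) \ge 1, \quad \text{so} \quad \lim_{n \to \infty} P\left( \left| \frac{1}{n} \sum_{i=1}^{n} X_i - \mu \right| < \varepsilon \right) = 1 \]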

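Bernoulli law of large numbers

The empirical probability of success in a series of Bernoulli trials Ai will converge to the theoretical probability.

Let n(A) be the number of replications on which A does occur. Then we have

\[ n(A) = \sum_{i=1}^{n} A_i, \qquad A_i = \begin{cases} 1, & \text{A occurs on trial i} \\ 0, & \text{otherwise} \end{cases} \]

where each Ai has the distribution

Ai    1    0
p     p    1-p

so that

\[ E\left( \frac{n(A)}{n} \right) = p, \qquad D\left( \frac{n(A)}{n} \right) = \frac{p(1-p)}{n} \]

According to Chebyshev's inequality,

\[ P\left( \left| \frac{n(A)}{n} - p \right| < \varepsilon \right) \ge 1 - \frac{p(1-p)}{n \varepsilon^2} \]

and \( \frac{p(1-p)}{n \varepsilon^2} \to 0 \) as \( n \to \infty \). Hence

\[ 1 \ge \lim_{n \to \infty} P\left( \left| \frac{n(A)}{n} - p \right| < \varepsilon \right) \ge 1, \quad \text{so} \quad \lim_{n \to \infty} P\left( \left| \frac{n(A)}{n} - p \right| < \varepsilon \right) = 1 \]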
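The Bernoulli law of large numbers can be illustrated by simulating the relative frequency n(A)/n for increasing n and comparing with the Chebyshev lower bound from the proof. This is a sketch under stated assumptions: p = 0.3, the sample sizes, the seed, and the helper names are all chosen only for illustration.

```python
import random

def relative_frequency(p, n, rng):
    """n(A)/n for n independent Bernoulli trials with success probability p."""
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n

def chebyshev_lower_bound(p, n, eps):
    """Lower bound 1 - p(1-p)/(n eps^2) on P(|n(A)/n - p| < eps)."""
    return 1.0 - p * (1.0 - p) / (n * eps ** 2)

rng = random.Random(0)                       # fixed seed so the run is repeatable
p = 0.3
freqs = {n: relative_frequency(p, n, rng) for n in (100, 10_000, 200_000)}
bound = chebyshev_lower_bound(p, 10_000, 0.05)
print(freqs, round(bound, 4))
```

The bound is conservative: it guarantees only that the probability is at least the stated value, while the simulated relative frequencies typically sit much closer to p than ε.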


[Figure: relative frequency n(A)/n (vertical axis, from 0 to 1) plotted against the number of experiments performed (horizontal axis, 1, 2, 3, …, 100, 101); the level p is marked on the vertical axis.]

Homework

Ex. 48, Ex. 51, Ex. 55, Ex. 56


5.5 The Distribution of a Linear Combination

Linear Combination

Given a collection of n random variables X1, …, Xn and n numerical constants a1, …, an, the rv

\[ Y = a_1 X_1 + \cdots + a_n X_n = \sum_{i=1}^{n} a_i X_i \]

is called a linear combination of the Xi’s.

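Let X1, X2, …, Xn have mean values μ1, …, μn, respectively, and variances σ1², …, σn², respectively.

1. Whether or not the Xi’s are independent,

\[ E\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i E(X_i) = \sum_{i=1}^{n} a_i \mu_i \]

2. If X1, X2, …, Xn are independent,

\[ V\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i^2 V(X_i) = \sum_{i=1}^{n} a_i^2 \sigma_i^2 \quad \text{and} \quad \sigma_{a_1 X_1 + \cdots + a_n X_n} = \sqrt{a_1^2 \sigma_1^2 + \cdots + a_n^2 \sigma_n^2} \]

3. For any X1, X2, …, Xn,

\[ V\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}(X_i, X_j) \]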

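Proof: For the result concerning expected values, suppose that the Xi’s are continuous with joint pdf f(x1,…,xn). Then

\[ E\left( \sum_{i=1}^{n} a_i X_i \right) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( \sum_{i=1}^{n} a_i x_i \right) f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \]

\[ = \sum_{i=1}^{n} a_i \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} x_i\,f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = \sum_{i=1}^{n} a_i \int_{-\infty}^{\infty} x_i\,f_{X_i}(x_i)\,dx_i = \sum_{i=1}^{n} a_i E(X_i) \]

which establishes

\[ E\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} a_i E(X_i) = \sum_{i=1}^{n} a_i \mu_i \]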

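Proof of

\[ V\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}(X_i, X_j) \]

By the definition of variance,

\[ V\left( \sum_{i=1}^{n} a_i X_i \right) = E\left[ \left( \sum_{i=1}^{n} a_i X_i - \sum_{i=1}^{n} a_i \mu_i \right)^2 \right] = E\left[ \left( \sum_{i=1}^{n} a_i (X_i - \mu_i) \right)^2 \right] \]

\[ = E\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j (X_i - \mu_i)(X_j - \mu_j) \right] = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j E[(X_i - \mu_i)(X_j - \mu_j)] = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}(X_i, X_j) \]

When the Xi’s are independent, Cov(Xi, Xj) = 0 for i ≠ j, and

\[ V\left( \sum_{i=1}^{n} a_i X_i \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}(X_i, X_j) = \sum_{i=1}^{n} a_i^2 V(X_i) \]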

Example 5.28

A gas station sells three grades of gasoline: regular unleaded, extra unleaded, and super unleaded. These are priced at $1.20, $1.35, and $1.50 per gallon, respectively. Let X1, X2 and X3 denote the amounts of these grades purchased (in gallons) on a particular day. Suppose the Xi’s are independent with μ1 = 1000, μ2 = 500, μ3 = 300, σ1 = 100, σ2 = 80, and σ3 = 50. The revenue from sales is Y = 1.2X1 + 1.35X2 + 1.5X3. Compute E(Y), V(Y), σY.

\[ E(Y) = 1.2\mu_1 + 1.35\mu_2 + 1.5\mu_3 = \$2325 \]

\[ V(Y) = (1.2)^2 \sigma_1^2 + (1.35)^2 \sigma_2^2 + (1.5)^2 \sigma_3^2 = 31{,}689 \]

\[ \sigma_Y = \sqrt{31{,}689} = \$178.01 \]
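For independent Xi’s, the mean and variance of a linear combination follow directly from the two formulas E(Σ aiXi) = Σ aiμi and V(Σ aiXi) = Σ ai²σi². The helper name `linear_combination_moments` is an assumption for this sketch, which reproduces the numbers of Example 5.28.

```python
from math import sqrt

def linear_combination_moments(a, mu, sigma):
    """Mean, variance and standard deviation of sum(a_i * X_i)
    for independent X_i with means mu_i and standard deviations sigma_i."""
    mean = sum(ai * mi for ai, mi in zip(a, mu))
    var = sum((ai * si) ** 2 for ai, si in zip(a, sigma))
    return mean, var, sqrt(var)

# Example 5.28: prices per gallon, mean gallons sold, and standard deviations.
mean_y, var_y, sd_y = linear_combination_moments(
    a=(1.2, 1.35, 1.5), mu=(1000, 500, 300), sigma=(100, 80, 50)
)
print(round(mean_y, 2), round(var_y, 2), round(sd_y, 2))
```

The variance formula used here relies on independence; with dependent Xi’s the covariance terms from the general formula would have to be added.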

Corollary (the difference between two rv’s)

E(X1−X2) = E(X1) − E(X2) and, if X1 and X2 are independent, V(X1−X2) = V(X1) + V(X2).

Example 5.29

A certain automobile manufacturer equips a particular model with either a six-cylinder engine or a four-cylinder engine. Let X1 and X2 be fuel efficiencies for independently and randomly selected six-cylinder and four-cylinder cars, respectively. With μ1 = 22, μ2 = 26, σ1 = 1.2, and σ2 = 1.5,

\[ E(X_1 - X_2) = \mu_1 - \mu_2 = 22 - 26 = -4 \]

\[ V(X_1 - X_2) = \sigma_1^2 + \sigma_2^2 = (1.2)^2 + (1.5)^2 = 3.69 \]

\[ \sigma_{X_1 - X_2} = \sqrt{3.69} = 1.92 \]

Proposition

If X1, X2, …, Xn are independent, normally distributed rv’s (with possibly different means and/or variances), then any linear combination of the Xi’s also has a normal distribution.

Example 5.30 (Ex. 5.28 Cont’)

The total revenue from the sale of the three grades of gasoline on a particular day was Y = 1.2X1 + 1.35X2 + 1.5X3, and we calculated μY = 2325 and σY = 178.01. If the Xi’s are normally distributed, the probability that the revenue exceeds 2500 is

\[ P(Y > 2500) = P\left( Z > \frac{2500 - 2325}{178.01} \right) = P(Z > 0.98) = 1 - \Phi(0.98) = 0.1635 \]

Homework

Ex. 58, Ex. 70, Ex. 73

