
The Principle of Maximum Likelihood

Prof. Carol Tanner
September 24, 2013

Assumptions

✴ Sample: multiple measurements of the same thing, assumed to be a random subset of the parent population, distributed according to the parent distribution.

Data: $\{X_1, \ldots, X_n\}$, where $n$ = number of measurements

$$\lim_{n\to\infty}\,\{X_1, \ldots, X_n\} = \text{Parent Population}$$

Parent Distribution

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Parent Distribution

✴ Probability Distribution: the probability density function describing the parent population from which it is assumed the data are chosen.

• Normalization: $\int P(x)\,dx = 1$

• Expectation value of $f(x)$: $\overline{f(x)} = \int f(x)\,P(x)\,dx$

• Mean (true value): $\mu = \int x\,P(x)\,dx$

• Variance: $\sigma^2 = \int (x-\mu)^2\,P(x)\,dx$

• Parent standard deviation: $\sigma = \sqrt{\sigma^2}$

$P(x)$ has units of probability per unit $x$.
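As a quick numerical check of these definitions (a minimal sketch, not part of the lecture; the values of $\mu$ and $\sigma$ and the integration grid are arbitrary choices), the integrals can be approximated with a Riemann sum:

```python
import numpy as np

# Gaussian parent distribution with arbitrary mu = 2.0, sigma = 1.5
mu, sigma = 2.0, 1.5
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 100_001)
dx = x[1] - x[0]
P = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

norm = np.sum(P) * dx                   # normalization: ~1
mean = np.sum(x * P) * dx               # mean (true value): ~mu
var = np.sum((x - mean) ** 2 * P) * dx  # variance: ~sigma**2

print(norm, mean, np.sqrt(var))         # ~1.0, ~2.0, ~1.5
```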

Probability

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

$$dp = P(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\, dx$$

$$\Delta p = P(x)\,\Delta x = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\, \Delta x$$

$$\text{probability} = \int_{x_1}^{x_2} P(x)\,dx$$

[Figure: Gaussian density P(x), with the shaded area between x₁ and x₂ representing the probability of a measurement falling in that interval.]

$P(x)$ has units of probability per unit $x$.
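The interval probability has a closed form in terms of the error function. A minimal sketch (the values of $\mu$, $\sigma$, $x_1$, $x_2$ are illustrative, not from the slides):

```python
from math import erf, sqrt

def gaussian_prob(x1, x2, mu, sigma):
    """Probability that a measurement falls in [x1, x2] for a Gaussian parent distribution."""
    # Gaussian CDF expressed with the error function
    cdf = lambda t: 0.5 * (1.0 + erf((t - mu) / (sigma * sqrt(2.0))))
    return cdf(x2) - cdf(x1)

# Example: probability of landing within one sigma of the mean (~0.683)
print(gaussian_prob(0.5, 3.5, mu=2.0, sigma=1.5))
```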


Combining Probabilities

Example: Two Coins

#1     #2     p₁ × p₂
Head   Head   1/2 × 1/2 = 1/4
Head   Tail   1/2 × 1/2 = 1/4
Tail   Head   1/2 × 1/2 = 1/4
Tail   Tail   1/2 × 1/2 = 1/4

“And” multiplies probabilities; “or” adds them:

(Head and Tail) or (Tail and Head): 1/4 + 1/4 = 1/2
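A brute-force check of the and/or rules (a small sketch, not from the slides):

```python
from itertools import product

# Enumerate all equally likely outcomes for two fair coins
outcomes = list(product(["Head", "Tail"], repeat=2))
p = 1 / len(outcomes)   # each outcome has probability 1/2 * 1/2 = 1/4

# "And": both coins must match one specific outcome -> multiply
p_head_head = sum(p for o in outcomes if o == ("Head", "Head"))

# "Or": one head and one tail in either order -> add the two cases
p_one_each = sum(p for o in outcomes if set(o) == {"Head", "Tail"})

print(p_head_head, p_one_each)   # 0.25, 0.5
```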

What is the best estimate of the “true value” based on our data?

What is the probability of obtaining this data set?

Data: $\{X_1 \pm \sigma_1,\; X_2 \pm \sigma_2,\; X_3 \pm \sigma_3,\; \ldots,\; X_n \pm \sigma_n\}$

$$P(X_1) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1-\bar{X}}{\sigma_1}\right)^2} \qquad
P(X_2) = \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_2-\bar{X}}{\sigma_2}\right)^2}$$

$$P(X_3) = \frac{1}{\sigma_3\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_3-\bar{X}}{\sigma_3}\right)^2} \qquad \ldots \qquad
P(X_n) = \frac{1}{\sigma_n\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n-\bar{X}}{\sigma_n}\right)^2}$$

$$P(X_1, X_2, X_3, \ldots, X_n) = P(X_1)\,P(X_2)\,P(X_3)\cdots P(X_n)$$

Let $\bar{X}$ be an unknown.

$$P(X_1, X_2, X_3, \ldots, X_n)
= \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_1-\bar{X}}{\sigma_1}\right)^2}
\cdot \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_2-\bar{X}}{\sigma_2}\right)^2}
\cdots \frac{1}{\sigma_n\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_n-\bar{X}}{\sigma_n}\right)^2}$$

$$P(X_1, X_2, X_3, \ldots, X_n) = \prod_{i=1}^{n} \left[\frac{1}{\sigma_i\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_i-\bar{X}}{\sigma_i}\right)^2}\right]$$

$$F(\bar{X}) = \prod_{i=1}^{n} \left[\frac{1}{\sigma_i\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_i-\bar{X}}{\sigma_i}\right)^2}\right]$$

Assume the best estimate of the “true value” is the value of $\bar{X}$ that maximizes the probability of obtaining this data set. That is, the most likely value for $\bar{X}$ is the most probable value.

Let $\bar{X}$ be a variable and maximize $F(\bar{X})$.

$$F(\bar{X}) = \prod_{i=1}^{n} \left[\frac{1}{\sigma_i\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X_i-\bar{X}}{\sigma_i}\right)^2}\right]
= \prod_{i=1}^{n} \left[\frac{1}{\sigma_i\sqrt{2\pi}}\right] e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma_i}\right)^2}$$

$$\frac{\partial F(\bar{X})}{\partial \bar{X}}
= \prod_{i=1}^{n} \left[\frac{1}{\sigma_i\sqrt{2\pi}}\right] \frac{\partial}{\partial \bar{X}}\, e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma_i}\right)^2}$$

$$= C\, e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma_i}\right)^2}\,
\frac{\partial}{\partial \bar{X}}\left\{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma_i}\right)^2\right\}$$

where $C$ collects the constant factors.

$$= C\, e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma_i}\right)^2}
\left\{-\frac{1}{2}\sum_{i=1}^{n}\frac{2\,(X_i-\bar{X})(-1)}{\sigma_i^2}\right\}$$

$$= C\, e^{-\frac{1}{2}\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma_i}\right)^2}
\left\{\sum_{i=1}^{n}\frac{X_i-\bar{X}}{\sigma_i^2}\right\} = 0$$

Set the derivative equal to zero. Since the exponential factor is never zero, the sum in braces must vanish:

$$0 = \sum_{i=1}^{n}\frac{X_i-\bar{X}}{\sigma_i^2}
= \sum_{i=1}^{n}\frac{X_i}{\sigma_i^2} - \sum_{i=1}^{n}\frac{\bar{X}}{\sigma_i^2}
= \sum_{i=1}^{n}\frac{X_i}{\sigma_i^2} - \bar{X}\sum_{i=1}^{n}\frac{1}{\sigma_i^2}$$

$$\bar{X}\sum_{i=1}^{n}\frac{1}{\sigma_i^2} = \sum_{i=1}^{n}\frac{X_i}{\sigma_i^2}$$

Solve for $\bar{X}$:

$$\bar{X} = \frac{\displaystyle\sum_{i=1}^{n}\frac{X_i}{\sigma_i^2}}{\displaystyle\sum_{i=1}^{n}\frac{1}{\sigma_i^2}}$$

This is the weighted mean as previously defined, with weights

$$w_i = \frac{1}{\sigma_i^2}, \qquad \bar{X} = \frac{\displaystyle\sum_{i=1}^{n} w_i X_i}{\displaystyle\sum_{i=1}^{n} w_i}$$
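As a sanity check (a minimal sketch with made-up data; the measurements and uncertainties are illustrative, not from the lecture), the weighted mean can be computed directly and compared against a brute-force maximization of the likelihood:

```python
import numpy as np

# Hypothetical measurements X_i with individual uncertainties sigma_i
X = np.array([4.9, 5.2, 5.0, 5.4])
sig = np.array([0.1, 0.3, 0.2, 0.5])

# Weighted mean with weights w_i = 1 / sigma_i**2
w = 1.0 / sig**2
xbar = np.sum(w * X) / np.sum(w)

# Brute-force check: scan candidate values of Xbar and maximize the log of
# F(Xbar); the log turns the product into a sum and has the same maximum
grid = np.linspace(4.0, 6.0, 200_001)
loglik = -0.5 * np.sum(((X[:, None] - grid[None, :]) / sig[:, None]) ** 2, axis=0)
xbar_ml = grid[np.argmax(loglik)]

print(xbar, xbar_ml)   # the two agree to grid resolution
```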

What is the uncertainty in the weighted mean?

Use the error propagation formula:

$$\Delta f(a,b,c) = \sqrt{\left(\frac{\partial f}{\partial a}\right)^2 (\Delta a)^2
+ \left(\frac{\partial f}{\partial b}\right)^2 (\Delta b)^2
+ \left(\frac{\partial f}{\partial c}\right)^2 (\Delta c)^2}$$

With

$$\bar{X} = \frac{\displaystyle\sum_{i=1}^{n}\frac{X_i}{\sigma_i^2}}{\displaystyle\sum_{i=1}^{n}\frac{1}{\sigma_i^2}}$$

the propagated uncertainty is

$$(\Delta\bar{X})^2 = \sum_{i=1}^{n}\left(\frac{\partial \bar{X}}{\partial X_i}\right)^2 \sigma_i^2
= \sum_{i=1}^{n}\left(\frac{\frac{1}{\sigma_i^2}}{\sum_{j=1}^{n}\frac{1}{\sigma_j^2}}\right)^2 \sigma_i^2
= \frac{\displaystyle\sum_{i=1}^{n}\frac{1}{\sigma_i^2}}{\left(\displaystyle\sum_{i=1}^{n}\frac{1}{\sigma_i^2}\right)^2}
= \frac{1}{\displaystyle\sum_{i=1}^{n}\frac{1}{\sigma_i^2}}$$

$$\Delta\bar{X} = \frac{1}{\sqrt{\displaystyle\sum_{i=1}^{n}\frac{1}{\sigma_i^2}}}$$
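A Monte Carlo cross-check of this formula (a sketch; the uncertainties reuse the hypothetical values above and the true value is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same hypothetical uncertainties as above
sig = np.array([0.1, 0.3, 0.2, 0.5])
w = 1.0 / sig**2

# Analytic uncertainty of the weighted mean
dxbar = 1.0 / np.sqrt(np.sum(w))

# Monte Carlo check: draw many synthetic data sets around a true value,
# form the weighted mean of each, and look at the scatter of the results
true = 5.0
samples = rng.normal(true, sig, size=(100_000, sig.size))
xbars = samples @ w / np.sum(w)

print(dxbar, xbars.std())   # the two agree closely
```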

Note that if the individual uncertainties are all the same, $\sigma_i = \sigma$, then the weighted mean reduces to the sample mean as previously defined. Also note the standard error (uncertainty) in the mean:

$$\bar{X} = \frac{\frac{1}{\sigma^2}\displaystyle\sum_{i=1}^{n} X_i}{\frac{1}{\sigma^2}\displaystyle\sum_{i=1}^{n} 1}
= \frac{1}{n}\sum_{i=1}^{n} X_i$$

$$\Delta\bar{X} = \frac{1}{\sqrt{\displaystyle\sum_{i=1}^{n}\frac{1}{\sigma_i^2}}}
\;\xrightarrow{\;\sigma_i \,=\, \sigma\;}\;
\Delta\bar{X} = \frac{\sigma}{\sqrt{n}}$$
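A quick simulation of the $\sigma/\sqrt{n}$ behavior (a sketch; the sample size and $\sigma$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Equal uncertainties: each measurement scatters with the same sigma
sigma, n, trials = 1.5, 25, 100_000

# Standard deviation of the sample mean over many repeated experiments
means = rng.normal(0.0, sigma, size=(trials, n)).mean(axis=1)

print(means.std(), sigma / np.sqrt(n))   # both ~0.3
```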

What about the error propagation formula?

$$\Delta f(x,y) = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 (\Delta x)^2
+ \left(\frac{\partial f}{\partial y}\right)^2 (\Delta y)^2}$$

Look at 4 Cases of Error Propagation

• Measured quantity plus a fixed number: $q = x + A$, given $x \pm \sigma_x$

• Measured quantity times a fixed number: $q = Bx$, given $x \pm \sigma_x$

• Sum of two measured quantities: $q = x + y$, given $x \pm \sigma_x$ and $y \pm \sigma_y$

• General case: $q = f(x,y)$, given $x \pm \sigma_x$ and $y \pm \sigma_y$

Measured quantity plus a fixed number

Given: $x \pm \sigma_x$, with

$$P(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-x_T}{\sigma_x}\right)^2}$$

Find the new probability distribution for $q = x + A$:

$$P(q) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{q-A-x_T}{\sigma_x}\right)^2}
= \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{q-(x_T+A)}{\sigma_x}\right)^2}
= \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{q-q_T}{\sigma_x}\right)^2}$$

$$\sigma_q = \sigma_x, \qquad q = (x + A) \pm \sigma_x$$

Measured quantity times a fixed number

Given: $x \pm \sigma_x$, with

$$P(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-x_T}{\sigma_x}\right)^2}$$

Find the new probability distribution for $q = Bx$:

$$P(q) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{q/B - x_T}{\sigma_x}\right)^2}
= \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{q - Bx_T}{B\sigma_x}\right)^2}
= \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{q - q_T}{B\sigma_x}\right)^2}$$

$$\sigma_q = |B|\,\sigma_x, \qquad q = Bx \pm |B|\,\sigma_x$$
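Both linear cases can be checked by transforming Gaussian draws (a sketch; the values of $x_T$, $\sigma_x$, $A$, and $B$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical measurement x with true value x_T and uncertainty sigma_x
x_T, sigma_x, A, B = 5.0, 0.4, 3.0, -2.5
x = rng.normal(x_T, sigma_x, size=100_000)

# q = x + A: the distribution just shifts, so sigma_q = sigma_x
print((x + A).std(), sigma_x)

# q = B*x: the distribution rescales, so sigma_q = |B| * sigma_x
print((B * x).std(), abs(B) * sigma_x)
```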

Sum of Two Measured Quantities

Given: $x \pm \sigma_x$ and $y \pm \sigma_y$ (for simplicity, take the true values $x_T = y_T = 0$), with

$$P(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x}{\sigma_x}\right)^2}, \qquad
P(y) = \frac{1}{\sigma_y\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y}{\sigma_y}\right)^2}$$

Show: for $q = x + y$,

$$\sigma_q^2 = \sigma_x^2 + \sigma_y^2, \qquad q = (x+y) \pm \sqrt{\sigma_x^2 + \sigma_y^2}$$

The joint distribution factorizes:

$$P(x,y) = P(x)\,P(y)
= C_x\, e^{-\frac{1}{2}\left(\frac{x}{\sigma_x}\right)^2}\, C_y\, e^{-\frac{1}{2}\left(\frac{y}{\sigma_y}\right)^2}
= C\, e^{-\frac{1}{2}\left[\left(\frac{x}{\sigma_x}\right)^2 + \left(\frac{y}{\sigma_y}\right)^2\right]}$$

Complete the square in the exponent:

$$\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}
= \frac{(x+y)^2}{\sigma_x^2+\sigma_y^2}
+ \frac{(\sigma_y^2\, x - \sigma_x^2\, y)^2}{\sigma_x^2\,\sigma_y^2\,(\sigma_x^2+\sigma_y^2)}
= \frac{(x+y)^2}{\sigma_x^2+\sigma_y^2} + z^2$$

$$P(x,y) = C\, e^{-\frac{1}{2}\frac{(x+y)^2}{\sigma_x^2+\sigma_y^2}}\, e^{-\frac{1}{2}z^2} = P(x+y)\,P(z)$$

Integrate out $z$:

$$P(x+y)\int_{-\infty}^{+\infty} P(z)\,dz
= C\, e^{-\frac{1}{2}\frac{(x+y)^2}{\sigma_x^2+\sigma_y^2}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}z^2}\,dz
= C\, e^{-\frac{1}{2}\frac{(x+y)^2}{\sigma_x^2+\sigma_y^2}}\,\sqrt{2\pi}$$

$$P(x+y) = C'\, e^{-\frac{1}{2}\frac{(x+y)^2}{\sigma_x^2+\sigma_y^2}}$$

This is a Gaussian in $x+y$ with width

$$\sigma_q = \sqrt{\sigma_x^2 + \sigma_y^2}$$
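A Monte Carlo check that widths add in quadrature (a sketch; the sigmas are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

# Independent Gaussian scatter in x and y (arbitrary sigmas)
sigma_x, sigma_y = 0.3, 0.4
x = rng.normal(0.0, sigma_x, size=200_000)
y = rng.normal(0.0, sigma_y, size=200_000)

# The spread of q = x + y is the quadrature sum, not 0.3 + 0.4
q = x + y
print(q.std(), np.hypot(sigma_x, sigma_y))   # both ~0.5
```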

General Case

Given: $x \pm \sigma_x$ and $y \pm \sigma_y$, with $q = f(x,y)$.

Expand $f$ about the true values:

$$q \approx f(x_T, y_T) + \frac{\partial f}{\partial x}(x - x_T) + \frac{\partial f}{\partial y}(y - y_T)$$

$$q - q_T \approx \frac{\partial f}{\partial x}(x - x_T) + \frac{\partial f}{\partial y}(y - y_T)$$

This has the form $q - q_T \approx B(x - A) + D(y - C)$, a combination of the previous three cases, so

$$\sigma_q^2 = B^2\,\sigma_x^2 + D^2\,\sigma_y^2
= \left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2
+ \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2$$

$$\sigma_q = \Delta f(x,y)
= \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2
+ \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2}$$
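A sketch applying the general formula to a hypothetical $f(x,y) = xy$, with partial derivatives taken by finite differences and a Monte Carlo cross-check (all numerical values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

def propagate(f, x, y, sigma_x, sigma_y, h=1e-6):
    """General error propagation via finite-difference partial derivatives."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return np.sqrt((dfdx * sigma_x) ** 2 + (dfdy * sigma_y) ** 2)

# Hypothetical example: f(x, y) = x * y around (x_T, y_T) = (3, 4)
f = lambda x, y: x * y
sigma_q = propagate(f, 3.0, 4.0, sigma_x=0.05, sigma_y=0.08)

# Monte Carlo cross-check: scatter of f over Gaussian draws of x and y
xs = rng.normal(3.0, 0.05, size=200_000)
ys = rng.normal(4.0, 0.08, size=200_000)
print(sigma_q, f(xs, ys).std())   # both ~0.31
```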

In Conclusion

• We have seen how the principle of maximum likelihood motivates the definitions of statistical parameters that are derived from the data.

• We have seen how the error propagation formula follows from the assumed parent probability distribution determined from the data.

The End