Modern Methods of Data Analysis - WS 07/08 Stephanie Hansmann-Menzemer

Modern Methods of Data Analysis

Lecture IX (10.12.07)

Contents:

● Least Square Method (II)
● Bayesian Parameter Estimates

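Error Estimate

● One σ interval: $\chi^2(a) = \chi^2_{\min} + 1$ (not per dof!)
– shape is parabolic by definition (for a problem linear in the parameters)
– n-dim contours are ellipsoids

● Compare LH: $\ln L(a) = \ln L_{\max} - \tfrac{1}{2}$
– the LH function is only approximately parabolic; its shape is most of the time asymmetric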
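
The Δχ² = 1 rule can be checked numerically. The following is a minimal sketch, assuming the simplest linear problem: a constant a fitted to measurements y_i with uncertainties σ_i. The data values and the names `chi2`, `a_hat` and `sigma_a` are made up for illustration.

```python
import math

# Hypothetical measurements y_i with Gaussian uncertainties sigma_i.
y = [10.2, 9.7, 10.5, 9.9]
sigma = [0.3, 0.4, 0.5, 0.3]

def chi2(a):
    """chi^2 of the constant model f(x) = a."""
    return sum(((yi - a) / si) ** 2 for yi, si in zip(y, sigma))

# Analytic least-squares solution: the weighted mean and its uncertainty.
weights = [1.0 / si ** 2 for si in sigma]
a_hat = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
sigma_a = 1.0 / math.sqrt(sum(weights))

# For a model linear in a, chi^2 is an exact parabola, so moving one
# sigma away from the minimum raises chi^2 by exactly 1 (not 1 per dof).
delta_up = chi2(a_hat + sigma_a) - chi2(a_hat)
delta_down = chi2(a_hat - sigma_a) - chi2(a_hat)
print(a_hat, sigma_a, delta_up, delta_down)
```

Both differences come out as 1 up to rounding, independent of the number of data points.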

Least Square with Binned Data

● Very often we do not want to estimate the parameters of a function y = f(x), but those of a pdf f(x).
● Bin the data; define $f_j(a)$ as the expected rate in bin j:
$\chi^2 = \sum_j \frac{(n_j - f_j(a))^2}{\sigma_j^2}$, with $n_j$ the data entries in bin j
● What is $\sigma_j$?

Least Square with Binned Data (II)

● The rate in bin j is Poisson distributed with $\sigma_j^2 = f_j(a)$
● $\chi^2 = \sum_j \frac{(n_j - f_j(a))^2}{f_j(a)}$
● In general this is highly nonlinear in a
● Approximate $\sigma_j^2 \approx n_j$ [only valid for large $n_j$ (> 10)]
● What to do with empty bins? Skip them ...
● After all this, it is not a surprise that the convergence of the binned χ² fit is not at all optimal. Better use binned ML for histogram fits, at least for histograms with few entries.
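
The effect of the variance approximation can be illustrated with a flat histogram, where each method has a closed-form solution. This is only a sketch: the counts are invented, the single-parameter model (constant rate mu in every bin) is an assumption made to keep the algebra trivial, and the function names are hypothetical.

```python
import math

# Hypothetical low-statistics histogram that should be flat (constant rate mu).
counts = [3, 5, 2, 7, 4, 6, 1, 4]

def neyman_chi2_estimate(n):
    """Minimise sum_j (n_j - mu)^2 / n_j, i.e. sigma_j^2 approximated by n_j.
    Empty bins are skipped, since they would give a division by zero."""
    filled = [nj for nj in n if nj > 0]
    return len(filled) / sum(1.0 / nj for nj in filled)

def pearson_chi2_estimate(n):
    """Minimise sum_j (n_j - mu)^2 / mu, i.e. sigma_j^2 = expected rate."""
    return math.sqrt(sum(nj ** 2 for nj in n) / len(n))

def poisson_ml_estimate(n):
    """Maximise the binned Poisson likelihood prod_j mu^n_j exp(-mu) / n_j!."""
    return sum(n) / len(n)

mu_neyman = neyman_chi2_estimate(counts)
mu_pearson = pearson_chi2_estimate(counts)
mu_ml = poisson_ml_estimate(counts)
print(mu_neyman, mu_ml, mu_pearson)
```

With σ_j² ≈ n_j the estimate is the harmonic mean of the bin contents and is pulled down; with σ_j² equal to the expected rate it is pulled up; the binned ML estimate is the plain average.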

Non Linear Least Square

● What to do if the problem is not linear in a?
● Develop a first-order Taylor expansion around a starting value as close as possible to the true value
● Solve the linearized problem
● Take the result as starting value for the next round of iterations
● ... until it converges
● With bad starting values, chances are high to end up at a false minimum.
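
The iteration described above can be sketched as a Gauss-Newton loop. The example assumes a two-parameter model f(x) = a·exp(−b·x) with unit uncertainties and noise-free toy data; the model, the data and the function name `gauss_newton` are illustrative choices, not taken from the lecture.

```python
import math

def model(x, a, b):
    return a * math.exp(-b * x)

def gauss_newton(xs, ys, a, b, max_iter=50, tol=1e-12):
    """Iterate the linearised least-squares problem for f(x) = a*exp(-b*x).
    Unit uncertainties are assumed for all points."""
    for _ in range(max_iter):
        # Accumulate the normal equations (J^T J) delta = J^T r.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(-b * x)
            r = y - a * e            # residual at the current parameters
            da = e                   # df/da
            db = -a * x * e          # df/db
            s_aa += da * da
            s_ab += da * db
            s_bb += db * db
            g_a += da * r
            g_b += db * r
        # Solve the 2x2 linear system for the parameter step.
        det = s_aa * s_bb - s_ab * s_ab
        step_a = (s_bb * g_a - s_ab * g_b) / det
        step_b = (s_aa * g_b - s_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [model(x, 2.0, 0.5) for x in xs]   # noise-free toy data, true (a, b) = (2, 0.5)
a_fit, b_fit = gauss_newton(xs, ys, a=1.5, b=0.4)
print(a_fit, b_fit)
```

The starting point (1.5, 0.4) is close enough to the true values for the loop to converge; starting far away can make the steps overshoot or land in a different minimum.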

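χ² with additional constraints

● Assume the fit has to satisfy certain boundary conditions, which can be expressed as $g_k(a) = 0$
● Introduce Lagrange multipliers $\lambda_k$: $\chi^2(a) \rightarrow \chi^2(a) + \sum_k \lambda_k\, g_k(a)$
● Solve the set of equations: $\frac{\partial}{\partial a_i}\big[\chi^2 + \sum_k \lambda_k g_k\big] = 0$ and $\frac{\partial}{\partial \lambda_k}\big[\chi^2 + \sum_k \lambda_k g_k\big] = g_k(a) = 0$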

Example

● Measurement of all three angles of a triangle: α, β, γ (all measurements with the same uncertainty):
– constraint: α + β + γ = 180°
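
A minimal sketch of how this example works out with one Lagrange multiplier. The measured values are hypothetical, and the convention χ² + λ·g (no extra factor in front of λ) is an assumption; a different convention only rescales λ and leaves the fitted angles unchanged.

```python
import math

# Hypothetical measured angles in degrees, all with the same uncertainty.
measured = [63.0, 34.0, 85.0]
sigma = 1.0

# Minimise sum_i (a_i - m_i)^2 / sigma^2 + lam * (a_1 + a_2 + a_3 - 180).
# d/da_i = 0 gives a_i = m_i - lam * sigma^2 / 2 for every angle;
# d/dlam = 0 gives the constraint a_1 + a_2 + a_3 = 180, which fixes lam.
excess = sum(measured) - 180.0
lam = 2.0 * excess / (len(measured) * sigma ** 2)
fitted = [m - lam * sigma ** 2 / 2.0 for m in measured]

# Each fitted angle is (2/3) of its own measurement minus (1/3) of each of
# the other two (plus a constant), so its variance is (4/9 + 1/9 + 1/9) sigma^2.
sigma_fitted = sigma * math.sqrt(2.0 / 3.0)
print(fitted, sum(fitted), sigma_fitted)
```

With equal uncertainties the excess over 180° is shared equally between the three angles, and the constraint reduces the uncertainty of each angle by a factor √(2/3).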

Likelihood vs. χ² (I)

● χ² fit is fastest, easiest
– Works fine at high statistics
– Gives absolute goodness-of-fit indication
– If linear in parameters, then analytically solvable
– Makes Gaussian error assumption on low-statistics bins
– Ignores 0-entry bins
– Misses information with feature size < bin size

● Full Maximum Likelihood estimator is most robust
– No Gaussian assumption made at low statistics
– Use if only a handful of events
– No information lost due to binning
– Gives best error of all methods (esp. at low statistics)
– No intrinsic goodness-of-fit measure (best might be pretty bad)
– Has bias proportional to 1/N
– Can be computationally expensive for large number of events

Likelihood vs. χ² (II)

● Binned Maximum Likelihood, somewhere in between
– Much faster than full Maximum Likelihood
– Correct Poisson treatment of low-statistics bins
– Can use χ² test afterwards
– Misses information with feature size < bin size
– Has bias proportional to 1/N

Re: Bayes' Theorem (1)

Conditional probability: $P(A|B) = \frac{P(A \cap B)}{P(B)}$

Due to $P(A \cap B) = P(B \cap A)$ it follows: $P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$

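Re: Bayes' Theorem (2)

Important through the interpretation A = theory, B = data:

$P(\text{theory}|\text{data}) = \frac{P(\text{data}|\text{theory})\,P(\text{theory})}{P(\text{data})}$

– Posterior: $P(\text{theory}|\text{data})$
– Likelihood: $P(\text{data}|\text{theory})$
– Prior: $P(\text{theory})$
– Evidence: $P(\text{data})$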

Examples: Rare Disease

Probability of disease A: P(A) = 0.001, P(not A) = 0.999
Test for disease:
P(+|A) = 0.98, P(+|not A) = 0.03
P(−|A) = 0.02, P(−|not A) = 0.97

Need to be worried if you get “+” as result? What is the posterior probability?

$P(A|+) = \frac{P(+|A)\,P(A)}{P(+|A)\,P(A) + P(+|\text{not } A)\,P(\text{not } A)} = 0.032$
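
The arithmetic behind the quoted posterior can be written out directly from the numbers on this slide; only the variable names are invented here.

```python
# Numbers from the slide.
p_a = 0.001                 # prior probability of having the disease
p_pos_given_a = 0.98        # P(+|A)
p_pos_given_not_a = 0.03    # P(+|not A)

# Evidence: total probability of a positive test result.
p_not_a = 1.0 - p_a
p_pos = p_pos_given_a * p_a + p_pos_given_not_a * p_not_a

# Bayes' theorem: posterior probability of the disease given "+".
p_a_given_pos = p_pos_given_a * p_a / p_pos
print(round(p_a_given_pos, 3))  # 0.032
```

Even with a positive result the disease remains unlikely, because the false positives from the large healthy population outnumber the true positives.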

Bayesian Parameter Estimate

● Bayesian ansatz:
– treat parameters as random variables
– knowledge about parameters is described by a PDF
– the likelihood function is a conditional probability
● use Bayes' theorem to translate measurements into updated knowledge about parameters
● consider the following scenario
– measurements x with likelihood function p(x|a)
– probability of the measurements is p(x)
– prior distribution of the parameter p(a)
– posterior distribution of the parameter p(a|x)

Bayesian use of Bayes' Theorem

● $p(a|x) = \frac{p(x|a)\,p(a)}{p(x)}$

The absolute probability for the measurements is usually not known, but can be inferred from the completeness relation to yield:

$p(a|x) = \frac{p(x|a)\,p(a)}{\int p(x|a')\,p(a')\,da'}$

● note:
– p(a|x) is a pdf in a, thus normalized
– p(a|x) depends on p(a)
– Bayes' theorem improves (pre)existing knowledge through measurements
– the concept of an objectively existing parameter is not required

Bayesian Interpretation of p(a|x)

All available information about a is contained in p(a|x). Possible interpretations are ...

● “best” estimate
– Median
– Maximum Likelihood value
● error interval, in which the (true) parameter value is located with probability p
– central interval
– minimum size interval

Example: Coin

Throwing a coin N times, obtaining H heads, with head probability p: $P(H|p,N) = \binom{N}{H}\,p^H (1-p)^{N-H}$

The Bayesian approach has a proper description for parameter bounds.
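
For the coin, the posterior of the head probability p can be evaluated on a grid and summarised with the "best" estimates and error intervals listed for p(a|x). The sketch below assumes a uniform prior on [0, 1] and a made-up outcome of 3 heads in 10 throws; the grid size and all names are illustrative. The minimum size interval is built by collecting grid cells in order of decreasing posterior density, which gives one connected interval only for a single-peaked posterior.

```python
N, H = 10, 3          # hypothetical outcome: 3 heads in 10 throws
M = 10000             # number of grid points in p
grid = [(i + 0.5) / M for i in range(M)]

# Uniform prior on [0, 1]: the posterior is proportional to the binomial likelihood.
post = [p ** H * (1.0 - p) ** (N - H) for p in grid]
norm = sum(post)
post = [w / norm for w in post]     # probability per grid cell

# "Best" estimates: maximum of the posterior and median.
mode = grid[max(range(M), key=lambda i: post[i])]
cum, cdf = 0.0, []
for w in post:
    cum += w
    cdf.append(cum)

def quantile(q):
    """Smallest grid point whose cumulative probability reaches q."""
    return next(p for p, c in zip(grid, cdf) if c >= q)

median = quantile(0.5)

# Central 68% interval: 16% of the probability in each tail.
central = (quantile(0.16), quantile(0.84))

# Minimum size 68% interval: add cells in order of decreasing density.
order = sorted(range(M), key=lambda i: post[i], reverse=True)
content, chosen = 0.0, []
for i in order:
    chosen.append(grid[i])
    content += post[i]
    if content >= 0.68:
        break
minimal = (min(chosen), max(chosen))
print(mode, median, central, minimal)
```

Because the grid only covers 0 ≤ p ≤ 1, the physical bounds of the parameter are respected automatically: neither interval can extend outside the allowed range.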

Re: Unphysical Result in “normal” Likelihood

● Likelihood often sum of two or more components (signal + background)

● e.g. toy MC with 6 signal & 60 background events

Although a negative number of signal events is unphysical, such results need to be used when combining with other experiments; otherwise the combination is biased.

Example Coin: Bayesian Likelihood

With a uniform prior distribution the posterior is mathematically identical to the “normal” (frequentist) likelihood.

Example Coin: Bayesian Likelihood

● Use two different priors
– assuming an almost unbiased coin:
● Gaussian distribution around 0.5
– assuming a very biased coin, but without excluding any other unlikely but possible hypothesis:
● probability not zero in the center!

Example Coin: Bayesian Likelihood

For large N the prior no longer has any impact. Be careful not to exclude unlikely but possible results with the prior!
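
How quickly the prior loses its influence can be checked on a grid. This is a sketch under assumptions: the head counts (7 of 10 and 700 of 1000), the Gaussian prior width of 0.1 and all function names are illustrative choices, not values from the lecture.

```python
import math

M = 2000
grid = [(i + 0.5) / M for i in range(M)]   # cell centres, avoids p = 0 and p = 1

def posterior_mean(N, H, log_prior):
    """Posterior mean of the head probability p, evaluated on the grid."""
    logw = [H * math.log(p) + (N - H) * math.log(1.0 - p) + log_prior(p)
            for p in grid]
    top = max(logw)                        # subtract the maximum to avoid underflow
    w = [math.exp(v - top) for v in logw]
    return sum(p * wi for p, wi in zip(grid, w)) / sum(w)

def uniform(p):
    """Log of a flat prior (constant, so zero up to normalisation)."""
    return 0.0

def gaussian(p, mean=0.5, width=0.1):      # width is an arbitrary illustrative choice
    """Log of a Gaussian prior around an unbiased coin."""
    return -0.5 * ((p - mean) / width) ** 2

shift_small = abs(posterior_mean(10, 7, uniform) - posterior_mean(10, 7, gaussian))
shift_large = abs(posterior_mean(1000, 700, uniform) - posterior_mean(1000, 700, gaussian))
print(shift_small, shift_large)
```

With 10 throws the two priors give visibly different posterior means; with 1000 throws the difference is far below the width of the posterior. A prior that is exactly zero somewhere would never recover there, whatever N is.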

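Example: Lifetime Measurement (I)

● conditional probability of a single measurement: $p(t|\tau) = \frac{1}{\tau}\,e^{-t/\tau}$

● conditional probability for n measurements: $p(t_1,\dots,t_n|\tau) = \prod_{i=1}^{n} \frac{1}{\tau}\,e^{-t_i/\tau}$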

● Ansatz for a prior distribution (not normalized): with a free parameter k

● With the substitution and

Example: Lifetime Measurement (II)

The conditional probability depends on the parameter k of the prior.

Consistency: Add more measurements ...

● Use the posterior of the first n measurements as prior for m new measurements

● with and

Adding more measurements ...

Adding m measurements, with the posterior of the first n measurements as prior probability, is equivalent to using n+m measurements right away.

Consistency check of Bayesian probability!
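
This consistency can be verified numerically on a grid in the lifetime. The sketch assumes exponentially distributed decay times; the data values, the grid and the 1/τ prior are illustrative choices, and any other prior that is non-zero on the grid would work the same way.

```python
import math

first = [1.2, 0.4, 2.1]     # hypothetical first n = 3 decay times
second = [0.7, 1.6]         # hypothetical m = 2 additional decay times

M = 4000
taus = [0.01 * (i + 1) for i in range(M)]   # grid in the lifetime tau

def likelihood(times, tau):
    """Product of exponential pdfs (1/tau) * exp(-t/tau)."""
    return math.prod(math.exp(-t / tau) / tau for t in times)

def update(prior, times):
    """Multiply prior and likelihood on the grid and normalise."""
    post = [p * likelihood(times, tau) for p, tau in zip(prior, taus)]
    norm = sum(post)
    return [w / norm for w in post]

prior = [1.0 / tau for tau in taus]         # 1/tau prior, one possible choice

# Posterior of the first sample used as prior for the second sample ...
step_by_step = update(update(prior, first), second)
# ... compared with analysing all n + m measurements at once.
all_at_once = update(prior, first + second)
max_diff = max(abs(a - b) for a, b in zip(step_by_step, all_at_once))
print(max_diff)
```

The two posteriors agree to rounding precision, because the normalisation constant of the intermediate posterior cancels in the second update.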

Choice of Prior Distribution

Consider the Bayesian Maximum Likelihood estimator. The result is a function of k, i.e. it depends on the choice of the prior, and it is a biased estimator ...

● the prior is the result of treating parameters like random variables; a very controversial issue
● the prior should neither exclude nor prefer any value
● it should be uniform over the allowed phase space
– uniform in a -> frequentist approach
– uniform in f(a)??? (e.g. decay length instead of lifetime)!
● look for a physics argument which determines the prior based on the structure or symmetry of the problem!
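
The dependence on k can be made explicit under one assumption: that the prior ansatz of the lifetime example is a power law, $p(\tau) \propto \tau^{-k}$, with k = 0 the uniform prior and k = 1 the 1/τ prior. This functional form and the symbols $\bar{t}$ and $\hat{\tau}$ are assumptions of this sketch, chosen because they reproduce the interval limits quoted for the lifetime example. Under that assumption the posterior and its maximum would read:

```latex
p(\tau \mid t_1,\dots,t_n) \;\propto\; \tau^{-k}\,\prod_{i=1}^{n}\frac{1}{\tau}\,e^{-t_i/\tau}
  \;=\; \tau^{-(n+k)}\,e^{-n\bar{t}/\tau},
\qquad \bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i

\frac{\partial}{\partial\tau}\ln p(\tau \mid t_1,\dots,t_n)
  = -\frac{n+k}{\tau} + \frac{n\,\bar{t}}{\tau^{2}} = 0
\quad\Longrightarrow\quad
\hat{\tau} = \frac{n\,\bar{t}}{n+k},
\qquad
E[\hat{\tau}] = \frac{n}{n+k}\,\tau
```

In this form the estimator is unbiased only for k = 0, and the bias vanishes for n → ∞.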

Translation and Scale Invariance

● Location parameters should not depend on a shift of the coordinate system: $p(a) = \text{const}$ (uniform prior) in case of complete ignorance

● Quantities which are associated with size or magnitude are called scale parameters; they should not depend on the units (e.g. ns versus ps): $p(a) \propto 1/a$ (Jeffreys' prior) in case of complete ignorance
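
Both priors follow from requiring that the probability content p(a) da does not change under the corresponding transformation. A short derivation, with c denoting an arbitrary shift or a positive scale factor (notation chosen here):

```latex
% Translation invariance: a' = a + c, da' = da
p(a)\,da = p(a')\,da' = p(a + c)\,da \quad \forall\, c
\;\Longrightarrow\; p(a) = \text{const}

% Scale invariance: a' = c\,a, da' = c\,da
p(a)\,da = p(a')\,da' = p(c\,a)\,c\,da \quad \forall\, c > 0
\;\Longrightarrow\; p(a) \propto \frac{1}{a}
```

Equivalently, the scale-invariant prior is uniform in ln a, so changing the unit from ns to ps only shifts ln a by a constant.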

Example: Lifetime Measurement (III)

● numerical checks of Bayesian error interval:
– true value ; n=5 & n=25
– compare Jeffreys' prior with uniform prior
– study central and minimal 68% intervals

           Jeffreys' prior              uniform prior
central    2.8487 < T(5) < 7.1469       2.0928 < T(5) < 5.9038
minimal    3.5185 < T(5) < 9.4375       2.7013 < T(5) < 8.3333
central    20.056 < T(25) < 29.935      19.159 < T(25) < 28.834
minimal    21.063 < T(25) < 31.658      20.158 < T(25) < 30.563
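
The central limits in the table can be reproduced with a short calculation, under assumptions that are not stated in this transcript: T is read as the sum of the n decay times in units of the true lifetime, and the prior is taken as $\tau^{-k}$ (k = 1 for Jeffreys', k = 0 for uniform). With these assumptions the posterior of u = Σt/τ is a Gamma distribution with shape n + k − 1, and the central 68% interval in τ contains the true value exactly when T lies between the 16% and 84% quantiles of that distribution. All function names are hypothetical.

```python
import math

def gamma_cdf(x, shape):
    """CDF of a Gamma distribution with integer shape and unit scale:
    1 - exp(-x) * sum_{j < shape} x^j / j!"""
    term, total = 1.0, 1.0
    for j in range(1, shape):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total

def gamma_quantile(q, shape):
    """Invert the CDF by bisection."""
    lo, hi = 0.0, 10.0 * shape + 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def central_interval(n, k, content=0.68):
    """Range of T for which the central interval covers the true lifetime.
    Assumes a prior tau^(-k): k = 1 Jeffreys', k = 0 uniform."""
    shape = n + k - 1
    tail = 0.5 * (1.0 - content)
    return gamma_quantile(tail, shape), gamma_quantile(1.0 - tail, shape)

for n in (5, 25):
    print(n, central_interval(n, 1), central_interval(n, 0))
```

The minimal intervals would need an additional search for the shortest interval in τ and are not covered by this sketch.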

Example: Lifetime Measurement (IV)

● Results obtained with Toy Monte Carlo samples
● The Bayesian probability content of the intervals given on the previous slide is 68% in all cases. The actual probability content (coverage) of these intervals is given in the table:

                Jeffreys'    uniform
central (5)     67.998%      64.082%
minimal (5)     67.994%      78.048%
central (25)    68.082%      67.366%
minimal (25)    68.066%      70.033%
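
A toy Monte Carlo of this kind can be sketched in a few lines. The interval limits are copied from the interval table of the lifetime example; the reading of T as the sum of n exponentially distributed decay times with true lifetime 1, the number of toys and the random seed are assumptions of this sketch.

```python
import random

# T ranges copied from the interval table: (Jeffreys' prior, uniform prior).
intervals = {
    ("central", 5): ((2.8487, 7.1469), (2.0928, 5.9038)),
    ("minimal", 5): ((3.5185, 9.4375), (2.7013, 8.3333)),
    ("central", 25): ((20.056, 29.935), (19.159, 28.834)),
    ("minimal", 25): ((21.063, 31.658), (20.158, 30.563)),
}

def coverage(n, lo, hi, n_toys, rng):
    """Fraction of toy experiments whose T = sum of n decay times
    (true lifetime set to 1) falls inside [lo, hi]."""
    inside = 0
    for _ in range(n_toys):
        t_sum = sum(rng.expovariate(1.0) for _ in range(n))
        if lo < t_sum < hi:
            inside += 1
    return inside / n_toys

rng = random.Random(12345)   # fixed seed, arbitrary choice
results = {}
for (kind, n), (jeffreys, uniform) in intervals.items():
    results[(kind, n)] = (coverage(n, *jeffreys, 10000, rng),
                          coverage(n, *uniform, 10000, rng))
    print(kind, n, results[(kind, n)])
```

With 10000 toys per entry the statistical uncertainty is about 0.5%, enough to see that the intervals from the uniform prior deviate from 68% at n = 5 while those from Jeffreys' prior do not.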

Summary: Bayesian Parameter Estimates

● properties of Bayesian parameter estimates:
– simple and powerful method
– mathematically consistent concept
– a critical point is the choice of the prior distribution
– the interpretation of the parameter PDF is subjective
– error intervals are exact
● in the lifetime example:
– fit unbiased, error intervals biased (uniform prior)
– fit biased, error intervals unbiased (Jeffreys' prior)
● in the limit of large statistics, the choice of prior is less critical

Note: In general, Bayesian parameter estimates are biased. The same often applies to normal ML. Frequentists give high priority to bias correction; for Bayesians this is less important.

Constraining Flavour Physics (I)

Constraining Flavour Physics (II)

UT fit versus CKM fitter

● CKM fitter: Frequentist approach
– theory predictions give ranges for parameters; assume uniform distribution in allowed parameter space
● UT fit: Bayesian approach
– theory predictions are interpreted as probability functions

Frequentist CL for α

Bayesian pdf for α depending on theory input

Bayesians vs. Frequentists

There are strongly rivaling schools behind these approaches.

For the moment this looks like a philosophical question; the difference in interpretation will become clearer once we discuss confidence levels and statistical and systematic uncertainties. Again: the math is the same!

A frequentist is a person whose lifetime ambition is to be wrong 5% of the time.

A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.

Bayesians vs. Frequentists

Someone studying large sample of potential carriers of disease:

Prior probability: overall fraction of people who carry the disease
Posterior probability: fraction of people who are carriers out of those with a positive test result

A specific individual, however, may be interested in the subjective probability:

Prior probability: degree of belief that one has the disease before the test
Posterior probability: degree of belief that one has the disease after the test

Frequentist point of view: the probability that the individual has the disease is 0 or 1, we just don't know. Bayesians say it is 0.1%.