
Post on 28-Mar-2015


Page 1: A Flavour of Errors in Variables Modelling Jonathan Gillard GillardJW@Cardiff.ac.uk

A Flavour of Errors in Variables Modelling

Jonathan Gillard

GillardJW@Cardiff.ac.uk


Constructing the Model

• We have two variables, ξ and η.

• ξ and η are linearly related in the form η = α + βξ.

• Instead of observing n pairs (ξi, ηi) we observe the n data pairs (xi, yi), where

xi = ξi + δi

yi = ηi + εi

and it is assumed that δi and εi are independent error terms having zero mean and variances σδ² and σε² respectively.
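The model above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `simulate_eiv`, the uniform distribution for the latent ξ, its range, and the Gaussian errors are all choices made here, not something the slides specify.

```python
import random


def simulate_eiv(n, alpha, beta, sigma_delta, sigma_eps, seed=0):
    """Simulate n observed pairs (x_i, y_i) from eta = alpha + beta * xi.

    Assumptions (not from the slides): xi is uniform on [100, 135] and
    the errors delta and eps are independent zero-mean Gaussians.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        xi = rng.uniform(100.0, 135.0)             # latent true value
        eta = alpha + beta * xi                    # exact linear relation
        xs.append(xi + rng.gauss(0.0, sigma_delta))  # x = xi + delta
        ys.append(eta + rng.gauss(0.0, sigma_eps))   # y = eta + eps
    return xs, ys


# Hypothetical parameter values, chosen only for illustration.
xs, ys = simulate_eiv(n=200, alpha=1.0, beta=2.0, sigma_delta=2.0, sigma_eps=5.0)
```

Only the pairs (x, y) are returned, mirroring the fact that ξ and η are never observed directly.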


Down’s Syndrome

• Affects 1 in 1000 children born in the UK.

• Down’s is caused by the presence of an extra chromosome. An extra copy of chromosome 21 is included when the sperm and the egg combine to form the embryo.

• Screening tests are used to calculate the chance of a baby having the condition.


The Data Set

[Figure: plot of Log AFP against Gestational Age.]


How can we fit a line?

• There are clearly errors in both variables.

• “To use standard statistical techniques of estimation to estimate β, one needs additional information about the variance of the estimators” – Madansky (1959)

• We know the dating error is ±2 days – this is enough information!


Method of Moments

• “The method of moments has a long history, involves an enormous amount of literature, has been through periods of severe turmoil associated with its sampling properties compared to other estimation procedures, yet survives as an effective tool, easily implemented and of wide generality”

– Bowman and Shenton


Method of Moments

• “The maximum likelihood approach to estimation is primarily justified by asymptotic (as the sample size goes to infinity) considerations”

– Cheng and Van Ness


Estimating the Parameters

• As the dating error is ±2 days, we take σδ = 2.

• Use a modified ‘y on x’ regression estimator: β = sxy / (sxx − σδ²).

• Other parameters, e.g. the intercept α, can be estimated from the method of moments equations.
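A minimal sketch of the modified ‘y on x’ estimator follows. It makes several assumptions: the name `mom_fit_known_sigma_delta` is hypothetical, the sample moments use divisor n (the slides do not say whether n or n − 1 is used), and the intercept comes from the first-moment equation ȳ = α + βx̄, which is the usual choice but is not spelled out in the slides.

```python
from statistics import fmean


def mom_fit_known_sigma_delta(xs, ys, sigma_delta):
    """Method-of-moments fit of y = alpha + beta * x when sigma_delta is known.

    Returns (alpha, beta) with beta = sxy / (sxx - sigma_delta**2).
    """
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Second-order sample moments (divisor n is an assumption).
    sxx = sum((x - x_bar) ** 2 for x in xs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    denom = sxx - sigma_delta ** 2
    if denom <= 0:
        # The correction is only usable when sxx exceeds the error variance.
        raise ValueError("sxx must exceed sigma_delta**2")
    beta = sxy / denom
    alpha = y_bar - beta * x_bar  # from the first-moment equation
    return alpha, beta


# Tiny made-up data set, purely to show the call.
alpha_hat, beta_hat = mom_fit_known_sigma_delta([0.0, 2.0, 4.0], [0.0, 1.0, 2.0], 1.0)
```

Setting `sigma_delta` to 0 reduces this to the ordinary ‘y on x’ least squares line, which is a quick way to sanity-check the function.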


Regression Lines

[Figure: Log AFP against Gestational Age, showing the (x, y) pairs and three fitted lines: y on x, x on y, and σδ known.]


Typology of Residuals

Typology of residuals (Haslett):

• Conditional residuals: “local”

• Innovation residuals: “white noise”

• Marginal residuals: “global”

What are residuals used for?

1. Prediction

2. Model checking

3. Leverage

4. Influence

5. Deletion


Estimating the true points

• Two naive method of moments estimators of ξ:

x, with Var[x] = σδ²

(y − α) / β, with Var[(y − α) / β] = σε² / β²

The optimal linear combination is:

ξ̂ = [σε² x + β σδ² (y − α)] / (σε² + β² σδ²)
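Combining the two naive estimators x and (y − α)/β with weights inversely proportional to their variances can be sketched as below. The function name `estimate_true_point` is hypothetical, and passing σε² in as a known value is an assumption made here: the slides only state that σδ is known, so in practice σε² would itself have to be estimated.

```python
def estimate_true_point(x, y, alpha, beta, var_delta, var_eps):
    """Inverse-variance weighted estimate of the true point (xi, eta).

    The two naive estimators of xi are x (variance var_delta) and
    (y - alpha) / beta (variance var_eps / beta**2).
    """
    xi_from_x = x
    xi_from_y = (y - alpha) / beta
    # Weight on x; the remaining weight goes to the estimate based on y.
    w = var_eps / (var_eps + beta ** 2 * var_delta)
    xi_hat = w * xi_from_x + (1.0 - w) * xi_from_y
    eta_hat = alpha + beta * xi_hat  # estimated point lies on the fitted line
    return xi_hat, eta_hat


# Made-up numbers: x suggests xi = 10, y suggests xi = (25 - 1) / 2 = 12.
xi_hat, eta_hat = estimate_true_point(10.0, 25.0, 1.0, 2.0, var_delta=1.0, var_eps=4.0)
```

With var_delta = 1 and var_eps = 4 the two naive estimators have equal variance, so the combined estimate sits halfway between them.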


The Estimated True Points

[Figure: Log AFP against Gestational Age, showing the observed (x, y) pairs and the estimated (ξ, η) pairs.]


Estimated true against observed

[Figure: estimated ξ against observed x, with the line y = x for reference.]


A residual?

• Attempt to write as a usual regression model:

y = α + βx + (ε − βδ)

1. x is always random due to random error

2. Cov(x, ε − βδ) = −βσδ²

3. Using ordinary least squares estimates leads to inconsistent estimators
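The inconsistency of ordinary least squares can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the function name `slopes`, the Gaussian latent ξ with variance σ² (a symbol not used in the slides), and the parameter values. In large samples the ‘y on x’ slope tends to βσ² / (σ² + σδ²) rather than β, while subtracting σδ² from sxx removes the bias.

```python
import random
from statistics import fmean


def slopes(n=20000, alpha=1.0, beta=2.0, sd_xi=2.0, sd_delta=2.0, sd_eps=1.0, seed=1):
    """Return (ordinary 'y on x' slope, slope corrected for known sd_delta)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        xi = rng.gauss(0.0, sd_xi)                  # latent true value
        xs.append(xi + rng.gauss(0.0, sd_delta))    # x = xi + delta
        ys.append(alpha + beta * xi + rng.gauss(0.0, sd_eps))  # y = eta + eps
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    return sxy / sxx, sxy / (sxx - sd_delta ** 2)


ols_slope, corrected_slope = slopes()
# Large-sample limits for these settings:
#   ordinary slope  -> beta * 4 / (4 + 4) = 1.0
#   corrected slope -> beta = 2.0
print(ols_slope, corrected_slope)
```

The error variance in x was deliberately set equal to the variance of ξ so that the attenuation (a halving of the slope) is easy to see.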


Residuals

[Figure: residuals plotted against Gestational Age.]


Residuals again!

[Figure: residuals plotted against Estimated Gestational Age.]


Questions?