spatial point processes and their applications in epidemiology

Spatial Statistics for Epidemiology Spatial Point Processes By Liu Xu U086105E Supervisor: Prof Loh Wei Liem Department of Statistics and Applied Probability National University of Singapore 15 March 2012 1


Page 1: Spatial Point Processes and Their Applications in Epidemiology

Spatial Statistics for Epidemiology —Spatial Point Processes

By Liu Xu U086105E

Supervisor: Prof Loh Wei Liem

Department of Statistics and Applied Probability

National University of Singapore

15 March 2012


Page 2: Spatial Point Processes and Their Applications in Epidemiology

Outline

Application in epidemiology

Theory: descriptive statistics

Models: spatial point processes

Data: spatial point patterns


Page 3: Spatial Point Processes and Their Applications in Epidemiology

A spatial point pattern is …

Space → R^d, d ≥ 2

Points → data values

Pattern → arrangement

Intro Models Theory Application 3

Page 4: Spatial Point Processes and Their Applications in Epidemiology

Example: tropical rainforest


Page 5: Spatial Point Processes and Their Applications in Epidemiology

Example: cancer cases

Marks: extra information attached to points, categorical or continuous

Page 6: Spatial Point Processes and Their Applications in Epidemiology

Example: the Milky Way galaxy

Page 7: Spatial Point Processes and Their Applications in Epidemiology

Types of point patterns

regularity (repulsion) ← CSR → clustering (attraction)

Page 8: Spatial Point Processes and Their Applications in Epidemiology

Aim: describe and model “pattern”

Are points randomly located?

• If so, find a statistical model to describe the “randomness”;

• If not, …


Page 9: Spatial Point Processes and Their Applications in Epidemiology

Models: spatial point processes

• A spatial point process is a stochastic process X which generates a countable set of events in a defined space.

• A spatial pattern x = {x1, x2, …, xn} on an observational region W, generated from a spatial point process, is a realization of the process.

• We consider only point processes in 2-D space.

• The locations of any objects can be modelled: plants, animals, cells, stars, disease cases, earthquakes, …

Page 10: Spatial Point Processes and Their Applications in Epidemiology

Models: spatial point processes

• Notation:

W: study region in R^2

N(A): number of events inside subregion A, A ⊆ W

|A|: area of region A

s: an arbitrary location in W

ds: infinitesimal region centered at s

• Assumptions on spatial point processes:

i. Locally finite: the number of events in any bounded region is finite

ii. Simple: at any location s, there is either one event or no event at all

Page 11: Spatial Point Processes and Their Applications in Epidemiology

HPP

A spatial point process in a bounded region W in R^2 is a homogeneous Poisson process (HPP) if:

i. For every subregion A in W, N(A) ~ Poi(λ|A|), where 0 < λ < ∞ is a constant called the intensity (homogeneity).

ii. If A1 and A2 are two disjoint subregions in W, then N(A1) and N(A2) are independent (independence).

• Standard model for complete spatial randomness (CSR);

• Can be generalized to more complicated models;

• A reference process when analyzing the spatial characteristics of a specific pattern.

Page 12: Spatial Point Processes and Their Applications in Epidemiology

IPP

A spatial point process in a bounded region W in R^2 is an inhomogeneous Poisson process (IPP) if:

i. For any subregion A in W, N(A) ~ Poi(∫_A λ(s) ds), where 0 < λ(s) < ∞ is the intensity at s.

ii. If A1 and A2 are two disjoint subregions in W, then N(A1) and N(A2) are independent (independence).

The IPP is a generalization of the HPP; the HPP is the special case with constant intensity λ(s) = λ.

Page 13: Spatial Point Processes and Their Applications in Epidemiology

Simulation from Poisson processes

Two Poisson process realizations on the unit square having the same expected number of events = 100.
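One way such realizations could be produced is sketched below. This is an illustration under stated assumptions, not the code behind the slide: the function names `simulate_hpp` and `simulate_ipp` are hypothetical, the inhomogeneous case uses thinning of a homogeneous process, and the intensity 400xy is borrowed from the kernel-smoothing slide because it also has an expected count of 100 on the unit square.

```python
import numpy as np

def simulate_hpp(lam, rng):
    """Homogeneous Poisson process on the unit square with intensity lam."""
    n = rng.poisson(lam)                # N(W) ~ Poi(lam * |W|), with |W| = 1
    return rng.uniform(size=(n, 2))     # given n, events are i.i.d. uniform on W

def simulate_ipp(intensity, lam_max, rng):
    """Inhomogeneous Poisson process on the unit square by thinning.

    intensity: vectorised function (x, y) -> lambda(s), assumed bounded by lam_max.
    """
    candidates = simulate_hpp(lam_max, rng)
    keep_prob = intensity(candidates[:, 0], candidates[:, 1]) / lam_max
    keep = rng.uniform(size=len(candidates)) < keep_prob
    return candidates[keep]

rng = np.random.default_rng(0)
hpp = simulate_hpp(100.0, rng)                               # lambda = 100
ipp = simulate_ipp(lambda x, y: 400.0 * x * y, 400.0, rng)   # lambda(s) = 400xy
print(len(hpp), len(ipp))               # both counts scatter around 100
```

Thinning keeps each candidate event with probability λ(s)/λ_max, so it needs an upper bound on the intensity; here 400 is the maximum of 400xy on the unit square.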

Page 14: Spatial Point Processes and Their Applications in Epidemiology

Summary statistics: first-order

First-order intensity of a spatial point process is:

$\lambda(s) = \lim_{|ds| \to 0} \frac{E[N(ds)]}{|ds|}$

• Interpretation: expected number of events per unit area. For a small region ds, λ(s)|ds| approximates the probability of an event in ds.

• Intensity may be constant (homogeneous) or may vary from location to location (inhomogeneous). If the process is homogeneous, estimate the intensity by

$\hat{\lambda} = \frac{N(W)}{|W|}$

Page 15: Spatial Point Processes and Their Applications in Epidemiology

Estimate λ(s) in inhomogeneous case

• Estimating the intensity of a spatial point pattern is similar to estimating a bivariate probability density.

• How to estimate a bivariate density?

Given an i.i.d. sample (y1, …, yn) of a bivariate random variable Y, an estimate of the density f(·) of Y at y is

$\hat{f}(y) = \frac{1}{nh^2} \sum_{i=1}^{n} K\left(\frac{y - y_i}{h}\right)$

where K(·) is the kernel and h is the bandwidth.

• The expression for kernel smoothing of the intensity function of a point pattern x = {x1, …, xn} at location s is

$\hat{\lambda}(s) = \frac{1}{h^2} \sum_{i=1}^{n} K\left(\frac{s - x_i}{h}\right)$

The bandwidth h is chosen based on some cross-validation criterion.
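The kernel-smoothed intensity (the sum of kernels centred at the events, scaled by 1/h^2) can be sketched in a few lines. The slides do not say which kernel was used, so the bivariate standard normal kernel here is an assumption, as are the function names; the bandwidth is simply passed in rather than chosen by cross-validation, and no edge correction is applied.

```python
import numpy as np

def gaussian_kernel(u):
    """Bivariate standard normal density; u has shape (..., 2)."""
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1)) / (2.0 * np.pi)

def kernel_intensity(s, points, h):
    """Kernel intensity estimate h^-2 * sum_i K((s - x_i) / h) at each row of s."""
    s = np.atleast_2d(np.asarray(s, dtype=float))       # (m, 2) evaluation locations
    points = np.asarray(points, dtype=float)            # (n, 2) observed events
    u = (s[:, None, :] - points[None, :, :]) / h        # (m, n, 2) scaled differences
    return gaussian_kernel(u).sum(axis=1) / h ** 2

rng = np.random.default_rng(1)
points = rng.uniform(size=(200, 2))
print(kernel_intensity([[0.5, 0.5]], points, h=0.1))    # estimate at the centre of the square
```

A larger h gives a smoother surface and a smaller h a spikier one, which is why the choice of bandwidth matters more than the choice of kernel.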

Page 16: Spatial Point Processes and Their Applications in Epidemiology

Kernel smoothed intensity of IPP

Kernel estimated intensity for the point pattern simulated from an IPP with λ(s) = 400xy on [0, 1] × [0, 1].
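As a quick check on this intensity, its integral over the unit square gives the expected number of events, which agrees with the expected count of 100 quoted for the simulated Poisson realizations:

```latex
E[N(W)] = \int_0^1 \int_0^1 400\,xy \, dx \, dy
        = 400 \cdot \frac{1}{2} \cdot \frac{1}{2} = 100.
```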

Page 17: Spatial Point Processes and Their Applications in Epidemiology

Summary statistics: second-order

The second-order properties of a point process involve the relationship between the numbers of events at different locations.

• The second-order intensity of a spatial point process is

$\lambda_2(s_i, s_j) = \lim_{|ds_i|, |ds_j| \to 0} \frac{E[N(ds_i) N(ds_j)]}{|ds_i| \, |ds_j|}$

• A point process is called stationary if

$\lambda(s) = \lambda, \quad \lambda_2(s_i, s_j) = \lambda_2(s_i - s_j)$

• A stationary point process is isotropic if

$\lambda_2(s_i, s_j) = \lambda_2(\|s_i - s_j\|)$

Page 18: Spatial Point Processes and Their Applications in Epidemiology

K-function

If a point process is stationary and isotropic, the K-function of the process is defined by:

λK(r) = E[number of further events within distance r from an arbitrary event]

Two properties of the K-function:

• For an HPP, λK(r) = λπr^2, thus K_p(r) = πr^2

• K(r) is invariant to random thinning.

Def. random thinning: each event of a point process X is either retained or deleted with retention probability p, independently of other events. The resulting point process X' contains a subset of the events of the original process X.
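The invariance of K(r) under random thinning follows in one line. As a sketch, write λ' and K' for the intensity and K-function of the thinned process X' (notation introduced here, not on the slides). An arbitrary event of X' is an event of X, each of its further neighbours within distance r survives independently with probability p, and the intensity of X' is pλ:

```latex
\lambda' K'(r) = p\,\lambda K(r), \qquad \lambda' = p\,\lambda
\quad\Longrightarrow\quad
K'(r) = \frac{p\,\lambda K(r)}{p\,\lambda} = K(r).
```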

Page 19: Spatial Point Processes and Their Applications in Epidemiology

Comparing estimated K-functions of simulated point patterns

CSR: K(r) = πr^2

clustered: K(r) > πr^2

regular: K(r) < πr^2

Page 20: Spatial Point Processes and Their Applications in Epidemiology

Estimation of K(r): E(# further events…)/λ

Negatively biased without edge correction: events near the boundary of W have unobserved neighbours outside W.
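A minimal sketch of the uncorrected estimator makes the bias concrete. It counts ordered pairs of events within distance r, averages over events, and divides by the estimated intensity n/|W|. The function name `k_naive` and the unit-area default are assumptions for illustration, and no edge correction is applied, so the values it returns tend to fall below πr^2 even for a CSR pattern.

```python
import numpy as np

def k_naive(points, r_values, area=1.0):
    """Uncorrected K estimate: (ordered pairs within r) / (n * lambda_hat)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # (n, n) pairwise distances
    np.fill_diagonal(dist, np.inf)                 # an event is not its own neighbour
    counts = np.array([(dist <= r).sum() for r in r_values])
    return counts * area / n ** 2                  # lambda_hat = n / area

rng = np.random.default_rng(3)
csr = rng.uniform(size=(300, 2))                   # CSR-like pattern on the unit square
r_values = np.array([0.05, 0.10, 0.15])
print(k_naive(csr, r_values))                      # uncorrected estimates
print(np.pi * r_values ** 2)                       # theoretical K(r) under CSR
```

Edge-corrected estimators reweight each pair to compensate for the part of the disc of radius r that falls outside W; that step is omitted here.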

Page 21: Spatial Point Processes and Their Applications in Epidemiology

Application in epidemiology

John Snow (15 March 1813 – 16 June 1858) is considered one of the fathers of epidemiology because of his work in tracing the source of a cholera outbreak in Soho, London, in 1854.

Page 22: Spatial Point Processes and Their Applications in Epidemiology

Case-control study

Goal: compare the spatial distribution of disease cases with that of the underlying population

• Null hypothesis: equal spatial distribution

• Controls: selected to represent population heterogeneity

Incidence of disease depends on: population density; overall risk of disease; other risk factors, e.g. distance from a point source

Do disease cases occur randomly among the population?

Page 23: Spatial Point Processes and Their Applications in Epidemiology

Case-control data consist of two point patterns:

• the locations of n1 cases of a particular disease {x_1, x_2, …, x_{n1}}

• the locations of n0 controls {x_{n1+1}, …, x_{n1+n0}}

in a study region W over a defined period of time. Total number of data points n = n1 + n0.

Assumption:

• Cases come from an IPP with intensity λ1(s)

• Controls come from another, independent IPP with intensity λ0(s)

Page 24: Spatial Point Processes and Their Applications in Epidemiology

Spatial risk

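relative risk:

$\rho(s) = \frac{\lambda_1(s)}{\lambda_0(s)}$

estimated relative risk:

$\hat{\rho}(s) = \frac{\hat{\lambda}_1(s)}{\hat{\lambda}_0(s)}$

H0:

$\rho(s) = \rho_0 = \frac{n_1}{n_0}$

test statistic:

$T = \sum_{i=1}^{n} [\rho(x_i) - \rho_0]^2$

estimated test statistic:

$\hat{T} = \sum_{i=1}^{n} [\hat{\rho}(x_i) - \rho_0]^2$

significance: Monte Carlo test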
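The estimated test statistic for spatial risk can be sketched as follows: estimate the case and control intensities by kernel smoothing at every data point, take their ratio as the estimated relative risk, and sum the squared deviations from the null value n1/n0. The Gaussian kernel, the common fixed bandwidth h, and the function names are assumptions made for this illustration; the data are simulated, not the study data.

```python
import numpy as np

def kernel_intensity(s, points, h):
    """Gaussian-kernel intensity estimate at each row of s (no edge correction)."""
    s = np.atleast_2d(np.asarray(s, dtype=float))
    points = np.asarray(points, dtype=float)
    u = (s[:, None, :] - points[None, :, :]) / h
    kern = np.exp(-0.5 * (u ** 2).sum(axis=-1)) / (2.0 * np.pi)
    return kern.sum(axis=1) / h ** 2

def risk_statistic(cases, controls, h):
    """Sum over all data points of (estimated relative risk - n1/n0)^2."""
    data = np.vstack([cases, controls])            # all n = n1 + n0 locations
    ratio = kernel_intensity(data, cases, h) / kernel_intensity(data, controls, h)
    null_ratio = len(cases) / len(controls)        # value of the ratio under H0
    return float(np.sum((ratio - null_ratio) ** 2))

rng = np.random.default_rng(2)
cases = rng.uniform(size=(40, 2))                  # simulated stand-in for case locations
controls = rng.uniform(size=(400, 2))              # simulated stand-in for control locations
print(risk_statistic(cases, controls, h=0.15))
```

The ratio becomes unstable wherever the control intensity estimate is close to zero, so a bandwidth that is too small for the control pattern can dominate the statistic.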

Page 25: Spatial Point Processes and Their Applications in Epidemiology

Spatial clustering

K0(r) → amount of clustering due to population

K1(r) → amount of clustering due to population plus effect of other possible risk factors

D(r) = K1(r) - K0(r) → the amount of clustering that is not due to population

estimate:

$\hat{D}(r) = \hat{K}_1(r) - \hat{K}_0(r)$

H0:

$D(r) = 0$

Test statistic:

$D = \sum_{k=1}^{m} \frac{D(r_k)}{\sqrt{\mathrm{var}[D(r_k)]}}$

significance: Monte Carlo test

Page 26: Spatial Point Processes and Their Applications in Epidemiology

Monte Carlo test

1). simulation with random labelling at jth iteration, j = 1, 2, …, 99

• randomly select n1 points from the n data points and label the selected points as “case”; label the remaining n0 points as “control”.

• with the relabelled data, estimate the kernel smoothers $\hat{\lambda}_{1j}(x)$, $\hat{\lambda}_{0j}(x)$ and the relative risk $\hat{\rho}_j(x) = \hat{\lambda}_{1j}(x) / \hat{\lambda}_{0j}(x)$ at every data point.

• estimate K1j(r) and K0j(r) and compute Dj(r) at a set of discrete distances {r1, r2, …, rm}.

2). test statistic

• for each j, compute, with ρ0 = n1/n0,

$\hat{T}_j = \sum_{i=1}^{n} [\hat{\rho}_j(x_i) - \rho_0]^2$

• compute the variance of D(rk) for each k = 1, 2, …, m, then get

$\hat{D}_j = \sum_{k=1}^{m} \frac{\hat{D}_j(r_k)}{\sqrt{\mathrm{var}[\hat{D}(r_k)]}}$

3). p-value

$p_1 = \frac{\sum_{j=1}^{99} I\{\hat{T}_j \ge \hat{T}\} + 1}{99 + 1}$

$p_2 = \frac{\sum_{j=1}^{99} I\{\hat{D}_j \ge \hat{D}\} + 1}{99 + 1}$
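The random-labelling procedure can be sketched for the clustering statistic as below. Several choices here are assumptions rather than the presenter's method: the K-functions are estimated without edge correction, the spread of D at each distance is taken as the sample standard deviation over the relabelled simulations, the tail comparison uses "greater than or equal", and all function names are hypothetical. The risk statistic would be handled the same way by swapping in a different statistic.

```python
import numpy as np

def k_naive(points, r_values, area=1.0):
    """Uncorrected K estimate: (ordered pairs within r) / (n * lambda_hat)."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                 # an event is not its own neighbour
    counts = np.array([(dist <= r).sum() for r in r_values])
    return counts * area / n ** 2                  # lambda_hat = n / area

def d_curve(cases, controls, r_values):
    """Difference of the case and control K estimates on the grid r_values."""
    return k_naive(cases, r_values) - k_naive(controls, r_values)

def random_labelling_test(cases, controls, r_values, n_sim=99, rng=None):
    """Monte Carlo p-value for the standardised sum of D over r_values."""
    rng = np.random.default_rng() if rng is None else rng
    points = np.vstack([cases, controls])
    n1 = len(cases)
    d_obs = d_curve(cases, controls, r_values)
    d_sim = np.empty((n_sim, len(r_values)))
    for j in range(n_sim):
        perm = rng.permutation(len(points))        # step 1: random relabelling
        d_sim[j] = d_curve(points[perm[:n1]], points[perm[n1:]], r_values)
    sd = d_sim.std(axis=0, ddof=1)                 # step 2: spread of D at each r_k
    stat_obs = np.sum(d_obs / sd)
    stat_sim = np.sum(d_sim / sd, axis=1)
    p_value = (np.sum(stat_sim >= stat_obs) + 1) / (n_sim + 1)   # step 3
    return stat_obs, p_value

rng = np.random.default_rng(4)
cases = rng.uniform(size=(30, 2))                  # simulated stand-in data
controls = rng.uniform(size=(150, 2))
print(random_labelling_test(cases, controls, np.linspace(0.05, 0.25, 5), rng=rng))
```

With 99 simulations the smallest attainable p-value is 1/100, which is why the formula adds 1 to both the count and the number of simulations.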

Page 27: Spatial Point Processes and Their Applications in Epidemiology

Case study: the Chorley data

58 cases, 978 controls

Page 28: Spatial Point Processes and Their Applications in Epidemiology

Lots of graphs


Page 29: Spatial Point Processes and Their Applications in Epidemiology

Monte Carlo test gives p-value = 0.64 → there is no significant spatial variation in the relative risk.

Page 30: Spatial Point Processes and Their Applications in Epidemiology

graph

p-value = 0.91 → no significant relative spatial clustering

Page 31: Spatial Point Processes and Their Applications in Epidemiology

Summary


Spatial point patterns

Spatial point processes

HPP

IPP

λ(s)

K(r)

CSR

Application in epidemiology

Page 32: Spatial Point Processes and Their Applications in Epidemiology

Thank you for your attention!

Time for Q&A

The end