
Introduction to Particle Filters for Data Assimilation

Mike Dowd, Dept of Mathematics & Statistics (and Dept of Oceanography), Dalhousie University, Halifax, Canada

STATMOS Summer School in Data Assimilation, Boulder, Colorado, May 18-25, 2015

Problem Statement    

•  Noisy time series observations on the system state: $y_{1:T} = (y_1, \ldots, y_T)$

•  A discrete time stochastic dynamic model describing the time evolution of the system state, e.g. a bivariate AR(1) process (simulated in the sketch below): $x_t = \theta x_{t-1} + n_t$

•  We want to estimate the system state (and sometimes parameters)
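As a concrete illustration (not part of the original slides), here is a minimal R sketch that simulates a bivariate AR(1) state with noisy observations of the kind shown in the figures below; the transition matrix, noise levels, and length T are illustrative choices only.

    ## simulate a bivariate AR(1) state x_t = theta x_{t-1} + n_t and noisy observations y_t
    set.seed(1)
    T     <- 200
    theta <- matrix(c(0.9, 0.1, -0.1, 0.9), 2, 2)   # AR(1) transition matrix (assumed values)
    x     <- matrix(0, T, 2)                        # system state
    y     <- matrix(0, T, 2)                        # noisy observations of the state
    for (t in 2:T) {
      x[t, ] <- theta %*% x[t - 1, ] + rnorm(2, sd = 0.5)   # stochastic dynamics (process noise n_t)
      y[t, ] <- x[t, ] + rnorm(2, sd = 1.0)                 # measurement noise
    }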

Noisy Observations of System State
[Figure: observations vs. time]

Stochastic Dynamics governing System State
[Figure: 10 realizations of the system state vs. time]

Estimation of System State by Particle Filters
[Figure: particle filter estimate of the state vs. time]

Target: Evolution of the Probability Distribution of the State through Time
[Figure: probability distribution of the state evolving through time]

Hierarchical Bayesian Model: A General Framework for DA

$p(x_{1:T}, \theta \mid y_{1:T}) \propto p(y_{1:T} \mid x_{1:T}, \theta) \cdot p(x_{1:T} \mid \theta) \cdot p(\theta)$

target (posterior) ∝ measurement · dynamics · prior

Special Cases in DA

•  Parameter estimation (no model error)
•  Filtering solution for state *
•  Smoothing solution for state
•  Joint state & parameter estimation
•  Inference on dynamic model structure

Data Assimilation: A Statistical Framework

State Space Model, for $t = 1, \ldots, T$:

$x_t = d(x_{t-1}, \theta, e_t)$
$y_t = h(x_t, \phi, \nu_t)$

or, equivalently,

$x_t \mid x_{t-1} \sim p(x_t \mid x_{t-1}, \theta)$
$y_t \mid x_t \sim p(y_t \mid x_t, \phi)$

•  The general goal is to make inferences about the state $x_t$ and sometimes the static parameters* $\theta$ and $\phi$**.
•  We use explicit priors on the process noise and observation errors (i.e. parametric distributions for $e_t$ and $\nu_t$). Assume the $y_t$ are conditionally independent given $x_t$, and that $e_t$ and $\nu_t$ are independent.
•  For DA, we want to do space-time problems: space is incorporated in $x_t$.
→ Solutions involve sequential Monte Carlo methods for nonlinear and non-Gaussian state space models.

*  Dynamic parameters are part of the state
**  drop $\phi$ hereafter, and defer $\theta$

Problem Classes

We want to estimate the system state at time $t$, $x_t$, given the observations up to time $s$, $y_{1:s} = (y_1, y_2, \ldots, y_s)$:

s = t → Filtering (Nowcast)
s > t → Prediction (Forecasting)
s < t → Smoothing (Hindcast)

[Figure: data available vs. no data available relative to the estimation time, for (A) Filtering (Nowcast), (B) Forecasting, (C) Smoothing (Hindcast)]

Sequential Estimation

Single stage transition of the system from time $t-1$ to time $t$:

nowcast at $t-1$: $p(x_{t-1}, \theta \mid y_{1:t-1})$
→ prediction via $x_t = d(x_{t-1}, \theta, e_t)$: forecast $p(x_t, \theta \mid y_{1:t-1})$
→ measurement update with $y_t$: nowcast $p(x_t, \theta \mid y_{1:t})$

- We'll drop the parameters θ … for now

Probabilistic Solution for Filtering

→ General Filtering Solution: given $p(x_0)$ and $y_{1:T} = (y_1, y_2, \ldots, y_T)$, do for $t = 1, \ldots, T$:

$p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \int p(x_t \mid x_{t-1}) \cdot p(x_{t-1} \mid y_{1:t-1}) \, dx_{t-1}$

A single stage transition of the system from time $t-1$ to $t$ involves:

Prediction:
$p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1}) \cdot p(x_{t-1} \mid y_{1:t-1}) \, dx_{t-1}$

Measurement update:
$p(x_t \mid y_{1:t}) = \dfrac{p(y_t \mid x_t) \cdot p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})}$

Sampling Based Estimation

Let $\{x_{t|t}^{(i)}, w_t^{(i)}\}$, $i = 1, \ldots, n$, be a random sample* drawn from the target distribution (the filter density) $p(x_t \mid y_{1:t})$,

where $x_{t|t}^{(i)}$ are the support points and $w_t^{(i)}$ are the normalized weights.

The filter density can then be approximated as

$p(x_t \mid y_{1:t}) \approx \sum_{i=1}^{n} w_t^{(i)} \, \delta(x_{t|t} - x_{t|t}^{(i)})$

* referred to as an ensemble, made up of a collection of particles (the sample members)
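Any weighted ensemble of this form can be turned into posterior summaries by weighted averaging. A minimal R sketch (the support points and weights below are placeholder values, not from the slides):

    ## approximate the filter mean and variance from a weighted ensemble {x^(i), w^(i)}
    set.seed(2)
    n <- 250
    x <- rnorm(n, mean = 1, sd = 1)          # support points x_{t|t}^(i) (placeholder draws)
    w <- dnorm(0.8, mean = x, sd = 0.5)      # unnormalized weights, e.g. a likelihood p(y_t | x^(i))
    w <- w / sum(w)                          # normalized weights w_t^(i)
    post_mean <- sum(w * x)                  # E[x_t | y_{1:t}] approximated by sum_i w^(i) x^(i)
    post_var  <- sum(w * (x - post_mean)^2)  # corresponding weighted variance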

Sampling Based Estimation

[Figure: sampling based approximation of a probability density: (a) unweighted ensemble, n=25; (b) weighted ensemble, n=25; (c) unweighted ensemble, n=250; (d) weighted ensemble, n=250. Axes: x vs. probability.]

Sequential Importance Sampling (SIS)

At time $t-1$ we have:
•  $\{x_{t-1|t-1}^{(i)}, w_{t-1}^{(i)}\}$, draws from $p(x_{t-1} \mid y_{1:t-1})$
•  $y_t$, an observation

If a candidate $x_{t|t}^{(i)}$ is drawn from a proposal $q(x_{t|t})$, then

$w_t^{(i)} \propto \dfrac{p(x_{t|t}^{(i)} \mid y_t)}{q(x_{t|t}^{(i)} \mid y_t)}$

or

$w_t^{(i)} \propto w_{t-1}^{(i)} \, \dfrac{p(y_t \mid x_{t|t}^{(i)}) \cdot p(x_{t|t}^{(i)} \mid x_{t-1|t-1}^{(i)})}{q(x_{t|t}^{(i)})}$

A draw from $p(x_t \mid y_{1:t})$ is then available as $\{x_{t|t}^{(i)}, w_t^{(i)}\}$ (see Arulampalam et al. 2002).

SIS Remarks

•  A common simplification is to let the proposal be the prior (the predictive density), $q(x_{t|t}) = p(x_t \mid x_{t-1})$, leading to $w_t^{(i)} \propto w_{t-1}^{(i)} \, p(y_t \mid x_{t|t-1}^{(i)})$ (see the sketch below).

•  The main practical problem with SIS is "weight collapse", wherein after a few time steps all the weights become concentrated on a few particles. How to fix?

•  SIS solves the filtering problem for the nonlinear, non-Gaussian state space model via an algorithm for sequential "online" estimation of the system state, relying on a proposal and a weight update.
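A minimal R sketch (illustrative scalar AR(1) model, not from the slides) of one SIS step with the prior as proposal: propagate each particle through the dynamics, then multiply its weight by the likelihood of the new observation.

    ## one SIS step with the prior (predictive density) as proposal -- illustrative scalar AR(1)
    n  <- 500
    x  <- rnorm(n)                          # particles x_{t-1|t-1}^(i)
    w  <- rep(1 / n, n)                     # weights w_{t-1}^(i)
    yt <- 0.8                               # new observation (assumed value)
    x  <- 0.9 * x + rnorm(n, sd = 0.5)      # propagate: draw x_{t|t-1}^(i) from p(x_t | x_{t-1})
    w  <- w * dnorm(yt, mean = x, sd = 1)   # weight update: w_t proportional to w_{t-1} p(y_t | x_{t|t-1}^(i))
    w  <- w / sum(w)                        # renormalize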

Sequential Importance Resampling (SIR)

•  The major breakthrough in sequential Monte Carlo methods (Gordon et al. 1993) was to incorporate a resampling step to obviate weight collapse.

•  Specifically, for each time $t$, $\{x_{t|t-1}^{(i)}, w_t^{(i)}\}$ is a sample from $p(x_t \mid y_{1:t-1})$.

•  We can resample (with replacement) the $x_{t|t-1}^{(i)}$ with probability proportional to $w_t^{(i)}$ (i.e. a weighted bootstrap; see the sketch below).

•  This yields a modified sample $\{x_{t|t}^{(i)}\}$, wherein particles were both killed and replicated, and the weights are now equal, i.e. $w_t^{(i)} = 1/n$ for all $i$.

Remaining issue: sample impoverishment, i.e. some particles are replicated often in the unweighted sample. Less severe than weight collapse (but of course related).
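The weighted bootstrap is a one-liner in R. A minimal sketch (placeholder particle values and weights, not from the slides):

    ## weighted bootstrap resampling: x holds predicted particles, w their weights
    n <- 500
    x <- rnorm(n)                                         # predicted particles x_{t|t-1}^(i) (placeholders)
    w <- dnorm(0.8, mean = x, sd = 1)                     # likelihood-based weights (placeholders)
    m <- sample(1:n, size = n, prob = w, replace = TRUE)  # draw indices in proportion to the weights
    x <- x[m]                                             # killed/replicated particles, now equally weighted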

Basic Particle Filter Algorithm

Given: (1) dynamical model, (2) observations, (3) initial conditions

for t = 1 to T

(a) Prediction: $\{x_{t-1|t-1}^{(i)}\} \to \{x_{t|t-1}^{(i)}\}$ as $x_{t|t-1}^{(i)} = d(x_{t-1|t-1}^{(i)}, e_t^{(i)})$ for $i = 1, \ldots, n$

(b) Measurement: $\{x_{t|t-1}^{(i)}\} \to \{x_{t|t}^{(i)}\}$ as
•  $w_t^{(i)} \propto p(y_t \mid x_{t|t-1}^{(i)})$ for $i = 1, \ldots, n$
•  resample with replacement $\{x_{t|t-1}^{(i)}\}$ using the weights $w_t^{(i)}$ → yields $\{x_{t|t}^{(i)}\}$

end (for t)

Particle Filter Schematic

Interpretation: Particle History
[Figure: particle histories, state vs. time]

R-code for a Basic Particle Filter

    library(mvtnorm)   # for dmvnorm

    for (t in 2:T) {
      # prediction step from t-1 to t
      xf <- t(apply(xa, 1, dynamicmodel))
      # extract measurement at time t
      yobs <- matrix(y[t, ], 1, ncol(y))
      # assign weights: likelihood of each predicted particle (first state component observed here)
      w <- apply(matrix(xf[, 1], np, 1), 1, dmvnorm, x = yobs, sigma = R)
      # weighted resampling
      m <- sample(1:np, size = np, prob = w, replace = TRUE)
      xa <- xf[m, ]
    }
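The loop above assumes that several objects already exist in the workspace (xa, y, np, R, T, dynamicmodel). The slides do not specify them, so the following setup is only one possible, illustrative choice consistent with the weight calculation above (which observes the first state component only):

    ## assumed setup for the loop above (illustrative values, not from the slides)
    np    <- 1000                                   # number of particles
    T     <- 200                                    # number of time steps
    theta <- matrix(c(0.9, 0.1, -0.1, 0.9), 2, 2)   # AR(1) transition matrix
    dynamicmodel <- function(x) as.vector(theta %*% x + rnorm(2, sd = 0.5))
    R     <- matrix(1)                              # 1x1 measurement error covariance
    y     <- matrix(rnorm(T), T, 1)                 # placeholder observations (one column)
    xa    <- matrix(rnorm(2 * np), np, 2)           # initial ensemble (np x state dimension)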

Element 1. Prediction Step

$\{x_{t-1|t-1}^{(i)}\}$, a draw from $p(x_{t-1} \mid y_{1:t-1})$ (the filter density at time $t-1$)
→ $\{x_{t|t-1}^{(i)}\}$, a draw from $p(x_t \mid y_{1:t-1})$ (the predictive density at time $t$),

via $x_{t|t-1}^{(i)} = d(x_{t-1|t-1}^{(i)}, e_t^{(i)})$ for $i = 1, \ldots, n$

Role: acts as the proposal distribution for the particle filter.

Remarks:
- computationally expensive to generate large samples for big GFD models
- hard to effectively populate a large dimension state space with particles

Element 2. Measurement Update

$\{x_{t|t-1}^{(i)}\}$, a draw from $p(x_t \mid y_{1:t-1})$ (the predictive density)
→ $\{x_{t|t}^{(i)}\}$, a draw from $p(x_t \mid y_{1:t})$ (the filter density at time $t$), using the measurement $y_t$

Uses Bayes' formula: $p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t) \cdot p(x_t \mid y_{1:t-1})$

Approaches:
•  PF/SIR: weighted resample of $\{x_{t|t-1}^{(i)}, w_t^{(i)}\} \to \{x_{t|t}^{(i)}\}$
•  Metropolis-Hastings MCMC
•  Ensemble Kalman filter (approximate)
•  plus many others (auxiliary, unscented, ...) that modify the proposal distribution

Sequential Metropolis-Hastings

Construct an (independence) chain to yield a sample from the target filter distribution $p(x_t \mid y_{1:t})$.

•  flexible, configurable and efficient (+ a parallel with SIR)
•  can alleviate sample impoverishment (resample-move)

for k = 1 to K

1. Generate a candidate from the predictive density: draw $x_{t|t-1}^{*}$ from $p(x_t \mid y_{1:t-1})$

2. Evaluate the acceptance probability

$\alpha = \min\left(1, \dfrac{p(y_t \mid x_{t|t-1}^{*})}{p(y_t \mid x_{t|t}^{(k)})}\right)$

3. Let $x_{t|t}^{*}$ enter the posterior sample $\{x_{t|t}^{(i)}\}$ with probability $\alpha$; otherwise retain $x_{t|t}^{(k)}$

end (for k) → yields $\{x_{t|t}^{(i)}\}$, a sample from $p(x_t \mid y_{1:t})$
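A minimal R sketch (not from the slides) of this independence-chain measurement update, assuming a vector xf of draws from the predictive density, a scalar observation yt, and Gaussian measurement error with standard deviation sig:

    ## independence-chain Metropolis-Hastings measurement update (illustrative, scalar state)
    set.seed(3)
    xf  <- rnorm(200)                    # draws from the predictive density p(x_t | y_{1:t-1}) (placeholders)
    yt  <- 0.7                           # observation at time t (assumed value)
    sig <- 0.5                           # assumed measurement error standard deviation
    K   <- length(xf)
    xpost <- numeric(K)
    xcur  <- xf[1]                       # current state of the chain
    for (k in 1:K) {
      xstar <- sample(xf, 1)             # candidate drawn from the predictive ensemble
      alpha <- min(1, dnorm(yt, xstar, sig) / dnorm(yt, xcur, sig))   # likelihood ratio
      if (runif(1) < alpha) xcur <- xstar                             # accept, else retain current
      xpost[k] <- xcur                   # chain targets the filter density p(x_t | y_{1:t})
    }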

Limitations

Theory: convergence to the target marginal distribution $p(x_t \mid y_{1:t})$ holds under weak assumptions (e.g. Kunsch 2005).

Practical problem: standard particle filtering (SIR), when used in DA, suffers from sample impoverishment (recall path degeneracy). Typically, the prediction ensemble is far away from the observed state (due to small ensemble size and a high dimensional state space).
→ The likelihood is poorly represented and estimation is compromised.

What modifications are made to improve particle filters?

Some Fixes

Add "jitter": overdisperse the sample from the predictive density (inflate the process error); see the one-line sketch below.

Make a Gaussian approximation, or a transformation/anamorphosis: can use the Kalman filter updating equation.

Approximate the likelihood function: change its functional form, or inflate or alter the measurement error.

Error subspace: confine the stochasticity to parameters only. Dimension reduction.

Use a fixed lag smoother or batch processing: incorporate observations from multiple times into the observation update.

Clever proposal distributions and look-ahead filters: move beyond using the "prior" (predictive density) as the proposal.
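For the first fix, a one-line sketch (jitter_sd is an assumed tuning parameter; xa is a resampled ensemble as in the earlier code):

    ## add small random perturbations ("jitter") to resampled particles to combat impoverishment
    jitter_sd <- 0.1
    xa <- xa + rnorm(length(xa), sd = jitter_sd)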

What about Parameter Estimation?

1. State Augmentation: append the parameters to the state,

$\tilde{x}_t = \begin{pmatrix} x_t \\ \theta_t \end{pmatrix}$

Can use the particle filter machinery to solve (Kitagawa 1998). Also see iterated filtering (Ionides et al. 2006).

2. Likelihood Methods: use sample-based likelihoods

$[y_{1:T} \mid \theta] = L(\theta \mid y_{1:T}) = \prod_{t=1}^{T} \int [y_t \mid x_t, \theta]\,[x_t \mid y_{1:t-1}, \theta] \, dx_t \approx \prod_{t=1}^{T} \frac{1}{n} \sum_{i=1}^{n} p(y_t \mid x_{t|t-1}^{(i)})$
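A minimal R sketch (illustrative scalar AR(1) model with assumed noise levels, not from the slides) of evaluating this sample-based likelihood with a particle filter, e.g. for comparing candidate values of θ:

    ## sample-based log-likelihood of theta via the particle filter:
    ## log L(theta) = sum_t log( (1/n) sum_i p(y_t | x_{t|t-1}^(i)) )
    pf_loglik <- function(theta, y, n = 500, sd_proc = 0.5, sd_obs = 1.0) {
      xa <- rnorm(n)                               # initial ensemble
      loglik <- 0
      for (t in seq_along(y)) {
        xf <- theta * xa + rnorm(n, sd = sd_proc)  # prediction: scalar AR(1) dynamics (assumed)
        w  <- dnorm(y[t], mean = xf, sd = sd_obs)  # likelihood of each predicted particle
        loglik <- loglik + log(mean(w))            # accumulate the ensemble-averaged likelihood
        xa <- xf[sample(1:n, n, prob = w, replace = TRUE)]   # SIR resampling
      }
      loglik
    }
    ## e.g. profile over a grid of theta values: sapply(seq(0.5, 1.0, by = 0.1), pf_loglik, y = rnorm(100))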

Summary

1.  The state space model is the statistical framework for considering dynamic models and measurements.

2.  Importance sampling provides the theoretical basis for particle filters.

3.  Sequential importance resampling (SIR) yields a straightforward and workable algorithm for the filtering problem.

4.  For realistic problems in data assimilation, particle filters must be modified, either via approximations or via clever use of proposal distributions.

 

Some Key References

Jazwinski AH. 1970. Stochastic Processes and Filtering Theory. Academic Press: New York, 376 pp.

Gordon NJ, Salmond DJ, Smith AFM. 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings-F 140(2): 107-113.

Kitagawa G. 1996. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics 5: 1-25.

Kitagawa G. 1998. A self-organizing state space model. Journal of the American Statistical Association 93(443): 1203-1215.

Arulampalam MS, Maskell S, Gordon N, Clapp T. 2002. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50(2): 174-188.

Kunsch HR. 2005. Recursive Monte Carlo filters: algorithms and theoretical analysis. The Annals of Statistics 33(5): 1983-2021.

Godsill SJ, Doucet A, West M. 2004. Monte Carlo smoothing for nonlinear time series. Journal of the American Statistical Association 99(465): 156-168.

Ionides EL, Breto C, King AA. 2006. Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences 103: 18438-18443.
