lecture 1. introduction to variational data assimilation...
TRANSCRIPT
Data assimilation
데이터 동화 deita donghwa
Variational Data Assimilation
Lecture 1. Introduction to Variational Data Assimilation.
Adrian Sandu
Computational Science Laboratory (CSL), Department of Computer Science
Virginia Tech
Ewha International School on Data Assimilation (EISDA 2012), Seoul, Korea, 22-24 August 2012
Lecture
설교 seolgyo
1. Introduction to variational d.a.. Title. [1/82]Lecture given at EISDA 2012, Seoul, Korea, Aug. 22-24, 2012. (http://csl.cs.vt.edu)
Hello
안녕하세요 An-yeong-ha-se-yo
Data assimilation
Data assimilation = the fusion of information from imperfect model predictions and from noisy data, to obtain a consistent description of the state of a physical system, such as the atmosphere.
Approaches to solving data assimilation:
- Variational (rooted in control theory)
- Ensemble-based (rooted in statistical estimation theory)
Drivers for improvements in data assimilation are:
- Better algorithms
- Better observing systems
- Better computational platforms
Observation coverage
[Figure: observation coverage, 6 February 2009, 00 UTC ± 3 h (Tremolet, 2009).]
Improvements in data assimilation capabilities
[Figure: anomaly correlation of 500 hPa height forecasts at D+3, D+5, and D+7 for the Northern and Southern hemispheres, 1992-2007, comparing Operations, ERA-40, and ERA-Interim (Tremolet, 2009).]
Forecast performance has increased regularly over the years.
Data assimilation optimally combines three sources of information
The true state (sampled at the model grid points) $\mathbf{x}^{\rm true} \in \mathbb{R}^n$ is unknown and needs to be estimated from the available information. To obtain an estimate of $\mathbf{x}^{\rm true}$, data assimilation combines three different sources of information:
1. the prior information encapsulates our current knowledge of the state
2. the model encapsulates our knowledge about the physical and chemical laws that govern the evolution of the system
3. the observations are noisy and sparse snapshots of reality available at discrete times
The best estimate that optimally fuses all these sources of information is called the analysis, and is denoted by $\mathbf{x}^a$.
Source of information #1: The prior I
- The background (prior) probability density $P^b(\mathbf{x})$ encapsulates our current knowledge of the tracer distribution.
- Specifically, $P^b(\mathbf{x})$ describes the uncertainty with which one knows $\mathbf{x}^{\rm true}$ at present, before any (new) measurements are taken.
- The mean taken with respect to this pdf is denoted by
$$ E^b[f] = \int f(\mathbf{x})\, P^b(\mathbf{x})\, d\mathbf{x} . $$
- The current best estimate of the true state is called the a priori, or background, state $\mathbf{x}^b$. (This is often taken to be the mean of the background distribution, $\mathbf{x}^b = E^b[\mathbf{x}]$.)
Source of information #1: The prior II
- A typical assumption is that the random background errors $\varepsilon^b = \mathbf{x}^b - \mathbf{x}^{\rm true}$ are unbiased and have a normal pdf, i.e.,
$$ \varepsilon^b = \mathbf{x}^b - \mathbf{x}^{\rm true} \in \mathcal{N}(0, \mathbf{B}) . $$
Here $\mathbf{B} = E^b[\varepsilon^b (\varepsilon^b)^T] \in \mathbb{R}^{n \times n}$ is the background error covariance matrix.
- With many nonlinear models the normality assumption is difficult to justify, but it is nevertheless widely used because of its convenience.
Source of information #2: The model
- The model encapsulates our knowledge about the physical and chemical laws that govern the evolution of the atmospheric composition.
- The model evolves an initial state $\mathbf{x}_0 \in \mathbb{R}^n$ at the initial time $t_0$ to future state values $\mathbf{x}_i \in \mathbb{R}^n$ at future times $t_i$,
$$ \mathbf{x}_i = \mathcal{M}_{t_0 \to t_i}(\mathbf{x}_0) . $$
The size of the state space in realistic chemical transport models is very large, typically $n \in \mathcal{O}(10^8)$ variables.
- The model is always imperfect (why?). The model error over $[t_{i-1}, t_i]$ is
$$ \mu_i = \mathcal{M}_{t_{i-1} \to t_i}(\mathbf{x}^{\rm true}_{i-1}) - \mathbf{x}^{\rm true}_i . $$
Source of information #3: The observations I
- Observations represent sparse and noisy snapshots of reality that are available at several discrete times.
- Specifically, measurements $\mathbf{y}_i \in \mathbb{R}^m$ of the true state are taken at times $t_i$, $i = 1, \dots, N$:
$$ \mathbf{y}_i = \mathcal{H}^t(\mathbf{x}^{\rm true}_i) - \eta^{\rm obs}_i , \quad i = 1, \dots, N . $$
- The observation operator $\mathcal{H}^t$ maps the physical state space onto the observation space. In many practical situations $\mathcal{H}^t$ is a highly nonlinear mapping (as is the case, e.g., with satellite observation operators).
- The measurement (instrument) errors are denoted by $\eta^{\rm obs}_i$.
- At present the chemical observations are sparsely distributed, and their number is small compared to the dimension of the state space, $m \ll n$.
Source of information #3: The observations II
- The observation equation relates the true state to the observations. In order to relate the model state to the observations we also consider the relation
$$ \mathbf{y}_i = \mathcal{H}(\mathbf{x}_i) - \varepsilon^{\rm obs}_i , \quad i = 1, \dots, N , $$
$$ \varepsilon^{\rm obs}_i = \left[ \mathcal{H}(\mathbf{x}_i) - \mathcal{H}(\mathbf{x}^{\rm true}_i) \right] + \left[ \mathcal{H}(\mathbf{x}^{\rm true}_i) - \mathcal{H}^t(\mathbf{x}^{\rm true}_i) \right] + \eta^{\rm obs}_i . $$
- The observation operator $\mathcal{H}$ maps the model state space onto the observation space.
- The observation error term $\varepsilon^{\rm obs}_i$ accounts for
  1. measurement (instrument) errors, as well as
  2. representativeness errors (i.e., errors in the accuracy with which the model can reproduce reality, and with which the numerical operator $\mathcal{H}$ approximates $\mathcal{H}^t$).
Source of information #3: The observations III
- Typically observation errors are assumed to be unbiased and normally distributed,
$$ \varepsilon^{\rm obs}_i \in \mathcal{N}(0, \mathbf{R}_i) , \quad i = 1, \dots, N . $$
Moreover, observation errors at different times ($\varepsilon^{\rm obs}_i$ and $\varepsilon^{\rm obs}_j$ for $i \neq j$) are assumed to be independent.
Example: the observation operator maps the model state space into observation space
To allow model-data comparison, observation operators map the model state space to observation space.
[Figure: the observation operator H maps model-computed T and q to a model-predicted radiance, which is compared with the satellite-observed radiance. Source: Lars Isaksen (http://www.ecmwf.int).]
Result of data assimilation: The analysis
- Based on these three sources of information, data assimilation computes the analysis (posterior) probability density $P^a(\mathbf{x})$. $P^a(\mathbf{x})$ describes the uncertainty with which one knows $\mathbf{x}^{\rm true}$ after all the information available from measurements has been accounted for.
- The mean taken with respect to this pdf is denoted by
$$ E^a[f] = \int f(\mathbf{x})\, P^a(\mathbf{x})\, d\mathbf{x} . $$
- The best estimate $\mathbf{x}^a$ of the true state obtained from the analysis distribution is called the a posteriori, or analysis, state. (This can be the posterior mean $\mathbf{x}^a = E^a[\mathbf{x}]$, or a posterior mode.)
- The analysis estimation errors $\varepsilon^a = \mathbf{x}^a - \mathbf{x}^{\rm true}$ are characterized by
  - the analysis mean error (bias): $\beta^a = E^a[\varepsilon^a]$
  - the analysis error covariance matrix:
$$ \mathbf{A} = E^a\left[ (\varepsilon^a - \beta^a)(\varepsilon^a - \beta^a)^T \right] \in \mathbb{R}^{n \times n} . $$
The Bayesian estimation framework I
- The analysis probability density is the probability density of the state conditioned by all the available observations $\mathbf{y} = [\mathbf{y}_1, \dots, \mathbf{y}_N]$. Bayes' theorem allows one to express the analysis probability density as follows:
$$ P^a(\mathbf{x}) = P(\mathbf{x} \mid \mathbf{y}) = \frac{P(\mathbf{y} \mid \mathbf{x}) \cdot P^b(\mathbf{x})}{P(\mathbf{y})} . $$
- The denominator $P(\mathbf{y})$ is the marginal probability density of the observations and plays the role of a scaling factor.
The Bayesian estimation framework II
- The probability of the observations conditioned by the state, $P(\mathbf{y} \mid \mathbf{x})$, is the probability that the observation errors assume the values $\mathcal{H}(\mathbf{x}) - \mathbf{y}$:
$$ P(\mathbf{y} \mid \mathbf{x}) = P^{\rm obs}\left( \varepsilon^{\rm obs} = \mathcal{H}(\mathbf{x}) - \mathbf{y} \right) . $$
Since the observation errors $\varepsilon^{\rm obs}_1, \dots, \varepsilon^{\rm obs}_N$ at different times $t_1, \dots, t_N$ are (considered to be) independent, we have that
$$ P(\mathbf{y} \mid \mathbf{x}) = \prod_{i=1}^{N} P^{\rm obs}\left( \varepsilon^{\rm obs}_i \right) = \prod_{i=1}^{N} P^{\rm obs}\left( \mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i \right) . $$
- In practice we want to define estimators $\mathbf{x}^a$ of the true state $\mathbf{x}^{\rm true}$ that are optimal in a certain sense.
Bayesian example
[Figures: (1) background; (2) background and observation; (3) background, observation, and analysis.]
The meaning of “best” estimator
1. The minimum mean square error (MMSE) estimator minimizes the expected value of the squared error, $\min E^a[\| \mathbf{x}^a - \mathbf{x}^{\rm true} \|^2]$. It is the mean of the posterior distribution, $\mathbf{x}^a = E^a[\mathbf{x}]$. This estimator is not practical for large-scale systems, as it requires an integration over the high-dimensional state space. Practical estimators are obtained by taking the mean of an approximation of the posterior distribution; see, for example, the EnKF.
2. The maximum a posteriori (MAP) estimator is a computationally feasible estimator based on the mode of the posterior distribution; see, for example, variational methods.
3. The minimum variance unbiased estimator (MVUE) $\mathbf{x}^a$ has the smallest total variance ($\min \operatorname{trace} E^a[(\mathbf{x}^a - E^a[\mathbf{x}^a])(\mathbf{x}^a - E^a[\mathbf{x}^a])^T]$) among all unbiased estimators. An unbiased estimator is characterized by a zero posterior error mean (i.e., zero bias, $\beta^a = 0$). MVUEs are not guaranteed to exist, and when they do, they are difficult to compute in practical problems.
Different “best” estimators
Analytical solution in the Gaussian and linear case I
Consider the ideal case where the observation operator is linear,
$$ \mathcal{H}(\mathbf{x}) = \mathbf{H} \cdot \mathbf{x} , \quad \mathbf{H} \in \mathbb{R}^{m \times n} , $$
and both the background errors and the observation errors are normally distributed:
$$ P^b(\mathbf{x}) = (2\pi)^{-n/2} (\det \mathbf{B})^{-1/2} \exp\left( -\tfrac{1}{2} (\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b) \right) , $$
$$ P^{\rm obs}(\mathbf{y} \mid \mathbf{x}) = (2\pi)^{-m/2} (\det \mathbf{R})^{-1/2} \exp\left( -\tfrac{1}{2} (\mathbf{H}\mathbf{x} - \mathbf{y})^T \mathbf{R}^{-1} (\mathbf{H}\mathbf{x} - \mathbf{y}) \right) . $$
Use these probabilities in Bayes' formula. A direct calculation shows that the posterior probability density is also Gaussian, $P^a(\mathbf{x}) = \mathcal{N}(\mathbf{x}^a, \mathbf{A})$,
$$ P^a(\mathbf{x}) = (2\pi)^{-n/2} (\det \mathbf{A})^{-1/2} \exp\left( -\tfrac{1}{2} (\mathbf{x} - \mathbf{x}^a)^T \mathbf{A}^{-1} (\mathbf{x} - \mathbf{x}^a) \right) . $$
Analytical solution in the Gaussian and linear case II
The analysis mean $\mathbf{x}^a$ and covariance $\mathbf{A}$ are given by the Kalman filter formulae:
$$ \mathbf{K} = \mathbf{B}\mathbf{H}^T (\mathbf{H}\mathbf{B}\mathbf{H}^T + \mathbf{R})^{-1} $$
$$ \mathbf{x}^a = \mathbf{x}^b + \mathbf{K}(\mathbf{y} - \mathbf{H}\mathbf{x}^b) $$
$$ \mathbf{A} = (\mathbf{I} - \mathbf{K}\mathbf{H})\, \mathbf{B} $$
The matrix $\mathbf{K} \in \mathbb{R}^{n \times m}$ is called the “Kalman gain” operator.
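The Kalman filter formulae can be tried out on a small problem. The following is a minimal NumPy sketch, not code from the lecture: the function name `kalman_analysis` and all numbers (three state variables, two of them observed) are illustrative assumptions.

```python
import numpy as np

def kalman_analysis(xb, B, y, H, R):
    """Analysis mean xa, covariance A, and gain K for a linear operator H."""
    S = H @ B @ H.T + R                  # m x m matrix H B H^T + R
    K = np.linalg.solve(S, H @ B).T      # K = B H^T S^{-1} (B and S are symmetric)
    xa = xb + K @ (y - H @ xb)           # analysis mean
    A = (np.eye(len(xb)) - K @ H) @ B    # analysis covariance
    return xa, A, K

# Illustrative numbers (assumptions): 3 state variables, components 0 and 2 observed.
xb = np.array([1.0, 2.0, 3.0])
B = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
R = 0.25 * np.eye(2)
y = np.array([1.5, 2.0])

xa, A, K = kalman_analysis(xb, B, y, H, R)
print("analysis mean:", xa)
print("analysis variances:", np.diag(A))
```

Solving with the small m x m matrix instead of forming an explicit inverse is a common design choice; the analysis variances on the diagonal of A should not exceed the background variances on the diagonal of B.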
Maximum aposteriori estimator I
In the maximum a posteriori approach one looks for the argument that maximizes the posterior distribution or, equivalently, minimizes its negative logarithm:
$$ \mathbf{x}^a = \arg\max_{\mathbf{x}} P^a(\mathbf{x}) = \arg\min_{\mathbf{x}} \mathcal{J}(\mathbf{x}) , \quad \mathcal{J}(\mathbf{x}) = -\ln P^a(\mathbf{x}) . $$
The above equation defines the maximum a posteriori (MAP) estimator. In this context the data assimilation problem is formulated as an optimization problem. Using Bayes' theorem, the cost function to be minimized can be written as
$$ \mathcal{J}(\mathbf{x}) = -\ln P^a(\mathbf{x}) = -\ln P^b(\mathbf{x}) - \ln P(\mathbf{y} \mid \mathbf{x}) + \text{const} . $$
The scaling factors of the probability densities, as well as the term $-\ln P(\mathbf{y})$, are constants in $\mathbf{x}$ and do not influence the minimization.
Maximum aposteriori estimator II
Under the assumption that the background errors are normally distributed we have that
$$ -\ln P^b(\mathbf{x}) = \tfrac{1}{2} (\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b) + \text{const} . $$
Similarly, under the assumption that the observation errors are independent and normally distributed we have that
$$ -\ln P(\mathbf{y} \mid \mathbf{x}) = \tfrac{1}{2} (\mathcal{H}(\mathbf{x}) - \mathbf{y})^T \mathbf{R}^{-1} (\mathcal{H}(\mathbf{x}) - \mathbf{y}) + \text{const} . $$
The MAP estimator is obtained as the minimizer of the cost function
$$ \mathcal{J}(\mathbf{x}) = \tfrac{1}{2} (\mathbf{x} - \mathbf{x}^b)^T \mathbf{B}^{-1} (\mathbf{x} - \mathbf{x}^b) + \tfrac{1}{2} (\mathcal{H}(\mathbf{x}) - \mathbf{y})^T \mathbf{R}^{-1} (\mathcal{H}(\mathbf{x}) - \mathbf{y}) , $$
where the constant terms have been left out.
Maximum aposteriori estimator III
Note that if, in addition, the observation operator is linear, then the cost function is quadratic, and the minimizer can be computed explicitly by setting the gradient to zero:
$$ \nabla_{\mathbf{x}} \mathcal{J}(\mathbf{x}^a) = \mathbf{B}^{-1} (\mathbf{x}^a - \mathbf{x}^b) + \mathbf{H}^T \mathbf{R}^{-1} (\mathbf{H}\mathbf{x}^a - \mathbf{y}) = 0 . $$
The result is the Kalman filter estimate for the mean. Moreover, the Hessian of the cost function coincides with the inverse of the Kalman filter analysis covariance matrix:
$$ \nabla^2_{\mathbf{x},\mathbf{x}} \mathcal{J} = \mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} = \mathbf{A}^{-1} . $$
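As a worked check, the zero-gradient condition can be rearranged into the Kalman update. This is a standard manipulation that the slide does not spell out; it uses only the symbols defined above and assumes that B and R are invertible.

```latex
\begin{align*}
(\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})\, \mathbf{x}^a
  &= \mathbf{B}^{-1} \mathbf{x}^b + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{y} \\
(\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})\, (\mathbf{x}^a - \mathbf{x}^b)
  &= \mathbf{H}^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{H} \mathbf{x}^b) \\
\mathbf{x}^a
  &= \mathbf{x}^b + (\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})^{-1}
     \mathbf{H}^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{H} \mathbf{x}^b) .
\end{align*}
% The identity
%   H^T R^{-1} (H B H^T + R) = (B^{-1} + H^T R^{-1} H) B H^T
% implies
\begin{equation*}
(\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})^{-1} \mathbf{H}^T \mathbf{R}^{-1}
  = \mathbf{B} \mathbf{H}^T (\mathbf{H} \mathbf{B} \mathbf{H}^T + \mathbf{R})^{-1}
  = \mathbf{K} ,
\end{equation*}
% so that x^a = x^b + K (y - H x^b), the Kalman filter analysis mean.
```

The second line follows by subtracting $(\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})\, \mathbf{x}^b$ from both sides of the first.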
Three dimensional variational data assimilation (3D-Var) I
- Variational methods solve DA in an optimal control framework.
- In 3D-Var data assimilation the observations are considered successively at times $t_1, \dots, t_N$.
Three dimensional variational data assimilation (3D-Var) II
Figure: Example of 3D-Var sequential solution procedure.
- The background state (i.e., the best state estimate at time $t_i$) is given by the model forecast, starting from the previous analysis (i.e., the best estimate at time $t_{i-1}$):
$$ \mathbf{x}^b_i = \mathcal{M}_{t_{i-1} \to t_i}(\mathbf{x}^a_{i-1}) . $$
- The discrepancy between the model state $\mathbf{x}_i$ and the observations at time $t_i$, together with the departure of the state from the model forecast $\mathbf{x}^b_i$, are measured by the 3D-Var cost function:
$$ \mathcal{J}(\mathbf{x}_i) = \tfrac{1}{2} (\mathbf{x}_i - \mathbf{x}^b_i)^T \mathbf{B}_i^{-1} (\mathbf{x}_i - \mathbf{x}^b_i) + \tfrac{1}{2} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i)^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) . $$
Three dimensional variational data assimilation (3D-Var) III
- While in principle a different background covariance matrix should be used at each time, in practice the same matrix is re-used throughout the assimilation window, $\mathbf{B}_i = \mathbf{B}$, $i = 1, \dots, N$.
- The 3D-Var analysis is the MAP estimator, and is computed as the state which minimizes the cost function:
$$ \mathbf{x}^a_i = \arg\min \mathcal{J}(\mathbf{x}_i) . $$
- Typically a gradient-based numerical optimization procedure is employed for the minimization. The gradient of the cost function is
$$ \nabla_{\mathbf{x}_i} \mathcal{J}(\mathbf{x}_i) = \mathbf{B}_i^{-1} (\mathbf{x}_i - \mathbf{x}^b_i) + \mathbf{H}_i^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) . $$
Note that the gradient requires the computation of the linearized observation operator $\mathbf{H}_i = \mathcal{H}'(\mathbf{x}_i)$ about the current state.
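The 3D-Var cost function and its gradient can be sketched for a toy problem. The nonlinear observation operator `obs_operator`, its Jacobian, and all numbers below are hypothetical choices made for illustration; the gradient formula is compared against central finite differences as an independent check.

```python
import numpy as np

def obs_operator(x):
    """Hypothetical nonlinear observation operator: observes x0^2 and x1*x2."""
    return np.array([x[0] ** 2, x[1] * x[2]])

def obs_jacobian(x):
    """Linearized observation operator H'(x) for obs_operator."""
    return np.array([[2.0 * x[0], 0.0, 0.0],
                     [0.0, x[2], x[1]]])

def cost(x, xb, Binv, y, Rinv):
    """3D-Var cost: background term plus observation term."""
    dx, d = x - xb, obs_operator(x) - y
    return 0.5 * dx @ Binv @ dx + 0.5 * d @ Rinv @ d

def gradient(x, xb, Binv, y, Rinv):
    """B^{-1}(x - xb) + H'(x)^T R^{-1} (H(x) - y)."""
    return Binv @ (x - xb) + obs_jacobian(x).T @ Rinv @ (obs_operator(x) - y)

# Illustrative background, covariances, and observations (assumptions).
xb = np.array([1.0, 2.0, 0.5])
Binv = np.linalg.inv(np.diag([0.5, 1.0, 2.0]))
Rinv = np.linalg.inv(0.1 * np.eye(2))
y = np.array([1.3, 0.8])

# Central finite differences as an independent check of the gradient.
x = xb + np.array([0.1, -0.2, 0.05])
eps = 1e-6
fd = np.array([(cost(x + eps * e, xb, Binv, y, Rinv)
                - cost(x - eps * e, xb, Binv, y, Rinv)) / (2 * eps)
               for e in np.eye(3)])
g = gradient(x, xb, Binv, y, Rinv)
print("analytic gradient:   ", g)
print("finite differences:  ", fd)
```

A check of this kind is cheap for small problems and is a common way to validate a hand-coded linearized operator before handing the gradient to an optimizer.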
Three dimensional variational data assimilation (3D-Var) IV
- For a linear observation operator:
$$ \nabla_{\mathbf{x}_i} \mathcal{J}(\mathbf{x}^a_i) = 0 \;\Rightarrow\; \left( \mathbf{B}_i^{-1} + \mathbf{H}_i^T \mathbf{R}_i^{-1} \mathbf{H}_i \right) \cdot \mathbf{x}^a_i = \mathbf{B}_i^{-1} \cdot \mathbf{x}^b_i + \mathbf{H}_i^T \mathbf{R}_i^{-1} \mathbf{y}_i . $$
Four dimensional variational data assimilation (4D-Var) I
- 4D-Var = 3D-Var + time
- In 4D-Var data assimilation all observations at all times $t_1, \dots, t_N$ are considered simultaneously.
Four dimensional variational data assimilation (4D-Var) II
Figure: Example of 4D-Var smoothing procedure.
- The control parameters are the initial conditions $\mathbf{x}_0$; they uniquely determine the future states of the system via the model equation.
- The MAP estimate $\mathbf{x}^a_0$ is the minimizer of the 4D-Var cost function:
$$ \mathcal{J}(\mathbf{x}_0) = \tfrac{1}{2} (\mathbf{x}_0 - \mathbf{x}^b_0)^T \mathbf{B}_0^{-1} (\mathbf{x}_0 - \mathbf{x}^b_0) + \tfrac{1}{2} \sum_{i=1}^{N} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i)^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) $$
Note that the departure of the initial conditions from the background is weighted by the inverse background covariance matrix, while the differences between the model predictions $\mathcal{H}(\mathbf{x}_i)$ and the observations $\mathbf{y}_i$ are weighted by the inverse observation error covariances.
Four dimensional variational data assimilation (4D-Var) III
- The 4D-Var analysis is computed as the initial condition which minimizes the cost function subject to the model equation constraints:
$$ \mathbf{x}^a_0 = \arg\min \mathcal{J}(\mathbf{x}_0) \quad \text{subject to: } \mathbf{x}_i = \mathcal{M}_{t_0 \to t_i}(\mathbf{x}_0) , \quad i = 1, \dots, N . $$
- The model propagates the optimal initial condition forward in time to provide the analysis at future times, $\mathbf{x}^a_i = \mathcal{M}_{t_0 \to t_i}(\mathbf{x}^a_0)$.
Four dimensional variational data assimilation (4D-Var) IV
- The large-scale optimization problem is solved numerically using a gradient-based technique. The gradient reads
$$ \nabla_{\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0) = \mathbf{B}_0^{-1} (\mathbf{x}_0 - \mathbf{x}^b_0) + \sum_{i=1}^{N} \left( \frac{\partial \mathbf{x}_i}{\partial \mathbf{x}_0} \right)^T \mathbf{H}_i^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) $$
- The 4D-Var gradient requires:
  - the linearized observation operator $\mathbf{H}_i = \mathcal{H}'(\mathbf{x}_i)$, and
  - the transposed derivatives of future states with respect to the initial conditions, $(\partial \mathbf{x}_i / \partial \mathbf{x}_0)^T$.
- The 4D-Var gradient can be obtained effectively by forcing the adjoint model with observation increments, and running it backwards in time. The construction of an adjoint model is a nontrivial task.
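The adjoint-based evaluation of the 4D-Var gradient can be sketched on a toy problem. Everything below is an assumption made for illustration: the two-variable nonlinear model `model_step`, its hand-coded Jacobian, the choice to observe the full state (so the linearized observation operator is the identity), and all numerical values.

```python
import numpy as np

DT = 0.1  # time step of the toy model (assumption)

def model_step(x):
    """One step x_{i-1} -> x_i of a hypothetical 2-variable nonlinear model."""
    f = np.array([-x[0] + x[0] * x[1], -x[1] - x[0] ** 2])
    return x + DT * f

def model_step_jacobian(x):
    """Tangent linear model d(model_step)/dx evaluated at x."""
    df = np.array([[-1.0 + x[1], x[0]],
                   [-2.0 * x[0], -1.0]])
    return np.eye(2) + DT * df

def cost_and_gradient(x0, xb0, B0inv, ys, Rinv):
    """4D-Var cost and adjoint gradient; the full state is observed (H = I)."""
    # Forward sweep: store the trajectory x_0, ..., x_N.
    traj = [x0]
    for _ in ys:
        traj.append(model_step(traj[-1]))
    dx0 = x0 - xb0
    cost = 0.5 * dx0 @ B0inv @ dx0
    for xi, yi in zip(traj[1:], ys):
        cost += 0.5 * (xi - yi) @ Rinv @ (xi - yi)
    # Backward sweep: the adjoint variable is forced by the model-observation mismatch.
    N = len(ys)
    lam = Rinv @ (traj[N] - ys[N - 1])                       # lambda_N
    for i in range(N - 1, 0, -1):
        lam = model_step_jacobian(traj[i]).T @ lam + Rinv @ (traj[i] - ys[i - 1])
    lam = model_step_jacobian(traj[0]).T @ lam               # lambda_0
    return cost, B0inv @ dx0 + lam

# Synthetic observations from a reference run (illustrative values).
rng = np.random.default_rng(0)
N = 10
ys, x = [], np.array([1.0, 0.5])
for _ in range(N):
    x = model_step(x)
    ys.append(x + 0.05 * rng.standard_normal(2))

xb0 = np.array([0.8, 0.7])
B0inv = np.eye(2) / 0.1
Rinv = np.eye(2) / 0.05 ** 2
x0 = xb0 + np.array([0.05, -0.05])
cost, grad = cost_and_gradient(x0, xb0, B0inv, ys, Rinv)
print("cost:", cost, "gradient:", grad)
```

One forward sweep plus one backward sweep yields the full gradient, regardless of the state dimension. Storing the whole trajectory is affordable here; large models typically rely on check-pointing instead.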
Example: Lorenz (three variables). “True” solution and observations
[Figure: the Lorenz three-variable system, “true” solution and observations.]
Example: Lorenz (three variables). “True” and “background” solutions
[Figure: the Lorenz three-variable system, “true” and background solutions.]
Example: Lorenz (three variables). 4D-Var solution after 2 iterations
[Figure: the Lorenz three-variable system, 4D-Var solution after 2 optimization iterations.]
Example: Lorenz (three variables). 4D-Var solution after 20 iterations
[Figure: the Lorenz three-variable system, 4D-Var solution after 20 optimization iterations.]
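A twin experiment of the kind shown in the Lorenz figures can be set up along the following lines. The slides do not state the configuration, so every number here is an assumption: the classical Lorenz (1963) parameters, the RK4 integrator, the time step, the observation frequency, the noise level, and the initial conditions.

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0   # classical Lorenz (1963) parameters

def lorenz_rhs(x):
    """Right-hand side of the Lorenz three-variable system."""
    return np.array([SIGMA * (x[1] - x[0]),
                     x[0] * (RHO - x[2]) - x[1],
                     x[0] * x[1] - BETA * x[2]])

def rk4_step(x, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = lorenz_rhs(x)
    k2 = lorenz_rhs(x + 0.5 * dt * k1)
    k3 = lorenz_rhs(x + 0.5 * dt * k2)
    k4 = lorenz_rhs(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(x0, n_steps, dt=0.01):
    """Return the trajectory x_0, ..., x_{n_steps} as an (n_steps + 1, 3) array."""
    traj = np.empty((n_steps + 1, 3))
    traj[0] = x0
    for k in range(n_steps):
        traj[k + 1] = rk4_step(traj[k], dt)
    return traj

rng = np.random.default_rng(1)
n_steps = 500
truth = integrate(np.array([1.0, 1.0, 1.0]), n_steps)        # "true" solution
obs_every = 25                                               # observation frequency (assumption)
obs_idx = np.arange(obs_every, n_steps + 1, obs_every)
obs = truth[obs_idx] + 1.0 * rng.standard_normal((len(obs_idx), 3))  # noisy observations
background = integrate(np.array([1.5, 0.5, 1.5]), n_steps)   # run from a perturbed initial state
print("number of observation times:", len(obs_idx))
```

The background trajectory starts from a perturbed initial condition and drifts away from the truth; a 4D-Var run would adjust that initial condition so that the model trajectory fits the observations.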
Three dimensional variational data assimilation (3D-Var) I
- The background state (i.e., the best state estimate at time $t_i$):
$$ \mathbf{x}^b_i = \mathcal{M}_{t_{i-1} \to t_i}(\mathbf{x}^a_{i-1}) . $$
- The discrepancy between the model state $\mathbf{x}_i$ and the observations at time $t_i$, together with the departure of the state from the model forecast $\mathbf{x}^b_i$, are measured by the 3D-Var cost function:
$$ \mathcal{J}(\mathbf{x}_i) = \tfrac{1}{2} (\mathbf{x}_i - \mathbf{x}^b_i)^T \mathbf{B}_i^{-1} (\mathbf{x}_i - \mathbf{x}^b_i) + \tfrac{1}{2} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i)^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) . $$
Three dimensional variational data assimilation (3D-Var) II
- The 3D-Var analysis is the MAP estimator, and is computed as the state which minimizes the cost function:
$$ \mathbf{x}^a_i = \arg\min \mathcal{J}(\mathbf{x}_i) . $$
- The optimality condition is:
$$ \nabla_{\mathbf{x}_i} \mathcal{J}(\mathbf{x}^a_i) = \mathbf{B}_i^{-1} (\mathbf{x}^a_i - \mathbf{x}^b_i) + \mathbf{H}_i^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}^a_i) - \mathbf{y}_i) = 0 . $$
- The gradient requires the computation of the linearized observation operator $\mathbf{H}_i = \mathcal{H}'(\mathbf{x}_i)$ about the current state.
- If the observation operator is linear this is a linear system:
$$ \left( \mathbf{B}_i^{-1} + \mathbf{H}_i^T \mathbf{R}_i^{-1} \mathbf{H}_i \right) \cdot \mathbf{x}^a_i = \mathbf{H}_i^T \mathbf{R}_i^{-1} \mathbf{y}_i + \mathbf{B}_i^{-1} \mathbf{x}^b_i . $$
Three dimensional variational data assimilation (3D-Var) III
- Typically a nonlinear gradient-based unconstrained minimization procedure is employed.
- Preconditioning is often used to improve the convergence of the numerical optimization. A change of variables is performed by shifting the state and scaling it with the inverse square root of the background covariance:
$$ \tilde{\mathbf{x}}_i = \mathbf{B}_i^{-1/2} (\mathbf{x}_i - \mathbf{x}^b_i) , $$
and carrying out the optimization with the new variables $\tilde{\mathbf{x}}_i$. In these variables the background term of the cost function becomes $\tfrac{1}{2} \tilde{\mathbf{x}}_i^T \tilde{\mathbf{x}}_i$.
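The effect of this change of variables on the Hessian can be inspected numerically. The sketch below assumes a linear observation operator and uses the control variable v = B^{-1/2}(x - xb), i.e. x = xb + B^{1/2} v; the matrices are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3

# Illustrative, badly scaled background covariance and a random linear H.
B = np.diag(10.0 ** np.arange(-2, 4))      # variances 1e-2, ..., 1e3
H = rng.standard_normal((m, n))
Rinv = np.eye(m)

# Hessian of the 3D-Var cost in the original variables x.
hess_x = np.linalg.inv(B) + H.T @ Rinv @ H

# Symmetric square root of B from its eigendecomposition.
w, V = np.linalg.eigh(B)
B_half = V @ np.diag(np.sqrt(w)) @ V.T

# With x = xb + B^{1/2} v the Hessian is I + B^{1/2} H^T R^{-1} H B^{1/2}.
hess_v = B_half @ hess_x @ B_half
eigs_v = np.linalg.eigvalsh(hess_v)

print("eigenvalues in the original variables:", np.linalg.eigvalsh(hess_x))
print("eigenvalues in the new variables:     ", eigs_v)
```

In the new variables all eigenvalues are at least 1, and at least n - m of them are exactly 1. This clustering is one reason the transformation tends to help conjugate-gradient-type solvers; how much the condition number itself improves depends on the relative sizes of B and R.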
Strongly constrained 4D-Var I
The cost function:
$$ \mathcal{J}(\mathbf{x}_0, \dots, \mathbf{x}_N) = \tfrac{1}{2} (\mathbf{x}_0 - \mathbf{x}^b_0)^T \mathbf{B}_0^{-1} (\mathbf{x}_0 - \mathbf{x}^b_0) + \tfrac{1}{2} \sum_{i=1}^{N} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i)^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) $$
The 4D-Var analysis is computed as the initial condition which minimizes the cost function subject to the model equation constraints:
$$ [\mathbf{x}^a_0, \dots, \mathbf{x}^a_N] = \arg\min \mathcal{J}(\mathbf{x}_0, \dots, \mathbf{x}_N) \quad \text{s.t.: } \mathbf{x}_i = \mathcal{M}_{t_{i-1} \to t_i}(\mathbf{x}_{i-1}) , \quad i = 1, \dots, N . $$
The model propagates the optimal initial condition forward in time to provide the analysis at future times, $\mathbf{x}^a_i = \mathcal{M}_{t_0 \to t_i}(\mathbf{x}^a_0)$.
Strongly constrained 4D-Var II
Comments.
- 4D-Var determines the analysis state at every gridpoint and at every time within the analysis window, i.e., provides a “four-dimensional analysis”.
- Strongly constrained 4D-Var assumes that the observation operators and the model are perfect. As a consequence, the analysis corresponds to a trajectory (i.e., an integration) of the model.
Full space solution I
- “Full space”: all model states $\mathbf{x} = [\mathbf{x}_0, \dots, \mathbf{x}_N]$ are optimization variables.
- Use the Lagrange multipliers approach to transform the equality-constrained problem into an unconstrained minimization problem.
- Use a Lagrange multiplier for each of the constraints and build the Lagrangian function
$$ \mathcal{L}(\mathbf{x}, \lambda) = \mathcal{J}(\mathbf{x}) - \sum_{i=1}^{N} \lambda_i^T \cdot \left( \mathbf{x}_i - \mathcal{M}_{t_{i-1} \to t_i}(\mathbf{x}_{i-1}) \right) . $$
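Full space solution II
- The necessary conditions for a minimizer are
$$ \text{(O)} \quad \frac{d\mathcal{L}}{d\mathbf{x}_0} = \mathbf{B}_0^{-1} (\mathbf{x}_0 - \mathbf{x}^b_0) + \left( \frac{d}{d\mathbf{x}_0} \mathcal{M}_{t_0 \to t_1}(\mathbf{x}_0) \right)^T \cdot \lambda_1 = 0 $$
$$ \text{(A)} \quad \frac{d\mathcal{L}}{d\mathbf{x}_i} = \mathbf{H}_i^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) - \lambda_i + \left( \frac{d}{d\mathbf{x}_i} \mathcal{M}_{t_i \to t_{i+1}}(\mathbf{x}_i) \right)^T \cdot \lambda_{i+1} = 0 , \quad i = 1, \dots, N-1 , $$
$$ \text{(A)} \quad \frac{d\mathcal{L}}{d\mathbf{x}_N} = \mathbf{H}_N^T \mathbf{R}_N^{-1} (\mathcal{H}(\mathbf{x}_N) - \mathbf{y}_N) - \lambda_N = 0 $$
$$ \text{(F)} \quad \frac{d\mathcal{L}}{d\lambda_i} = -\left( \mathbf{x}_i - \mathcal{M}_{t_{i-1} \to t_i}(\mathbf{x}_{i-1}) \right) = 0 , \quad i = 1, \dots, N . $$
- It is convenient to impose the “forward model” (F) condition first, to obtain the state variables:
$$ \mathbf{x}_i = \mathcal{M}_{t_{i-1} \to t_i}(\mathbf{x}_{i-1}) , \quad i = 1, \dots, N . $$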
Full space solution III
- Next we impose the “adjoint model” (A) conditions, which define the Lagrange multipliers (a.k.a. adjoint variables):
$$ \lambda_N = \mathbf{H}_N^T \mathbf{R}_N^{-1} (\mathcal{H}(\mathbf{x}_N) - \mathbf{y}_N) $$
$$ \lambda_i = \left( \frac{d}{d\mathbf{x}_i} \mathcal{M}_{t_i \to t_{i+1}}(\mathbf{x}_i) \right)^T \cdot \lambda_{i+1} + \mathbf{H}_i^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) = \left( \frac{d\mathbf{x}_{i+1}}{d\mathbf{x}_i} \right)^T \cdot \lambda_{i+1} + \mathbf{H}_i^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) , \quad i = N-1, \dots, 1 , $$
$$ \lambda_0 = \left( \frac{d}{d\mathbf{x}_0} \mathcal{M}_{t_0 \to t_1}(\mathbf{x}_0) \right)^T \cdot \lambda_1 = \left( \frac{d\mathbf{x}_1}{d\mathbf{x}_0} \right)^T \cdot \lambda_1 . $$
Note that
- the adjoint model runs backwards in time, and
- the model-observation mismatch is a forcing term in the adjoint model.
Full space solution IV
- The remaining “optimality” (O) condition reads
$$ \mathbf{B}_0^{-1} (\mathbf{x}_0 - \mathbf{x}^b_0) + \lambda_0 = 0 . $$
Reduced space solution I
- “Reduced space”: only $\mathbf{x}_0$ is an optimization variable.
- Eliminate the constraints by running the model in each iteration, $\mathbf{x}_i = \mathcal{M}_{t_0 \to t_i}(\mathbf{x}_0)$.
- The reduced gradient reads
$$ \nabla_{\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0) = \mathbf{B}_0^{-1} (\mathbf{x}_0 - \mathbf{x}^b_0) + \sum_{i=1}^{N} \left( \frac{\partial \mathbf{x}_i}{\partial \mathbf{x}_0} \right)^T \mathbf{H}_i^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) $$
- The 4D-Var gradient requires not only the linearized observation operator $\mathbf{H}_i = \mathcal{H}'(\mathbf{x}_i)$, but also the transposed derivatives of future states with respect to the initial conditions, $(\partial \mathbf{x}_i / \partial \mathbf{x}_0)^T$.
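Reduced space solution II
- The transposed chain rule gives us that
$$ \frac{\partial \mathbf{x}_i}{\partial \mathbf{x}_0} = \frac{\partial \mathbf{x}_i}{\partial \mathbf{x}_{i-1}} \cdot \frac{\partial \mathbf{x}_{i-1}}{\partial \mathbf{x}_{i-2}} \cdots \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} $$
$$ \left( \frac{\partial \mathbf{x}_i}{\partial \mathbf{x}_0} \right)^T \cdot \mathbf{v} = \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \cdot \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_1} \right)^T \cdots \left( \frac{\partial \mathbf{x}_i}{\partial \mathbf{x}_{i-1}} \right)^T \cdot \mathbf{v} . $$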
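Reduced space solution III
- The sum of transposed derivatives times a vector can be obtained by forcing the adjoint model with observation increments, and running it backwards in time. For example, for $N = 3$:
$$ \mathbf{v}_i = \mathbf{H}_i^T \mathbf{R}_i^{-1} (\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i) , \quad i = 1, 2, 3 $$
$$ E = \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \mathbf{v}_1 + \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_0} \right)^T \mathbf{v}_2 + \left( \frac{\partial \mathbf{x}_3}{\partial \mathbf{x}_0} \right)^T \mathbf{v}_3 $$
$$ \phantom{E} = \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \mathbf{v}_1 + \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_1} \right)^T \mathbf{v}_2 + \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_1} \right)^T \left( \frac{\partial \mathbf{x}_3}{\partial \mathbf{x}_2} \right)^T \mathbf{v}_3 $$
This expression can be evaluated iteratively.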
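Reduced space solution IV
- Iteration 3:
$$ \lambda_3 = \mathbf{v}_3 $$
$$ E = \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \mathbf{v}_1 + \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_1} \right)^T \mathbf{v}_2 + \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_1} \right)^T \left( \frac{\partial \mathbf{x}_3}{\partial \mathbf{x}_2} \right)^T \lambda_3 $$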
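Reduced space solution V
- Iteration 2:
$$ \lambda_2 = \left( \frac{\partial \mathbf{x}_3}{\partial \mathbf{x}_2} \right)^T \lambda_3 + \mathbf{v}_2 $$
$$ E = \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \mathbf{v}_1 + \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_1} \right)^T \mathbf{v}_2 + \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_1} \right)^T \left( \frac{\partial \mathbf{x}_3}{\partial \mathbf{x}_2} \right)^T \lambda_3 $$
$$ \phantom{E} = \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \mathbf{v}_1 + \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_1} \right)^T \lambda_2 $$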
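Reduced space solution VI
- Iteration 1:
$$ \lambda_1 = \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_1} \right)^T \lambda_2 + \mathbf{v}_1 $$
$$ E = \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \mathbf{v}_1 + \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \left( \frac{\partial \mathbf{x}_2}{\partial \mathbf{x}_1} \right)^T \lambda_2 = \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \lambda_1 $$
- Iteration 0:
$$ E = \lambda_0 = \left( \frac{\partial \mathbf{x}_1}{\partial \mathbf{x}_0} \right)^T \lambda_1 $$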
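The iterative evaluation for N = 3 can be checked numerically against the direct sum. In the sketch below the step Jacobians `M[i]` (standing for the derivative of x_{i+1} with respect to x_i) and the forcing vectors `v[i]` are random placeholders, chosen only to exercise the algebra.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 3

# Hypothetical Jacobians M[i] = d x_{i+1} / d x_i (i = 0..N-1) and forcings v[i] (i = 1..N).
M = [rng.standard_normal((n, n)) for _ in range(N)]
v = {i: rng.standard_normal(n) for i in range(1, N + 1)}

# Direct evaluation: E = sum_i (d x_i / d x_0)^T v_i, forming each matrix product explicitly.
E_direct = np.zeros(n)
for i in range(1, N + 1):
    dxi_dx0 = np.eye(n)
    for k in range(i):
        dxi_dx0 = M[k] @ dxi_dx0          # chain rule: M[i-1] ... M[1] M[0]
    E_direct += dxi_dx0.T @ v[i]

# Adjoint evaluation: one backward sweep using only matrix-vector products.
lam = v[N]                                # lambda_N = v_N
for i in range(N - 1, 0, -1):
    lam = M[i].T @ lam + v[i]             # lambda_i = (dx_{i+1}/dx_i)^T lambda_{i+1} + v_i
lam0 = M[0].T @ lam                       # lambda_0 = (dx_1/dx_0)^T lambda_1

print("max difference:", np.max(np.abs(E_direct - lam0)))
```

The backward sweep needs N transposed Jacobian-vector products, whereas the direct sum forms matrix-matrix products for every observation time; this is the saving that makes the adjoint approach practical for large state dimensions.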
Reduced space solution VII
- The reduced gradient reads
$$ \nabla_{\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0) = \mathbf{B}_0^{-1} (\mathbf{x}_0 - \mathbf{x}^b_0) + \lambda_0 $$
We note that the optimality condition in the full space approach simply states that $\nabla_{\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0) = 0$.
- The construction of an adjoint model is a nontrivial task.
- A typical gradient-based minimization requires 10-100 iterations.
- Each iteration requires one forward model solution, plus one backward adjoint solution (whose cost is 2-3 times that of the forward model).
- The total cost is therefore 30-400 times that of one forward model integration.
Control flow of the adjoint-based optimization procedure
The construction of adjoint models is a labor-intensive, error-prone task: O(10) FTEs.
[Figure: control flow of the optimization loop. The observations and the forward CTM model evolution (with check-pointing files) give the cost function; the backward adjoint model integration gives the gradients; the optimization updates the control variables.]
Steepest descent method
At the current step we have the estimate $\mathbf{x}_0^{(k)}$ of the optimal solution. Denote:
- $\mathcal{J}^{(k)} = \mathcal{J}(\mathbf{x}_0^{(k)})$, the current cost function value
- $\mathbf{g}^{(k)} = \nabla_{\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0^{(k)})$, the current gradient
Update the initial condition:
$$ \mathbf{x}_0^{(k+1)} = \mathbf{x}_0^{(k)} - \alpha^{(k)} \mathbf{g}^{(k)} . $$
The step size is computed by a line search procedure:
$$ \alpha^{(k)} = \arg\min_{\alpha} \mathcal{J}\left( \mathbf{x}_0^{(k)} - \alpha\, \mathbf{g}^{(k)} \right) . $$
Issues with the steepest descent method:
- The convergence is slow: it needs very many iterations.
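The slow convergence can be observed on a quadratic cost, for which the line search has a closed form. The sketch below is illustrative only: the function name `steepest_descent`, the two diagonal Hessians, and the tolerance are assumptions.

```python
import numpy as np

def steepest_descent(hess, rhs, x0, tol=1e-8, max_iter=10_000):
    """Minimize J(x) = 0.5 x^T hess x - rhs^T x with an exact line search."""
    x = x0.copy()
    for k in range(max_iter):
        g = hess @ x - rhs                    # gradient of the quadratic cost
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = (g @ g) / (g @ hess @ g)      # exact minimizer of J along -g
        x = x - alpha * g
    return x, max_iter

# Two illustrative quadratic costs: well conditioned and ill conditioned.
rhs = np.array([1.0, 1.0])
x_good, it_good = steepest_descent(np.diag([1.0, 2.0]), rhs, np.zeros(2))
x_bad, it_bad = steepest_descent(np.diag([1.0, 100.0]), rhs, np.zeros(2))
print("iterations, condition number 2:  ", it_good)
print("iterations, condition number 100:", it_bad)
```

Both runs reach the same gradient tolerance, but the ill-conditioned problem needs far more iterations because successive search directions zigzag across the narrow valley of the cost function.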
Practical optimization algorithms
See the coming lecture by Dr. Xu:
- Newton's method (gradient & Hessian)
- Quasi-Newton methods (gradient only)
- Nonlinear conjugate gradients (gradient only)
- Truncated Newton, Gauss-Newton, etc. (gradient & approximate Hessian)
Newton's method
At the current step we have the estimate $\mathbf{x}_0^{(k)}$ of the optimal solution. Denote:
• $\mathcal{J}^{(k)} = \mathcal{J}(\mathbf{x}_0^{(k)})$, the current cost function value
• $\mathbf{g}^{(k)} = \nabla_{\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0^{(k)})$, the current gradient
• $\mathbf{H}^{(k)} = \nabla^2_{\mathbf{x}_0,\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0^{(k)})$, the current Hessian
Build a quadratic model of $\mathcal{J}(\mathbf{x}_0)$ valid in a neighborhood of $\mathbf{x}_0^{(k)}$ using a Taylor expansion:
$$\mathcal{J}(\mathbf{x}_0^{(k)} + \mathbf{s}) \approx \mathcal{J}^{(k)} + \mathbf{s}^T \mathbf{g}^{(k)} + \frac{1}{2}\, \mathbf{s}^T \mathbf{H}^{(k)} \mathbf{s} .$$
If $\mathbf{H}^{(k)}$ is positive definite, then there is a unique minimizer of the quadratic model. Find the point that minimizes the quadratic model by solving the linear system
$$\mathbf{H}^{(k)} \mathbf{s}^{(k)} = -\mathbf{g}^{(k)} .$$
Use this solution to update the initial condition:
$$\mathbf{x}_0^{(k+1)} = \mathbf{x}_0^{(k)} + \mathbf{s}^{(k)} .$$
Issues with Newton's method:
• $\mathbf{H}^{(k)} \in \mathbb{R}^{n \times n}$ is huge, and difficult to compute
• The solution of the linear system is very expensive
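A minimal sketch of the Newton iteration follows. The cost function is a hypothetical convex test function chosen for illustration, $\mathcal{J}(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T\mathbf{A}\mathbf{x} - \mathbf{b}^T\mathbf{x} + \frac{1}{4}\sum_j x_j^4$, whose gradient and Hessian are available in closed form; it is not a data assimilation cost function, and no line search or other globalization is included.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

def grad(x):
    """Gradient of J(x) = 0.5 x^T A x - b^T x + 0.25 * sum(x**4)."""
    return A @ x - b + x**3

def hess(x):
    """Hessian of the same J; positive definite for every x."""
    return A + 3.0 * np.diag(x**2)

def newton(x0, tol=1e-12, max_iter=20):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, k
        s = np.linalg.solve(hess(x), -g)   # Newton equation H s = -g
        x = x + s
    return x, max_iter

x_newton, iters_newton = newton(np.zeros(2))
print(x_newton, iters_newton)
```

With two unknowns the dense solve is trivial; for a state of dimension $n$ the same step needs the full $n \times n$ Hessian, which is the cost issue noted on the slide.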
Truncated Newton method
1. Solve the Newton equation at each step by a linear conjugate gradient procedure. This gives an "inner iteration", which is truncated early, since we are only interested in obtaining a descent direction, not in a quality solution. The full procedure is called truncated Newton.
2. The CG method avoids the use of the full Hessian, and requires only Hessian times vector products. They can be obtained:
• By finite differencing the adjoint gradient
$$\nabla^2 \mathcal{J}(\mathbf{x}_0) \cdot \mathbf{v} \approx \frac{\nabla \mathcal{J}(\mathbf{x}_0 + \varepsilon\, \mathbf{v}) - \nabla \mathcal{J}(\mathbf{x}_0)}{\varepsilon}$$
• By implementing and running a second order adjoint model
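The two ingredients of truncated Newton can be sketched as follows: a Hessian-vector product obtained by finite differencing a gradient routine, and a CG inner iteration that stops early. The function names, the default `eps`, the stopping rule and the negative-curvature safeguard are assumptions made for this illustration; in a 4D-Var setting `grad` would be the adjoint-based gradient.

```python
import numpy as np

def hessvec_fd(grad, x, v, eps=1e-6):
    """Approximate (Hessian at x) @ v by differencing the gradient."""
    return (grad(x + eps * v) - grad(x)) / eps

def truncated_cg(hessvec, g, max_inner=5, rtol=1e-2):
    """Approximately solve H s = -g with at most max_inner CG iterations."""
    s = np.zeros_like(g)
    r = -g                                # residual of H s = -g at s = 0
    p = r.copy()
    rr = r @ r
    for _ in range(max_inner):
        Hp = hessvec(p)
        curvature = p @ Hp
        if curvature <= 0.0:              # safeguard: non-positive curvature
            break
        a = rr / curvature
        s = s + a * p
        r = r - a * Hp
        rr_new = r @ r
        if np.sqrt(rr_new) <= rtol * np.linalg.norm(g):
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return s if s.any() else -g           # fall back to steepest descent
```

Each inner iteration costs one extra gradient evaluation (one forward plus one adjoint run) when `hessvec_fd` is used, so `max_inner` directly controls the cost of an outer step.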
Nonlinear conjugate gradients
The nonlinear conjugate gradients method extends the linear CG algorithm to minimize nonlinear functions. The algorithm is sketched as follows:
First search direction: $\mathbf{s}^{(0)} = -\mathbf{g}^{(0)}$
for k = 0, 1, ...
  Compute step size by line search: $\alpha^{(k)} = \arg\min_{\alpha}\, \mathcal{J}\left(\mathbf{x}_0^{(k)} + \alpha\, \mathbf{s}^{(k)}\right)$
  Update solution: $\mathbf{x}_0^{(k+1)} = \mathbf{x}_0^{(k)} + \alpha^{(k)}\, \mathbf{s}^{(k)}$
  Update gradient: $\mathbf{g}^{(k+1)} = \nabla \mathcal{J}(\mathbf{x}_0^{(k+1)})$
  $\beta^{(k+1)} = \dfrac{\mathbf{g}^{(k+1)\,T}\, \mathbf{g}^{(k+1)}}{\mathbf{g}^{(k)\,T}\, \mathbf{g}^{(k)}}$ (Fletcher-Reeves) or $\beta^{(k+1)} = \dfrac{\left(\mathbf{g}^{(k+1)} - \mathbf{g}^{(k)}\right)^T \mathbf{g}^{(k+1)}}{\mathbf{g}^{(k)\,T}\, \mathbf{g}^{(k)}}$ (Polak-Ribière) or ...
  Update search direction: $\mathbf{s}^{(k+1)} = -\mathbf{g}^{(k+1)} + \beta^{(k+1)}\, \mathbf{s}^{(k)}$
end
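The algorithm translates almost line by line into code. In the sketch below the line search is passed in as a callable, and the demonstration uses a quadratic cost with an exact line search (an assumption made here so that the result can be checked against a direct solve); the function names and the test matrix are hypothetical.

```python
import numpy as np

def nonlinear_cg(grad, line_search, x0, rule="FR", tol=1e-8, max_iter=100):
    """Nonlinear CG with Fletcher-Reeves ("FR") or Polak-Ribiere ("PR") beta."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    s = -g                                   # first search direction
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            return x, k
        alpha = line_search(x, s)            # step size along s
        x = x + alpha * s
        g_new = grad(x)
        if rule == "FR":
            beta = (g_new @ g_new) / (g @ g)
        else:
            beta = ((g_new - g) @ g_new) / (g @ g)
        s = -g_new + beta * s
        g = g_new
    return x, max_iter

# Demonstration on J(x) = 0.5 x^T A x - b^T x, where the line search is exact.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
grad_q = lambda x: A @ x - b
exact_ls = lambda x, s: -(grad_q(x) @ s) / (s @ A @ s)

x_fr, it_fr = nonlinear_cg(grad_q, exact_ls, np.zeros(3), rule="FR")
x_pr, it_pr = nonlinear_cg(grad_q, exact_ls, np.zeros(3), rule="PR")
```

On a quadratic with exact line search successive gradients are orthogonal, so the two formulas for $\beta$ coincide and the method reduces to linear CG. The formulas differ only on genuinely nonlinear problems or with inexact line searches.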
Quasi-Newton methods I
Quasi-Newton methods approximate the inverse of the Hessian $\mathbf{H}^{-1}$ by a symmetric, positive definite matrix $\mathbf{B}$ which is updated at each step:
$$\left(\mathbf{H}^{(k)}\right)^{-1} \approx \mathbf{B}^{(k)}$$
The algorithm proceeds as follows:
1. Compute search direction:
$$\mathbf{s}^{(k)} = -\mathbf{B}^{(k)} \mathbf{g}^{(k)} \approx -\left(\mathbf{H}^{(k)}\right)^{-1} \mathbf{g}^{(k)}$$
2. Find step size by line search:
$$\alpha^{(k)} = \arg\min_{\alpha}\, \mathcal{J}\left(\mathbf{x}_0^{(k)} + \alpha\, \mathbf{s}^{(k)}\right)$$
Quasi-Newton methods II
3. Update solution:
$$\mathbf{x}_0^{(k+1)} = \mathbf{x}_0^{(k)} + \alpha^{(k)}\, \mathbf{s}^{(k)}$$
4. Update the Hessian inverse approximation $\mathbf{B}^{(k+1)}$
Hessian approximations and update formulae. Consider the differences:
$$\xi^{(k)} = \mathbf{x}_0^{(k+1)} - \mathbf{x}_0^{(k)} = \alpha^{(k)}\, \mathbf{s}^{(k)} \in \mathbb{R}^n$$
$$\gamma^{(k)} = \mathbf{g}^{(k+1)} - \mathbf{g}^{(k)} \overset{\text{Taylor}}{=} \mathbf{H}^{(k)} \xi^{(k)} + o(\|\xi^{(k)}\|) \in \mathbb{R}^n .$$
In the quasi-Newton approach, the Hessian approximations are chosen to satisfy the "quasi-Newton" condition
$$\mathbf{B}^{(k+1)} \gamma^{(k)} = \xi^{(k)} .$$
Quasi-Newton methods III
How can we update $\mathbf{B}^{(k)}$ to $\mathbf{B}^{(k+1)}$ so as to satisfy the quasi-Newton equation? The most successful methods are symmetric rank-two updates:
$$\mathbf{B}^{(k+1)} = \mathbf{B}^{(k)} + a\, \mathbf{u} \mathbf{u}^T + b\, \mathbf{v} \mathbf{v}^T$$
The choice of $a, b$ and $\mathbf{u}, \mathbf{v}$ is not unique. We can tune them to get $\mathbf{B}^{(k+1)}$ symmetric, positive definite.
DFP (Davidon-Fletcher-Powell):
$$\mathbf{B}^{(k+1)}_{\rm DFP} = \mathbf{B}^{(k)} + \frac{\xi^{(k)}\, \xi^{(k)\,T}}{\xi^{(k)\,T} \gamma^{(k)}} - \frac{\mathbf{B}^{(k)} \gamma^{(k)} \gamma^{(k)\,T} \mathbf{B}^{(k)}}{\gamma^{(k)\,T} \mathbf{B}^{(k)} \gamma^{(k)}}$$
BFGS (Broyden-Fletcher-Goldfarb-Shanno):
$$\mathbf{B}^{(k+1)}_{\rm BFGS} = \mathbf{B}^{(k)} + \left(1 + \frac{\gamma^{(k)\,T} \mathbf{B}^{(k)} \gamma^{(k)}}{\xi^{(k)\,T} \gamma^{(k)}}\right) \frac{\xi^{(k)}\, \xi^{(k)\,T}}{\xi^{(k)\,T} \gamma^{(k)}} - \frac{\xi^{(k)} \gamma^{(k)\,T} \mathbf{B}^{(k)} + \mathbf{B}^{(k)} \gamma^{(k)} \xi^{(k)\,T}}{\xi^{(k)\,T} \gamma^{(k)}}$$
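The two update formulas can be transcribed directly. The sketch below uses hypothetical function names and dense matrices, which is only practical for small $n$; both functions assume that `B` is symmetric and that the curvature condition $\xi^T\gamma > 0$ holds.

```python
import numpy as np

def dfp_inverse_update(B, xi, gamma):
    """DFP update of the inverse-Hessian approximation B."""
    Bg = B @ gamma
    return B + np.outer(xi, xi) / (xi @ gamma) - np.outer(Bg, Bg) / (gamma @ Bg)

def bfgs_inverse_update(B, xi, gamma):
    """BFGS update of the inverse-Hessian approximation B."""
    rho = xi @ gamma
    Bg = B @ gamma                       # equals (gamma^T B)^T because B is symmetric
    return (B
            + (1.0 + (gamma @ Bg) / rho) * np.outer(xi, xi) / rho
            - (np.outer(xi, Bg) + np.outer(Bg, xi)) / rho)
```

Multiplying either result by `gamma` returns `xi`, which is exactly the quasi-Newton condition.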
Quasi-Newton methods IV
Formally, the approximations $\mathbf{B}^{(k)} \in \mathbb{R}^{n \times n}$ are matrices. For large $n$, the storage of these matrices is prohibitive. A better way to represent these matrices is:
1. store $\mathbf{B}^{(0)}$ (the initial approximation) and the pairs $\{\xi^{(j)}, \gamma^{(j)}\}_{1 \le j \le k}$
2. keep the number of stored vectors small, $k \le K$, by dropping the older pairs. This is the limited-memory quasi-Newton approach
3. L-BFGS is the current gold standard for full nonlinear 4D-Var data assimilation problems
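One standard way to apply the limited-memory idea is the two-loop recursion, which computes $-\mathbf{B}^{(k)}\mathbf{g}$ from the stored pairs without ever forming $\mathbf{B}^{(k)}$. The sketch below assumes $\mathbf{B}^{(0)} = b_0 \mathbf{I}$ and pairs ordered from oldest to newest; the function name, the scalar `b0` and the use of `collections.deque` to drop old pairs are illustrative choices, not taken from the lecture.

```python
from collections import deque
import numpy as np

def lbfgs_direction(g, pairs, b0=1.0):
    """Return -B g, with B the BFGS inverse-Hessian approximation built from
    B(0) = b0 * I and the stored (xi, gamma) pairs (oldest first)."""
    q = np.array(g, dtype=float)
    alphas = []
    for xi, gamma in reversed(pairs):            # newest to oldest
        a = (xi @ q) / (gamma @ xi)
        q = q - a * gamma
        alphas.append(a)
    r = b0 * q                                   # apply the initial approximation
    for (xi, gamma), a in zip(pairs, reversed(alphas)):   # oldest to newest
        beta = (gamma @ r) / (gamma @ xi)
        r = r + (a - beta) * xi
    return -r

# Keeping at most K pairs: appending to a full deque drops the oldest pair.
K = 2
history = deque(maxlen=K)
for j in range(3):
    xi = np.eye(3)[j]
    history.append((xi, 2.0 * xi))
print(len(history))
```

Storage is $2K$ vectors of length $n$ instead of an $n \times n$ matrix, and each direction costs roughly $4K$ inner products.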
Incremental 4D-Var I
1. Apply a sequential quadratic programming (SQP) approach (Fisher, 2009). The nonlinear optimization problem is approximated by a sequence of quadratic optimization problems.
2. Express
$$\mathbf{x}_i = \mathbf{x}_i^{(k)} + \xi_i = \text{"current solution"} + \text{"increment"}, \quad i = 1, \cdots, N .$$
The estimation problem is linearized around the current solution trajectory $\mathbf{x}^{(k)}$ (Bennett, 2002; Lewis, 2005).
Incremental 4D-Var II
3. This leads to a quadratic cost function:
$$\mathcal{J}^{\rm incr}(\xi_0) = \frac{1}{2} \left(\Delta^b \mathbf{x}_0^{(k)} + \xi_0\right)^T \mathbf{B}_0^{-1} \left(\Delta^b \mathbf{x}_0^{(k)} + \xi_0\right) + \frac{1}{2} \sum_{i=1}^{N} \left(\mathbf{H}_i \xi_i + \mathbf{d}_i^{(k)}\right)^T \mathbf{R}_i^{-1} \left(\mathbf{H}_i \xi_i + \mathbf{d}_i^{(k)}\right),$$
where:
$$\mathbf{d}_i^{(k)} = \mathcal{H}\left(\mathbf{x}_i^{(k)}\right) - \mathbf{y}_i, \qquad \Delta^b \mathbf{x}_0^{(k)} = \mathbf{x}_0^{(k)} - \mathbf{x}_0^b .$$
4. Quadratic optimization gives the optimal increment:
$$\xi_0^a = \arg\min\, \mathcal{J}^{\rm incr}(\xi_0) \quad \text{subject to: } \xi_i = \mathbf{M}_{t_{i-1} \to t_i}\, \xi_{i-1}, \quad i = 1, \cdots, N .$$
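Incremental 4D-Var III
5. Update solution:
$$\mathbf{x}_0^{(k+1)} = \mathbf{x}_0^{(k)} + \xi_0^a .$$
A new linearization is performed about $\mathbf{x}^{(k+1)}$ and the incremental problem is solved again to improve the resulting analysis.
6. The gradient of the incremental 4D-Var cost function reads
$$\nabla_{\xi_0} \mathcal{J}^{\rm incr}(\xi_0) = \mathbf{B}_0^{-1} \left(\Delta^b \mathbf{x}_0^{(k)} + \xi_0\right) + \sum_{i=1}^{N} \mathbf{M}_i^T \mathbf{H}_i^T \mathbf{R}_i^{-1} \left(\mathbf{H}_i \xi_i + \mathbf{d}_i^{(k)}\right) .$$
Here $\mathbf{M}_i$ denotes the tangent linear model from $t_0$ to $t_i$, so that $\xi_i = \mathbf{M}_i \xi_0$. Evaluating the gradient requires the tangent linear (TL) and adjoint (Adj) observation operators, and the adjoint model solution operator.
7. The Hessian of the incremental 4D-Var cost function is
$$\nabla^2_{\mathbf{x}_0,\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0) = \mathbf{B}_0^{-1} + \sum_{i=1}^{N} \mathbf{M}_i^T \mathbf{H}_i^T \mathbf{R}_i^{-1} \mathbf{H}_i \mathbf{M}_i .$$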
Incremental 4D-Var IV
8. The solution of the incremental 4D-Var problem is obtained by solving the following linear system:
$$\nabla^2_{\mathbf{x}_0,\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0) \cdot \xi_0^a = -\mathbf{B}_0^{-1} \Delta^b \mathbf{x}_0^{(k)} - \sum_{i=1}^{N} \mathbf{M}_i^T \mathbf{H}_i^T \mathbf{R}_i^{-1} \mathbf{d}_i^{(k)} .$$
The right hand side of this linear system is obtained by one adjoint integration.
9. The symmetric (hopefully positive definite) system can be solved by a Lanczos iterative procedure (e.g., conjugate gradients). At each iteration one Hessian-vector product is required. This is obtained by ...
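The inner problem of one incremental 4D-Var iteration can be sketched matrix-free. The code below is an illustration under simplifying assumptions that are not in the lecture: a single constant tangent linear step matrix `M` (so that the propagator from $t_0$ to $t_i$ is `M` applied $i$ times), constant `H`, `B_inv` and `R_inv`, and a Hessian-vector product formed by one tangent linear sweep followed by one adjoint sweep. All names are hypothetical.

```python
import numpy as np

def hessian_vector(v, B_inv, R_inv, M, H, N):
    """(B^-1 + sum_i M_i^T H^T R^-1 H M_i) v without forming the Hessian."""
    tl = [v]
    for _ in range(N):                       # tangent linear sweep: xi_i = M xi_{i-1}
        tl.append(M @ tl[-1])
    lam = np.zeros_like(v)
    for i in range(N, 0, -1):                # adjoint sweep
        lam = M.T @ (lam + H.T @ R_inv @ H @ tl[i])
    return B_inv @ v + lam

def right_hand_side(delta_b, d, B_inv, R_inv, M, H):
    """-B^-1 delta_b - sum_i M_i^T H^T R^-1 d_i by one adjoint sweep; d = [d_1..d_N]."""
    lam = np.zeros_like(delta_b)
    for d_i in reversed(d):
        lam = M.T @ (lam + H.T @ R_inv @ d_i)
    return -(B_inv @ delta_b + lam)

def conjugate_gradient(matvec, b, rtol=1e-12, max_iter=200):
    """Plain CG for a symmetric positive definite system matvec(x) = b."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rr) <= rtol * b_norm:
            break
        Ap = matvec(p)
        a = rr / (p @ Ap)
        x = x + a * p
        r = r - a * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x
```

A possible call sequence is `b = right_hand_side(...)`, then `xi_a = conjugate_gradient(lambda v: hessian_vector(v, B_inv, R_inv, M, H, N), b)`, followed by the outer update of the initial state with `xi_a`.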
Incremental 4D-Var V
Comments (Tremolet, 2009):
• A coarse resolution, simplified physics, linearized model is used in incremental 4D-Var.
• The state $\mathbf{x}_i^{(k)} = \mathcal{M}_{t_0 \to t_i}(\mathbf{x}_0^{(k)})$ is computed with the full resolution nonlinear model.
• Innovations are computed with the nonlinear observation operator.
• The low resolution increments $\xi_0^a$ are interpolated to the full model resolution to perform the incremental 4D-Var update.
Preconditioning I
The convergence speed depends on the condition number of the Hessian
$$\kappa = \lambda_{\max}\left(\nabla^2_{\mathbf{x}_0,\mathbf{x}_0} \mathcal{J}\right) / \lambda_{\min}\left(\nabla^2_{\mathbf{x}_0,\mathbf{x}_0} \mathcal{J}\right) .$$
The closer it is to one, the faster the convergence; the higher it is, the slower the convergence. The conditioning depends on:
1. the background,
2. the dynamics of the system through $\mathbf{M}_i$ and $\mathbf{M}_i^T$,
3. the observations used through $\mathbf{H}_i$ and $\mathbf{H}_i^T$; note that the addition of new observations can change the conditioning of the system.
Preconditioning II
Preconditioning is important to speed up iterations. The solution is carried out in the transformed variables $\mathbf{z} = \mathbf{X}^{-1} \delta\mathbf{x}$, where $\mathbf{X} \approx \left(\nabla^2_{\mathbf{x}_0,\mathbf{x}_0} \mathcal{J}\right)^{-1/2}$. The transformed matrix is
$$\nabla^2_{\mathbf{z}_0,\mathbf{z}_0} \mathcal{J}(\mathbf{x}_0) = \mathbf{X}^T\, \nabla^2_{\mathbf{x}_0,\mathbf{x}_0} \mathcal{J}(\mathbf{x}_0)\, \mathbf{X} .$$
A common choice is $\mathbf{X} = \mathbf{B}_0^{1/2}$, which makes the background term equal to the identity, and the smallest eigenvalue equal to one. We assume below that this transformation has been applied and that all Hessian eigenvalues are greater than or equal to one.
A more involved idea (Tremolet, 2009) is to use the solution process from the previous time window to precondition the iterations in the current time window. Let $\lambda_j$ and $\mathbf{v}_j$ be the eigenvalues and eigenvectors of the Hessian in the current assimilation window $w$, sorted in decreasing order of the eigenvalues.
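The effect of the change of variables $\mathbf{X} = \mathbf{B}_0^{1/2}$ can be checked numerically on a small example. Everything below is a hypothetical setup chosen for illustration: a diagonal background covariance with a wide range of variances, a single observation of the third state component, and no model dynamics (a 3D-Var-like Hessian). The reduction in condition number is specific to this example and is not guaranteed in general.

```python
import numpy as np

def sym_sqrt(B):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(B)
    return (V * np.sqrt(w)) @ V.T

B0 = np.diag([100.0, 10.0, 1.0, 0.1])        # background covariance (assumed)
H = np.array([[0.0, 0.0, 1.0, 0.0]])         # observe the third component only
R_inv = np.array([[1.0]])

hess_x = np.linalg.inv(B0) + H.T @ R_inv @ H  # Hessian in the original variables
X = sym_sqrt(B0)
hess_z = X.T @ hess_x @ X                     # Hessian in the transformed variables

kappa_x = np.linalg.cond(hess_x)
kappa_z = np.linalg.cond(hess_z)
eig_z = np.linalg.eigvalsh(hess_z)
print(kappa_x, kappa_z, eig_z)
```

In the transformed variables the Hessian is the identity plus a positive semidefinite observation term, so all eigenvalues are at least one; directions that are not observed keep the eigenvalue one.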
Preconditioning III
Weakly Constrained 4D-Var I
1. Weakly constrained 4D-Var avoids the assumption of a perfect model, implicit in the traditional formulation, at the expense of solving a larger optimization problem.
2. The state $\mathbf{x}_i$ at $t_i$ is allowed to differ from the model prediction; the difference is the model error, considered to be a random variable. With the assumptions that the model is not biased, the model error is normally distributed, and model errors at different times are not correlated, we have that
$$\mathbf{x}_i = \mathcal{M}_{t_{i-1} \to t_i}(\mathbf{x}_{i-1}) + \eta_i, \quad \eta_i \in \mathcal{N}(0, \mathbf{Q}_i), \quad i = 1, \cdots, N .$$
Weakly Constrained 4D-Var II
3. The weakly constrained 4D-Var estimate of $\mathbf{x} = [\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_N]$ is the unconstrained minimizer of the following cost function:
$$\mathcal{J}^{\rm wk}(\mathbf{x}) = \frac{1}{2} \left(\mathbf{x}_0 - \mathbf{x}_0^b\right)^T \mathbf{B}_0^{-1} \left(\mathbf{x}_0 - \mathbf{x}_0^b\right) + \frac{1}{2} \sum_{i=1}^{N} \left(\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i\right)^T \mathbf{R}_i^{-1} \left(\mathcal{H}(\mathbf{x}_i) - \mathbf{y}_i\right) + \frac{1}{2} \sum_{i=1}^{N} \left(\mathbf{x}_i - \mathcal{M}_{t_{i-1} \to t_i}(\mathbf{x}_{i-1})\right)^T \mathbf{Q}_i^{-1} \left(\mathbf{x}_i - \mathcal{M}_{t_{i-1} \to t_i}(\mathbf{x}_{i-1})\right) .$$
4. The model is not imposed exactly. Rather, it is treated as a weak constraint.
5. The optimization variables are the model states at all times $\mathbf{x} \in \mathbb{R}^{n(N+1)}$, and therefore the resulting optimization problem is of larger dimension than that for strongly-constrained 4D-Var.
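The three terms of the weak-constraint cost function can be evaluated as in the sketch below. It assumes, for brevity, time-independent `R_inv` and `Q_inv`, a single-step model passed as a callable `model`, and an observation operator passed as a callable `obs_op`; these names and simplifications are illustrative only.

```python
import numpy as np

def weak_4dvar_cost(x, xb, B_inv, R_inv, Q_inv, model, obs_op, y):
    """Weak-constraint 4D-Var cost.

    x : array of shape (N + 1, n) with the states x_0, ..., x_N
    y : list of the N observations y_1, ..., y_N
    """
    db = x[0] - xb
    cost = 0.5 * db @ B_inv @ db                   # background term
    for i in range(1, len(x)):
        d = obs_op(x[i]) - y[i - 1]                # observation misfit at t_i
        eta = x[i] - model(x[i - 1])               # model error at t_i
        cost += 0.5 * d @ R_inv @ d + 0.5 * eta @ Q_inv @ eta
    return cost
```

Note that the control variable is the whole trajectory `x`, not only `x[0]`: changing one intermediate state affects the observation term at that time and the model-error terms at that time and the next.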
Weakly Constrained 4D-Var III
6. An alternative is to treat the model as a strong constraint, but account for the model bias which contributes to the discrepancy between model predictions and observations.
$$\mathcal{J}^{\rm wk}(\mathbf{x}_0, \beta) = \frac{1}{2} \left(\mathbf{x}_0 - \mathbf{x}_0^b\right)^T \mathbf{B}_0^{-1} \left(\mathbf{x}_0 - \mathbf{x}_0^b\right) + \frac{1}{2} \sum_{i=1}^{N} \left(\mathcal{H}(\mathbf{x}_i + \beta_i) - \mathbf{y}_i\right)^T \mathbf{R}_i^{-1} \left(\mathcal{H}(\mathbf{x}_i + \beta_i) - \mathbf{y}_i\right) + \frac{1}{2} \sum_{i=1}^{N} \beta_i^T \mathbf{Q}_i^{-1} \beta_i ,$$
$$\mathbf{x}_i = \mathcal{M}_{t_0 \to t_i}(\mathbf{x}_0) .$$
7. Difficult issues are related to the calibration of the covariances $\mathbf{Q}_i$ and to building temporal correlation models for biases and model errors.
Models of the background and observation error covariances I
• The quality of the assimilation depends on the accuracy with which the background and observation error covariances are known.
• Models of observation errors include information about the measuring instrument noise and bias (measurement error), and about the resolution with which the model reproduces the pointwise variability of the physical system (representativeness error).
• Background error covariances determine the relative weighting between observations and a priori data, and dictate how the information is spread in space and among variables.
Models of the background and observation error covariances II
• Background covariances are based on models of the error at the current time (or at initial time in 4D-Var). In case of cyclic data assimilation the analysis covariance from the previous cycle, transported to the current time, becomes the new background covariance.
• Background covariance matrices need to:
  • capture the spatial error correlations created by the flow (transport and diffusion),
  • capture the inter-species error correlations created by the chemical interactions,
  • have full rank, such that terms of the form $\mathbf{x}^T \mathbf{B}^{-1} \mathbf{x}$ make sense, and
  • allow for computationally efficient evaluations of matrix vector operations of the form $\mathbf{B}\,\mathbf{x}$, $\mathbf{B}^{1/2}\,\mathbf{x}$, and $\mathbf{B}^{-1}\,\mathbf{x}$.
Models of the background and observation error covariances III
• In (Chai, 2007) the CMAQ error statistics are estimated through both the NMC (National Meteorological Center) and the Hollingsworth-Lönnberg methods.
• Background error statistics can be estimated from ensembles of data assimilation runs.
Figure: Run an ensemble of analyses with random observation and state perturbations, plus stochastic model error representation. Form differences between pairs of background fields. (Diagram: parallel Analysis/Forecast cycles producing backgrounds $\mathbf{x}^b + \varepsilon^b$ and $\mathbf{x}^b + \eta^b$, whose differences are the background differences.)
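The idea of forming differences between pairs of background fields can be sketched with synthetic data. The code assumes that the two members of each pair carry independent, zero-mean errors with the same covariance $\mathbf{B}$, so that the covariance of their difference is $2\mathbf{B}$; the factor 0.5, the synthetic `B_true` and all names are assumptions of this illustration, not part of the lecture.

```python
import numpy as np

def covariance_from_differences(fields_a, fields_b):
    """Estimate B from paired background fields of shape (n_pairs, n).

    Assumes independent errors with equal covariance in both members,
    hence cov(a - b) = 2 B.
    """
    diff = fields_a - fields_b
    return 0.5 * (diff.T @ diff) / diff.shape[0]

# Synthetic check: draw pairs x^b + eps^b and x^b + eta^b from a known B.
rng = np.random.default_rng(0)
B_true = np.array([[1.0, 0.6], [0.6, 2.0]])
L = np.linalg.cholesky(B_true)
xb = np.array([3.0, -1.0])
n_pairs = 20000
fields_a = xb + rng.standard_normal((n_pairs, 2)) @ L.T
fields_b = xb + rng.standard_normal((n_pairs, 2)) @ L.T
B_est = covariance_from_differences(fields_a, fields_b)
print(B_est)
```

Working with differences removes the unknown true state from the estimate, since it is common to both members of a pair.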
Models of the background and observation error covariances IV
• An autoregressive (AR) model approach to represent background error covariance matrices has been proposed in (Sandu, 2007). The background state error field is modeled as a multilateral AR of the form
$$\varepsilon^b_{i,j,k} = \alpha_{i\pm1,j\pm1,k\pm1}\; \varepsilon^b_{i\pm1,j,k} + \sigma_{i,j,k}\; \xi_{i,j,k} .$$
Here $(i, j, k)$ are gridpoint indices on a three dimensional structured grid. The model captures the correlations among neighboring grid points, with $\alpha$ representing the correlation coefficients in the $x$, $y$ and $z$ directions. The last term represents the additional uncertainty at each grid point, with $\xi \in \mathcal{N}(0,1)$ normal random variables and $\sigma$ local error variances.
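How an AR model implies a covariance matrix can be seen on a much simpler case than the multilateral 3D model of the slide. The sketch below assumes a one-dimensional, one-sided AR(1) process $\varepsilon_i = \alpha\, \varepsilon_{i-1} + \sigma_i\, \xi_i$ with a constant coefficient $\alpha$; written as $\mathbf{A}\varepsilon = \mathrm{diag}(\sigma)\,\xi$, it gives $\mathbf{B} = \mathbf{A}^{-1} \mathrm{diag}(\sigma^2) \mathbf{A}^{-T}$. The grid size, the value of `alpha` and the choice of `sigma` are hypothetical.

```python
import numpy as np

n, alpha = 6, 0.7
A = np.eye(n) - alpha * np.eye(n, k=-1)       # (A eps)_i = eps_i - alpha * eps_{i-1}
sigma = np.full(n, np.sqrt(1.0 - alpha**2))   # keeps the error variance equal to one
sigma[0] = 1.0                                # first point has no left neighbor

A_inv = np.linalg.inv(A)
B = A_inv @ np.diag(sigma**2) @ A_inv.T       # implied background covariance
B_inv = A.T @ np.diag(sigma**-2) @ A          # its inverse, available without inverting B

print(np.round(B, 3))
```

In this simplified setting the correlation between points $i$ and $j$ is $\alpha^{|i-j|}$, the covariance has full rank, and $\mathbf{B}^{-1}$ is tridiagonal, which illustrates why AR models allow efficient products with $\mathbf{B}$ and $\mathbf{B}^{-1}$.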
Models of the background and observation error covariances V
Correct models of background (prior) errors are very important for data assimilation
• Background error representation determines the spread of information, and impacts the assimilation results
• Needs: high rank, capture dynamic dependencies, efficient computations
• Traditionally estimated empirically (NMC, Hollingsworth-Lönnberg)
1. Tensor products of 1d correlations, decreasing with distance (Singh et al, 2010)
2. Multilateral AR model (Constantinescu et al 2007)
3. Hybrid methods in the context of 4D-Var (Cheng et al, 2009)
[Constantinescu and Sandu, 2007]
Figure: Correlations built by the AR model follow the flow lines.
Models of the background and observation error covariances VI
• A simplified approach proposed in (Singh et al, 2011) constructs multidimensional correlation matrices as tensor products of one-dimensional correlations. This method has resulted in improved chemical data assimilation results with GEOS-Chem.
• The hybrid approach (Chen et al, 2010) estimates the analysis covariance at the end of each assimilation window. An ensemble drawn from the background distribution is run side by side with the optimization process, the subspace of errors corrected by 4D-Var is identified, and the background ensemble modified into one that samples the analysis distribution.
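The tensor product construction can be sketched in two dimensions. The code assumes an exponentially decaying one-dimensional correlation (a shape chosen here for illustration, not necessarily the one used in the cited work) and a field stored as an `(nx, ny)` array in row-major order; the function names are hypothetical.

```python
import numpy as np

def corr_1d(n, length):
    """One-dimensional correlation matrix decaying with grid distance."""
    idx = np.arange(n)
    return np.exp(-np.abs(idx[:, None] - idx[None, :]) / length)

def apply_tensor_corr(Cx, Cy, field):
    """Apply the tensor product correlation (Cx kron Cy) to a 2D field
    without forming the (nx*ny) x (nx*ny) matrix."""
    return Cx @ field @ Cy.T

nx, ny = 4, 3
Cx, Cy = corr_1d(nx, 2.0), corr_1d(ny, 1.0)
field = np.arange(nx * ny, dtype=float).reshape(nx, ny)

fast = apply_tensor_corr(Cx, Cy, field)
dense = (np.kron(Cx, Cy) @ field.ravel()).reshape(nx, ny)   # reference, small grids only
```

The factored form needs only the small one-dimensional matrices, which is what makes products with the multidimensional correlation matrix affordable on large grids.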
Thank you
감사합니다 gamsahabnida