
2011 IEEE International Symposium on Information Theory (ISIT), St. Petersburg, Russia, July 31 to August 5, 2011

Estimating a Gaussian Random Walk First-Passage Time from Noisy or Delayed Observations

Marat Burnashev and Aslan Tchamkerten

Abstract—Given a Gaussian random walk X with drift, we consider estimating its first-passage time τ of a given level ℓ with a stopping time η defined over an observation process Y that is either a noisy version of X or a delayed version of X. For both cases, we provide lower bounds on the average moments E|η − τ|^p, p ≥ 1, for any stopping rule η, and exhibit simple stopping rules that achieve these bounds in the large threshold regime and in the large threshold, large delay regime, respectively. The results immediately extend to the corresponding continuous-time settings where X and Y are standard Wiener processes with drift.

Index Terms—Estimation, Optimal Stopping Theory, Stopping Times, Hypothesis Testing, Wiener Processes

I. INTRODUCTION

Suppose X = {X_t}_{t≥0} is a stochastic process and τ a stopping time defined over X. A statistician has access to X, sometimes referred to as the primary process, only through correlated observations Y = {Y_t}_{t≥0}, and wishes to find a stopping time η, over Y, that best tracks τ, e.g., so as to minimize some p-th moment E|η − τ|^p.¹ This problem, recently introduced in [5] as the Tracking Stopping Time (TST) problem, can be seen as a generalization of the well-known Bayesian change-point detection problem, whose provenance dates back to the 1930s and whose range of applications includes a variety of fields such as econometrics, medical diagnosis, and climate modeling (see, e.g., the books [6] and [1] for surveys on the theory and applications of the change-point problem). In the Bayesian change-point problem, there is a random variable θ, taking on values in the positive integers, and two probability distributions P_0 and P_1. Under P_0, the conditional density function of Y_t given Y_1, Y_2, …, Y_{t−1} is f_0(Y_t | Y_1, Y_2, …, Y_{t−1}), for every t ≥ 0. Under P_1, the conditional density function of Y_t given Y_1, Y_2, …, Y_{t−1} is f_1(Y_t | Y_1, Y_2, …, Y_{t−1}), for every t ≥ 0. The observed process is distributed according to P^θ, which assigns the same conditional density functions as P_0 for all t < θ, and the same conditional density functions as P_1 for all t ≥ θ. The Bayesian change-point problem typically consists in finding a stopping time η, with respect to {Y_t}, that minimizes some loss function of the delay η − θ.

M. Burnashev is with the Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia. A. Tchamkerten is with the Communications and Electronics Department, Telecom ParisTech, France. Email: [email protected], [email protected]. A. Tchamkerten is partly supported by an Excellence Chair Grant from the French National Research Agency (ACE project).

¹Recall that a stopping time with respect to a stochastic process {X_t}_{t≥0} is a random variable τ taking on values in the positive integers such that {τ = t} ∈ F_t, for all t ≥ 0, where F_t denotes the σ-algebra generated by X_0, X_1, …, X_t.

To see that the Bayesian change-point problem can always be formulated as a TST problem, it suffices to define the primary process X = {X_t}_{t≥0} as X_t = 0 for t < θ and X_t = 1 for t ≥ θ. The Bayesian change-point problem becomes the TST problem whose goal is to track θ (now defined as a stopping time with respect to X) through Y only.

The difference between the Bayesian change-point problem and the TST problem is that the change-point problem always satisfies P(θ = k | τ > t, y^t) = P(θ = k | τ > t) for k > t. In contrast, this equality need not be satisfied for the TST problem [5]. In other words, for TST problems past observations are in general useful for predicting the value of the tracked stopping time, whereas for Bayesian change-point problems past observations are useless if the change in distribution has not occurred yet. For specific applications of the TST problem formulation related to monitoring, communication, and forecasting, we refer to [5, Section I].

In [5], through a computer science approach, a general algorithmic solution is proposed for constructing optimal "trackers" for the situation where X and Y are processes defined over finite alphabets and where τ is bounded. What motivated an algorithmic approach for the TST problem is that it generalizes the Bayesian change-point problem, for which general closed-form analytical solutions have been reported only for asymptotic settings (see, e.g., [4]). Non-asymptotic solutions have been obtained essentially for i.i.d. cases where, conditioned on the change-point value, observations are independent with common distributions P_0 and P_1 before and after the change, respectively (see, e.g., [7], [8]).²

Practically important TST settings include the cases where the observation process Y is a noisy or delayed version of X. In this paper, we investigate both situations in a Gaussian setting. The primary process X is a Gaussian random walk with drift, i.e., X_t = s·t + Σ_{i=1}^t V_i with the V_i's i.i.d. ∼ N(0,1) (zero-mean, unit-variance Gaussian random variables), s ≥ 0, and τ is the first time when X reaches some given level ℓ. Two observation processes are considered: a noisy process Y_t = X_t + ε Σ_{i=1}^t W_i with ε ≥ 0 and the W_i's i.i.d. ∼ N(0,1), and a delayed process given by Y_t = X_{t−d} for some fixed lag d ≥ 0. For both cases, we establish lower bounds on inf_η E|η − τ|^p, p ≥ 1, and exhibit stopping rules that achieve these bounds in the large threshold regime and the large threshold, large delay regime, respectively. The results related to the noisy observation case generalize the results obtained in [2] for p = 1.

In Section II we present the main results and in Section III we provide sketches of the proofs.

²A notable exception is [9], which considers Markov chain distributions with a finite state space.

978-1-4577-0595-3/11/$26.00 ©2011 IEEE

II. MAIN RESULTS

Consider the discrete-time process

X: X_0 = 0,  X_t = Σ_{i=1}^t V_i + s·t,  t ≥ 1,

where s ≥ 0 is some known constant and V_1, V_2, … are i.i.d. ∼ N(0,1), and consider the first-passage time

τ_ℓ = inf{t ≥ 0 : X_t ≥ ℓ}

for some known fixed threshold level ℓ ≥ 0. Given sequential observations of a process Y = {Y_t}_{t≥0} correlated to X, we consider the optimization problem

inf_η E|η − τ|^p,  p ≥ 1,   (1)

where the infimum is over all stopping times η defined with respect to the natural filtration induced by Y.
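The model above is straightforward to simulate. The following sketch (ours, not part of the paper; the values of s and ℓ are arbitrary illustrative choices) draws the walk X_t = Σ_{i=1}^t V_i + s·t and its first-passage time τ_ℓ; by Wald's identity the sample mean of τ_ℓ should sit near ℓ/s.

```python
import numpy as np

def first_passage_time(s, ell, rng, t_max=100000):
    """Simulate X_t = sum_{i<=t} V_i + s*t with V_i ~ N(0,1) and return
    tau_ell = inf{t >= 0 : X_t >= ell} (np.inf if not reached by t_max)."""
    x = 0.0
    if x >= ell:              # tau_ell = 0 when ell <= 0
        return 0
    for t in range(1, t_max + 1):
        x += s + rng.standard_normal()
        if x >= ell:
            return t
    return np.inf

rng = np.random.default_rng(0)
s, ell = 1.0, 200.0
taus = np.array([first_passage_time(s, ell, rng) for _ in range(1000)])
print(taus.mean())   # concentrates around ell/s = 200
print(taus.std())    # fluctuations of order sqrt(ell/s^3)
```

The sample standard deviation is of order √(ℓ/s³), which anticipates Claim iii. of Lemma III.1 below.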

Noisy observations

Consider the observation process

Y: Y_0 = 0,  Y_t = X_t + ε Σ_{i=1}^t W_i,  t ≥ 1,

where W_1, W_2, … are i.i.d. ∼ N(0,1) and where ε ≥ 0 is some known constant (the observation noises {W_i} are assumed to be independent of {V_i}).

The following theorem generalizes [2, Theorem 2.3], which considers the case p = 1. We use η(Y_0^∞) to denote an estimator of τ that depends on the entire observation process Y_0^∞ (hence, such an estimator need not be a stopping time). Notice that if ε = 0, then X = Y, and thus (1) equals zero by setting η = τ_ℓ.

Theorem II.1 (Noisy observations). Given 0 < ε < ∞ and 0 < s < ∞, let

η* = n + (ℓ − e(Y_n))^+ / s,

where n = ⌊ℓ/s − (ℓ/s)^q⌋,³ with q ∈ (1/2, 1), and where

e(Y_n) = (1/(1 + ε²)) Y_n + (sε²/(1 + ε²)) n.

Then, for any fixed p ≥ 1,

E|η* − τ|^p = (1 + o(1)) inf_{η(Y_0^∞)} E|η − τ|^p
            = (1 + o(1)) (1/s^p) (ℓε²/(s(1 + ε²)))^{p/2} E|N|^p   (2)

as ℓ → ∞, where N ∼ N(0,1).

Note that the very simple stopping rule η*, which depends on the single observation Y_n, not only is uniformly optimal over p ≥ 1, but does as well (asymptotically) as the best non-causal estimators of τ, which have access to the entire observation process Y. In particular, the simplicity of η* is in contrast with the optimal stopping rule proposed in [2, Theorem 2.3], for p = 1, which constantly monitors the X process by estimating X_t via Y_t for t = 1, 2, … and by stopping as soon as this estimator reaches level ℓ.

³x^+ denotes max{0, x} and ⌊x⌋ denotes the largest integer not greater than x.
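To illustrate Theorem II.1, the rule η* can be simulated and its mean absolute error compared with the prediction of (2) for p = 1. This is our sketch, not the authors' code; the parameter values s, ε, ℓ, q are arbitrary, and the agreement is only approximate at finite ℓ.

```python
import math
import numpy as np

def eta_star_and_tau(s, eps, ell, q, rng, t_max=10**5):
    """One run: simulate (X_t, Y_t), return (eta*, tau_ell) for the
    rule of Theorem II.1, which looks at the single observation Y_n."""
    n = math.floor(ell / s - (ell / s) ** q)
    x = noise = 0.0
    tau = y_n = None
    for t in range(1, t_max + 1):
        x += s + rng.standard_normal()
        noise += rng.standard_normal()
        if t == n:
            y_n = x + eps * noise           # the single observation Y_n
        if tau is None and x >= ell:
            tau = t
        if tau is not None and t >= n:
            break
    # e(Y_n) = Y_n/(1+eps^2) + s*eps^2*n/(1+eps^2), cf. (7)-(8)
    e = y_n / (1 + eps**2) + s * eps**2 * n / (1 + eps**2)
    eta = n + max(0.0, ell - e) / s
    return eta, tau

rng = np.random.default_rng(4)
s, eps, ell, q = 1.0, 1.0, 400.0, 0.6
pairs = [eta_star_and_tau(s, eps, ell, q, rng) for _ in range(500)]
diffs = np.array([e - t for (e, t) in pairs])
# Theory (2), p = 1: E|eta* - tau| ~ (1/s) sqrt(ell*eps^2/(s(1+eps^2))) E|N|
pred = math.sqrt(ell * eps**2 / (s * (1 + eps**2))) / s * math.sqrt(2 / math.pi)
print(np.abs(diffs).mean(), pred)
```

With E|N| = √(2/π), the empirical mean absolute error lands near the asymptotic prediction, up to finite-threshold corrections.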

The reason for assuming s to be strictly positive in Theorem II.1 is that, for s = 0, ε > 0, and ℓ > 0, it is impossible to finitely track τ, even if we have access to the entire observation process Y_0^∞:

Proposition II.1 ([2, Proposition 2.1.ii]). For s = 0, ε > 0, ℓ > 0, and p ≥ 1/2, we have

E|τ − η|^p = ∞

for any estimator η = η(Y_0^∞) of τ.

Delayed observations

Consider the observation process

Y: Y_t = 0 for t ≤ d,  Y_t = X_{t−d} for t ≥ d + 1,

for some fixed integer delay d ≥ 0.

Theorem II.2 (Delayed observations). Given s > 0 and p ≥ 1, let

η** = inf{t ≥ 0 : Y_t ≥ ℓ − s·d};

then

inf_η E|η − τ|^p = (1 + o(1)) E|η** − τ|^p = (1 + o(1)) (d^{p/2}/s^p) E|N|^p

as d → ∞, ℓ ≥ s·d.
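The rule η** is equally easy to check by simulation. Below is our illustrative sketch (not from the paper; s, d, ℓ are arbitrary choices satisfying ℓ ≥ s·d), comparing the mean absolute error of η** with the p = 1 prediction (√d/s)·E|N|.

```python
import math
import numpy as np

def eta_dd_and_tau(s, d, ell, rng, t_max=10**5):
    """One run with delayed observations Y_t = X_{t-d} (Y_t = 0 for
    t <= d): returns (eta**, tau_ell) with
    eta** = inf{t >= 0 : Y_t >= ell - s*d}."""
    path = [0.0]                       # path[t] = X_t
    tau = eta = None
    for t in range(1, t_max + 1):
        path.append(path[-1] + s + rng.standard_normal())
        if tau is None and path[t] >= ell:
            tau = t
        if eta is None and t >= d + 1 and path[t - d] >= ell - s * d:
            eta = t
        if tau is not None and eta is not None:
            return eta, tau
    return None, None

rng = np.random.default_rng(5)
s, d, ell = 1.0, 100, 300.0
pairs = [eta_dd_and_tau(s, d, ell, rng) for _ in range(500)]
diffs = np.array([e - t for (e, t) in pairs])
# Theorem II.2, p = 1: E|eta** - tau| ~ (sqrt(d)/s) * E|N| as d grows
pred = math.sqrt(d) / s * math.sqrt(2 / math.pi)
print(np.abs(diffs).mean(), pred)
```

Note that η** = d + τ_{ℓ−s·d}, so the error τ − η** is exactly the passage time from level ℓ − s·d to level ℓ minus d, which matches the Gaussian picture in the proof sketch of Section III.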

Similarly to the noisy observation case, the case where there is no drift is special:

Proposition II.2. For s = 0, ℓ > 0, and p ≥ 1/2,

inf_η E|η − τ|^p = d^p = E|η**_0 − τ|^p,

where η**_0 = inf{t ≥ 0 : Y_t ≥ ℓ}.

In other words, it is optimal to wait until we have absolute certainty about X having crossed level ℓ. Note that if we use this stopping policy in the case where s > 0, we achieve a delay that is a factor d^{p/2} larger compared to the best achievable delay (Theorem II.2).

It is easy to check from our analysis that Theorems II.1 and II.2 remain valid if we replace X and Y by their continuous-time counterparts, i.e., X_t = s·t + B_t and Y_t = X_t + εW_t for the noisy observations, and Y_t = X_{t−d} for the delayed observations, where {B_t}_{t≥0} and {W_t}_{t≥0} are independent standard Wiener processes.

III. ANALYSIS

To prove Theorems II.1 and II.2 we often use the following lemma, given without proof in the interest of space, on the concentration of τ around its mean. Claims i. and ii. are obtained via basic large deviations arguments, whereas Claim iii. is [3, Theorem 2.5].


Lemma III.1 (Large deviation). Let S_t = Σ_{i=1}^t Z_i where Z_1, Z_2, … are i.i.d. Gaussian random variables with mean 0 < s < ∞ and variance σ² < ∞. Let 0 < ℓ < ∞ and let τ = inf{t ≥ 1 : S_t ≥ ℓ}. Then,

i. the following inequalities hold:

P(τ < ℓ/s − z) ≤ exp{ −s²z² / (2σ²(ℓ/s − z)) }   (3)

for 0 ≤ z < ℓ/s, and

P(τ > ℓ/s + z) ≤ exp{ −s²z² / (2σ²(ℓ/s + z)) }   (4)

for z ≥ 0;

ii. for any p > 0,

E|τ − ℓ/s|^p ≤ k[ (ℓ/s)^{p/2} + 1 ]   (5)

where k ≥ 0 is a constant that depends on p, σ², and s (but not on ℓ);

iii. the distribution of τ − ℓ/s converges to the Gaussian distribution N(0, ℓσ²/s³) as ℓ → ∞.
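Claim iii. is easy to verify numerically. The sketch below (our illustration; s, σ, ℓ are arbitrary) standardizes simulated passage times by the mean ℓ/s and standard deviation √(ℓσ²/s³) and checks that the result looks standard normal.

```python
import numpy as np

def tau_first_passage(s, sigma, ell, rng):
    """tau = inf{t >= 1 : S_t >= ell} for S_t a sum of i.i.d. N(s, sigma^2)."""
    t, st = 0, 0.0
    while st < ell:
        t += 1
        st += s + sigma * rng.standard_normal()
    return t

rng = np.random.default_rng(1)
s, sigma, ell = 2.0, 0.5, 500.0
taus = np.array([tau_first_passage(s, sigma, ell, rng) for _ in range(3000)])
# Claim iii.: (tau - ell/s) / sqrt(ell*sigma^2/s^3) is approximately N(0,1)
z = (taus - ell / s) / np.sqrt(ell * sigma**2 / s**3)
print(z.mean(), z.std())   # close to 0 and 1 for large ell
```

The small residual bias in the mean comes from the overshoot of S at the crossing time, which vanishes relative to √(ℓσ²/s³) as ℓ → ∞.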

Proof Sketch of Theorem II.1: Since X_t and Y_t are jointly Gaussian, we can represent X_t as⁴

X_t =_d a·t^{1/2} N + e(Y_t),   (6)

where

e(Y_t) = b·Y_t + c·t,   (7)

where N ∼ N(0,1) is independent of Y_0^∞, and where

a = ε/(1 + ε²)^{1/2},  b = 1/(1 + ε²),  c = sε²/(1 + ε²).   (8)

From (6) and the fact that N is independent of Y_0^∞, it follows that e(Y_t) is the best estimator of X_t in the sense that it minimizes E|X_t − e|^p, p ≥ 1, among all estimators e that have access to the entire observation process Y_0^∞.
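As a quick numerical sanity check (ours, not from the paper), a least-squares fit of X_t on Y_t recovers the coefficients b and c of (7)-(8), since by joint Gaussianity the linear fit coincides with the conditional mean; the values of s, ε, and t below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
s, eps, t = 1.0, 2.0, 50
n_samples = 200000
V = rng.standard_normal((n_samples, t))
W = rng.standard_normal((n_samples, t))
X = V.sum(axis=1) + s * t              # X_t = sum V_i + s*t
Y = X + eps * W.sum(axis=1)            # Y_t = X_t + eps * sum W_i
# Least-squares fit of X on Y: slope should match b, intercept c*t.
slope, intercept = np.polyfit(Y, X, 1)
b, c = 1 / (1 + eps**2), s * eps**2 / (1 + eps**2)
print(slope, b)            # slope  ~ 1/(1+eps^2)
print(intercept, c * t)    # intercept ~ s*eps^2*t/(1+eps^2)
```

The residual variance of the fit likewise matches a²t = tε²/(1+ε²), the conditional variance appearing in (6).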

Since τ concentrates around ℓ/s (Claim i. of Lemma III.1), it follows from (6) that with high probability τ takes on values for which

X_τ − e(Y_τ) ≈_d a·(ℓ/s)^{1/2} N,

i.e., for which the estimation error in process level has an almost constant Gaussian distribution. This uncertainty in process level translates into uncertainty over time, via Claim ii. of Lemma III.1, yielding

τ − η(Y_0^∞) ≈_d (1/s)(X_τ − e(Y_τ)) ≈_d a·(ℓ/s³)^{1/2} N   (9)

for the best estimators of τ (whether causal or non-causal). From (9) one deduces that, for any η = η(Y_0^∞), E|η − τ|^p is lower bounded by the right side of the second equality in (2), by using large deviations arguments to show that atypical events, such as when τ is far from ℓ/s, have negligible contribution to the moments.

For an optimal estimator, and based on the above discussion, it may be tempting to stop at time ℓ/s and then declare time ℓ/s + (ℓ − e(Y_{ℓ/s}))/s. However, although this estimator is (asymptotically) optimal, it is not a stopping rule, since causality may be violated. To circumvent this problem, it suffices to stop at a time n < ℓ/s such that n ≈ ℓ/s and P(τ ≥ n) ≈ 1, and declare time n + (ℓ − e(Y_n))^+/s. This provides an intuitive justification for the optimality of η*.

⁴Throughout the paper we use "=_d" to denote equality in distribution.

Proof Sketch of Theorem II.2: Note the chief difference between the noisy observation and the delayed observation cases. For the noisy observation case, there is always some inherent uncertainty about any X_t (t ≥ 1) due to the observation noise. This allowed us to derive a lower bound by considering estimators that have access to the entire observation process. In the delayed observation case, instead, the uncertainty about X_t vanishes whenever Y_{t+d} is observed.

We now present the proof's main idea. The stopping time η** is in fact a very natural stopping time to consider since, on average, X_t is s·d higher than Y_t. Now, the time needed to go from level ℓ − s·d to level ℓ has (approximately) the Gaussian distribution d + (√d/s)N by Claim iii. of Lemma III.1. It then follows that τ − η** ≈_d (√d/s)N.

The optimality of η** is established essentially by using a combination of Lemma III.1, Claim iii., and the fact (which can be proved) that an optimal stopping rule stops before time

ν = inf{t ≥ 0 : Y_t ≥ ℓ − s·d(1 − ε)}

with high probability as d → ∞, ℓ ≥ s·d, for any ε > 0.

Proof of Proposition II.2: We show that for any stopping time η on Y such that P(η < τ + d) > 0,

E(|η − τ|^p | Y_η, η < τ + d) = ∞.   (10)

Hence, if η satisfies E|η − τ|^p < ∞, then necessarily

P(η ≥ τ + d) = 1.

Then,

inf_η E|η − τ|^p = inf_{η : P(η ≥ τ + d) = 1} E|η − τ|^p ≥ d^p = E|η**_0 − τ|^p,

where η**_0 = inf{t ≥ 0 : Y_t ≥ ℓ}.

It remains to prove (10). Equivalently, we show that for any stopping rule η over X such that P(η < τ) > 0,

E(|η − τ|^p | X_η, η < τ) = ∞.   (11)

Let {B_t}_{t≥0} be a standard Wiener process starting at time η at level B_0 = X_η = ℓ − h, for some h > 0. Define

τ̃_h = inf{t ≥ 0 : B_t = ℓ}.

Since τ̃_h ≤ τ − η, it suffices to prove that E τ̃_h^p = ∞ for (11) to hold.

From the reflection principle,

P(τ̃_h ≤ t) = 2P(B_t ≥ h) = 2Q(h/√t),  h > 0, t > 0,

where Q(x) = (1/√(2π)) ∫_x^∞ exp(−u²/2) du. Hence,

E τ̃_h^p = 2 ∫_0^∞ t^p dQ(h/√t)
        = (h/√(2π)) ∫_0^∞ (t^p/t^{3/2}) e^{−h²/(2t)} dt
        > (h e^{−h/2}/√(2π)) ∫_h^∞ (t^p/t^{3/2}) dt,

since e^{−h²/(2t)} ≥ e^{−h/2} for t ≥ h. Therefore, if p ≥ 1/2, then E τ̃_h^p = ∞.
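The law P(τ̃_h ≤ t) = 2Q(h/√t) is the Lévy distribution, which can be sampled exactly as h²/N² with N ∼ N(0,1), since P(h²/N² ≤ t) = P(|N| ≥ h/√t) = 2Q(h/√t). The sketch below (our illustration, with arbitrary h and t) checks the closed-form CDF against samples; the divergence of the moments of order p ≥ 1/2 shows up as a sample mean of √τ̃_h that does not stabilize as the sample grows.

```python
import math
import numpy as np

def Q(x):
    """Standard normal tail probability Q(x) = P(N > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

rng = np.random.default_rng(3)
h, t0 = 2.0, 5.0
# Exact sampler for the first-passage law: tau_h  =_d  h^2 / N^2
samples = h**2 / rng.standard_normal(200000) ** 2
emp = (samples <= t0).mean()
print(emp, 2 * Q(h / math.sqrt(t0)))   # empirical vs closed-form CDF
```

This gives a direct, discretization-free way to experiment with the driftless first-passage time used in the proof above.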

REFERENCES

[1] M. Basseville and I. Nikiforov. Detection of Abrupt Changes: Theory and Application. Prentice-Hall, 1993.

[2] M. V. Burnashev and A. Tchamkerten. Tracking a Gaussian random walk first-passage time through noisy observations. http://arxiv4.library.cornell.edu/abs/1005.0616, 2010.

[3] A. Gut. On the moments and limit distributions of some first passage times. Ann. Prob., 2(2):277–308, 1974.

[4] T. L. Lai. Information bounds and quick detection of parameter changes in stochastic systems. IEEE Trans. Inform. Th., 44:2917–2929, November 1998.

[5] U. Niesen and A. Tchamkerten. Tracking stopping times through noisy observations. IEEE Trans. Inform. Th., 55(1):422–432, January 2009.

[6] H. V. Poor and O. Hadjiliadis. Quickest Detection. Cambridge University Press, New York, 2009.

[7] A. N. Shiryaev. On optimum methods in quickest detection problems. Th. Prob. and its App., 8(1):22–46, 1963.

[8] A. N. Shiryayev. Optimal Stopping Rules. Springer-Verlag, 1978.

[9] B. Yakir. Optimal detection of a change in distribution when the observations form a Markov chain with a finite state space. In Change-Point Problems, volume 23, pages 346–358. Institute of Mathematical Statistics, Lecture Notes, Monograph Series, 1994.
