
Automatica 48 (2012) 366–373

Contents lists available at SciVerse ScienceDirect

Automatica

journal homepage: www.elsevier.com/locate/automatica

Brief paper

Stochastic maximum principle in the mean-field controls

Juan Li
School of Mathematics and Statistics, Shandong University at Weihai, Weihai 264209, PR China

Article info

Article history: Received 10 May 2010; Received in revised form 21 June 2011; Accepted 19 July 2011; Available online 17 December 2011.

Keywords: Mean-field models; Backward stochastic differential equations; Stochastic maximum principle; Linear quadratic controls.

Abstract

In Buckdahn, Djehiche, Li, and Peng (2009), the authors obtained mean-field Backward Stochastic Differential Equations (BSDEs) in a natural way as a limit of a high-dimensional system of forward and backward SDEs, corresponding to a great number of "particles" (or "agents"). The objective of the present paper is to deepen the investigation of such mean-field BSDEs by studying their stochastic maximum principle. This paper studies the stochastic maximum principle (SMP) for mean-field controls, which differs from the classical one. An SMP in integral form is deduced, and, under additional assumptions, necessary conditions as well as sufficient conditions for the optimality of a control are obtained. As an application, a linear quadratic stochastic control problem of mean-field type is studied.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

In this paper, we consider the following mean-field type stochastic control problem, whose state equation is related to a kind of McKean–Vlasov equation (refer to, for example, Sznitman (1991)):

\[
dX_t = E'\big[b(t, X'_t, X_t, v_t)\big]\,dt + E'\big[\sigma(t, X'_t, X_t, v_t)\big]\,dB_t, \qquad X_0 = x,\ t \ge 0, \tag{1.1}
\]

and the cost functional is of the form

\[
J(v) = E\Big[\int_0^T E'\big[h(t, X'_t, X_t, v_t)\big]\,dt + E'\big[\Phi(X'_T, X_T)\big]\Big]. \tag{1.2}
\]

More precisely, state equation (1.1) and cost functional (1.2) are in the following forms, respectively:

\[
dX_t = \Big(\int_\Omega b(t, X_t(\omega'), X_t, v_t(\omega', \cdot))\,P(d\omega')\Big)dt + \Big(\int_\Omega \sigma(t, X_t(\omega'), X_t, v_t(\omega', \cdot))\,P(d\omega')\Big)dB_t, \qquad X_0 = x,\ t \ge 0;
\]

This work was partially supported by the NSF of PR China (Nos. 10701050, 11071144), the Independent Innovation Foundation of Shandong University, SRF for ROCS (SEM) and the National Basic Research Program of China (973 Program) (No. 2007CB814904). The material in this paper was not presented at any conference. This paper was recommended for publication in revised form by Associate Editor George Yin under the direction of Editor Ian R. Petersen.

E-mail addresses: [email protected], [email protected]. Tel.: +86 631 5672575; fax: +86 631 5688523.

0005-1098/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.automatica.2011.11.006

and

\[
J(v) = E\Big[\int_\Omega \Big(\int_0^T h(t, X_t(\omega'), X_t, v_t(\omega', \cdot))\,dt + \Phi(X_T(\omega'), X_T)\Big)P(d\omega')\Big].
\]

Here $B$ is the driving $d$-dimensional Brownian motion defined on a probability space $(\Omega, \mathcal{F}, P)$, and $v$ is from the space $\mathcal{U}$ of all admissible controls taking their values in a set $U$, where $U$ is supposed to be a convex subset of $\mathbb{R}^k$ ($k \ge 1$). The admissible controls $v_t(\omega', \omega): [0,T] \times \Omega \times \Omega \to U$ are supposed to be $(\mathcal{F} \otimes \mathcal{F}^B_t)$-progressively measurable.

Stochastic differential equations of this type (without control) were obtained recently in Buckdahn, Djehiche, Li, and Peng (2009) as a limit of high-dimensional stochastic differential equations with interaction, and the above control problem can be regarded as such a limit of high-dimensional controlled stochastic differential equations. As for classical control problems, it is important to be able to detect and characterize optimal controls of such control problems. So we study the stochastic control problem consisting in minimizing the above-defined cost functional $J$ over the set $\mathcal{U} = L^2_{\mathbb{F}}(0, T; U)$ of admissible controls. An admissible control $u \in \mathcal{U}$ is called optimal if

\[
J(u) = \min_{v \in \mathcal{U}} J(v). \tag{1.3}
\]
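Although the analysis below is purely theoretical, the limit procedure behind the mean-field dynamics (1.1) can be illustrated numerically: replacing $E'$ by an empirical mean over $N$ interacting particles gives a McKean–Vlasov particle system. The following sketch uses hypothetical coefficients ($b(t, x', x, v) = a x'$, constant $\sigma$, no control), chosen only so that $E[X_t]$ has the closed form $x_0 e^{at}$; none of these constants come from the paper.

```python
import numpy as np

# Particle approximation of the mean-field SDE (1.1): E'[.] is replaced by the
# empirical mean over N interacting particles (Euler-Maruyama in time).
# Hypothetical coefficients: b(t, x', x, v) = a * x', sigma constant, no control.
rng = np.random.default_rng(0)

a, sigma = 0.5, 0.2                      # illustrative constants
x0, T, n_steps, N = 1.0, 1.0, 200, 2000
dt = T / n_steps

X = np.full(N, x0)
for _ in range(n_steps):
    drift = a * X.mean()                 # empirical proxy for E'[b(t, X'_t, X_t, v_t)]
    X = X + drift * dt + sigma * rng.normal(0.0, np.sqrt(dt), N)

# For this drift, m(t) = E[X_t] solves m' = a*m, so m(T) = x0 * exp(a*T).
print(X.mean())
```

For this linear drift the empirical mean at time $T$ should be close to $x_0 e^{aT} \approx 1.6487$, up to Monte Carlo and discretization error.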

In this paper, we study the stochastic maximum principle (SMP) with the help of a convex perturbation of the optimal control. By extending classical approaches to the mean-field framework, we prove a necessary optimality condition as well as, under some


restrictive assumptions (but comparable with those in the classical case), the sufficiency of the necessary conditions.

Kushner (1965, 1972) was the first to study the SMP. Later, Haussmann gave a powerful version of the SMP (we refer the reader to Haussmann (1986) and the references in its bibliography). The main limitation of Haussmann's theory consists in the fact that the diffusion coefficient does not depend on the control. Since these pioneering papers, a lot of work on the stochastic maximum principle has been done by different authors. Without being exhaustive, let us refer, for example, to Arkin and Saksonov (1979), Andersson and Djehiche (2011), Bahlali (2008), Bensoussan (1982), Bismut (1978), Cadenillas and Karatzas (1995), Elliott (1990), Peng (1990) as well as Hu and Zhou (2003). In these cited papers, different versions of the stochastic maximum principle were obtained, adapted to different frameworks. In the present paper, we adapt the methods developed by Bensoussan (1982) in order to get the necessary conditions for the optimality of a control. So we suppose that the control state space is convex. This convexity of the control state space allows us to use an argument of convex perturbation of the optimal control in order to deduce the maximum principle. We emphasize that in Andersson and Djehiche's recent paper (Andersson & Djehiche, 2011), the maximum principle for stochastic differential equations (SDEs) of mean-field type was studied, but in the present work we not only have a different controlled system, our method also differs. Indeed, for us the technique of duality is important to get the adjoint equation for the controlled SDE. In the work of Meyer-Brandis, Øksendal, and Zhou (in press), there are no mean-field variables in the controlled forward equation; mean-field variables appear only in the cost functional, and in the more specific form $E[f_0(X_t)]$ in the running cost and $E[g(X_T)]$ in the terminal cost. The authors characterize the optimal control with the help of Malliavin calculus (see their Theorem 3.4).

In Buckdahn, Djehiche, Li, and Peng (2009) and Buckdahn, Li, and Peng (2009), the authors introduced a new kind of backward stochastic differential equations (BSDEs), mean-field BSDEs, inspired by Lasry and Lions (2007). The present paper studies the SMP in this mean-field framework. We deduce the adjoint equation, which turns out to be itself again of mean-field type, a mean-field backward SDE, and we derive necessary conditions for the optimality of a stochastic control. Furthermore, under additional assumptions, we prove that our necessary conditions are also sufficient ones. In the end we study the example of a linear quadratic control problem of mean-field type.

2. Preliminaries

Let $\{B_t\}_{t \ge 0}$ be a $d$-dimensional standard Brownian motion defined over some complete probability space $(\Omega, \mathcal{F}, P)$. By $\mathbb{F} = \{\mathcal{F}_s,\ 0 \le s \le T\}$ we denote the natural filtration generated by $\{B_s\}_{0 \le s \le T}$ and augmented by all $P$-null sets, i.e.,
\[
\mathcal{F}_s = \sigma\{B_r,\ r \le s\} \vee \mathcal{N}_P, \qquad s \in [0, T],
\]
where $\mathcal{N}_P$ is the set of all $P$-null subsets and $T > 0$ is a fixed real time horizon. For any $n \ge 1$, $|z|$ denotes the Euclidean norm of $z \in \mathbb{R}^n$. We shall also introduce the following two spaces of processes, which are used frequently in what follows:

$S^2_{\mathbb{F}}(0,T;\mathbb{R}) := \{(\psi_t)_{0 \le t \le T}$ real-valued $\mathbb{F}$-adapted càdlàg process: $E[\sup_{0 \le t \le T}|\psi_t|^2] < +\infty\}$;

$H^2_{\mathbb{F}}(0,T;\mathbb{R}^n) := \{(\psi_t)_{0 \le t \le T}$ $\mathbb{R}^n$-valued $\mathbb{F}$-progressively measurable process: $\|\psi\|_2^2 = E[\int_0^T |\psi_t|^2\,dt] < +\infty\}$.

Let us now consider a function $g: \Omega \times [0,T] \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}$ with the property that $(g(t,y,z))_{t \in [0,T]}$ is $\mathbb{F}$-progressively measurable for each $(y,z)$ in $\mathbb{R} \times \mathbb{R}^d$, and which is assumed to satisfy the following standard assumptions throughout the paper.

(H1) There exists a constant $C \ge 0$ such that, $dt\,dP$-a.e., for all $y_1, y_2 \in \mathbb{R}$, $z_1, z_2 \in \mathbb{R}^d$, $|g(t,y_1,z_1) - g(t,y_2,z_2)| \le C(|y_1 - y_2| + |z_1 - z_2|)$;

(H2) $g(\cdot, 0, 0) \in H^2_{\mathbb{F}}(0,T;\mathbb{R})$.

The following result on BSDEs is by now well known; for its proof the reader is referred to Pardoux and Peng (1990).

Lemma 2.1. Under Assumptions (H1) and (H2), for any random variable $\xi \in L^2(\Omega, \mathcal{F}_T, P)$, the BSDE
\[
y_t = \xi + \int_t^T g(s, y_s, z_s)\,ds - \int_t^T z_s\,dB_s, \qquad 0 \le t \le T, \tag{2.1}
\]
has a unique adapted solution
\[
(y_t, z_t)_{t \in [0,T]} \in S^2_{\mathbb{F}}(0,T;\mathbb{R}) \times H^2_{\mathbb{F}}(0,T;\mathbb{R}^d).
\]
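As a minimal sanity check of (2.1): for a hypothetical linear driver $g(s,y,z) = ay$ and a deterministic terminal value $\xi$, the martingale part vanishes ($z \equiv 0$) and the BSDE reduces to the ODE $-dy_t = a y_t\,dt$, $y_T = \xi$, with explicit solution $y_t = \xi e^{a(T-t)}$. A backward Euler sketch (all constants are illustrative, not from the paper):

```python
import numpy as np

# Backward Euler for the BSDE (2.1) in the degenerate case g(s, y, z) = a*y,
# deterministic xi: then z = 0 and y_t = xi * exp(a*(T - t)) exactly.
a, xi, T, n = 0.3, 2.0, 1.0, 10_000   # illustrative constants
dt = T / n

y = xi
for _ in range(n):                    # integrate -dy = a*y dt backward from T
    y = y + a * y * dt

print(y)                              # numerical y_0, compare with xi*exp(a*T)
```

The numerical $y_0$ should agree with the closed form $\xi e^{aT}$ up to the $O(\Delta t)$ error of the scheme.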

2.1. Mean-field BSDEs

This section is devoted to recalling some basic results on a new type of BSDEs, the so-called mean-field BSDEs; the reader is referred to Buckdahn, Djehiche, Li, and Peng (2009) and Buckdahn, Li, and Peng (2009).

Let $(\bar\Omega, \bar{\mathcal{F}}, \bar P) = (\Omega \times \Omega, \mathcal{F} \otimes \mathcal{F}, P \otimes P)$ be the (non-completed) product of $(\Omega, \mathcal{F}, P)$ with itself. We endow this product space with the filtration $\bar{\mathbb{F}} = \{\bar{\mathcal{F}}_t = \mathcal{F} \otimes \mathcal{F}_t,\ 0 \le t \le T\}$. Any random variable $\xi \in L^0(\Omega, \mathcal{F}, P; \mathbb{R}^n)$ originally defined on $\Omega$ is extended canonically to $\bar\Omega$: $\xi'(\omega', \omega) = \xi(\omega')$, $(\omega', \omega) \in \bar\Omega = \Omega \times \Omega$. For any $\theta \in L^1(\bar\Omega, \bar{\mathcal{F}}, \bar P)$ the variable $\theta(\cdot, \omega): \Omega \to \mathbb{R}$ belongs to $L^1(\Omega, \mathcal{F}, P)$, $P(d\omega)$-a.s.; we denote its expectation by
\[
E'[\theta(\cdot, \omega)] = \int_\Omega \theta(\omega', \omega)\,P(d\omega').
\]
Notice that $E'[\theta] = E'[\theta(\cdot, \omega)] \in L^1(\Omega, \mathcal{F}, P)$, and
\[
\bar E[\theta] = \int_{\bar\Omega} \theta\,d\bar P = \int_\Omega E'[\theta(\cdot, \omega)]\,P(d\omega) = E\big[E'[\theta]\big].
\]

The driver of our mean-field BSDE is a function $f = f(\omega', \omega, t, y', z', y, z): \bar\Omega \times [0,T] \times \mathbb{R} \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}$ which is $\bar{\mathbb{F}}$-progressively measurable for all $(y', z', y, z)$, and which satisfies the following assumptions.

(H3) There exists a constant $C \ge 0$ such that, $\bar P$-a.s., for all $t \in [0,T]$, $y_1, y_2, y'_1, y'_2 \in \mathbb{R}$, $z_1, z_2, z'_1, z'_2 \in \mathbb{R}^d$,
\[
|f(t, y'_1, z'_1, y_1, z_1) - f(t, y'_2, z'_2, y_2, z_2)| \le C\big(|y'_1 - y'_2| + |z'_1 - z'_2| + |y_1 - y_2| + |z_1 - z_2|\big).
\]
(H4) $f(\cdot, 0, 0, 0, 0) \in H^2_{\bar{\mathbb{F}}}(0,T;\mathbb{R})$.

We now recall a main result of Buckdahn, Li, and Peng (2009).

Theorem 2.1. Under Assumptions (H3) and (H4), for any random variable $\xi \in L^2(\Omega, \mathcal{F}_T, P)$, the mean-field BSDE
\[
Y_t = \xi + \int_t^T E'\big[f(s, Y'_s, Z'_s, Y_s, Z_s)\big]\,ds - \int_t^T Z_s\,dB_s, \qquad 0 \le t \le T, \tag{2.2}
\]
has a unique adapted solution
\[
(Y_t, Z_t)_{t \in [0,T]} \in S^2_{\mathbb{F}}(0,T;\mathbb{R}) \times H^2_{\mathbb{F}}(0,T;\mathbb{R}^d).
\]

Remark 2.1. We emphasize that, due to our notations, the driving coefficient of (2.2) has to be interpreted as follows:
\[
E'\big[f(s, Y'_s, Z'_s, Y_s, Z_s)\big](\omega) = E'\big[f(s, Y'_s, Z'_s, Y_s(\omega), Z_s(\omega))\big] = \int_\Omega f(\omega', \omega, s, Y_s(\omega'), Z_s(\omega'), Y_s(\omega), Z_s(\omega))\,P(d\omega').
\]
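The notation of Remark 2.1 can be mimicked with Monte Carlo: the primed expectation integrates out an independent copy of the randomness and leaves a function of $\omega$. A small sketch with a hypothetical integrand $f(y', y) = y'y$ (not from the paper):

```python
import numpy as np

# Monte Carlo reading of E'[theta](omega): average over an independent copy
# (the primed coordinate) only, leaving a function of omega.
rng = np.random.default_rng(1)

Y = rng.normal(1.0, 0.5, 100_000)         # samples of Y; E[Y] = 1
f = lambda y_prime, y: y_prime * y        # hypothetical integrand f(y', y)

y_fixed = Y[0]                            # one fixed realization ("omega")
E_prime = f(Y, y_fixed).mean()            # estimate of E'[f(Y', y_fixed)]

# E[E'[f(Y', Y)]] equals the product-space integral; by independence of the
# two copies it is E[Y'] * E[Y] = 1 here (estimated on a sub-sample).
E_total = np.mean([f(Y, y).mean() for y in Y[:200]])
print(E_prime, E_total)
```

Note that `E_prime` depends on the fixed realization, exactly as $E'[\theta](\omega)$ is still a random variable on $(\Omega, \mathcal{F}, P)$, while `E_total` is a plain number.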

2.2. McKean–Vlasov SDEs

We shall also consider McKean–Vlasov-type SDEs, in order to be able to introduce the stochastic control problem we want to study in the next section. As concerns McKean–Vlasov-type SDEs, the


reader is, for example, referred to Buckdahn, Li, and Peng (2009). Let $b: \Omega \times [0,T] \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ and $\sigma: \Omega \times [0,T] \times \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^{n \times d}$ be two measurable functions which are supposed to satisfy the following conditions:

(H5) (i) $b(\cdot, x', x)$ and $\sigma(\cdot, x', x)$ are $\mathbb{F}$-progressively measurable continuous processes for all $x', x \in \mathbb{R}^n$, and there exists some constant $C > 0$ such that
\[
|b(t, x', x)| + |\sigma(t, x', x)| \le C(1 + |x'| + |x|), \quad \text{a.s., for all } 0 \le t \le T,\ x', x \in \mathbb{R}^n;
\]
(ii) $b$ and $\sigma$ are Lipschitz in $(x', x)$, i.e., there is some constant $C > 0$ such that
\[
|b(t, x'_1, x_1) - b(t, x'_2, x_2)| + |\sigma(t, x'_1, x_1) - \sigma(t, x'_2, x_2)| \le C(|x'_1 - x'_2| + |x_1 - x_2|), \quad \text{a.s.,}
\]
for all $0 \le t \le T$, $x'_1, x_1, x'_2, x_2 \in \mathbb{R}^n$.

We now study the following SDE, parameterized by the initial condition $(t, \zeta) \in [0,T] \times L^2(\Omega, \mathcal{F}_t, P; \mathbb{R}^n)$:
\[
dX^{t,\zeta}_s = E'\big[b(s, (X^{t,\zeta}_s)', X^{t,\zeta}_s)\big]\,ds + E'\big[\sigma(s, (X^{t,\zeta}_s)', X^{t,\zeta}_s)\big]\,dB_s, \qquad X^{t,\zeta}_t = \zeta,\ s \in [t,T]. \tag{2.3}
\]

We recall that, due to our notational convention,
\[
E'\big[b(s, (X^{t,\zeta}_s)', X^{t,\zeta}_s)\big](\omega) = \int_\Omega b(\omega', \omega, s, X^{t,\zeta}_s(\omega'), X^{t,\zeta}_s(\omega))\,P(d\omega'), \qquad \omega \in \Omega.
\]

Theorem 2.2. Under Assumption (H5), SDE (2.3) has a unique strong solution.

Remark 2.2. From standard arguments we also get that, for any $p \ge 2$, there exists $C_p \in \mathbb{R}$ such that, for all $t \in [0,T]$ and $\zeta, \zeta' \in L^p(\Omega, \mathcal{F}_t, P; \mathbb{R}^n)$,
\[
E\Big[\sup_{t \le s \le T} |X^{t,\zeta}_s - X^{t,\zeta'}_s|^p \,\Big|\, \mathcal{F}_t\Big] \le C_p |\zeta - \zeta'|^p, \quad \text{a.s.},
\]
\[
E\Big[\sup_{t \le s \le T} |X^{t,\zeta}_s|^p \,\Big|\, \mathcal{F}_t\Big] \le C_p (1 + |\zeta|^p), \quad \text{a.s.},
\]
\[
E\Big[\sup_{t \le s \le t+\delta} |X^{t,\zeta}_s - \zeta|^p \,\Big|\, \mathcal{F}_t\Big] \le C_p (1 + |\zeta|^p)\,\delta^{p/2}, \tag{2.4}
\]
$P$-a.s., for all $\delta > 0$ with $t + \delta \le T$.

These estimates, well known in the classical case, can be consulted, for instance, in Ikeda and Watanabe (1989, pp. 166–168) and also in Karatzas and Shreve (1987, pp. 289–290). We also emphasize that the constant $C_p$ in (2.4) only depends on the Lipschitz and growth constants of $b$ and $\sigma$.

3. Formulation of the problem

In this section, we study the stochastic maximum principle for our mean-field control problem. In order to be more precise, let us consider the following mean-field type optimal control system, with the state equation

\[
dX_t = E'\big[b(t, X'_t, X_t, v_t)\big]\,dt + E'\big[\sigma(t, X'_t, X_t, v_t)\big]\,dB_t, \qquad X_0 = x,\ t \ge 0, \tag{3.1}
\]

and the cost functional

\[
J(v) = E\Big[\int_0^T E'\big[h(t, X'_t, X_t, v_t)\big]\,dt + E'\big[\Phi(X'_T, X_T)\big]\Big]. \tag{3.2}
\]

The control problem consists in minimizing the functional $J$ over the space $\mathcal{U} = L^2_{\mathbb{F}}(0,T;U)$ of admissible controls, where $U$ is supposed to be a convex subset of $\mathbb{R}^k$ ($k \ge 1$). An admissible control $u \in \mathcal{U}$ is said to be optimal if
\[
J(u) = \min_{v \in \mathcal{U}} J(v). \tag{3.3}
\]

Throughout what follows we shall assume the following.

(A1) (i) The given functions $b(t, x', x, v): [0,T] \times \mathbb{R}^n \times \mathbb{R}^n \times U \to \mathbb{R}^n$; $\sigma(t, x', x, v): [0,T] \times \mathbb{R}^n \times \mathbb{R}^n \times U \to \mathbb{R}^{n \times d}$; $h(t, x', x, v): [0,T] \times \mathbb{R}^n \times \mathbb{R}^n \times U \to \mathbb{R}$; and $\Phi(x', x): \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$, $t \in [0,T]$, $x', x \in \mathbb{R}^n$, $v \in U$, are differentiable with respect to $(x', x, v)$.

(ii) The derivatives of $b$ and $\sigma$ are Lipschitz continuous and bounded.

(iii) The derivatives of $h$ and $\Phi$ are Lipschitz continuous and bounded by $C(1 + |x'| + |x| + |v|)$.

Remark 3.1. For simplicity we will make use of the following notations concerning matrices. We denote by $\mathbb{R}^{n \times d}$ the space of real matrices of type $n \times d$, and by $\mathbb{R}^{n \times n}_d$ the linear space of vectors of matrices $M = (M^1, \ldots, M^d)$ with $M^i \in \mathbb{R}^{n \times n}$, $1 \le i \le d$. Given any $\alpha, \beta \in \mathbb{R}^n$, $L, S \in \mathbb{R}^{n \times d}$, $\gamma \in \mathbb{R}^d$ and $M, N \in \mathbb{R}^{n \times n}_d$, we introduce the following notations:
\[
\alpha\beta = \sum_{i=1}^n \alpha_i\beta_i \in \mathbb{R}; \quad LS = \sum_{i=1}^d L^i S^i \in \mathbb{R}, \text{ where } L = (L^1, \ldots, L^d),\ S = (S^1, \ldots, S^d);
\]
\[
ML = \sum_{i=1}^d M^i L^i \in \mathbb{R}^n; \quad M\alpha\gamma = \sum_{i=1}^d (M^i\alpha)\gamma_i \in \mathbb{R}^n; \quad MN = \sum_{i=1}^d M^i N^i \in \mathbb{R}^{n \times n}.
\]

Let us suppose now that $u$ is an optimal control and $X^u$ the associated optimal trajectory, defined by SDE (3.1) for $v = u$. Then we define the perturbed control as follows:

\[
\mu^\theta_t = u_t + \theta(v_t - u_t), \tag{3.4}
\]
where $\theta > 0$ is sufficiently small ($0 < \theta < 1$) and $v$ is an arbitrary element of $\mathcal{U}$. We emphasize that the convexity of $U$ has the consequence that $\mu^\theta \in \mathcal{U}$.

We denote by $X^\theta$ the solution of system (3.1) associated with the control $\mu^\theta$. From the optimality of $u$, the variational inequality will be derived from the fact that
\[
0 \le J(\mu^\theta) - J(u). \tag{3.5}
\]
To this end we need the following classical results, which we have to translate to our framework.

Lemma 3.1. Under the above assumptions on the coefficients we have
\[
\lim_{\theta \to 0} E\Big[\sup_{0 \le t \le T} |X^\theta_t - X^u_t|^2\Big] = 0. \tag{3.6}
\]

Proof. From standard estimates and the Burkholder–Davis–Gundy inequality we get that, for some $C_T > 0$ depending only on $T > 0$ and the Lipschitz constants of $b$ and $\sigma$:
\[
\begin{aligned}
E\Big[\sup_{s \in [0,t]} |X^\theta_s - X^u_s|^2\Big]
&\le 2T\,E\int_0^t \big|E'[b(s, (X^\theta_s)', X^\theta_s, \mu^\theta_s) - b(s, (X^u_s)', X^u_s, u_s)]\big|^2\,ds \\
&\quad + 8\,E\int_0^t \big|E'[\sigma(s, (X^\theta_s)', X^\theta_s, \mu^\theta_s) - \sigma(s, (X^u_s)', X^u_s, u_s)]\big|^2\,ds \\
&\le C_T\,E\int_0^t |X^\theta_s - X^u_s|^2\,ds + \theta^2 C_T\,E\int_0^t |v_s - u_s|^2\,ds, \qquad t \in [0,T]. \tag{3.7}
\end{aligned}
\]
From Gronwall's Lemma we have the desired result. $\square$
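The Gronwall step used here can be checked numerically in its discrete, extremal form: if $u(t) \le \varepsilon + C\int_0^t u(s)\,ds$, then $u(t) \le \varepsilon e^{Ct}$. A sketch with illustrative constants (not from the paper):

```python
import numpy as np

# Discrete Gronwall check: iterate the extremal (equality) case of
#   u(t) <= eps + C * int_0^t u(s) ds
# on a grid and compare with the Gronwall bound eps * exp(C*t).
C, eps, T, n = 2.0, 0.1, 1.0, 100_000    # illustrative constants
dt = T / n

u, integral = eps, 0.0
for _ in range(n):
    integral += u * dt                    # left-endpoint Riemann sum
    u = eps + C * integral                # equality case of the hypothesis

print(u, eps * np.exp(C * T))             # discrete u(T) vs Gronwall bound
```

The iterated value stays below the bound $\varepsilon e^{CT}$ and approaches it as the grid is refined, which is exactly the sharpness of Gronwall's inequality.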

Lemma 3.2. Let $K_t$ be the solution of the following linear equation:
\[
\begin{cases}
dK_t = E'\big[b_{x'}(t, (X^u_t)', X^u_t, u_t)(K_t)' + b_x(t, (X^u_t)', X^u_t, u_t)K_t + b_v(t, (X^u_t)', X^u_t, u_t)(v_t - u_t)\big]\,dt \\
\qquad\;\; + E'\big[\sigma_{x'}(t, (X^u_t)', X^u_t, u_t)(K_t)' + \sigma_x(t, (X^u_t)', X^u_t, u_t)K_t + \sigma_v(t, (X^u_t)', X^u_t, u_t)(v_t - u_t)\big]\,dB_t, \\
K_0 = 0.
\end{cases} \tag{3.8}
\]


Then we have
\[
\lim_{\theta \to 0} E\Big[\sup_{s \in [0,t]} \Big|\frac{X^\theta_s - X^u_s}{\theta} - K_s\Big|^2\Big] = 0, \quad \text{for all } t \in [0,T]. \tag{3.9}
\]

Proof. From Theorem 2.2, we know that Eq. (3.8) has a unique strong solution $K$. We put
\[
\eta_t = \frac{X^\theta_t - X^u_t}{\theta} - K_t, \qquad t \in [0,T]. \tag{3.10}
\]

Then we have
\[
\begin{aligned}
\eta_t ={}& \frac{1}{\theta}\int_0^t E'\big[b(s, (X^\theta_s)', X^\theta_s, \mu^\theta_s) - b(s, (X^u_s)', X^u_s, u_s)\big]\,ds \\
&+ \frac{1}{\theta}\int_0^t E'\big[\sigma(s, (X^\theta_s)', X^\theta_s, \mu^\theta_s) - \sigma(s, (X^u_s)', X^u_s, u_s)\big]\,dB_s \\
&- \int_0^t E'\big[b_{x'}(s, (X^u_s)', X^u_s, u_s)K'_s + b_x(s, (X^u_s)', X^u_s, u_s)K_s + b_v(s, (X^u_s)', X^u_s, u_s)(v_s - u_s)\big]\,ds \\
&- \int_0^t E'\big[\sigma_{x'}(s, (X^u_s)', X^u_s, u_s)K'_s + \sigma_x(s, (X^u_s)', X^u_s, u_s)K_s + \sigma_v(s, (X^u_s)', X^u_s, u_s)(v_s - u_s)\big]\,dB_s. \tag{3.11}
\end{aligned}
\]

We notice that
\[
\frac{1}{\theta}\int_0^t E'\big[b(s, (X^\theta_s)', X^\theta_s, \mu^\theta_s) - b(s, (X^u_s)', X^\theta_s, \mu^\theta_s)\big]\,ds
= \int_0^t\!\!\int_0^1 E'\big[b_{x'}(s, (X^u_s)' + \lambda\theta(\eta_s + K_s)', X^\theta_s, \mu^\theta_s)(\eta_s + K_s)'\big]\,d\lambda\,ds;
\]
\[
\frac{1}{\theta}\int_0^t E'\big[b(s, (X^u_s)', X^\theta_s, \mu^\theta_s) - b(s, (X^u_s)', X^u_s, \mu^\theta_s)\big]\,ds
= \int_0^t\!\!\int_0^1 E'\big[b_x(s, (X^u_s)', X^u_s + \lambda\theta(\eta_s + K_s), \mu^\theta_s)(\eta_s + K_s)\big]\,d\lambda\,ds;
\]
\[
\frac{1}{\theta}\int_0^t E'\big[b(s, (X^u_s)', X^u_s, \mu^\theta_s) - b(s, (X^u_s)', X^u_s, u_s)\big]\,ds
= \int_0^t\!\!\int_0^1 E'\big[b_v(s, (X^u_s)', X^u_s, u_s + \lambda\theta(v_s - u_s))(v_s - u_s)\big]\,d\lambda\,ds.
\]
The analogous relations hold for $\sigma$. On the other hand, we have

\[
\begin{aligned}
&\frac{1}{\theta}\int_0^t E'\big[b(s, (X^\theta_s)', X^\theta_s, \mu^\theta_s) - b(s, (X^u_s)', X^\theta_s, \mu^\theta_s)\big]\,ds - \int_0^t E'\big[b_{x'}(s, (X^u_s)', X^u_s, u_s)K'_s\big]\,ds \\
&\quad = \int_0^t\!\!\int_0^1 E'\big[b_{x'}(s, (X^u_s)' + \lambda\theta(\eta_s + K_s)', X^\theta_s, \mu^\theta_s)\,\eta'_s\big]\,d\lambda\,ds \\
&\qquad + \int_0^t\!\!\int_0^1 E'\big[\big(b_{x'}(s, (X^u_s)' + \lambda\theta(\eta_s + K_s)', X^\theta_s, \mu^\theta_s) - b_{x'}(s, (X^u_s)', X^u_s, u_s)\big)(K_s)'\big]\,d\lambda\,ds.
\end{aligned}
\]

We denote by
\[
I^\theta_t = \int_0^t\!\!\int_0^1 E'\big[\big(b_{x'}(s, (X^u_s)' + \lambda\theta(\eta_s + K_s)', X^\theta_s, \mu^\theta_s) - b_{x'}(s, (X^u_s)', X^u_s, u_s)\big)(K_s)'\big]\,d\lambda\,ds, \qquad t \in [0,T].
\]
Then, from Lemma 3.1, (3.10) and the Lipschitz continuity of $b_{x'}(x', x, v)$ with respect to $(x', x, v)$, we have
\[
\lim_{\theta \to 0} E\Big[\sup_{s \in [0,T]} |I^\theta_s|^2\Big] = 0.
\]

Therefore, we get
\[
\begin{aligned}
E\Big[\sup_{s \in [0,t]} |\eta_s|^2\Big]
&\le C\,E\int_0^t\!\!\int_0^1 E'\big[|b_{x'}(s, (X^u_s)' + \lambda\theta(\eta_s + K_s)', X^\theta_s, \mu^\theta_s)\,\eta'_s|^2\big]\,d\lambda\,ds \\
&\quad + C\,E\int_0^t\!\!\int_0^1 E'\big[|b_x(s, (X^u_s)', X^u_s + \lambda\theta(\eta_s + K_s), \mu^\theta_s)\,\eta_s|^2\big]\,d\lambda\,ds \\
&\quad + C\,E\int_0^t\!\!\int_0^1 E'\big[|\sigma_{x'}(s, (X^u_s)' + \lambda\theta(\eta_s + K_s)', X^\theta_s, \mu^\theta_s)\,\eta'_s|^2\big]\,d\lambda\,ds \\
&\quad + C\,E\int_0^t\!\!\int_0^1 E'\big[|\sigma_x(s, (X^u_s)', X^u_s + \lambda\theta(\eta_s + K_s), \mu^\theta_s)\,\eta_s|^2\big]\,d\lambda\,ds \\
&\quad + C\,E\Big[\sup_{s \in [0,t]} |\beta^\theta_s|^2\Big], \tag{3.12}
\end{aligned}
\]
where

\[
\begin{aligned}
\beta^\theta_t ={}& \int_0^t\!\!\int_0^1 E'\big[\big(b_{x'}(s, (X^u_s)' + \lambda\theta(\eta_s + K_s)', X^\theta_s, \mu^\theta_s) - b_{x'}(s, (X^u_s)', X^u_s, u_s)\big)(K_s)'\big]\,d\lambda\,ds \\
&+ \int_0^t\!\!\int_0^1 E'\big[\big(b_x(s, (X^u_s)', X^u_s + \lambda\theta(\eta_s + K_s), \mu^\theta_s) - b_x(s, (X^u_s)', X^u_s, u_s)\big)K_s\big]\,d\lambda\,ds \\
&+ \int_0^t\!\!\int_0^1 E'\big[\big(b_v(s, (X^u_s)', X^u_s, u_s + \lambda\theta(v_s - u_s)) - b_v(s, (X^u_s)', X^u_s, u_s)\big)(v_s - u_s)\big]\,d\lambda\,ds \\
&+ \int_0^t\!\!\int_0^1 E'\big[\big(\sigma_{x'}(s, (X^u_s)' + \lambda\theta(\eta_s + K_s)', X^\theta_s, \mu^\theta_s) - \sigma_{x'}(s, (X^u_s)', X^u_s, u_s)\big)(K_s)'\big]\,d\lambda\,dB_s \\
&+ \int_0^t\!\!\int_0^1 E'\big[\big(\sigma_x(s, (X^u_s)', X^u_s + \lambda\theta(\eta_s + K_s), \mu^\theta_s) - \sigma_x(s, (X^u_s)', X^u_s, u_s)\big)K_s\big]\,d\lambda\,dB_s \\
&+ \int_0^t\!\!\int_0^1 E'\big[\big(\sigma_v(s, (X^u_s)', X^u_s, u_s + \lambda\theta(v_s - u_s)) - \sigma_v(s, (X^u_s)', X^u_s, u_s)\big)(v_s - u_s)\big]\,d\lambda\,dB_s. \tag{3.13}
\end{aligned}
\]

Now, proceeding as in the estimate of $I^\theta_t$, we see that
\[
\lim_{\theta \to 0} E\Big[\sup_{s \in [0,T]} |\beta^\theta_s|^2\Big] = 0.
\]


On the other hand, since the derivatives of $b$ and $\sigma$ are bounded, (3.12) yields
\[
E\Big[\sup_{s \in [0,t]} |\eta_s|^2\Big] \le C\,E\int_0^t |\eta_s|^2\,ds + C\,E\Big[\sup_{s \in [0,t]} |\beta^\theta_s|^2\Big], \qquad t \in [0,T].
\]
Finally, the application of Gronwall's Lemma allows us to complete the proof. $\square$

Lemma 3.3. Let $u$ be an optimal control and $X^u_t$ the corresponding optimal trajectory. Then, for any $v \in \mathcal{U}$, we have
\[
\begin{aligned}
0 \le{}& E\,E'\big[\Phi_{x'}((X^u_T)', X^u_T)(K_T)' + \Phi_x((X^u_T)', X^u_T)K_T\big] \\
&+ E\int_0^T E'\big[h_{x'}(t, (X^u_t)', X^u_t, u_t)(K_t)' + h_x(t, (X^u_t)', X^u_t, u_t)K_t + h_v(t, (X^u_t)', X^u_t, u_t)(v_t - u_t)\big]\,dt. \tag{3.14}
\end{aligned}
\]

Proof. From (3.5) we deduce
\[
\begin{aligned}
0 &\le J(\mu^\theta) - J(u) \\
&= E\,E'\big[\Phi((X^\theta_T)', X^\theta_T) - \Phi((X^u_T)', X^u_T)\big] \\
&\quad + E\int_0^T E'\big[h(t, (X^\theta_t)', X^\theta_t, \mu^\theta_t) - h(t, (X^u_t)', X^u_t, \mu^\theta_t)\big]\,dt \\
&\quad + E\int_0^T E'\big[h(t, (X^u_t)', X^u_t, \mu^\theta_t) - h(t, (X^u_t)', X^u_t, u_t)\big]\,dt \\
&=: I_1 + I_2 + I_3. \tag{3.15}
\end{aligned}
\]

Moreover, from the definition of $\mu^\theta_t = u_t + \theta(v_t - u_t)$ we obtain
\[
\begin{aligned}
I_1 &= E\,E'\big[\Phi((X^\theta_T)', X^\theta_T) - \Phi((X^u_T)', X^\theta_T)\big] + E\,E'\big[\Phi((X^u_T)', X^\theta_T) - \Phi((X^u_T)', X^u_T)\big] \\
&= E\,E'\Big[\int_0^1 \Phi_{x'}((X^u_T)' + \lambda\theta(\eta_T + K_T)', X^\theta_T)\,\theta(\eta_T + K_T)'\,d\lambda\Big] \\
&\quad + E\,E'\Big[\int_0^1 \Phi_x((X^u_T)', X^u_T + \lambda\theta(\eta_T + K_T))\,\theta(\eta_T + K_T)\,d\lambda\Big]. \tag{3.16}
\end{aligned}
\]

\[
\begin{aligned}
I_2 &= E\int_0^T E'\big[h(t, (X^\theta_t)', X^\theta_t, \mu^\theta_t) - h(t, (X^u_t)', X^\theta_t, \mu^\theta_t)\big]\,dt + E\int_0^T E'\big[h(t, (X^u_t)', X^\theta_t, \mu^\theta_t) - h(t, (X^u_t)', X^u_t, \mu^\theta_t)\big]\,dt \\
&= E\int_0^T E'\Big[\int_0^1 h_{x'}(t, (X^u_t)' + \lambda\theta(\eta_t + K_t)', X^\theta_t, \mu^\theta_t)\,\theta(\eta_t + K_t)'\,d\lambda\Big]\,dt \\
&\quad + E\int_0^T E'\Big[\int_0^1 h_x(t, (X^u_t)', X^u_t + \lambda\theta(\eta_t + K_t), \mu^\theta_t)\,\theta(\eta_t + K_t)\,d\lambda\Big]\,dt. \tag{3.17}
\end{aligned}
\]

\[
I_3 = E\int_0^T E'\big[h(t, (X^u_t)', X^u_t, \mu^\theta_t) - h(t, (X^u_t)', X^u_t, u_t)\big]\,dt
= E\int_0^T E'\Big[\int_0^1 h_v(t, (X^u_t)', X^u_t, u_t + \lambda\theta(v_t - u_t))\,\theta(v_t - u_t)\,d\lambda\Big]\,dt. \tag{3.18}
\]

Consequently, from (3.15) we get
\[
\begin{aligned}
0 \le{}& E\,E'\Big[\int_0^1 \Phi_{x'}((X^u_T)' + \lambda\theta(\eta_T + K_T)', X^\theta_T)(K_T)'\,d\lambda\Big] + E\,E'\Big[\int_0^1 \Phi_x((X^u_T)', X^u_T + \lambda\theta(\eta_T + K_T))K_T\,d\lambda\Big] \\
&+ E\int_0^T E'\Big[\int_0^1 h_{x'}(t, (X^u_t)' + \lambda\theta(\eta_t + K_t)', X^\theta_t, \mu^\theta_t)(K_t)'\,d\lambda\Big]\,dt \\
&+ E\int_0^T E'\Big[\int_0^1 h_x(t, (X^u_t)', X^u_t + \lambda\theta(\eta_t + K_t), \mu^\theta_t)K_t\,d\lambda\Big]\,dt \\
&+ E\int_0^T E'\Big[\int_0^1 h_v(t, (X^u_t)', X^u_t, u_t + \lambda\theta(v_t - u_t))(v_t - u_t)\,d\lambda\Big]\,dt + \rho^\theta_t, \tag{3.19}
\end{aligned}
\]
where

\[
\begin{aligned}
\rho^\theta_t ={}& E\,E'\Big[\int_0^1 \Phi_{x'}((X^u_T)' + \lambda\theta(\eta_T + K_T)', X^\theta_T)(\eta_T)'\,d\lambda\Big] + E\,E'\Big[\int_0^1 \Phi_x((X^u_T)', X^u_T + \lambda\theta(\eta_T + K_T))\eta_T\,d\lambda\Big] \\
&+ E\int_0^T E'\Big[\int_0^1 h_{x'}(t, (X^u_t)' + \lambda\theta(\eta_t + K_t)', X^\theta_t, \mu^\theta_t)(\eta_t)'\,d\lambda\Big]\,dt \\
&+ E\int_0^T E'\Big[\int_0^1 h_x(t, (X^u_t)', X^u_t + \lambda\theta(\eta_t + K_t), \mu^\theta_t)\eta_t\,d\lambda\Big]\,dt.
\end{aligned}
\]

From (3.9) we see that $\lim_{\theta \to 0} E[\sup_{s \in [0,T]} |\eta_s|^2] = 0$. Furthermore, since the derivatives of $\Phi$ and $h$ are bounded, we have
\[
\lim_{\theta \to 0} \rho^\theta_t = 0.
\]
Finally, from (3.6), (3.19), $\mu^\theta_t \to u_t$ (as $\theta \to 0$) and the Lipschitz continuity of the derivatives of $\Phi$ and $h$, we obtain the result. $\square$

3.1. Variational inequality and adjoint equation

In this subsection, we introduce the adjoint process. With the help of this process, we will easily deduce the variational inequality from (3.14). Let us consider the following adjoint equation:

\[
\begin{cases}
-dp_t = E'\big[b_x(t, (X^u_t)', X^u_t, u_t)p_t + \sigma_x(t, (X^u_t)', X^u_t, u_t)q_t + h_x(t, (X^u_t)', X^u_t, u_t) \\
\qquad\qquad + b_{x'}(t, X^u_t, (X^u_t)', (u_t)')(p_t)' + \sigma_{x'}(t, X^u_t, (X^u_t)', (u_t)')(q_t)' + h_{x'}(t, X^u_t, (X^u_t)', (u_t)')\big]\,dt - q_t\,dB_t, \\
p_T = E'\big[\Phi_x((X^u_T)', X^u_T) + \Phi_{x'}(X^u_T, (X^u_T)')\big].
\end{cases} \tag{3.20}
\]

With its help we are going to give the most important result of this paper: our version of the stochastic maximum principle.


Theorem 3.1 (SMP in Integral Form). Let $u$ be an optimal control minimizing $J$ over $\mathcal{U}$, and let $X^u_t$ denote the corresponding optimal trajectory. Then the unique solution
\[
(p, q) \in S^2_{\mathbb{F}}(0,T;\mathbb{R}) \times H^2_{\mathbb{F}}(0,T;\mathbb{R}^d)
\]
of mean-field BSDE (3.20) satisfies the following integral SMP: for all $v \in \mathcal{U}$,
\[
E\int_0^T E'\big[H_v(t, (X^u_t)', X^u_t, p_t, q_t, u_t)(v_t - u_t)\big]\,dt \ge 0, \tag{3.21}
\]
where $H(t, x', x, p, q, v) = p\,b(t, x', x, v) + q\,\sigma(t, x', x, v) + h(t, x', x, v)$.

Proof. By applying Itô’s formula to ptKt , we obtain

EE ′

Φx((Xu

T )′, Xu

T )+ Φx(XuT , (X

uT )

′)KT

= E

T

0E ′

ptbv(t, (Xu

t )′, Xu

t , ut)(vt − ut)dt

− E T

0E ′

hx(t, (Xu

t )′, Xu

t , ut)Ktdt

− E T

0E ′

hx(t, Xu

t , (Xut )

′, (ut)′)Kt

dt

+ E T

0E ′

qtσv(t, (Xu

t )′, Xu

t , ut)(vt − ut)dt. (3.22)

Finally, from (3.14) we get

0 ≤ E T

0E ′

(ptbv(t, (Xu

t )′, Xu

t , ut)

+ qtσv(t, (Xut )

′, Xut , ut)

+ hv(t, (Xut )

′, Xut , ut))(vt − ut)

dt, (3.23)

but this is just (3.21).

Remark 3.2. From (3.21) we can deduce that
\[
E'\big[H_v(t, (X^u_t)', X^u_t, p_t, q_t, u_t)(v - u_t)\big] \ge 0, \tag{3.24}
\]
$dt\,dP$-a.e., for any $v \in U$.

3.2. Necessary conditions for the optimality of the control

In this subsection, we especially consider the case:

(A2) $H(t, x', x, p, q, v)$ is convex with respect to $v$.

From (3.21), we can get the following result.

Theorem 3.2 (Necessary Conditions for the Optimality of the Control). Let (A1) and (A2) hold. Then the following condition is necessary for the optimality of the control $u \in \mathcal{U}$:
\[
E'\big[H(t, (X^u_t)', X^u_t, p_t, q_t, u_t)\big] = \inf_{v \in U} E'\big[H(t, (X^u_t)', X^u_t, p_t, q_t, v)\big], \tag{3.25}
\]
$dt\,dP$-a.e. on $[0,T] \times \Omega$, where $X^u_t$ denotes the corresponding optimal trajectory of the control $u$ and $(p, q)$ is the solution of mean-field BSDE (3.20).

Proof. For any $v \in U$, we have
\[
\begin{aligned}
E'&\big[H(t, (X^u_t)', X^u_t, p_t, q_t, v)\big] - E'\big[H(t, (X^u_t)', X^u_t, p_t, q_t, u_t)\big] \\
&= \int_0^1 E'\big[H_v(t, (X^u_t)', X^u_t, p_t, q_t, u_t + \lambda(v - u_t))(v - u_t)\big]\,d\lambda \\
&= \int_0^1 E'\big[\big(H_v(t, (X^u_t)', X^u_t, p_t, q_t, u_t + \lambda(v - u_t)) - H_v(t, (X^u_t)', X^u_t, p_t, q_t, u_t)\big)(v - u_t)\big]\,d\lambda \\
&\quad + \int_0^1 E'\big[H_v(t, (X^u_t)', X^u_t, p_t, q_t, u_t)(v - u_t)\big]\,d\lambda \\
&\ge 0, \qquad dt\,dP\text{-a.e. on } [0,T] \times \Omega, \tag{3.26}
\end{aligned}
\]
since $H(t, (X^u_t)', X^u_t, p_t, q_t, v)$ is convex with respect to $v$ and (3.21) holds. Relation (3.26) then allows us to conclude. $\square$

3.3. Sufficient conditions for the optimal control

In this subsection, we study assumptions under which the necessary condition (3.25) also becomes a sufficient one. We recall the adjoint process (3.20), and we denote by $X^v$ the solution of (3.1) associated with any control $v \in \mathcal{U}$.

Theorem 3.3 (Sufficient Conditions for the Optimality of the Control). Let (A1) hold and suppose that the control $u$ satisfies (3.25), where $(p, q)$ is the solution of mean-field BSDE (3.20). We further assume that the functions $\Phi(x', x)$ and $H(t, x', x, p_t, q_t, v)$ are convex with respect to $(x', x, v)$. Then $u$ is an optimal control of problem (3.1)–(3.3).

Proof. Let us suppose that the control process $u$ satisfies (3.25). Then, for any $v \in \mathcal{U}$, we have
\[
J(v) - J(u) = E\,E'\big[\Phi((X^v_T)', X^v_T) - \Phi((X^u_T)', X^u_T)\big] + E\int_0^T E'\big[h(t, (X^v_t)', X^v_t, v_t) - h(t, (X^u_t)', X^u_t, u_t)\big]\,dt. \tag{3.27}
\]

Since $\Phi$ is convex with respect to $(x', x)$, we get
\[
\Phi((X^v_T)', X^v_T) - \Phi((X^u_T)', X^u_T) \ge \Phi_{x'}((X^u_T)', X^u_T)(X^v_T - X^u_T)' + \Phi_x((X^u_T)', X^u_T)(X^v_T - X^u_T).
\]

Consequently,
\[
\begin{aligned}
J(v) - J(u) &\ge E\,E'\big[\big(\Phi_{x'}(X^u_T, (X^u_T)') + \Phi_x((X^u_T)', X^u_T)\big)(X^v_T - X^u_T)\big] \\
&\quad + E\int_0^T E'\big[h(t, (X^v_t)', X^v_t, v_t)\big]\,dt - E\int_0^T E'\big[h(t, (X^u_t)', X^u_t, u_t)\big]\,dt. \tag{3.28}
\end{aligned}
\]

Noticing that $p_T = E'[\Phi_x((X^u_T)', X^u_T) + \Phi_{x'}(X^u_T, (X^u_T)')]$, and by first applying Itô's formula to $p_t(X^v_t - X^u_t)$ and then taking the expectation, we obtain
\[
\begin{aligned}
J(v) - J(u) &\ge E\int_0^T E'\big[H(t, (X^v_t)', X^v_t, p_t, q_t, v_t) - H(t, (X^u_t)', X^u_t, p_t, q_t, u_t)\big]\,dt \\
&\quad - E\int_0^T E'\big[H_x(t, (X^u_t)', X^u_t, p_t, q_t, u_t)(X^v_t - X^u_t) + H_{x'}(t, (X^u_t)', X^u_t, p_t, q_t, u_t)(X^v_t - X^u_t)'\big]\,dt. \tag{3.29}
\end{aligned}
\]

Since $H$ is convex with respect to $(x', x, v)$, the use of the Clarke generalized gradient of $H$, evaluated at $((X^u_t)', X^u_t, u_t)$, yields
\[
\begin{aligned}
H(t, (X^v_t)', X^v_t, p_t, q_t, v_t) - H(t, (X^u_t)', X^u_t, p_t, q_t, u_t)
&\ge H_x(t, (X^u_t)', X^u_t, p_t, q_t, u_t)(X^v_t - X^u_t) \\
&\quad + H_{x'}(t, (X^u_t)', X^u_t, p_t, q_t, u_t)(X^v_t - X^u_t)' \\
&\quad + H_v(t, (X^u_t)', X^u_t, p_t, q_t, u_t)(v_t - u_t).
\end{aligned}
\]


Thus, from (3.29) we have
\[
J(v) - J(u) \ge E\int_0^T E'\big[H_v(t, (X^u_t)', X^u_t, p_t, q_t, u_t)(v_t - u_t)\big]\,dt. \tag{3.30}
\]
Finally, since $E'[H(t, (X^u_t)', X^u_t, p_t, q_t, v)]$ is convex in $v$, condition (3.25) and the convex optimization principle (see Ekeland and Temam (1976, Proposition 2.2.1, pp. 36–37)) yield
\[
E'\big[H_v(t, (X^u_t)', X^u_t, p_t, q_t, u_t)(v_t - u_t)\big] \ge 0,
\]
$dt\,dP$-a.e. on $[0,T] \times \Omega$. The above inequality and (3.30) complete the proof. $\square$

4. Application: a linear-quadratic control problem

Now we consider the example of a linear-quadratic stochastic control problem. For simplicity, we restrict ourselves to the one-dimensional case, i.e., $n = d = k = 1$. Then SDE (3.1), now with linear coefficients, writes as follows:
\[
\begin{cases}
dX_t = \big(\bar A E[X_t] + A X_t + B v_t\big)\,dt + \big(\bar C E[X_t] + C X_t + D v_t\big)\,dB_t, \\
X_0 = x,
\end{cases} \tag{4.1}
\]
where $\bar A, A, B, \bar C, C, D$ are constants, and $v \in L^2_{\mathbb{F}}(0,T;U)$. The cost functional is a quadratic one, and it has the form
\[
J(v) = \frac{1}{2}\int_0^T \big(R\,E[X_t^2] + N\,E[v_t^2]\big)\,dt + \frac{1}{2}\,Q\,E[X_T^2], \tag{4.2}
\]

where $N > 0$, $R \ge 0$, $Q \ge 0$ are constants. For simplicity we also suppose that $U = \mathbb{R}$. Let $u$ be an optimal admissible control minimizing $J$ over $\mathcal{U}$; by $X^u$ we denote the corresponding optimal trajectory. Then, due to (3.20), the adjoint equation writes
\[
\begin{cases}
-dp_t = \big(A p_t + C q_t + R X^u_t + \bar A E[p_t] + \bar C E[q_t]\big)\,dt - q_t\,dB_t, \\
p_T = Q X^u_T.
\end{cases} \tag{4.3}
\]

From (3.25) we have

\[
N u_t = -B p_t - D q_t. \tag{4.4}
\]
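Relation (4.4) is exactly the first-order condition of the pointwise minimization (3.25) in the LQ case, where minimizing over $v$ reduces to minimizing the quadratic map $v \mapsto Bpv + Dqv + \tfrac{N}{2}v^2$. A quick numerical confirmation with hypothetical values of $B, D, N, p, q$ (not from the paper):

```python
import numpy as np

# In the LQ case the pointwise condition (3.25) minimizes the quadratic map
#   v  ->  B*p*v + D*q*v + (N/2) * v**2,
# whose minimizer is v* = -(B*p + D*q)/N, i.e. relation (4.4).
B, D, N = 1.5, 0.4, 2.0                  # illustrative constants
p, q = 0.7, -0.3                         # hypothetical adjoint values

Hv = lambda v: B * p * v + D * q * v + 0.5 * N * v ** 2
grid = np.linspace(-5.0, 5.0, 200_001)
v_num = grid[np.argmin(Hv(grid))]        # brute-force minimizer on a grid
v_exact = -(B * p + D * q) / N

print(v_num, v_exact)
```

The grid minimizer agrees with the closed-form minimizer up to the grid resolution, confirming that (4.4) is the stationarity condition of the Hamiltonian in $v$.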

On the other hand, from Theorem 3.3 we know that a process of the form $u_t = -\frac{1}{N}(B p_t + D q_t)$ is necessarily an optimal control. Hence, the related feedback control system takes the form
\[
\begin{cases}
dX^u_t = \big(\bar A E[X^u_t] + A X^u_t - \tfrac{1}{N}B^2 p_t - \tfrac{1}{N}BD q_t\big)\,dt + \big(\bar C E[X^u_t] + C X^u_t - \tfrac{1}{N}DB p_t - \tfrac{1}{N}D^2 q_t\big)\,dB_t, \\
X_0 = x, \\
-dp_t = \big(A p_t + C q_t + R X^u_t + \bar A E[p_t] + \bar C E[q_t]\big)\,dt - q_t\,dB_t, \\
p_T = Q X^u_T.
\end{cases} \tag{4.5}
\]

Here we are dealing with a fully coupled mean-field forward–backward SDE. In order to solve this system we set $p_t = \varphi(t)X^u_t + \psi(t)E[X^u_t]$, where $\varphi(t), \psi(t)$ are deterministic differentiable functions which will be specified below. Then, from (4.3) we get:
\[
\begin{cases}
-A p_t - C q_t - R X^u_t - \bar A E[p_t] - \bar C E[q_t] = \varphi(t)\bar A E[X^u_t] + \varphi(t)A X^u_t + \varphi(t)B u_t \\
\qquad\qquad + \psi_t(\bar A + A)E[X^u_t] + \psi_t B E[u_t] + \dot\varphi_t X^u_t + \dot\psi_t E[X^u_t], \\
q_t = \varphi(t)\bar C E[X^u_t] + \varphi(t)C X^u_t + \varphi(t)D u_t.
\end{cases} \tag{4.6}
\]

From (4.4), and by comparing the coefficients of $X^u_t$ and $E[X^u_t]$, respectively, in the first equation of (4.6), we get
\[
\begin{cases}
\dot\varphi_t + (2A + C^2)\varphi_t - (B + CD)^2(N + D^2\varphi_t)^{-1}\varphi_t^2 + R = 0, \qquad t \in [0,T), \\
\varphi_T = Q\ (\ge 0).
\end{cases} \tag{4.7}
\]

But this is just a Riccati equation, and it has a unique solution. Moreover,
\[
\begin{cases}
\dot\psi_t + (2A + 2\bar A)\psi_t + (2C\bar C + \bar C^2 + 2\bar A)\varphi_t \\
\qquad - (N + D^2\varphi_t)^{-1}\big[D(C + \bar C)\varphi_t + B(\varphi_t + \psi_t) + (B + CD)\varphi_t\big]\big[B\psi_t + \bar C D\varphi_t\big] = 0; \\
\psi_T = 0.
\end{cases} \tag{4.8}
\]

Finally, by combining Theorems 3.2 and 3.3, we obtain the following.

Theorem 4.1. The optimal control $u \in \mathcal{U}$ for the linear quadratic control problem (4.1)–(4.2) is given (in feedback form) by
\[
u_t = -(N + D^2\varphi_t)^{-1}\big[(B + CD)\varphi_t X^u_t + (B\psi_t + \bar C D\varphi_t)E[X^u_t]\big],
\]
with $\varphi_t, \psi_t$ solving (4.7) and (4.8), respectively.

Remark 4.1. When $\bar A = \bar C = 0$, we see from (4.8) that $\psi_t = 0$, $t \in [0,T]$, and $u$ is just the optimal control of the classical case.
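The Riccati equation (4.7) can be integrated backward from $\varphi_T = Q$ by a simple Euler scheme. The sketch below picks the special case $A = C = D = R = 0$, $B = N = Q = 1$ (illustrative values, not from the paper), where (4.7) reduces to $\dot\varphi_t = \varphi_t^2$ and has the closed-form solution $\varphi_t = (1 + T - t)^{-1}$, which serves as a correctness check.

```python
import numpy as np

# Backward Euler integration of the Riccati equation (4.7):
#   phi' = (B + C*D)**2 * phi**2 / (N + D**2 * phi) - (2*A + C**2) * phi - R,
# with terminal condition phi(T) = Q.
A, B, C, D, R, N, Q, T = 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0
n = 100_000
dt = T / n

def riccati_rhs(phi):
    # right-hand side of phi', obtained by solving (4.7) for the derivative
    return (B + C * D) ** 2 * phi ** 2 / (N + D ** 2 * phi) \
        - (2 * A + C ** 2) * phi - R

phi = Q
for _ in range(n):            # march from t = T down to t = 0
    phi -= dt * riccati_rhs(phi)

print(phi)                    # phi(0); closed form gives 1/(1+T) = 0.5 here
```

Once $\varphi$ is available, (4.8) is a linear ODE in $\psi$ (backward, $\psi_T = 0$) and can be integrated by the same scheme; the feedback gain of Theorem 4.1 is then assembled pointwise in $t$.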

Acknowledgment

The author would like to thank the referees for careful reading and helpful suggestions. The author also would like to thank Professor Shige Peng for inspiring discussions.

References

Arkin, V., & Saksonov, I. (1979). Necessary optimality conditions for stochastic differential equations. Soviet Mathematical Doklady, 20, 1–5.

Andersson, D., & Djehiche, B. (2011). A maximum principle for SDEs of mean-field type. Applied Mathematics and Optimization, 63, 341–356.

Bahlali, S. (2008). Necessary and sufficient optimality conditions for relaxed and strict control problems. SIAM Journal on Control and Optimization, 47(4), 2078–2095.

Bensoussan, A. (1982). Lectures on stochastic control. In S. K. Mitter, & A. Moro (Eds.), Springer lecture notes in mathematics: Vol. 972. Nonlinear filtering and stochastic control. Berlin: Springer.

Bismut, J. M. (1978). An introductory approach to duality in optimal stochasticcontrol. SIAM Review, 20, 62–78.

Buckdahn, R., Djehiche, B., Li, J., & Peng, S. (2009). Mean-field backward stochastic differential equations, a limit approach. Annals of Probability, 37(4), 1524–1565.

Buckdahn, R., Li, J., & Peng, S. (2009). Mean-field backward stochastic differential equations and related partial differential equations. Stochastic Processes and their Applications, 119, 3133–3154.

Cadenillas, A., & Karatzas, I. (1995). The stochastic maximum principle for linear convex optimal control with random coefficients. SIAM Journal on Control and Optimization, 33(2), 590–624.

Ekeland, I., & Temam, R. (1976). Convex analysis and variational problems. Amsterdam: North-Holland; New York: American Elsevier.

Elliott, R. J. (1990). The optimal control of diffusions. Applied Mathematics andOptimization, 22, 229–240.

Haussmann, U. G. (1986). A stochastic maximum principle for optimal control ofdiffusions. Essex, UK: Longman Scientific and Technical.

Hu, Y., & Zhou, X. Y. (2003). Indefinite stochastic Riccati equations. SIAM Journal onControl and Optimization, 42(1), 123–137.

Ikeda, N., & Watanabe, S. (1989). Stochastic differential equations and diffusionprocesses. Amsterdam–Tokyo: North Holland–Kodansha.

Karatzas, I., & Shreve, S. E. (1987). Brownian motion and stochastic calculus. Springer.

Kushner, H. J. (1965). On the stochastic maximum principle: fixed time of control. Journal of Mathematical Analysis and Applications, 11, 78–92.

Kushner, H. J. (1972). Necessary conditions for continuous parameter stochastic optimization problems. SIAM Journal of Control, 10, 550–565.


Lasry, J. M., & Lions, P. L. (2007). Mean field games. Japan Journal of Mathematics, 2, 229–260. doi:10.1007/s11537-007-0657-8.

Meyer-Brandis, T., Øksendal, B., & Zhou, X. Y. (2010). A mean-field stochastic maximum principle via Malliavin calculus. Stochastics (in press). (A Special Issue for Mark Davis' Festschrift).

Pardoux, E., & Peng, S. (1990). Adapted solution of a backward stochastic differentialequation. Systems and Control Letters, 14(1–2), 61–74.

Peng, S. (1990). A general stochastic maximum principle for optimal controlproblems. SIAM Journal on Control and Optimization, 28, 966–979.

Sznitman, A. S. (1991). Topics in propagation of chaos. In Lecture notes in mathematics: Vol. 1464 (pp. 165–252). Berlin: Springer-Verlag.

Juan Li received the Ph.D. degree in probability and statistics from Shandong University, Jinan, China, in 2003. She held postdoctoral positions at Fudan University, China, and the University of Brittany, France, from 2005 to 2007. She joined the Department of Control, Shandong University at Weihai, as an assistant professor in 1997; later she joined the School of Mathematics and Statistics, where she has been a full Professor since September 2007. Her research interests include stochastic analysis, stochastic control theory, forward and backward stochastic differential equations and mathematical finance.