Morfismos, Vol. 12, No. 2, 2008

Volume 12, Number 2, July to December 2008, ISSN: 1870-6525


Morfismos

Student Communications
Department of Mathematics
Cinvestav

Managing Editors

• Isidoro Gitler • Jesús González

Editorial Board

• Luis Carrera • Samuel Gitler
• Onésimo Hernández-Lerma • Héctor Jasso Fuentes
• Miguel Maldonado • Raúl Quiroga Barranco
• Enrique Ramírez de Arellano • Enrique Reyes
• Armando Sánchez • Martín Solís
• Leticia Zárate

Associate Editors

• Ricardo Berlanga • Emilio Lluis Puebla
• Isaías López • Guillermo Pastor
• Víctor Pérez Abreu • Carlos Prieto
• Carlos Rentería • Luis Verde

Technical Secretaries

• Roxana Martínez • Laura Valencia

ISSN: 1870-6525

Morfismos can be consulted online under “Revista Morfismos” at http://www.math.cinvestav.mx. For further information, call 57 47 38 71.

All correspondence should be addressed to Mrs. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or sent by e-mail to [email protected].



Author Information

Morfismos, the student journal of the Mathematics Department of Cinvestav, invites undergraduate and graduate students to submit manuscripts to be published under the following guidelines:

• All manuscripts will be refereed by specialists. However, accepted papers will be considered to be “preliminary versions” in that authors may republish their papers in other journals, in the same or similar form.

• In addition to his/her affiliation, the author must state his/her academic status (student, professor, ...).

• Each manuscript should begin with an abstract summarizing the main results.

• Morfismos encourages electronically submitted manuscripts prepared in LaTeX. Authors may retrieve the LaTeX2ε macros used for Morfismos through the web site http://www.math.cinvestav.mx, at “Revista Morfismos”, or by direct request to the Mathematics Department of Cinvestav. The use of these macros will help in the production process and also minimize publishing costs.

• All illustrations must be of professional quality.

• 15 offprints of each article will be provided free of charge.

• Manuscripts submitted for publication should be sent to Mrs. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or to the e-mail address: [email protected]



Editorial Guidelines

“Morfismos” is the journal of the students of the Mathematics Department of CINVESTAV. One of its main objectives is for students to acquire experience in writing mathematics. Morfismos appears twice a year.

Publication of papers is not restricted to students of CINVESTAV; we want to encourage students in Mexico and abroad to submit papers. Mathematics research reports or summaries of bachelor, master and Ph.D. theses will be considered for publication, as well as invited contributions by researchers. Papers submitted should be original, either in the results or in the methods. The Editors will appoint well-established mathematicians as referees.

Even though Morfismos is a refereed journal, the papers will be considered as preliminary versions which could later appear in other mathematical journals.

If you have any suggestions about the journal, let the Editors know and we will gladly study the possibility of implementing them. We expect this journal to foster, as a preliminary experience, the development of a correct style of writing mathematics.

Morfismos

Contents

The vanishing discount approach to average reward optimality: the strongly and the weakly continuous cases
Tomás Prieto-Rumeau and Onésimo Hernández-Lerma

Vértices simpliciales y escalonabilidad de grafos (Simplicial vertices and shellability of graphs)
Roberto Cruz and Mario Estrada

Asymptotic normality of average cost Markov control processes
Armando F. Mendoza-Pérez

Errata

Morfismos, Vol. 12, No. 2, 2008, pp. 1–15

The vanishing discount approach to average reward optimality: the strongly and the weakly continuous cases ∗

Tomás Prieto-Rumeau and Onésimo Hernández-Lerma

Abstract

We consider a discrete-time stochastic dynamic programming model and we propose conditions under which the limit of discount optimal policies, as the discount factor converges to one, is average optimal. We prove this result under strong and weak continuity conditions and, moreover, we relax the usual value boundedness condition on the relative values of the optimal discounted reward.

2000 Mathematics Subject Classification: 93E20, 90C40.
Keywords and phrases: dynamic programming, vanishing discount, average optimality.

1 Introduction

The basic problem dealt with in this paper is the existence of control policies $\pi$ that maximize the long-run expected average reward

$$v(x,\pi) := \liminf_{T\to\infty} E^{\pi}_{x}\left[\frac{1}{T}\sum_{t=0}^{T-1} r(x_t,\pi(x_t))\right] \tag{1}$$

for every initial state $x_0 = x$. (The underlying controlled system is a fairly general discrete-time stochastic control process described in Section 2; see (6).) Among the several known techniques to analyze this problem, the most common is the vanishing discount approach, which can be traced back to Taylor [16]. It is so named because it is based on the convergence as $\rho \uparrow 1$ ($0 < \rho < 1$) of $\rho$-discounted optimal reward policies. To state this more precisely, we need some notation.

∗ This research was partially supported by CONACyT Grant 45693-F.

For each discount factor $\rho \in (0,1)$, let

$$v_\rho(x,\pi) := E^{\pi}_{x}\left[\sum_{t=0}^{\infty} \rho^t\, r(x_t,\pi(x_t))\right] \tag{2}$$

be the expected discounted reward of the admissible control policy $\pi \in \Pi$ (see Section 2) when the initial state is $x_0 = x$. The optimal $\rho$-discounted reward function is defined as

$$v_\rho(x) := \sup_{\pi\in\Pi} v_\rho(x,\pi) \tag{3}$$

for every state $x$. For a given fixed state $x'$, consider the relative value function

$$u_\rho(x) := v_\rho(x) - v_\rho(x').$$

This function is one of the key tools in the vanishing discount approach. To obtain the convergence of $\rho$-discount optimal policies to average optimal policies as $\rho \uparrow 1$, it was assumed in [16] that $u_\rho$ was uniformly bounded, that is, there exists a constant $L$ such that

$$|u_\rho(x)| \le L$$

for every state $x$ and $0 < \rho < 1$. This condition was later relaxed to the following weaker value boundedness condition: there exist a constant $L$ and a function $m$ such that

$$-m(x) \le u_\rho(x) \le L \tag{4}$$

for every state $x$ and $0 < \rho < 1$; see, e.g., [2, Assumption A1], [5, Assumption 4.1], [12, Definition 2.1] or [15].

In this paper, we further relax (4) and assume the existence of a function $m$ (satisfying appropriate hypotheses) such that

$$-m(x) \le u_\rho(x) \le m(x) \tag{5}$$

for every $x$ and $0 < \rho < 1$. Such a condition can also be found in, e.g., [3, Lemma 4.5], [4, Assumption 3.3] or [7, Lemma 10.4.2]. Relaxing (4) to (5) is indeed a relevant issue because (4) is, in fact, a fairly restrictive condition. For instance, to obtain (4), it is assumed in [12] that the reward function $r$ is bounded. Moreover, condition (4) excludes the case of an unbounded utility function (see the comment after Assumption 5.3 in [12, p. 1423]). Also, in Section 4 of this paper, we describe a control model for which (5) holds, whereas (4) does not.

Summarizing, the goal of this paper is to give conditions on the controlled process that, together with condition (5), ensure that the limit of $\rho$-discount optimal policies, as $\rho \uparrow 1$, is average optimal. The basic control model is described in Section 2. In Section 3, we consider two different sets of hypotheses, namely, strong and weak continuity conditions, depending on the corresponding strong or weak continuity of the control system’s transition function. Also in Section 3, we state our main results: Theorem 3.10 and Corollary 3.12, in which we mention several particular cases of interest. Finally, we present an example in Section 4, and our conclusions are stated in Section 5.

2 The control model

The formulation of the controlled process and the notation are mainly drawn from [12].

We assume that the state space $S$ and the action set $A$ are Borel spaces (that is, measurable subsets of complete and separable metric spaces). Let $\Gamma$ be a nonempty set-valued function from $S$ to $A$. For each $x \in S$, the corresponding set of feasible control actions is $\Gamma(x) \subseteq A$. The family of feasible state-action pairs is denoted by $K$, i.e.,

$$K := \{(x,a) \in S \times A : a \in \Gamma(x)\},$$

which is assumed to be a measurable subset of $S \times A$. (In this paper, measurability always refers to the Borel $\sigma$-algebra.)

We consider a sequence $\{\xi_t\}_{t\ge 0}$ of i.i.d. random variables from a given probability space $(\Omega,\mathcal{F},P)$ to $(Z,\mathcal{Z})$ with common distribution $\nu$. Let $h : K \times Z \to S$ be a measurable function. We assume that the state of the system is updated according to the function $h$, meaning that if the action $a \in \Gamma(x)$ is chosen at $x \in S$ and the value of the random perturbation is $\xi$, then the next state of the system is $h(x,a,\xi) \in S$.

We suppose that the reward function is the measurable real-valued mapping $r : K \to \mathbb{R}$.

Let $\Pi$ be the family of measurable functions $\pi : S \to A$ such that $\pi(x) \in \Gamma(x)$ for every $x \in S$. (We suppose that $\Pi$ is nonempty.) We call $\pi \in \Pi$ a deterministic stationary policy. For each $\pi \in \Pi$ and every initial state $x_0 \in S$ independent of $\{\xi_t\}_{t\ge 0}$,

$$x_{t+1} = h(x_t,\pi(x_t),\xi_t) \quad \text{for } t = 0, 1, \ldots \tag{6}$$

is a Markov process and it stands for the state of the system under the policy $\pi$. The corresponding expectation operator is denoted by $E^{\pi}_{x_0}$.

Although larger classes of policies may be considered, it is well known that, for the control problem we are dealing with, $\Pi$ is a “sufficient” class of policies; see [6, Chapter 4] or [7, Chapter 8], for instance.

Given an admissible policy $\pi \in \Pi$ and an initial state $x \in S$, the corresponding long-run average reward and expected discounted reward are defined as in (1) and (2), respectively. Given a discount factor $0 < \rho < 1$, we say that $\pi \in \Pi$ is $\rho$-discount optimal if $v_\rho(x,\pi) = v_\rho(x)$ for every $x \in S$ (recall (3)). Similarly, $\pi^* \in \Pi$ is average reward optimal if

$$v(x,\pi^*) = \sup_{\pi\in\Pi} v(x,\pi) \quad \forall\, x \in S.$$

3 Main results

As already mentioned, we will consider two different sets of hypotheses, which we label as strong and weak continuity assumptions.

The strongly continuous case

We state the assumptions we make on our control model. First, we have the following Lyapunov-like condition.

Assumption 3.1 There exist a measurable function $w : S \to [1,\infty)$ and constants $0 < \beta < 1$ and $b > 0$ such that

$$\int_Z w(h(x,a,\xi))\,\nu(d\xi) \le \beta w(x) + b \quad \forall\, (x,a) \in K.$$

The next assumption introduces some usual continuity and compactness requirements. We note that the function $w$ in Assumptions 3.2 and 3.4 is taken from Assumption 3.1.

Assumption 3.2 (i) For every $x \in S$, the set $\Gamma(x)$ is compact.

(ii) The reward function $r(x,a)$ is upper semicontinuous on $\Gamma(x)$ for every $x \in S$. In addition, there exists a constant $M$ such that

$$|r(x,a)| \le M w(x) \quad \forall\, (x,a) \in K.$$

(iii) The function

$$(x,a) \mapsto \int_Z w(h(x,a,\xi))\,\nu(d\xi)$$

is continuous on $\Gamma(x)$ for every $x \in S$.

(iv) Strong continuity. For every bounded and measurable $\zeta : S \to \mathbb{R}$, the function

$$(x,a) \mapsto \int_Z \zeta(h(x,a,\xi))\,\nu(d\xi)$$

is continuous on $\Gamma(x)$ for every $x \in S$.

Remark 3.3 (The additive-noise case) The strong continuity condition is satisfied, for instance, when $S = Z = \mathbb{R}$,

$$h(x,a,\xi) = g(x,a) + \xi,$$

where $g$ is continuous on $\Gamma(x)$ for each fixed $x \in S$, and, in addition, $\nu$ has an almost everywhere continuous bounded density with respect to the Lebesgue measure. This includes, of course, the linear case in which $g(x,a) = k_1 x + k_2 a$ for some constants $k_1, k_2$.
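A short sketch of why the additive-noise structure yields strong continuity may be useful here. It assumes, as notation not used in the original, that $f$ denotes the density of $\nu$ and that $a_n \to a$ in the feasible set at a fixed $x$. The route via Scheffé's lemma is one standard argument and is not necessarily the one the authors have in mind. After the change of variable $y = g(x,a_n) + \xi$,

$$\int_{\mathbb{R}} \zeta(g(x,a_n)+\xi)\, f(\xi)\, d\xi = \int_{\mathbb{R}} \zeta(y)\, f\bigl(y - g(x,a_n)\bigr)\, dy,$$

so that, for bounded measurable $\zeta$,

$$\left| \int_{\mathbb{R}} \zeta(y)\, f\bigl(y - g(x,a_n)\bigr)\, dy - \int_{\mathbb{R}} \zeta(y)\, f\bigl(y - g(x,a)\bigr)\, dy \right| \le \sup_{y}|\zeta(y)| \int_{\mathbb{R}} \bigl| f\bigl(y - g(x,a_n)\bigr) - f\bigl(y - g(x,a)\bigr) \bigr|\, dy.$$

Since $g(x,a_n) \to g(x,a)$ and $f$ is continuous almost everywhere, the shifted densities converge for almost every $y$. Each of them is nonnegative and integrates to one, so Scheffé's lemma gives convergence in $L^1$ and the right-hand side tends to zero.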

Finally, we state the value boundedness condition.

Assumption 3.4 There exist a state $x' \in S$ and a constant $M' > 0$ such that

$$\sup_{0<\rho<1} |v_\rho(x) - v_\rho(x')| \le M' w(x) \quad \forall\, x \in S.$$

The weakly continuous case

Among the hypotheses made so far on the control model, the most restrictive one is the strong continuity condition in Assumption 3.2(iv). Under additional appropriate conditions, strong continuity can be relaxed to weak continuity. To this end, first, the “measurability” of $w$ in Assumption 3.1 is replaced with “continuity”.

Assumption 3.5 There exist a continuous function $w : S \to [1,\infty)$ and constants $0 < \beta < 1$ and $b > 0$ such that

$$\int_Z w(h(x,a,\xi))\,\nu(d\xi) \le \beta w(x) + b \quad \forall\, (x,a) \in K.$$

In Assumptions 3.6 and 3.8 below, the function $w$ is taken from Assumption 3.5.

Assumption 3.6 (i) The function $\Gamma : S \to 2^A$ is upper semicontinuous and compact-valued.

(ii) The reward function $r$ is upper semicontinuous on $K$ and, moreover, there exists a constant $M > 0$ such that

$$|r(x,a)| \le M w(x) \quad \forall\, (x,a) \in K.$$

(iii) The function

$$(x,a) \mapsto \int_Z w(h(x,a,\xi))\,\nu(d\xi)$$

is continuous on $K$.

(iv) Weak continuity. The function

$$(x,a) \mapsto \int_Z \zeta(h(x,a,\xi))\,\nu(d\xi)$$

is continuous on $K$ for every bounded and continuous $\zeta : S \to \mathbb{R}$.

Remark 3.7 The weak continuity assumption is satisfied, for instance, if the function $h(x,a,\xi)$ is continuous on $K$ for each $\xi \in Z$.

We introduce some notation. Let $B_w(S)$ be the family of measurable functions $\zeta : S \to \mathbb{R}$ with finite $w$-norm, that is,

$$\|\zeta\|_w := \sup_{x\in S} |\zeta(x)|/w(x) < \infty.$$

Assumption 3.8 The controlled process is $w$-uniformly ergodic on $\Pi$; that is, for each $\pi \in \Pi$, the Markov process (6) has a unique invariant probability measure $\mu_\pi$ on $S$ and, in addition, there exist constants $R > 0$ and $0 < \alpha < 1$ such that for every $x \in S$, $\zeta \in B_w(S)$ and $t \ge 0$

$$\sup_{\pi\in\Pi} \left| E^{\pi}_{x}[\zeta(x_t)] - \int_S \zeta(y)\,\mu_\pi(dy) \right| \le w(x)\,\|\zeta\|_w\, R\,\alpha^t.$$

In the weakly continuous case, we do not need to impose a value boundedness condition because, in fact, Assumption 3.8 implies Assumption 3.4 (the proof is easy; see, e.g., Lemma 4.5 in [3] or Lemma 10.4.2 in [7]). A sufficient condition for Assumption 3.8 is proposed in [7, Proposition 10.2.5].

In what follows, we will suppose that either Assumptions 3.1, 3.2 and 3.4 or Assumptions 3.5, 3.6 and 3.8 hold. In either case, we know from the results in [7, Chapter 8] that the optimal $\rho$-discounted reward is the unique solution in $B_w(S)$ of the discounted reward optimality equation:

$$v_\rho(x) = \max_{a\in\Gamma(x)} \left\{ r(x,a) + \rho \int_Z v_\rho(h(x,a,\xi))\,\nu(d\xi) \right\} \quad \forall\, x \in S. \tag{7}$$

In addition, a policy $\pi^* \in \Pi$ is $\rho$-discount optimal if and only if $\pi^*(x)$ attains the maximum in (7) for every $x \in S$, i.e.,

$$v_\rho(x) = r(x,\pi^*(x)) + \rho \int_Z v_\rho(h(x,\pi^*(x),\xi))\,\nu(d\xi) \quad \forall\, x \in S. \tag{8}$$
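To make the optimality equation (7) and the maximizer characterization (8) concrete, here is a minimal sketch of value iteration on a finite toy model. The two-state, two-action transition matrices `P` and rewards `r` are hypothetical numbers chosen only for illustration; they are not taken from the paper, whose setting is a general Borel state space.

```python
def value_iteration(P, r, rho, tol=1e-9, max_iter=200_000):
    """Iterate the discounted optimality operator until it (nearly) fixes v.

    P[a][s][y] is the probability of moving from s to y under action a,
    r[s][a] is the one-step reward, rho in (0, 1) is the discount factor.
    Returns the approximate optimal discounted reward and a greedy policy.
    """
    n_states, n_actions = len(r), len(r[0])
    v = [0.0] * n_states
    for _ in range(max_iter):
        # q[s][a] is the bracketed term in (7) evaluated at the current v.
        q = [[r[s][a] + rho * sum(P[a][s][y] * v[y] for y in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        v_new = [max(row) for row in q]
        done = max(abs(v_new[s] - v[s]) for s in range(n_states)) < tol
        v = v_new
        if done:
            break
    # A policy attaining the maximum in (7), as in (8).
    policy = [max(range(n_actions), key=lambda a: q[s][a])
              for s in range(n_states)]
    return v, policy


# Hypothetical toy data (not from the paper).
P = [
    [[0.9, 0.1], [0.5, 0.5]],  # transitions under action 0
    [[0.2, 0.8], [0.1, 0.9]],  # transitions under action 1
]
r = [[1.0, 0.0], [0.0, 2.0]]   # r[state][action]

for rho in (0.9, 0.99, 0.999):
    v, policy = value_iteration(P, r, rho)
    # (1 - rho) * v_rho(s) flattens across states as rho increases to 1.
    print(rho, [(1 - rho) * x for x in v], policy)
```

In this toy model the scaled values $(1-\rho)v_\rho(s)$ become nearly independent of the state as $\rho \uparrow 1$, which is the finite-state shadow of the constant $g$ constructed in the proof of Theorem 3.10.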

The vanishing discount approach to average reward optimality is related to the following definition of limit and accumulation policies.

Definition 3.9 Given a policy $\pi^* \in \Pi$ and a sequence $\{\pi_k\}_{k\in\mathbb{N}}$ in $\Pi$, we say that

(i) $\{\pi_k\}_{k\in\mathbb{N}}$ converges to $\pi^*$ if $\lim_k \pi_k(x) = \pi^*(x)$ for every $x \in S$;

(ii) $\pi^*$ is an accumulation policy of $\{\pi_k\}_{k\in\mathbb{N}}$ if, for every $x \in S$, there exists a subsequence $\{k_x\}$ such that $\pi_{k_x}(x) \to \pi^*(x)$;

(iii) $\{\pi_k\}_{k\in\mathbb{N}}$ converges continuously to $\pi^*$ if $\lim_k \pi_k(x_k) = \pi^*(x)$ for every $x \in S$ and every sequence $x_k \to x$.

The concept of accumulation policy in Definition 3.9(ii) comes from [13]. Continuous convergence and its applications to stochastic dynamic programming are analyzed in [10].
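The difference between convergence and continuous convergence in Definition 3.9 can be seen on a small numerical sketch. The sequence $\pi_k(x) = x^k$ on $S = [0,1]$ is a hypothetical example chosen for illustration, not a family of discount optimal policies from the paper. It converges pointwise, but not continuously at $x = 1$.

```python
import math


def pi_k(k: int, x: float) -> float:
    """Hypothetical policy sequence on S = [0, 1]: pi_k(x) = x**k."""
    return x ** k


def pi_limit(x: float) -> float:
    """Pointwise limit of pi_k: 0 on [0, 1) and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


# Pointwise convergence, Definition 3.9(i): the state x is held fixed.
pointwise_gap = max(abs(pi_k(2000, x) - pi_limit(x))
                    for x in (0.0, 0.3, 0.9, 1.0))

# Continuous convergence, Definition 3.9(iii), fails at x = 1:
# along x_k = 1 - 1/k -> 1 the values pi_k(x_k) approach exp(-1),
# whereas pi_limit(1) = 1.
k = 10_000
moving_value = pi_k(k, 1.0 - 1.0 / k)

print(pointwise_gap, moving_value, math.exp(-1.0))
```

The example only illustrates the definitions. Whether a given sequence of discount optimal policies converges continuously depends on the model at hand.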

Next, we prove our main result, which states the relation between average reward optimal policies and the limit of discount optimal policies. The proof of this result, Theorem 3.10, follows the same arguments needed to obtain the so-called average reward optimality inequality [7, Theorem 10.3.1], although the proof is focused on the analysis of the limit of discount optimal policies.

Theorem 3.10 Let $\{\rho_k\}_{k\in\mathbb{N}}$, with $\rho_k \uparrow 1$, be a sequence of discount factors, and let $\pi_k \in \Pi$, for every $k \in \mathbb{N}$, be a $\rho_k$-discount optimal policy. Then the following holds:

(i) If Assumptions 3.1, 3.2 and 3.4 are satisfied and $\{\pi_k\}$ converges to $\pi^* \in \Pi$, then $\pi^*$ is an average reward optimal policy;

(ii) If Assumptions 3.5, 3.6 and 3.8 are satisfied and $\{\pi_k\}$ converges continuously to $\pi^* \in \Pi$, then $\pi^*$ is an average reward optimal policy.

Proof: From Assumption 3.1 or 3.5, an induction argument (see, e.g., [7, Lemma 10.4.1]) gives

$$E^{\pi}_{x}[w(x_t)] \le \beta^t w(x) + \frac{1-\beta^t}{1-\beta}\, b \quad \forall\, \pi \in \Pi,\ x \in S,\ t \ge 0. \tag{9}$$
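The induction behind (9) can be sketched as follows, assuming only the Lyapunov-like condition of Assumption 3.1 (or 3.5) and the Markov property of (6). The conditioning step is spelled out here for convenience and is not quoted from [7]. Conditioning on $x_t$ and applying the Lyapunov-like condition at the pair $(x_t, \pi(x_t)) \in K$,

$$E^{\pi}_{x}[w(x_{t+1})] = E^{\pi}_{x}\left[\int_Z w\bigl(h(x_t,\pi(x_t),\xi)\bigr)\,\nu(d\xi)\right] \le \beta\, E^{\pi}_{x}[w(x_t)] + b.$$

Starting from $E^{\pi}_{x}[w(x_0)] = w(x)$ and iterating $t$ times,

$$E^{\pi}_{x}[w(x_t)] \le \beta^t w(x) + b\sum_{s=0}^{t-1}\beta^s = \beta^t w(x) + \frac{1-\beta^t}{1-\beta}\, b \le w(x) + \frac{b}{1-\beta}.$$

The last bound is uniform in $t$ and in $\pi$, which is what the rest of the proof uses.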

Therefore, by Assumption 3.2(ii) or 3.6(ii), we have

$$E^{\pi}_{x}|r(x_t,\pi(x_t))| \le M\beta^t w(x) + \frac{M(1-\beta^t)}{1-\beta}\, b,$$

so that $\sup_{\rho\in(0,1)} |(1-\rho)v_\rho(x')|$ is finite, with $x' \in S$ as in Assumption 3.4. Thus

$$g := \liminf_{k\to\infty}\, (1-\rho_k)\, v_{\rho_k}(x')$$

is well defined.

Our proof now proceeds in two steps. In step one, we prove that

$$g \ge \sup_{\pi\in\Pi} v(x,\pi) \quad \forall\, x \in S.$$

In step two, we show that $\pi^*$ satisfies

$$g \le v(x,\pi^*) \quad \forall\, x \in S.$$

Average reward optimality of $\pi^*$ will then follow.

Step one. By definition of $u_\rho$ (in Section 1), a simple calculation shows that the discounted reward optimality equation (7) can be written in the equivalent form:

$$(1-\rho)\,v_\rho(x') + u_\rho(x) = \max_{a\in\Gamma(x)} \left\{ r(x,a) + \rho \int_Z u_\rho(h(x,a,\xi))\,\nu(d\xi) \right\} \tag{10}$$

for every $x \in S$. Consider now a subsequence $\{k'\}$ of $\{k\}$ such that

$$\lim_{k'\to\infty}\, (1-\rho_{k'})\, v_{\rho_{k'}}(x') = g.$$
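The “simple calculation” leading from (7) to (10) can be written out as follows. This is a sketch that only uses the definition $u_\rho = v_\rho - v_\rho(x')$ and the fact that $\nu$ is a probability measure. Substituting $v_\rho = u_\rho + v_\rho(x')$ on both sides of (7),

$$v_\rho(x') + u_\rho(x) = \max_{a\in\Gamma(x)} \left\{ r(x,a) + \rho \int_Z \bigl[u_\rho(h(x,a,\xi)) + v_\rho(x')\bigr]\,\nu(d\xi) \right\} = \rho\, v_\rho(x') + \max_{a\in\Gamma(x)} \left\{ r(x,a) + \rho \int_Z u_\rho(h(x,a,\xi))\,\nu(d\xi) \right\},$$

because the constant $v_\rho(x')$ integrates to itself and does not depend on $a$. Subtracting $\rho\, v_\rho(x')$ from both sides gives (10).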

Let $u := \liminf_{k'} u_{\rho_{k'}}$ and note that $u$ is in $B_w(S)$. Now, by (10), for the sequence $\{\rho_{k'}\}$ and every $(x,a) \in K$ we have

$$(1-\rho_{k'})\,v_{\rho_{k'}}(x') + u_{\rho_{k'}}(x) \ge r(x,a) + \rho_{k'} \int_Z u_{\rho_{k'}}(h(x,a,\xi))\,\nu(d\xi).$$

Taking the $\liminf_{k'\to\infty}$ in this inequality and using Fatou’s lemma (which indeed applies as a consequence of our assumptions), we obtain

$$g + u(x) \ge r(x,a) + \int_Z u(h(x,a,\xi))\,\nu(d\xi) \quad \forall\, (x,a) \in K. \tag{11}$$

Iteration of (11) yields that, for every initial state $x \in S$, any policy $\pi \in \Pi$ and $t \ge 0$,

$$g \ge E^{\pi}_{x}[r(x_t,\pi(x_t))] + E^{\pi}_{x}[u(x_{t+1}) - u(x_t)].$$

Summing up these inequalities for $t = 0, \ldots, T-1$ and then dividing by $T$ yields

$$g \ge E^{\pi}_{x}\left[\frac{1}{T}\sum_{t=0}^{T-1} r(x_t,\pi(x_t))\right] + \frac{E^{\pi}_{x}[u(x_T)] - u(x)}{T}.$$

Letting $T \to \infty$, recalling that $u \in B_w(S)$ and using (9), we obtain $g \ge v(x,\pi)$ and, therefore,

$$g \ge \sup_{\pi\in\Pi} v(x,\pi) \quad \forall\, x \in S. \tag{12}$$

This completes step one.
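The passage to the limit at the end of step one can be spelled out as follows. This is a sketch under the standing assumptions, with the intermediate bounds added here for convenience. The second expectation in the iterated inequality telescopes:

$$\sum_{t=0}^{T-1} E^{\pi}_{x}[u(x_{t+1}) - u(x_t)] = E^{\pi}_{x}[u(x_T)] - u(x).$$

Since $u \in B_w(S)$, the bound (9) gives

$$\bigl|E^{\pi}_{x}[u(x_T)]\bigr| \le \|u\|_w\, E^{\pi}_{x}[w(x_T)] \le \|u\|_w \left( \beta^T w(x) + \frac{b}{1-\beta} \right),$$

which is bounded in $T$. Hence

$$\frac{E^{\pi}_{x}[u(x_T)] - u(x)}{T} \longrightarrow 0 \quad \text{as } T \to \infty,$$

and taking the $\liminf$ as $T \to \infty$ in the averaged inequality yields $g \ge v(x,\pi)$ by the definition (1).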

Step two. Since $\pi_k$ is a $\rho_k$-discount optimal policy, from (8) and (10) we have

$$(1-\rho_k)\,v_{\rho_k}(x') + u_{\rho_k}(x) = r(x,\pi_k(x)) + \rho_k \int_Z u_{\rho_k}(h(x,\pi_k(x),\xi))\,\nu(d\xi)$$

for every $k \in \mathbb{N}$ and $x \in S$. Consequently, for every $\varepsilon > 0$ and large enough $k$, we have

$$g - \varepsilon + u_{\rho_k}(x) \le r(x,\pi_k(x)) + \rho_k \int_Z u_{\rho_k}(h(x,\pi_k(x),\xi))\,\nu(d\xi) \tag{13}$$

for every $x \in S$.

Suppose now that Assumptions 3.1, 3.2 and 3.4 are satisfied. Then, taking the $\limsup$ in (13), recalling that $r(x,\cdot)$ is upper semicontinuous and by the extension of Fatou’s lemma [7, Lemma 8.3.7], we obtain

$$g - \varepsilon + u(x) \le r(x,\pi^*(x)) + \int_Z u(h(x,\pi^*(x),\xi))\,\nu(d\xi),$$

where $u := \limsup_k u_{\rho_k} \in B_w(S)$. But $\varepsilon > 0$ being arbitrary, the same arguments as in the proof of step one yield that

$$g \le v(x,\pi^*) \quad \forall\, x \in S,$$

which combined with (12) shows that $\pi^*$ is an average reward optimal policy and, besides, that $g$ is the (constant) optimal average reward. This completes the proof of statement (i), that is, under the hypotheses in the strongly continuous case.

We now consider the weakly continuous case, which consists of Assumptions 3.5, 3.6 and 3.8. Following [8], we define the generalized $\limsup$ of the sequence $\{u_{\rho_k}\}$ as

$$u^*(x) := \sup\, \limsup_{k\to\infty}\, u_{\rho_k}(x_k),$$

where the supremum is taken over the family of sequences $\{x_k\} \subseteq S$ such that $x_k \to x$. Let us now go back to (13) and take the $\limsup$ through a sequence $x_k \to x$ such that $\limsup_k u_{\rho_k}(x_k) \ge u^*(x) - \varepsilon$, so that

$$g - 2\varepsilon + u^*(x) \le \limsup_{k\to\infty}\, r(x_k,\pi_k(x_k)) + \limsup_{k\to\infty} \int_Z u_{\rho_k}(h(x_k,\pi_k(x_k),\xi))\,\nu(d\xi).$$

Then we proceed as in the proof for the strongly continuous case, but this time we take into account that both $r$ and the multifunction $\Gamma$ are upper semicontinuous. Finally, we apply the Fatou lemma for a generalized $\limsup$ (see [8, Lemma 5] and also [14, Lemma 2.3]) to obtain

$$g - 2\varepsilon + u^*(x) \le r(x,\pi^*(x)) + \int_Z u^*(h(x,\pi^*(x),\xi))\,\nu(d\xi) \quad \forall\, x \in S. \tag{14}$$

This implies, by standard arguments, that $v(x,\pi^*) \ge g$ for every $x \in S$. The proof of Theorem 3.10 is complete. □

Remark 3.11 The second step in the proof of Theorem 3.10 relies on the application of a Fatou-like lemma. For instance, when the usual value boundedness condition holds, then we use the standard Fatou lemma because the relative value function $u_\rho$ is bounded above; see (4). Under the strong continuity assumptions, we use the Fatou lemma in [7, Lemma 8.3.7], while if the weak continuity conditions hold, then we use the Fatou lemma for a generalized $\limsup$ in [14, Lemma 2.3]. Therefore, the assumptions we make on the control model heavily depend on the hypotheses needed for the corresponding Fatou lemma and, similarly, the kind of results we reach (statements (i) and (ii) in Theorem 3.10) also depend on the kind of Fatou lemma that is applied.

We specialize Theorem 3.10 to the following important particular cases.

Corollary 3.12 Suppose that $\{\rho_k\}_{k\in\mathbb{N}}$ is a sequence of discount factors such that $\rho_k \uparrow 1$ and let $\pi_k \in \Pi$, for every $k \in \mathbb{N}$, be a $\rho_k$-discount optimal policy.

(i) Under the strong continuity conditions (Assumptions 3.1, 3.2 and 3.4), if for every $x \in S$ the function $\rho \mapsto u_\rho(x)$ is monotone (either increasing or decreasing), then any accumulation policy of $\{\pi_k\}_{k\in\mathbb{N}}$ is average reward optimal.

(ii) If the state space $S$ is denumerable, then under either the strong or the weak continuity conditions, any accumulation policy of $\{\pi_k\}$ is average reward optimal.

The condition in Corollary 3.12(i) can be interpreted as follows: the expected discounted reward grows faster for any $x \in S$ than for $x' \in S$ as $\rho \uparrow 1$, and it is satisfied, for instance, in the consumption-investment model in [6, Section 3.6]; see also [1].

4 An example

In this section we give an example of a control model that satisfies (5) but does not satisfy the value boundedness condition (4).

The following inventory system with permitted backlog is based on the model analyzed in [17]. The state space and the action set are $S = A = \mathbb{R}$. The distribution $\nu$ is supported on $[0,\infty)$, it satisfies the conditions in Remark 3.3, and we assume that its expectation equals one. Furthermore, we suppose that there exists some $\delta > 0$ such that

$$\int_0^\infty e^{\delta\xi}\,\nu(d\xi) < \infty.$$

(Note that, for instance, the mean-one exponential distribution satisfies these hypotheses.) Fix a constant $K > 1/2$ and let

$$0 < \lambda < -\frac{1}{\delta} \log \int_0^\infty e^{-\delta\xi}\,\nu(d\xi).$$

The action sets $\Gamma(x)$ are the intervals

$$[-x,\ \max\{-2x,\, -x+K\}] \quad \text{for } x \le 0$$

and

$$[-x,\ \max\{\lambda,\, -x+K\}] \quad \text{for } x > 0.$$

The system’s transition function $h$ is given by $h(x,a,\xi) = x + a - \xi$. The cost function is $c(x,a) = (x+a)^2 - a$ (cf. [17, Equation (3.1)]). Finally, let $w(x) = e^{\delta|x|}$ for $x \in \mathbb{R}$. This control model satisfies Assumptions 3.1 and 3.2.

Given a discount factor $0 < \rho < 1$, a direct calculation shows that the optimal $\rho$-discounted cost function (recall that we are minimizing a cost) is

$$v_\rho(x) = x - \frac{(\rho+1)^2}{4(1-\rho)} \quad \forall\, x \in \mathbb{R},$$

and the optimal $\rho$-discount policy is

$$\pi_\rho(x) = -x + \frac{1}{2}(1-\rho) \quad \forall\, x \in \mathbb{R}.$$

Hence, the value boundedness condition (4) does not hold, whereas (5) (or Assumption 3.4) is satisfied.

Moreover, for every $x \in \mathbb{R}$, $\pi_\rho(x)$ converges to $-x$ as $\rho \uparrow 1$. Therefore, by Theorem 3.10(i), the policy $\pi(x) = -x$, for $x \in \mathbb{R}$, is average cost optimal. Further, from the proof of Theorem 3.10 we also obtain that the minimal average cost is

$$-1 = \lim_{\rho\uparrow 1}\, (1-\rho)\, v_\rho(x).$$
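The closed-form expressions of this example can be checked numerically. The sketch below assumes the illustrative values `K = 1.0` and `LAM = 0.5` (the text only requires $K > 1/2$ and an admissible $\lambda$; for the mean-one exponential distribution with $\delta = 0.5$ the upper bound on $\lambda$ is $2\log 1.5 \approx 0.81$). It uses only $E[\xi] = 1$, which suffices because the candidate value function is affine. It minimizes the right-hand side of the discounted cost optimality equation over a grid of feasible actions and compares the result with the closed form.

```python
from math import inf

K = 1.0    # assumed value; the example only requires K > 1/2
LAM = 0.5  # assumed value of lambda, admissible e.g. for Exp(1) noise, delta = 0.5


def v_closed(x: float, rho: float) -> float:
    """Closed-form optimal rho-discounted cost stated in the example."""
    return x - (rho + 1.0) ** 2 / (4.0 * (1.0 - rho))


def action_interval(x: float) -> tuple[float, float]:
    """Feasible action interval Gamma(x) of the inventory example."""
    upper = max(-2.0 * x, -x + K) if x <= 0 else max(LAM, -x + K)
    return -x, upper


def bellman_rhs(x: float, rho: float, n_grid: int = 20001) -> float:
    """Grid minimum of c(x, a) + rho * E[v(x + a - xi)] over a in Gamma(x)."""
    lo, hi = action_interval(x)
    best = inf
    for i in range(n_grid):
        a = lo + (hi - lo) * i / (n_grid - 1)
        # v_closed is affine and E[xi] = 1, so E[v(x + a - xi)] = v(x + a - 1).
        best = min(best, (x + a) ** 2 - a + rho * v_closed(x + a - 1.0, rho))
    return best


# Largest discrepancy between both sides of the optimality equation.
max_gap = max(abs(bellman_rhs(x, rho) - v_closed(x, rho))
              for rho in (0.5, 0.9, 0.99) for x in (-3.0, 0.0, 2.0))

# (1 - rho) * v_rho(x) for rho close to one, to compare with the limit -1.
scaled = (1.0 - 0.999999) * v_closed(2.0, 0.999999)

print(max_gap, scaled)
```

The relative value function here is $u_\rho(x) = x - x'$, which is unbounded above and below. This is why (4) fails while a two-sided bound of the form (5) is still available.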

5 Concluding remarks

In the previous sections, we have considered a fairly general discrete-time stochastic control model and, under two different sets of hypotheses (strong and weak continuity), we have proved that the limit of $\rho$-discount optimal policies, as the discount factor $\rho \uparrow 1$, is a long-run average reward optimal policy. The main contribution of this paper is to relax the usual value boundedness assumption (4) on the relative value function and, instead, assume the weaker condition (5). We have illustrated our results with the generalized inventory system in Section 4.

Some important issues, however, remain open. In Theorem 3.10(i) it is assumed that the discount optimal policies $\pi_k$ converge to some $\pi^*$, and then it is proved that $\pi^*$ is average reward optimal. It would be interesting to know whether this convergence can be relaxed, and thus obtain a result like that in Corollary 3.12(i) under general assumptions. To this end, results on the existence of measurable selectors would be involved. Also, it would be interesting to check whether the continuous convergence in Theorem 3.10(ii) can be relaxed to (usual) convergence, perhaps by strengthening the hypotheses on the control model.

Tomás Prieto-Rumeau
Departamento de Estadística,
Facultad de Ciencias, UNED,
Senda del Rey 9, 28040,
Madrid, Spain,
[email protected]

Onésimo Hernández-Lerma
Departamento de Matemáticas,
CINVESTAV-IPN, 14-740,
México D.F. 07000,
México,
[email protected]

References

[1] Cruz-Suárez H. D., A stochastic consumption-investment problem with unbounded utility function, Morfismos 4 (2000), 19–30.

[2] Dutta P. K., What do discounted optima converge to? A theory of discount rate asymptotics in economic models, J. Econom. Theory 55 (1991), 64–94.

[3] Gordienko E.; Hernández-Lerma O., Average cost Markov control processes with weighted norms: existence of canonical policies, Appl. Math. (Warsaw) 23 (1995), 199–218.

[4] Guo X. P.; Zhu Q. X., Average optimality for Markov decision processes in Borel spaces: a new condition and approach, J. Appl. Prob. 43 (2006), 318–334.

[5] Hernández-Lerma O.; Lasserre J. B., Average cost optimal policies for Markov control processes with Borel state space and unbounded costs, Systems Control Lett. 15 (1990), 349–356.

[6] Hernández-Lerma O.; Lasserre J. B., Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer, New York, 1996.

[7] Hernández-Lerma O.; Lasserre J. B., Further Topics on Discrete-Time Markov Control Processes, Springer, New York, 1999.

[8] Jaśkiewicz A.; Nowak A. S., On the optimality equation for average cost Markov control processes with Feller transition probabilities, J. Math. Anal. Appl. 316 (2006), 495–509.

[9] Kawaguchi K.; Morimoto H., Long-run average welfare in a pollution accumulation model, J. Econom. Dynam. Control 31 (2007), 703–720.

[10] Langen H. J., Convergence of dynamic programming models, Math. Oper. Res. 6 (1981), 493–512.

[11] Morimoto H.; Fujita Y., Ergodic control in stochastic manufacturing systems with constant demand, J. Math. Anal. Appl. 243 (2000), 228–248.

[12] Nishimura K.; Stachurski J., Stochastic optimal policies when the discount rate vanishes, J. Econom. Dynam. Control 31 (2007), 1416–1430.

[13] Schäl M., Conditions for optimality and for the limit of n-stage optimal policies to be optimal, Z. Wahrsch. verw. Gebiete 32 (1975), 179–196.

[14] Schäl M., Average optimality in dynamic programming with general state space, Math. Oper. Res. 18 (1993), 163–172.

[15] Sennott L. I., A new condition for the existence of optimal stationary policies in average cost Markov decision processes, Oper. Res. Lett. 5 (1986), 17–23.

[16] Taylor H. M., Markovian sequential replacement processes, Ann. Math. Stat. 36 (1965), 1677–1694.

[17] Vega-Amaya O.; Montes-de-Oca R., Application of average dynamic programming to inventory systems, Math. Methods Oper. Res. 47 (1998), 451–471.


Morfismos, Vol. 12, No. 2, 2008, pp. 17–32

Vertices simpliciales y escalonabilidad de grafos

Roberto Cruz Mario Estrada

Resumen

Dado un grafo simple no dirigido G, se le asocia un complejosimplicial ∆G cuyas caras corresponden a los conjuntos indepen-dientes de G. Van Tuyl y Villarreal definieron un grafo G comoescalonable si el complejo simplicial asociado ∆G es escalonable enel sentido no puro de Bjorner y Wachs. Estos autores demostraronque todos los grafos triangulados son escalonables y que los grafosbipartidos escalonables son precisamente los grafos bipartidos se-cuencialmente Cohen-Macaulay. En el presente artıculo se pruebaque el concepto de vertice simplicial de un grafo permite, no solodemostrar estos resultados, sino dar otras condiciones necesarias ysuficientes para la escalonabilidad de un grafo. Ademas se demues-tra que todo grafo simplicial es escalonable y que todo grafo arco-circular que contenga al menos un vertice simplicial es escalonable.

2000 Mathematics Subject Classification: 13F55, 13D02, 05C38, 05C75.
Keywords and phrases: shellable graphs, simplicial vertices, sequentially Cohen-Macaulay, simplicial graphs, circular-arc graphs.

1 Introduction

Let $G = (V_G, E_G)$ be a simple undirected graph (without loops or multiple edges), with vertex set $V_G = \{x_1, \ldots, x_n\}$ and edge set $E_G$. Identifying each vertex $x_i$ with the variable $x_i$ of the polynomial ring $R = k[x_1, \ldots, x_n]$ over the field $k$, one associates to $G$ the square-free monomial ideal $I(G) = (x_i x_j \mid \{x_i, x_j\} \in E_G)$. The ideal $I(G)$ is called the edge ideal of the graph $G$. Via the Stanley-Reisner correspondence, one associates to $G$ the simplicial complex $\Delta_G$ such that $I_{\Delta_G} = I(G)$, that is, the Stanley-Reisner ideal of the simplicial complex coincides with the edge ideal of the graph. In this case, the facets of $\Delta_G$ are the maximal independent sets (also called maximal stable sets) of $G$.
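As a small illustration of this correspondence, the following Python sketch lists the generators of the edge ideal and the facets of $\Delta_G$ for a tiny graph by brute force. The adjacency-dictionary representation, the function names and the example path a - b - c are assumptions made for illustration; they do not come from the paper.

```python
from itertools import combinations

def edge_ideal_generators(adj):
    """One pair (x_i, x_j) per edge, standing for the monomial x_i * x_j."""
    return sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})

def facets_of_independence_complex(adj):
    """Maximal independent sets of the graph, i.e. the facets of Delta_G.

    Brute force over all vertex subsets: only suitable for small graphs.
    """
    vertices = sorted(adj)
    independent = [
        frozenset(s)
        for k in range(len(vertices) + 1)
        for s in combinations(vertices, k)
        if all(v not in adj[u] for u, v in combinations(s, 2))
    ]
    return [s for s in independent if not any(s < t for t in independent)]

# Hypothetical example: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
generators = edge_ideal_generators(path)       # x_a * x_b and x_b * x_c
facets = facets_of_independence_complex(path)  # the sets {b} and {a, c}
```

For the path, the two facets have different sizes, so $\Delta_G$ is not pure; this is the situation the nonpure notion of shellability is meant to cover.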

The graph $G$ is said to be shellable if its associated simplicial complex $\Delta_G$ is shellable. This definition was introduced by Van Tuyl and Villarreal [15], and it uses the notion of nonpure shellability introduced by Björner and Wachs [1]. For graphs, the natural generalization of the Cohen-Macaulay property is that of being sequentially Cohen-Macaulay. A theorem of Stanley [13] states that shellability implies the sequentially Cohen-Macaulay property. Van Tuyl and Villarreal [15] prove the following theorems:

Theorem 1.1.1 [15, Theorem 2.12] Let $G$ be a chordal graph. Then $G$ is shellable.

Theorem 1.1.2 [15, Theorem 3.8] Let $G$ be a bipartite graph. Then $G$ is shellable if and only if $G$ is sequentially Cohen-Macaulay.

The central argument in the proof of Theorem 1.1.1 is the existence, in a chordal graph $G$, of a vertex $x$ whose neighborhood induces a complete subgraph [15, Lemma 2.11]. The proof of Theorem 1.1.2, in turn, rests on the fact that every connected, bipartite, sequentially Cohen-Macaulay graph has a vertex of degree 1, together with the following statement, which gives necessary and sufficient conditions for the shellability of a graph containing a vertex of degree 1:

Theorem 1.1.3 [15, Theorem 2.9] Let $G$ be a graph and let $x_1, y_1$ be two adjacent vertices of $G$ with $\deg(x_1) = 1$. Let

$G_1 = G \setminus (\{x_1\} \cup N_G(x_1))$ and $G_2 = G \setminus (\{y_1\} \cup N_G(y_1))$.

Then $G$ is shellable if and only if $G_1$ and $G_2$ are shellable.

Interestingly, introducing the notion of a simplicial vertex allows the hypotheses of the previous theorem to be replaced by the more general condition that $G$ contain a simplicial vertex. A vertex $x$ of a graph $G$ is called simplicial if its neighborhood $N_G(x)$ induces a complete subgraph.

In Section 2 we prove the following theorem, which generalizes Theorem 1.1.3 of Van Tuyl and Villarreal.


Theorem 1.1.4 (Theorem 2.1.13) Let $G$ be a graph, $x_1$ a simplicial vertex, $N_G(x_1) = \{x_2, \ldots, x_r\}$ and $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$ for $i = 1, \ldots, r$. Then $G$ is shellable if and only if $G_i$ is shellable for every $i = 1, \ldots, r$.

This result is well suited to establishing the shellability of graphs that have at least one simplicial vertex. It also yields another proof of the theorem of Van Tuyl and Villarreal on the shellability of chordal graphs, and of their theorem on the equivalence, for bipartite graphs, between shellability and the sequentially Cohen-Macaulay property. The latter theorem can be extended to graphs containing at least one simplicial vertex.

Theorem 1.1.5 (Corollary 2.1.15) Let $G$ be a graph containing a simplicial vertex. Then $G$ is shellable if and only if it is sequentially Cohen-Macaulay.

In Section 3 we apply the multiplication of simplicial vertices to obtain new necessary and sufficient conditions for the shellability of a graph. Given a graph $G$ and a simplicial vertex $x$ of $G$, the graph $G \circ x$ is obtained by multiplication of the vertex $x$: a new vertex $x'$ is added and joined to every vertex in the neighborhood of $x$. We prove the following result.

Theorem 1.1.6 (Theorem 3.1.17) Let $G$ be a graph, $x$ a simplicial vertex of $G$ and $G \circ x$ the graph obtained by multiplication of the vertex $x$. Then $G$ is shellable if and only if $G \circ x$ is shellable.

In Section 4, the final section, we establish the shellability of simplicial graphs and of circular-arc graphs that have a simplicial vertex. In a simplicial graph every vertex is either simplicial or adjacent to a simplicial vertex.

Theorem 1.1.7 (Theorem 4.1.24) Let $G$ be a simplicial graph. Then $G$ is shellable.

Finally, we prove the following theorem on the shellability of circular-arc graphs:

Theorem 1.1.8 (Theorem 4.1.30) Let $G$ be a circular-arc graph that has at least one simplicial vertex. Then $G$ is shellable.


2 Shellability of graphs containing simplicial vertices

In this section we generalize the theorem of Van Tuyl and Villarreal [15, Theorem 2.9] on necessary and sufficient conditions for the shellability of a graph, replacing the hypothesis that a vertex of degree 1 exists by the existence of a simplicial vertex.

Definition 2.1.9 A simplicial complex $\Delta$ is said to be shellable if its facets can be ordered $F_1, \ldots, F_s$ in such a way that for all $1 \le i < j \le s$ there exist a vertex $v \in F_j \setminus F_i$ and an index $l \in \{1, \ldots, j-1\}$ such that $F_j \setminus F_l = \{v\}$. The sequence $F_1, \ldots, F_s$ is called a shelling of $\Delta$.

Here we use the definition of nonpure shellability introduced by Björner and Wachs [1]. We say that $\Delta$ is pure shellable if it is shellable and all its facets have the same dimension.
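The condition in Definition 2.1.9 can be checked mechanically for a given ordering of the facets. The following Python sketch does so directly from the definition; the function name and the two small examples are assumptions made for illustration.

```python
def is_shelling(facets):
    """Return True if the ordered list of facets satisfies Definition 2.1.9."""
    facets = [frozenset(f) for f in facets]
    for j in range(1, len(facets)):
        for i in range(j):
            # Look for v in F_j \ F_i and an earlier facet F_l with F_j \ F_l = {v}.
            found = any(
                facets[j] - facets[l] == {v}
                for v in facets[j] - facets[i]
                for l in range(j)
            )
            if not found:
                return False
    return True

# Facets of the independence complex of the path a - b - c (hypothetical example).
path_is_shelled = is_shelling([{"a", "c"}, {"b"}])
# Facets of the independence complex of the 4-cycle a - b - c - d.
square_is_shelled = is_shelling([{"a", "c"}, {"b", "d"}])
```

The check is quadratic in the number of facets for each candidate pair, which is adequate for the small examples treated in this paper.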

Definition 2.1.10 Let $G$ be a simple undirected graph and $\Delta_G$ its associated simplicial complex. We say that $G$ is a shellable graph if $\Delta_G$ is a shellable simplicial complex.

The preceding definition was introduced by Van Tuyl and Villarreal [15], who show that every chordal graph is shellable [15, Theorem 2.12]. A graph is called chordal (or triangulated) if every cycle of length strictly greater than 3 has a chord, that is, an edge joining two non-consecutive vertices of the cycle. The proof relies on Dirac's lemma [4], which guarantees that every chordal graph has a vertex, called simplicial, whose neighborhood induces a complete subgraph, or clique.

Given a subset $S \subset V_G$, we denote by $G \setminus S$ the graph obtained from $G$ by deleting all the vertices of $S$ and all the edges incident to a vertex of $S$. If $x$ is a vertex of $G$, we denote by $N_G(x)$ the neighborhood of $x$, that is, the set of all vertices of $G$ adjacent to $x$.

Definition 2.1.11 Let $G$ be a simple undirected graph. A vertex $x$ of $G$ is called simplicial if its neighborhood $N_G(x)$ induces a complete subgraph of $G$.
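Definition 2.1.11 translates into a direct test on the neighborhood of a vertex. The sketch below assumes a graph stored as a dictionary mapping each vertex to its set of neighbors; the example graph is hypothetical.

```python
from itertools import combinations

def is_simplicial(adj, x):
    """True if the neighbours of x are pairwise adjacent (Definition 2.1.11)."""
    return all(v in adj[u] for u, v in combinations(adj[x], 2))

# Hypothetical example: a triangle a, b, c with a pendant vertex d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
simplicial = sorted(x for x in graph if is_simplicial(graph, x))
```

In this example the vertex c is not simplicial, because its neighbors a and d are not adjacent, while every vertex of degree 1 passes the test trivially.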

Given a graph $G$ and $S \subset V_G$, denote by $\langle S \rangle$ the subgraph induced by the vertex set $S$. Note that if $x$ is a simplicial vertex of $G$, the induced subgraph $\langle \{x\} \cup N_G(x) \rangle$ is a maximal clique; moreover, it is the unique maximal clique containing $x$. The following theorem of Dirac states that every chordal graph has a simplicial vertex.

Theorem 2.1.12 (Dirac, [4]) Every chordal graph $G$ has a simplicial vertex. Moreover, if $G$ is not a clique, then it has two simplicial vertices that are not adjacent to each other.

In Theorem 2.9 of [15] the vertex $x_1$, being of degree 1, is a simplicial vertex, since this vertex together with its neighborhood induces a maximal complete subgraph, which is moreover the unique one containing $x_1$ (namely the edge $\{x_1, y_1\}$). This fact, and the use of simplicial vertices in the proof of the shellability of chordal graphs, suggest the following generalization:

Theorem 2.1.13 Let $G$ be a graph, $x_1$ a simplicial vertex of $G$ and $N_G(x_1) = \{x_2, \ldots, x_r\}$ its neighborhood. Let $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$ for $i = 1, \ldots, r$. Then $G$ is shellable if and only if $G_i$ is shellable for every $i = 1, \ldots, r$.

Proof: Suppose $G$ is shellable. Theorem 2.6 of Van Tuyl and Villarreal [15] ensures that if $G$ is shellable and $x$ is any vertex of $G$, then the graph $G' = G \setminus (\{x\} \cup N_G(x))$ is shellable. Hence the graphs $G_i$ are shellable.

The proof in the other direction is practically identical to the proof of Theorem 2.12 of [15] on the shellability of chordal graphs. Suppose each $G_i$ is shellable and let $F_{i1}, \ldots, F_{is_i}$ be a shelling of $\Delta_{G_i}$ for each $i = 1, \ldots, r$. The subgraph $\langle \{x_1\} \cup N_G(x_1) \rangle = \langle x_1, \ldots, x_r \rangle$ is the unique maximal complete subgraph containing $x_1$. Moreover, every facet of $\Delta_G$, that is, every maximal independent set of $G$, meets $\{x_1, \ldots, x_r\}$ in exactly one vertex. Consequently, the complete list of facets of $\Delta_G$ is

$F_{11} \cup \{x_1\}, \ldots, F_{1s_1} \cup \{x_1\}; \; \ldots; \; F_{r1} \cup \{x_r\}, \ldots, F_{rs_r} \cup \{x_r\}.$

We show that this list, in this linear order, is a shelling of $\Delta_G$. There are two cases:

1. $F' = F_{ik} \cup \{x_i\}$, $F = F_{jt} \cup \{x_j\}$, $i < j$. Then $x_j \in F \setminus F'$. Moreover, the set $F_{jt} \cup \{x_1\}$ is an independent set of $G$, because $\{x_1\} \cup N_G(x_1) \subset \{x_j\} \cup N_G(x_j)$. It is therefore contained in one of the facets of $\Delta_G$ containing $x_1$; that is, there exists $l$, $1 \le l \le s_1$, such that $F_{jt} \cup \{x_1\} \subset F_{1l} \cup \{x_1\}$. Writing $F'' = F_{1l} \cup \{x_1\}$, we have $\{x_j\} = F \setminus F''$ and $F''$ precedes $F$.

2. $F' = F_{ik} \cup \{x_i\}$, $F = F_{it} \cup \{x_i\}$, $k < t$. This case follows from the shellability of the graph $G_i$.

The preceding theorem generalizes Theorem 2.9 of [15], since every vertex of degree 1 is a simplicial vertex. The result can also be used to give another proof that chordal graphs are shellable [15, Theorem 2.12]. Every induced subgraph of a chordal graph is chordal, and by Dirac's lemma (Theorem 2.1.12) every chordal graph either is a clique or contains two simplicial vertices. Arguing by induction on $n = |V_G|$ and assuming that the vertex $x_1$ of $G$ is simplicial, the subgraphs $G_i$ are chordal, being induced subgraphs of $G$, and they are shellable by the induction hypothesis. By Theorem 2.1.13 the graph $G$ is shellable.

Similarly, Theorem 2.1.13 can be used in the proof that a sequentially Cohen-Macaulay bipartite graph is shellable [15, Theorem 3.8]. Assume that $G$ is bipartite and sequentially Cohen-Macaulay and argue by induction on the number of vertices. Lemma 3.7 of [15] guarantees that $G$ has a vertex $x_1$ of degree 1 (that is, a simplicial vertex). By Theorem 3.3 of the same paper, the subgraphs $G_1 = G \setminus (\{x_1\} \cup N_G(x_1))$ and $G_2 = G \setminus (\{y_1\} \cup N_G(y_1))$, where $y_1$ is the vertex adjacent to $x_1$, are sequentially Cohen-Macaulay. By the induction hypothesis these graphs are shellable, and by Theorem 2.1.13 $G$ is shellable.

Theorem 2.1.13 can also be used to decide the shellability of graphs that have simplicial vertices. Take a simplicial vertex $x_1$ and compute the subgraphs $G_i$. If one of them is not shellable, then the original graph is not shellable. If all of them are shellable, then the original graph is shellable and a shelling of it can be assembled from the shellings of the subgraphs $G_i$.
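The procedure just described can be sketched recursively in Python. The sketch below is not the authors' implementation: it assumes that the graph and every subgraph $G_i$ met during the recursion again contain a simplicial vertex (as is the case for chordal graphs), and it raises an error otherwise. The function names and the example graph are hypothetical.

```python
from itertools import combinations

def remove_closed_neighbourhood(adj, x):
    """The graph G \\ ({x} u N_G(x)) as a new adjacency dictionary."""
    removed = adj[x] | {x}
    return {u: adj[u] - removed for u in adj if u not in removed}

def find_simplicial_vertex(adj):
    """First vertex (in sorted order) whose neighbours are pairwise adjacent."""
    for x in sorted(adj):
        if all(v in adj[u] for u, v in combinations(adj[x], 2)):
            return x
    return None

def shelling_from_simplicial_vertices(adj):
    """Assemble a shelling of Delta_G as in the proof of Theorem 2.1.13."""
    if not adj:
        return [frozenset()]  # the empty graph has the single facet {}
    x1 = find_simplicial_vertex(adj)
    if x1 is None:
        raise ValueError("no simplicial vertex: Theorem 2.1.13 does not apply")
    shelling = []
    for xi in [x1] + sorted(adj[x1]):
        sub = remove_closed_neighbourhood(adj, xi)
        # Facets containing x_i are the facets of G_i with x_i added.
        shelling.extend(f | {xi} for f in shelling_from_simplicial_vertices(sub))
    return shelling

# Hypothetical chordal example: a triangle a, b, c with a pendant vertex d on c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
shelling = shelling_from_simplicial_vertices(graph)
```

When the recursion finishes without raising, every subgraph it visited was handled through Theorem 2.1.13, so the returned list is a shelling. A failure only means that this particular strategy does not apply, not that the graph is non-shellable.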

[Figure: the graphs $G$ (right) and $H$ (left), each drawn on the vertices a, b, c, d, e, f, g.]


Example 2.1.14 Let $G$ and $H$ be the graphs shown in the figure above. The vertex $g$ of $G$ is simplicial and its neighborhood is $N_G(g) = \{c, d\}$. The graphs $G_g, G_c, G_d$ are shellable, with shellings

$\Delta_{G_g} = \langle \{a, e\}, \{a, f\}, \{b, e\}, \{b, f\} \rangle; \quad \Delta_{G_c} = \langle \{b, f\} \rangle; \quad \Delta_{G_d} = \langle \{a, e\} \rangle.$

By Theorem 2.1.13, $G$ is shellable and

$\Delta_G = \langle \{a, e, g\}, \{a, f, g\}, \{b, e, g\}, \{b, f, g\}, \{b, f, c\}, \{a, e, d\} \rangle$

is a shelling of $G$. On the other hand, the vertex $a$ is a simplicial vertex of the graph $H$ and its neighborhood is $N_H(a) = \{b, c\}$. The graph $H_a = H \setminus (\{a\} \cup N_H(a))$ is not shellable and, by Theorem 2.1.13, the graph $H$ is not shellable.

Van Tuyl and Villarreal proved the equivalence between shellability and the sequentially Cohen-Macaulay property for bipartite graphs [15, Theorem 3.8]. As a consequence of Theorem 2.1.13, an analogous result holds for graphs containing at least one simplicial vertex.

Corollary 2.1.15 Let $G$ be a graph containing a simplicial vertex. Then $G$ is shellable if and only if it is sequentially Cohen-Macaulay.

Proof: If $G$ is shellable then it is sequentially Cohen-Macaulay, by a result of Stanley [13]. Conversely, let $G$ be sequentially Cohen-Macaulay and suppose that every sequentially Cohen-Macaulay graph with fewer vertices is shellable. Let $x_1$ be a simplicial vertex of $G$ and $N_G(x_1) = \{x_2, \ldots, x_r\}$. The graphs $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$, for $i = 1, \ldots, r$, are sequentially Cohen-Macaulay [15, Theorem 3.3] and, by the induction hypothesis, shellable. Theorem 2.1.13 then gives the shellability of $G$.

3 Multiplication of simplicial vertices

In this section we apply the multiplication of simplicial vertices to shellable graphs in order to obtain new shellable graphs. Vertex multiplication is the key to Lovász's proof [7] of the perfect graph theorem, which states that a graph is perfect if and only if its complement is perfect. A graph $G$ is perfect if, for every induced subgraph, the chromatic number equals the clique number. Both bipartite graphs and chordal graphs are perfect. In this section we use the definition of vertex multiplication given by Golumbic [6].

Definition 3.1.16 [6] Let $G$ be a graph and $x$ a vertex of $G$. The graph $G \circ x$ is obtained from $G$ by adding a new vertex $x'$ joined to every vertex of $N_G(x)$. In this case we say that $G \circ x$ is obtained by multiplication of the vertex $x$.
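Definition 3.1.16 can be sketched in a few lines of Python. The adjacency-dictionary representation, the function name and the label chosen for the new vertex are assumptions made for illustration.

```python
def multiply_vertex(adj, x, new_name):
    """Return a copy of the graph with a new vertex joined to every neighbour of x."""
    if new_name in adj:
        raise ValueError("the new vertex name is already in use")
    new_adj = {u: set(nbrs) for u, nbrs in adj.items()}
    new_adj[new_name] = set(adj[x])
    for u in adj[x]:
        new_adj[u].add(new_name)
    return new_adj

# Hypothetical example: multiply the end vertex a of the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
multiplied = multiply_vertex(path, "a", "a2")
```

Note that the new vertex is not adjacent to the original one: the two vertices share the same neighborhood and can always appear together in an independent set.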

Theorem 3.1.17 Let $G$ be a graph, $x$ a simplicial vertex of $G$ and $G \circ x$ the graph obtained by multiplication of the vertex $x$. Then $G$ is shellable if and only if $G \circ x$ is shellable.

Proof: Let $x$ be a simplicial vertex of $G$, $x'$ the new vertex joined to every vertex of $N_G(x)$, $G' = G \circ x$ and $N_G(x) = \{x_1, \ldots, x_r\} = N_{G'}(x')$. The vertex $x'$ is simplicial in $G'$. For $i = 1, \ldots, r$ let

$G_i = G \setminus (\{x_i\} \cup N_G(x_i)) = G'_i = G' \setminus (\{x_i\} \cup N_{G'}(x_i)).$

The graphs obtained by deleting the vertices $x'$ and $x$, together with their neighborhoods, from $G'$ and $G$ respectively satisfy

$G'_{x'} = G' \setminus (\{x'\} \cup N_{G'}(x')) = (G \setminus (\{x\} \cup N_G(x))) \cup \{x\} = G_x \cup \{x\},$

that is, $G'_{x'}$ is the graph $G_x$ with the isolated vertex $x$ added.

Suppose $G$ is shellable. By Theorem 2.1.13, the graphs $G_x, G_1, \ldots, G_r$ are shellable. The graph $G_x \cup \{x\}$ is also shellable: it suffices to add the vertex $x$ to every facet of $\Delta_{G_x}$. Hence the graphs $G'_{x'}, G'_1, \ldots, G'_r$ are shellable and, by Theorem 2.1.13, $G' = G \circ x$ is shellable.

Conversely, suppose $G'$ is shellable. The graphs $G'_{x'}, G'_1, \ldots, G'_r$ are shellable by Theorem 2.1.13. Note that if $G'_{x'} = G_x \cup \{x\}$ is shellable, then $G_x$ is shellable: it suffices to remove the vertex $x$ from every facet of $\Delta_{G'_{x'}}$, since $x$ is isolated. Then the graphs $G_x, G_1, \ldots, G_r$ are shellable and, by Theorem 2.1.13, the graph $G$ is shellable.

If $G$ is a graph with two non-adjacent simplicial vertices having the same neighborhood, one of them can be regarded as a multiplication of the other. It can therefore be removed from the graph, and the shellability of the reduced graph can be analyzed instead.


The multiplication of simplicial vertices can be generalized by adding more than one vertex for each simplicial vertex.

Definition 3.1.18 Let $G$ be a graph, $S = \{x_1, \ldots, x_r\} \subset V_G$ a set of simplicial vertices such that $N_G(x_i) \neq N_G(x_j)$ for $i \neq j$, and let $h = (h_1, \ldots, h_r)$ be a vector of positive integers. The graph $H = G \circ h$ is obtained from $G$ by multiplication of the vertices of $S$ if, for each simplicial vertex $x_i$, $i = 1, \ldots, r$, one adds to $G$ $h_i$ new vertices $x_i^1, \ldots, x_i^{h_i}$, each of which is joined to every vertex of $N_G(x_i)$.

Corollary 3.1.19 Let $G$ be a graph, $S = \{x_1, \ldots, x_r\} \subset V_G$ a set of simplicial vertices such that $N_G(x_i) \neq N_G(x_j)$ for $i \neq j$, and let $h = (h_1, \ldots, h_r)$ be a vector of positive integers. Then $G$ is shellable if and only if the graph $H = G \circ h$ is shellable.

Proof: For each vertex $x_i$ of $S$, apply Theorem 3.1.17 $h_i$ times.

Note 3.1.20 Given a shellable graph $G$ containing several simplicial vertices, the preceding corollary produces new shellable graphs by multiplying each of the simplicial vertices of $G$. Given a shelling of $G$, it would be convenient to have a simple procedure for constructing a shelling of the multiplied graph. The proof of Theorem 2.1.13 guarantees that if $x$ is a simplicial vertex, one can construct a shelling of $\Delta_G$, $F_1, \ldots, F_s, F'_1, \ldots, F'_r$, in which the facets $F_1, \ldots, F_s$ containing $x$ occupy the first positions. Furthermore, the proof of Theorem 3.1.17 guarantees that the graph $G \circ x$ has a shelling obtained by adding the new vertex to the facets $F_1, \ldots, F_s$. However, when the shellable graph $H$ results from multiplying more than one simplicial vertex of the shellable graph $G$ and, starting from a shelling of $G$, the new vertices are added to the facets containing the corresponding simplicial vertices, the resulting list of facets need not be a shelling of $H$, as Example 3.1.21 shows.

Let $G$ be a shellable graph, $x_1, \ldots, x_r$ simplicial vertices of $G$, $h = (h_1, \ldots, h_r)$ a vector of positive integers and $F_1, \ldots, F_s$ a shelling of $\Delta_G$. A shelling of the shellable graph $H = G \circ h$ can be obtained as follows.

Let $F_{t_1}, \ldots, F_{t_p}$ be the subsequence of facets that contain $x_1$ and let $F_{r_1}, \ldots, F_{r_q}$ be the subsequence of facets that do not contain $x_1$, where $p + q = s$. The proof of Theorem 2.1.13 guarantees that

$F_{t_1}, \ldots, F_{t_p}, F_{r_1}, \ldots, F_{r_q}$

is a shelling of $\Delta_G$. Now let $F'_{t_1}, \ldots, F'_{t_p}$ be the facets obtained by adding to the facets $F_{t_1}, \ldots, F_{t_p}$ the $h_1$ new vertices corresponding to $x_1$; the proof of Theorem 3.1.17 guarantees that

$F'_{t_1}, \ldots, F'_{t_p}, F_{r_1}, \ldots, F_{r_q}$

is a shelling of the shellable graph $H_1 = G \circ (h_1, 0, \ldots, 0)$. Applying the procedure successively to the graphs

$H_2 = H_1 \circ (0, h_2, 0, \ldots, 0), \quad H_3 = H_2 \circ (0, 0, h_3, 0, \ldots, 0), \quad \ldots, \quad H = H_r = H_{r-1} \circ (0, 0, \ldots, 0, h_r)$

yields the desired shelling.

Example 3.1.21 Consider the graphs $G$ and $H = G \circ h$, where $S = \{x_1, y_1\}$ and $h = (1, 2)$.

[Figure: the graphs $G$ and $H$.]

It is easy to see that the graph $G$ is shellable and that

$\Delta_G = \langle \{x_1, c, y_1\}, \{x_1, d\}, \{a, c, y_1\}, \{b, y_1\}, \{b, d\} \rangle$

is a shelling of $\Delta_G$.

The vertex $x_2$ and the vertices $y_2, y_3$ of the graph $H$ result from multiplying the vertices $x_1$ and $y_1$, respectively. By the preceding corollary the graph $H$ is shellable; however, if in the shelling above we add the vertex $x_2$ to the facets containing $x_1$ and the vertices $y_2, y_3$ to the facets containing $y_1$, it is easy to see that the resulting list

$\langle \{x_1, x_2, c, y_1, y_2, y_3\}, \{x_1, x_2, d\}, \{a, c, y_1, y_2, y_3\}, \{b, y_1, y_2, y_3\}, \{b, d\} \rangle$

is not a shelling of $\Delta_G$.

Since the facets containing $x_1$ occupy the first positions, following the proof of Theorem 3.1.17 we can add the vertex $x_2$ to these facets to obtain a shelling of the graph $H_1 = G \circ (1, 0)$ resulting from multiplication of the vertex $x_1$. The shelling obtained is

$\Delta_{H_1} = \langle \{x_1, x_2, c, y_1\}, \{x_1, x_2, d\}, \{a, c, y_1\}, \{b, y_1\}, \{b, d\} \rangle.$

This shelling can be rearranged by taking all the facets containing the simplicial vertex $y_1$, in their order, and placing them in the first positions:

$\Delta_{H_1} = \langle \{x_1, x_2, c, y_1\}, \{a, c, y_1\}, \{b, y_1\}, \{x_1, x_2, d\}, \{b, d\} \rangle.$

Adding now the vertices $y_2, y_3$ to these facets gives a shelling of the simplicial complex associated to $H = H_1 \circ (0, 2) = G \circ (1, 2)$:

$\Delta_H = \langle \{x_1, x_2, c, y_1, y_2, y_3\}, \{a, c, y_1, y_2, y_3\}, \{b, y_1, y_2, y_3\}, \{x_1, x_2, d\}, \{b, d\} \rangle.$
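The reorder-then-extend procedure of this section can be sketched in Python and run on the data of Example 3.1.21. The function name and the way the new vertices are passed in are assumptions made for illustration; the facets are those listed in the example.

```python
def multiply_shelling(shelling, multiplications):
    """Shelling of G o h built from a shelling of G, one simplicial vertex at a time.

    `multiplications` is a list of pairs (x, new_vertices). For each pair, the
    facets containing x are moved to the front (keeping their relative order)
    and the new vertices are added to them.
    """
    current = [frozenset(f) for f in shelling]
    for x, new_vertices in multiplications:
        with_x = [f for f in current if x in f]
        without_x = [f for f in current if x not in f]
        current = [f | set(new_vertices) for f in with_x] + without_x
    return current

# Shelling of Delta_G from Example 3.1.21.
shelling_G = [{"x1", "c", "y1"}, {"x1", "d"}, {"a", "c", "y1"}, {"b", "y1"}, {"b", "d"}]
shelling_H = multiply_shelling(shelling_G, [("x1", ["x2"]), ("y1", ["y2", "y3"])])
```

Run on the example, the sketch returns the five facets in the same order as the final list $\Delta_H$ above.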

4 Simplicial graphs and circular-arc graphs

In this section we establish the shellability of simplicial graphs and of circular-arc graphs that contain at least one simplicial vertex.

Simplicial graphs were introduced in [2], and [3] studies several properties of these graphs that can be decided by polynomial-time algorithms. In a simplicial graph every vertex is simplicial or adjacent to a simplicial vertex.

Definition 4.1.22 Given a graph $G$, a clique of $G$ is called a simplex if it contains one or more simplicial vertices. The graph $G$ is called simplicial if every vertex is contained in a simplex, that is, every vertex is simplicial or belongs to the neighborhood of a simplicial vertex.
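The second formulation in Definition 4.1.22 is easy to test directly. The following Python sketch assumes an adjacency-dictionary representation; the function names and both example graphs are hypothetical.

```python
from itertools import combinations

def is_simplicial(adj, x):
    """True if the neighbours of x are pairwise adjacent."""
    return all(v in adj[u] for u, v in combinations(adj[x], 2))

def is_simplicial_graph(adj):
    """True if every vertex is simplicial or adjacent to a simplicial vertex."""
    simplicial = {x for x in adj if is_simplicial(adj, x)}
    return all(v in simplicial or bool(adj[v] & simplicial) for v in adj)

# Hypothetical examples: a triangle with a pendant vertex, and the 4-cycle.
triangle_with_tail = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
results = (is_simplicial_graph(triangle_with_tail), is_simplicial_graph(square))
```

The 4-cycle fails because it has no simplicial vertex at all, which is consistent with the fact that its independence complex is not shellable.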

Lemma 4.1.23 Let $G$ be a simplicial graph. For any vertex $v$ of $G$, the graph $G_v = G \setminus (\{v\} \cup N_G(v))$ is simplicial.

Proof: Note that if $G$ is a simplicial graph, $x$ a simplicial vertex of $G$ and $v$ a vertex of $G$ such that $x \notin N_G(v)$, then $x$ is simplicial in the subgraph $G_v = G \setminus (\{v\} \cup N_G(v))$. Indeed, deleting the vertex $v$ and its neighborhood from $G$ may remove some vertices of the simplex containing $x$; nevertheless, $x$ and the neighbors of $x$ that remain in $G_v$ induce a simplex in $G_v$.

Suppose that, for some vertex $v$ of $G$, the graph $G_v = G \setminus (\{v\} \cup N_G(v))$ is not simplicial. Then there exists a vertex $u$ of $G_v$ such that $u$ is not simplicial in $G_v$ and is not adjacent to any simplicial vertex of $G_v$. Since $G$ is simplicial, there are two cases:

(Case 1) $u$ is simplicial in $G$. Since $u \notin N_G(v)$, by the preceding observation $u$ is simplicial in $G_v$, which is a contradiction.

(Case 2) $u$ is adjacent to a vertex $x$ that is simplicial in $G$. Clearly $x \notin N_G(v)$, since otherwise $N_G(x) \subset N_G(v) \cup \{v\}$ and $u$ would belong to the neighborhood of $v$. By the preceding observation, $x$ is simplicial in $G_v$, which is a contradiction.

Lemma 4.1.23, together with Theorem 2.1.13, establishes the shellability of simplicial graphs.

Theorem 4.1.24 Let $G$ be a simplicial graph. Then $G$ is shellable.

Proof: The proof is by induction on the number of vertices. Let $G$ be a simplicial graph and suppose that every simplicial graph with fewer vertices is shellable.

Let $x_1$ be a simplicial vertex and $N_G(x_1) = \{x_2, \ldots, x_r\}$. The graphs $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$ are simplicial graphs by Lemma 4.1.23. By the induction hypothesis these graphs are shellable. By Theorem 2.1.13 the graph $G$ is shellable.

If a graph is shellable, then it is sequentially Cohen-Macaulay, by Stanley's result [13]. The preceding theorem therefore implies that simplicial graphs are sequentially Cohen-Macaulay. If, in addition, a simplicial graph $G$ is unmixed, that is, all minimal vertex covers of $G$ have the same cardinality, then Theorem 4.1.24 implies that $G$ is pure shellable and hence Cohen-Macaulay.

Given a graph $G$ and $S \subset V_G$, consider the graph $G \cup W(S)$ obtained by adding new vertices $\{y_i \mid x_i \in S\}$ and new edges, called whiskers, $\{\{x_i, y_i\} \mid x_i \in S\}$. A corollary of the preceding theorem is the following theorem of Villarreal [14]; see also [12].

Corollary 4.1.25 [12, Theorem 2.1] Let $G$ be a simple graph and $V_G$ its vertex set. Then the graph $G \cup W(V_G)$ is Cohen-Macaulay.


Proof: Each of the added vertices is simplicial, so $G \cup W(V_G)$ is a simplicial graph and, by Theorem 4.1.24, it is shellable. Since $G \cup W(V_G)$ is unmixed, $G \cup W(V_G)$ is Cohen-Macaulay.
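The two facts used in this proof can be observed on a small case. The Python sketch below adds a whisker at every vertex of the 4-cycle and checks by brute force that all maximal independent sets of the result have the same size. The naming scheme `w_x` for whisker vertices and the example are assumptions made for illustration.

```python
from itertools import combinations

def add_whiskers(adj, targets):
    """Return G u W(S): one new pendant vertex attached to each vertex in `targets`."""
    new_adj = {u: set(nbrs) for u, nbrs in adj.items()}
    for x in targets:
        y = f"w_{x}"  # hypothetical name for the new whisker vertex
        new_adj[y] = {x}
        new_adj[x].add(y)
    return new_adj

def maximal_independent_sets(adj):
    """Brute-force list of maximal independent sets (small graphs only)."""
    vertices = sorted(adj)
    independent = [
        frozenset(s)
        for k in range(len(vertices) + 1)
        for s in combinations(vertices, k)
        if all(v not in adj[u] for u, v in combinations(s, 2))
    ]
    return [s for s in independent if not any(s < t for t in independent)]

# Hypothetical example: the 4-cycle with a whisker at every vertex.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
whiskered = add_whiskers(square, list(square))
facet_sizes = {len(f) for f in maximal_independent_sets(whiskered)}
```

Every whisker vertex has degree 1 and is therefore simplicial, and each maximal independent set picks exactly one endpoint of every whisker, which is why all of them have $|V_G|$ elements.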

Well-covered graphs were introduced in [8] and have been studied extensively [9].

Definition 4.1.26 A graph $G$ is well-covered if every maximal independent set is a maximum independent set.

The class of well-covered graphs coincides with the class of unmixed graphs: if all maximal independent sets of a graph have the same cardinality, then so do the minimal vertex covers. In [10], Prisner et al. characterize the simplicial graphs and the chordal graphs that are well-covered.

Theorem 4.1.27 [10, Theorem 1] A graph $G$ is simplicial and well-covered if and only if every vertex $v$ of $G$ belongs to exactly one simplex.

Theorem 4.1.28 [10, Theorem 2] Let $G$ be a chordal graph. Then $G$ is well-covered if and only if every vertex $v$ of $G$ belongs to exactly one simplex.

Although the class of simplicial graphs and the class of chordal graphs are not comparable, that is, neither is a subclass of the other, Theorem 4.1.28 implies that well-covered (unmixed) chordal graphs are well-covered simplicial graphs. Thus the Cohen-Macaulay property of unmixed chordal graphs is a consequence of the Cohen-Macaulay property of unmixed simplicial graphs.

Circular-arc graphs [6] are a class of graphs that generalize interval graphs. An interval graph is the intersection graph of a set of intervals on the real line. Interval graphs are chordal and therefore shellable and sequentially Cohen-Macaulay.

Definition 4.1.29 A graph $G$ is a circular-arc graph if its vertices can be put in one-to-one correspondence with a set of arcs on a circle in such a way that two vertices of $G$ are adjacent if and only if their associated arcs intersect.
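As an illustration of Definition 4.1.29, the sketch below builds the intersection graph of a family of arcs. It makes a simplifying assumption that is not in the paper: the circle carries $m$ marked points and each arc is closed with endpoints at marked points, so that two arcs intersect exactly when they share a marked point. The function name and the example are hypothetical.

```python
def circular_arc_graph(arcs, m):
    """Intersection graph of closed arcs on a circle with m marked points.

    `arcs` maps a vertex name to (start, length): the arc covers the points
    start, start + 1, ..., start + length, taken modulo m.
    """
    points = {
        v: {(start + t) % m for t in range(length + 1)}
        for v, (start, length) in arcs.items()
    }
    return {
        v: {u for u in arcs if u != v and points[u] & points[v]}
        for v in arcs
    }

# Hypothetical example: five arcs of length 1 on a circle with 5 points.
arcs = {i: (i, 1) for i in range(5)}
cycle5 = circular_arc_graph(arcs, 5)
```

Each arc in the example meets only its two neighbors around the circle, so the resulting graph is the 5-cycle, which is not an interval graph.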


Circular-arc graphs are not chordal in general, since every cycle is a circular-arc graph. Nor are they shellable in general, since even cycles are not shellable. Theorem 4.1.30 shows that a circular-arc graph containing at least one simplicial vertex is shellable.
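For very small cycles, shellability can be checked by exhaustive search over all orderings of the facets. The following Python sketch does this for the cycles on 4, 5 and 6 vertices; it is a brute-force check written for illustration, with hypothetical function names, and it is not feasible beyond a handful of facets.

```python
from itertools import combinations, permutations

def maximal_independent_sets(adj):
    """Brute-force list of maximal independent sets (small graphs only)."""
    vertices = sorted(adj)
    independent = [
        frozenset(s)
        for k in range(len(vertices) + 1)
        for s in combinations(vertices, k)
        if all(v not in adj[u] for u, v in combinations(s, 2))
    ]
    return [s for s in independent if not any(s < t for t in independent)]

def is_shelling(facets):
    """Condition of Definition 2.1.9 for an ordered sequence of facets."""
    return all(
        any(facets[j] - facets[l] == {v}
            for v in facets[j] - facets[i]
            for l in range(j))
        for j in range(1, len(facets))
        for i in range(j)
    )

def is_shellable(adj):
    """Try every ordering of the facets of the independence complex."""
    facets = maximal_independent_sets(adj)
    return any(is_shelling(order) for order in permutations(facets))

def cycle(n):
    """The cycle on the vertices 0, 1, ..., n - 1."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

results = {n: is_shellable(cycle(n)) for n in (4, 5, 6)}
```

The search finds a shelling for the 5-cycle but none for the 4-cycle or the 6-cycle, in line with the statement that even cycles are not shellable.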

Theorem 4.1.30 Let $G$ be a circular-arc graph that has at least one simplicial vertex. Then $G$ is shellable.

Proof: Let $G$ be a circular-arc graph and $v$ any vertex of $G$. Then the graph $G_v = G \setminus (\{v\} \cup N_G(v))$ is an interval graph. Indeed, deleting $v$ and its neighborhood from $G$ amounts to removing from the circle the corresponding arc and all the arcs that intersect it. The remaining arcs can then be put in one-to-one correspondence with a set of intervals on the real line, that is, $G_v$ is an interval graph.

Suppose that $G$ is not complete (otherwise the graph is obviously shellable), that $x_1$ is a simplicial vertex of $G$ and that $N_G(x_1) = \{x_2, \ldots, x_r\}$. The graphs $G_i = G \setminus (\{x_i\} \cup N_G(x_i))$, for $i = 1, \ldots, r$, are interval graphs, hence chordal and shellable. By Theorem 2.1.13 the graph $G$ is shellable.

The preceding theorem implies that all circular-arc graphs with at least one simplicial vertex are sequentially Cohen-Macaulay. If $G$ is an unmixed circular-arc graph with at least one simplicial vertex, then $G$ is Cohen-Macaulay.

Acknowledgements

This work is funded by the research project "Álgebra conmutativa combinatoria, álgebras monomiales y grafos químicos", E01250, Universidad de Antioquia. The first author also thanks the Associates Programme of the International Centre for Theoretical Physics (ICTP).

Roberto Cruz Rodes
Departamento de Matemáticas,
Universidad de Antioquia,
Calle 67 N 53108 - A. A. 1226,
Medellín, [email protected]

Mario Estrada Valdes
Departamento de Matemáticas,
Universidad de Antioquia,
Calle 67 N 53108 - A. A. 1226,
Medellín, [email protected]


References

[1] Björner A. and Wachs M., Shellable nonpure complexes and posets. I, Trans. Amer. Math. Soc. 348 (1996), 1299-1327.

[2] Cheston G. C. A., Hare E. O., Hedetniemi S. T. and Laskar R. C., Simplicial graphs, Congressus Numerantium 67 (1988), 241-258.

[3] Cheston G. A. and Jap T. S., A survey of the algorithmic properties of simplicial, upper bound and middle graphs, Journal of Graph Algorithms and Applications 10 (2006), 159-190.

[4] Dirac G. A., On rigid circuit graphs, Abh. Math. Sem. Univ. Hamburg 25 (1961), 71-76.

[5] Fulkerson D. R. and Gross O. A., Incidence matrices and interval graphs, Pacific J. Math. 15 (1965), 835-855.

[6] Golumbic M. C., Algorithmic graph theory and perfect graphs, Second edition, Elsevier, 2004.

[7] Lovász L., A characterization of perfect graphs, J. Combin. Theory B 13 (1972), 253-267.

[8] Plummer M. D., Some covering concepts in graphs, J. Combin.Theory, 8 (1970), 91 - 98.

[9] Plummer M. D., Well covered graphs: a survey, Quaest. Math.,16 (1993), 253 - 287.

[10] Prisner E., Topp J., Vestergaard P. D., Well covered simplicial,chordal and circular arc graphs, J. of Graph Theory, 21 (1996),113-119.

[11] Rose D. J., Tarjan R. E. and Lueker G. S., Algorithmic aspects of vertex elimination on graphs, SIAM J. Comput. 5 (1976), 266-283.

[12] Simis A., Vasconcelos W. and Villarreal R., On the ideal theory of graphs, J. Algebra 167 (1994), 389-416.

[13] Stanley R. P., Combinatorics and Commutative Algebra, Second Edition, Progress in Mathematics 41, Birkhäuser Boston, Inc., Boston, MA, 1996.


[14] Villarreal R. H., Cohen-Macaulay graphs, Manuscripta Math., 66(1990), 277-293.

[15] Van Tuyl A. and Villarreal R. H., Shellable graphs and sequentially Cohen-Macaulay bipartite graphs, (2007) Preprint, math.CO/0701296v1.


Morfismos, Vol. 12, No. 2, 2008, pp. 33–52

Asymptotic normality of average cost Markov control processes ∗

Armando F. Mendoza-Perez

Abstract

This paper studies asymptotic normality of Markov control processes (MCPs) in Borel spaces with unbounded cost. Under suitable hypotheses we show that within the class of canonical policies there exists one where the cost is asymptotically normal.

2000 Mathematics Subject Classification: 93E20, 90C40.
Keywords and phrases: (discrete-time) Markov control processes, average cost criteria, expected average cost, average variance, asymptotic normality.

1 Introduction.

We study the asymptotic normality of discrete-time MCPs in Borel spaces with possibly unbounded cost. Under suitable hypotheses we show that, within the class of so-called canonical policies, those that minimize the limiting average variance exhibit asymptotic normality, that is, a certain distribution of the cost is asymptotically normal. Asymptotic normality is very useful in adaptive control problems.

The only works on the variance minimization problem in MCPs are those by Mandl [7, 9, 10], Hernández-Lerma et al. [5], Prieto-Rumeau and Hernández-Lerma [11] and Zhu and Guo [15]. On the asymptotic behavior of MCPs there are far fewer works; for instance, we should mention the paper by Mandl [8] for finite-state MCPs.

∗This paper is part of the author's Doctoral Thesis written at the Departamento de Matemáticas, CINVESTAV-IPN.


To obtain our results we combine two approaches. First, to obtain canonical policies with minimum average variance, we use the $W$-uniform ergodicity assumptions in [5]. Second, we follow Mandl's approach [8] to extend asymptotic normality to MCPs in Borel spaces.

The remainder of the paper is organized as follows. Section 2 contains a brief description of the Markov control model of interest. In Section 3 we introduce our hypotheses and state our main result, Theorem 3.7, which is proved in Section 4. Finally, an LQ system in Section 5 illustrates our results.

2 The control model.

Let $(X, A, \{A(x) : x \in X\}, Q, C)$ be a discrete-time Markov control model with state space $X$ and control (or action) set $A$, both assumed to be Borel spaces with σ-algebras $B(X)$ and $B(A)$, respectively. For each $x \in X$ there is a nonempty Borel set $A(x)$ in $B(A)$ which represents the set of feasible actions in the state $x$. The set

\[ K := \{(x, a) : x \in X,\ a \in A(x)\} \]

is assumed to be a Borel subset of $X \times A$. The transition law $Q$ is a stochastic kernel on $X$ given $K$, and the one-stage cost $C$ is a real-valued measurable function on $K$.

The class of measurable functions $f : X \to A$ such that $f(x)$ is in $A(x)$ for every $x \in X$ is denoted by $F$, and we suppose that $F$ is nonempty.

Control policies. For every $n = 0, 1, \ldots$, let $H_n$ be the family of admissible histories up to time $n$; that is, $H_0 := X$, and $H_n := K^n \times X$ if $n \ge 1$. A control policy is a sequence $\pi = \{\pi_n\}$ of stochastic kernels $\pi_n$ on $A$ given $H_n$ such that $\pi_n(A(x_n)|h_n) = 1$ for every $n$-history $h_n = (x_0, a_0, \ldots, x_{n-1}, a_{n-1}, x_n)$ in $H_n$. The class of all policies is denoted by $\Pi$.

A policy $\pi = \{\pi_n\}$ is said to be a (deterministic) stationary policy if there exists $f \in F$ such that $\pi_n(\cdot|h_n)$ is the Dirac measure at $f(x_n) \in A(x_n)$ for all $h_n \in H_n$ and $n = 0, 1, \ldots$. Following a standard convention, we identify $F$ with the class of stationary policies.

For notational ease we write

\[ C_f(x) := C(x, f(x)) \quad \text{and} \quad Q_f(\cdot|x) := Q(\cdot|x, f(x)) \quad \forall x \in X \tag{1} \]

for every stationary policy f in F.


Let $(\Omega, \mathcal{F})$ be the (canonical) measurable space consisting of the sample space $\Omega := (X \times A)^\infty$ and its product σ-algebra $\mathcal{F}$. Then, for each policy $\pi$ and "initial state" $x \in X$, a stochastic process $\{(x_n, a_n)\}$ and a probability measure $P^\pi_x$ are defined on $(\Omega, \mathcal{F})$ in a canonical way, where $x_n$ and $a_n$ represent the state and control at time $n$, $n = 0, 1, \ldots$. The expectation operator with respect to $P^\pi_x$ is denoted by $E^\pi_x$.

Average cost criteria. For each $n = 1, 2, \ldots$, let

\[ J_n(\pi, x) := E^\pi_x \sum_{t=0}^{n-1} C(x_t, a_t) \]

be the $n$-stage expected cost when using the policy $\pi$, given the initial state $x \in X$. The long-run expected average cost (EAC) is then defined as

\[ J(\pi, x) := \limsup_{n\to\infty} \frac{1}{n} J_n(\pi, x). \tag{2} \]

Definition 2.1 (a) A policy $\pi^*$ is said to be EAC-optimal if

\[ J(\pi^*, x) = \inf_{\pi \in \Pi} J(\pi, x) =: J^*(x) \quad \forall x \in X. \tag{3} \]

(b) A stationary policy $f^* \in F$ is called canonical if there exist a constant $\rho^*$ and a measurable function $h_1 : X \to \mathbb{R}$ such that

\[ \rho^* + h_1(x) = \min_{a \in A(x)} \Big[ C(x, a) + \int_X h_1(y)\,Q(dy|x, a) \Big] \quad \forall x \in X, \tag{4} \]

and $f^*(x) \in A(x)$ attains the minimum on the right-hand side of (4) for every $x \in X$, i.e.,

\[ \rho^* + h_1(x) = C_{f^*}(x) + \int_X h_1(y)\,Q_{f^*}(dy|x) \quad \forall x \in X. \tag{5} \]

If (4) and (5) are satisfied, then $(\rho^*, h_1, f^*)$ is said to be a canonical triplet (see [1, 2, 14]).
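As an illustration of equations (4) and (5), the following minimal sketch computes a canonical triplet for a small finite control model by relative value iteration. The model (two states, two actions, the numbers in `COST` and `Q`) and the function names are hypothetical and chosen only for this example; the paper itself works in general Borel spaces and does not prescribe this algorithm.

```python
# Hypothetical finite control model (not from the paper): 2 states, 2 actions.
STATES = [0, 1]
ACTIONS = [0, 1]
COST = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 0.5}
# Q[(x, a)][y] = probability of moving to state y from x under action a.
Q = {
    (0, 0): [0.9, 0.1],
    (0, 1): [0.4, 0.6],
    (1, 0): [0.7, 0.3],
    (1, 1): [0.2, 0.8],
}


def bellman(h, x, a):
    """C(x, a) + sum_y h(y) Q(y | x, a), the bracket in equation (4)."""
    return COST[(x, a)] + sum(Q[(x, a)][y] * h[y] for y in STATES)


def relative_value_iteration(tol=1e-12, max_iter=10_000):
    """Return (rho, h, policy) approximately solving (4) and (5), with h(0) = 0."""
    h = [0.0 for _ in STATES]
    rho = 0.0
    for _ in range(max_iter):
        t = [min(bellman(h, x, a) for a in ACTIONS) for x in STATES]
        rho = t[0]                      # normalise at the reference state 0
        new_h = [v - rho for v in t]
        converged = max(abs(u - v) for u, v in zip(new_h, h)) < tol
        h = new_h
        if converged:
            break
    policy = [min(ACTIONS, key=lambda a: bellman(h, x, a)) for x in STATES]
    return rho, h, policy


rho, h, policy = relative_value_iteration()
# Residual of equation (4) at each state; it should be numerically zero.
residual = max(
    abs(rho + h[x] - min(bellman(h, x, a) for a in ACTIONS)) for x in STATES
)
```

Here `rho` plays the role of $\rho^*$, `h` of $h_1$ (fixed by the normalisation $h_1(0) = 0$, which is one of many admissible choices), and `policy` of $f^*$.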

Remark 2.2 (See [2, Section 5.2].) If $(\rho^*, h_1, f^*)$ is a canonical triplet and in addition $h_1$ satisfies

\[ \lim_{n\to\infty} \frac{1}{n} E^\pi_x h_1(x_n) = 0 \quad \forall \pi \in \Pi,\ x \in X, \tag{6} \]

then $f^*$ is EAC-optimal and $\rho^*$ is the optimal expected average cost, that is,

\[ J(f^*, x) = J^*(x) = \rho^* \quad \forall x \in X. \tag{7} \]


Hence we have

\[ F_{cp} \subset F_{eac}, \tag{8} \]

where $F_{cp}$ is the class of canonical policies and $F_{eac} \subset F$ is the class of stationary EAC-optimal policies.

For each $n = 1, 2, \ldots$, let

\[ S_n(f, x) := \sum_{t=0}^{n-1} C(x_t, a_t) \tag{9} \]

be the $n$-stage pathwise (or sample-path) cost when using the policy $f \in F$, given the initial state $x \in X$.

Definition 2.3 (a) For each $f \in F$ and $x \in X$, define the limiting average variance

\[ V(f, x) := \limsup_{n\to\infty} \frac{1}{n} E^f_x \big[ S_n(f, x) - J_n(f, x) \big]^2. \tag{10} \]

(b) A stationary policy $f^*$ is called variance-minimal if

\[ V(f^*, x) = \inf_{f \in F_{eac}} V(f, x) \quad \forall x \in X. \tag{11} \]

3 Assumptions and main result.

In this section we introduce conditions to study asymptotic normality. We shall first introduce two sets of hypotheses. The first one, Assumption 3.1, consists of standard continuity-compactness conditions (see, for instance, [1, 3, 5, 12]) together with a growth condition on the one-step cost $C$.

Assumption 3.1 For every state $x \in X$:

(a) $A(x)$ is a compact subset of $A$;

(b) $C(x, a)$ is lower semicontinuous in $a \in A(x)$;

(c) the function $a \mapsto \int_X u(y)\,Q(dy|x, a)$ is continuous on $A(x)$ for every bounded measurable function $u$ on $X$;

(d) there exist a measurable function $W \ge 1$, a bounded measurable function $b \ge 0$, and nonnegative constants $r_1$ and $\beta$ with $\beta < 1$, such that

(d1) $|C(x, a)| \le r_1 W(x)$ for all $(x, a) \in K$;

(d2) $\int_X W(y)\,Q(dy|x, a)$ is continuous in $a \in A(x)$; and

(d3) $\int_X W(y)\,Q(dy|x, a) \le \beta W(x) + b(x)$ for every $x \in X$.

To state our second set of hypotheses, let us first introduce the following notation: $B_W(X)$ denotes the normed linear space of measurable functions $u$ on $X$ with finite $W$-norm $\|u\|_W$, which is defined as

\[ \|u\|_W := \sup_{x \in X} |u(x)|/W(x). \tag{12} \]

In this case we say that $u$ is $W$-bounded.

Let $\mu(\cdot)$ be a measure on $X$. We write

\[ \mu(u) := \int_X u(y)\,\mu(dy) \tag{13} \]

whenever the integral is well-defined.

Assumption 3.2 For each stationary policy $f \in F$:

(a) ($W$-geometric ergodicity) There exists a probability measure $\mu_f$ on $X$ such that

\[ \Big| \int_X u(y)\,Q^t_f(dy|x) - \mu_f(u) \Big| \le \|u\|_W R \rho^t W(x), \tag{14} \]

for every $t = 0, 1, \ldots$, $u$ in $B_W(X)$ and $x \in X$, where $R > 0$ and $0 < \rho < 1$ are constants independent of $f$.

(b) (Irreducibility) There exists a σ-finite measure $\lambda$ on $B(X)$ with respect to which $Q_f$ is $\lambda$-irreducible.

Remark 3.3 (See [4, Theorem 3.5], [13, Theorem 4.5.3], [3, Theorem 10.3.6].) Under Assumptions 3.1 and 3.2, there exists a canonical triplet $(\rho^*, h_1, f^*)$; see Definition 2.1.

To obtain asymptotic normality we need to strengthen the growth condition on the cost function $C$ in Assumption 3.1(d1).

Assumption 3.4 There exists a positive constant $r_2$ such that

\[ C^4(x, a) \le r_2 W(x) \quad \forall (x, a) \in K. \tag{15} \]


Remark 3.5 (a) Because $W \ge 1$, Assumption 3.4 implies Assumption 3.1(d1). Moreover, we have that $C^2(x, a) \le r_2^{1/2} W(x)$ for every $(x, a)$ in $K$ (Assumption 3.6 in [5]), a condition that is needed to obtain optimal policies with minimal average variance.

(b) Under Assumptions 3.1, 3.2 and 3.4, the function $h_1$ satisfying (4) and (5) above is such that $h_1^2$ and $h_1^4$ belong to $B_W(X)$. (See Lemma 4.3 below.)

By Remark 3.5(b), the function $\Lambda(\cdot, \cdot)$ on $K$ defined as

\[ \Lambda(x, a) := \int_X h_1^2(y)\,Q(dy|x, a) - \Big[ \int_X h_1(y)\,Q(dy|x, a) \Big]^2 \tag{16} \]

is finite-valued. This function is used to state the following variance-minimization result.

Proposition 3.6 (See [5, Theorem 3.8] or [3, Theorem 11.3.8].) Under Assumptions 3.1, 3.2 and 3.4, there exist a constant $\sigma_*^2 \ge 0$, a deterministic canonical policy $f^* \in F_{cp}$, and a function $h_2$ in $B_W(X)$ such that, for each $x \in X$,

\[ \sigma_*^2 + h_2(x) = \Lambda_{f^*}(x) + \int_X h_2(y)\,Q_{f^*}(dy|x). \tag{17} \]

Furthermore, $f^*$ satisfies (11) and $V(f^*, \cdot) = \sigma_*^2$; in fact

\[ V(f^*, x) = \mu_{f^*}(\Lambda_{f^*}) = \sigma_*^2 \quad \forall x \in X \tag{18} \]

and

\[ \sigma_*^2 \le V(f, x) \quad \forall f \in F_{eac},\ x \in X. \tag{19} \]

Hence, (19) states that $\sigma_*^2$ is the minimal average variance. We can now state our main result, which is proved in Section 4.

Theorem 3.7 Suppose that Assumptions 3.1, 3.2 and 3.4 hold. Let $f^* \in F_{cp}$ be a canonical policy satisfying Proposition 3.6, and $\rho^*$ the optimal average cost as in (7). Then for every initial state $x \in X$,

\[ \frac{S_n(f^*, x) - n\rho^*}{\sqrt{n}} \tag{20} \]

has asymptotically a normal distribution $N(0, \sigma_*^2)$ as $n \to \infty$, with $S_n(f^*, x)$ as in (9).


4 Proof of Theorem 3.7.

In the remainder of this paper we suppose that Assumptions 3.1, 3.2 and 3.4 hold.

To prove Theorem 3.7 we need some preliminary results, which are stated as Lemmas 4.1, 4.2 and 4.3.

The following lemma summarizes some well-known results, which are stated here for ease of reference.

Lemma 4.1 Let $f \in F$ be a deterministic stationary policy and $\{x_t\}$ the Markov chain induced by $f$. Then

(a) [3, Lemma 10.4.1] For each $x \in X$ and $t = 1, 2, \ldots$

\[ E^f_x W(x_t) \le [1 + b/(1 - \beta)]\,W(x), \tag{21} \]

with $b := \sup_{x \in X} |b(x)|$. Moreover, for every function $u$ in $B_W(X)$ the following limits hold:

\[ \lim_{n\to\infty} \frac{1}{n^p} E^f_x u(x_n) = 0 \tag{22} \]

with $p > 0$.

(b) [3, Proposition 10.2.3] $|J_n(f, x) - nJ_f| \le r_1 R W(x)/(1 - \rho)$ for all $x \in X$, $n = 1, 2, \ldots$, where $J_f := \mu_f(C_f)$. Hence:

(c) $J(f, x) = \lim_{n\to\infty} J_n(f, x)/n = J_f$ for all $x \in X$.

(d) [3, Proposition 10.2.3] The function

\[ h_f(x) := \lim_{n\to\infty} [J_n(f, x) - nJ_f] = \sum_{t=0}^{\infty} E^f_x [C_f(x_t) - J_f] \tag{23} \]

belongs to $B_W(X)$ and is called the "bias of $f$". Moreover, by (b), we have

\[ \|h_f\|_W \le r_1 R/(1 - \rho). \tag{24} \]

(e) [3, Theorem 10.3.6] The pair $(J_f, h_f)$ is the unique solution of the Poisson equation

\[ J_f + h_f(x) = C_f(x) + \int_X h_f(y)\,Q_f(dy|x) \quad \forall x \in X, \tag{25} \]

that satisfies the condition $\mu_f(h_f) = 0$.


(f) [3, Theorem 10.3.7] If $f$ is a canonical policy in $F_{cp}$, the corresponding solution $(J_f, h_f) = (\rho^*, h_f)$ to the Poisson equation (25) is such that $h_f$ coincides with the function $h_1$ in (4) and (5) up to an additive constant, that is,

\[ h_f(\cdot) = h_1(\cdot) + k_f \]

for some constant $k_f$.
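To make parts (d) and (e) of the lemma concrete, here is a small sketch for a finite, uncontrolled Markov chain. It approximates the invariant measure by power iteration, accumulates the series in (23) to obtain the bias, and then checks the Poisson equation (25) together with the normalisation $\mu_f(h_f) = 0$. The transition matrix `P`, the cost vector `c` and the function names are hypothetical and are not taken from the paper.

```python
# Hypothetical 3-state chain and cost (illustrative only).
P = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.4, 0.5],
]
c = [1.0, 0.0, 2.0]


def apply_kernel(P, v):
    """(P v)(x) = sum_y P[x][y] v[y], i.e. the integral of v against Q_f(.|x)."""
    return [sum(p * vy for p, vy in zip(row, v)) for row in P]


def stationary(P, iters=2000):
    """Approximate the invariant probability measure by power iteration."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]
    return mu


def bias(P, c, iters=2000):
    """Truncated version of the series (23): h = sum_t (P^t c - J)."""
    mu = stationary(P)
    J = sum(m * cx for m, cx in zip(mu, c))
    h = [0.0] * len(c)
    v = list(c)                       # v holds P^t c
    for _ in range(iters):
        h = [hx + vx - J for hx, vx in zip(h, v)]
        v = apply_kernel(P, v)
    return J, h, mu


J, h, mu = bias(P, c)
Ph = apply_kernel(P, h)
# Residual of the Poisson equation (25) and the normalisation mu(h) = 0.
poisson_residual = max(abs(J + h[x] - c[x] - Ph[x]) for x in range(len(c)))
mu_of_h = sum(m * hx for m, hx in zip(mu, h))
```

Because every entry of `P` is positive, the chain is geometrically ergodic and the truncated series converges quickly; for less well-behaved chains one would solve the linear system directly instead.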

The following lemma states a stronger version of (14) and Lemma 4.1(e).

Lemma 4.2 Let $w(x) := W(x)^{1/m}$ with $m = 2$ or $m = 4$. For each stationary policy $f \in F$:

(a) The Markov chain $\{x_n\}$ induced by $f$ is $w$-geometrically ergodic, that is,

\[ \Big| \int_X u(y)\,Q^t_f(dy|x) - \mu_f(u) \Big| \le \|u\|_w R_0 \rho_0^t w(x) \tag{26} \]

for all $x \in X$ and $t = 0, 1, \ldots$, where $\rho_0 = \rho^{1/m} < 1$ and $R_0 := R^{1/m}$;

(b) The unique solution $(J_f, h_f)$ of the Poisson equation (25) is such that $h_f$ is $w$-bounded.

Proof. (a) This part follows from [3, Lemma 11.3.9].

(b) Case $m = 4$: Note that (15) and part (a) of this lemma yield the $W^{1/4}$-analogue of Lemma 4.1(d). Hence $h_f$ is $W^{1/4}$-bounded.

Case $m = 2$: Assumption 3.4 and the fact that $W \ge 1$ imply that

\[ |C(x, a)| \le r_2^{1/4} W(x)^{1/4} \le r_2^{1/4} W(x)^{1/2} \quad \forall (x, a) \in K. \tag{27} \]

Part (a) (with $m = 2$) and (27) yield the $W^{1/2}$-analogue of Lemma 4.1(d), that is, $h_f$ is $W^{1/2}$-bounded.

Lemma 4.3 (a) The function $h_1(\cdot)$ satisfying (4) and (5) is $W^{1/4}$-bounded.

(b) The function $h_2(\cdot)$ satisfying (17) is $W^{1/2}$-bounded.


Proof. (a) By Lemma 4.1(f), $h_1$ coincides with $h_f$ except for an additive constant, with $f$ a canonical policy. From Lemma 4.2(b), $h_f$ is $W^{1/4}$-bounded, therefore $h_1$ is also $W^{1/4}$-bounded.

(b) From the proof of Proposition 3.6 (see, for instance, [5, Theorem 3.8] or [3, Theorem 11.3.8]) we consider the new Markov control model

\[ (X, A, \{A^*(x) : x \in X\}, Q, \Lambda), \tag{28} \]

with $A^*(x)$ an appropriate compact subset of $A(x)$ for every $x$, and $\Lambda(x, a)$ as in (16). From part (a) of this lemma, $h_1$ is $W^{1/4}$-bounded. Hence $\Lambda$ satisfies the growth condition

\[ \Lambda^2(x, a) \le r_3 W(x) \quad \forall (x, a) \in K, \tag{29} \]

where $r_3$ is a positive constant. Observe that (29) yields the $W^{1/2}$-analogue of Assumption 3.1(d1); hence, by Lemma 4.2(a), the control model (28) is $W^{1/2}$-geometrically ergodic. Then from Lemma 4.1 applied to the control model (28) with $W^{1/2}$ instead of $W$, and $h_2$ instead of $h_1$, it follows that $h_2$ is $W^{1/2}$-bounded.

We are finally ready for the proof of Theorem 3.7.

Proof of Theorem 3.7. Let $(\rho^*, h_1, f^*)$ be a canonical triplet as in Definition 2.1. Moreover, let $(\sigma_*^2, h_2, f^*)$ be as in Proposition 3.6. We define

\[ \tau_1(x, a) := \int_X h_1(y)\,Q(dy|x, a) - h_1(x) + C(x, a) - \rho^* \]

and

\[ \tau_2(x, a) := \int_X h_2(y)\,Q(dy|x, a) - h_2(x) + \Lambda(x, a) - \sigma_*^2 \]

for all $(x, a) \in K$. For $l = 1, 2$ and $(x, a) \in K$, let

\[ \psi_l(x, a) := \int_X h_l(y)\,Q(dy|x, a) - h_l(x), \]

and consider the characteristic functions

\[ \chi_n(u) := \exp\{iu(S_n(f^*, x) - n\rho^*)\} \quad \text{for } n = 1, 2, \ldots;\ u \in \mathbb{R}, \]

with $\chi_0(u) := 1$. Let

\[ e_1(z) := \exp\{iz\} - iz - 1, \tag{30} \]

\[ e_2(z) := \exp\{iz\} + \frac{z^2}{2} - iz - 1. \tag{31} \]
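The functions $e_1$ and $e_2$ in (30) and (31) are Taylor remainders of $\exp\{iz\}$, and the proof later relies on the standard bounds $|e_1(z)| \le z^2/2$ and $|e_2(z)| \le |z|^3/6$ for real $z$. The following sketch checks these bounds numerically on a grid; the grid, the rounding slack and the helper name `bounds_hold` are assumptions made for this illustration only, and a numerical check is of course no substitute for the analytic argument.

```python
import cmath


def e1(z: float) -> complex:
    """Remainder of exp(iz) after the first-order Taylor terms, as in (30)."""
    return cmath.exp(1j * z) - 1j * z - 1


def e2(z: float) -> complex:
    """Remainder of exp(iz) after the second-order Taylor terms, as in (31)."""
    return cmath.exp(1j * z) + z * z / 2 - 1j * z - 1


def bounds_hold(zs, slack=1e-12):
    """Check |e1(z)| <= z^2/2 and |e2(z)| <= |z|^3/6, up to rounding slack."""
    return all(
        abs(e1(z)) <= z * z / 2 + slack and abs(e2(z)) <= abs(z) ** 3 / 6 + slack
        for z in zs
    )


# Grid of real points in [-20, 20] with step 0.01 (an arbitrary choice).
grid = [k / 100.0 for k in range(-2000, 2001)]
ok = bounds_hold(grid)
```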


Observe that

\[ \tau_1(x, a) = \psi_1(x, a) + C(x, a) - \rho^*, \tag{32} \]

and

\[ \tau_2(x, a) = \psi_2(x, a) + \Lambda(x, a) - \sigma_*^2 \tag{33} \]

for all $(x, a) \in K$.

To prove the theorem we have to verify that

\[ \lim_{n\to\infty} E^{f^*}_x \chi_n\Big( \frac{u}{\sqrt{n}} \Big) = \exp\Big\{ -\frac{1}{2}\sigma_*^2 u^2 \Big\}. \tag{34} \]

To this end, first notice that $\psi_l(x_m, a_m)$, for $l = 1, 2$, is the conditional expectation of $h_l(x_{m+1}) - h_l(x_m)$ given $x_m, a_m$, that is,

\[ \psi_l(x_m, a_m) = E^{f^*}_x [h_l(x_{m+1}) - h_l(x_m) \,|\, x_m, a_m]. \]

This yields, for $l = 1, 2$, with $\chi_m := \chi_m(u)$ and $\psi_l := \psi_l(x_m, a_m)$, the equations

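\[ 0 = iu\,E^{f^*}_x \Big[ \sum_{m=0}^{n-1} \chi_m \psi_1 - \sum_{m=0}^{n-1} \chi_m \big( h_1(x_{m+1}) - h_1(x_m) \big) \Big] \tag{35} \]

and

\[ 0 = \frac{u^2}{2} E^{f^*}_x \Big[ \sum_{m=0}^{n-1} \chi_m \big( h_2(x_{m+1}) - h_2(x_m) \big) - \sum_{m=0}^{n-1} \chi_m \psi_2 \Big]. \tag{36} \]

To simplify the notation, let $C := C(x_m, a_m)$, $e_1 := e_1\big(u(C - \rho^*)\big)$ and $e_2 := e_2\big(u(C - \rho^*)\big)$. Moreover, notice that

\[ \chi_{m+1} - \chi_m = \big[ \exp\{iu(C - \rho^*)\} - 1 \big] \chi_m. \tag{37} \]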

From (30), (31) and (37) we have

\[ E^{f^*}_x \chi_n - 1 = E^{f^*}_x \sum_{m=0}^{n-1} (\chi_{m+1} - \chi_m) = E^{f^*}_x \sum_{m=0}^{n-1} \Big[ iu(C - \rho^*) - \frac{1}{2} u^2 (C - \rho^*)^2 + e_2 \Big] \chi_m, \tag{38} \]

and

\[
\begin{aligned}
-iu\,E^{f^*}_x \sum_{m=0}^{n-1} \chi_m \big( h_1(x_{m+1}) - h_1(x_m) \big)
&= iu\,E^{f^*}_x \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} h_1(x_{m+1}) (\chi_{m+1} - \chi_m) \Big] \\
&= iu\,E^{f^*}_x \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} h_1(x_{m+1}) \big( iu(C - \rho^*) + e_1 \big) \chi_m \Big].
\end{aligned}
\tag{39}
\]

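Similarly,

\[
\begin{aligned}
\frac{u^2}{2} E^{f^*}_x \sum_{m=0}^{n-1} \chi_m \big( h_2(x_{m+1}) - h_2(x_m) \big)
&= -\frac{u^2}{2} E^{f^*}_x \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) (\chi_{m+1} - \chi_m) \Big] \\
&= -\frac{u^2}{2} E^{f^*}_x \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) \big( \exp\{iu(C - \rho^*)\} - 1 \big) \chi_m \Big].
\end{aligned}
\tag{40}
\]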

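Adding (35)-(40) and using (32), we obtain

\[
\begin{aligned}
E^{f^*}_x \chi_n - 1
&= iu\,E^{f^*}_x \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_1(x_m, a_m) + \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big] \\
&\quad - \frac{u^2}{2} E^{f^*}_x \sum_{m=0}^{n-1} \chi_m \Big[ \psi_2 + 2h_1(x_{m+1})(C - \rho^*) + (C - \rho^*)^2 \Big] \\
&\quad - \frac{u^2}{2} E^{f^*}_x \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) \big( \exp\{iu(C - \rho^*)\} - 1 \big) \chi_m \Big] \\
&\quad + E^{f^*}_x \sum_{m=0}^{n-1} e_2 \chi_m.
\end{aligned}
\]

Hence

\[ E^{f^*}_x \chi_n - 1 = \kappa''(n, u) - \frac{u^2}{2} E^{f^*}_x \sum_{m=0}^{n-1} \chi_m \Big[ \psi_2 + 2h_1(x_{m+1})(C - \rho^*) + (C - \rho^*)^2 \Big] \tag{41} \]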


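with

\[
\begin{aligned}
\kappa''(n, u)
&= iu\,E^{f^*}_x \Big[ h_1(x_0) - \chi_n h_1(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_1(x_m, a_m) + \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big] \\
&\quad - \frac{u^2}{2} E^{f^*}_x \Big[ h_2(x_0) - \chi_n h_2(x_n) + \sum_{m=0}^{n-1} h_2(x_{m+1}) \big( \exp\{iu(C - \rho^*)\} - 1 \big) \chi_m \Big] \\
&\quad + E^{f^*}_x \sum_{m=0}^{n-1} e_2 \chi_m.
\end{aligned}
\tag{42}
\]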

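Observing that

\[ \Lambda(x_m, a_m) = E^{f^*}_x [h_1^2(x_{m+1}) \,|\, x_m, a_m] - \big( E^{f^*}_x [h_1(x_{m+1}) \,|\, x_m, a_m] \big)^2 \]

and in view of (33), we can express (41) as

\[
\begin{aligned}
E^{f^*}_x \chi_n - 1
&= \kappa''(n, u) - \frac{u^2}{2} E^{f^*}_x \sum_{m=0}^{n-1} \chi_m \Big[ \sigma_*^2 + \tau_2(x_m, a_m) - h_1^2(x_{m+1}) \\
&\qquad + \big( E^{f^*}_x [h_1(x_{m+1}) \,|\, x_m, a_m] + C(x_m, a_m) - \rho^* \big)^2 \Big] \\
&= \kappa''(n, u) - \frac{u^2}{2} E^{f^*}_x \sum_{m=0}^{n-1} \chi_m \Big[ \sigma_*^2 + \tau_2(x_m, a_m) - h_1^2(x_{m+1}) \\
&\qquad + \Big( \int_X h_1(y)\,Q(dy|x_m, a_m) + C(x_m, a_m) - \rho^* \Big)^2 \Big].
\end{aligned}
\]

Since $f^*$ is a canonical policy, it satisfies

\[ h_1(x_m) = \int_X h_1(y)\,Q(dy|x_m, a_m) + C(x_m, a_m) - \rho^*. \]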

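Then, from (37), we have

\[
\begin{aligned}
E^{f^*}_x \chi_n - 1
&= \kappa''(n, u) - \frac{u^2}{2} E^{f^*}_x \sum_{m=0}^{n-1} \chi_m \Big[ \sigma_*^2 + \tau_2(x_m, a_m) - h_1^2(x_{m+1}) + h_1^2(x_m) \Big] \\
&= \kappa''(n, u) - \frac{u^2 \sigma_*^2}{2} \sum_{m=0}^{n-1} E^{f^*}_x \chi_m - \frac{u^2}{2} E^{f^*}_x \Big[ h_1^2(x_0) - \chi_n h_1^2(x_n) \\
&\qquad + \sum_{m=0}^{n-1} \chi_m \tau_2(x_m, a_m) + \sum_{m=0}^{n-1} h_1^2(x_{m+1}) (\chi_{m+1} - \chi_m) \Big] \\
&= \kappa''(n, u) - \frac{u^2 \sigma_*^2}{2} \sum_{m=0}^{n-1} E^{f^*}_x \chi_m - \frac{u^2}{2} E^{f^*}_x \Big[ h_1^2(x_0) - \chi_n h_1^2(x_n) \\
&\qquad + \sum_{m=0}^{n-1} \chi_m \tau_2(x_m, a_m) + \sum_{m=0}^{n-1} h_1^2(x_{m+1}) \big( \exp\{iu(C - \rho^*)\} - 1 \big) \chi_m \Big].
\end{aligned}
\]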

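Hence

\[ E^{f^*}_x \chi_n = 1 - \frac{u^2 \sigma_*^2}{2} \sum_{m=0}^{n-1} E^{f^*}_x \chi_m + \kappa'(n, u) \tag{43} \]

with

\[
\begin{aligned}
\kappa'(n, u) = \kappa''(n, u) - \frac{u^2}{2} E^{f^*}_x \Big[ & h_1^2(x_0) - \chi_n h_1^2(x_n) + \sum_{m=0}^{n-1} \chi_m \tau_2(x_m, a_m) \\
& + \sum_{m=0}^{n-1} h_1^2(x_{m+1}) \big( \exp\{iu(C - \rho^*)\} - 1 \big) \chi_m \Big].
\end{aligned}
\tag{44}
\]

Let us rewrite (43) as

\[ E^{f^*}_x \chi_n = 1 + \Big( \exp\Big\{ -\frac{u^2 \sigma_*^2}{2} \Big\} - 1 \Big) \sum_{m=0}^{n-1} E^{f^*}_x \chi_m + \kappa(n, u), \tag{45} \]

with

\[ \kappa(n, u) := \kappa'(n, u) + \Big[ 1 - \frac{u^2 \sigma_*^2}{2} - \exp\Big\{ -\frac{u^2 \sigma_*^2}{2} \Big\} \Big] \sum_{m=0}^{n-1} E^{f^*}_x \chi_m. \tag{46} \]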

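From (45), an induction argument gives

\[
E^{f^*}_x \chi_n(u) = \exp\Big\{ -\frac{n \sigma_*^2 u^2}{2} \Big\} + \Big[ \exp\Big\{ -\frac{\sigma_*^2 u^2}{2} \Big\} - 1 \Big] \sum_{m=0}^{n-1} \exp\Big\{ -\frac{\sigma_*^2 u^2}{2} (n - 1 - m) \Big\} \kappa(m, u) + \kappa(n, u).
\tag{47}
\]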
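The passage from the recursion (45) to the closed form (47) is purely algebraic, so it can be sanity-checked numerically. In the sketch below, `q` stands for $\exp\{-\sigma_*^2 u^2/2\}$, `ys[n]` for $E^{f^*}_x \chi_n$, and `kappa[n]` for $\kappa(n, u)$; the perturbation sequence is drawn at random, which is an assumption made only for this check and has nothing to do with the actual $\kappa$ of the proof.

```python
import math
import random


def solve_recursion(q, kappa):
    """Iterate y_n = 1 + (q - 1) * sum_{m<n} y_m + kappa[n], the form of (45)."""
    ys = []
    running = 0.0                     # running sum of y_0, ..., y_{n-1}
    for k in kappa:
        y = 1 + (q - 1) * running + k
        ys.append(y)
        running += y
    return ys


def closed_form(q, kappa, n):
    """Right-hand side of (47), written with q = exp(-sigma^2 u^2 / 2)."""
    s = sum(q ** (n - 1 - m) * kappa[m] for m in range(n))
    return q ** n + (q - 1) * s + kappa[n]


rng = random.Random(1)
# kappa[0] = 0 matches chi_0 = 1; the remaining terms are arbitrary complex numbers.
kappa = [0j] + [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(30)]
q = math.exp(-0.3)
ys = solve_recursion(q, kappa)
max_gap = max(abs(ys[n] - closed_form(q, kappa, n)) for n in range(len(kappa)))
```

With all perturbations set to zero, the recursion reduces to `ys[n] == q ** n`, which is the Gaussian characteristic function appearing as the first term of (47).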


Observe that the proof of the limit (34), and consequently of Theorem 3.7, follows from (47) if we show

\[ \max_{1 \le m \le n} \Big| \kappa\Big( m, \frac{u}{\sqrt{n}} \Big) \Big| \to 0 \quad \text{as } n \to \infty. \tag{48} \]

This relation is obtained by an inspection of the different terms of $\kappa(m, u/\sqrt{n})$. We will do this in the following six steps.

(i) Since $f^*$ is a canonical policy satisfying (5), we have $\tau_1(x_m, a_m) = 0$ for $m = 0, 1, \ldots$ in (42). Similarly, by (17), $\tau_2(x_m, a_m) = 0$ in (44).

(ii) From (22) we have that

\[ \lim_{n\to\infty} \frac{1}{\sqrt{n}} E^{f^*}_x h(x_n) = 0 \quad \text{and} \quad \lim_{n\to\infty} \frac{1}{n} E^{f^*}_x h(x_n) = 0 \]

for every $h$ in $B_W(X)$. These limits appear in (42) and (44) when we replace $u$ by $u/\sqrt{n}$.

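(iii) In this part we prove the limit (see (42))

\[ \lim_{n\to\infty} \frac{1}{\sqrt{n}} E^{f^*}_x \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m = 0. \]

From the fact that $|e_1(z)| \le z^2/2$ for all $z$ in $\mathbb{R}$, we obtain

\[
\begin{aligned}
\Big| \frac{1}{\sqrt{n}} E^{f^*}_x \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big|
&\le \frac{1}{2\sqrt{n}} E^{f^*}_x \sum_{m=0}^{n-1} \frac{u^2}{n} |h_1(x_{m+1})| (C(x_m, a_m) - \rho^*)^2 \\
&= \frac{u^2}{2n^{3/2}} E^{f^*}_x \sum_{m=0}^{n-1} \Big| \int_X h_1(y)\,Q_{f^*}(dy|x_m) \Big| (C_{f^*}(x_m) - \rho^*)^2.
\end{aligned}
\]

By Lemma 4.3(a), $h_1(\cdot)$ is $W^{1/4}$-bounded; in particular, $h_1(\cdot)$ is $W^{1/2}$-bounded. Hence the function $\int_X h_1(y)\,Q_{f^*}(dy|\cdot)$ is $W^{1/2}$-bounded. On the other hand, by Assumption 3.4, $(C_{f^*}(x) - \rho^*)^2$ is also $W^{1/2}$-bounded. Therefore

\[ \Big| \frac{1}{\sqrt{n}} E^{f^*}_x \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big| \le \frac{\lambda u^2}{2n^{3/2}} E^{f^*}_x \sum_{m=0}^{n-1} W(x_m), \]

where $\lambda$ is a constant depending on $h_1$ and $C$. By (21) we obtain

\[ \Big| \frac{1}{\sqrt{n}} E^{f^*}_x \sum_{m=0}^{n-1} e_1 h_1(x_{m+1}) \chi_m \Big| \le \frac{\lambda u^2}{2n^{3/2}}\, n\,[1 + b/(1 - \beta)]\,W(x), \]

which converges to zero as $n \to \infty$.

(iv) We shall next prove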

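\[ \lim_{n\to\infty} \frac{1}{n} E^{f^*}_x \sum_{m=0}^{n-1} e_2 \chi_m = 0. \]

This limit appears in (42) when we replace $u$ by $u/\sqrt{n}$. Observe that $|e_2(z)| \le |z|^3/6$ for all $z$ in $\mathbb{R}$. So, by Assumptions 3.1(d) and 3.4, together with (21),

\[
\begin{aligned}
\Big| \frac{1}{n} E^{f^*}_x \sum_{m=0}^{n-1} e_2 \chi_m \Big|
&\le \frac{|u|^3}{6n^{5/2}} E^{f^*}_x \sum_{m=0}^{n-1} |C_{f^*}(x_m) - \rho^*|^3 \\
&\le \frac{k^3 |u|^3}{6n^{5/2}} E^{f^*}_x \sum_{m=0}^{n-1} W(x_m)^{3/4} \\
&\le \frac{k^3 |u|^3}{6n^{5/2}} E^{f^*}_x \sum_{m=0}^{n-1} W(x_m) \\
&\le \frac{k^3 |u|^3}{6n^{3/2}} [1 + b/(1 - \beta)]\,W(x),
\end{aligned}
\]

which converges to zero as $n \to \infty$, with $k$ a constant.

(v) Let $h$ be a $W^{1/2}$-bounded function on $X$. Then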

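\[ \lim_{n\to\infty} \frac{1}{n} E^{f^*}_x \sum_{m=0}^{n-1} h(x_{m+1}) \Big[ \exp\Big\{ i \frac{u}{\sqrt{n}} (C - \rho^*) \Big\} - 1 \Big] \chi_m = 0. \]

This limit appears in (42) and (44) when $u$ is replaced by $u/\sqrt{n}$. It follows from the relation $e_1(z) = \exp\{iz\} - iz - 1$ that

\[ \exp\Big\{ i \frac{u}{\sqrt{n}} (C - \rho^*) \Big\} - 1 = i \frac{u}{\sqrt{n}} (C - \rho^*) + e_1\Big( \frac{u}{\sqrt{n}} (C - \rho^*) \Big). \]

So

\[
\begin{aligned}
\Big| \frac{1}{n} E^{f^*}_x \sum_{m=0}^{n-1} h(x_{m+1}) \Big[ \exp\Big\{ i \frac{u}{\sqrt{n}} (C - \rho^*) \Big\} - 1 \Big] \chi_m \Big|
&\le \frac{|u|}{n^{3/2}} E^{f^*}_x \sum_{m=0}^{n-1} |h(x_{m+1})|\,|C_{f^*}(x_m) - \rho^*| \\
&\quad + \frac{1}{n} E^{f^*}_x \sum_{m=0}^{n-1} |h(x_{m+1})|\,|e_1|.
\end{aligned}
\]

This gives the desired conclusion by arguments similar to those in (iii).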


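(vi) The absolute value of the expression within brackets in (46) is majorized by $\sigma_*^4 u^4/8$, so when $u$ is replaced by $u/\sqrt{n}$ it is majorized by $\sigma_*^4 u^4/(8n^2)$. Since $|E^{f^*}_x \chi_m| \le 1$, the sum multiplying this bracket has modulus at most $n$, and the corresponding term in $\kappa(n, u/\sqrt{n})$ therefore converges to zero as $n \to \infty$.

The statements (i)-(vi) imply (48) and consequently prove the theorem.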

Remark 4.4 Taking $A$ as a single-point set (singleton) we obtain the Central Limit Theorem for (noncontrolled) Markov chains.

5 An example: an LQ system

Consider the linear system

\[ x_{t+1} = k_1 x_t + k_2 a_t + z_t, \quad t = 0, 1, \ldots, \tag{49} \]

with state space $X := \mathbb{R}$ and positive coefficients $k_1, k_2$. The control set is $A := \mathbb{R}$, and the set of admissible controls in each state $x$ is the interval

\[ A(x) := [-k_1|x|/k_2,\ k_1|x|/k_2]. \tag{50} \]

The disturbances $z_t$ are i.i.d. random variables with values in $Z := \mathbb{R}$, zero mean and finite variance, that is,

\[ E(z_t) = 0, \quad \sigma^2 := E(z_t^2) < \infty. \tag{51} \]

To complete the description of our control model we introduce the quadratic cost-per-stage function

\[ C(x, a) := c_1 x^2 + c_2 a^2 \quad \forall (x, a) \in K, \tag{52} \]

with positive coefficients $c_1, c_2$. We also define

\[ W(x) := \exp[\gamma |x|] \quad \text{for all } x \in X, \tag{53} \]

with $\gamma \ge 4$. Clearly, Assumption 3.4 holds. Moreover, let $s > 0$ be such that

\[ \gamma s < \log(\gamma/2 + 1), \]

which implies

\[ \beta := \frac{2}{\gamma} (\exp[\gamma s] - 1) < 1. \tag{54} \]
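For completeness, the implication leading to (54) can be spelled out in one line; the following short derivation uses only the condition on $s$ stated above and the monotonicity of the exponential.

```latex
\gamma s < \log\!\Big(\frac{\gamma}{2} + 1\Big)
\;\Longrightarrow\;
e^{\gamma s} - 1 < \frac{\gamma}{2}
\;\Longrightarrow\;
\beta = \frac{2}{\gamma}\big(e^{\gamma s} - 1\big) < \frac{2}{\gamma}\cdot\frac{\gamma}{2} = 1 .
```

Since $s > 0$ and $\gamma > 0$, we also have $\beta > 0$, so $\beta$ is an admissible constant for Assumption 3.1(d).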

Throughout the rest of this section, we suppose the following assumptions, taken from [6, Section 5]:


Assumption 5.1 0 < k1 < 1/2.

Assumption 5.2 The i.i.d. disturbances $z_t$ have a common density $g$, which is a continuous bounded function supported on the interval $S := [-s, s]$. Moreover, there exists a positive number $\varepsilon$ such that $g(s) \ge \varepsilon$ for all $s \in S$.

Assumptions 5.1 and 5.2 imply that Assumptions 3.1 and 3.2 hold (see, for instance, [6, Propositions 6, 23 and 24]).

On the other hand, in [6] it is proved that there exists a unique canonical policy given by

\[ f^*(x) = -f_0 x \quad \forall x \in X, \tag{55} \]

satisfying (4) and (5), with

\[ f_0 := \frac{v_0 k_1 k_2}{c_2 + v_0 k_2^2}, \]

where $v_0$ is the unique positive solution to the quadratic (so-called Riccati) equation

\[ k_2^2 v_0^2 + (c_2 - c_1 k_2^2 - c_2 k_1^2) v_0 - c_1 c_2 = 0. \]

In this case, the corresponding function $h_1(\cdot)$ is given by

\[ h_1(x) = v_0 x^2 \quad \forall x \in X, \tag{56} \]

and the optimal value is

\[ \rho^* = v_0 \sigma^2, \tag{57} \]

with $\sigma$ as in (51). Thus $(\rho^*, h_1, f^*)$ is a canonical triplet for our linear quadratic Markov control model.
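The quantities $v_0$ and $f_0$ are easy to compute numerically. The sketch below solves the Riccati equation with the quadratic formula and derives the closed-loop coefficient $k_1 - k_2 f_0$ that is used later in this section. The function name `lq_canonical_policy` and the parameter values are hypothetical, chosen only so that Assumption 5.1 holds.

```python
import math


def lq_canonical_policy(k1, k2, c1, c2):
    """Return (v0, f0) for the LQ model: v0 is the positive Riccati root."""
    # Coefficients of  k2^2 v^2 + (c2 - c1 k2^2 - c2 k1^2) v - c1 c2 = 0.
    a = k2 ** 2
    b = c2 - c1 * k2 ** 2 - c2 * k1 ** 2
    c = -c1 * c2
    # a > 0 and c < 0, so the discriminant is positive and exactly one root is positive.
    v0 = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    f0 = v0 * k1 * k2 / (c2 + v0 * k2 ** 2)
    return v0, f0


# Hypothetical parameters with 0 < k1 < 1/2 (Assumption 5.1).
k1, k2, c1, c2 = 0.4, 1.0, 1.0, 1.0
v0, f0 = lq_canonical_policy(k1, k2, c1, c2)
k_closed = k1 - k2 * f0               # closed-loop coefficient under f*(x) = -f0 x
riccati_residual = k2 ** 2 * v0 ** 2 + (c2 - c1 * k2 ** 2 - c2 * k1 ** 2) * v0 - c1 * c2
```

Two quick consistency checks are that $f_0 \le k_1/k_2$, so that $f^*(x) = -f_0 x$ lies in the interval (50), and that $v_0(1 - (k_1 - k_2 f_0)^2) = c_1 + c_2 f_0^2$, an identity used at the end of this section.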

Since $f^*$ in (55) is the unique canonical policy, by Proposition 3.6 this policy also minimizes the limiting average variance. In particular, the optimal value for the variance is

\[ \sigma_*^2 = V(f^*, x) = \lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} E^{f^*}_x \Lambda_{f^*}(x_t). \tag{58} \]

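We next calculate the limit in (58) and find the value of the optimal variance. To this end, let $\tilde{k} := k_1 - k_2 f_0$, $B := \int_{\mathbb{R}} z^3 g(z)\,dz$ and $D := \int_{\mathbb{R}} z^4 g(z)\,dz$. Then by (16), (55) and (56), we have

\[ E^{f^*}_x \Lambda_{f^*}(x_t) = v_0^2 \Big[ 4\tilde{k}^2 \sigma^2 E^{f^*}_x(x_t^2) + 4\tilde{k} B\, E^{f^*}_x(x_t) + D - \sigma^4 \Big]. \tag{59} \]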
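The computation behind (59) can be written out as follows. This is a sketch of the intermediate step, using only (16), (49), (55) and (56); the symbol $z$ denotes a generic disturbance with density $g$, and $\tilde{k} = k_1 - k_2 f_0$ is the closed-loop coefficient.

```latex
\begin{aligned}
h_1(\tilde{k}x + z) &= v_0\big(\tilde{k}^2 x^2 + 2\tilde{k}xz + z^2\big), \\
\Lambda_{f^*}(x) &= \operatorname{Var}\big(h_1(\tilde{k}x + z)\big)
  = v_0^2 \operatorname{Var}\big(2\tilde{k}xz + z^2\big) \\
 &= v_0^2\Big[4\tilde{k}^2x^2\,E(z^2) + 4\tilde{k}x\,E(z^3) + E(z^4) - \big(E(z^2)\big)^2\Big] \\
 &= v_0^2\Big[4\tilde{k}^2\sigma^2x^2 + 4\tilde{k}Bx + D - \sigma^4\Big].
\end{aligned}
```

Evaluating at $x = x_t$ and taking the expectation with respect to the law of the process under $f^*$ gives the right-hand side of (59).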


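Replacing $a_t$ in (49) with $a_t := f^*(x_t) = -f_0 x_t$, we obtain

\[ x_t = (k_1 - k_2 f_0) x_{t-1} + z_{t-1} = \tilde{k} x_{t-1} + z_{t-1} \quad \forall t = 1, 2, \ldots. \]

By (50) and Assumption 5.1, we can check that $|\tilde{k}| < 1$. By an induction procedure, for all $t = 1, 2, \ldots$,

\[ x_t = \tilde{k}^t x_0 + \sum_{j=0}^{t-1} \tilde{k}^j z_{t-1-j}. \]

From this relation, we obtain

\[ E^{f^*}_x(x_t) = \tilde{k}^t x, \tag{60} \]

and

\[ E^{f^*}_x(x_t^2) = \tilde{k}^{2t} x^2 + \sigma^2 (1 - \tilde{k}^{2t})/(1 - \tilde{k}^2). \tag{61} \]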
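As a brief check of the second-moment formula, one can also argue through a one-step recursion. The sketch below writes $\tilde{k} = k_1 - k_2 f_0$ for the closed-loop coefficient, introduces the shorthand $m_t$ for the second moment of $x_t$ (a symbol used only here), and relies on the independence of the disturbance $z_{t-1}$ from $x_{t-1}$ together with (51).

```latex
\begin{aligned}
m_t := E^{f^*}_x(x_t^2)
  &= \tilde{k}^2\,E^{f^*}_x(x_{t-1}^2) + 2\tilde{k}\,E^{f^*}_x(x_{t-1})\,E(z_{t-1}) + E(z_{t-1}^2)
   = \tilde{k}^2 m_{t-1} + \sigma^2, \qquad m_0 = x^2, \\
m_t &= \tilde{k}^{2t}x^2 + \sigma^2\sum_{j=0}^{t-1}\tilde{k}^{2j}
     = \tilde{k}^{2t}x^2 + \sigma^2\,\frac{1 - \tilde{k}^{2t}}{1 - \tilde{k}^2}.
\end{aligned}
```

Letting $t \to \infty$ and using $|\tilde{k}| < 1$ gives the stationary second moment $\sigma^2/(1 - \tilde{k}^2)$.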

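The relations (60) and (61) imply the limits

\[ \lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} E^{f^*}_x(x_t) = 0 \quad \text{and} \quad \lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} E^{f^*}_x(x_t^2) = \sigma^2/(1 - \tilde{k}^2). \tag{62} \]

Hence, by (59) and (62) we obtain

\[ \sigma_*^2 = \lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} E^{f^*}_x \Lambda_{f^*}(x_t) = v_0^2 \Big[ \frac{5\tilde{k}^2 - 1}{1 - \tilde{k}^2} \sigma^4 + \int_{\mathbb{R}} z^4 g(z)\,dz \Big] \ge 0. \tag{63} \]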

Finally, by Theorem 3.7 and considering (57), we obtain that for every initial state $x \in X$, as $n \to \infty$, the normalized cost

\[ \frac{\sum_{t=0}^{n-1} C_{f^*}(x_t) - n v_0 \sigma^2}{\sqrt{n}} \]

has an asymptotic normal distribution $N(0, \sigma_*^2)$ with $\sigma_*^2$ as in (63). By (5), we obtain $v_0(1 - \tilde{k}^2) = c_1 + c_2 f_0^2$. Hence, $C_{f^*}(x) = (c_1 + c_2 f_0^2) x^2 = v_0 (1 - \tilde{k}^2) x^2$ for all $x$. This implies that for every initial state $x$, as $n \to \infty$,

\[ \frac{\sum_{t=0}^{n-1} x_t^2 - n\sigma^2/(1 - \tilde{k}^2)}{\sqrt{n}} \]

has an asymptotic normal distribution $N(0, s^2)$, where

\[ s^2 = \Big[ \frac{5\tilde{k}^2 - 1}{1 - \tilde{k}^2} \sigma^4 + \int_{\mathbb{R}} z^4 g(z)\,dz \Big] \Big/ (1 - \tilde{k}^2)^2. \]
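The limiting variance $s^2$ can be compared with a Monte Carlo estimate. The sketch below simulates the closed-loop recursion $x_{t+1} = \tilde{k} x_t + z_t$, where $\tilde{k}$ denotes the closed-loop coefficient $k_1 - k_2 f_0$, and computes the sample variance of the normalized sums of $x_t^2$. The choices made here are assumptions for illustration only: the disturbances are taken uniform on $[-s, s]$ (so $\sigma^2 = s^2/3$ and the fourth moment is $s^4/5$), and the values $\tilde{k} = 0.2$ and $s = 0.25$, the horizon and the number of replications are arbitrary.

```python
import math
import random


def theoretical_s2(k, s):
    """Limiting variance s^2 from the text, for Uniform[-s, s] disturbances."""
    sigma2 = s * s / 3.0              # second moment of Uniform[-s, s]
    fourth = s ** 4 / 5.0             # fourth moment of Uniform[-s, s]
    bracket = (5 * k * k - 1) / (1 - k * k) * sigma2 ** 2 + fourth
    return bracket / (1 - k * k) ** 2


def normalized_sum(k, s, n, x0, rng):
    """(sum_{t<n} x_t^2 - n sigma^2 / (1 - k^2)) / sqrt(n) along one sample path."""
    sigma2 = s * s / 3.0
    x, total = x0, 0.0
    for _ in range(n):
        total += x * x
        x = k * x + rng.uniform(-s, s)
    return (total - n * sigma2 / (1 - k * k)) / math.sqrt(n)


def empirical_variance(k, s, n, reps, x0=0.0, seed=0):
    """Sample variance of the normalized sum over independent replications."""
    rng = random.Random(seed)
    samples = [normalized_sum(k, s, n, x0, rng) for _ in range(reps)]
    mean = sum(samples) / reps
    return sum((v - mean) ** 2 for v in samples) / (reps - 1)


k_tilde, s = 0.2, 0.25                # hypothetical closed-loop coefficient and noise bound
target = theoretical_s2(k_tilde, s)
estimate = empirical_variance(k_tilde, s, n=500, reps=800)
```

With a few hundred replications the estimate fluctuates by several percent around the theoretical value, so only rough agreement should be expected; increasing `reps` and `n` tightens the comparison.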

Acknowledgement

The author wishes to thank Professor Onesimo Hernandez-Lerma for his valuable comments and suggestions.

Armando F. Mendoza-Perez
Universidad Politecnica de Chiapas,
Calle Eduardo J. Selvas S/N,
Tuxtla Gutierrez,
[email protected]

References

[1] Gordienko E. and Hernandez-Lerma O., Average cost Markov control processes with weighted norms: existence of canonical policies, Appl. Math. (Warsaw), 23 (1995), 199-218.

[2] Hernandez-Lerma O. and Lasserre J.B., Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer-Verlag, New York, (1996).

[3] Hernandez-Lerma O. and Lasserre J.B., Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, (1999).

[4] Hernandez-Lerma O. and Vega-Amaya O., Infinite-horizon Markov control processes with undiscounted cost criteria: From average to overtaking optimality, Appl. Math. (Warsaw), 25 (1998), 153-178.

[5] Hernandez-Lerma O., Vega-Amaya O. and Carrasco G., Sample-path optimality and variance-minimization of average cost Markov control processes, SIAM J. Control Optim., 38(1) (1999), 79-93.

[6] Hilgert N. and Hernandez-Lerma O., Bias optimality versus strong 0-discount optimality in Markov control processes with unbounded costs, Acta Appl. Math., 77 (2003), 215-235.

[7] Mandl P., On the variance in controlled Markov chains, Kybernetika (Prague), 7 (1971), 1-12.

[8] Mandl P., On the asymptotic normality of the reward in a controlled Markov chain, Colloquia Mathematica Societatis Janos Bolyai, 9. European Meeting of Statisticians, Budapest (Hungary), (1972).

[9] Mandl P., A connection between controlled Markov chains and martingales, Kybernetika (Prague), 9 (1973), 237-241.

[10] Mandl P., Estimation and control in Markov chains, Adv. Appl. Probab., 6 (1974), 40-60.


[11] Prieto-Rumeau T. and Hernandez-Lerma O., Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains, to appear in Math. Meth. Oper. Res.

[12] Puterman M.L., Markov Decision Processes, Wiley, New York, (1994).

[13] Vega-Amaya O., Markov control processes in Borel spaces: Undiscounted criteria, Doctoral thesis, UAM-Iztapalapa, Mexico, 1998 (in Spanish).

[14] Yushkevich A.A., On a class of strategies in general Markov decision models, Theory Probab. Appl., 18 (1973), 777-779.

[15] Zhu Q.X. and Guo X.P., Markov decision processes with variance minimization: A new condition and approach, Stoch. Anal. Appl., 25 (2007), 577-592.


Morfismos, Vol. 12, No. 2, 2008

Errata

By an involuntary error, formula (14) was omitted at the bottom of page 11 in the December 2005 printed issue of Morfismos (Vol. 9, No. 2). The last two lines on that page should have been:

... B = 1.10555. He used this to show that, for x large,

\[ 0.89\,\frac{x}{\log x} < \pi(x) < 1.11\,\frac{x}{\log x} \tag{14} \]


Morfismos, Comunicaciones Estudiantiles del Departamento de Matematicas del CINVESTAV, was printed in March 2009 at the department's own reproduction workshop, located at Av. IPN 2508, Col. San Pedro Zacatenco, Mexico, D.F. 07300. The print run, on imported 36-kilogram opaline paper of 34 × 25.5 cm, consists of 500 copies with a green tintoreto cover.

Technical support: Omar Hernandez Orozco.


Contents

The vanishing discount approach to average reward optimality: the strongly and the weakly continuous cases
Tomas Prieto-Rumeau and Onesimo Hernandez-Lerma, p. 1

Vertices simpliciales y escalonabilidad de grafos
Roberto Cruz y Mario Estrada, p. 17

Asymptotic normality of average cost Markov control processes
Armando F. Mendoza-Perez, p. 33

Errata, p. 53