
J S I A M

The Japan Society for Industrial and Applied Mathematics

Vol.3 (2011) pp.1-100


Editorial Board

Chief Editor Yoshimasa Nakamura (Kyoto University)

Vice-Chief Editor Kazuo Kishimoto (University of Tsukuba)

Associate Editors
Reiji Suda (University of Tokyo), Satoshi Tsujimoto (Kyoto University), Masashi Iwasaki (Kyoto Prefectural University), Norikazu Saito (University of Tokyo), Koh-ichi Nagao (Kanto Gakuin University), Koichi Kato (Japan Institute for Pacific Studies), Atsushi Nagai (Nihon University), Takeshi Mandai (Osaka Electro-Communication University), Ryuichi Ashino (Osaka Kyoiku University), Tamotu Kinoshita (University of Tsukuba), Yuzuru Sato (Hokkaido University), Ken Umeno (NICT), Katsuhiro Nishinari (University of Tokyo), Tetsu Yajima (Utsunomiya University), Narimasa Sasa (Japan Atomic Energy Agency), Fumiko Sugiyama (Kyoto University), Hiroko Kitaoka (JSOL), Hitoshi Imai (University of Tokushima), Nobito Yamamoto (University of Electro-Communications), Daisuke Furihata (Osaka University), Takahiro Katagiri (The University of Tokyo), Tetsuya Sakurai (University of Tsukuba), Takayasu Matsuo (The University of Tokyo), Tomohiro Sogabe (Aichi Prefectural University), Yoshitaka Watanabe (Kyushu University), Katsuhisa Ozaki (Shibaura Institute of Technology), Kenta Kobayashi (Kanazawa University), Takaaki Nara (The University of Electro-Communications), Takashi Suzuki (Osaka University), Tetsuo Ichimori (Osaka Institute of Technology), Tatsuo Oyama (National Graduate Institute for Policy Studies), Hideyuki Azegami (Nagoya University), Kenji Shirota (Aichi Prefectural University), Eiji Katamine (Gifu National College of Technology), Masami Hagiya (University of Tokyo), Toru Fujiwara (Osaka University), Yasuyuki Tsukada (NTT Communication Science Laboratories), Naoyuki Ishimura (Hitotsubashi University), Jiro Akahori (Ritsumeikan University), Kiyomasa Narita (Kanagawa University), Ken Nakamula (Tokyo Metropolitan University), Miho Aoki (Shimane University), Kazuto Matsuo (Institute of Information Security)


Keiko Imai (Chuo University), Ichiro Kataoka (HITACHI), Naoshi Nishimura (Kyoto University), Hiromichi Itou (Gunma University), Shin-Ichi Nakano (Gunma University), Akiyoshi Shioura (Tohoku University)


Contents

Regular solution to topology optimization problems of continua ・・・ 1-4 Hideyuki Azegami, Satoshi Kaizu and Kenzen Takeuchi

A convergence improvement of the BSAIC preconditioner by deflation ・・・ 5-8 Ikuro Yamazaki, Hiroto Tadano, Tetsuya Sakurai and Keita Teranishi

Cache optimization of a non-orthogonal joint diagonalization method ・・・ 9-12 Yusuke Hirota, Yusaku Yamamoto and Shao-Liang Zhang

Quasi-minimal residual smoothing technique for the IDR(s) method ・・・ 13-16 Lei Du, Tomohiro Sogabe and Shao-Liang Zhang

A new approach to find a saddle point efficiently based on the Davidson method ・・・ 17-20 Akitaka Sawamura

On rounding off quotas to the nearest integers in the problem of apportionment ・・・ 21-24 Tetsuo Ichimori

Traveling wave solutions to the nonlinear evolution equation for the risk preference ・・・ 25-28 Naoyuki Ishimura and Sakkakom Maneenop

Approximation algorithms for a winner determination problem of single-item multi-unit auctions ・・・ 29-32 Satoshi Takahashi and Maiko Shigeno

On the new family of wavelets interpolating to the Shannon wavelet ・・・ 33-36 Naohiro Fukuda and Tamotu Kinoshita

Conservative finite difference schemes for the modified Camassa-Holm equation ・・・ 37-40 Yuto Miyatake, Takayasu Matsuo and Daisuke Furihata

A multi-symplectic integration of the Ostrovsky equation ・・・ 41-44 Yuto Miyatake, Takaharu Yaguchi and Takayasu Matsuo

Solutions of Sakaki-Kakei equations of type 1, 2, 7 and 12 ・・・ 45-48 Koichi Kondo

Analysis of credit event impact with self-exciting intensity model ・・・ 49-52 Suguru Yamanaka, Masaaki Sugihara and Hidetoshi Nakagawa

On the reduction attack against the algebraic surface public-key cryptosystem (ASC04) ・・・ 53-56 Satoshi Harada, Yuichi Wada, Shigenori Uchiyama and Hiro-o Tokunaga

Deterministic volatility models and dynamics of option returns ・・・ 57-60 Takahiro Yamamoto and Koichi Miyazaki

Stochastic estimation method of eigenvalue density for nonlinear eigenvalue problem on the complex plane ・・・ 61-64 Yasuyuki Maeda, Yasunori Futamura and Tetsuya Sakurai

Computation of multipole moments from incomplete boundary data for magnetoencephalography inverse problem ・・・ 65-68 Hiroyuki Aoshika, Takaaki Nara, Kaoru Amano and Tsunehiro Takeda

An alternative implementation of the IDRstab method saving vector updates ・・・ 69-72 Kensuke Aihara, Kuniyoshi Abe and Emiko Ishiwata


Error analysis of H1 gradient method for topology optimization problems of continua ・・・ 73-76 Daisuke Murai and Hideyuki Azegami

Evolution of bivariate copulas in discrete processes ・・・ 77-80 Yasukazu Yoshizawa and Naoyuki Ishimura

On boundedness of the condition number of the coefficient matrices appearing in Sinc-Nyström methods for Fredholm integral equations of the second kind ・・・ 81-84 Tomoaki Okayama, Takayasu Matsuo and Masaaki Sugihara

A modified Calogero-Bogoyavlenskii-Schiff equation with variable coefficients and its non-isospectral Lax pair ・・・ 85-88 Tadashi Kobayashi and Kouichi Toda

A parallel algorithm for incremental orthogonalization based on the compact WY representation ・・・ 89-92 Yusaku Yamamoto and Yusuke Hirota

Analysis of downgrade risk in credit portfolios with self-exciting intensity model ・・・ 93-96 Suguru Yamanaka, Masaaki Sugihara and Hidetoshi Nakagawa

Automatic verification of anonymity of protocols ・・・ 97-100 Hideki Sakurada


JSIAM Letters Vol.3 (2011) pp.1–4 ©2011 Japan Society for Industrial and Applied Mathematics

Regular solution to topology optimization problems of continua

Hideyuki Azegami1, Satoshi Kaizu2 and Kenzen Takeuchi3

1 Graduate School of Information Science, Nagoya University, A4-2 (780) Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan

2 College of Science and Technology, Nihon University, 7-24-1 Narashinodai, Funabashi, Chiba 274-8501, Japan

3 Quint Corporation, 1-14-1 Fuchu-cho, Fuchu, Tokyo 183-0055, Japan

E-mail azegami@is.nagoya-u.ac.jp

Received September 30, 2010, Accepted November 1, 2010

Abstract

The present paper describes a numerical solution to topology optimization problems of domains in which boundary value problems of partial differential equations are defined. Density raised to a power is used instead of the characteristic function of the domain. A design variable is set by a function on a fixed domain which is converted to the density by a sigmoidal function. Derivatives of cost functions with respect to the design variable are evaluated as stationary conditions of the Lagrangians. A numerical solution is constructed by a gradient method in a design space for the design variable.

Keywords calculus of variations, boundary value problem, topology optimization, density method, H1 gradient method

Research Activity Group Mathematical Design

1. Introduction

A problem of finding the optimum layout of holes in a domain in which a boundary value problem is defined is called the topology optimization problem of continua [1]. In the present paper, the Poisson problem is considered as the boundary value problem for simplicity.

One of the most natural expressions of a topology optimization problem uses the characteristic function of the domain as a design variable. Let D be a fixed domain in Rd, d ∈ {2, 3}, let ΓD ⊂ ∂D be a fixed subboundary, define ΓN = ∂D \ ΓD, and let f, p and uD be fixed functions on D. Denoting the characteristic function of Ω ⊆ D by χΩ ∈ X = {χ ∈ L∞(D;R) | 0 ≤ χ ≤ 1 a.e. in D}, the normal by ν, and ∂ν = ν · ∇, we can write the topology optimization problem as follows.

Problem 1 (Topology optimization problem) For each χΩ ∈ X, let u ∈ H1(D;R) satisfy

−∇ · (χΩ∇u) = f in D,

χΩ∂νu = p on ΓN, u = uD on ΓD.

Find χΩ such that

min_{χΩ∈X} {J0(χΩ, u) | J(χΩ, u) ≤ 0},

where J0 and J = (J1, . . . , Jm)⊤, Jl ∈ C0(X × H1(D;R); R), are cost functions.

However, it has been shown that Problem 1 does not always have a solution [2].

To avoid the non-existence of a solution, the idea of assuming that D consists of a micro-structure having rectangular holes was presented [3]. In this formulation, χΩ is substituted by a function evaluated by homogenization theory. A numerical scheme was demonstrated using the finite element method [4].

Moreover, it has been found that introducing a density ϕ : D → [0, 1] and a constant α > 1, and replacing χΩ by ϕα, obtains a similar result to that from the micro-structure model. This method is called the SIMP (solid isotropic material with penalization) method [1, 5]. The meaning of the penalization is that the intermediate density is weakened by the nonlinear function ϕα.

However, numerical instabilities such as checkerboard patterns or mesh-dependencies are observed if the parameters of the micro-structure or the density are constructed by a constant function in each finite element and varied using a gradient method [6, 7]. If the design parameters are approximated by continuous functions [8], a numerical instability known as the island phenomenon is observed [9]. In addition, although many numerical schemes have been proposed to overcome such numerical instabilities [10, 11], regularity in the sense of functional analysis has not been shown.

In the present paper, a regular solution which is free of numerical instability is presented, where the meaning of regular is as follows. First, the admissible set of the design variable is defined. Then, a solution is regular if any point obtained by the solution from a point in the admissible set also belongs to the admissible set.

2. Admissible set of design variable

To define a boundary value problem, a Lipschitz boundary is required for the domain. Accordingly, to determine a boundary from a level set of the density ϕ, ϕ has


to be an element of W1,∞(D;R), where D also has a Lipschitz boundary. To avoid restricting the range of ϕ to [0, 1] directly, we introduce a function θ belonging to

S = {θ ∈ H1(D;R) | θ ∈ W1,∞(D;R), ∥θ∥1,∞ ≤ M}

as a design variable and relate it to the density ϕ by a sigmoidal function, for which

ϕ(θ) = (1/π) tan−1 θ + 1/2    (1)

is used in the present paper. Because M is initially fixed, the set S is weakly compact in H1(D;R). If ∥θ∥1,∞ = ∥θ∥W1,∞(D;R) ≤ M becomes active, let this condition be included among the constraints. In the present paper, let M be sufficiently large for simplicity.

To avoid loss of regularity on ∂ΓD and on a set Υ ⊂ ∂D on which u ∉ Hk+2(D;R) and vl ∉ H3−k(D;R), l ∈ {0, 1, . . . , m}, k ∈ {0, 1}, in Problems 2 and 6 respectively, we provide a fixed neighborhood Ur = {x ∈ D | |x − y| < r, y ∈ ∂ΓD ∪ Υ} for a small positive constant r, and set Dr = D \ Ur.

We call S the admissible set of the design variable. We call H1(D;R) the design space with respect to S because a Hilbert space is required for the gradient method.
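As a minimal numerical sketch (not code from the paper; the function names are hypothetical), the sigmoidal map (1) and its derivative can be written as:

```python
import numpy as np

def phi(theta):
    """Sigmoidal map (1): phi(theta) = (1/pi) * arctan(theta) + 1/2."""
    return np.arctan(theta) / np.pi + 0.5

def phi_theta(theta):
    """Its derivative: d(phi)/d(theta) = 1 / (pi * (1 + theta**2))."""
    return 1.0 / (np.pi * (1.0 + theta**2))

theta = np.linspace(-5.0, 5.0, 11)
rho = phi(theta)
# the density stays strictly inside (0, 1) for any unconstrained theta
assert np.all((rho > 0.0) & (rho < 1.0))
```

Because ϕ maps all of R into (0, 1), the bound constraint on the density never has to be imposed explicitly on the design variable θ.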

3. SIMP problem

Let us consider a topology optimization problem of SIMP type by using θ ∈ S. First, we define a boundary value problem as follows.

Problem 2 (Poisson problem) For some k ∈ {0, 1}, let f ∈ Hk(D;R), p ∈ Hk+1/2(ΓN;R) and uD ∈ Hk+2(D;R) be fixed functions, and let ϕ(θ) be as in (1). Find u ∈ H1(D;R) such that

−∇ · (ϕα(θ)∇u) = f in D,

ϕα(θ)∂νu = p on ΓN, u = uD on ΓD.

From the assumptions for Problem 2, u|Dr belongs to Hk+2(Dr;R). Moreover, Problem 2 gives the Lagrangian

LBV(θ, v, w) = ∫D ϕα(θ)∇v · ∇w dx − ∫D fw dx − ∫ΓN wp dγ − ∫ΓD (v − uD)ϕα(θ)∂νw dγ − ∫ΓD wϕα(θ)∂νv dγ    (2)

for all v, w ∈ H1(D;R) [12]. If u is a stationary point such that

LBV(θ, u, w) = 0

for all w ∈ H1(D;R), then u is the solution to Problem 2.

Using θ and u, we define cost functions with the following notation: (·)θ = ∂(·)/∂θ and (·)u = ∂(·)/∂u.

Definition 3 (Cost functions) For (θ, u) ∈ S × H1(D;R) = Y and S × H1(Dr;R) = Yr, let gl ∈ C1(Y; L1(D;R)) and jl ∈ C1(Y; L1(∂D;R)), l ∈ {0, 1, . . . , m}, be given functions such that glθ ∈ C0(Yr; H1(Dr;R)), glu ∈ C0(Y; H1−k(D;R)) with k ∈ {0, 1} as used in Problem 2, jlθ ∈ C0(Yr; H3/2(∂Dr;R)) and jlu ∈ C0(Y; H3/2−k(∂D;R)).

We call J0 and J = (J1, . . . , Jm)⊤,

Jl(θ, u) = ∫D gl(θ, u) dx + ∫∂D jl(θ, u) dγ + cl,

the cost functions, where J0 is the objective function and J are the constraint functions.

We assume that the constants cl, l ∈ {0, 1, . . . , m}, are set such that some θ ∈ S satisfies J ≤ 0.

Based on the definitions above, we consider a SIMP problem as follows.

Problem 4 (SIMP problem) Let u be the solution to Problem 2 for θ ∈ S. Find θ such that

min_{θ∈S} {J0(θ, u) | J(θ, u) ≤ 0}.

4. θ derivatives of Jl

To solve Problem 4 by a gradient method, the Fréchet derivatives of Jl with respect to θ are required. Let ρ ∈ H1(D;R) be a variation of θ and denote

θρ = θ + ρ

as the updated function of θ. Also, let uρ be the solution to Problem 2 for θρ.

Definition 5 (θ derivative of Jl) For Jl(θ, u(θ)) : H1(D;R) ⊃ S ∋ θ ↦ Jl ∈ R, if Jl′(θ, u)[ρ] such that

Jl(θρ, uρ) = Jl(θ, u) + Jl′(θ, u)[ρ] + o(∥ρ∥1,2)

is a bounded linear functional for all ρ ∈ H1(D;R), we call Jl′(θ, u) ∈ H1′(D;R) the θ derivative of Jl at θ and, denoting Jl′(θ, u)[ρ] = ⟨Gl(θ, u), ρ⟩ with the notation of the dual product, Gl(θ, u) ∈ H1′(D;R) the θ gradient.

Let us evaluate Gl(θ, u). The Lagrangian for Jl(θ, u) subject to Problem 2 is defined by

Ll(θ, u, vl) = ∫D gl(θ, u) dx + ∫∂D jl(θ, u) dγ + cl − LBV(θ, u, vl),

where vl ∈ H1(D;R) is used as the Lagrange multiplier for Problem 2 and LBV(·, ·, ·) is as in (2).

If u is the solution to Problem 2, the stationary condition Llvl(θ, u, vl)[w] = LBV(θ, u, w) = 0 for all w ∈ H1(D;R) is satisfied.

The stationary condition

Llu(θ, u, vl)[w] = ⟨Llu(θ, u, vl), w⟩ = ∫D glu w dx + ∫∂D jlu w dγ − ∫D ϕα(θ)∇w · ∇vl dx + ∫ΓD ϕα(θ)w∂νvl dγ + ∫ΓD ϕα(θ)vl∂νw dγ = 0

for all w ∈ H1(D;R) is satisfied if vl ∈ H1(D;R) is the solution of the following adjoint problem.

Problem 6 (Adjoint problem for Jl) For the solution u to Problem 2 at θ ∈ S, find vl ∈ H1(D;R) such that

−∇ · (ϕα(θ)∇vl) = glu(θ, u) in D,

ϕα(θ)∂νvl = jlu(θ, u) on ΓN, vl = 0 on ΓD.

Since glu ∈ H1−k(D;R) and jlu ∈ H3/2−k(∂D;R), k ∈ {0, 1}, as in Problem 2, we have vl|Dr ∈ H3−k(Dr;R).

If u and vl are the solutions of Problems 2 and 6, respectively, for θ ∈ S, the θ derivative of Ll with respect to ρ ∈ H1(D;R) is given by

Ll′(θ, u, vl)[ρ] = Llθ(θ, u, vl)[ρ] = ⟨Gl, ρ⟩ = ∫D (Glg + Gla)ρ dx + ∫∂D Glj ρ dγ    (3)

and agrees with Jl′(θ, u)[ρ], where

Glg(θ, u) = glθ, Glj(θ, u) = jlθ, Gla(θ, u, vl) = −αϕα−1ϕθ∇u · ∇vl.

Therefore, we have the following result.

Theorem 7 (θ derivative of Jl) For the solutions u and vl of Problems 2 and 6, respectively, for θ ∈ S,

Jl′(θ, u)[ρ] = ⟨Gl, ρ⟩

holds for all ρ ∈ H1(D;R), where Gl|Dr, Glg|Dr, Gla|Dr and Glj|∂Dr of (3) belong to H1′(Dr;R), H1(Dr;R), H1(Dr;R) and H3/2(∂Dr;R), respectively.

5. H1 gradient method

Since Gl|Dr belongs to the dual space H1′(Dr;R) of H1(Dr;R), ⟨Gl, ρ⟩ is well defined in Dr. However, θϵGl = θ + ϵGl for a small ϵ > 0 does not belong to the admissible set S. This is considered to be the cause of the numerical instabilities discussed in the Introduction.

To avoid this irregularity, we propose using an H1 gradient method, which is an application of the traction method [13–15] to the SIMP problem, to determine a variation ρlG ∈ H1(D;R) from θ ∈ S with Gl, an extension of Gl|Dr to H1(D;R).

Problem 8 (H1 gradient method) Let a : H1(D;R) × H1(D;R) → R be a coercive bilinear form, i.e., there exists β > 0 such that

a(y, y) ≥ β∥y∥²1,2

for all y ∈ H1(D;R). For Gl as in (3), find ρlG ∈ H1(D;R) such that

a(ρlG, y) = −⟨Gl, y⟩

for all y ∈ H1(D;R).

By the Lax-Milgram theorem, there exists a unique solution ρlG to Problem 8. From Theorem 7, it is guaranteed that ρlG|Dr belongs to H3(Dr;R) ⊂ W1,∞(Dr;R) and that an extension of ρlG|Dr belongs to W1,∞(D;R). Moreover, since

Jl(θϵρlG, uϵρlG) − Jl(θ, u) = ⟨Gl, ϵρlG⟩ + o(ϵ∥ρlG∥1,2) = −ϵa(ρlG, ρlG) + o(ϵ∥ρlG∥1,2) ≤ −ϵβ∥ρlG∥²1,2 + o(ϵ∥ρlG∥1,2) < 0

for a sufficiently small positive number ϵ, ρlG is a regular vector in a descent direction of Jl.

In the present paper, we use

a(y, z) = ∫D (∇y · ∇z + cyz) dx    (4)

as the coercive bilinear form in Problem 8, where c is a positive constant.
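To illustrate why the H1 gradient method regularizes, note that with the bilinear form (4) the weak problem a(ρ, y) = −⟨G, y⟩ amounts to solving −Δρ + cρ = −G. A one-dimensional finite-difference sketch (hypothetical, not the paper's finite element code):

```python
import numpy as np

def h1_gradient_step(G, h, c):
    """Solve (-rho'' + c*rho) = -G on a uniform 1D grid with
    homogeneous Neumann ends: a discrete version of
    a(rho, y) = -<G, y> with the bilinear form (4).
    Returns a smoothed descent direction rho."""
    n = len(G)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2.0 / h**2 + c
        if i > 0:
            A[i, i - 1] = -1.0 / h**2
        if i < n - 1:
            A[i, i + 1] = -1.0 / h**2
    # Neumann boundary rows (mirrored ghost nodes)
    A[0, 0] = 1.0 / h**2 + c
    A[-1, -1] = 1.0 / h**2 + c
    return np.linalg.solve(A, -np.asarray(G, dtype=float))

x = np.linspace(0.0, 1.0, 101)
G = np.sign(np.sin(40.0 * np.pi * x))        # checkerboard-like raw gradient
rho = h1_gradient_step(G, h=x[1] - x[0], c=1.0)
# rho is smooth, yet <G, rho> = -G^T A^{-1} G < 0: a descent direction
```

Since the discrete operator is symmetric positive definite, ⟨G, ρ⟩ = −GᵀA⁻¹G < 0 whenever G ≠ 0, mirroring the descent estimate above, while the oscillations in G are damped out of ρ.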

6. Solution to SIMP problem

Let us consider a solution to Problem 4 by using a sequential quadratic approximation problem.

Problem 9 (SQ approximation) Let G0 and G = (G1, . . . , Gm)⊤ be the θ derivatives of J0 and J, respectively, for a θ ∈ S, let a(·, ·) be given as in (4), and let ϵ be a small positive constant. Find ϵρ such that

min_{ρ∈B} {Q(ϵρ) | J(θ, u) + ⟨G, ϵρ⟩ ≤ 0},

where B = {ρ ∈ H1(D;R) | ∥ρ∥1,2 = 1} and

Q(ϵρ) = (1/(2ϵ)) a(ϵρ, ϵρ) + ⟨G0, ϵρ⟩.

The Lagrangian of Problem 9 is defined as

LSQ(ϵρ, λ) = Q(ϵρ) + λ · (J(θ, u) + ⟨G, ϵρ⟩),

where λ = (λ1, . . . , λm)⊤ ∈ Rm are the Lagrange multipliers for the constraints. The Karush-Kuhn-Tucker conditions for Problem 9 are given as

(1/ϵ) a(ϵρ, y) + ⟨G0 + λ · G, y⟩ = 0,    (5)

J(θ, u) + ⟨G, ϵρ⟩ ≤ 0,    (6)

diag(λ)(J(θ, u) + ⟨G, ϵρ⟩) = 0,    (7)

λ ≥ 0,    (8)

for all y ∈ H1(D;R).

Here, let ρ0G and ρG = (ρ1G, . . . , ρmG)⊤ be the solutions to Problem 8 using a(·, ·)/ϵ instead of a(·, ·), let ρ̄G = ρ0G + λ · ρG, and set

ρ = ρ̄G/∥ρ̄G∥1,2.    (9)

Then, it is confirmed that ϵρ = ρ̄G satisfies (5). If all the constraints in (6) are active, we have

⟨G, ϵρ⊤G⟩λ = −J(θ, u) − ⟨G, ϵρ0G⟩.    (10)

If G1, . . . , Gm are linearly independent, (10) has a unique solution λ. Using this λ, if there are inactive constraints l such that Jl(θ, u) + ⟨Gl, ϵρ⟩ < 0 or λl < 0, we remove those constraints from (10), put λl = 0, and re-solve (10). Then we can obtain λ which satisfies (6) to (8). Since Problem 9 is a convex problem, this λ yields the unique solution to Problem 9.


To ensure global convergence, we use the following criteria for ϵ in Problem 9. Let L(θ, u, λ) = J0(θ, u) + λ · J(θ, u) be the Lagrangian for Problem 4, and let λϵρ be the λ for (θϵρ, uϵρ) that satisfies the Karush-Kuhn-Tucker conditions. For a constant ξ ∈ (0, 1), the Armijo criterion [16] gives the upper limit of ϵ as

L(θϵρ, uϵρ, λϵρ) − L(θ, u, λ) ≤ ξ⟨G0(u, v0) + λ · G(u, v), ϵρ⟩.    (11)

For a constant µ such that 0 < ξ < µ < 1, the Wolfe criterion [17] gives the lower limit of ϵ as

µ⟨G0(u, v0) + λ · G(u, v), ϵρ⟩ ≤ ⟨G0(uϵρ, v0ϵρ) + λϵρ · G(uϵρ, vϵρ), ϵρ⟩.    (12)
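A generic backtracking sketch of how an Armijo-type upper bound such as (11) is enforced in practice (illustrative only; the function and the toy objective are hypothetical, and the paper additionally enlarges ϵ when the Wolfe bound (12) fails):

```python
import numpy as np

def armijo_step(L, dL, x, d, eps0=1.0, xi=0.3, shrink=0.5, max_iter=30):
    """Backtrack on the Armijo condition:
    L(x + eps*d) - L(x) <= xi * eps * <dL(x), d>,
    where d is a descent direction (slope < 0)."""
    slope = float(dL(x) @ d)          # negative for a descent direction
    eps = eps0
    for _ in range(max_iter):
        if L(x + eps * d) - L(x) <= xi * eps * slope:
            break
        eps *= shrink                  # condition violated: shrink the step
    return eps

# toy objective |x|^2 with steepest-descent direction
L_obj = lambda x: float(x @ x)
dL_obj = lambda x: 2.0 * x
x0 = np.array([1.0, -2.0])
eps = armijo_step(L_obj, dL_obj, x0, d=-dL_obj(x0))
```

The returned ϵ is the largest tested step for which the sufficient-decrease inequality holds, which is exactly the role (11) plays in step (vi) of the algorithm below.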

We propose a numerical solution as follows, where J(θ0, u0) ≤ 0 is assumed to be satisfied for θ0.

(i) Set θ0 ∈ S, ϵ > 0, ξ and µ such that 0 < ξ < µ < 1, ϵ0 > 0 and k = 0.

(ii) Compute J0, J, G0 and G at θ0.

(iii) Solve Problem 8 for ρ0G and ρG at the k-th step.

(iv) Solve λ in

⟨G, ρ⊤G⟩λ = −⟨G, ρ0G⟩.    (13)

• If (8) is satisfied, proceed to the next step.
• Otherwise, remove the constraints such that λl < 0, put λl = 0 and re-solve (13) until (8) is satisfied.

(v) Using ρ defined by (9), compute J0 and J at θϵρ.

• Put λl = 0 for the inactive constraints such that Jl(θϵρ, uϵρ) < 0.
• If J(θϵρ, uϵρ) ≤ 0, proceed to the next step.
• Otherwise, set λ0 = λ and i = 0, and solve δλ in

⟨G, ϵρ⊤⟩δλ = −J(θϵρ(λi), uϵρ(λi))    (14)

for the active constraints such that Jl(θϵρ, uϵρ) ≥ 0, replace λi+1 = λi + δλ, replace i with i + 1, and re-solve (14) until J(θϵρ, uϵρ) ≤ 0 is satisfied.

(vi) Compute G0 and G at θϵρ.

• If (11) and (12) hold, proceed to the next step.
• If (11) or (12) does not hold, update ϵ with a smaller or larger value, respectively, and return to (v).

(vii) Let θk+1 = θϵρ, λk+1 = λ, and judge the terminal condition ∥θk+1 − θk∥1,∞ ≤ ϵ0.

• If the condition holds, terminate the algorithm.
• Otherwise, replace k with k + 1 and return to (iii).

7. Numerical example

A SIMP problem for a three-dimensional linear elastic continuum is solved by the method shown above. Let p be a traction force and u be a displacement, and set uD = 0. The mean compliance J0(θ, u) = ∫ΓN p · u dγ and the mass J1(θ) = ∫D (ϕ(θ) − 0.4) dx are used as cost functions. We have G0g = G0j = 0 and G0a = −αϕα−1ϕθ σ(u) · ε(u) for J0, and G1g = ϕθ with the other terms zero for J1, where σ(u) and ε(u) denote the stress and the strain, respectively. We use α = 2 and c = 1/(10L)2 in (4), where L is the width of D. The finite element model consists of 120 × 160 × 1 eight-node brick elements with three nonconforming modes and a bubble mode. Fig. 1 shows the converged density obtained by the present method. We did not encounter any numerical instability.

Fig. 1. Converged density (right) for the mean compliance minimization problem with mass constraint, posed as a linear elastic cantilever problem (left, fixed boundary ΓD with traction p).

Acknowledgments

The present study was supported by JSPS KAKENHI (20540113).

References

[1] M. P. Bendsøe, Optimization of Structural Topology, Shape, and Material, Springer-Verlag, Berlin, 1995.

[2] F. Murat, Contre-exemples pour divers problèmes où le contrôle intervient dans les coefficients, Ann. Mat. Pura Appl., Serie 4, 112 (1977), 49–68.

[3] M. P. Bendsøe and N. Kikuchi, Generating optimal topologies in structural design using a homogenization method, Comput. Meths. Appl. Mech. Engrg., 71 (1988), 197–224.

[4] K. Suzuki and N. Kikuchi, A homogenization method for shape and topology optimization, Comput. Meths. Appl. Mech. Engrg., 93 (1991), 291–318.

[5] G. I. N. Rozvany, M. Zhou and T. Birker, Generalized shape optimization without homogenization, Struct. Optim., 4 (1992), 250–254.

[6] A. R. Diaz and O. Sigmund, Checkerboard patterns in layout optimization, Struct. Optim., 10 (1995), 40–45.

[7] O. Sigmund and J. Petersson, Numerical instabilities in topology optimization: a survey on procedures dealing with checkerboards, mesh-dependencies and local minima, Struct. Optim., 16 (1998), 68–75.

[8] K. Matsui and K. Terada, Continuous approximation of material distribution for topology optimization, Int. J. Numer. Meth. Engng., 59 (2004), 1925–1944.

[9] S. F. Rahmatalla and C. C. Swan, A Q4/Q4 continuum structural topology optimization implementation, Struct. Multidisc. Optim., 27 (2004), 130–135.

[10] J. Petersson and O. Sigmund, Slope constrained topology optimization, Int. J. Numer. Meth. Engng., 41 (1998), 1417–1434.

[11] G.-W. Jang, J. H. Jeong, Y. Y. Kim, D. Sheen, C. Park and M.-N. Kim, Checkerboard-free topology optimization using non-conforming finite elements, Int. J. Numer. Meth. Engng., 57 (2003), 1717–1735.

[12] G. Allaire, F. Jouve and A. M. Toader, Structural optimization using sensitivity analysis and a level-set method, J. Comput. Phys., 194 (2004), 363–393.

[13] H. Azegami, Solution to domain optimization problems (in Japanese), Trans. JSME, Ser. A, 60 (1994), 1479–1486.

[14] H. Azegami and K. Takeuchi, A smoothing method for shape optimization: traction method using the Robin condition, Int. J. Comput. Methods, 3 (2006), 21–33.

[15] S. Kaizu and H. Azegami, Optimal shape problems and traction method (in Japanese), Trans. JSIAM, 16 (2006), 277–290.

[16] L. Armijo, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16 (1966), 1–3.

[17] P. Wolfe, Convergence conditions for ascent methods, SIAM Review, 11 (1969), 226–235.


JSIAM Letters Vol.3 (2011) pp.5–8 ©2011 Japan Society for Industrial and Applied Mathematics

A convergence improvement of the BSAIC preconditioner by deflation

Ikuro Yamazaki1, Hiroto Tadano1, Tetsuya Sakurai1 and Keita Teranishi2

1 Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki 305-8573, Japan

2 Cray, Inc., 380 Jackson St. Suite 210, St Paul, MN 55101, USA

E-mail yamazaki@mma.cs.tsukuba.ac.jp

Received May 31, 2010, Accepted September 16, 2010

Abstract

We have proposed a block sparse approximate inverse with cutoff (BSAIC) preconditioner for relatively dense matrices. The BSAIC preconditioner is effective for semi-sparse matrices which have a relatively large number of nonzero elements. This method reduces the computational cost for generating the preconditioning matrix, and overcomes the performance bottlenecks of SAI using a blocked version of Frobenius norm minimization and a drop-threshold scheme (cutoff) for semi-sparse matrices. However, a larger cutoff parameter leads to a less effective preconditioning matrix and a large number of iterations. We analyze this convergence deterioration in terms of eigenvalues, and describe a deflation-type method which improves the convergence.

Keywords linear system, preconditioning, sparse approximate inverse, deflation

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Linear systems

Ax = b,

where A ∈ Cn×n is a semi-sparse matrix which is relatively dense, appear in nano-simulations. A sparse approximate inverse (SAI) technique has been proposed as a parallel preconditioner for sparse matrices [1]. This preconditioner has good parallel performance. However, the arithmetic cost of constructing the preconditioning matrix grows cubically with the number of nonzero entries per row. We have proposed a block sparse approximate inverse with cutoff (BSAIC) [2] preconditioner for such semi-sparse linear systems.

The BSAIC preconditioner reduces the computational cost of constructing the approximate inverse matrix, and overcomes the performance bottlenecks of SAI by using a blocked version of Frobenius norm minimization and the cutoff strategy for semi-sparse matrices. A larger cutoff parameter further decreases the cost of constructing the approximate inverse matrix; thus, we want to use as large a cutoff parameter as possible. However, the convergence of Krylov subspace methods preconditioned with BSAIC deteriorates when the cutoff parameter is large. In this paper, this deterioration of the convergence is investigated in terms of eigenvalues, and a method for improving the convergence is presented.

This paper is organized as follows. In Section 2, our method, the BSAIC preconditioner, is described. In Section 3, we describe the convergence deterioration caused by large cutoff parameters, how to improve it, and the algorithm of the improvement method. In Section 4, the BSAIC preconditioner combined with the improvement method is verified by numerical experiments, followed by concluding remarks in Section 5.

2. Block SAI with Cutoff (BSAIC)

We describe the block SAI with cutoff (BSAIC) preconditioner. In BSAIC, the cutoff is applied to the coefficient matrix A in order to reduce the computational cost of the least squares problems which appear in block SAI. Firstly, the approximate coefficient matrix Ac is generated by the cutoff

Ac = [ãij], ãij = { aij, (|aij| > θ or i = j); 0, otherwise },    (1)

where θ is a nonnegative real value. After applying the cutoff, the least squares problems with the approximate matrix Ac,

min_M ∥AcM − I∥²F ≈ Σ_{k=1..L} min_{Mk} ∥AcMk − Ek∥²F,    (2)

where l is a block size, L = ⌈n/l⌉ and Ek is the submatrix of the identity matrix I such that I = [E1, E2, . . . , EL], are solved. The matrix M = [M1, M2, . . . , ML] is employed as the preconditioning matrix. The initial sparsity pattern M0 of the preconditioning matrix is decided by

spy(M0) = spy(Ac),    (3)

where "spy" denotes the sparsity pattern of a matrix.

We overcome a performance bottleneck by using a

blocked version of SAI with drop-threshold schemes to reduce the computational cost of constructing the approximate inverse matrix and to improve the convergence of Krylov subspace methods. However, a larger value of θ leads to a less effective preconditioning matrix and a larger number of iterations, even though θ is preferred to be as large as possible. In the next section, we describe this convergence deterioration and the improvement method.
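The cutoff (1) can be sketched in a few lines of NumPy on a dense array (illustrative only; a practical BSAIC code works with sparse storage):

```python
import numpy as np

def cutoff(A, theta):
    """Cutoff (1): keep a_ij when |a_ij| > theta or i == j,
    and set it to zero otherwise."""
    A = np.asarray(A, dtype=float)
    keep = (np.abs(A) > theta) | np.eye(A.shape[0], dtype=bool)
    return np.where(keep, A, 0.0)

A = np.array([[2.0,  0.01, 0.5 ],
              [0.01, 3.0,  0.02],
              [0.5,  0.02, 0.001]])
Ac = cutoff(A, theta=0.1)
# small off-diagonal entries are dropped; the diagonal always survives
```

Raising θ makes Ac (and hence the least squares problems in (2)) sparser and cheaper, at the price of a worse approximation, which is exactly the trade-off analyzed next.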

3. Convergence improvement by deflation

We consider solving the preconditioned linear systems (AM)(M−1x) = b by Krylov subspace methods, and we investigate the eigenvalue distribution of AM. The block size l is fixed and the cutoff parameter θ is varied in BSAIC. The preconditioning matrix M approximates the inverse of A, and AM is nearly equal to the identity matrix I when M is a good approximation to A−1; in that case, the eigenvalues of AM are clustered around 1.

In the restarted GMRES method (GMRES(m)) [3], the information concerning the eigenvalues around the origin is discarded at the restart. These small eigenvalues often slow the convergence. As GMRES iterations are performed, deflation-type schemes (e.g. GMRES-IR [4] and GMRES-DR [5]) calculate small approximate eigenvalues and the corresponding eigenvectors. These eigenvectors are added to the Krylov subspace in a bid to speed up convergence. The implicitly restarted GMRES (GMRES-IR) method [4] proposed by Morgan is employed in Section 4.

In the GMRES-IR(m, k) method, we compute the eigenpairs of the eigenvalue problem from an Arnoldi process of length m. We then apply the implicitly restarted Arnoldi (IRA) method [6] with the unwanted harmonic Ritz values [7] as shifts. The IRA method filters the chosen harmonic Ritz values away from the Arnoldi process. Here, the small harmonic Ritz values are retained, so that k small eigenvalues near the origin can be deflated. Therefore, the convergence is improved by this deflation. Fig. 1 shows the algorithm of GMRES-IR. Our experiments in Section 4 indicate the validity of the GMRES-IR method preconditioned with BSAIC.
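The clustering claim can be checked numerically on a toy example (illustrative only; here M is an exact inverse or a crude diagonal inverse, not a BSAIC factor):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = 2.0 * np.eye(n) + 0.1 * rng.standard_normal((n, n))

M_good = np.linalg.inv(A)                    # ideal preconditioner
M_poor = np.linalg.inv(np.diag(np.diag(A)))  # heavily truncated approximation

# with the good M, the eigenvalues of A*M collapse onto 1;
# with the poor M, they scatter away from 1, which slows restarted GMRES
spread_good = np.max(np.abs(np.linalg.eigvals(A @ M_good) - 1.0))
spread_poor = np.max(np.abs(np.linalg.eigvals(A @ M_poor) - 1.0))
```

The same qualitative picture appears when the cutoff parameter θ is increased: the spectrum of AM spreads toward the origin, and deflating those small eigenvalues is what restores the convergence.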

4. Numerical experiments

In this section, we first verify the performance of the Krylov subspace method preconditioned with the BSAIC preconditioner for various values of θ. Secondly, we analyze the convergence deterioration caused by a larger value of θ and apply the improvement strategy to the BSAIC preconditioner. All experiments are carried out with MATLAB 7.4 on a MacBook (CPU: Intel Core 2 Duo 2.26 GHz, Memory: 4.0 Gbytes, OS: Mac OS 10.6.3). The test problems are solved by the preconditioned GMRES(50) method. The stopping criterion for the relative residual is 10−10. The initial guess x0 is set to 0 and all elements of b are set to 1. The notation #MVs means the number of matrix-vector products, and the dagger (†) means that the stopping criterion is not satisfied within 5,000 MVs.

The test matrix is derived from the computation of the

molecular orbitals of an epidermal growth factor (EGF).

Algorithm GMRES-IR(m, k) method

1: Compute p = m − k and r0 = b − Ax0
2: Compute β = ∥r0∥2 and v1 = r0/β
3: Compute Vm+1, Hm with the Arnoldi method
4: Compute y, the minimizer of ∥V⊤m+1 r0 − Hm y∥2, and xm = x0 + Vm y
5: If satisfied, stop; else proceed
6: Compute the harmonic Ritz values θ1, . . . , θm
7: Sort |θ1| ≥ · · · ≥ |θm|
8: Set the shifts θ1, . . . , θp
9: Update Vk+1 and Hk with the IRA method
10: Go to 3, and resume the Arnoldi method from step k + 1

Fig. 1. Algorithm of the GMRES-IR(m, k) method.
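As an illustration of steps 3 and 6 (a NumPy sketch, not the authors' MATLAB code; function names are ours), the harmonic Ritz values can be computed from the Arnoldi matrices as the eigenvalues of Hm + h²_{m+1,m} f e_m^T, where f solves Hm^H f = e_m:

```python
import numpy as np

def arnoldi(A, r0, m):
    """Arnoldi process: returns V (n x (m+1)) and Hbar ((m+1) x m)
    satisfying A @ V[:, :m] == V @ Hbar."""
    n = r0.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):          # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

def harmonic_ritz_values(H):
    """Harmonic Ritz values: eigenvalues of Hm + h_{m+1,m}^2 * f e_m^T,
    where f solves Hm^H f = e_m (Hm is real here, so Hm^H = Hm^T)."""
    m = H.shape[1]
    Hm = H[:m, :m]
    em = np.zeros(m)
    em[-1] = 1.0
    f = np.linalg.solve(Hm.T, em)
    return np.linalg.eigvals(Hm + H[m, m - 1] ** 2 * np.outer(f, em))
```

Small harmonic Ritz values obtained this way approximate the eigenvalues of AM near the origin, as Tables 4 and 5 illustrate.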

[Figure: preconditioning time and iteration time (both in seconds, log scale) versus the cutoff parameter θ from 10−7 to 10−2.]

Fig. 2. The computational time of GMRES(50) with BSAIC corresponding to θ for EGF.

The size of A is 4,505 and the number of nonzero elements is 5,254,215 (25.89%). In this example, the block size l of BSAIC is set to 30.

The computational time of GMRES(50) preconditioned with BSAIC corresponding to θ for EGF is reported in Fig. 2. Our BSAIC preconditioner can solve this problem faster than SAI and block SAI. However, as Fig. 2 indicates, GMRES(50) preconditioned with BSAIC does not converge when θ is larger than 10−5. Fig. 2 also shows that the cutoff parameter θ should be taken as large as possible (e.g. θ = 10−3) to reduce the preconditioning time. We investigate the slowdown of the convergence in terms of the eigenvalue distributions of AM.

Figs. 3(a), 3(b), . . . , 3(e) show the eigenvalue distributions of AM corresponding to θ = 10−6, 10−5, . . . , 10−2, respectively. Fig. 3(f) shows the eigenvalue distribution of A. The red line in Fig. 3 denotes a zero eigenvalue. In Figs. 3(a)–3(e), the eigenvalues of AM cluster around 0 as θ becomes larger. The eigenvalue distribution of A in Fig. 3(f) is more spread out and more strongly clustered around 0 than that of AM. This suggests that the coefficient matrix A is ill-conditioned. This clustering of eigenvalues is one of the key reasons for the convergence deterioration. Therefore, we apply the BSAIC preconditioner together with GMRES-IR, which deflates the smallest eigenvalues, to these linear equations, and we improve the convergence of the Krylov subspace methods.

Table 1 shows the results of GMRES-IR without a preconditioner and GMRES-IR preconditioned with ILU(0) [8] and ILUT [8]. The ε in Table 1 denotes the threshold of ILUT. The GMRES-IR method does not converge ex-



JSIAM Letters Vol. 3 (2011) pp.5–8 Ikuro Yamazaki et al.

[Figure panels (a)–(e): eigenvalue distributions of AM for θ = 1.0 × 10−6, 1.0 × 10−5, 1.0 × 10−4, 1.0 × 10−3 and 1.0 × 10−2; panel (f): eigenvalue distribution of A. Real axis from −0.2 to 1.6, imaginary axis from −0.25 to 0.25.]

Fig. 3. Eigenvalue distributions of AM and A for EGF.

Table 1. Results of preconditioned GMRES-IR(50, 25) for EGF.

Preconditioner       #MVs   Wall clock time [sec]
                            Precond.   Iter.   Total
None                   †       —         —       —
ILU(0)                 †     11.04       —       —
ILUT(ε = 10−2)         †     20.51       —       —
ILUT(ε = 10−3)        74     33.49     16.76   50.25

Table 2. Results of BiCGSTAB, GMRES(50) and GMRES-IR preconditioned with BSAIC (l = 30, θ = 1.0 × 10−3) for EGF.

Krylov         #MVs   Wall clock time [sec]
                      Cutoff   Precond.   Iter.   Total
BiCGSTAB         †     0.59     36.89       —       —
GMRES(50)        †     0.59     36.89       —       —
IR(50, 5)        †     0.59     36.89       —       —
IR(50, 10)      331    0.59     36.89     15.91   53.39
IR(50, 15)      262    0.59     36.89     12.55   50.03
IR(50, 20)      233    0.59     36.89     10.90   48.38
IR(50, 25)      226    0.59     36.89     10.83   48.30

Table 3. Results of BiCGSTAB, GMRES(50) and GMRES-IR preconditioned with BSAIC (l = 30, θ = 5.0 × 10−3) for EGF.

Krylov         #MVs   Wall clock time [sec]
                      Cutoff   Precond.   Iter.   Total
BiCGSTAB         †     0.56     17.09       —       —
GMRES(50)        †     0.56     17.09       —       —
IR(50, 5)        †     0.56     17.09       —       —
IR(50, 10)       †     0.56     17.09       —       —
IR(50, 15)       †     0.56     17.09       —       —
IR(50, 20)      351    0.56     17.09     15.15   32.80
IR(50, 25)      329    0.56     17.09     15.85   32.50

cept when preconditioned with ILUT(ε = 10−3). GMRES-IR preconditioned with ILUT(ε = 10−3) converges well. However, ILUT does not have the good parallel efficiency that SAI-type preconditioners offer.

[Figure: preconditioning time and iteration time (both in seconds, log scale) versus the cutoff parameter θ from 10−7 to 10−2.]

Fig. 4. The computational time of GMRES-IR(50, 25) with BSAIC corresponding to θ for EGF.

Tables 2 and 3 show the results for EGF with θ = 1.0 × 10−3 and 5.0 × 10−3, respectively. When BiCGSTAB [9] and GMRES(50) are used, the stopping criterion is not satisfied in either Table 2 or Table 3. In Table 2, the GMRES-IR method preconditioned with BSAIC converges except for GMRES-IR(50, 5). As a result, the GMRES-IR(50, 25) method converges faster than the other Krylov subspace methods. Table 3 shows that both GMRES-IR(50, 20) and GMRES-IR(50, 25) converge, and GMRES-IR(50, 25) converges faster than any other method. Fig. 4 shows that a larger value of θ can be applied by using GMRES-IR. The convergence depends not only on the cutoff parameter θ but also on the restart value m and the number of deflated eigenvalues k. Thus, we need to set appropriate m and k. Morgan also mentioned in [4] that the choice of m and k changes the convergence.

Tables 4 and 5 show the real part of the harmonic Ritz values of GMRES-IR(50, 25) and the real part of small eigenvalues of AM, respectively. In Table 4, the param-




Table 4. The harmonic Ritz values of GMRES-IR(50, 25) and the eigenvalues of AM (l = 30, θ = 1.0 × 10−3). Underlines indicate the correct digits.

      Re(H. R.)             Re(eig(AM))
λ1    0.000001191238635     0.000001191238641
λ2    0.000225851028469     0.000225851028462
λ3   −0.000307394803250    −0.000307394803254
λ4    0.002204006095033     0.002204006095033
λ5   −0.003517837137617    −0.003517837137617

Table 5. The harmonic Ritz values of GMRES-IR(50, 25) and the eigenvalues of AM (l = 30, θ = 5.0 × 10−3). Underlines indicate the correct digits.

      Re(H. R.)             Re(eig(AM))
λ1   −0.000072402852854    −0.000072402852260
λ2    0.000197498551705     0.000197498552292
λ3   −0.001027046005229    −0.001027046005286
λ4    0.002560480805997     0.002560480809970
λ5    0.002632881319376     0.002632881318118

Table 6. The number of eigenvalues of AM around 0 (l = 30).

θ             #(|d| < 10−1)   #(|d| < 10−2)   #(|d| < 10−3)
1.0 × 10−6          6               2               0
1.0 × 10−5         10               3               0
1.0 × 10−4         18               6               2
1.0 × 10−3         26               9               3
5.0 × 10−3         39              11               2

eters of BSAIC are set to l = 30 and θ = 1.0 × 10−3. In Table 5, the parameters of BSAIC are set to l = 30 and θ = 5.0 × 10−3. "Re" and "H. R." denote the real part and a harmonic Ritz value, respectively. The MATLAB command eig is used to calculate the eigenvalues of AM. Both Tables 4 and 5 show that the harmonic Ritz values approximate the eigenvalues of AM well. Hence, small eigenvalues of AM are deflated, and Tables 2 and 3 also show that the GMRES-IR method improves convergence more than any other Krylov subspace method.

The number of eigenvalues of AM around 0 corresponding to θ is reported in Table 6. The block size l is fixed at 30. #(|d| < value) in Table 6 denotes the number of eigenvalues whose absolute values are less than value. When θ = 1.0 × 10−6 and 1.0 × 10−5 are used, #(|d| < 10−3) is zero and the GMRES(50) method with BSAIC converges in Fig. 2. However, when θ larger than 10−5 is used, #(|d| < 10−3) is not zero and GMRES(50) with BSAIC does not converge in Fig. 2. Thus, a larger value of θ increases the number of eigenvalues of AM around 0 and eventually deteriorates the convergence of Krylov subspace methods.
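The counts reported in Table 6 are straightforward to reproduce from the spectrum (a sketch, assuming the preconditioned matrix AM is formed explicitly as a dense array; not the authors' code):

```python
import numpy as np

def count_near_zero(AM, tols=(1e-1, 1e-2, 1e-3)):
    """#(|d| < tol): number of eigenvalues of AM with modulus below tol."""
    d = np.linalg.eigvals(AM)
    return {tol: int(np.sum(np.abs(d) < tol)) for tol in tols}
```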

5. Conclusions

We proposed a method to improve the convergence of the BSAIC preconditioner using the deflation of small eigenvalues. Our BSAIC preconditioner reduces the construction cost of the approximate inverse M for semi-sparse matrices. However, a larger value of the cutoff parameter θ increases the iteration count and makes convergence difficult. We investigated this convergence deterioration in terms of the eigenvalue distributions of AM. As a result, a larger value of θ causes the eigenvalue distribution of AM to spread out and cluster around 0. This cluster of small eigenvalues slows the convergence, and thus deflation-type Krylov subspace methods improve the convergence.

In future work, we will try to find an automatic procedure for selecting the cutoff parameter θ, the restart value m and the number of deflated eigenvalues k. We also plan to apply the method to large-scale problems.

Acknowledgments

This research was supported in part by a Grant-in-Aid for Scientific Research of the Ministry of Education, Culture, Sports, Science and Technology, Japan (Grant Nos. 21246018 and 21105502).

References

[1] E. Chow and Y. Saad, Approximate inverse preconditioners via sparse-sparse iterations, SIAM J. Sci. Comput., 19 (1998), 995–1023.
[2] I. Yamazaki, M. Okada, H. Tadano, T. Sakurai and K. Teranishi, A block sparse approximate inverse with cutoff preconditioner for semi-sparse linear systems derived from molecular orbital calculations, JSIAM Letters, 2 (2010), 41–44.
[3] Y. Saad and M. H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986), 856–869.
[4] R. B. Morgan, Implicitly restarted GMRES and Arnoldi methods for nonsymmetric systems of equations, SIAM J. Matrix Anal. Appl., 21 (2000), 1112–1135.
[5] R. B. Morgan, GMRES with deflated restarting, SIAM J. Sci. Comput., 24 (2002), 20–37.
[6] D. C. Sorensen, Implicit application of polynomial filters in a k-step Arnoldi method, SIAM J. Matrix Anal. Appl., 13 (1992), 357–385.
[7] C. C. Paige, B. N. Parlett and H. A. van der Vorst, Approximate solution and eigenvalue bounds from Krylov subspaces, Numer. Lin. Alg. Appl., 2 (1995), 115–133.
[8] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, Philadelphia, 2003.
[9] H. A. van der Vorst, Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992), 631–644.



JSIAM Letters Vol.3 (2011) pp.9–12 ©2011 Japan Society for Industrial and Applied Mathematics

Cache optimization of a non-orthogonal joint

diagonalization method

Yusuke Hirota1, Yusaku Yamamoto1 and Shao-Liang Zhang2

1 Graduate School of System Informatics, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan

2 Graduate School of Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan

E-mail hirota stu.kobe-u.ac.jp

Received September 30, 2010, Accepted October 31, 2010

Abstract

The LUJ2D algorithm is a recently proposed numerical solution method for non-orthogonal joint diagonalization problems appearing in signal processing. The original LUJ2D algorithm attains low performance on modern microprocessors since it is dominated by cache ineffective operations. In this study, we propose a cache efficient implementation of the LUJ2D algorithm. The experimental results show that the proposed implementation is about 1.8 times faster than the original one, achieving 21% of the peak performance on the Opteron 1210 processor using one core.

Keywords joint diagonalization, LUJ2D, cache optimization

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Given nonsingular symmetric matrices C(k) ∈ R^{N×N} (k = 1, 2, . . . , K), we consider the problem of finding a nonsingular matrix W ∈ R^{N×N} such that

W C(k) W^T = Λ(k)   (k = 1, 2, . . . , K)   (1)

are diagonal. This is called the non-orthogonal joint diagonalization problem. This type of problem appears in signal processing [1, 2]. Of course, these problems have no solution for general {C(k)}. In that case, we try to make the Λ(k)'s as diagonal as possible by some criterion.

This problem can be solved as an orthogonal joint diagonalization problem when some preprocessing is applied. However, this approach degrades the quality of the approximate solution for real problems whose input matrices contain errors [3]. To avoid this problem, various algorithms that find the non-orthogonal matrix W directly have been proposed [4–7].

One such algorithm is LUJ2D, recently proposed by B. Afsari [7]. It has the desirable property that it does not require positive-definiteness of the input matrices. However, a simple implementation of the algorithm (e.g. the MATLAB code used in the experiments in [7]) attains low performance on modern cache-based processors, since the implementation is based on vector update operations. In this paper, we propose a cache efficient implementation of the LUJ2D algorithm which is dominated by matrix products.

This paper is organized as follows. In Section 2, we describe the LUJ2D algorithm in its original form. Then we propose a cache efficient implementation. In Section 3, we evaluate the performance of the proposed implementation by numerical experiments. The paper is concluded in Section 4.

2. The LUJ2D algorithm

The non-orthogonal joint diagonalization problem can be formulated as the following minimization problem:

min_W J2(W),

where W is a nonsingular matrix and J2 is the nonnegative function

J2(W) = Σ_{k=1}^{K} ∥C(k) − W^{−1} diag(W C(k) W^T) (W^{−1})^T∥_F².

Here, diag(X) is the diagonal part of X. J2 measures the non-diagonality of (1). The Λ(k)'s are simultaneously diagonal if and only if J2(W) = 0.

The LUJ2D algorithm reduces J2 iteratively by the update

W_{m+1} = L_m U_m W_m,

where W_m is the m-th approximate solution. Here, U_m is a product of the N(N − 1)/2 matrices R_{i,j}(a_{i,j}) (1 ≤ i < j ≤ N) and L_m is that of R_{i,j}(a_{i,j}) (1 ≤ j < i ≤ N). R_{i,j} is defined as R_{i,j}(x) = I + x e_i e_j^T ∈ R^{N×N}, where e_k is the k-th unit vector. Note that there is freedom in the order of the R_{i,j}'s in the product. The parameters a_{i,j} are determined by

a_{i,j} = argmin_a J2(R_{i,j}(a) U′_m W_m)      (1 ≤ i < j ≤ N),
a_{i,j} = argmin_a J2(R_{i,j}(a) L′_m U_m W_m)  (1 ≤ j < i ≤ N),   (2)

where each U′_m and L′_m is a product of the R_{i,j}(a_{i,j})'s that have already been determined. Each one-dimensional




minimization problem (2) is solved by finding the zero points of a cubic equation. The coefficients of the algebraic equation are determined from the (i, j)-th and (j, j)-th elements of (U′_m W_m) C(k) (U′_m W_m)^T (k = 1, . . . , K) (or (L′_m U_m W_m) C(k) (L′_m U_m W_m)^T (k = 1, . . . , K)). The iteration is terminated when W_m C(k) W_m^T (k = 1, . . . , K) are sufficiently close to diagonal. Then W_m is the numerical solution.

In the rest of this section, we describe two existing implementations of the LUJ2D algorithm and propose a new one. The implementations use only the lower and diagonal elements of C(k) (k = 1, . . . , K), since the matrices are symmetric.
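For concreteness, the cost function J2 can be written down directly in NumPy (a sketch under the definitions above, not the authors' code; names are ours):

```python
import numpy as np

def J2(W, Cs):
    """Non-diagonality measure:
    J2(W) = sum_k || C(k) - W^{-1} diag(W C(k) W^T) W^{-T} ||_F^2."""
    Winv = np.linalg.inv(W)
    total = 0.0
    for C in Cs:
        D = np.diag(np.diag(W @ C @ W.T))   # diagonal part of W C(k) W^T
        total += np.linalg.norm(C - Winv @ D @ Winv.T, "fro") ** 2
    return total
```

If the input set is exactly jointly diagonalizable, C(k) = B Δ(k) B^T, then W = B^{−1} gives J2(W) = 0, consistent with the equivalence stated above.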

2.1 Original implementations

Vector update based implementation  A simple implementation of the LUJ2D algorithm is shown below. Here, submatrices and subvectors of C(k) are represented in MATLAB notation. The notation A += B means A ← A + B.

1:  procedure LUJ2D (vector update based)
2:    W ← (a nonsingular initial guess)
3:    for m = 1, 2, . . . until convergence do
4:      U ← I
5:      for j = 2, . . . , N do
6:        for i = 1, . . . , j − 1 do
7:          Compute a_{i,j} by (2)
8:          Update U
9:          for k = 1, . . . , K do
10:           C(k)_{i,1:i} += a_{i,j} C(k)_{j,1:i}
11:           C(k)_{i+1:j,i} += a_{i,j} (C(k)_{j,i+1:j})^T
12:           C(k)_{j+1:N,i} += a_{i,j} C(k)_{j+1:N,j}
13:         end for
14:       end for
15:     end for
16:     (Lower part is constructed in a similar way)
17:     W ← LUW
18:   end for
19:   return W
20: end procedure
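Ignoring the packed symmetric storage used in lines 10–12, the elementary transformation applied at each (i, j) step is one row update followed by one column update. A minimal NumPy sketch (an illustration, not the authors' code):

```python
import numpy as np

def apply_Rij(C, i, j, a):
    """C <- R_{i,j} C R_{i,j}^T with R_{i,j}(a) = I + a e_i e_j^T.
    Row i gets a times row j, then column i gets a times column j.
    Assumes i != j, as in LUJ2D."""
    C = C.copy()
    C[i, :] += a * C[j, :]      # left multiplication by R_{i,j}
    C[:, i] += a * C[:, j]      # right multiplication by R_{i,j}^T
    return C
```

Lines 10–12 of the procedure realize exactly this, touching only the lower triangle of the symmetric C(k).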

At each (i, j) step, a_{i,j} is determined first. Then the updates C(k) ← R_{i,j} C(k) R_{i,j}^T (k = 1, . . . , K) are performed: C(k)'s j-th row vector multiplied by a_{i,j} is added to C(k)'s i-th row vector, and C(k)'s j-th column vector multiplied by a_{i,j} is added to C(k)'s i-th column vector, since

R_{i,j} C(k) R_{i,j}^T = [(I + a_{i,j} e_i e_j^T) C(k)] (I + a_{i,j} e_i e_j^T)^T.

When the symmetry is exploited, these operations can be written as lines 10–12 of the procedure. These vector update operations require 2KN³ floating point operations and dominate the computational cost. The performance of this implementation is quite low since the vector updates are cache ineffective.

Rank-1/2 update based implementation  The rank-1/2 update based implementation is shown below.

1:  procedure LUJ2D (rank-1/2 update based)
2:    W ← (a nonsingular initial guess)
3:    for m = 1, 2, . . . until convergence do
4:      U ← I
5:      for j = 2, . . . , N do
6:        for i = 1, . . . , j − 1 do
7:          Compute a_{i,j} by (2)
8:          Update U
9:        end for
10:       a_j = [a_{1,j}, a_{2,j}, . . . , a_{j−1,j}]^T
11:       for k = 1, . . . , K do
12:         y ← 1/2 c(k)_{j,j} a_j + (C(k)_{j,1:j−1})^T
13:         C(k)_{1:j−1,1:j−1} += y a_j^T + a_j y^T   (only lower elements are computed)
14:         C(k)_{j:N,1:j−1} += C(k)_{j:N,j} a_j^T
15:       end for
16:     end for
17:     (Lower part is constructed in a similar way)
18:     W ← LUW
19:   end for
20:   return W
21: end procedure

The determination of a_{i,j} is not influenced by the update of C(k) by R_{l,j} if l ≠ i. Accordingly, the update operations with the same j,

C(k) ← R_{i,j} C(k) R_{i,j}^T   (i = 1, 2, . . . , j − 1),

can be performed by two rank-1 updates,

C(k) ← C(k) + [a_{1,j}, a_{2,j}, . . . , a_{j−1,j}, 0, . . . , 0]^T (e_j^T C(k)),
C(k) ← C(k) + (C(k) e_j) [a_{1,j}, a_{2,j}, . . . , a_{j−1,j}, 0, . . . , 0],

in the case i < j (the updates can be performed in a similar way if i > j). Moreover, by exploiting the symmetry of the C(k)'s, the updates are performed by rank-1 updates and symmetric rank-2 updates as shown in lines 13–14.

This implementation requires 4/3 KN³ floating point operations for the rank-2 updates and 2/3 KN³ for the rank-1 updates per iteration. It attains better performance than the vector update based one since rank-1/2 updates are more cache effective than vector updates. Nevertheless, the performance of the rank-1/2 update based implementation is still low.
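The pair of rank-1 updates displayed above can be verified directly against the product of the R_{i,j}'s: since the index sets {i} and {j} are disjoint, the product of all R_{i,j} with the same j collapses to I + a e_j^T for a padded coefficient vector a. A NumPy sketch (an illustration, not the authors' Fortran code):

```python
import numpy as np

def update_same_j(C, j, a):
    """Apply C <- R_{i,j} C R_{i,j}^T simultaneously for all i < j,
    via the two sequential rank-1 updates of Section 2.1.
    `a` holds a_{1,j}, ..., a_{j-1,j} (length j, 0-based rows 0..j-1)."""
    N = C.shape[0]
    av = np.zeros(N)
    av[:j] = a                              # [a_{1,j},...,a_{j-1,j},0,...,0]
    C = C + np.outer(av, C[j, :])           # C += a (e_j^T C)
    C = C + np.outer(C[:, j], av)           # C += (C e_j) a^T  (updated C)
    return C
```

Note that the second update deliberately uses the column C e_j of the already row-updated matrix; this reproduces (I + a e_j^T) C (I + a e_j^T)^T exactly, including the c_{j,j} a a^T term that the 1/2 c(k)_{j,j} a_j contribution to y accounts for in line 12.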

2.2 A matrix product based implementation

In this subsection, we propose a matrix product based implementation. It is shown below.

1:  procedure LUJ2D (matrix product based)
2:    W ← (a nonsingular initial guess)
3:    for m = 1, 2, . . . until convergence do
4:      U ← I
5:      for J = 1, 2, . . . , N/M do
6:        j′ = (J − 1)M + 1
7:        for I = 1, 2, . . . , J − 1 do
8:          i′ = (I − 1)M + 1
9:          for j = j′, j′ + 1, . . . , JM do
10:           for i = i′, i′ + 1, . . . , IM do
11:             Compute a_{i,j} by (2)
12:             Update U
13:           end for
14:           a_j = [a_{i′,j}, . . . , a_{IM,j}]^T
15:           for k = 1, . . . , K do
16:             y(k)_j ← 1/2 c(k)_{j,j} a_j + (C(k)_{j,i′:IM})^T
17:             C(k)_{j′:j−1,i′:IM} += (C(k)_{j,j′:j−1})^T a_j^T
18:             C(k)_{j:JM,i′:IM} += C(k)_{j:JM,j} a_j^T
19:           end for
20:         end for
21:         A_{I,J} ← [a_{j′}, a_{j′+1}, . . . , a_{JM}]
22:         for k = 1, . . . , K do
23:           Y(k) ← [y(k)_{j′}, y(k)_{j′+1}, . . . , y(k)_{JM}]
24:           C(k)_{i′:IM,i′:IM} += A_{I,J} (Y(k))^T + Y(k) A_{I,J}^T   (only lower elements are computed)
25:           C(k)_{i′:IM,1:i′−1} += A_{I,J} C(k)_{j′:JM,1:i′−1}
26:           C(k)_{IM+1:j′−1,i′:IM} += (C(k)_{j′:JM,IM+1:j′−1})^T A_{I,J}^T
27:         end for
28:       end for
29:       A_J ← [A_{1,J}^T, . . . , A_{J−1,J}^T]^T
30:       for k = 1, . . . , K do
31:         C(k)_{JM+1:N,1:j′−1} += C(k)_{JM+1:N,j′:JM} A_J^T
32:       end for
33:       for j = j′ + 1, . . . , JM do
34:         for i = j′, . . . , j − 1 do
35:           Compute a_{i,j} by (2)
36:           Update U
37:         end for
38:         a_j = [a_{j′,j}, . . . , a_{j−1,j}]^T
39:         for k = 1, . . . , K do
40:           y_j = 1/2 c(k)_{j,j} a_j + (C(k)_{j,j′:j−1})^T
41:           C(k)_{j′:j−1,j′:j−1} += y_j a_j^T + a_j y_j^T   (only lower elements are computed)
42:           C(k)_{j′:j−1,1:j′−1} += a_j C(k)_{j,1:j′−1}
43:           C(k)_{j:N,j′:j−1} += C(k)_{j:N,j} a_j^T
44:         end for
45:       end for
46:     end for
47:     (Lower part is constructed in a similar way)
48:     W ← LUW
49:   end for
50:   return W
51: end procedure

The sweeping order for the matrix products in U_m and L_m in this implementation differs from the one described in the previous subsection, as shown in Fig. 1. We partition {(i, j) | 1 ≤ i ≤ N, 1 ≤ j ≤ N} into subsets S_{I,J} = {(i, j) | (I − 1)M + 1 ≤ i ≤ IM, (J − 1)M + 1 ≤ j ≤ JM} (1 ≤ I ≤ N/M, 1 ≤ J ≤ N/M), where M is a divisor of N. M is called the block size. The sets S_{I,:} and S_{:,I} are defined as ∪_{J=1}^{N/M} S_{I,J} and ∪_{J=1}^{N/M} S_{J,I}, respectively.

first and then perform the following M2 updates

C(k) ← Ri,jC(k)RT

i,j ((i, j) ∈ SI,J)

at once. However, to determine ai,j from c(k)j,i and c

(k)j,j ,

we must use the values of these elements partially up-dated with the preceding Ri,j ’s. To solve this problem,

we update only c(k)i,j ((i, j) ∈ SI,J ∪ SJ,I) just after the

determinations of ai,j ((J − 1)M + 1 ≤ i ≤ JM). Theupdates C(k) ← Ri,jC

(k)RTi,j ((J − 1)M + 1 ≤ i ≤ JM)

Fig. 1. The sweeping order of (i, j) (the red line is for U_m, the blue one is for L_m). The left is the sweeping order of the original implementations; the right is that of the proposed one.

can be performed by two rank-1 updates,

C(k) ← C(k) + a (e_j^T C(k)),
C(k) ← C(k) + (C(k) e_j) a^T,

where a is an N-dimensional column vector with (a)_i = a_{i,j} for (J − 1)M + 1 ≤ i ≤ JM and (a)_i = 0 otherwise. By exploiting the symmetry, these updates are performed by rank-1/2 updates as shown in lines 17–18. The remaining elements c(k)_{i,j} ((i, j) ∈ (S_{I,:} \ S_{I,J}) ∪ (S_{:,I} \ S_{J,I})) are updated after all the a_{i,j} ((i, j) ∈ S_{I,J}) have been determined. The updates C(k) ← R_{i,j} C(k) R_{i,j}^T ((i, j) ∈ S_{I,J}) can be performed by

C(k) ← C(k) + A C(k),
C(k) ← C(k) + C(k) A^T,   (3)

where A is an N-by-N matrix with (A)_{i,j} = a_{i,j} for (i, j) ∈ S_{I,J} and (A)_{i,j} = 0 otherwise. Therefore, these updates can be performed as matrix products. Moreover, the updates (3) (1 ≤ I ≤ J − 1 or J + 1 ≤ I ≤ N/M) to c(k)_{i,j} ((i, j) ∈ ∪_{P=J+1}^{N/M} (S_{I,P} ∪ S_{P,I}) for I < J; (i, j) ∈ ∪_{P=1}^{J−1} (S_{I,P} ∪ S_{P,I}) for I > J) can be combined. By this combination, the performance of the matrix products improves since their size increases. By exploiting the symmetry, these updates are performed by matrix products and rank-2M updates (which are essentially identical to matrix products) as shown in lines 24–26 and 31. If (i, j) ∈ S_{J,J}, the updates C(k) ← R_{i,j} C(k) R_{i,j}^T ((J − 1)M + 1 ≤ i ≤ j − 1) are performed by rank-1 updates as in the rank-1/2 update based implementation. By exploiting the symmetry, these updates can be performed by rank-1/2 updates as shown in lines 41–43.

The number of floating point operations in this implementation is shown in Table 1. The total number of operations is identical to that of the original implementations, and most of them are matrix products if M ≪ N. The performance of the matrix products improves with an increase in M. However, there is a trade-off between this improvement and the increase in the number of rank-1/2 update operations, which are cache ineffective.
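The key point of (3) is that, once all a_{i,j} in an off-diagonal block S_{I,J} are known, the accumulated transformation is I + A with A nonzero only on that block, so the update reduces to two matrix products. A NumPy sketch (an illustration only; the paper's Fortran code uses ACML and symmetric storage, and the function name is ours):

```python
import numpy as np

def blocked_update(C, A_block, rows, cols):
    """Apply C <- (I + A) C (I + A)^T, where A is zero except for
    A[rows, cols] = A_block (rows and cols are disjoint slices)."""
    N = C.shape[0]
    A = np.zeros((N, N))
    A[rows, cols] = A_block
    C = C + A @ C          # left GEMM: (I + A) C
    C = C + C @ A.T        # right GEMM on the left-updated matrix
    return C
```

Because the row block and the column block are disjoint, the individual R_{i,j} for (i, j) in the block commute and their product equals I + A exactly, so the two GEMMs reproduce the sequential elementary updates.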

3. Numerical experiments

To evaluate the performance of the implementations, we perform numerical experiments.

The test set of matrices is generated by the following procedure: (i) Generate nonsingular diagonal matrices




Table 1. The number of floating point operations per iteration in the proposed implementation.

Rank-2 updates                  4/3 KNM²
Rank-1 updates (lines 42, 43)   2KN²M − 4/3 KNM²
Rank-1 updates (lines 17, 18)   2KN²M − 2KNM²
Rank-2M updates                 2KN²M − 2KNM²
Matrix products                 2KN³ − 6KN²M + 4KNM²
Total                           2KN³

∆(k) ∈ R^{N×N} (k = 1, 2, . . . , K) whose diagonal elements are random values in (0, 1). (ii) Generate a nonsingular matrix B ∈ R^{N×N} using random numbers in (0, 1). (iii) Compute C(k) = B ∆(k) B^T (k = 1, 2, . . . , K).

We set N = 240 and K = 240. The iteration is started with W = I and terminated when the condition

sqrt( Σ_{k=1}^{K} Σ_{i≠j} (W(m) C(k) (W(m))^T)_{i,j}² ) / sqrt( Σ_{k=1}^{K} Σ_{i=1}^{N} Σ_{j=1}^{N} (W(m) C(k) (W(m))^T)_{i,j}² ) < 10−5

is satisfied.

In the numerical experiments, we compare the performance of the proposed implementation with various values of M and that of the rank-1/2 update based one. The experiments were carried out on a machine with CPU: Opteron 1210 1.8 GHz (3.6 GFLOPS, only one core was used), OS: CentOS 5.5, Compiler: GFortran 4.4.0, Compiler options: -march=native -O3 -funroll-loops. AMD Core Math Library (ACML) subroutines are used for the rank-1/2/2M updates and the matrix products.

In all the implementations, the number of iterations was 128 and W was almost identical. Fig. 2 shows the total execution time and its breakdown. The proposed implementation with the optimal block size M = 12 is the fastest and is 1.77 times faster than the rank-1/2 update based one. Fig. 3 shows the performance of the rank-1/2/2M updates, the matrix products and their average for each block size. We can observe that the performance of the matrix products and the rank-2M updates increases significantly with M. On the other hand, as seen in Table 1, the number of operations in the rank-1/2 updates, which attain lower performance, also increases. Accordingly, the performance of the implementation is maximized when M is much smaller than the best size for matrix products. We also observe that the proposed implementation with the optimal block size achieves 21% of the peak performance.

We remark that the performance of the blocked implementation strongly depends on the performance of the product of small matrices. Thus, the implementation may show better performance if a library tuned for small matrix products is available.
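Steps (i)–(iii) of the generation procedure and the off-diagonality measure in the stopping criterion can be sketched as follows (a NumPy illustration, not the authors' Fortran code; names are ours):

```python
import numpy as np

def make_test_set(N, K, rng):
    """Steps (i)-(iii): C(k) = B Delta(k) B^T with random diagonal Delta(k).
    A random (0,1) matrix B is nonsingular with probability one."""
    B = rng.uniform(0.0, 1.0, (N, N))
    deltas = rng.uniform(0.0, 1.0, (K, N))
    return B, [(B * d) @ B.T for d in deltas]   # (B * d) @ B.T = B diag(d) B^T

def off_diagonality(W, Cs):
    """Left-hand side of the stopping criterion: the ratio of the
    off-diagonal Frobenius mass of W C(k) W^T to its total mass."""
    num = den = 0.0
    for C in Cs:
        T = W @ C @ W.T
        den += np.sum(T ** 2)
        num += np.sum(T ** 2) - np.sum(np.diag(T) ** 2)
    return np.sqrt(num) / np.sqrt(den)
```

By construction, W = B^{−1} is an exact joint diagonalizer of this test set, so the measure is zero (up to rounding) there.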

4. Conclusion

In this paper, we proposed a cache efficient implementation of the LUJ2D algorithm. The numerical experiments show that the proposed implementation with the optimal block size is about 1.8 times faster than the rank-1/2 update based one. Moreover,

[Figure: stacked bars of execution time (seconds, 0–3000) for the rank-1/2 update based implementation and the matrix product based one with block sizes M = 5, 6, 8, 10, 12, 15, 20, 24, 30, 40; the breakdown covers determination of a_{i,j}, matrix products, rank-2M updates, rank-1 updates (lines 17, 18), rank-1 updates (lines 42, 43) and rank-2 updates.]

Fig. 2. The execution time of the implementations.

[Figure: performance in GFLOPS (0–2.5) of the matrix products, rank-2M updates, rank-1 updates (lines 17, 18), rank-1 updates (lines 42, 43), rank-2 updates and their average, for the rank-1/2 update based implementation and the matrix product based one with block sizes M = 5, 6, 8, 10, 12, 15, 20, 24, 30, 40.]

Fig. 3. The performance of the subroutines.

the proposed one achieved 21% of the peak performance on the Opteron 1210 processor using one core.

References

[1] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso and E. Moulines, A blind source separation technique using second-order statistics, IEEE Trans. Signal Processing, 45 (1997), 434–444.
[2] A. Ziehe and K.-R. Müller, TDSEP – an efficient algorithm for blind separation using time structure, in: Proc. of the 8th Int. Conf. on Artificial Neural Networks, pp. 675–680, 1998.
[3] J.-F. Cardoso, On the performance of orthogonal source separation algorithms, in: Proc. of European Signal Processing Conf., pp. 776–779, 1994.
[4] A. Yeredor, Non-orthogonal joint diagonalization in the least-squares sense with application in blind source separation, IEEE Trans. Signal Processing, 50 (2002), 1545–1553.
[5] A. Ziehe, P. Laskov, G. Nolte and K.-R. Müller, A fast algorithm for joint diagonalization with non-orthogonal transformations and its application to blind source separation, J. Mach. Learn. Res., 5 (2004), 777–800.
[6] R. Vollgraf and K. Obermayer, Quadratic optimization for simultaneous matrix diagonalization, IEEE Trans. Signal Processing, 54 (2006), 3270–3278.
[7] B. Afsari, Simple LU and QR based non-orthogonal matrix joint diagonalization, in: Proc. of the 6th Int. Conf. on Independent Component Analysis and Blind Source Separation, J. Rosca et al. eds., Lect. Notes in Comput. Sci., Vol. 3889, pp. 1–7, Springer-Verlag, Berlin, 2006.



JSIAM Letters Vol.3 (2011) pp.13–16 ©2011 Japan Society for Industrial and Applied Mathematics

Quasi-minimal residual smoothing technique

for the IDR(s) method

Lei Du1, Tomohiro Sogabe2 and Shao-Liang Zhang1

1 Department of Computational Science and Engineering, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan

2 Graduate School of Information Science and Technology, Aichi Prefectural University, Nagakute-cho, Aichi-gun, Aichi 480-1198, Japan

E-mail lei-du na.cse.nagoya-u.ac.jp

Received September 17, 2010, Accepted December 13, 2010

Abstract

IDR(s), proposed by Sonneveld and van Gijzen, is an efficient method for solving large nonsymmetric linear systems. In this paper, QMRIDR(s), a new variant of the IDR(s) method, is presented. In this method, the irregular convergence behavior of IDR(s) is remedied, and both fast and smooth convergence behavior is expected. Numerical experiments are reported to show the performance of our method.

Keywords Induced Dimension Reduction, the IDR(s) method, linear systems, QMRIDR(s), residual smoothing

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Numerical iterative methods play an important role in solving large and sparse linear systems of the form

Ax = b,   (1)

in which the coefficient matrix A is real, nonsymmetric and nonsingular of order n, and the right-hand side b is a given vector.

IDR(s), a generalization of the IDR method [1] for solving problem (1), was recently proposed by Sonneveld and van Gijzen [2]. Some variants of this method have been proposed since then. A new IDR(s) variant obtained by imposing bi-orthogonalization conditions was developed in [3]. By exploiting the merit of BiCGStab(ℓ) [4] to avoid potential breakdown, especially for skew-symmetric or nearly skew-symmetric systems, IDRStab and GBi-CGStab(s, L) were proposed with higher order stabilization polynomials in [5] and [6], respectively. A block version of IDR(s) for solving linear systems with multiple right-hand sides was developed in [7]. The relation between IDR and BiCGStab [8] was discussed in [9]. From the viewpoint of Petrov-Galerkin methods, IDR was explained by Gutknecht in [10]. Moreover, Ritz-IDR was explained in [11].

The IDR(s) method converges fast, but its residual norm history shows quite irregular convergence behavior, like many other Lanczos-type product methods. The quasi-minimal residual technique [12], a variant of the BiCG method [13] called QMR for short, is known to remedy such irregular convergence behavior. Thus we apply QMR smoothing to IDR(s), which produces our method QMRIDR(s). Both fast convergence and smooth convergence behavior are expected of QMRIDR(s).

This paper is organized as follows. In the next section, we review the IDR(s) method to show how it works. In Section 3, we present our idea, which reformulates the relations between the residuals and their auxiliary vectors in the IDR(s) method and constructs an iterative solution by minimizing the norm of a quasi-residual. Numerical results are reported to show the performance of our method in Section 4. Finally, we conclude the paper in Section 5.

2. The IDR(s) method

In this section, we review the IDR(s) method. Given an initial approximation x0 with its corresponding residual r0 := b − Ax0, the kth Krylov subspace can be defined as follows:

Kk(A, r0) := span{r0, Ar0, . . . , Ak−1r0}.

Let G0 := Kn(A, r0) be the full Krylov subspace and let S be a subspace of Cn. Define a sequence of subspaces Gj by the recursion Gj := (I − ωjA)(Gj−1 ∩ S), in which the ωj's are nonzero constants.

Under the assumption that the subspace S ∩ G0 does not contain a nontrivial invariant subspace of A, the following result of the IDR theorem [1, 2] is obtained: Gj ⊊ Gj−1, i.e., Gj is a proper subset of Gj−1. This fact implies that the sequence of nested subspaces Gj shrinks until Gj = {0}.

Based on this theorem, the IDR(s) method was proposed to construct the next s + 1 new residuals in the same subspace Gj when the former s + 1 residuals are given in Gj−1. Usually the subspace S is defined as S := N(PT), the null space of the transpose of P, where P is a matrix of order n × s. It was suggested to


JSIAM Letters Vol. 3 (2011) pp.13–16 Lei Du et al.

orthogonalize a set of random vectors to form P [2]. Now, we show the process of constructing a new residual. Assume that the residuals ri−s, . . . , ri in subspace Gj−1 are known; then a new residual ri+1 is constructed as

ri+1 := (I − ωjA)vi, (2)

in which the auxiliary vector vi is defined as vi = ri − Σ_{l=1}^{s} γl∆ri−l, where γl ∈ R and ∆rk := rk+1 − rk. It is obvious that vi ∈ Gj−1 for all linear combinations of ri−s, . . . , ri. To ensure ri+1 ∈ Gj, the auxiliary vector vi should also lie in the subspace S, which implies that the unknowns γl are determined by the condition PTvi = 0. The parameter ωj is obtained by minimizing the 2-norm of ri+1 and is kept the same in the next s iterations. Let ∆xk := xk+1 − xk; then xi+1 can be updated as

xi+1 = xi + ωjvi − Σ_{l=1}^{s} γl∆xi−l. (3)

In each of the following s iterations, the oldest residual is replaced by the new one, and the above process is cycled to construct a new intermediate residual. Finally, s + 1 residuals in Gj are obtained. The IDR(s) algorithm [2] is summarized as follows.

Algorithm 1 IDR(s)

1: Initialize x0, j = 0, P ∈ Rn×s;
2: r0 = b − Ax0, and compute r1, . . . , rs in G0 by an existing Krylov solver;
3: for k = s, s + 1, . . . do
4:   Determine the γl's by solving PTvk = 0;
5:   Construct vk = rk − Σ_{l=1}^{s} γl∆rk−l;
6:   j = j + 1 when k + 1 ≡ 0 (mod s + 1), ωj = argmin_ω ∥rk+1∥2;
7:   Compute rk+1 = (I − ωjA)vk in Gj;
8:   Update xk+1 = xk + ωjvk − Σ_{l=1}^{s} γl∆xk−l;
9:   If xk+1 has converged then stop;
10: end for
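To make the recurrences concrete, the following is a minimal NumPy sketch of Algorithm 1. It is an illustrative reconstruction, not the authors' code: the initial residuals r1, . . . , rs in G0 are generated here by simple local minimal-residual steps (one possible choice of "existing Krylov solver"), and no preconditioning or breakdown safeguards are included.

```python
import numpy as np

def idrs(A, b, s=4, tol=1e-8, maxit=1000, seed=0):
    """Minimal sketch of Algorithm 1 (illustrative, not optimized)."""
    n = len(b)
    P = np.random.default_rng(seed).random((n, s))   # shadow vectors, random as in [2]
    x = np.zeros(n)
    r = b - A @ x
    X, R = [x], [r]
    # generate r_1, ..., r_s in G_0 by local minimal-residual steps
    for _ in range(s):
        Ar = A @ r
        om = (Ar @ r) / (Ar @ Ar)
        x = x + om * r
        r = r - om * Ar
        X.append(x); R.append(r)
    om = 1.0
    for k in range(s, maxit):
        # columns: Delta r_{k-1}, ..., Delta r_{k-s} (likewise for x)
        dR = np.column_stack([R[-l] - R[-l - 1] for l in range(1, s + 1)])
        dX = np.column_stack([X[-l] - X[-l - 1] for l in range(1, s + 1)])
        gamma = np.linalg.solve(P.T @ dR, P.T @ r)   # step 4: enforce P^T v_k = 0
        v = r - dR @ gamma                           # step 5
        Av = A @ v
        if (k + 1) % (s + 1) == 0:                   # step 6: next subspace, new omega
            om = (Av @ v) / (Av @ Av)
        r = v - om * Av                              # step 7: r_{k+1} in G_j
        x = x + om * v - dX @ gamma                  # step 8
        X.append(x); R.append(r)
        X, R = X[-(s + 1):], R[-(s + 1):]            # keep only the last s+1 iterates
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):  # step 9
            break
    return x
```

On a well-conditioned (e.g. diagonally dominant) test matrix this converges to the tolerance in a modest number of matrix-vector products, and larger s tends to need fewer iteration steps, in line with the observations of Section 4.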

3. QMR smoothing technique

In this section, we reconsider the relations of the residuals and their auxiliary vectors in the IDR(s) method and propose the QMRIDR(s) method by constructing a new iterative solution. First, let us define

yi := ri if i < s,  vi if i ≥ s,

and

Yk := [y0 y1 . . . yk−1],  Wk+1 := [r0 r1 . . . rk−1 rk].

Then, we see that (2) in Algorithm 1 can be reformulated as

Avi = (1/ωj)(vi − ri+1) = (1/ωj)(ri − Σ_{l=1}^{s} γl∆ri−l − ri+1), (4)

which can be represented in the matrix form of

AYk = Wk+1Hk (5)

where Hk is a (k + 1) × k upper Hessenberg matrix with bandwidth s + 2: each of its columns contains at most s + 2 consecutive nonzero entries, the lowest of which lies on the first subdiagonal. By the definition of vi, we can easily prove that the column vectors of Wk and Yk span the same Krylov subspace, i.e.,

Kk(A, r0) = span{r0, r1, . . . , rk−1} = span{y0, y1, . . . , yk−1}. (6)

Now, we construct a new iterative solution xk based on the basis y0, y1, . . . , yk−1, which can be written as xk = x0 + Ykzk for zk ∈ Rk. By (5), the corresponding residual vector rk = b − Axk satisfies rk = r0 − AYkzk = Wk+1(e1 − Hkzk), where e1 = [1, 0, . . . , 0]T ∈ Rk+1.

To obtain smooth convergence, the ideal way to generate xk would be to determine zk by minimizing ∥rk∥2, but the storage requirement makes this hard because r0, r1, . . . , rk are not orthogonal. As a compromise between optimality and storability, the quasi-minimal residual technique applied to BiCG is reconsidered here. A diagonal matrix Ωk+1 = diag(δ0, . . . , δk) with δi = ∥ri∥2 is used to make the columns of Wk+1 of unit norm, i.e., rk = Wk+1Ω−1_{k+1}(δ0e1 − H̃kzk), where H̃k = Ωk+1Hk. Then the quasi-residual norm ∥δ0e1 − H̃kz∥2 is minimized for zk instead of ∥e1 − Hkzk∥2. Due to the special structure of H̃k, a QR decomposition

by Givens rotations can be adopted: let

H̃k = QT_{k+1} [Rk; 0],

where Qk+1 is a unitary (k + 1) × (k + 1) matrix and Rk is a nonsingular upper triangular k × k matrix with bandwidth s + 2. Then we have

min_z ∥δ0e1 − H̃kz∥2 = min_z ∥δ0Qk+1e1 − [Rk; 0]z∥2,

and zk is determined as zk = R−1_k tk, where

tk := [τ1, . . . , τk]T,  [tk; τk+1] := δ0Qk+1e1.

It is easy to see that

min_z ∥δ0e1 − H̃kz∥2 = ∥δ0e1 − H̃kzk∥2 = |τk+1|,


and the iterative solution xk can be rewritten as

xk = x0 + YkR−1k tk,

instead of (3) in the IDR(s) method. As Rk is a triangular matrix with bandwidth s + 2, the iterative solution xk can be updated by a short-term recurrence analogous to that in [12]; the difference from that method is that the decomposed matrix is a Hessenberg matrix with bandwidth s + 2 instead of a tridiagonal matrix. Under the framework of Algorithm 1, we can propose the QMRIDR(s) algorithm, which is summarized as follows.

Algorithm 2 QMRIDR(s)

1: Initialize x0, j = 1, P ∈ Rn×s;
2: Compute r0 = b − Ax0, then r1, . . . , rs in G0, and Hs;
3: Compute vs, rs+1 in G1 and the new column of Hs+1; then decompose H̃s+1 and compute [tT_{s+1}, τs+2]T;
4: [f1, f2, . . . , fs+1] = Ys+1R−1_{s+1};
5: xs+1 = x0 + [f1, f2, . . . , fs+1]ts+1;
6: for k = s + 1, s + 2, . . . do
7:   Determine the γl's by solving PTvk = 0;
8:   Construct vk = rk − Σ_{l=1}^{s} γl∆rk−l;
9:   j = j + 1 when k + 1 ≡ 0 (mod s + 1), ωj = argmin_ω ∥rk+1∥2;
10:  Compute rk+1 = (I − ωjA)vk in Gj;
11:  Update the new column of H̃k+1 by the latest s + 1 Givens rotations, then zero out its last element by a new Givens rotation G(ck+1, sk+1);
12:  τk+1 ← ck+1τk+1, τk+2 = −sk+1τk+1, and fk+1 = (yk − Σ_{i=1}^{s+1} Rk+1−i,k+1 fk+1−i)/Rk+1,k+1, where Ri,k+1 denotes the entry of Rk+1 at the ith row and (k + 1)th column;
13:  xk+1 = xk + τk+1fk+1;
14:  If xk+1 has converged then stop;
15: end for
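The least-squares solve underlying steps 11–13 — triangularizing H̃k with Givens rotations and reading the quasi-residual norm |τk+1| off the rotated right-hand side — can be checked on a small dense example. This sketch uses a random Hessenberg matrix standing in for H̃k (an illustrative assumption; it is not one produced by IDR(s)) and ignores the banded structure that the actual algorithm exploits:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 8
H = np.triu(rng.random((k + 1, k)), -1)    # (k+1) x k upper Hessenberg stand-in for H~_k
delta0 = 2.0
g = np.zeros(k + 1)
g[0] = delta0                              # right-hand side delta_0 * e_1

R = H.copy()
for j in range(k):                         # eliminate the subdiagonal entry of column j
    rho = np.hypot(R[j, j], R[j + 1, j])
    c, s = R[j, j] / rho, R[j + 1, j] / rho    # Givens rotation G(c, s)
    top, bot = R[j, j:].copy(), R[j + 1, j:].copy()
    R[j, j:], R[j + 1, j:] = c * top + s * bot, -s * top + c * bot
    g[j], g[j + 1] = c * g[j] + s * g[j + 1], -s * g[j] + c * g[j + 1]

z = np.linalg.solve(R[:k, :], g[:k])       # z_k = R_k^{-1} t_k by back substitution
# the attained residual norm equals |tau_{k+1}|, the last rotated entry of g
quasi_residual = np.linalg.norm(delta0 * np.eye(k + 1)[0] - H @ z)
```

Because the rotations are orthogonal, the minimum of ∥δ0e1 − H̃kz∥2 is attained at zk = R−1_k tk and equals |g[k]| = |τk+1|, and zk agrees with a dense least-squares solve.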

Several criteria can be used to stop the iteration in our algorithm. A natural choice is to make use of ∥rk∥2, which has been calculated previously. Other conditions can be checked for ∥rk∥2 or its upper bound ∥rk∥2 ≤ √(k + 1) |τk+1|, where the residual rk can be updated at low cost per iteration step. Mixed strategies of these can also be utilized in Algorithm 2.

We expect to obtain a smooth convergence history of the residuals, but at the cost of more memory and more level-1 BLAS operations per iteration step. For example, s + 1 additional vectors must be saved, and the vector fk must be updated for the iterative solution xk in Algorithm 2.
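The upper bound quoted above follows from rk = Wk+1Ω−1_{k+1}(δ0e1 − H̃kzk): the scaled matrix has unit-norm columns, so its 2-norm is at most its Frobenius norm √(k + 1). A quick numerical sanity check with random stand-in data (not actual QMRIDR(s) quantities):

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 100, 9
W = rng.standard_normal((n, k + 1))
W /= np.linalg.norm(W, axis=0)        # unit-norm columns, like the scaled W_{k+1}
q = rng.standard_normal(k + 1)        # stand-in for the quasi-residual delta0*e1 - H~_k z_k
r = W @ q
bound = np.sqrt(k + 1) * np.linalg.norm(q)
```

Here ∥r∥2 ≤ √(k + 1)∥q∥2 always holds because ∥W∥2 ≤ ∥W∥F = √(k + 1); at the QMR minimizer ∥q∥2 = |τk+1|, which gives the bound in the text.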

4. Numerical experiments

In this section, we report some numerical results with the IDR(s) and QMRIDR(s) methods. The parameter s was set to 1 and 4. As for the initial guess and right-hand-side vector, we always chose x0 = 0 and b = [1, . . . , 1]T. All the elements of the matrix P ∈ Rn×s were random values distributed in the interval (0, 1). The stopping criterion was

Table 1. Test matrices.

Matrix      n     nnz*   Application discipline
ADD32      4960   23884  Electronic circuit design
FIDAP037   3565   67591  Finite element modeling
PDE2961    2961   14585  Partial differential equation
SHERMAN4   1104    3786  Oil reservoir modeling

*Number of the nonzero entries.

Table 2. Computation time [sec.].

            ADD32  FIDAP037  PDE2961  SHERMAN4
IDR(1)       0.20    0.31      0.30     0.06
QMRIDR(1)    0.23    0.34      0.34     0.07
IDR(4)       0.23    0.29      0.34     0.06
QMRIDR(4)    0.32    0.34      0.45     0.09

∥rk∥2/∥b∥2 ≤ 10−8, with rk = b − Axk being the true residual; otherwise, at most 2n matrix-vector products were performed. Experiments were performed on a Red Hat Linux system (64-bit) with an AMD Phenom 9500 Quad-Core processor using double-precision arithmetic. Codes were written in the C++ language and compiled with GCC 4.1.2. All test matrices in this section were taken from the Matrix Market collection [14]. The order, number of nonzero elements, and application disciplines of the test matrices are listed in Table 1. Algorithms were run without preconditioning. The

convergence behavior is shown as the number of matrix-vector products (on the horizontal axis) versus log10 of the relative norm ∥rk∥2/∥b∥2 (on the vertical axis) in all four figures, and the computation time is listed in Table 2.

As shown in Figs. 1–4, we have the following observations. First, all the peaks seen in the IDR(s) convergence curves disappear from the QMRIDR(s) curves, which are much smoother. Second, both methods (for the same s) need almost the same number of matrix-vector products to stop the iterations, and with larger s both converge in fewer iteration steps. This shows that the QMRIDR(s) method also keeps the fast convergence property of the IDR(s) method.

From Table 2, we also note that the QMRIDR(s) method required more computation time because of the additional costs per iteration step. Although both methods with larger s converged in fewer iteration steps, they took more computation time for some of our test problems because of the larger number of inner products.

5. Conclusions

In this paper, we proposed a variant of the IDR(s) method, QMRIDR(s), for solving nonsymmetric linear systems. To define this method, we reformulated the relations of the residuals and their auxiliary vectors in the IDR(s) method and presented them in matrix form. Based on this arrangement, we could adopt the quasi-minimal residual smoothing technique and construct an iterative solution with a short-term recurrence.

Numerical results show that the proposed method not only has smooth convergence behavior but also


Fig. 1. ADD32: relative 2-norm of the residuals versus the number of matrix-vector products for IDR(1), QMRIDR(1), IDR(4), and QMRIDR(4).

Fig. 2. FIDAP037: relative 2-norm of the residuals versus the number of matrix-vector products for IDR(1), QMRIDR(1), IDR(4), and QMRIDR(4).

retains the fast convergence property of the IDR(s) method.

Acknowledgments

We sincerely thank the anonymous referee, whose comments and suggestions helped us to improve the manuscript. This research was partially supported by the China Scholarship Council and the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research (Nos. 21760058, 19560065 and 22104004).

References

[1] P. Wesseling and P. Sonneveld, Numerical experiments with a multiple grid and a preconditioned Lanczos type method, Lect. Notes Math., Vol. 771, pp. 543–562, Springer-Verlag, Berlin, Heidelberg, New York, 1980.
[2] P. Sonneveld and M. van Gijzen, IDR(s): a family of simple and fast algorithms for solving large nonsymmetric systems of linear equations, SIAM J. Sci. Comput., 31 (2008), 1035–1062.
[3] M. van Gijzen and P. Sonneveld, An elegant IDR(s) variant that efficiently exploits bi-orthogonality properties, Delft Univ. of Technology, Reports of the Department of Applied Mathematical Analysis, Report 08-21, 2008.

[4] G. L. G. Sleijpen and D. R. Fokkema, BiCGstab(ℓ) for linear equations involving unsymmetric matrices with complex spectrum, Elec. Trans. Numer. Anal., 1 (1993), 11–32.
[5] G. L. G. Sleijpen and M. B. van Gijzen, Exploiting BiCGSTAB(ℓ) strategies to induce dimension reduction,

Fig. 3. PDE2961: relative 2-norm of the residuals versus the number of matrix-vector products for IDR(1), QMRIDR(1), IDR(4), and QMRIDR(4).

Fig. 4. SHERMAN4: relative 2-norm of the residuals versus the number of matrix-vector products for IDR(1), QMRIDR(1), IDR(4), and QMRIDR(4).

SIAM J. Sci. Comput., 32 (2010), 2687–2709.
[6] M. Tanio and M. Sugihara, GBi-CGSTAB(s, L): IDR(s) with higher-order stabilization polynomials, J. Comput. Appl. Math., 235 (2010), 765–784.
[7] L. Du, T. Sogabe, B. Yu, Y. Yamamoto and S.-L. Zhang, A block IDR(s) method for nonsymmetric linear systems with multiple right-hand sides, submitted to J. Comput. Appl. Math.
[8] H. A. van der Vorst, Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992), 631–644.
[9] G. L. G. Sleijpen, P. Sonneveld and M. B. van Gijzen, Bi-CGSTAB as an induced dimension reduction method, Appl. Numer. Math., 60 (2010), 1100–1114.
[10] M. H. Gutknecht, IDR explained, Elec. Trans. Numer. Anal., 36 (2010), 126–148.
[11] V. Simoncini and D. B. Szyld, Interpreting IDR as a Petrov-Galerkin method, SIAM J. Sci. Comput., 32 (2010), 1898–1912.
[12] R. W. Freund, QMR: a quasi-minimal residual method for non-Hermitian linear systems, Numer. Math., 60 (1991), 315–339.
[13] C. Lanczos, Solution of systems of linear equations by minimized iterations, J. Res. Nat. Bur. Standards, 49 (1952), 33–53.
[14] Matrix Market, http://math.nist.gov/MatrixMarket/.


JSIAM Letters Vol.3 (2011) pp.17–19 ©2011 Japan Society for Industrial and Applied Mathematics

A new approach to find a saddle point efficiently

based on the Davidson method

Akitaka Sawamura1

1 Sumitomo Electric Industries, Ltd., 1-1-3, Shimaya, Konohana-ku, Osaka 554-0024, Japan

E-mail sawamura-akitaka sei.co.jp

Received September 30, 2010, Accepted February 24, 2011

Abstract

A new eigenvector-following approach for finding a saddle point without the Hessian matrix is described. The most important feature of the proposed approach is that it relies not only on the lowest eigensolution, as a conventional approach does, but also on the higher, albeit less accurate, eigensolutions that are available when the Davidson method is employed. The proposed approach is shown to be more efficient than the conventional one by application to diffusion of a Zn interstitial atom in an InP supercell.

Keywords diffusion, reaction, transition state, potential energy surface

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Many aspects of diffusion and chemical reaction can be reduced to questions about the potential energy surface, in particular where the saddle points on the surface are. One of the approaches frequently employed to locate the saddle points is the eigenvector-following strategy [1]. This strategy resembles nonlinear optimization methods; a step is, however, taken uphill along the eigenvector with the lowest eigenvalue of a Hessian or dynamical matrix, and downhill along all the other directions. To my knowledge, the eigenvector-following method

was first described by Crippen and Scheraga [2] and refined by Cerjan and Miller [3] from the viewpoint of the Lagrangian multiplier technique. This early version explicitly requires the Hessian, which is usually costly or tedious to evaluate. To overcome this problem, Munro and Wales (MW) proposed an alternative approach which makes use of the force only [4]. MW purged the Hessian by employing the conjugate-gradient method [5] to obtain the lowest eigensolution, considering the facts that

• the conjugate-gradient method requires a matrix-vector product, not the matrix as a whole,

and that

• the matrix-vector product is calculated approximately as a difference in force.

The Davidson method [6] is another algorithm for eigenvalue problems. Since a subspace is constructed explicitly in the Davidson method, higher "otiose" eigensolutions are also calculated even when the lowest is the only targeted solution. Taking advantage of all these available solutions, the author proposes a new approach for finding the saddle point efficiently while utilizing the force only.

2. Method

To locate the saddle points, if the Hessian H is readily available, iterative application of a Newton-like formula,

∆x = (H − λI)−1 f, (1)

can be a preferred choice [7, 8], where ∆x is a step vector according to which the atoms are moved, f is the force acting on the atoms, and λ is a shift parameter. If all the eigenvalues ϵi (in ascending order) and corresponding eigenvectors vi of the Hessian are known, (1) can be rewritten as

∆x = Σ_i vi (viT f)/(ϵi − λ). (2)

From (2), clearly λ should be so chosen that ϵ1 − λ is negative while ϵi − λ with i > 1 is positive, to ensure that ∆x represents a direction energetically uphill along v1 and downhill along the remaining eigenvectors [3]. Even when the Hessian is not explicitly available, fortunately at least the lowest eigenvalue ϵ1 and corresponding eigenvector v1 can be calculated, as already mentioned. Using this solution, the force f is partitioned into parallel and perpendicular components as

f∥ = v1 v1T f, (3)

and

f⊥ = (I − v1 v1T) f, (4)

respectively. A modified force

f† = −f∥ + f⊥ (5)

is a direction uphill along v1 and downhill in the tangent subspace, as with ∆x in (1). In the MW method

the atoms are moved relying on −f ∥ and then on f ⊥

in a sequential manner. Henkelman and Jonsson (HJ) proposed, however, that the step vector ∆x be set proportional to a further modified form:

f† = −f∥ if ϵ1 > 0,  and  f† = −f∥ + f⊥ otherwise, (6)

for faster convergence toward a nearby saddle point [9]. The approach proposed in the present Letter is somewhere in between the above two. Eq. (2) can be rewritten by dividing its summation into two parts:

∆x = Σ_{i≤n} vi (viT f)/(ϵi − λ) + Σ_{i>n} vi (viT f)/(ϵi − λ). (7)

Suppose that ϵi is independent of i and equal to ϵ′ for i > n, where n is an integer small enough that n eigenvectors are easily handled. On this assumption, the second term of the right-hand side of (7) can be rewritten as

Σ_{i>n} vi (viT f)/(ϵi − λ) = Σ_{i>n} vi (viT f)/(ϵ′ − λ)
  = (1/(ϵ′ − λ)) Σ_{i>n} vi viT f
  = (1/(ϵ′ − λ)) (I − Σ_{i≤n} vi viT) f. (8)

Inserting (8) into (7) we have

∆x = Σ_{i≤n} vi (viT f)/(ϵi − λ) + (1/(ϵ′ − λ)) (I − Σ_{i≤n} vi viT) f. (9)
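The splitting (7)–(9) is easy to verify numerically. The sketch below uses a random symmetric model Hessian (an illustrative assumption): the exact step (2) equals the Newton-like step (1), and the truncated step (9), built from only the n lowest eigenpairs plus a single surrogate eigenvalue ϵ′, reproduces the exact step in the retained directions:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 20, 4                    # full dimension, number of retained eigenpairs
B = rng.standard_normal((dim, dim))
H = (B + B.T) / 2                 # random symmetric model Hessian
f = rng.standard_normal(dim)      # model force

eps, V = np.linalg.eigh(H)        # eigenvalues in ascending order
lam = 0.5 * (eps[0] + eps[1])     # shift: eps_1 - lam < 0 < eps_i - lam for i > 1

# exact step, eq. (2)
dx_exact = sum(V[:, i] * (V[:, i] @ f) / (eps[i] - lam) for i in range(dim))

# truncated step, eq. (9), with eps' standing in for the discarded eigenvalues
eps_p = eps[n:].mean()
Pn = V[:, :n] @ V[:, :n].T        # projector onto the n lowest eigenvectors
dx_approx = sum(V[:, i] * (V[:, i] @ f) / (eps[i] - lam) for i in range(n)) \
            + (f - Pn @ f) / (eps_p - lam)
```

dx_exact coincides with (H − λI)−1f, and dx_approx matches it exactly along the n retained eigenvectors; the approximation error lives entirely in the discarded subspace.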

While actually an approximation to (2), for finding a saddle point when the Hessian is unavailable, (9) can lead to an efficient approach. In contrast to the conjugate-gradient method chosen by MW and HJ, the Davidson method is an iterative algorithm which can compute multiple eigensolutions at once even if only the lowest is sought. Therefore, if the multiple eigensolutions supplied by the Davidson method are used with (9), faster convergence toward a saddle point can be expected than if only the lowest one is considered. In practice, determining λ appropriately requires the

Hessian again [3]. In the proposed approach, alternativeforms of the parallel and perpendicular forces similar inspirit to (9) are introduced as follows:

f∥ = v1 (ϵmax/|ϵ1|) v1T f, (10)

and

f⊥ = Σ_{1<i≤n} vi (ϵmax/|ϵi|) viT f + (I − Σ_{1≤i≤n} vi viT) f, (11)

where ϵmax is the maximum absolute value among the ϵi:

ϵmax = max_i |ϵi|. (12)
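Given the n (approximate) eigenpairs returned by a Davidson run, assembling (10)–(12) and the HJ-style switch (6) takes only a few lines. The following sketch uses a random symmetric model Hessian and force as stand-ins for the real quantities:

```python
import numpy as np

rng = np.random.default_rng(7)
dim, n = 12, 3
B = rng.standard_normal((dim, dim))
H = (B + B.T) / 2                       # model Hessian (stand-in)
f = rng.standard_normal(dim)            # model force (stand-in)

eps, V = np.linalg.eigh(H)
eps, V = eps[:n], V[:, :n]              # the n lowest eigenpairs, as from Davidson
eps_max = np.max(np.abs(eps))           # eq. (12)

f_par = V[:, 0] * (eps_max / abs(eps[0])) * (V[:, 0] @ f)            # eq. (10)
f_perp = sum(V[:, i] * (eps_max / abs(eps[i])) * (V[:, i] @ f)
             for i in range(1, n)) + f - V @ (V.T @ f)               # eq. (11)
f_mod = -f_par if eps[0] > 0 else -f_par + f_perp                    # eq. (6)
```

By construction f_perp is orthogonal to v1, so the modified force pushes uphill along the lowest mode and downhill elsewhere, with the eigenvalue-dependent weights ϵmax/|ϵi| replacing the unit weights of (3) and (4).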

The proposed approach is a doubly iterative one. The outer loop consists of the following steps. First, the force f is evaluated at the current atomic configuration x. Second, the Davidson method, as the inner loop, is started with only the lowest eigensolution targeted. The Hessian-vector product is approximated by a finite-difference formula,

Ht ≈ −(1/η)( f|_{x+ηt} − f|_x ), (13)

where t is a normalized trial vector, specifically a residual associated with v1 orthonormalized against the current subspace, and η is a scaling parameter. If the stopping criterion for the Davidson method is satisfied after n inner iterations, then n eigensolutions are available. Third, relying on these solutions, the modified force f† is obtained from (6), (10), and (11). Fourth, a tentative step vector ∆x′ proportional to f† is adjusted so that ∥∆x′∥2 equals a prescribed value. When ϵ1 is positive, ∆x′ is accepted as the established step vector ∆x. Otherwise, fifth, f† is evaluated at x + ∆x′ without recalculating the eigensolutions. Sixth, a linearized modified force

ζ f†|_{x+∆x′} + (1 − ζ) f†|_x (14)

is minimized in a least-squares sense with respect to ζ. In other words, a one-dimensional search is performed once. Seventh, ζ∆x′ is accepted as the established step vector ∆x. The force is evaluated n + 1 or n + 2 times per single cycle of the outer loop. If (10) and (11) are replaced with (3) and (4) at the third step, respectively, the proposed approach is reduced to a conventional one which resembles the MW and HJ methods.
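The finite-difference product (13) can be checked against a case where it is exact: for a quadratic model energy E(x) = xᵀHx/2 the force is f(x) = −Hx, and (13) reproduces Ht up to rounding for any η. (The quadratic model and the matrix below are illustrative assumptions, not the InP system of Section 3.)

```python
import numpy as np

rng = np.random.default_rng(3)
dim = 10
B = rng.standard_normal((dim, dim))
H = (B + B.T) / 2                       # model Hessian
force = lambda x: -H @ x                # force of the quadratic energy x^T H x / 2

x = rng.standard_normal(dim)            # current configuration
t = rng.standard_normal(dim)
t /= np.linalg.norm(t)                  # normalized trial vector
eta = 1e-3                              # scaling parameter

Ht = -(force(x + eta * t) - force(x)) / eta   # eq. (13)
```

For a genuinely anharmonic potential the same formula incurs an O(η) error, which is why η must balance truncation against cancellation in the force difference.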

3. Test calculation

While all the available eigensolutions are exploited in the proposed approach, since only the lowest is targeted, the higher ones are not likely to be very accurate. This lack of accuracy may hinder convergence. Therefore, a test calculation is performed to confirm whether the proposed approach is actually more efficient than the conventional one. The system considered here is an InP supercell of 64

atoms with an interstitial Zn atom. Initially the Zn atom is placed at a tetrahedral site surrounded by four In atoms and displaced slightly toward a nearby hexagonal site. The force f is evaluated by the plane-wave pseudopotential formalism [10, 11] within density-functional theory [12, 13]. The saddle point is taken to be found

when ϵ1 is negative and ∥f∥∞ falls below 4 × 10−11 N. The inner Davidson loop is terminated either when the iteration count exceeds five or when the 2-norm of the residual vector associated with v1 is smaller than one tenth of ∥f∥2. η and ∥∆x′∥2 are set to 5 × 10−13 m and 5 × 10−12 m, respectively. As already suggested, the conventional approach is implemented in the present study by simply using (3) and (4) instead of (10) and (11). Remaining technical details of the formalism are explained elsewhere [14, 15]. The runs with both the proposed and conventional ap-

proaches converged within numerical error to the same saddle point, where the Zn interstitial atom was located at the hexagonal site. The results are summarized in Table 1. As expected in the previous section, the proposed approach required a smaller number both of outer iterations and of force evaluations than the con-


Table 1. The number of outer iterations and force evaluations required to find a saddle point with the proposed and conventional approaches.

Approach      Outer iterations  Force evaluations
Proposed            18                124
Conventional        24                178

Fig. 1. Comparison of ∥f∥∞ (×10−9 N) versus the number of outer iterations. Solid and dashed lines indicate the results of the proposed and conventional approaches.

ventional one. The convergence history is shown in Fig. 1. With the proposed approach the force decreases not merely faster but also more smoothly after the third outer iteration. This demonstrates the efficiency and stability of the proposed approach.

4. Summary

A new eigenvector-following approach without the Hessian matrix has been presented. In the proposed approach, multiple eigensolutions obtained by the Davidson method are used together with the force to calculate the steps leading toward a saddle point. Faster convergence is expected than with the conventional approach, which relies merely on the lowest eigensolution. This might not be the case, however, because the higher eigensolutions, not targeted in the Davidson iteration, are less accurate in general. For comparison, the approaches have been tested on a model system involving diffusion of a Zn interstitial atom in an InP supercell. The test calculation has confirmed that the proposed approach is the preferred choice because it requires fewer steps and fewer force evaluations. The next problem is to find optimal parameter settings that could be adopted by others.

References

[1] H. B. Schlegel, Exploring potential energy surface for chemical reactions: an overview of some practical methods, J. Comput. Chem., 24 (2003), 1514–1527.
[2] G. M. Crippen and H. A. Scheraga, Minimization of polypeptide energy XI. The method of gentlest ascent, Arch. Biochem. Biophys., 144 (1971), 462–466.
[3] C. J. Cerjan and W. H. Miller, On finding transition states, J. Chem. Phys., 75 (1981), 2800–2806.
[4] L. J. Munro and D. J. Wales, Defect migration in crystalline silicon, Phys. Rev. B, 59 (1999), 3969–3980.
[5] W. W. Bradbury and R. Fletcher, New iterative methods for solution of the eigenproblem, Numer. Math., 9 (1966), 259–267.
[6] E. R. Davidson, The iterative calculation of a few of the lowest eigenvalues and corresponding eigenvectors of large real-symmetric matrices, J. Comput. Phys., 17 (1975), 87–94.
[7] A. Heyden, A. T. Bell and F. J. Keil, Efficient methods for finding transition states in chemical reactions: comparison of improved dimer method and partitioned rational function optimization method, J. Chem. Phys., 123 (2005), 224010.
[8] R. A. Olsen, G. J. Kroes, G. Henkelman, A. Arnaldson and H. Jonsson, Comparison of methods for finding saddle points without knowledge of the final states, J. Chem. Phys., 121 (2004), 9776–9792.
[9] G. Henkelman and H. Jonsson, A dimer method for finding saddle points on high dimensional potential surface using only first derivatives, J. Chem. Phys., 111 (1999), 7010–7022.
[10] J. Ihm, A. Zunger and M. L. Cohen, Momentum-space formalism for the total energy of solids, J. Phys. C: Solid State Phys., 12 (1979), 4409–4422.
[11] W. E. Pickett, Pseudopotential methods in condensed matter applications, Comput. Phys. Rep., 9 (1989), 115–197.
[12] P. Hohenberg and W. Kohn, Inhomogeneous electron gas, Phys. Rev., 136 (1964), B864–B871.
[13] W. Kohn and L. J. Sham, Self-consistent equations including exchange and correlation effects, Phys. Rev., 140 (1965), A1133–A1138.
[14] A. Sawamura, Reformulation of the Anderson method using singular value decomposition for stable convergence in self-consistent calculations, JSIAM Letters, 1 (2009), 32–35.
[15] A. Sawamura, M. Kohyama and T. Keishi, An efficient preconditioning scheme for plane-wave-based electronic structure calculations, Comput. Mater. Sci., 14 (1999), 4–7.


JSIAM Letters Vol.3 (2011) pp.21–24 ©2011 Japan Society for Industrial and Applied Mathematics

On rounding off quotas to the nearest integers

in the problem of apportionment

Tetsuo Ichimori1

1 Department of Information Systems, Osaka Institute of Technology, 1-79-1 Kitayama, Hirakata City, Osaka 573-0196, Japan

E-mail ichimori is.oit.ac.jp

Received September 16, 2010, Accepted January 16, 2011

Abstract

Simulations are performed in order to make comparisons among five methods of U.S. Congressional apportionment. Specifically, the probability is estimated under each method of apportionment that the number of Representatives allocated to a state is equal to the number obtained by rounding off the quota of that state to the nearest integer. For the Webster method, numerical evidence shows that the probability is 97.6 percent on average.

Keywords apportionment, rounding, optimization

Research Activity Group Mathematical Politics

1. Introduction

The U.S. Constitution requires that "Representatives shall be apportioned among the several States according to their respective numbers, counting the whole number of persons in each State" (see U.S. Constitution, Art. 1, Sec. 2, Amend. 14, Sec. 2). Because each state must be represented by a whole number of Representatives, it is almost impossible to carry out the requirement exactly. In fact, the U.S. Supreme Court (in the case of United States Department of Commerce v. Montana, 503 U.S. 442 (1992)) admits this fact. The issue of apportioning Representatives among the several states constitutionally has been debated for over 200 years. Mathematically speaking, let s denote the number of

states, h the total number of seats to be apportioned (the house size), and p = (p1, . . . , ps) the populations of the s states, where pi is a positive integer for each i. In the theoretically perfect apportionment, the proportional share, namely the quota, of state i is qi = hpi/p∗, where p∗ is the total population of the country, i.e., p∗ = Σj pj.

Let a = (a1, . . . , as) ≥ 0 be a vector of non-negative integers; the vector a is called an apportionment of h if Σi ai = h. Carrying out the constitutional requirement exactly then means achieving the mathematical equality a = q, where q = (q1, . . . , qs) is the vector of quotas. Undoubtedly this is virtually impossible.

One of the most natural methods of apportioning Representatives among the states might be rounding off the quotas in the usual way. Mathematically, this implies that ai must be [qi]0.5 for all i's and that the equality Σi ai = h must be achieved, where [z]0.5 is an integer obtained by rounding off z in the usual way, namely, [z]0.5 is the nearest whole number to z. If the fractional part of z is exactly 0.5, then [z]0.5 can be either of the two consecutive integers z − 0.5 and z + 0.5.

On the other hand, the probability that such a

rounding-off method can produce an apportionment of

just h would be very low. Generally, such a rounding-off method would produce an apportionment of another house size h′ ≠ h. Conversely speaking, a method producing an apportionment of just h does not generally give an apportionment satisfying ai = [qi]0.5 for all i's. Nevertheless, it can well be expected that any reasonable method producing an apportionment of just h will give an apportionment satisfying ai = [qi]0.5 for almost all i's. The purpose of this article is to identify a method of

apportionment which can produce an apportionment ofh satisfying ai = [qi]0.5 for as many states i’s as possiblebecause such a method of apportionment seems to bemost natural.

2. The Hamilton method and the Alabama paradox

Because the constitutional requirement that the num-ber of Representatives to which each state is entitledshall be proportional to the population of that state can-not be met completely, it might be reasonable to seek anapportionment a which is as close to the vector of quotasq = (q1, . . . , qs) as practicable.In fact, the method given by the first apportionment

bill passed by Congress in 1792 minimizes the distancebetween these two vectors a and q, i.e., ∥a − q∥, orminimizes

s∑i=1

(ai − qi

)2s.t.

s∑i=1

ai = h and ai ∈ N for all i’s,

where N denotes the set of non-negative integers and“s.t.” is an abbreviation for “subject to.” At that time,the values of s = 15 and h = 120 were used. Althoughthis bill was vetoed by President Washington, the ap-portionment is appealing because exactly the same ap-portionment results if the quotas are rounded off in theusual way, see Table 1. If an apportionment method


JSIAM Letters Vol. 3 (2011) pp.21–24 Tetsuo Ichimori

Table 1. First apportionment bill in 1792, extracted from [1].

    State            Population      Quota   Apportionment
    Virginia            630,560     20.926              21
    Massachusetts       475,327     15.744              16
    Pennsylvania        432,879     14.366              14
    North Carolina      353,523     11.732              12
    New York            331,589     11.004              11
    Maryland            278,514      9.243               9
    Connecticut         236,841      7.860               8
    South Carolina      206,236      6.844               7
    New Jersey          179,570      5.959               6
    New Hampshire       141,822      4.707               5
    Vermont              85,533      2.839               3
    Georgia              70,835      2.351               2
    Kentucky             68,705      2.280               2
    Rhode Island         68,446      2.271               2
    Delaware             55,540      1.843               2
    Totals            3,615,920    120.000             120

satisfies such a property, the method will be said to satisfy the "rounding-off constraints"; if not, it will be said to violate the rounding-off constraints.

It is clear that rounding off the quotas in the usual way does not always yield an apportionment of h. For example, let there be three states (s = 3) with populations p1 = 235, p2 = 333 and p3 = 432. When the total number of representatives is h = 10, the quotas of the three states are q1 = 2.35, q2 = 3.33 and q3 = 4.32. Rounding off the quotas in the usual way yields the vector a = (2, 3, 4), which is an apportionment of only nine representatives; that is, one representative remains unallocated. If the total number of representatives increases by one, i.e., h = 11, then the quotas change to q1 = 2.585, q2 = 3.663 and q3 = 4.752. Rounding them off as before gives the vector a = (3, 4, 5). This time an apportionment of as many as twelve representatives results, one more than the eleven representatives available. In order to overcome this difficulty, Alexander Hamil-

ton invented an apportionment method, and several decades later Samuel Vinton reinvented it; it is today referred to as the "Hamilton method" or the "method of greatest remainders." In fact, the Hamilton method yields the same seat distribution over the 15 states as the first apportionment bill passed by Congress in 1792. Although, as said above, this method produces an apportionment which minimizes ∑_{i=1}^{s} (ai − qi)² subject to the constraints given above, another explanation of the method is more familiar: each state i first receives the number of Representatives given by the whole-number part of its quota qi, that is, the number obtained by ignoring the fractional remainder. The remaining Representatives are then distributed to the states with the largest fractional remainders.

Unfortunately, the Hamilton method is subject to the so-called "Alabama paradox." The first numerical example above gives the apportionment a = (3, 3, 4) of ten Representatives under the Hamilton method, while the second gives the apportionment a = (2, 4, 5) of eleven Representatives. Thus the first state gets three Representatives when the house size is h = 10, but only two Representatives when the house size increases by one, i.e., h = 11. This peculiar phenomenon is known as the Alabama paradox because it occurred for the State of Alabama. Although the Hamilton method had been used under the censuses of 1850 through 1900, Congress rejected it in 1911 because of this paradox.
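Both phenomena, naive rounding missing the house size and the Hamilton method's Alabama paradox, can be reproduced on the three-state example above; a minimal sketch:

```python
from math import floor

def round_half_up(q):
    """[z]_0.5: nearest whole number (ties broken upward, one of the two allowed choices)."""
    return floor(q + 0.5)

def hamilton(populations, h):
    """Method of greatest remainders: whole-number parts of the quotas,
    then the leftover seats go to the states with the largest fractional parts."""
    p_star = sum(populations)
    quotas = [h * p / p_star for p in populations]
    a = [int(q) for q in quotas]                      # whole-number parts
    order = sorted(range(len(populations)),
                   key=lambda i: quotas[i] - a[i], reverse=True)
    for i in order[:h - sum(a)]:                      # distribute leftovers
        a[i] += 1
    return a

populations = [235, 333, 432]      # the three-state example above

# naive rounding misses the house size:
for h in (10, 11):
    a = [round_half_up(h * p / sum(populations)) for p in populations]
    print(h, a, sum(a))            # h = 10 -> sum 9, h = 11 -> sum 12

# the Hamilton method hits it, but exhibits the Alabama paradox:
print(hamilton(populations, 10))   # [3, 3, 4]
print(hamilton(populations, 11))   # [2, 4, 5]: state 1 loses a seat
```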

3. Methods of apportionment

After the Hamilton method was rejected, Congress returned to the Webster method, which had been used after the 1840 census. After debating the proper method of apportionment for several decades, Congress adopted the Hill method in 1941, and it has been used ever since. The methods of Webster and Hill belong to the so-called "divisor methods," which avoid the Alabama paradox; see [1] for the details of other paradoxes. Therefore, the scope of the debate over the proper method of apportionment may be reduced mainly to the methods of Webster and Hill. In what follows, divisor methods are described briefly.

3.1 Divisor methods

Define a real-valued function d(a) on the non-negative integers a ≥ 0. The function d(a) is strictly increasing in a. It satisfies a ≤ d(a) ≤ a + 1 and, moreover, d(b) = b and d(c) = c + 1 hold for no pair of integers b ≥ 1 and c ≥ 0.

Let z be a positive real number and let [z] denote an integer satisfying the following. (i) For d with d(0) = 0: if d(a − 1) < z < d(a) for some integer a ≥ 1, then [z] = a; if z = d(a) for some integer a ≥ 1, then [z] = a − 1 or a. (ii) For d with d(0) > 0: additionally define d(−1) = 0; if d(a − 1) < z < d(a) for some integer a ≥ 0, then [z] = a; if z = d(a) for some integer a ≥ 0, then [z] = a or a + 1.

Next introduce a divisor method M and a divisor x >

0. A divisor x means that each Representative is given an approximate constituency of x persons. If the equality ∑_{i=1}^{s} [pi/x] = h is achieved for some divisor x > 0, then the number of Representatives which state i receives is ai = [pi/x], where pi/x is referred to as the "quotient" of state i. Given p, h ≥ s, and d(a), a divisor method M is defined as the set of apportionments

    { a : ai = [pi/x] and ∑_{i=1}^{s} [pi/x] = h for some x > 0 }.

Although there can be innumerable divisor methods, the following methods are known as the "five historical methods" and have received special treatment for a long time:

• the Adams method with d(a) = a,

• the Dean method with d(a) = a(a + 1)/(a + 0.5),

• the Hill method with d(a) = √(a(a + 1)),

• the Webster method with d(a) = a + 0.5,

• the Jefferson method with d(a) = a + 1.
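A divisor method can also be run as a sequential procedure that awards each seat to the state with the largest priority pi/d(ai); this is a standard equivalent formulation (cf. [1]), not the set definition above verbatim. A sketch with the five historical signposts (the instance is the three-state example of Section 2):

```python
import math

# signpost functions d(a) of the five historical divisor methods
D = {
    "Adams":     lambda a: a,
    "Dean":      lambda a: a * (a + 1) / (a + 0.5),
    "Hill":      lambda a: math.sqrt(a * (a + 1)),
    "Webster":   lambda a: a + 0.5,
    "Jefferson": lambda a: a + 1,
}

def divisor_apportionment(populations, h, d):
    """Sequential form of a divisor method: each of the h seats goes to the
    state with the largest priority p_i / d(a_i).  States with d(a_i) = 0
    (possible when d(0) = 0) have infinite priority and are served first."""
    a = [0] * len(populations)
    for _ in range(h):
        def priority(i):
            s = d(a[i])
            return math.inf if s == 0 else populations[i] / s
        i = max(range(len(populations)), key=priority)
        a[i] += 1
    return a

populations = [235, 333, 432]
for name, d in D.items():
    # e.g. Adams gives [3, 3, 4] and Jefferson gives [2, 3, 5] for this instance
    print(name, divisor_apportionment(populations, 10, d))
```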

3.2 Relaxedly proportional methods

In the history of U.S. apportionment, the Jefferson method was used after each of the first four censuses and was abandoned by Congress because it tends to favor large states over small states. The Adams method


was considered by Congress but was not adopted because it tends to favor small states over large states, in contrast to the Jefferson method. The Dean method has never been used by Congress in the history of apportionment. Recently, this author has developed a class of "relaxedly proportional" methods; see [2] for the details. There he explains why these three methods, i.e., the methods of Adams, Dean and Jefferson, produce apportionments which are not, in some sense, proportional to the populations of the states.

Now consider the following minimization problem:

    minimize ∑_{i=1}^{s} ai²/pi   s.t.   ∑_{i=1}^{s} ai = h and ai ∈ N for all i.
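For small instances this discrete problem can be solved by plain enumeration and compared with the sequential (divisor) computation of the Webster method; a sketch on the three-state example of Section 2:

```python
from itertools import product

def webster_divisor(populations, h):
    """Webster as a divisor method, in sequential form: each seat goes to the
    state with the largest priority p_i / (a_i + 0.5) (signpost d(a) = a + 0.5)."""
    a = [0] * len(populations)
    for _ in range(h):
        i = max(range(len(populations)),
                key=lambda j: populations[j] / (a[j] + 0.5))
        a[i] += 1
    return a

def webster_optimization(populations, h):
    """Webster as discrete optimization: minimize sum a_i^2 / p_i by brute force."""
    best = min((a for a in product(range(h + 1), repeat=len(populations))
                if sum(a) == h),
               key=lambda a: sum(ai * ai / p for ai, p in zip(a, populations)))
    return list(best)

populations = [235, 333, 432]
print(webster_divisor(populations, 10))       # [2, 3, 5]
print(webster_optimization(populations, 10))  # [2, 3, 5] -- same apportionment
```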

Let W denote the set of all optimal solutions a = (a1, . . . , as); then it is well known that W defines the Webster method, see [1]. In other words, any apportionment of h produced by the Webster method minimizes ∑i ai²/pi subject to ∑i ai = h and ai ∈ N for all i, while any optimal solution a = (a1, . . . , as) to the minimization problem above is an apportionment of h under the Webster method. It should be noticed here that the Webster method, which is a divisor method, can thus also be defined through a discrete optimization problem.

Next consider its continuous relaxation, minimizing

    ∑_{i=1}^{s} ai²/pi   s.t.   ∑_{i=1}^{s} ai = h and ai ∈ R+ for all i,

where R+ denotes the set of positive real numbers. Then it is clear that there exists some λ > 0 such that (ai²/pi)′ = 2(ai/pi) = λ for all i at optimality, which means that ai is proportional to pi at optimality. In other words, ai = (λ/2)pi = (h/p∗)pi = qi at optimality. The Webster method is therefore said to be relaxedly proportional. In general, if an apportionment method can be described in the form of a discrete optimization problem and its continuous relaxation has an optimal solution identical to the vector of quotas, i.e., a = q, then the method is called relaxedly proportional.

Similarly, the Hill method is obtained by minimizing

    ∑_{i=1}^{s} pi²/ai   s.t.   ∑_{i=1}^{s} ai = h and ai ∈ N+ for all i,

where N+ denotes the set of positive integers. This author has proposed using the following three relaxedly proportional methods instead of the methods of Adams, Dean and Jefferson, which were shown not to be relaxedly proportional, see [2]:

• the Theil-Schrage (T&S for short) method with d(0) = 0 and d(a) = 1/log((a + 1)/a) for all integers a ≥ 1, which is obtained by maximizing

    ∑_{i=1}^{s} pi log ai   s.t.   ∑_{i=1}^{s} ai = h and ai ∈ N+ for all i.

• the Theil method with d(0) = 1/e ≈ 0.37 and d(a) = (1/e)(a + 1)^{a+1}/a^a for all integers a ≥ 1,

Table 2. Expected numbers of states violating the rounding-off constraints according to the 2000 through 1960 censuses.

             Hill    T&S    Theil  Webster  "1/3"
    2000     2.014   1.669  1.377  1.213    1.409
    1990     2.049   1.700  1.389  1.212    1.419
    1980     2.089   1.724  1.404  1.210    1.415
    1970     2.216   1.743  1.416  1.213    1.439
    1960     2.317   1.867  1.469  1.212    1.409
    means    2.137   1.741  1.411  1.212    1.418

which is obtained by minimizing

    ∑_{i=1}^{s} ai log(ai/pi)   s.t.   ∑_{i=1}^{s} ai = h and ai ∈ N for all i.

• the "1/3" method with d(a) = √(a² + a + 1/3) for all integers a ≥ 0, which is obtained by minimizing

    ∑_{i=1}^{s} ai³/pi²   s.t.   ∑_{i=1}^{s} ai = h and ai ∈ N for all i.

The "new five" are defined to be the methods of Hill, T&S, Theil, Webster and "1/3." They are not only divisor methods but also relaxedly proportional. See [3, 4] for the ancestors of the methods of Theil and T&S.
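The signposts of the new five can be checked numerically against the divisor-function requirements of Section 3.1 (a sketch; the tested range of a is an arbitrary choice):

```python
import math

# signposts d(a) of the "new five" (special values at a = 0 handled explicitly)
def d_hill(a):    return math.sqrt(a * (a + 1))
def d_ts(a):      return 0.0 if a == 0 else 1.0 / math.log((a + 1) / a)
def d_theil(a):   return 1.0 / math.e if a == 0 else (a + 1) ** (a + 1) / (math.e * a ** a)
def d_webster(a): return a + 0.5
def d_third(a):   return math.sqrt(a * a + a + 1.0 / 3.0)

# every signpost must be strictly increasing and lie between a and a + 1
for d in (d_hill, d_ts, d_theil, d_webster, d_third):
    for a in range(50):
        assert a <= d(a) <= a + 1
        assert d(a) < d(a + 1)
print("all five signposts satisfy a <= d(a) <= a + 1")
```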

4. Violating the rounding-off constraints

The purpose of this section is to study, on average, how many of the s states violate their rounding-off constraints, i.e., |ai − qi| ≤ 0.5, under apportionments of h produced by the new five methods.

First, according to the 2000 census, fix an apportionment of the 435 Representatives among the 50 states produced by each of the new five methods. Note here that the Hill method produces the existing apportionment according to the 2000 census.

Let method M define an apportionment a = a(M) and a divisor x = x(a(M)), and let the random population Pi be uniformly distributed on the interval

    d(ai(M) − 1) x(a(M)) ≤ Pi ≤ d(ai(M)) x(a(M));

then the apportionment method M gives the same apportionment a(M) for the populations P1, . . . , Ps as for the actual populations according to the 2000 census.

To avoid the unrealistic assumption of very small states, assume, in estimating the total number of states violating the rounding-off constraints, that no state's quotient is less than 0.5. In other words, the random population of each state is assumed to be uniformly distributed on the interval

    max{0.5, d(ai(M) − 1)} x(a(M)) ≤ Pi ≤ d(ai(M)) x(a(M)).
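The sampling scheme above can be sketched as follows; the populations and the restriction to the Webster signpost are illustrative assumptions, not the census data of the paper:

```python
import random

def webster(populations, h):
    """Webster divisor method in sequential form (signpost d(a) = a + 0.5)."""
    a = [0] * len(populations)
    for _ in range(h):
        i = max(range(len(populations)),
                key=lambda j: populations[j] / (a[j] + 0.5))
        a[i] += 1
    return a

def expected_violations(populations, h, trials=10_000, seed=0):
    """Monte Carlo estimate of the expected number of states violating
    |a_i - q_i| <= 0.5, resampling each population uniformly on the
    interval max{0.5, d(a_i - 1)} x <= P_i <= d(a_i) x, which leaves
    the apportionment a unchanged."""
    d = lambda k: k + 0.5                     # Webster signpost
    a = webster(populations, h)
    # recover a feasible divisor x: d(a_i - 1) <= p_i / x <= d(a_i) for all i
    x_lo = max(p / d(ai) for p, ai in zip(populations, a))
    x_hi = min(p / d(ai - 1) for p, ai in zip(populations, a) if ai > 0)
    x = 0.5 * (x_lo + x_hi)
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        P = [rng.uniform(max(0.5, d(ai - 1)) * x, d(ai) * x) for ai in a]
        q = [h * p / sum(P) for p in P]       # quotas of the random populations
        total += sum(1 for ai, qi in zip(a, q) if abs(ai - qi) > 0.5)
    return total / trials

populations = [235, 333, 432, 678, 912]       # illustrative, not census data
print(webster(populations, 25))               # [2, 3, 4, 7, 9]
print(expected_violations(populations, 25))   # Monte Carlo estimate
```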

One million instances are generated for each of the new five methods, and the average number of states among the 50 which violate the rounding-off constraints is estimated for each method. In addition, the same simulation is run for each of the 1990 through 1960 censuses, see Table 2.

The results of these simulations show that the average number of states whose numbers of Representatives satisfy ai = [qi]0.5 under the Webster method is about 48.8,


Table 3. Expected numbers of states violating the rounding-off constraints according to the 1950 through 1920 censuses.

                   Hill    T&S    Theil  Webster  "1/3"
    1950           2.138   1.751  1.408  1.177    1.331
    1940           2.134   1.750  1.406  1.177    1.330
    1930           1.980   1.651  1.358  1.176    1.361
    1920           2.010   1.666  1.316  1.178    1.349
    means          2.066   1.705  1.372  1.177    1.343
    50/48 times    2.151   1.776  1.429  1.226    1.399

Table 4. Expected numbers of states violating the modified rounding-off constraints according to the 2000 through 1960 censuses.

             Hill    T&S    Theil  Webster  "1/3"
    2000     1.837   1.585  1.374  1.255    1.483
    1990     1.861   1.603  1.380  1.257    1.496
    1980     1.914   1.627  1.381  1.225    1.494
    1970     2.020   1.638  1.401  1.284    1.551
    1960     2.133   1.755  1.428  1.229    1.461
    means    1.953   1.642  1.393  1.250    1.497

Table 5. Expected numbers of states violating the modified rounding-off constraints according to the 1950 through 1920 censuses.

                   Hill    T&S    Theil  Webster  "1/3"
    1950           1.993   1.661  1.366  1.164    1.341
    1940           1.990   1.659  1.364  1.165    1.400
    1930           1.809   1.556  1.336  1.195    1.410
    1920           1.883   1.587  1.302  1.196    1.395
    means          1.919   1.616  1.342  1.180    1.387
    50/48 times    1.999   1.683  1.398  1.229    1.444

while that under the Hill method is about 47.9.

Next, according to the 1950 through 1920 censuses, fix an apportionment of the 435 Representatives among the 48 states produced by each of the new five methods. (Note that Alaska and Hawaii became the 49th and 50th states of the United States in 1959.) The same procedure is repeated, see Table 3. For easy comparison with Table 2, each entry on the last line shows the value of the respective entry on the second-to-last line multiplied by 50/48. The numbers are similar to those in Table 2.

The U.S. Constitution also requires that "each State

shall have at least one Representative" (see U.S. Constitution, Art. 1, Sec. 2). Since this requirement favors extremely small states, it might be better to modify the quota qi = hpi/p∗ of each state i, changing it into the modified quota q̃i = max{1, θqi}, where θ satisfies

    ∑_{i=1}^{s} max{1, θqi} = h.

In other words, the quotas of all states are reduced proportionally but never reduced to less than one. If the quota qi is replaced by the modified quota q̃i for each state i, then the rounding-off constraint |ai − qi| ≤ 0.5 should be replaced by the modified rounding-off constraint |ai − q̃i| ≤ 0.5.

Simulations are performed according to this modification; Tables 4 and 5 present the results. They show that the expected numbers of states violating the constraints under the methods of Hill, T&S and Theil decrease slightly, while those under the methods of Webster and "1/3" increase by almost as much. The difference between the methods of Hill and Webster therefore shrinks a little.

5. Conclusions

As is generally admitted, the debate over the proper method of apportionment narrows down to which is better, Webster's or Hill's method. Using the numerical results of Table 2, the probability that one state satisfies its rounding-off constraint under the method of Webster is about 97.58%, while that under the method of Hill is about 95.73%; the Webster method wins by only 1.85 points.

In this article, the rounding-off constraints are proposed as a criterion to identify which method is superior to the others. Although this identification is limited to the new five methods (the methods of Hill, T&S, Theil, Webster and "1/3"), they include the two leading methods, namely Webster's and Hill's, and satisfy the most telling properties in the apportionment problem, see [2, 5]; hence the limitation seems reasonable.

In the end, the Webster method turned out to produce almost the same apportionment as that obtained by rounding off all the quotas of the states in the usual way. This is one of the most important properties which a proper apportionment method should have. From this standpoint we can say that the Webster method is better than any other method discussed in this article.
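The quoted percentages follow directly from the "means" row of Table 2, each mean being an expected count of violating states out of 50:

```python
# means row of Table 2 (expected violating states out of 50)
webster_mean, hill_mean = 1.212, 2.137

p_webster = (1 - webster_mean / 50) * 100   # per-state satisfaction probability
p_hill = (1 - hill_mean / 50) * 100

print(round(p_webster, 2))            # 97.58
print(round(p_hill, 2))               # 95.73
print(round(p_webster - p_hill, 2))   # 1.85
```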

References

[1] M. L. Balinski and H. P. Young, Fair Representation, Yale Univ. Press, New Haven, 1982.
[2] T. Ichimori, New apportionment methods and their quota property, JSIAM Letters, 2 (2010), 33–36.
[3] H. Theil, The desired political entropy, Amer. Polit. Sci. Rev., 63 (1969), 521–525.
[4] H. Theil and L. Schrage, The apportionment problem and the European Parliament, Eur. Econ. Rev., 9 (1977), 247–263.
[5] M. L. Balinski, The problem with apportionment, J. Oper. Res. Soc. Jpn, 36 (1993), 134–148.


JSIAM Letters Vol.3 (2011) pp.25–28 ©2011 Japan Society for Industrial and Applied Mathematics

Traveling wave solutions to the nonlinear evolution equation for the risk preference

Naoyuki Ishimura1 and Sakkakom Maneenop1

1 Graduate School of Economics, Hitotsubashi University, Kunitachi, Tokyo 186-8601, Japan

E-mail: ishimura@econ.hit-u.ac.jp, ed101005@g.hit-u.ac.jp

Received October 6, 2010, Accepted January 29, 2011

Abstract

A singular nonlinear partial differential equation (PDE) is introduced, which can be inter-preted as the evolution of the risk preference in the optimal investment problem under therandom risk process. The unknown quantity is related to the Arrow-Pratt coefficient of relativerisk aversion with respect to the optimal value function. We show the existence of monotonetraveling wave solutions and the nonexistence of non-monotone such solutions, which aresuitable from the standpoint of financial economics.

Keywords optimal economic behavior, Arrow-Pratt coefficient of relative risk aversion, risk preference, singular nonlinear partial differential equation, traveling wave solutions

Research Activity Group Mathematical Finance

1. Introduction

In this article we propose a singular nonlinear partial differential equation (PDE) which is derived from the Hamilton-Jacobi-Bellman (HJB) equation for the value function in the optimal investment problem. We recall that optimal behavior within continuous-time economic environments has been an intensive area of research and that many models have already been introduced within the stochastic control framework. The analysis is then often reduced to the treatment of the HJB equation for the value function. However, the HJB equation is typically fully nonlinear and hard to solve; it may not be an exaggeration to say that all we can do is merely guess a shape of solution and manage to arrange the parameters. See for instance [1].

We here propose a different approach and derive a singular quasilinear PDE from the HJB equation. Although the essential difficulties are equivalent to those expressed by the HJB equation, the derived PDE is rather simple looking when viewed from the theory of nonlinear PDE. Moreover, the unknown quantity is related to the Arrow-Pratt coefficient of relative risk aversion [2] with respect to the optimal value function. In this sense our PDE may be interpreted as the characteristic equation for the risk structure of the model. We do not insist that our PDE should replace the HJB equation itself, but we at least believe that the study of this PDE is interesting as well as important.

The equation is related to our previous work [3, 4], which is concerned with the evolution of the risk preference whose unknown quantity is related to the Arrow-Pratt coefficient of "absolute" risk aversion. The current equation is formulated in terms of the "relative" risk aversion, which is much more popular in financial economics.

The main purpose of this article is to prove the existence of monotone traveling wave solutions to this PDE. The solutions can be interpreted positively from the viewpoint of financial economics. In addition, we show the nonexistence of non-monotone traveling wave solutions, by which we refer to those whose derivative changes sign several times. This observation is also welcome as a financial concept.

We here perform an analytical study. A numerical investigation, in particular for the monotone traveling wave solution, is attempted in [5]. See also [6–9].

The organization of the paper is as follows. In Section 2 we recall the model and introduce our PDE. Sections 3 and 4 are devoted to proving the existence of monotone traveling wave solutions and the nonexistence of non-monotone traveling wave solutions, respectively. We conclude with discussions in Section 5.

2. Model

Here we briefly review our model. Suppose that the wealth Xt at time t (≥ 0) of the company is subject to a fluctuating process, and the company wants to invest in one risky stock. We assume that the price Pt of the stock available for investment is governed by a stochastic differential equation of Black-Scholes-Merton type [10, 11], dPt = Pt(μ dt + σ dW_t^(1)), where μ and σ are constants and {W_t^(1)}_{t≥0} is a standard Brownian motion. The fluctuating process, which directly affects the wealth of the company, is denoted by Yt and is assumed to evolve as dYt = α dt + β dW_t^(2), where α and β (β > 0) are constants and {W_t^(2)}_{t≥0} is another standard Brownian motion. These two Brownian motions are allowed to be correlated, with correlation coefficient ρ (0 ≤ |ρ| < 1).

The investment policy f = {ft}_{0≤t≤T} of the company is a suitable admissible adapted control process. Here T


stands for the maturity date. The stochastic process of the wealth X^f_t of the company is then assumed to be expressed as

    dX^f_t / X^f_t = ft dPt/Pt + dYt
                   = (ft μ + α) dt + ft σ dW_t^(1) + β dW_t^(2),
    X^f_0 = x ∈ R.
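For intuition, the wealth dynamics can be simulated with a basic Euler-Maruyama scheme; all parameter values and the constant policy f below are illustrative assumptions, not taken from this paper:

```python
import math
import random

def simulate_wealth(x0=1.0, f=0.5, mu=0.08, sigma=0.2,
                    alpha=0.02, beta=0.1, rho=0.3,
                    T=1.0, n=1000, seed=0):
    """Euler-Maruyama scheme for
       dX/X = (f*mu + alpha) dt + f*sigma dW1 + beta dW2,
    where corr(dW1, dW2) = rho."""
    rng = random.Random(seed)
    dt = T / n
    x = x0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        dw1 = math.sqrt(dt) * z1              # correlated Brownian increments
        dw2 = math.sqrt(dt) * z2
        x += x * ((f * mu + alpha) * dt + f * sigma * dw1 + beta * dw2)
    return x

print(simulate_wealth())  # one terminal wealth sample
```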

Suppose that the company aims to maximize the utility u(x) of its terminal wealth. The utility function u(x) is customarily assumed to satisfy u′ > 0 and u″ < 0. Let

    V(x, t) := sup_f E[u(X^f_T) | X^f_t = x].    (1)

The Hamilton-Jacobi-Bellman equation for the value function (1) then becomes

    sup_f A^f V(x, t) = 0,  V(x, T) = u(x),    (2)

where the generator A^f is given by

    (A^f g)(x, t) := ∂g/∂t + (fμ + α)x ∂g/∂x + (1/2)(f²σ² + β² + 2βσρf)x² ∂²g/∂x².

Suppose that (2) has a classical solution V with ∂V/∂x > 0 and ∂²V/∂x² < 0. We then discover that the optimal policy {f*_t}_{0≤t≤T} is

    f*_t = −(μ/σ²) (∂V/∂x) / (x ∂²V/∂x²) − βρ/σ.    (3)

Substituting (3) back into (2), we obtain

    0 = ∂V/∂t + (α − βρμ/σ) x ∂V/∂x − (μ²/(2σ²)) (∂V/∂x)² / (∂²V/∂x²)
        + (1/2) β²(1 − ρ²) x² ∂²V/∂x²   for 0 < t < T,
    V(T, x) = u(x).    (4)

Let τ := 2^{-1}(1 − ρ²)β²(T − t) and put V(x, τ) = V(x, t) by abuse of notation; we find that

    ∂V/∂τ = x² ∂²V/∂x² − a² (∂V/∂x)² / (∂²V/∂x²) − b x ∂V/∂x,
    V(x, 0) = u(x),    (5)

where we have set

    a² := μ² / ((1 − ρ²)β²σ²),  b := 2(ρμβ − ασ) / ((1 − ρ²)β²σ).

Eq. (5) is of fully nonlinear parabolic type [8].

Now we define

    r(x, τ) := −x (∂²V/∂x²) / (∂V/∂x) = −x (∂/∂x) log |∂V/∂x (x, τ)|,    (6)

which extends the Arrow-Pratt coefficient of relative risk aversion for the utility function. Here we note that r is introduced with respect to the optimal value function. A similar transformation is considered in [12], where the transformation −Vx/Vxx is employed.

Following [13], we make the change of variables x = e^y (y = log x) and put r(y, τ) = r(x, τ); we infer that

    ∂r/∂τ = (∂²/∂y² + ∂/∂y)(r − a²/r) − (2r + b) ∂r/∂y
    for −∞ < y < ∞, τ > 0.    (7)

In the following two sections, we prove the existence of monotone traveling wave solutions and the nonexistence of non-monotone solutions to (7).
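As a quick sanity check on definition (6): at τ = 0 we have V = u, and for the CRRA utility u(x) = x^(1−γ)/(1−γ) (an illustrative choice, not taken from the paper) the coefficient r equals γ for every x. A finite-difference sketch:

```python
def relative_risk_aversion(V_x, x, h=1e-6):
    """r = -x * V_xx / V_x, with V_xx approximated by a central difference."""
    V_xx = (V_x(x + h) - V_x(x - h)) / (2 * h)
    return -x * V_xx / V_x(x)

gamma = 3.0
marginal = lambda x: x ** (-gamma)   # V_x = u'(x) for u(x) = x**(1 - gamma)/(1 - gamma)

print(relative_risk_aversion(marginal, 2.0))   # approximately 3.0, independent of x
```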

3. Monotone traveling wave solution

For a standard risk-averse investor, the coefficient of relative risk aversion is expected to be non-increasing [14]. In addition, it is easy to see that every constant function satisfies (7). We thus seek a traveling wave solution r = r(y − vτ) with the properties

    r′(y) < 0 for −∞ < y < ∞,
    r(y) → r− as y → −∞,  r(y) → r+ as y → ∞,    (8)

where r− > r+ > 0 are prescribed constants and the wave speed v ∈ R is to be determined later.

Putting r(y, τ) = r(y − vτ) into (7), we derive the ordinary differential equation

    −vr′ = (r − a²/r)″ + (r − a²/r)′ − (r² + br)′,    (9)

where r = r(y) and ′ = d/dy. Integrating once, we obtain

    (r − a²/r)′ + r − a²/r − r² − br + vr = C.    (10)

Here C denotes a constant, and from the boundary conditions (8) we deduce that

    C = r−r+ − ((r− + r+)/(r−r+)) a²,
    v = r− + r+ − a²/(r−r+) + b − 1.    (11)

Eq. (10) can be written in the separable form

    (r² + a²) / (r [r³ + (b − v − 1)r² + Cr + a²]) dr = dy.    (12)

We define f(r) := r³ − (v + 1 − b)r² + Cr + a², which is the factor in the denominator of (12). The condition (11) implies that f(r−) = f(r+) = 0. Since f(0) = a² > 0, the solution r of (10) which fulfills (8) is constructed implicitly through the integration of (12) on the interval


r ∈ (r+, r−) and ∞ > y > −∞, provided the prescribed constants r− > r+ (> 0) are realized as positive real numbers. We now examine this criterion. Taking account of f′(r) = 3r² − 2(v + 1 − b)r + C, we learn that the conditions are

    (i)   (v + 1 − b)² − 3C > 0,
    (ii)  v + 1 − b > 0,
    (iii) f((v + 1 − b + √((v + 1 − b)² − 3C))/3) < 0.

In view of (11), condition (i) reduces to

    r−² − r−r+ + r+² + a²((r− + r+)/(r−r+) + a²/(r−²r+²)) > 0,

which is true for r− > r+ > 0. Condition (ii) results in r− + r+ − a²r−⁻¹r+⁻¹ > 0, which should be imposed beforehand. Finally, condition (iii) becomes, after a tedious calculation,

    8a⁴ < 2(r− + r+)(r−² + r−r+ + r+²)a²
          + ((r− + r+)²/(r−²r+²))(r−² + r+²)a⁴
          + (2(r− + r+)/(r−³r+³))(r− − r+)²a⁶
          + ((r− − r+)²/(r−⁴r+⁴))a⁸ + r−²r+²(r− − r+)².

Since (r− + r+)²(r−² + r+²)/(r−²r+²) ≥ 8, which follows from (r− + r+)² ≥ 4r−r+ and r−² + r+² ≥ 2r−r+, this requirement is always satisfied.

To summarize, we have completed the proof of the next theorem.

Theorem 1  For any r− > r+ > 0 satisfying r−r+(r− + r+) > a², there exists a traveling wave solution r = r(y − vτ) to (7) with v = r− + r+ − a²/(r−r+) + b − 1 such that

    r′(y) < 0 for −∞ < y < ∞,

and r(y) → r± as y → ±∞, respectively.

4. Nonexistence of non-monotone traveling wave solutions

In this section we make the elementary observation that there exist no non-monotone traveling wave solutions to (7). Here, by a non-monotone solution, we mean a traveling wave solution whose derivative changes sign several times. For example, a solution r = r(y) to (9) with r′(y) > 0 on −∞ < y < l0 and r′(y) < 0 on l0 < y < ∞ for some l0 ∈ R is referred to as a "one-pulse" solution; r = r(y) with r′(y) < 0 on −∞ < y < l0 and on l2i−1 < y < l2i (i = 1, 2, . . . , m), and r′(y) > 0 on l2i < y < l2i+1 (i = 0, 1, . . . , m − 1) and on l2m < y < ∞, for some −∞ < l0 < l1 < · · · < l2i−1 < l2i < · · · < l2m < ∞, is referred to as an "(m+1)-bump" solution. We remark that a similar nonexistence result holds for solutions whose derivative changes sign an even number of times.

The proof proceeds as follows. Suppose the solution

r = r(y) to (9) changes the sign of its derivative, and let r′(l0) = 0 for some l0 ∈ R. The ODE (9) is equivalent to the first-order system

    d/dy (r, r′) = ( r′,  r′[(2r + b − v)(1 + a²/r²)^{-1} − 1 + (2a²/r³)(1 + a²/r²)^{-1} r′] ).

Since this system is regular at (r(l0), r′(l0)) = (r(l0), 0), and r(y) ≡ r(l0) solves the system, the uniqueness theorem for ODEs implies that the solution r = r(y) must be this constant function. This is a contradiction, and we obtain the next theorem.

Theorem 2  There exists no traveling wave solution r = r(y − vτ) (v ∈ R) to (7) such that r′ changes sign.

5. Discussions

We have introduced a singular quasilinear parabolic equation for the risk preference. The unknown function is related to the coefficient of relative risk aversion with respect to the value function in the optimal investment problem. We established the existence of monotone traveling wave solutions and the nonexistence of non-monotone traveling wave solutions. Since the coefficient of relative risk aversion is claimed to be nonincreasing, our existence theorem for monotone solutions, as well as the nonexistence theorem for non-monotone solutions, is welcome from the standpoint of financial economics.

The nonexistence theorem for non-monotone solutions corresponds perfectly with economic theory, where the coefficient of risk aversion is clearly always nonnegative. The existence of monotone solutions, however, although it yields nonnegative wave solutions, casts doubt on what happens in the markets. Along the traveling wave solution, as the maturity gets closer, the solution decreases. This means that a company (or an individual) is less risk averse (recall that such a solution is determined as the coefficient of relative risk aversion). We can infer that the company is less risk averse in short-term investment and more risk averse in long-term investment. This is, however, counterintuitive in the general case, where it should be the opposite: in brief, long-term investors tend to be less risk averse than short-term investors, as annualized volatilities of returns on some assets are lower over longer horizons.

Nevertheless, we may interpret this counterintuitive property as describing a special case. For example, when an economy has been stable for a long period or is in the process of recovering from its trough, it seems that an individual or a company will be quite cautious about its longer-term investment strategy. The company is thus less risk averse for short-term investment (when it predicts that markets are stable) and more risk averse for long-term investment (when it forecasts that markets will be more volatile).

Also, as to the derived equation (7) itself, there certainly remain many open questions. For instance, a general existence theorem is an interesting problem, which is worth further research.


Acknowledgments

We are grateful to the referee for various precious comments, which helped in improving the manuscript. The first author (NI) is partially supported by Grant-in-Aid for Scientific Research (C) No. 21540117 from the Japan Society for the Promotion of Science (JSPS).

References

[1] T. Bjork, Arbitrage Theory in Continuous Time, 2nd ed., Oxford Univ. Press, Oxford, 2004.
[2] J. W. Pratt, Risk aversion in the small and in the large, Econometrica, 32 (1964), 122–136.
[3] R. Abe and N. Ishimura, Existence of solutions for the nonlinear partial differential equation arising in the optimal investment problem, Proc. Jpn Acad., Ser. A, 84 (2008), 11–14.
[4] N. Ishimura and K. Murao, Nonlinear evolution equations for the risk preference in the optimal investment problem, Paper presented at AsianFA/NFA 2008 Int. Conf. in Yokohama, http://fs.ics.hit-u.ac.jp/nfa-net/.
[5] N. Ishimura, M. N. Koleva and L. G. Vulkov, Numerical solution of a nonlinear evolution equation for the risk preference, Lect. Notes Comp. Sci., Vol. 6046, pp. 445–452, 2011.
[6] N. Ishimura and H. Imai, Global in space numerical computation for the nonlinear Black-Scholes equation, in: Nonlinear Models in Mathematical Finance: New Research Trends in Option Pricing, M. Ehrhardt ed., Nova Science Publishers, Inc., New York, pp. 219–242, 2008.
[7] N. Ishimura, M. N. Koleva and L. G. Vulkov, Numerical solution via transformation methods of nonlinear models in option pricing, in: AIP Conf. Proc., Volume 1301, pp. 387–394, 2010.
[8] M. N. Koleva and L. G. Vulkov, Quasilinearization numerical scheme for fully nonlinear parabolic problems with applications in models of mathematical finance, preprint.
[9] M. N. Koleva and L. G. Vulkov, Fast two-grid algorithms for solutions of the difference equations of nonlinear Black-Scholes equations, preprint.
[10] F. Black and M. Scholes, The pricing of options and corporate liabilities, J. Polit. Econ., 81 (1973), 637–654.
[11] R. C. Merton, Theory of rational option pricing, Bell J. Econ. Manag. Sci., 4 (1973), 141–183.
[12] L. Songzhe, Existence of solutions to initial value problem for a parabolic Monge-Ampere equation and application, Nonlinear Anal., 65 (2006), 59–78.
[13] Z. Macova and D. Sevcovic, Weakly nonlinear analysis of the Hamilton-Jacobi-Bellman equation arising from pension saving management, Int. J. Numer. Anal. Model., 7 (2010), 619–638.
[14] A. Mas-Colell, M. D. Whinston and J. R. Green, Microeconomic Theory, Oxford Univ. Press, Oxford, 1995.



JSIAM Letters Vol.3 (2011) pp.29–32 ©2011 Japan Society for Industrial and Applied Mathematics

Approximation algorithms for a winner determination problem of single-item multi-unit auctions

Satoshi Takahashi1 and Maiko Shigeno1

1 Graduate School of System and Information Systems, University of Tsukuba, Tsukuba, Ibaraki 305-8573, Japan

E-mail stakahashi sk.tsukuba.ac.jp

Received September 30, 2010, Accepted January 13, 2011

Abstract

This paper treats a winner determination problem of a Vickrey-Clarke-Groves (VCG) mechanism based single-item multi-unit auction. For this problem, two simple 2-approximation algorithms are proposed. One is a linear time algorithm using a linear knapsack problem. The other is a greedy type algorithm. In addition, a fully polynomial time approximation scheme based on dynamic programming is described. Computational experiments verify the effectiveness of our algorithms by comparing computational times and approximation ratios.

Keywords auction theory, winner determination, approximation algorithm

Research Activity Group Discrete Systems

1. Introduction

Recent Internet auctions with huge numbers of participants require computing an optimal allocation and payments as quickly as possible. A winner determination problem in auction theory consists of an item allocation problem and a payment determination problem, which depend on the auction mechanism. One of the most desirable auction mechanisms is due to Vickrey, Clarke and Groves, and is called VCG [1]. Throughout this paper, we consider only VCG based auctions. Winner determination problems of VCG based auctions are known to be NP-hard. Therefore, it is important to consider fast approximation algorithms for a winner determination problem in the Internet auction environment.

We treat single-item multi-unit auctions, in which a seller who wants to sell M units of a single item and n bidders participate. Each bidder i submits a set of anchor values {d_i^k | k = 0, ..., ℓ_i} and a set of unit values {e_i^k | k = 1, ..., ℓ_i}, where the anchor values satisfy d_i^{k−1} < d_i^k for any 0 < k ≤ ℓ_i, and e_i^k is the unit value over the half-open range (d_i^{k−1}, d_i^k] of item quantity. Without loss of generality, we assume that d_i^0 = 0 and d_i^{ℓ_i} ≤ M for every bidder i. Let N = {1, ..., n} be the set of bidders and ℓ = Σ_{i∈N} ℓ_i. We define the value function v_i : R_+ → R of bidder i by

    v_i(x) = { e_i^k · x  (d_i^{k−1} < x ≤ d_i^k, k = 1, ..., ℓ_i),
             { 0          (x = d_i^0 or x > d_i^{ℓ_i}).

Our item allocation problem (AP) is to find the quantity x_i that each bidder i receives so that the total valuation is maximized. It is formulated as

    (AP)  maximize  Σ_{i∈N} v_i(x_i)
          subject to  Σ_{i∈N} x_i ≤ M,  x_i ≥ 0  (∀i ∈ N).

Let x⋆ denote an optimal solution of (AP). We say that a solution for (AP) satisfies the "anchor property" if there are at least n − 1 bidders whose allocated quantities are given by their anchor values.

Lemma 1 ([2])  A problem (AP) has an optimal solution satisfying the anchor property when every bidder's unit values are monotone non-increasing in k.
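For concreteness, the piecewise-linear value function v_i can be evaluated directly from the two submitted lists. The snippet below is an illustrative sketch; the argument names `anchors` (for d_i^0, ..., d_i^{ℓ_i}) and `units` (for e_i^1, ..., e_i^{ℓ_i}) are ours, not the paper's.

```python
def value(anchors, units, x):
    """Evaluate v_i(x): e_i^k * x on the half-open range (d_i^{k-1}, d_i^k],
    and 0 at x = 0 or for x beyond the last anchor d_i^{l_i}."""
    if x <= 0 or x > anchors[-1]:
        return 0
    for k in range(1, len(anchors)):
        if anchors[k - 1] < x <= anchors[k]:
            return units[k - 1] * x
    return 0
```

For example, with anchors [0, 2, 5] and units [10, 7], a quantity of 3 falls in the bracket (2, 5] and is valued at 7 · 3 = 21.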

To compute the payment of bidder j, we need to solve the restricted problem that excludes j from the bidders. Let N_{−j} = N \ {j} and let x^{−j} be an optimal solution of the item allocation problem over the set N_{−j}, that is,

    maximize  Σ_{i∈N_{−j}} v_i(x_i)
    subject to  Σ_{i∈N_{−j}} x_i ≤ M,  x_i ≥ 0  (∀i ∈ N_{−j}).

In a VCG based auction, the payment p_j of bidder j is defined by

    p_j = Σ_{i∈N_{−j}} v_i(x_i^{−j}) − Σ_{i∈N_{−j}} v_i(x_i^⋆).    (1)

We now briefly review approximation algorithms for the winner determination problem of single-item multi-unit auctions. With respect to constant-factor approximations for (AP), Kothari, Parkes and Suri [2] proposed a 2-approximation algorithm with O(ℓ²) time for the so-called generalized knapsack problem, which models an item allocation problem in a reverse auction. When their greedy algorithm is applied to (AP) directly, it returns a solution whose approximation ratio may not be bounded by two. Zhou [3] stated that he improved this algorithm to run in O(ℓ log ℓ) time. Moreover, he showed a 3-approximation algorithm with O(ℓ) time and a (9/4)-approximation algorithm with O(ℓ log ℓ) time for the so-called interval multiple-choice knapsack problem, of which (AP) is a special case. According to [3], it is an open problem to compute a 2-approximation of (AP) in linear time. With respect to fully polynomial time approximation schemes (FPTAS) for a winner determination problem, Kothari, Parkes and Suri [2] proposed the first one, which is based on dynamic programming and uses the




anchor property. It finds a solution with approximation ratio at most (1 + ϵ) for (AP) in O(nℓ²/ϵ) time and calculates every bidder's payment in O((nℓ²/ϵ) log(n/ϵ)) time. In order to solve (AP), their algorithm repeatedly fixes a specified bidder j and an index 0 < k ≤ ℓ_j, and solves the problem (AP) with the additional constraint d_j^{k−1} < x_j ≤ d_j^k. This FPTAS was improved by Zhou [3]. His algorithm does not repeatedly compute (AP) with an additional constraint; thus, Zhou's FPTAS solves a winner determination problem in O((nℓ/ϵ) log(n/ϵ)) time. Moreover, by employing a vector merge technique, he states that his algorithm can run in O((nℓ/ϵ) log n) time. However, a solution found by his algorithm may not satisfy the anchor property.

In the next section, we propose two 2-approximation algorithms for solving (AP). One is an O(ℓ) algorithm, which gives a positive answer to the open problem in [3]. The other is an O(ℓ(log n + ℓ_max)) algorithm based on the greedy method improved from [2], where ℓ_max = max_{i∈N} ℓ_i. We also describe an FPTAS for solving a winner determination problem in O((nℓ/ϵ) log(n/ϵ)) time, which finds a solution satisfying the anchor property. Section 3 shows computational experiments comparing the performance of the proposed algorithms.

2. Approximation algorithms

This section proposes simple 2-approximation algorithms for (AP) and an FPTAS, a modified version of Zhou's algorithm [3], for the winner determination problem that obtains a solution satisfying the anchor property.

2.1 2-approximation algorithms for item allocation problems

We propose two 2-approximation algorithms for (AP). One is based on Dyer's polynomial time algorithm [4] for the linear knapsack problem:

    (LKP)  maximize  Σ_{i∈N} Σ_{k=0}^{ℓ_i} (e_i^k d_i^k) y_i^k
           subject to  Σ_{i∈N} Σ_{k=0}^{ℓ_i} d_i^k y_i^k ≤ M,
                       Σ_{k=0}^{ℓ_i} y_i^k = 1  (∀i ∈ N),
                       y_i^k ≥ 0  (∀i ∈ N, ∀k = 0, ..., ℓ_i),

where, for any i ∈ N, e_i^0 is given an arbitrary value.

Lemma 2  The optimal value of (LKP) gives an upper bound on the optimal value of (AP).

Proof  For a feasible solution x of (AP), we can construct a feasible solution y for (LKP) as follows. If there exists an index k_i with d_i^{k_i−1} < x_i ≤ d_i^{k_i}, set

    y_i^k = { x_i/d_i^{k_i}      (k = k_i),
            { 0                  (k > 0, k ≠ k_i),
            { 1 − x_i/d_i^{k_i}  (k = 0),

and otherwise set y_i^0 = 1 and y_i^k = 0 for k ≠ 0. The objective values of these solutions x and y satisfy

    Σ_{i∈N} Σ_{k=0}^{ℓ_i} (e_i^k d_i^k) y_i^k = Σ_{i∈N: 0<x_i≤d_i^{ℓ_i}} e_i^{k_i} x_i = Σ_{i∈N} v_i(x_i).

Hence, the optimal value of (LKP) is not less than the optimal value of (AP).  (QED)

With respect to a feasible solution y of (LKP), we call an index i saturated if y_i^k = 1 holds for some k. It is known that there exists an optimal solution for (LKP) with at most one unsaturated index. Let y* be such an optimal solution and i* be the unsaturated index. From y*, we construct two solutions of (AP) by setting

    x̄_i = { Σ_{k=0}^{ℓ_i} d_i^k y_i^{*k}  (i ≠ i*),
           { 0                             (i = i*),      (2)

and

    x̂_i = { 0                 (i ≠ i*),
           { d_{i*}^{k_{i*}}  (i = i*),                   (3)

where k_i is an index attaining max_{0<k≤ℓ_i} e_i^k d_i^k. Obviously, both solutions x̄ and x̂ are feasible for (AP). Moreover, we have

    Σ_{i∈N} Σ_{k=0}^{ℓ_i} e_i^k d_i^k y_i^{*k}
      = Σ_{i∈N−i*} Σ_{k=0}^{ℓ_i} e_i^k d_i^k y_i^{*k} + Σ_{k=0}^{ℓ_{i*}} e_{i*}^k d_{i*}^k y_{i*}^{*k}
      ≤ Σ_{i∈N} v_i(x̄_i) + Σ_{i∈N} v_i(x̂_i)
      ≤ 2 · max{Σ_{i∈N} v_i(x̄_i), Σ_{i∈N} v_i(x̂_i)}.    (4)

Our approximation algorithm can be described as follows.

Algorithm AA1

Step 1  Find an optimal solution y* of (LKP) with at most one unsaturated index i*.

Step 2  From y*, obtain the two feasible solutions x̄ and x̂ by (2) and (3). If Σ_{i∈N} v_i(x̄_i) ≥ Σ_{i∈N} v_i(x̂_i), then return x̄; otherwise, return x̂.
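Step 2 of AA1 is easy to state in code once an optimal (LKP) solution is available. The sketch below assumes such a solution y* (with its unsaturated index i*) is supplied, e.g. by Dyer's linear-time algorithm, which we do not reproduce here; the dict interface and all names are our own illustrative choices.

```python
def aa1_round(anchors, units, ystar, istar):
    """From an optimal (LKP) solution ystar = {(i, k): y_i^k}, build the two
    candidate allocations of Step 2 and return the better one with its value."""
    n = len(anchors)
    def value(i, x):  # the value function v_i
        if x <= 0 or x > anchors[i][-1]:
            return 0
        for k in range(1, len(anchors[i])):
            if anchors[i][k - 1] < x <= anchors[i][k]:
                return units[i][k - 1] * x
        return 0
    # keep each saturated bidder's quantity sum_k d_i^k y_i^k, drop i*
    xbar = [sum(anchors[i][k] * ystar.get((i, k), 0.0)
                for k in range(len(anchors[i]))) for i in range(n)]
    xbar[istar] = 0
    # serve i* alone with its single most valuable anchor quantity
    xhat = [0] * n
    xhat[istar] = max(anchors[istar][1:], key=lambda d: value(istar, d))
    vbar = sum(value(i, xbar[i]) for i in range(n))
    vhat = sum(value(i, xhat[i]) for i in range(n))
    return (xbar, vbar) if vbar >= vhat else (xhat, vhat)
```

For two bidders with anchors [0, 2] and [0, 3], unit values 10 and 6, and M = 3, an optimal LP vertex saturates bidder 0 and leaves bidder 1 fractional; the rounding keeps bidder 0's allocation.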

Theorem 3  Algorithm AA1 finds a 2-approximation solution for (AP) in O(ℓ) time.

Proof  Lemma 2 and inequality (4) give the approximation ratio. Because (LKP) can be solved in linear time [4], we obtain the stated time complexity.  (QED)

Since Algorithm AA1 runs in linear time, it gives a positive answer to the open problem posed in [3].

We now turn to the other 2-approximation algorithm, which is of greedy type. Our algorithm uses slope functions p_i^k : R → R, for i ∈ N and 0 < k ≤ ℓ_i, given by the gradient of the value function v_i between the currently allocated quantity x and each anchor value d_i^k, i.e.,

    p_i^k(x) = (v_i(d_i^k) − v_i(x))/(d_i^k − x).

We describe our greedy type algorithm as follows.

Algorithm AA2

Step 1  Set x_i = 0 for every i ∈ N.

Step 2  Find a pair (i*, k*) such that p_{i*}^{k*}(x_{i*}) = max{p_i^k(x_i) | i ∈ N, x_i < d_i^k}. If p_{i*}^{k*}(x_{i*}) ≤ 0, then return x; otherwise, update x_{i*} = d_{i*}^{k*}.

Step 3  If Σ_{i∈N} x_i < M, go to Step 2.

Step 4  Make the two solutions x̄ and x̂ by

    x̄_i = { x_i                (i ≠ i*),
           { M − Σ_{j≠i*} x_j  (i = i*),

and

    x̂_i = { 0       (i ≠ i*),
           { x_{i*}  (i = i*).

If Σ_{i∈N} v_i(x̄_i) > Σ_{i∈N} v_i(x̂_i), then return x̄; otherwise, return x̂.

Theorem 4  Algorithm AA2 finds a 2-approximation solution of (AP) in O(ℓ(log n + ℓ_max)) time, where ℓ_max = max_{i∈N} ℓ_i.

Proof  When AA2 stops at Step 2, we can show that the returned solution is optimal for (AP). When AA2 stops at Step 4, let x be the solution at the end of AA2 and M′ = Σ_{i∈N} x_i. It can be shown that the solution x is optimal for

    maximize  Σ_{i∈N} v_i(x_i)
    subject to  Σ_{i∈N} x_i ≤ M′,  x_i ≥ 0  (∀i ∈ N).

Since x⋆ is also feasible for the above problem, we have Σ_{i∈N} v_i(x_i^⋆) ≤ Σ_{i∈N} v_i(x_i). It follows from the definitions of x̄ and x̂ that Σ_{i∈N} v_i(x_i) ≤ Σ_{i∈N} v_i(x̄_i) + Σ_{i∈N} v_i(x̂_i) holds. Thus, we obtain the desired approximation ratio.

It is clear that the number of iterations of AA2 is at most ℓ. If we store max{p_i^k(x_i) | x_i < d_i^k} for all i ∈ N in a heap, Step 2 can be performed in O(log n) time. After Step 2, we need to compute max{p_{i*}^k(x_{i*}) | x_{i*} < d_{i*}^k} for the updated x_{i*}, which takes O(ℓ_{i*}) time. Hence, the total running time is bounded by O(ℓ(log n + ℓ_max)).  (QED)
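Algorithm AA2 can be sketched as follows. For readability this illustrative version (our own code) recomputes the best slope with a plain max scan instead of a heap, so it does not attain the stated time bound, but it returns the same solution.

```python
def aa2(anchors, units, M):
    """Greedy 2-approximation for (AP): repeatedly jump to the anchor value
    with the steepest slope, then (Step 4) truncate the last bidder or serve
    that bidder alone, whichever is worth more."""
    n = len(anchors)
    def value(i, x):
        if x <= 0 or x > anchors[i][-1]:
            return 0
        for k in range(1, len(anchors[i])):
            if anchors[i][k - 1] < x <= anchors[i][k]:
                return units[i][k - 1] * x
        return 0
    x, istar = [0] * n, None
    while sum(x) < M:
        # Step 2: steepest slope p_i^k(x_i) over remaining anchors d_i^k > x_i
        best = None
        for i in range(n):
            for k in range(1, len(anchors[i])):
                d = anchors[i][k]
                if d > x[i]:
                    s = (value(i, d) - value(i, x[i])) / (d - x[i])
                    if best is None or s > best[0]:
                        best = (s, i, d)
        if best is None or best[0] <= 0:
            return x                     # no improving jump: return x
        istar = best[1]
        x[istar] = best[2]
    if sum(x) == M:
        return x
    # Step 4: capacity was overshot by bidder istar
    xbar = list(x)
    xbar[istar] = M - (sum(x) - x[istar])
    xhat = [0] * n
    xhat[istar] = x[istar]
    vb = sum(value(i, xbar[i]) for i in range(n))
    vh = sum(value(i, xhat[i]) for i in range(n))
    return xbar if vb > vh else xhat
```

On the small instance with anchors [0, 2] and [0, 3], unit values 10 and 6, and M = 3, the greedy overshoots to (2, 3) and Step 4 truncates bidder 2 to one unit, returning the optimal allocation (2, 1).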

2.2 FPTAS for winner determination problems

We show a modified version of Zhou's algorithm [3] that finds a solution satisfying the anchor property when every bidder's unit values satisfy e_i^{k−1} ≥ e_i^k for all 1 ≤ k ≤ ℓ_i. Let ϵ > 0 be a relative error and let V be the objective value obtained by a 2-approximation algorithm. We define the scaled value function ṽ_i : R_+ → R of bidder i by

    ṽ_i(x) = ⌊n · v_i(x)/(ϵ · V)⌋.

We denote the item allocation problem over the scaled value functions by (ÃP). For an optimal solution x̃ of (ÃP), we have

    Σ_{i∈N} v_i(x_i^⋆) < Σ_{i∈N} (ϵV/n)(ṽ_i(x_i^⋆) + 1)
                      ≤ Σ_{i∈N} (ϵV/n) ṽ_i(x̃_i) + ϵV
                      ≤ Σ_{i∈N} v_i(x̃_i) + ϵ Σ_{i∈N} v_i(x_i^⋆).

Thus, an optimal solution for (ÃP) is a solution with relative error at most ϵ for (AP). In order to solve (ÃP) by dynamic programming, for two parameters t and r, the value min{Σ_{i=1}^{t} x_i | Σ_{i=1}^{t} ṽ_i(x_i) ≥ r} is stored in G[t, r] and H[t, r], where in G[t, r] each x_i is restricted to an anchor value, while in H[t, r] each x_i except that of at most one bidder is restricted to an anchor value. An optimal solution of (ÃP) is obtained from a solution x^{[n,r*]} attaining H[n, r*], where r* attains max_r {Σ_{i∈N} v_i(x_i^{[n,r]}) | H[n, r] ≤ M}. In order to obtain r*, it is enough to search H[n, r] for r from 0 to ⌊2n/ϵ⌋, since the optimal value of (ÃP) is bounded by ⌊2n/ϵ⌋. Thus, our algorithm finds an optimal solution for (ÃP) by computing H[t, r] recursively, together with G[t, r], for r = 0, ..., ⌊2n/ϵ⌋ and t = 0, ..., n. We initialize G[0, 0] = H[0, 0] = 0 and G[0, r] = H[0, r] = ∞ for any r > 0; for convenience, we set G[t, r] = H[t, r] = 0 for any t and r < 0. Recursively, we can write

    G[t, r] = min{G[t−1, r], min_k (G[t−1, r − ṽ_t(d_t^k)] + d_t^k)}.

Defining m[t, r] by

    m[t, r] = min_{0≤r′≤r} ( min{x_t | V_G[t−1, r′] + ṽ_t(x_t) ≥ r} + G[t−1, r′] ),

where V_G[t−1, r′] is the value Σ_{i=1}^{t−1} ṽ_i(x_i) of a solution x attaining G[t−1, r′], we have the following recurrence for H:

    H[t, r] = min{H[t−1, r], min_k (H[t−1, r − ṽ_t(d_t^k)] + d_t^k), m[t, r]}.

In m[t, r], we can rewrite min{x_t | V_G[t−1, r′] + ṽ_t(x_t) ≥ r} as

    min{x_t | ⌊n e_t^k x_t/(ϵV)⌋ ≥ r − V_G[t−1, r′], d_t^{k−1} < x_t ≤ d_t^k}.    (5)

Since r − V_G[t−1, r′] is an integer, the smallest x_t satisfying the first condition in (5) is given by (r − V_G[t−1, r′]) · (ϵV/n)/e_t^k. Thus (5) is equivalent to

    min_k { (r − V_G[t−1, r′]) · (ϵV/n)/e_t^k | d_t^{k−1} < (r − V_G[t−1, r′]) · (ϵV/n)/e_t^k ≤ d_t^k }.

By using this formula, the values m[t, r] for all r = 1, ..., ⌊2n/ϵ⌋ can be found simultaneously in O((nℓ_t/ϵ) log(n/ϵ)) time. After obtaining the values m[t, r] for all r = 1, ..., ⌊2n/ϵ⌋, we can compute each G[t, r] and H[t, r] in O(ℓ_t) time. Therefore, we obtain all entries of G and H in O((nℓ/ϵ) log(n/ϵ)) time.

Finally, we compute each bidder's payment, defined by (1), by employing the method of Kothari, Parkes and Suri [2]. In their method, all payments can be computed in the same time complexity as that of obtaining G and H.

Theorem 5  Our algorithm finds a solution with relative error at most ϵ for (AP) in O((nℓ/ϵ) log(n/ϵ)) time. It also finds every payment within the same time complexity.

If the vector merge technique of [3] is applied, our algorithm solves a winner determination problem in O((nℓ/ϵ) log n) time.
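The core of the scaling step can be illustrated with a much-simplified dynamic program that keeps only the G-table (every bidder restricted to anchor quantities) and omits the H-table, m[t, r], and the payment computation. The function below is our own illustrative sketch, not the algorithm with the stated complexity.

```python
import math

def anchor_dp(anchors, units, M, eps, V):
    """G[r] = minimum number of units needed to reach scaled value >= r when
    every x_i is an anchor value; returns the largest r with G[r] <= M."""
    n = len(anchors)
    def value(i, q):
        if q <= 0 or q > anchors[i][-1]:
            return 0
        for k in range(1, len(anchors[i])):
            if anchors[i][k - 1] < q <= anchors[i][k]:
                return units[i][k - 1] * q
        return 0
    def scaled(i, q):  # the scaled value floor(n v_i(q) / (eps V))
        return int(math.floor(n * value(i, q) / (eps * V)))
    rmax = int(math.floor(2 * n / eps))  # scaled optimum is bounded by 2n/eps
    INF = float('inf')
    G = [0] + [INF] * rmax
    for i in range(n):
        new = list(G)
        for r in range(rmax + 1):
            for k in range(1, len(anchors[i])):
                d = anchors[i][k]
                # targets r' <= 0 cost nothing, hence the clamp to index 0
                cand = G[max(r - scaled(i, d), 0)] + d
                if cand < new[r]:
                    new[r] = cand
        G = new
    return max(r for r in range(rmax + 1) if G[r] <= M)
```

Each returned scaled value r corresponds to a true value of at least r · ϵV/n; e.g. with anchors [0, 2] and [0, 3], unit values 10 and 6, M = 3, ϵ = 0.5 and V = 20, the result r = 4 corresponds to the anchor-restricted optimum of value 20.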

3. Experimental results

This section shows computational results for the algorithms described in Section 2. All computations were conducted on a personal computer with a Core2 Duo CPU (3.06 GHz) and 4 GB memory. Our code was written in Python 2.6.5. For given numbers of bidders n and units M, all instances used in this experiment were generated using random numbers. The number of anchor values ℓ_i for each bidder i was selected uniformly from the integers in the interval [1, 15]. Every unit value e_i^k and anchor value d_i^k was selected uniformly from the integers in [1, 100] and in [1, M], respectively.

Table 1 shows averages of computational times and




Table 1. Averages of computational times and of approximation ratios of 2-approximation algorithms for ten instances of each (n, M).

    (n, M)         comp. times (sec.)     app. ratios
                   AA1        AA2         AA1     AA2
    (10, 50)       0.00388    0.00101     1.299   1.251
    (10, 100)      0.00314    0.00102     1.458   1.296
    (10, 200)      0.00460    0.00119     1.873   1.093
    (50, 50)       0.01237    0.00392     1.398   1.485
    (50, 100)      0.01430    0.00390     1.356   1.371
    (50, 200)      0.01171    0.00409     1.736   1.244
    (100, 50)      0.02468    0.00755     1.831   1.206
    (100, 100)     0.02401    0.00737     1.566   1.618
    (100, 200)     0.02331    0.00768     1.505   1.195
    (200, 200)     0.04721    0.01440     1.673   1.429
    (400, 200)     0.09666    0.02869     1.492   1.399
    (800, 200)     0.19884    0.05754     1.559   1.316
    (1000, 200)    0.25427    0.06933     1.749   1.639
    (5000, 200)    1.84318    0.34113     1.550   1.328
    (10000, 200)   5.08805    0.72498     1.548   1.609

Table 2. Averages of computational times (sec.) and relative errors of our FPTAS for ten instances with n = 10 and M = 50.

    epsilon   comp. times (sec.)      relative errors
              AA1        AA2          AA1     AA2
    1.0       0.27925    0.66331      0.057   0.038
    0.9       0.32014    0.78129      0.038   0.037
    0.8       0.40650    0.98921      0.041   0.047
    0.7       0.49900    1.20912      0.042   0.034
    0.6       0.67769    1.66427      0.026   0.021
    0.5       0.92052    2.26631      0.037   0.021
    0.4       1.42961    3.50712      0.024   0.001
    0.3       2.54312    6.25542      0.014   0.001
    0.2       5.59242    13.77336     0.012   0.004
    0.1       22.06652   54.52771     0.007   0.003

approximation ratios of the two 2-approximation algorithms, Algorithm AA1 and Algorithm AA2, for ten instances of each size. Algorithm AA1 was implemented so that its time complexity was O(ℓ log ℓ), since we employed a sorting algorithm instead of linear-time median finding in Dyer's algorithm for (LKP). Consistently with the theoretical complexities, the resulting computational times depend on n but not on M. On the other hand, Algorithm AA2 is faster than Algorithm AA1 in average time, because our instances do not seem to produce worst cases. The approximation ratios of both algorithms seem not to be affected by the sizes of n and M. In our results, Algorithm AA2 tended to have a better average approximation ratio than Algorithm AA1; this tendency was influenced by a few solutions with bad approximation ratios. At the end of both Algorithm AA1 and Algorithm AA2, they choose a solution x̄ or x̂. Among the 150 instances of this experiment, Algorithm AA1 returned the solution x̂ in 89 instances and Algorithm AA2 returned x̂ in 23 instances. Indeed, the solution x̂, especially when returned by Algorithm AA2, did not have a very good approximation ratio, because it allocates almost all units to only one bidder. This fact seems to affect the evaluation of the approximation ratios.

The second experiment evaluated the behavior of our FPTAS described in Subsection 2.2 for solving the problem (ÃP). We investigated the influence of a given relative error ϵ on computational times and on the obtained relative errors. In addition, we compared the performance of our FPTAS when the value V is given by Algorithm AA1 and by Algorithm AA2, respectively. Table 2 shows averages of computational times and of relative errors for ten instances with n = 10 and M = 50. In our results, while the case using Algorithm AA1 spent less time on computation than the case using Algorithm AA2, the latter frequently returned solutions with better relative errors than the former, which derives from the fact that Algorithm AA2 tended to return a greater value V than Algorithm AA1. Although the theoretical complexity does not depend on V, there is a difference in average computational times between the cases using Algorithm AA1 and Algorithm AA2; this difference comes from the computational time of each m[t, r]. Using Algorithm AA2, our FPTAS returned almost optimal solutions when ϵ is less than 0.4. However, the returned allocations were different from the optimal ones.

4. Concluding remarks

For winner determination problems of a VCG based single-item multi-unit auction, we proposed two 2-approximation algorithms for item allocation problems. One runs in linear time, which gives a positive answer to the open problem in [3]. The other does not run in linear time, but it computes quickly in our experiments. We also discussed an FPTAS, which returns an approximate solution satisfying the anchor property.

When some bidders know all bids and can compute optimal allocations and payments, they may not approve an approximate solution whose allocations and payments are entirely different from the optimal ones. To make approximate solutions acceptable in real auctions, we need some rules about allocations. For instance, Fukuta and Ito [5] discussed a rule under which bidder j is allocated no more than the allocation of bidder i if v_i(d) > v_j(d) for some anchor value d. It is future work to develop an approximation algorithm for finding an allocation satisfying this rule.

References

[1] P. Milgrom, Putting Auction Theory to Work, Cambridge Univ. Press, 2004.
[2] A. Kothari, D. C. Parkes and S. Suri, Approximately-strategyproof and tractable multi-unit auctions, Decision Support Systems, 39 (2005), 105–121.
[3] Y. Zhou, Improved multi-unit auction clearing algorithms with interval (multiple-choice) knapsack problems, in: Proc. of 17th Int. Sympo. on Algorithms and Computation, pp. 494–506, 2006.
[4] M. E. Dyer, An O(n) algorithm for the multiple-choice knapsack linear program, Math. Program., 29 (1984), 57–63.
[5] N. Fukuta and T. Ito, An analysis about approximated allocation algorithms of combinatorial auctions with large numbers of bids (in Japanese), IEICE Trans. D, J90-D (2007), 2324–2335.



JSIAM Letters Vol.3 (2011) pp.33–36 ©2011 Japan Society for Industrial and Applied Mathematics

On the new family of wavelets interpolating to the Shannon wavelet

Naohiro Fukuda1 and Tamotu Kinoshita1

1 Institute of Mathematics, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki 305-8571, Japan

E-mail naohiro-f math.tsukuba.ac.jp

Received November 17, 2010, Accepted January 6, 2011

Abstract

There are various types of orthogonal wavelet families with order parameters. In this paper we introduce a new family of wavelets which converges in L^q to the Shannon wavelet as the order parameter n increases. In particular, we shall give a symmetric orthogonal scaling function whose time-bandwidth product is near 1/2.

Keywords Shannon wavelet, Haar wavelet, Battle-Lemarie wavelet

Research Activity Group Wavelet Analysis

It is well-known that the limit of the B-spline family is the Gaussian function, which achieves the smallest time-bandwidth product permitted by the uncertainty principle, i.e., 1/2 (see [1, 2]). The B-spline and the Gaussian function, however, are not orthogonal to their translates. On the other hand, there are various types of orthogonal wavelet families with order parameters, e.g., the Battle-Lemarie wavelet, the Daubechies wavelet and the Stromberg wavelet (see [3–7]). In particular, [8] showed that the Battle-Lemarie wavelet of order n converges to the Shannon wavelet as n tends to infinity. Let us denote the low pass filter, the scaling function and the wavelet of the Battle-Lemarie family by m_n^{BL}(ξ), φ_n^{BL}(x) and ψ_n^{BL}(x), respectively. This family interpolates from the (non-smooth) Haar wavelet, which has the best localization in time, to the (smooth) Shannon wavelet, which has the best localization in frequency. For some applications, the order parameter n enables us to control the smoothness and the proportion between the time window and the frequency window. On the other hand, the Daubechies wavelet of order n does not converge to the Shannon wavelet as n tends to infinity. As for the Daubechies filters, the asymptotic behavior is studied in [9].

Firstly, we shall introduce another family of wavelets interpolating from the Haar wavelet to the Shannon wavelet. In this paper, the low pass filter of the Haar wavelet m_1^{BL}(ξ) is also denoted by m_1^H(ξ) and is given by

    m_1^H(ξ) (≡ m_1^{BL}(ξ)) = e^{−iξ/2} cos(ξ/2).

m_1^H(ξ) is 2π-periodic. We immediately see that m_1^H(0) = 1 and |m_1^H(ξ)|² + |m_1^H(ξ + π)|² = 1. Now, let us put

    ν_n(ξ) = { 0       for ξ < 0,
             { p_n(ξ)  for 0 ≤ ξ ≤ 1,      (1)
             { 1       for ξ > 1,

where p_n(ξ) is the (2n+1)-th order polynomial satisfying p_n(ξ) + p_n(1 − ξ) ≡ 1 and p_n(0) = 0, i.e., p_n(x) = ∫₀^x tⁿ(1 − t)ⁿ dt / ∫₀¹ tⁿ(1 − t)ⁿ dt. Then, by neglecting e^{−iξ/2} and replacing ξ/2 by (π/2) ν_n(3|ξ|/(2π) − 1) in the argument of the cosine of m_1^H(ξ), one gets

    m_n^M(ξ) = cos[(π/2) ν_n(3|ξ|/(2π) − 1)].

The Meyer wavelet family is constructed from m_n^M(ξ) (n ≥ 1), and its Fourier transform belongs to C^n due to the irregularity of (1) at the points ξ = 0, 1. This causes the polynomial decay of the Meyer wavelet. In this paper, instead, we replace ξ/2 by (π/2) sin²(ξ/2) in the argument of the cosine of m_1^H(ξ) and define

    m_2^H(ξ) = cos[(π/2) sin²(ξ/2)].

Here we remark that m_2^H(ξ) is 2π-periodic and satisfies m_2^H(0) = 1 and |m_2^H(ξ)|² + |m_2^H(ξ + π)|² = 1, since

    m_2^H(ξ + π) = cos[(π/2) cos²(ξ/2)] = cos[π/2 − (π/2) sin²(ξ/2)] = sin[(π/2) sin²(ξ/2)].

To construct a new wavelet family, let us consider Θ_n(ξ) given recursively by

    Θ_1(ξ) = ξ/2  and  Θ_n(ξ) = (π/2) sin² Θ_{n−1}(ξ)  for n ≥ 2.

Then we also define the 2π-periodic function

    m_n^H(ξ) = cos Θ_n(ξ)  for n ≥ 2.    (2)

m_n^H(ξ) satisfies m_n^H(0) = 1. Noting that m_n^H(ξ + π) = sin Θ_n(ξ) still holds, we obtain |m_n^H(ξ)|² + |m_n^H(ξ + π)|² = 1. Therefore, since m_n^H is differentiable, the product ∏_{j=1}^{∞} m_n^H(2^{−j}ξ) converges uniformly on bounded sets of R (see [10]). Thus we can define φ_n^H(x) and ψ_n^H(x) by
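The recursion (2) is straightforward to evaluate numerically. The following sketch is our own code, not part of the paper; it checks the filter identity |m_n^H(ξ)|² + |m_n^H(ξ + π)|² = 1 and the convergence toward the Shannon filter.

```python
import math

def theta(n, xi):
    """Theta_n(xi): Theta_1(xi) = xi/2, Theta_n = (pi/2) sin^2(Theta_{n-1})."""
    t = xi / 2.0
    for _ in range(n - 1):
        t = (math.pi / 2.0) * math.sin(t) ** 2
    return t

def m(n, xi):
    """Low pass filter m_n^H(xi) = cos(Theta_n(xi))."""
    return math.cos(theta(n, xi))
```

For n ≥ 2 the filter is automatically 2π-periodic, m(n, 0) = 1, and already at n = 8 the filter is essentially 1 on |ξ| < π/2 and essentially 0 on π/2 < |ξ| ≤ π.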




Fig. 1. Graphs of m_2^H, m_3^H, m_5^H and m_∞^{SH}.

    φ̂_n^H(ξ) = ∏_{j=1}^{∞} m_n^H(2^{−j}ξ)

and

    ψ̂_n^H(ξ) = e^{iξ/2} m_n^H(ξ/2 + π) φ̂_n^H(ξ/2).

Let m_∞^{SH}(ξ) be the low pass filter of the Shannon wavelet, i.e., the 2π-periodic function defined by

    m_∞^{SH}(ξ) = { 1  for |ξ| ≤ π/2,
                  { 0  for π/2 < |ξ| ≤ π.

Then we get the following properties of (2):

Proposition 1  For all n ≥ 1, m_n^H(ξ) satisfies

    m_n^H(ξ) ≠ 0  for |ξ| ≤ π/2,    (3)

and

    lim_{n→∞} m_n^H(ξ) = m_∞^{SH}(ξ)  for ξ ∈ R \ (πZ + π/2).    (4)

Proof  We shall show the pointwise convergence in (4). For ξ = 0, π we easily see that

    m_n^H(0) = m_∞^{SH}(0) = 1,  m_n^H(π) = m_∞^{SH}(π) = 0.

Since m_n^H is even and 2π-periodic, it is enough to consider the two cases 0 < ξ < π/2 and π/2 < ξ < π. Define the function

    f(θ) = π sin²θ/(2θ)  for 0 < θ ≤ π/4.

Noting that f(π/4) = 1 and

    f′(θ) = π sin θ (2θ cos θ − sin θ)/(2θ²) > 0  for 0 < θ ≤ π/4,

we find that f(θ) is strictly increasing on (0, π/4] and f(θ) < 1 there. In the case 0 < ξ < π/2, we have

    0 < Θ_1(ξ) < π/4

and

    0 < Θ_2(ξ) (= (π/2) sin² Θ_1(ξ)) < Θ_1(ξ) < π/4.

Recursively, we have, for every n ≥ 2,

    0 < Θ_n(ξ) (= (π/2) sin² Θ_{n−1}(ξ)) < Θ_{n−1}(ξ) < π/4.

Let us fix 0 < ξ < π/2. We remark that

    0 < Θ_n < Θ_{n−1} < · · · < Θ_1 < π/4,    (5)

since f(Θ_{n−1}) = Θ_n/Θ_{n−1} < 1. In particular, there exists a constant a > 0 such that f(Θ_1) = Θ_2/Θ_1 < a < 1. Therefore,

    Θ_n/Θ_1 = f(Θ_{n−1}) f(Θ_{n−2}) · · · f(Θ_1) < f(Θ_1)^{n−1} < a^{n−1}.

Hence, we get

    0 < Θ_n(ξ) < Θ_1 a^{n−1} < (π/4) a^{n−1}  for 0 < ξ < π/2.

Thus it follows that lim_{n→∞} Θ_n(ξ) = 0. Consequently, we have

    lim_{n→∞} |m_n^H(ξ) − m_∞^{SH}(ξ)| = lim_{n→∞} (1 − cos Θ_n(ξ)) = 0.

In the case π/2 < ξ < π, noting that

    m_n^H(ξ + π) = sin Θ_n(ξ)  and  0 < −ξ + π < π/2,

we obtain lim_{n→∞} Θ_n(−ξ + π) = 0 and also

    lim_{n→∞} |m_n^H(ξ) − m_∞^{SH}(ξ)| = lim_{n→∞} m_n^H(−ξ) = lim_{n→∞} sin Θ_n(−ξ + π) = 0.

Since Θ_n(π/2) = π/4 and (5) holds, we also have (3).  (QED)

From (3) it follows that m_n^H(ξ) is the low pass filter of an MRA (see [10]). This means that ψ_n^H(x) is well defined via its Fourier transform

    ψ̂_n^H(ξ) = e^{iξ/2} m_n^H(ξ/2 + π) φ̂_n^H(ξ/2).

From (4), the scaling function φ_n^H and the wavelet ψ_n^H also converge to the Shannon scaling function φ_∞^{SH} and wavelet ψ_∞^{SH} as the order parameter n increases. More precisely, we can prove the following theorem:

Theorem 2  For 2 ≤ q ≤ ∞, we have

    lim_{n→∞} ‖φ_n^H − φ_∞^{SH}‖_{L^q} = 0  and  lim_{n→∞} ‖ψ_n^H − ψ_∞^{SH}‖_{L^q} = 0.
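The convergence of Theorem 2 can be illustrated numerically by truncating the infinite product defining φ̂_n^H; recall that the Shannon scaling function satisfies φ̂_∞^{SH} = 1 on [−π, π] and 0 outside. The code below is our own sketch, with a truncation length `terms` chosen for illustration.

```python
import math

def theta(n, xi):
    # Theta_1(xi) = xi/2, Theta_n = (pi/2) sin^2(Theta_{n-1})
    t = xi / 2.0
    for _ in range(n - 1):
        t = (math.pi / 2.0) * math.sin(t) ** 2
    return t

def phihat(n, xi, terms=40):
    """Truncated product for the Fourier transform of the scaling function:
    phihat_n(xi) ~ prod_{j=1..terms} cos(Theta_n(xi / 2^j))."""
    p = 1.0
    for j in range(1, terms + 1):
        p *= math.cos(theta(n, xi / 2.0 ** j))
    return p
```

Already at n = 8 the truncated transform is essentially 1 well inside (−π, π) and essentially 0 well outside, matching the Shannon indicator.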

Proof  It is sufficient to give the proof only for the scaling functions. At first, we shall prove, for 1 ≤ p < ∞,

    lim_{n→∞} ‖φ̂_n^H − φ̂_∞^{SH}‖_{L^p(R)} = 0.    (6)




For a fixed ξ ∈ R, there exists J > 0 such that

    |φ̂_n^H(ξ) − ∏_{j=1}^{J} m_n^H(2^{−j}ξ)| < ε/3

and

    |φ̂_∞^{SH}(ξ) − ∏_{j=1}^{J} m_∞^{SH}(2^{−j}ξ)| < ε/3.

Meanwhile, for a sufficiently large N = N(J) > 0, Proposition 1 gives that, for ξ ∈ R \ ∪_{j=1}^{J} 2^j(πZ + π/2),

    |∏_{j=1}^{J} m_N^H(2^{−j}ξ) − ∏_{j=1}^{J} m_∞^{SH}(2^{−j}ξ)| < ε/3.

Thus it follows that, for ξ ∈ R \ ∪_{j=1}^{J} 2^j(πZ + π/2),

    |φ̂_N^H(ξ) − φ̂_∞^{SH}(ξ)|
      ≤ |φ̂_N^H(ξ) − ∏_{j=1}^{J} m_N^H(2^{−j}ξ)|
        + |∏_{j=1}^{J} m_N^H(2^{−j}ξ) − ∏_{j=1}^{J} m_∞^{SH}(2^{−j}ξ)|
        + |∏_{j=1}^{J} m_∞^{SH}(2^{−j}ξ) − φ̂_∞^{SH}(ξ)|
      < ε.

This implies that, for almost all ξ ∈ R,

    lim_{n→∞} |φ̂_n^H(ξ) − φ̂_∞^{SH}(ξ)| = 0.

From the results of Theorem 3 below, we will find that φ̂_n^H is smooth and that |φ̂_n^H(ξ)|^p is dominated by some integrable function for sufficiently large n = n(p) > 0. Therefore, the dominated convergence theorem proves (6). Let us now consider (6) especially for 1 < p ≤ 2. Taking

2 ≤ q < ∞ such that 1/p + 1/q = 1, by the Hausdorff-Young inequality we get

    lim_{n→∞} ‖φ_n^H − φ_∞^{SH}‖_{L^q(R)} = 0.  (QED)

In [8] one can see the corresponding results for the Battle-Lemarie scaling function and wavelet.

Throughout this paper, we denote the sinc function by sinc(x) = sin x/x. φ_∞^{SH} and ψ_∞^{SH} have polynomial decay and belong to C^∞(R_x), since φ_∞^{SH}(x) = sinc(πx) and ψ_∞^{SH}(x) = 2 sinc(2πx) − sinc(πx). Especially in the case n = 2, φ̂_2^H can be rewritten as

    φ̂_2^H(ξ) = ∏_{j=1}^{∞} cos[(π/2) sin²(ξ/2^{j+1})]
             = ∏_{j=1}^{∞} sin[(π/2) cos²(ξ/2^{j+1})]
             = ∏_{j=1}^{∞} cos²(ξ/2^{j+1}) · ∏_{j=1}^{∞} (π/2) sinc[(π/2) cos²(ξ/2^{j+1})]
             = sinc²(ξ/2) ∏_{j=1}^{∞} L(ξ/2^j),

where L(ξ) = (π/2) sinc[(π/2) cos²(ξ/2)]. We remark that L(ξ + iη) extends to an entire function satisfying L(0) = 1. By continuity, there exists C > 0 such that

    |L(ξ + iη)| ≤ 1 + C|ξ + iη|.    (7)

Since |L(ξ)| ≤ π/2 for ξ ∈ R, for all ε > 0 there exists δ_ξ > 0 such that, for all η with 0 < |η| < δ_ξ,

    |L(ξ + iη)| ≤ (π/2)^{1+ε}.    (8)

We note that L(ξ) is 2π-periodic, so that

    inf_{ξ∈R} δ_ξ = min_{0≤ξ≤2π} δ_ξ > 0.

Hence we can take δ = min_{0≤ξ≤2π} δ_ξ > 0, which is independent of ξ. For an arbitrarily fixed J ≥ 0, taking η such that max_{1≤j≤J} |η/2^j| < δ, i.e., |η| < 2δ, by (7) and (8) we get, for 2^J ≤ |ξ + iη| ≤ 2^{J+1},

    |φ̂_2^H(ξ + iη)|
      = |sinc((ξ + iη)/2)|² ∏_{j=1}^{J} |L((ξ + iη)/2^j)| · ∏_{j=J+1}^{∞} |L((ξ + iη)/2^j)|
      ≤ |sin((ξ + iη)/2)/((ξ + iη)/2)|² (π/2)^{J(1+ε)} ∏_{j=J+1}^{∞} (1 + C|(ξ + iη)/2^j|)
      ≤ C_η/(|ξ + iη|² + 1) · 2^{J(1+ε) log₂(π/2)} ∏_{j=J+1}^{∞} exp(C|(ξ + iη)/2^j|)
      ≤ C_η/(|ξ + iη|² + 1) · |ξ + iη|^{(1+ε) log₂(π/2)} exp(Σ_{j=0}^{∞} C/2^j)
      ≤ C_η e^{2C} (|ξ| + |η| + 1)^{q_ε},

where q_ε = (1 + ε) log₂(π/2) − 2. For |ξ + iη| ≤ 1 we easily see that |φ̂_2^H(ξ + iη)| ≤ C. Thus, it follows that, for ξ ∈ R and |η| < 2δ,

    |φ̂_2^H(ξ + iη)| ≤ M_η (|ξ| + |η| + 1)^{q_ε}.    (9)

The exponent q_ε becomes negative for sufficiently small ε > 0. Thus, from the Paley-Wiener theorem we conclude the following two facts (see [11]):

• φ_2^H has exponential decay, since φ̂_2^H is analytic on a strip of positive width.

• φ_2^H belongs to C^{α_2}(R_x) for some α_2 > 0 (by the estimate (9) with η = 0).

Furthermore, we can find the decay and regularity of φ_n^H and ψ_n^H (n ≥ 2) as follows:


Page 42: J S I A Mjsiaml.jsiam.org/ebooks/JSIAMLetters_vol3-2011.pdf · Keywords calculus of variations, boundary value problem, topology optimization, density method, H1 gradient method Research

JSIAM Letters Vol. 3 (2011) pp.33–36 Naohiro Fukuda et al.

Table 1. Regularities of φ^H_n and ψ^H_n.

  n      2      3      4      5      6
  α_n    0.386  1.133  2.616  5.580  11.508

Table 2. Time-bandwidth products of the scaling function and wavelet.

  n                        2      3      4      5      6
  ∆φ^H_n ∆\hat{φ}^H_n     0.926  0.669  0.772  0.947  1.177
  ∆ψ^H_n ∆\hat{ψ}^H_n     2.603  2.136  2.500  3.069  5.393

Table 3. Time-bandwidth products of the scaling functions of Battle-Lemarié, Meyer and Daubechies.

  n                            1      2      3      4      5
  ∆φ^{BL}_n ∆\hat{φ}^{BL}_n   ∞      0.686  0.741  0.837  0.928
  ∆φ^M_n ∆\hat{φ}^M_n         0.810  0.875  0.949  1.012  1.065
  ∆φ^D_n ∆\hat{φ}^D_n         ∞      1.057  0.828  0.849  0.984

Fig. 2. Graphs of φ^H_2, φ^H_3, φ^H_5 and φ^{SH}_∞.

Theorem 3 Let n ≥ 2. The scaling function φ^H_n and wavelet ψ^H_n have exponential decay and belong to C^{α_n}(R_x) for some α_n > 0 increasing in the parameter n.

In fact, we can derive a more refined estimate than (9) in [12], and a better α_n > 0 is given in Table 1. We also give the time-bandwidth product ∆f ∆\hat{f} of the scaling function and wavelet in Table 2, where

\[
\Delta f := \Biggl(\frac{\int_{-\infty}^{\infty}(x-x_0)^2\,|f(x)|^2\,dx}{\int_{-\infty}^{\infty}|f(x)|^2\,dx}\Biggr)^{\!\frac12}
\quad\text{with}\quad
x_0 := \frac{\int_{-\infty}^{\infty}x\,|f(x)|^2\,dx}{\int_{-\infty}^{\infty}|f(x)|^2\,dx}.
\]

From the uncertainty principle the lower bound is 1/2. The product ∆φ^H_3 ∆\hat{φ}^H_3 compares well with those of some other famous scaling functions in Table 3. In conclusion, we observe that φ(x) defined by

\[
\hat{\varphi}(\xi) = \prod_{j=1}^{\infty}\cos\Bigl\{\frac{\pi}{2}\sin^{2}\Bigl[\frac{\pi}{2}\sin^{2}\Bigl(\frac{\xi}{2^{j+1}}\Bigr)\Bigr]\Bigr\}
\]

is differentiable in x and also satisfies ∆φ ∆\hat{φ} = 0.669, which is near 1/2. Some generalizations of the above results will appear in the forthcoming paper [12].
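The time-bandwidth product can be approximated numerically. The following sketch (our own illustration, not code from the paper) samples a Gaussian, for which the uncertainty product attains the lower bound 1/2, and evaluates both spreads with plain Riemann sums and an FFT in the angular-frequency convention:

```python
import numpy as np

def spread(x, f):
    """RMS width of |f|^2 about its centroid, on a uniform grid."""
    w = np.abs(f) ** 2
    w = w / w.sum()
    x0 = (x * w).sum()
    return np.sqrt(((x - x0) ** 2 * w).sum())

x = np.linspace(-20, 20, 4096)
f = np.exp(-x ** 2 / 2)   # Gaussian: the extremal case of the uncertainty principle

dx = x[1] - x[0]
xi = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(x.size, d=dx))  # angular frequencies
fhat = np.fft.fftshift(np.fft.fft(f))     # FT samples up to phase; only |fhat| matters

tbp = spread(x, f) * spread(xi, fhat)
print(tbp)   # close to the lower bound 0.5
```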

Acknowledgments

The authors would like to thank the referee for valuable suggestions.

References

[1] I. J. Schoenberg, Cardinal interpolation and spline functions, J. Approx. Theory, 2 (1969), 167–206.
[2] M. Unser, A. Aldroubi and M. Eden, On the asymptotic convergence of B-spline wavelets to Gabor functions, IEEE Trans. Inform. Theory, 38 (1992), 864–872.
[3] G. Battle, A block spin construction of ondelettes. I. Lemarié functions, Comm. Math. Phys., 110 (1987), 601–615.
[4] I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, 61, SIAM, Philadelphia, PA, 1992.
[5] N. Fukuda and T. Kinoshita, On non-symmetric orthogonal spline wavelets, to appear in Southeast Asian Bull. Math.
[6] P. G. Lemarié, Ondelettes à localisation exponentielle, J. Math. Pures Appl., 67 (1988), 227–236.
[7] J. O. Strömberg, A modified Franklin system and higher-order spline systems on R^n as unconditional bases for Hardy spaces, in: Proc. Conf. on Harmonic Analysis in Honor of Antoni Zygmund, pp. 475–493, 1981.
[8] K. H. Oh, K. R. Young and K. J. Seung, On asymptotic behavior of Battle-Lemarié scaling functions and wavelets, Appl. Math. Lett., 20 (2007), 376–381.
[9] D. Kateb and P. G. Lemarié-Rieusset, Asymptotic behavior of the Daubechies filters, Appl. Comput. Harmon. Anal., 2 (1995), 398–399.
[10] E. Hernández and G. Weiss, A First Course on Wavelets, CRC Press, Boca Raton, FL, 1996.
[11] S. G. Krantz and H. R. Parks, A Primer of Real Analytic Functions, 2nd ed., Birkhäuser, Boston-Basel-Berlin, 2002.
[12] N. Fukuda and T. Kinoshita, On the construction of new families of wavelets, preprint.


JSIAM Letters Vol.3 (2011) pp.37–40 c⃝2011 Japan Society for Industrial and Applied Mathematics

Conservative finite difference schemes for the modified Camassa-Holm equation

Yuto Miyatake1, Takayasu Matsuo1 and Daisuke Furihata2

1 Graduate School of Information Science and Technology, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
2 Cybermedia Center, Osaka University, Machikaneyama 1-32, Toyonaka, Osaka 560-0043, Japan

E-mail yuto miyatake mist.i.u-tokyo.ac.jp

Received November 18, 2010, Accepted April 5, 2011

Abstract

We consider the numerical integration of the modified Camassa-Holm equation, which has been recently proposed by McLachlan and Zhang (2009) as a generalization of the prominent Camassa-Holm equation. We present nonlinear and linear finite difference schemes for the modified equation that preserve two invariants at the same time. We also show some numerical examples of the presented schemes, where it is found that certain solutions of the mCH can behave like solitons.

Keywords modified Camassa-Holm equation, conservation law, discrete variational derivative method

Research Activity Group Scientific Computation and Numerical Analysis

1. Introduction

In this paper, we consider the numerical integration of the "modified Camassa-Holm (mCH) equation" of the form

\[
m_t + u m_x + 2 u_x m = 0, \qquad m = (1 - \partial_x^2)^p u, \tag{1}
\]

where p is a positive integer and the subscript t (or x, respectively) denotes differentiation with respect to the time variable t (or x). This equation was derived by McLachlan and Zhang [1] as the Euler-Poincaré differential equation on the Bott-Virasoro group with respect to the H^p metric. When p = 1, (1) reduces to the well-known Camassa-Holm (CH) equation:

\[
u_t - u_{xxt} = u u_{xxx} + 2 u_x u_{xx} - 3 u u_x, \tag{2}
\]

which describes shallow water waves. The CH has a bi-Hamiltonian structure, is completely integrable, and has infinitely many conservation laws. Furthermore, it has the interesting feature that it admits strange singular solutions called "peakons" (peaked solitons); this is in sharp contrast to the classical smooth soliton equations such as the Korteweg-de Vries equation. In order to reveal the rich dynamics of the CH, many numerical studies have been carried out, including the following geometric integrators: invariant-preserving integrators [2–6] and multisymplectic integrators [7]. In contrast to this, the case p ≥ 2 has not yet been fully understood, both theoretically and numerically. Here we briefly review some known results on the case p = 2. Global existence of smooth solutions on the unit circle S and the real line R was discussed in [1, 8]. The following two invariants have been found associated with

fully understood, both theoretically and numerically.Here we briefly review some known results on the casep = 2. Global existence of smooth solutions on the unitcircle S and real line R were discussed in [1, 8]. The fol-lowing two invariants have been found associated with

the mCH (with p = 2):

d

dt

∫udx = 0,

d

dt

∫u2 + 2u2x + u2xx

2dx = 0. (3)

It is still an open question whether or not there are other invariants, in particular whether the mCH is completely integrable. The dynamics of the mCH, for example the possibility of soliton-like solutions, is not yet understood, except that in [1] an interesting phenomenon called "weak blow-up" was numerically suggested (but not mathematically confirmed). To the best of the authors' knowledge, so far no study has been carried out that mainly focuses on the numerical treatment of the mCH. Taking this background into account, the aim of the present paper is to show the following two points. First, we show that finite difference schemes preserving the invariants (3) simultaneously can be constructed. Next, using these geometric integrators, we numerically show that certain solutions can behave like solitons. This paper is intended to be a prompt report on these results, and the full details will be presented in our future work [9].

This paper is organized as follows. In Section 2 the proposed conservative schemes are presented, and their properties are discussed. In Section 3 we show some numerical examples of the soliton-like solutions. Concluding remarks are given in Section 4.

We use the following notation. Noting that physically u is the main variable, meaning the "wave velocity," we choose u as our main variable in the numerical computation. The discrete version is denoted by U^{(n)}_k ≃ u(k∆x, n∆t), where ∆x = L/N (N is the number of spatial grid points) and ∆t is the time mesh size.


We use the abbreviation U^{(n+1/2)}_k = (U^{(n+1)}_k + U^{(n)}_k)/2. We also write this as a vector: U^{(n)} = (U^{(n)}_0, U^{(n)}_1, …, U^{(n)}_{N−1})^⊤. In the presentation of the schemes, we also use the discrete version of m, which is denoted by M^{(n)}_k. Throughout this paper, we limit ourselves to the unit circle case, i.e., we assume the periodic boundary condition, in accordance with the numerical simulations. Naturally we assume the discrete periodic boundary condition: U^{(n)}_k = U^{(n)}_{k mod N} (∀k ∈ Z). We often use the standard central difference operators δ^{⟨1⟩}_k, δ^{⟨2⟩}_k that approximate ∂_x, ∂_x², and the forward and backward difference operators δ^+_k, δ^−_k.

2. Conservative schemes

In this section, we present finite difference schemes that preserve discrete counterparts of the two invariants in (3) at the same time. The schemes can be easily found by extending the conservative schemes for the CH [5, 6] (see also [2]); below we briefly show the outline. A key observation is that the mCH (1) can be formally written in the following Hamiltonian form:

\[
m_t = -(m\partial_x + \partial_x m)\frac{\delta H}{\delta m}, \qquad H = \frac{u^2 + 2u_x^2 + u_{xx}^2}{2}, \tag{4}
\]

where δH/δm is the variational derivative of H with respect to m. Although H is defined in terms of u, it is an easy exercise to find δH/δm = u, with m = (1 − ∂_x²)²u in mind. From this, the conservation of H is obvious once we note that the operator (m∂_x + ∂_x m) is skew-symmetric.

Remark 1 By the "operator" (m∂_x + ∂_x m) we mean the convention that it applies to a function f as (m∂_x + ∂_x m)f = m∂_x f + ∂_x(mf), which is standard in this research area. The same convention applies to the discrete versions.

Interestingly enough, the Hamiltonian form is formally the same as that of the CH (p = 1); in fact, the CH (2) can be rewritten as

\[
m_t = -(m\partial_x + \partial_x m)\frac{\delta H}{\delta m}, \qquad H = \frac{u^2 + u_x^2}{2} \tag{5}
\]

with m = (1 − ∂_x²)u and δH/δm = u. In [5, 6], (a variant of) "the discrete variational derivative method" [10] was applied to the Hamiltonian form (5) to find schemes preserving H. In view of this, one would naturally expect that a similar approach works also for (4), where only the concrete form of H and the relation between u and m are different; the answer is yes. Due to the restriction of space, we omit the details of the derivation, and only show the resulting schemes and related discrete invariants.

Scheme 2 (A nonlinear scheme) We define the initial approximate solution by U^{(0)}_k = u(0, k∆x) (k = 0, …, N − 1). Then for n = 0, 1, …,

\[
\frac{M^{(n+1)}_k - M^{(n)}_k}{\Delta t}
= -\Bigl(M^{(n+\frac12)}_k \delta^{\langle 1\rangle}_k + \delta^{\langle 1\rangle}_k M^{(n+\frac12)}_k\Bigr)\frac{U^{(n+1)}_k + U^{(n)}_k}{2}
\qquad (k = 0, \dots, N-1), \tag{6}
\]

where M^{(n)}_k is associated with U^{(n)}_k via the relation M^{(n)}_k = (1 − δ^{⟨2⟩}_k)² U^{(n)}_k, and M^{(n+1/2)}_k = (M^{(n+1)}_k + M^{(n)}_k)/2.

Obviously (6) corresponds to (4). The scheme, as expressed in (6), formally coincides with the nonlinear scheme for the CH in [5, 6]. However, the relation between U^{(n)}_k and M^{(n)}_k is different (which means the overall scheme is different; note that, as mentioned earlier, the computation is carried out solely in the u (or U^{(n)}_k) space, by eliminating M^{(n)}_k), and this makes the associated discrete Hamiltonian different as well. It can be shown that the quantity

\[
H^{(n)}_k = \frac{\bigl(U^{(n)}_k\bigr)^2 + \bigl(\delta^{+}_k U^{(n)}_k\bigr)^2 + \bigl(\delta^{-}_k U^{(n)}_k\bigr)^2 + \bigl(\delta^{\langle 2\rangle}_k U^{(n)}_k\bigr)^2}{2} \tag{7}
\]

serves as the discrete Hamiltonian for the scheme.

Theorem 3 (Conservation laws) Under the discrete periodic boundary condition, the numerical solution by Scheme 2 conserves the two invariants:

\[
\sum_{k=0}^{N-1} U^{(n)}_k \Delta x = \sum_{k=0}^{N-1} U^{(0)}_k \Delta x, \qquad
\sum_{k=0}^{N-1} H^{(n)}_k \Delta x = \sum_{k=0}^{N-1} H^{(0)}_k \Delta x \qquad (n = 1, 2, \dots).
\]

Proof We only show the outline of the proof. We first prove the first conservation law. Note that it is sufficient to prove

\[
\sum_{k=0}^{N-1} M^{(n)}_k \Delta x = \sum_{k=0}^{N-1} M^{(0)}_k \Delta x, \tag{8}
\]

since under the discrete periodic boundary condition it obviously holds that

\[
\sum_{k=0}^{N-1} \delta^{\langle 2\rangle}_k U^{(n)}_k \Delta x
= \sum_{k=0}^{N-1} \bigl(\delta^{\langle 2\rangle}_k\bigr)^2 U^{(n)}_k \Delta x = 0.
\]

(This can be confirmed by the summation-by-parts formulas found in, for example, [10].) Now we prove (8).

\begin{align*}
\frac{1}{\Delta t}\sum_{k=0}^{N-1}\bigl(M^{(n+1)}_k - M^{(n)}_k\bigr)\Delta x
&= -\sum_{k=0}^{N-1}\Bigl(M^{(n+\frac12)}_k \delta^{\langle 1\rangle}_k + \delta^{\langle 1\rangle}_k M^{(n+\frac12)}_k\Bigr)U^{(n+\frac12)}_k \Delta x \\
&= -\sum_{k=0}^{N-1}\Bigl[M^{(n+\frac12)}_k \cdot \delta^{\langle 1\rangle}_k U^{(n+\frac12)}_k
   + \delta^{\langle 1\rangle}_k\bigl(M^{(n+\frac12)}_k U^{(n+\frac12)}_k\bigr)\Bigr]\Delta x \\
&= -\sum_{k=0}^{N-1}\bigl(1 - \delta^{\langle 2\rangle}_k\bigr)^2 U^{(n+\frac12)}_k
   \cdot \delta^{\langle 1\rangle}_k U^{(n+\frac12)}_k \Delta x \\
&= \sum_{k=0}^{N-1}\delta^{\langle 1\rangle}_k\bigl(1 - \delta^{\langle 2\rangle}_k\bigr)^2 U^{(n+\frac12)}_k
   \cdot U^{(n+\frac12)}_k \Delta x \\
&= 0.
\end{align*}

Here we frequently used the discrete periodic boundary condition with various summation-by-parts formulas [10]. The last equality follows from the skew-symmetry of δ^{⟨1⟩}_k(1 − δ^{⟨2⟩}_k)².
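The skew-symmetry invoked in the last step can be verified directly with small circulant matrices (a sketch of our own, not from the paper): on a periodic grid the difference operators are circulant and therefore commute, so δ^{⟨1⟩}(1 − δ^{⟨2⟩})² inherits the skew-symmetry of δ^{⟨1⟩}.

```python
import numpy as np

N, dx = 16, 0.1
S = np.roll(np.eye(N), -1, axis=1)                  # periodic shift: (S v)_k = v_{k+1}

d1 = (S - S.T) / (2 * dx)                           # central difference, skew-symmetric
d2 = (S - 2 * np.eye(N) + S.T) / dx ** 2            # second difference, symmetric

A = d1 @ np.linalg.matrix_power(np.eye(N) - d2, 2)  # discrete delta^<1> (1 - delta^<2>)^2
print(np.allclose(A, -A.T))                         # True
```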

Next we prove the second conservation law.

\begin{align*}
\frac{1}{\Delta t}\sum_{k=0}^{N-1}\bigl(H^{(n+1)}_k - H^{(n)}_k\bigr)\Delta x
&= \sum_{k=0}^{N-1}\Bigl(\frac{U^{(n+1)}_k + U^{(n)}_k}{2}\cdot\frac{M^{(n+1)}_k - M^{(n)}_k}{\Delta t}\Bigr)\Delta x \\
&= \sum_{k=0}^{N-1} U^{(n+\frac12)}_k\Bigl[-\Bigl(M^{(n+\frac12)}_k \delta^{\langle 1\rangle}_k
   + \delta^{\langle 1\rangle}_k M^{(n+\frac12)}_k\Bigr)U^{(n+\frac12)}_k\Bigr]\Delta x \\
&= 0.
\end{align*}

The first equality can be confirmed by (7) and summation-by-parts formulas (this requires some calculation, but we omit the details). The third equality follows from the skew-symmetry of −(M^{(n+1/2)}_k δ^{⟨1⟩}_k + δ^{⟨1⟩}_k M^{(n+1/2)}_k). (QED)

Since Scheme 2 is nonlinear, it requires an expensive nonlinear solver in each time step. As a remedy, we can construct a linear scheme, again based on the linearly implicit schemes in [5, 6, 11]. We only show the results.

Scheme 4 (A linearly implicit scheme) We define the initial approximate solution by U^{(0)}_k = u(0, k∆x) (k = 0, …, N − 1). Then for n = 1, 2, …,

\[
\frac{M^{(n+1)}_k - M^{(n-1)}_k}{2\Delta t}
= -\Bigl(M^{(n)}_k \delta^{\langle 1\rangle}_k + \delta^{\langle 1\rangle}_k M^{(n)}_k\Bigr)U^{(n)}_k,
\]

where M^{(n)}_k is associated with U^{(n)}_k via the relation M^{(n)}_k = (1 − δ^{⟨2⟩}_k)² U^{(n)}_k.
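Scheme 4 can be sketched in a few lines of NumPy (our own illustration with assumed grid parameters; the paper's computations used MATLAB, and the paper takes U^{(1)} from Scheme 2, which we replace here by a crude copy of U^{(0)}). The mass conservation law discussed below then holds per step up to roundoff, independently of accuracy:

```python
import numpy as np

L_dom, N, dt = 30.0, 64, 0.005        # assumed domain length, grid size, time step
dx = L_dom / N
x = -L_dom / 2 + dx * np.arange(N)

S = np.roll(np.eye(N), -1, axis=1)                    # periodic shift
d1 = (S - S.T) / (2 * dx)                             # delta^<1>
d2 = (S - 2 * np.eye(N) + S.T) / dx ** 2              # delta^<2>
B = np.linalg.matrix_power(np.eye(N) - d2, 2)         # U -> M = (1 - delta^<2>)^2 U

U_prev = (1 + np.abs(x)) * np.exp(-np.abs(x))         # initial value from Section 3
U_curr = U_prev.copy()                                # crude starting value U^(1)
mass0 = U_prev.sum() * dx

for _ in range(50):                                   # leapfrog in M, linear solve for U
    Mc = B @ U_curr                                   # M^(n)
    rhs = B @ U_prev - 2 * dt * (Mc * (d1 @ U_curr) + d1 @ (Mc * U_curr))
    U_prev, U_curr = U_curr, np.linalg.solve(B, rhs)  # M^(n+1) -> U^(n+1)

print(abs(U_curr.sum() * dx - mass0))                 # discrete total mass is conserved
```

Note that B is invertible on the periodic grid, since its eigenvalues are (1 + (4/∆x²) sin²(πk/N))² ≥ 1.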

Note that since Scheme 4 is a multistep scheme, we need not only the initial value U^{(0)} but also the starting value U^{(1)}. If we adopt Scheme 2 for computing U^{(1)}, we get the following conservation laws.

Theorem 5 (Conservation laws) Under the discrete periodic boundary condition, the numerical solution by Scheme 4 conserves the two invariants:

\[
\sum_{k=0}^{N-1} U^{(n)}_k \Delta x = \sum_{k=0}^{N-1} U^{(0)}_k \Delta x, \qquad
\sum_{k=0}^{N-1} H^{(n+\frac12)}_k \Delta x = \sum_{k=0}^{N-1} H^{(\frac12)}_k \Delta x \qquad (n = 1, 2, \dots),
\]

where

\[
H^{(n+\frac12)}_k = \Bigl[U^{(n)}_k U^{(n+1)}_k
+ \bigl(\delta^{+}_k U^{(n)}_k\bigr)\bigl(\delta^{+}_k U^{(n+1)}_k\bigr)
+ \bigl(\delta^{-}_k U^{(n)}_k\bigr)\bigl(\delta^{-}_k U^{(n+1)}_k\bigr)
+ \bigl(\delta^{\langle 2\rangle}_k U^{(n)}_k\bigr)\bigl(\delta^{\langle 2\rangle}_k U^{(n+1)}_k\bigr)\Bigr]\big/2.
\]

The proof is similar to the nonlinear case and is omitted. Note that, in contrast to the nonlinear scheme, the discrete Hamiltonian is now defined in a multistep way. In general, multistep schemes can be unstable; we observe the stability numerically in the next section.

Fig. 1. Error in the discrete invariants: Scheme 2.

Fig. 2. Error in the discrete invariants: Scheme 4.

3. Numerical examples with a soliton-like solution

In this section we show some numerical examples with the presented schemes, and point out that certain solutions of the mCH can behave like solitons. All the computations were done in the following environment: CPU Xeon (3.00 GHz), 16 GB memory, Linux OS. We used MATLAB (R2007b), where nonlinear equations were solved by "fsolve" with tolerances TolFun = 10^{−10} and TolX = 10^{−10}.

First we confirm the discrete conservation laws of the proposed schemes. The parameters were set to t ∈ [0, 50], x ∈ [−15, 15], ∆x = 0.1, ∆t = 0.05, and the initial value was set to u(0, x) = sech²(0.3x). Fig. 1 shows the errors in the discrete invariants in Scheme 2 (the nonlinear scheme). It confirms that the scheme conserves both discrete invariants within the accuracy of the nonlinear solver (recall that the tolerance was set to 10^{−10}). Fig. 2 shows the errors in Scheme 4 (the linear scheme), which again well confirms the discrete conservation laws.


Fig. 3. The evolution of the "one soliton" solution.

Fig. 4. The evolution of the "two soliton" solutions.

Next, we seek soliton-like solutions using the conservative schemes. In the CH (p = 1), the singular soliton solutions, the "peakons," can be obtained by formally setting m = cδ(x) (the Dirac delta function), where c is a generic constant. In view of the strong similarity between the CH and the mCH, a natural expectation is that also in the mCH the delta function behaves as a soliton. By formally integrating the delta function, we obtain

\[
u(x, t) = \frac{c}{4}\bigl(1 + |x - ct|\bigr)e^{-|x-ct|}. \tag{9}
\]

(The argument here is on the whole real line R for the sake of mathematical brevity. In the following numerical experiments, the solution is truncated so that it fits in the periodic interval.) We tested this solution using the conservative schemes with the parameters t ∈ [0, 100], x ∈ [−15, 15], ∆x = 0.05, ∆t = 0.025, and the initial value u(0, x) = (1 + |x|)e^{−|x|}. We found that both schemes stably captured the same behavior (note that this numerically supports the stability of the schemes, in particular of Scheme 4). Fig. 3 shows the result of Scheme 4. We can see that the solution actually behaves like a soliton. Next, in order to see whether the solutions actually interact like solitons, we considered the initial value u(0, x) = (1 + |x + 5|)e^{−|x+5|} + (1/2)(1 + |x − 5|)e^{−|x−5|}. The parameters were set as follows: t ∈ [0, 100], x ∈ [−20, 20], ∆x = 0.1, ∆t = 0.1. The result is shown in Fig. 4, which seems to support our view that the solution behaves like two solitons.
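That the profile (9) corresponds to m = cδ(x) can be checked numerically (a sketch of our own, using plain central differences): (1 − ∂_x²)²u vanishes away from the peak, while the total "mass" of the discrete spike recovers c = 4 for the initial value above.

```python
import numpy as np

dx = 0.01
x = np.arange(-15.0, 15.0 + dx / 2, dx)
u = (1 + np.abs(x)) * np.exp(-np.abs(x))    # profile (9) at t = 0 with c = 4

def d2(v):
    """Central second difference; endpoints left at zero."""
    out = np.zeros_like(v)
    out[1:-1] = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx ** 2
    return out

m = u - d2(u)
m = m - d2(m)                                # m = (1 - d_x^2)^2 u, discretely

mask = (np.abs(x) > 1) & (np.abs(x) < 14)    # away from the peak and the boundary
resid = np.max(np.abs(m[mask]))
print(resid, m.sum() * dx)                   # residual ~ 0; spike integrates to ~ c = 4
```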

4. Concluding remarks

We presented two finite difference schemes for the mCH equation that preserve the two associated invariants. We considered a soliton-like solution of the mCH, and confirmed the soliton-like behavior numerically. As far as the authors understand, this is a new observation. The discussion on the schemes (the scheme derivation and the establishment of discrete invariants) can easily be carried over to the general case p ≥ 3. Moreover, it is possible to discuss some theoretical aspects of the schemes, for example the (unique) existence of the numerical solutions. These discussions are left to [9]. We also plan to include more numerical results there. Finally, as noted in the introduction, the study of the mCH has just started, and many open problems still remain. Does the mCH admit other invariants? Or, more aggressively, is the mCH completely integrable? As for the dynamical aspects, although in the present study we could find a soliton-like solution in analogy with the standard CH, it is not at all clear whether the entire dynamics can be understood in a similar way. The answer should be negative, at least partly, since it has been shown in [1] that blow-up in the sense of "wave-breaking" should not occur in the mCH (p ≥ 2). Thus much more effort should be devoted to this topic, and we believe that the presented conservative schemes serve as effective numerical tools there.

References

[1] R. McLachlan and X. Zhang, Well-posedness of modified Camassa-Holm equations, J. Differential Equations, 246 (2009), 3241–3259.
[2] D. Cohen and X. Raynaud, Geometric finite difference schemes for the generalized hyperelastic-rod wave equation, J. Comput. Appl. Math., 235 (2011), 1925–1940.
[3] T. Matsuo, A Hamiltonian-conserving Galerkin scheme for the Camassa-Holm equation, J. Comput. Appl. Math., 234 (2010), 1258–1266.
[4] T. Matsuo and H. Yamaguchi, An energy-conserving Galerkin scheme for a class of nonlinear dispersive equations, J. Comput. Phys., 228 (2009), 4346–4358.
[5] K. Takeya, Conservative finite difference schemes for the Camassa-Holm equation (in Japanese), Master's Thesis, Osaka Univ., 2007.
[6] K. Takeya and D. Furihata, Conservative finite difference schemes for the Camassa-Holm equation, in preparation.
[7] D. Cohen, B. Owren and X. Raynaud, Multi-symplectic integration of the Camassa-Holm equation, J. Comput. Phys., 227 (2008), 5492–5512.
[8] P. Zhang, Global existence of solutions to the modified Camassa-Holm shallow water equation, Int. J. Nonlinear Sci., 9 (2010), 123–128.
[9] Y. Miyatake, T. Matsuo and D. Furihata, Invariants-preserving integration of the modified Camassa-Holm equation, submitted.
[10] D. Furihata, Finite difference schemes for ∂u/∂t = (∂/∂x)^α δG/δu that inherit energy conservation or dissipation property, J. Comput. Phys., 156 (1999), 181–205.
[11] T. Matsuo and D. Furihata, Dissipative or conservative finite-difference schemes for complex-valued nonlinear partial differential equations, J. Comput. Phys., 171 (2001), 425–447.


JSIAM Letters Vol.3 (2011) pp.41–44 c⃝2011 Japan Society for Industrial and Applied Mathematics

A multi-symplectic integration of the Ostrovsky equation

Yuto Miyatake1, Takaharu Yaguchi2 and Takayasu Matsuo1

1 Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
2 Department of Computational Science, Graduate School of System Informatics, Kobe University, Rokkodai-cho 1-1, Nada-ku, Kobe 657-8501, Japan

E-mail yuto miyatake mist.i.u-tokyo.ac.jp

Received March 15, 2011, Accepted April 6, 2011

Abstract

We consider structure-preserving integration of the Ostrovsky equation, which for example models gravity waves under the influence of the Coriolis force. We find a multi-symplectic formulation, and derive a finite difference discretization based on that formulation by means of the Preissmann box scheme. We also present a numerical example, which shows the effectiveness of this scheme.

Keywords Ostrovsky equation, multi-symplecticity, Preissmann box scheme

Research Activity Group Scientific Computation and Numerical Analysis

1. Introduction

In this paper, we consider structure-preserving integration of the Ostrovsky equation [1] under the periodic boundary condition of period L:

\[
u_t + \alpha u u_x - \beta u_{xxx} = \gamma \partial_x^{-1} u, \qquad u(t, x) = u(t, x + L),
\]

where α, β, γ are real parameters and the subscript t (or x, respectively) denotes differentiation with respect to the time variable t (or x). The operator ∂_x^{−1} is defined by

\[
\partial_x^{-1} u = \int_0^x u(t, s)\,ds - \frac{1}{L}\int_0^L\!\!\int_0^x u(t, s)\,ds\,dx, \tag{1}
\]

for any zero-mean and L-periodic function u [4]. This equation is often called the rotation-modified Korteweg-de Vries equation, and it has many physical interpretations. For example, it models gravity waves under the influence of the Coriolis force, surface and internal waves in the ocean, and capillary waves on the surface of a liquid. The Ostrovsky equation has three first integrals [2]:

\[
\int_0^L u\,dx = \mathrm{const.} = 0, \tag{2}
\]

\[
\int_0^L \Bigl[\frac{\alpha}{6}u^3 + \frac{\beta}{2}u_x^2 + \frac{\gamma}{2}\bigl(\partial_x^{-1}u\bigr)^2\Bigr]dx = \mathrm{const.}, \tag{3}
\]

\[
\int_0^L \frac{u^2}{2}\,dx = \mathrm{const.} \tag{4}
\]

The invariant (2), which we call the total mass, is the condition for the existence of the potential ϕ = ∂_x^{−1}u. The invariants (3) and (4) correspond to the energy and the L² norm conservation laws, respectively. From the perspective of structure-preserving integration, Yaguchi et al. have proposed four conservative numerical schemes [3]: a finite difference scheme and a pseudospectral scheme that conserve the energy (3), and the same types of schemes that conserve the norm (4). For other existing schemes, see also [4–7]. In this paper, we devote our effort to multi-symplectic integration, which is a branch of structure-preserving integration. We show that this equation has a multi-symplectic formulation, and provide a multi-symplectic scheme based on this formulation by applying the Preissmann box scheme. This formulation is motivated by the multi-symplectic formulation of the KdV equation by Ascher and McLachlan [8]. Although this multi-symplectic scheme preserves neither the energy nor the norm exactly, our numerical results show that the deviations are very small compared to the existing schemes by Yaguchi et al. [3].

This paper is organized as follows. In Section 2 the schemes by Yaguchi-Matsuo-Sugihara are summarized for the readers' convenience. In Section 3 a multi-symplectic formulation and a multi-symplectic scheme based on it are proposed. In Section 4 some numerical results are provided. Concluding remarks and comments are given in Section 5.

We use the following notation. Numerical solutions are denoted by U^n_k ≃ u(n∆t, k∆x) or Φ^n_k ≃ ϕ(n∆t, k∆x), where ∆x = L/N (N is the number of spatial nodes) and ∆t is the time mesh size. We use the following abbreviations: U^{n+1/2}_k = (U^n_k + U^{n+1}_k)/2, U^n_{k+1/2} = (U^n_k + U^n_{k+1})/2 and U^{n+1/2}_{k+1/2} = (U^n_{k+1/2} + U^{n+1}_{k+1/2})/2. We also use the following difference operators: the standard forward, backward and central difference operators δ^+_x, δ^−_x and δ^{⟨1⟩}_x for ∂_x, δ^{⟨2⟩}_x = δ^+_x δ^−_x for ∂_x², δ^{⟨3⟩}_x = δ^{⟨2⟩}_x δ^{⟨1⟩}_x for ∂_x³, and the forward difference operator δ^+_t for ∂_t.

2. Approach by Yaguchi-Matsuo-Sugihara

In this section the previous finite difference approach by Yaguchi-Matsuo-Sugihara is summarized. Their energy conservative scheme is based on the following


Hamiltonian structure:

\[
u_t = -\partial_x\frac{\delta G}{\delta u}, \qquad
G(u) = \frac{\alpha}{6}u^3 + \frac{\beta}{2}u_x^2 + \frac{\gamma}{2}\bigl(\partial_x^{-1}u\bigr)^2, \tag{5}
\]

and their norm conservative scheme is based on the following form:

\[
u_t + \frac{\alpha}{3}\bigl(u\partial_x u + \partial_x u^2\bigr) - \beta\partial_x^3 u = \gamma\partial_x^{-1}u. \tag{6}
\]

The symbol δG/δu is the variational derivative of G with respect to u. They assumed that the initial condition is given so as to satisfy

\[
\sum_{k=0}^{N-1} U^0_k \Delta x = 0,
\]

which corresponds to (2), and defined the operator δ^{⟨−1⟩}_x, an approximation of ∂_x^{−1}, by the summation operator

\[
\delta^{\langle -1\rangle}_x U^n_k
= \Delta x\Bigl(\frac{U^n_0}{2} + \sum_{j=1}^{k-1} U^n_j + \frac{U^n_k}{2}\Bigr)
- \frac{(\Delta x)^2}{L}\sum_{j=0}^{N-1}\Bigl(\frac{U^n_0}{2} + \sum_{l=1}^{j-1} U^n_l + \frac{U^n_j}{2}\Bigr). \tag{7}
\]
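As a quick sanity check of (7) (a sketch of our own, not from the paper): the operator is a trapezoidal cumulative sum minus its mean, and on zero-mean periodic data it reproduces the periodic antiderivative; applied to samples of sin x on [0, 2π) it returns −cos x to second order:

```python
import numpy as np

L_per, N = 2 * np.pi, 256
dx = L_per / N
x = dx * np.arange(N)

def dinv(U):
    """Discrete antiderivative (7): trapezoidal cumulative sums minus their mean."""
    c = dx * (np.cumsum(U) - U / 2 - U[0] / 2)   # U_0/2 + U_1 + ... + U_{k-1} + U_k/2
    return c - c.sum() * dx / L_per

err = np.max(np.abs(dinv(np.sin(x)) + np.cos(x)))
print(err)   # small: dinv(sin) ~ -cos, with O(dx^2) accuracy
```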

This is a natural discretization of (1). Firstly, the energy conservative scheme is summarized.

A discrete version of the energy G = αu³/6 + βu_x²/2 + γ(∂_x^{−1}u)²/2 and the "discrete variational derivative" that approximates δG/δu = αu²/2 − βu_{xx} − γ∂_x^{−2}u are defined by

\[
G^n_k = \frac{\alpha}{6}(U^n_k)^3
+ \frac{\beta}{4}\bigl[(\delta^{+}_x U^n_k)^2 + (\delta^{-}_x U^n_k)^2\bigr]
+ \frac{\gamma}{2}\bigl(\delta^{\langle -1\rangle}_x U^n_k\bigr)^2,
\]

\[
\frac{\delta G}{\delta(U^{n+1}, U^n)_k}
= \frac{\alpha}{6}\bigl[(U^{n+1}_k)^2 + U^{n+1}_k U^n_k + (U^n_k)^2\bigr]
- \beta\delta^{\langle 2\rangle}_x U^{n+\frac12}_k
- \gamma\bigl(\delta^{\langle -1\rangle}_x\bigr)^2 U^{n+\frac12}_k.
\]

Then the scheme is defined as follows.

Scheme 1 (The Energy Conservative Finite Difference Scheme [3])

\[
\frac{U^{n+1}_k - U^n_k}{\Delta t} = -\delta^{\langle 1\rangle}_x \frac{\delta G}{\delta(U^{n+1}, U^n)_k}.
\]

This scheme corresponds to the Hamiltonian structure (5). Numerical solutions by Scheme 1 conserve both the total mass and the energy. Next, the norm conservative finite difference scheme is summarized. It is defined as follows.

Scheme 2 (The Norm Conservative Finite Difference Scheme [3])

\[
\frac{U^{n+1}_k - U^n_k}{\Delta t}
+ \frac{\alpha}{3}\Bigl[U^{n+\frac12}_k \delta^{\langle 1\rangle}_x U^{n+\frac12}_k
+ \delta^{\langle 1\rangle}_x\bigl(U^{n+\frac12}_k\bigr)^2\Bigr]
- \beta\delta^{\langle 3\rangle}_x U^{n+\frac12}_k
= \gamma\delta^{\langle -1\rangle}_x U^{n+\frac12}_k.
\]

This scheme corresponds to (6). Numerical solutions by Scheme 2 conserve both the total mass and the norm.

3. A multi-symplectic integrator

In this section a new multi-symplectic formulation and the associated local conservation laws are shown. A multi-symplectic discretization is also proposed by means of the Preissmann box scheme.

3.1 Multi-symplectic partial differential equations and multi-symplectic integrators

We start by briefly reviewing the concept of multi-symplecticity in a general context. A partial differential equation F(u, u_t, u_x, u_{tx}, …) = 0 is said to be multi-symplectic if it can be written as a system of first order equations

\[
M z_t + K z_x = \nabla_z S(z) \tag{8}
\]

with z ∈ R^d a vector of state variables, typically containing the original variable u as one of its components. M and K are constant d × d skew-symmetric matrices, and S : R^d → R is a scalar-valued smooth function of z. A key observation for the multi-symplectic formulation (8) is that the system has a multi-symplectic conservation law

\[
\partial_t \omega + \partial_x \kappa = 0, \tag{9}
\]

where ω and κ are the differential two-forms

\[
\omega = dz \wedge M\,dz, \qquad \kappa = dz \wedge K\,dz.
\]

Another key property is the following pair of conservation laws. The system (8) has local energy and norm conservation laws

\[
\partial_t E(z) + \partial_x F(z) = 0, \qquad \partial_t I(z) + \partial_x G(z) = 0,
\]

where E(z), F(z), I(z) and G(z) are the density functions

\[
E(z) = S(z) - \frac12 z_x^\top K^\top z, \qquad F(z) = \frac12 z_t^\top K^\top z,
\]
\[
G(z) = S(z) - \frac12 z_t^\top M^\top z, \qquad I(z) = \frac12 z_x^\top M^\top z.
\]

Thus, integrating the densities E(z) and I(z) over the spatial domain under the usual assumption of vanishing boundary terms for the functions F(z) and G(z), we obtain the global invariants

\[
\mathcal{E}(z) = \int E(z)\,dx, \qquad \mathcal{I}(z) = \int I(z)\,dx.
\]

A scheme is called multi-symplectic if it satisfies some discrete version of the multi-symplectic conservation law (9). As multi-symplectic schemes, the Preissmann box scheme and the Euler box scheme are widely known; we adopt the former in this report. The Preissmann box scheme, introduced by Preissmann in 1960 and since then widely used in hydraulics, was proved to be multi-symplectic by Bridges and Reich [9]. It is also called the centered box scheme. It reads

\[
M\delta^{+}_t Z^n_{k+\frac12} + K\delta^{+}_x Z^{n+\frac12}_k = \nabla_z S\bigl(Z^{n+\frac12}_{k+\frac12}\bigr). \tag{10}
\]
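The stencil of (10) pairs midpoint averages with forward differences on one space-time cell. A generic sketch of our own (the helper name `box_residual` is hypothetical, not from the paper): the residual vanishes identically for linear data, e.g. for the skew-symmetric pair M = K = [[0, −1], [1, 0]] with S ≡ 0, whose system reduces to two copies of the advection equation u_t + u_x = 0, with the exact solution z = (x − t)(1, 1)^⊤:

```python
import numpy as np

def box_residual(M, K, gradS, z00, z01, z10, z11, dt, dx):
    """Residual of (10) on one cell; z00=(n,k), z01=(n,k+1), z10=(n+1,k), z11=(n+1,k+1)."""
    zt = ((z10 + z11) - (z00 + z01)) / (2 * dt)   # delta_t^+ of the k+1/2 average
    zx = ((z01 + z11) - (z00 + z10)) / (2 * dx)   # delta_x^+ of the n+1/2 average
    zmid = (z00 + z01 + z10 + z11) / 4            # Z^{n+1/2}_{k+1/2}
    return M @ zt + K @ zx - gradS(zmid)

M = np.array([[0.0, -1.0], [1.0, 0.0]])
K = M.copy()                                      # both skew-symmetric
f = lambda t, x: (x - t) * np.ones(2)             # linear solution of the advection system
dt, dx = 0.1, 0.2
r = box_residual(M, K, lambda z: np.zeros(2),
                 f(0, 0), f(0, dx), f(dt, 0), f(dt, dx), dt, dx)
print(np.allclose(r, 0))   # True: the box scheme is exact on linear data
```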

3.2 A multi-symplectic formulation and an integrator for the Ostrovsky equation

In this subsection, a multi-symplectic formulation for the Ostrovsky equation is presented. Setting z = (ϕ, u, v, w)^⊤, we derive a multi-symplectic formulation (8) with the two skew-symmetric matrices

\[
M = \begin{pmatrix} 0 & -\frac12 & 0 & 0 \\ \frac12 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},
\qquad
K = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \tag{11}
\]

and with the scalar function S(z) = uw − αu³/6 + v²/(2β) − γϕ²/2, whose gradient is ∇_z S(z) = (−γϕ, w − αu²/2, v/β, u)^⊤. This formulation is motivated by the multi-symplectic formulation for the KdV equation [8]; indeed, when γ = 0 it reduces to the KdV case. From (11), the density functions E and I are explicitly given by

\[
E(z) = S(z) - \frac12 z_x^\top K^\top z
= -\frac{\alpha}{6}u^3 + \frac{\beta}{2}u_x^2 - \frac{\gamma}{2}\phi^2 + uw
- \frac12\bigl(\phi_x w + u_x v - v_x u - w_x \phi\bigr)
\]

(with v = βu_x from the third equation of the system), and

\[
I(z) = \frac12 z_x^\top M^\top z = \frac14\bigl(\phi_x u - u_x \phi\bigr).
\]
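The structure above can be sanity-checked numerically (a sketch of our own, not from the paper): both matrices are skew-symmetric, and for sample fields with ϕ_x = u the density I(z) integrates to (1/2)∫u² dx:

```python
import numpy as np

# Structure matrices of (11), z = (phi, u, v, w).
M = np.zeros((4, 4)); M[0, 1], M[1, 0] = -0.5, 0.5
K = np.zeros((4, 4)); K[0, 3], K[3, 0] = -1.0, 1.0; K[1, 2], K[2, 1] = -1.0, 1.0
assert np.allclose(M, -M.T) and np.allclose(K, -K.T)

N = 1000
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
dx = x[1] - x[0]
phi, u = -np.cos(x), np.sin(x)                   # phi_x = u
zero = np.zeros(N)
z = np.stack([phi, u, zero, zero])               # v, w do not enter I(z)
zx = np.stack([u, np.cos(x), zero, zero])

I = 0.5 * np.einsum('ik,ij,jk->k', zx, M.T, z)   # I(z) = z_x^T M^T z / 2, pointwise
print(I.sum() * dx, 0.5 * (u ** 2).sum() * dx)   # both ~ pi/2
```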

Under the periodic (or vanishing) boundary condition, we obtain the following two global invariants

\[
\mathcal{E}(z) = -\int\Bigl(\frac{\alpha}{6}u^3 + \frac{\beta}{2}u_x^2 + \frac{\gamma}{2}\phi^2\Bigr)dx,
\qquad
\mathcal{I}(z) = \frac12\int u^2\,dx,
\]

by using the standard integration-by-parts formula. By substituting Z^n_k = (Φ^n_k, U^n_k, V^n_k, W^n_k)^⊤ into (10), we obtain the following multi-symplectic scheme.

Scheme 3 (A Multi-Symplectic Scheme)

\[
\frac{U^{n+1}_{k+\frac12} - U^{n}_{k+\frac12}}{\Delta t}
+ \alpha U^{n+\frac12}_{k+\frac12}\,\frac{U^{n+\frac12}_{k+1} - U^{n+\frac12}_{k}}{\Delta x}
- \beta\,\frac{U^{n+\frac12}_{k+2} - 3U^{n+\frac12}_{k+1} + 3U^{n+\frac12}_{k} - U^{n+\frac12}_{k-1}}{(\Delta x)^3}
= \gamma\Phi^{n+\frac12}_{k+\frac12},
\]

where U^{n+1/2}_{k+1/2} = (Φ^{n+1/2}_{k+1} − Φ^{n+1/2}_k)/∆x.

The detail of how this scheme is in fact "multi-symplectic," i.e., how it realizes a discrete version of (9), is omitted here due to space restrictions (see our forthcoming complete paper [10] for details). In Scheme 3, we have to provide the initial approximate solution for the potential ϕ. This can be generated either by integrating u(0, x) analytically or by summing (U_0, …, U_{N−1})^⊤ via (7).

4. Numerical examples

In this section we numerically compare the multi-symplectic scheme with the conservative schemes by Yaguchi et al. The aim of this section is to confirm the effectiveness of the multi-symplectic scheme. The parameters were set to α = 1, β = −0.01, γ = −1.

Fig. 1. Evolution of the energy for each scheme.

Fig. 2. Evolution of the norm for each scheme.

The length of the spatial period was set to L = 2π. The initial condition was set to u(0, x) = sin(x), and accordingly the potential was set to ϕ(0, x) = −cos(x). In this setting, Hunter reported that oscillations were observed [4], and Yaguchi et al. confirmed this [3]. We set the time mesh size to ∆t = 0.1, and used a uniform grid, where ∆x = L/N with N = 101. The computation environment was a Xeon CPU (3.00 GHz) with 16 GB memory running Linux. We used MATLAB (R2007b), where nonlinear equations were solved by "fsolve" with tolerances TolFun = 10⁻¹⁶ and TolX = 10⁻¹⁶.

Figs. 1 and 2 show the evolutions of the energies and the norms. Schemes 1 and 2 (the conservative schemes) each preserve one invariant, but the deviation in the other invariant is large. On the other hand, the deviation of the numerical solutions by Scheme 3 (the multi-symplectic scheme) is very small.

Next, let us evaluate each scheme in view of qualitative behavior. The numerical solutions are shown in Figs. 3–5. Fig. 6 shows the numerical solution by Scheme 1 with a sufficiently small mesh size. If we regard Fig. 6 as the exact solution, Scheme 3 can be said to be the best of the three schemes, because its numerical solution is much smoother and closer to the exact solution than those of Schemes 1 and 2.


Fig. 3. The numerical solution obtained by Scheme 1 (the norm-conservative finite difference scheme by Yaguchi et al.) with N = 101 and ∆t = 0.1.

Fig. 4. The numerical solution obtained by Scheme 2 (the norm-conservative finite difference scheme by Yaguchi et al.) with N = 101 and ∆t = 0.1.

5. Concluding remarks

We proposed a multi-symplectic scheme for the Ostrovsky equation that preserves the multi-symplectic conservation law, and confirmed that this scheme gives better numerical solutions than the conservative schemes by Yaguchi et al. Although we have also considered other structure-preserving schemes, including conservative Galerkin schemes, the full details are left to [10]. More numerical results will be included there.

Fig. 5. The numerical solution obtained by Scheme 3 (the multi-symplectic scheme) with N = 101 and ∆t = 0.1.

Fig. 6. The numerical solution obtained by Scheme 1 (the norm-conservative finite difference scheme by Yaguchi et al.) with N = 301 and ∆t = 0.1.

References

[1] L. A. Ostrovsky, Nonlinear internal waves in the rotating ocean, Okeanologia, 18 (1978), 181–191.
[2] R. Choudhury, R. I. Ivanov and Y. Liu, Hamiltonian formulation, nonintegrability and local bifurcations for the Ostrovsky equation, Chaos Soliton. Fract., 34 (2007), 544–550.
[3] T. Yaguchi, T. Matsuo and M. Sugihara, Conservative numerical schemes for the Ostrovsky equation, J. Comput. Appl. Math., 234 (2010), 1036–1048.
[4] J. K. Hunter, Numerical solutions of some nonlinear dispersive wave equations, Lectures Appl. Math., 26 (1990), 301–316.
[5] G. Y. Chen and J. P. Boyd, Analytical and numerical studies of weakly nonlocal solitary waves of the rotation-modified Korteweg-de Vries equation, Physica D, 155 (2001), 201–222.
[6] R. Grimshaw, J. M. He and L. A. Ostrovsky, Terminal damping of a solitary wave due to radiation in rotational systems, Stud. Appl. Math., 101 (1998), 197–210.
[7] Y. Liu, D. Pelinovsky and A. Sakovich, Wave breaking in the Ostrovsky-Hunter equation, SIAM J. Math. Anal., 42 (2010), 1967–1985.
[8] U. M. Ascher and R. I. McLachlan, Multisymplectic box schemes and the Korteweg-de Vries equation, Appl. Numer. Math., 48 (2004), 255–269.
[9] T. J. Bridges and S. Reich, Multi-symplectic integrators: numerical schemes for Hamiltonian PDEs that conserve symplecticity, Phys. Lett. A, 284 (2001), 184–193.
[10] Y. Miyatake, T. Yaguchi and T. Matsuo, Numerical integration of the Ostrovsky equation based on its geometric structures, in preparation.


JSIAM Letters Vol.3 (2011) pp.45–48 ©2011 Japan Society for Industrial and Applied Mathematics

Solutions of Sakaki-Kakei equations of type 1, 2, 7 and 12

Koichi Kondo1

1 Graduate School of Engineering, Doshisha University, Tatara-Miyakodani 1-3, Kyotanabe, Kyoto 610-0394, Japan

E-mail kokondo@mail.doshisha.ac.jp

Received May 6, 2011, Accepted June 16, 2011

Abstract

We consider solutions of Sakaki-Kakei equations of type 1, 2, 7 and 12, which are irreversible two-dimensional systems. We first obtain their conserved quantities, and reduce them to one-dimensional nonautonomous systems. We next show that the equations of type 2 and 7 are transformed to the arithmetic-harmonic mean equation, and obtain their general solutions. We finally show that the equations of type 1 and 12 are related to the solvable chaotic system proposed by Umeno. We also show that their iteration maps have self semiconjugacy, and obtain their particular solutions, which are expressed in terms of the lemniscate elliptic function.

Keywords Sakaki-Kakei equation, Umeno equation, lemniscate elliptic function

Research Activity Group Applied Integrable Systems

1. Introduction

In [1], Sakaki and Kakei presented twelve types of irreversible two-dimensional dynamical systems. Each type of equation has an invariant, which is expressed in terms of the hypergeometric function. Here, the twelve types of Sakaki-Kakei equations are denoted as SK1, SK2, SK3, ..., and SK12 in order of appearance in the paper. In [2], Kondo showed that the iteration maps of SK3, SK5 and SK6 are semiconjugate to that of the arithmetic-harmonic mean equation (AHM) [3], and obtained their general solutions. The aim of this paper is to obtain solutions of SK1, SK2, SK7 and SK12.

2. Sakaki-Kakei equations

We consider the Sakaki-Kakei equations [1] of type 1, 2, 7 and 12, which are given by

    SK1:  a_{n+1} = a_n - b_n,  \quad  b_{n+1} = \frac{-4 a_n b_n}{a_n - b_n},    (1)

    SK2:  a_{n+1} = a_n + b_n,  \quad  b_{n+1} = \frac{4 a_n b_n}{a_n + b_n},    (2)

    SK7:  a_{n+1} = \frac{\sqrt{a_n}\left(\sqrt{a_n} + \sqrt{a_n - b_n}\right)}{2},  \quad
          b_{n+1} = \frac{\sqrt{a_n}\left(\sqrt{a_n} - \sqrt{a_n - b_n}\right)}{2},    (3)

    SK12: a_{n+1} = \frac{(a_n + b_n)^2}{a_n - b_n},  \quad
          b_{n+1} = \frac{16 a_n b_n (a_n - b_n)}{(a_n + b_n)^2}    (4)

for n = 0, 1, 2, .... Here, a_n, b_n are real variables.

In (3), we should take account of the square roots in SK7. Assume that a_n > 0 and a_n − b_n > 0 for a certain n. It follows from (3) that a_{n+1} > 0 and a_{n+1} − b_{n+1} = √a_n √(a_n − b_n) > 0. By mathematical induction, we thus obtain the following lemma.

Lemma 1  Suppose that a_0 > 0 and a_0 − b_0 > 0. Then the variables a_n, b_n of SK7 satisfy a_n > 0 and a_n − b_n > 0 for n = 0, 1, 2, ..., and they are well-defined as real numbers.

3. Conserved quantities

We consider the conserved quantity of SK1. We derive

    a_{n+1} b_{n+1} = (a_n - b_n) \cdot \frac{-4 a_n b_n}{a_n - b_n} = -4 a_n b_n    (5)

from (1). It follows from (5) that a_n b_n = −4 a_{n−1} b_{n−1} = ··· = (−4)^n a_0 b_0. Let c = a_0 b_0. Then we have a_n b_n = (−4)^n c. Thus, c = a_n b_n/(−4)^n is a conserved quantity of SK1, since c is a constant determined by a_0, b_0. Similarly, we can derive conserved quantities of SK2, SK7 and SK12 from (2)–(4) and Lemma 1. We obtain the following theorem.

Theorem 2  Conserved quantities of SK1, SK2, SK7 and SK12 are c = a_n b_n/(−4)^n, c = a_n b_n/4^n, c = 4^n a_n b_n and c = a_n b_n/16^n for n = 0, 1, 2, ..., respectively.
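The four conserved quantities of Theorem 2 can be verified numerically. The sketch below (arbitrary starting values a_0 = 3, b_0 = 1, chosen so that Lemma 1 applies to SK7) iterates each map a few steps and checks that c stays at a_0 b_0:

```python
def sk1(a, b):  return a - b, -4 * a * b / (a - b)
def sk2(a, b):  return a + b, 4 * a * b / (a + b)
def sk7(a, b):
    ra, rd = a ** 0.5, (a - b) ** 0.5
    return ra * (ra + rd) / 2, ra * (ra - rd) / 2
def sk12(a, b): return (a + b) ** 2 / (a - b), 16 * a * b * (a - b) / (a + b) ** 2

a0, b0 = 3.0, 1.0
checks = [
    (sk1,  lambda n, a, b: a * b / (-4) ** n),
    (sk2,  lambda n, a, b: a * b / 4 ** n),
    (sk7,  lambda n, a, b: 4 ** n * a * b),
    (sk12, lambda n, a, b: a * b / 16 ** n),
]
for step, cons in checks:
    a, b = a0, b0
    for n in range(1, 6):
        a, b = step(a, b)
        assert abs(cons(n, a, b) - a0 * b0) < 1e-6   # c is conserved
```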

The conserved quantities in Theorem 2 are different from the invariants in [1], which are expressed in terms of the hypergeometric function. It is a future problem to find the relationship between them.

By Theorem 2, the variables b_n of SK1, SK2, SK7 and SK12 are expressed as b_n = (−4)^n c/a_n, b_n = 4^n c/a_n, b_n = c/(4^n a_n) and b_n = 16^n c/a_n, respectively. Substituting these expressions for b_n into (1)–(4), we obtain one-dimensional equations for a_n. We thus obtain the following theorem.

Theorem 3  SK1, SK2, SK7 and SK12 are reduced to the one-dimensional nonautonomous equations

    a_{n+1} = a_n - \frac{(-4)^n c}{a_n},  \quad  c = a_0 b_0,    (6)

    a_{n+1} = a_n + \frac{4^n c}{a_n},  \quad  c = a_0 b_0,    (7)

    a_{n+1} = \frac{1}{2}\left( a_n + \sqrt{a_n^2 - \frac{c}{4^n}} \right),  \quad  c = a_0 b_0,    (8)

    a_{n+1} = \frac{(a_n^2 + 16^n c)^2}{a_n (a_n^2 - 16^n c)},  \quad  c = a_0 b_0    (9)

for n = 0, 1, 2, ..., respectively.

4. General solution of AHM

In order to obtain solutions of SK2 and SK7, we consider the arithmetic-harmonic mean equation [3],

    AHM:  a_{n+1} = \frac{a_n + b_n}{2},  \quad  b_{n+1} = \frac{2 a_n b_n}{a_n + b_n}    (10)

for n = 0, 1, 2, ..., and its general solution [2].

Let c = a_0 b_0. A conserved quantity of AHM is given by c = a_n b_n for n = 0, 1, 2, .... Suppose that c ≠ 0. Substituting b_n = c/a_n into (10), we obtain the one-dimensional equation a_{n+1} = Φ_c(a_n) for n = 0, 1, 2, .... Here, Φ_c is defined with c by

    \Phi_c(x) = \frac{1}{2}\left( x + \frac{c}{x} \right).    (11)

The general solution of a_{n+1} = Φ_c(a_n) is shown in [2]. We briefly review it. Let R̄ = R ∪ {∞} and S¹ = {z ∈ C | |z| = 1}. The map Φ_c : R̄ → R̄ is conjugate (cf. [4, pp. 108–109]) to x² : R̄ → R̄ if c > 0, or to x² : S¹ → S¹ if c < 0. Namely, Φ_c is expressed as

    \Phi_c = \phi_c^{-1} \circ x^2 \circ \phi_c    (12)

with a homeomorphic map ϕ_c. Here, the map ϕ_c and its inverse ϕ_c^{−1} are given with c by

    \phi_c(x) = \frac{x - \sqrt{c}}{x + \sqrt{c}},  \qquad  \phi_c^{-1}(x) = \sqrt{c}\,\frac{1 + x}{1 - x}.    (13)

If c < 0, then we treat √c as a single-valued function such that √c = i√(−c), where i is the imaginary unit. Employing (12), we obtain the solution by a_n = (ϕ_c^{−1} ∘ x^{2^n} ∘ ϕ_c)(a_0) for n = 0, 1, 2, .... Hence, the general solution of a_{n+1} = Φ_c(a_n) is given by

    a_n = \sqrt{c}\,\frac{\lambda_1^{2^n} + \lambda_2^{2^n}}{\lambda_1^{2^n} - \lambda_2^{2^n}},  \quad  n = 0, 1, 2, \dots,    (14)

where λ₁ = a₀ + √c, λ₂ = a₀ − √c.
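Both the conjugacy (12)–(13) and the resulting closed form (14) can be checked pointwise. A minimal sketch (arbitrary c > 0 and test point of our own choosing):

```python
import math

def Phi(x, c):
    # arithmetic-harmonic mean map, eq. (11)
    return 0.5 * (x + c / x)

def phi(x, c):
    # conjugating map, eq. (13)
    return (x - math.sqrt(c)) / (x + math.sqrt(c))

def phi_inv(y, c):
    return math.sqrt(c) * (1 + y) / (1 - y)

c, x0 = 3.0, 5.0
# (12): Phi_c = phi_c^{-1} ∘ x² ∘ phi_c, checked at one point
assert abs(Phi(x0, c) - phi_inv(phi(x0, c) ** 2, c)) < 1e-12

# (14): closed form agrees with direct iteration after 5 steps
lam1, lam2 = x0 + math.sqrt(c), x0 - math.sqrt(c)
a = x0
for _ in range(5):
    a = Phi(a, c)
p = 2 ** 5
a_formula = math.sqrt(c) * (lam1 ** p + lam2 ** p) / (lam1 ** p - lam2 ** p)
assert abs(a - a_formula) < 1e-9
```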

5. General solution of SK2

We consider the general solution of SK2. Let a_n = 2^n ā_n in (7). Then (7) yields the autonomous equation

    \bar{a}_{n+1} = \frac{1}{2}\left( \bar{a}_n + \frac{c}{\bar{a}_n} \right),  \quad  n = 0, 1, 2, \dots.    (15)

Suppose that c ≠ 0. Eq. (15) is expressed as ā_{n+1} = Φ_c(ā_n), where Φ_c is defined by (11). Thus, the solution of SK2 can be derived from (14). Recall that b_n = 4^n c/a_n. We obtain the following theorem.

Theorem 4  Let c = a_0 b_0. Suppose that a_0 b_0 ≠ 0 and a_0 + b_0 ≠ 0. The general solution of SK2 is

    a_n = 2^n \sqrt{c}\,\frac{\lambda_1^{2^n} + \lambda_2^{2^n}}{\lambda_1^{2^n} - \lambda_2^{2^n}},
    \qquad
    b_n = 2^n \sqrt{c}\,\frac{\lambda_1^{2^n} - \lambda_2^{2^n}}{\lambda_1^{2^n} + \lambda_2^{2^n}}

for n = 0, 1, 2, ..., where λ₁ = a₀ + √c, λ₂ = a₀ − √c.
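Theorem 4 can be checked against direct iteration of (2). A minimal sketch with the hypothetical starting values a_0 = 3, b_0 = 1 (so c = 3):

```python
import math

a0, b0 = 3.0, 1.0
c = a0 * b0
sc = math.sqrt(c)
lam1, lam2 = a0 + sc, a0 - sc

# direct iteration of SK2, eq. (2)
a, b = a0, b0
for _ in range(4):
    a, b = a + b, 4 * a * b / (a + b)

# closed form of Theorem 4 at n = 4
p = 2 ** 4
an = 2 ** 4 * sc * (lam1 ** p + lam2 ** p) / (lam1 ** p - lam2 ** p)
bn = 2 ** 4 * sc * (lam1 ** p - lam2 ** p) / (lam1 ** p + lam2 ** p)
assert abs(a - an) < 1e-9 * abs(an)
assert abs(b - bn) < 1e-9 * abs(bn)
```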

6. General solution of SK7

We consider the general solution of SK7. Let a_n = ā_n/2^n in (8). Then (8) yields the autonomous equation

    \bar{a}_{n+1} = \bar{a}_n + \sqrt{\bar{a}_n^2 - c},  \quad  n = 0, 1, 2, \dots.    (16)

We here consider the map Φ_c defined by (11). Since Φ_c is a two-to-one map, the inverse of Φ_c has two branches. Suppose that c ≠ 0. Let U_c = {x | |x| > √|c|}, U′_c = {x | |x| > √max(0, c)} and T_c = {x | |x| < √|c|} for c. Let us introduce Φ̄_c, Φ̃_c with c by

    \bar{\Phi}_c(x) = x + \mathrm{sgn}(x)\sqrt{x^2 - c},  \quad  x \in U'_c,    (17)

    \tilde{\Phi}_c(x) = x - \mathrm{sgn}(x)\sqrt{x^2 - c},  \quad  x \in U'_c.    (18)

Here, sgn(x) denotes sgn(x) = x/|x| for x ≠ 0. The inverses of the maps Φ_c : U_c → U′_c and Φ_c : T_c → U′_c are Φ̄_c : U′_c → U_c and Φ̃_c : U′_c → T_c, respectively.

By Lemma 1 and a_n = ā_n/2^n, we have ā_n > 0 and sgn(ā_n) = 1 for n = 0, 1, 2, .... By use of (17), (16) is expressed as ā_{n+1} = Φ̄_c(ā_n) for n = 0, 1, 2, ....

Suppose that c > 0. The inverse of Φ_c : (√c, ∞) → (√c, ∞) is Φ̄_c : (√c, ∞) → (√c, ∞). It follows from (13) that ϕ_c : (√c, ∞) → (0, 1) is homeomorphic. The map x² : (0, 1) → (0, 1) is a bijection. By (12), we hence obtain

    \bar{\Phi}_c = \phi_c^{-1} \circ x^{1/2} \circ \phi_c,    (19)

and

    \bar{\Phi}_c : (\sqrt{c}, \infty) \xrightarrow{\ \phi_c\ } (0, 1) \xrightarrow{\ x^{1/2}\ } (0, 1) \xrightarrow{\ \phi_c^{-1}\ } (\sqrt{c}, \infty).

Suppose that c < 0. The inverse of Φ_c : (√−c, ∞) → (0, ∞) is Φ̄_c : (0, ∞) → (√−c, ∞). Let S_m = {e^{iθ} ∈ C | −mπ/2 < θ < 0} for m = 1, 2. Then it follows from (13) and √c = i√−c that ϕ_c(x) = e^{iθ}, θ = −2 tan^{−1}(√−c/x) for x ∈ R, and that ϕ_c^{−1}(e^{iθ}) = −√−c sin θ/(1 − cos θ) for θ ∈ [0, 2π]. It turns out that ϕ_c : (0, ∞) → S₂ and ϕ_c : (√−c, ∞) → S₁ are homeomorphic. The map x² : S₁ → S₂ is a bijection. By (12), we obtain (19) again, and

    \bar{\Phi}_c : (0, \infty) \xrightarrow{\ \phi_c\ } S_2 \xrightarrow{\ x^{1/2}\ } S_1 \xrightarrow{\ \phi_c^{-1}\ } (\sqrt{-c}, \infty).

Thus, we obtain the following lemma.

Lemma 5  If c > 0, then Φ̄_c : (√c, ∞) → (√c, ∞) is conjugate to x^{1/2} : (0, 1) → (0, 1). If c < 0, then Φ̄_c : (0, ∞) → (√−c, ∞) is conjugate to x^{1/2} : S₂ → S₁.

The map Φ̄_c composed with itself is expressed as

    \bar{\Phi}_c \circ \bar{\Phi}_c = \phi_c^{-1} \circ x^{1/2} \circ \phi_c \circ \phi_c^{-1} \circ x^{1/2} \circ \phi_c    (20)

by (19). If c > 0 or c < 0, then ϕ_c ∘ ϕ_c^{−1} : (0, 1) → (0, 1) or ϕ_c ∘ ϕ_c^{−1} : S₁ → S₁ is the identity map, respectively. The composite (20) is written as

    \bar{\Phi}_c \circ \bar{\Phi}_c = \phi_c^{-1} \circ x^{1/4} \circ \phi_c.

Hence, we obtain ā₂ = Φ̄_c(Φ̄_c(ā₀)) = (ϕ_c^{−1} ∘ x^{1/4} ∘ ϕ_c)(ā₀). Repeating this, we obtain

    \bar{a}_n = (\phi_c^{-1} \circ x^{2^{-n}} \circ \phi_c)(\bar{a}_0),  \quad  n = 0, 1, 2, \dots.    (21)

Recall that a_n = ā_n/2^n and b_n = c/(4^n a_n). By (21), we obtain the following theorem.

Theorem 6  Let c = a_0 b_0. Suppose that a_0 > 0, b_0 ≠ 0 and a_0 − b_0 > 0. The general solution of SK7 is

    a_n = \frac{\sqrt{c}}{2^n}\,\frac{\lambda_1^{2^{-n}} + \lambda_2^{2^{-n}}}{\lambda_1^{2^{-n}} - \lambda_2^{2^{-n}}},
    \qquad
    b_n = \frac{\sqrt{c}}{2^n}\,\frac{\lambda_1^{2^{-n}} - \lambda_2^{2^{-n}}}{\lambda_1^{2^{-n}} + \lambda_2^{2^{-n}}}

for n = 0, 1, 2, ..., where λ₁ = a₀ + √c, λ₂ = a₀ − √c.
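Theorem 6 can also be checked against direct iteration of (3). A minimal sketch with the hypothetical starting values a_0 = 2, b_0 = 1, which satisfy the hypotheses of the theorem (so c = 2):

```python
import math

a0, b0 = 2.0, 1.0            # a0 > 0, b0 != 0, a0 - b0 > 0
c = a0 * b0
sc = math.sqrt(c)
lam1, lam2 = a0 + sc, a0 - sc

# direct iteration of SK7, eq. (3)
a, b = a0, b0
for _ in range(4):
    ra, rd = math.sqrt(a), math.sqrt(a - b)
    a, b = ra * (ra + rd) / 2, ra * (ra - rd) / 2

# closed form of Theorem 6 at n = 4 (exponent 2^{-4})
q = 2.0 ** -4
an = sc / 2 ** 4 * (lam1 ** q + lam2 ** q) / (lam1 ** q - lam2 ** q)
bn = sc / 2 ** 4 * (lam1 ** q - lam2 ** q) / (lam1 ** q + lam2 ** q)
assert abs(a - an) < 1e-9
assert abs(b - bn) < 1e-9
```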

7. Umeno equation

In order to obtain solutions of SK12 and SK1, we consider the Umeno equation [5], which is a one-dimensional solvable chaotic system. The equation is given by

    u_{n+1} = \frac{4 u_n (1 - u_n)(1 - l u_n)(1 - m u_n)}{1 + A u_n^2 + B u_n^3 + C u_n^4}    (22)

for n = 0, 1, 2, ..., where A = −2(l + m + lm), B = 8lm, C = l² + m² − 2lm − 2l²m − 2lm² + l²m², and l, m are real numbers such that −∞ < m ≤ l < 1.

In [5], it was proved that the Umeno equation has a solution in terms of the Weierstrass elliptic function ℘(x), with the help of its duplication formula. It is well known that ℘(x) is related to the Jacobi elliptic function sn(x; k) (cf. [6, p. 505]). The solution is equally rewritten as

    u_n = \frac{k^2\,\mathrm{sn}^2(2^n \sigma; k)}{(l - m)\,\mathrm{dn}^2(2^n \sigma; k)},
    \qquad
    k = \sqrt{\frac{l - m}{1 - m}},    (23)

where dn(x; k) = √(1 − k² sn²(x; k)) and σ is a constant determined by u₀.

8. Particular solutions of SK12

We consider solutions of SK12. Let a_n = 4^n ā_n in (9). Then (9) yields the autonomous equation

    \bar{a}_{n+1} = \frac{(\bar{a}_n^2 + c)^2}{4 \bar{a}_n (\bar{a}_n^2 - c)},  \quad  n = 0, 1, 2, \dots.    (24)

Suppose that c ≠ 0. Let us introduce Ψ_c with c by

    \Psi_c(x) = \frac{(x^2 + c)^2}{4x (x^2 - c)}.    (25)

Eq. (24) is expressed as ā_{n+1} = Ψ_c(ā_n) for n = 0, 1, ....

Suppose that c > 0. Let ā_n = sgn(ā₀) √c / u_n in (24). Then (24) becomes

    u_{n+1} = \frac{4 u_n (1 - u_n^2)}{(1 + u_n^2)^2},  \quad  n = 0, 1, 2, \dots.    (26)

Eq. (26) is equal to the special case of (22) with l = 0, m = −1 (then A = 2, B = 0, C = 1), so that we can derive a solution of (26) from (23) with l = 0, m = −1. Thus, we obtain u_n = k² sn²(2^n σ; k)/dn²(2^n σ; k) with k = 1/√2.

We can also directly obtain a solution of (24). We introduce the lemniscate elliptic function sl(x) (cf. [6, p. 524]), which is defined as the inverse of an integral,

    \mathrm{sl}^{-1}(x) = \int_0^x \frac{dt}{\sqrt{1 - t^4}},

and expressed as

    \mathrm{sl}(x) = \frac{\mathrm{sn}(\sqrt{2}\,x;\,\sqrt{2}^{-1})}{\sqrt{2}\,\mathrm{dn}(\sqrt{2}\,x;\,\sqrt{2}^{-1})}.    (27)

By virtue of (27), we can derive a duplication formula for sl(x) from those of sn(x; k) and dn(x; k) (cf. [6, p. 496]).

Hence, we have

    \mathrm{sl}^2(2x) = \frac{4\,\mathrm{sl}^2(x)\,(1 - \mathrm{sl}^4(x))}{(1 + \mathrm{sl}^4(x))^2}.    (28)
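The duplication formula (28) can be checked numerically without any special-function library: evaluate sl^{-1} by quadrature and invert it by bisection. This is a rough sketch of our own (the midpoint rule, grid size and tolerances are arbitrary choices):

```python
import math

def arcsl(t, n=4000):
    # sl^{-1}(t) = ∫_0^t ds/√(1 - s⁴), midpoint rule on n subintervals
    h = t / n
    return sum(h / math.sqrt(1.0 - ((i + 0.5) * h) ** 4) for i in range(n))

def sl(x):
    # invert the monotone arcsl on [0, 1) by bisection
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if arcsl(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = 0.3
lhs = sl(2 * x) ** 2
s2 = sl(x) ** 2
rhs = 4 * s2 * (1 - s2 ** 2) / (1 + s2 ** 2) ** 2   # eq. (28)
assert abs(lhs - rhs) < 1e-6
```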

Suppose that c > 0. Let s = ±1. Substituting x = s√c/sl²(x) in (25) and employing (28), we obtain

    \Psi_c\!\left( \frac{s\sqrt{c}}{\mathrm{sl}^2(x)} \right) = \frac{s\sqrt{c}}{\mathrm{sl}^2(2x)}.    (29)

Here, we can see that sl² : R → [0, 1] by (27). Let σ be a constant which satisfies sl²(σ) = √c/|ā₀| for a given ā₀ with |ā₀| ≥ √c. Let ā_n = sgn(ā₀) √c / sl²(2^n σ) for n = 0, 1, 2, .... Substituting s = sgn(ā₀) and x = 2^n σ in (29), we have Ψ_c(ā_n) = ā_{n+1}. Hence, ā_n is a solution of (24). Let us introduce F_c(n, α) and μ_c(α) with c > 0 by

    F_c(n, \alpha) = \frac{\mathrm{sgn}(\alpha)\,\sqrt{c}}{\mathrm{sl}^2(2^n \mu_c(\alpha))},
    \qquad
    \mu_c(\alpha) = \mathrm{sl}^{-1}\!\left( \frac{c^{1/4}}{|\alpha|^{1/2}} \right)

under |α| ≥ √c. Then it holds that ā_n = F_c(n, ā₀) and

    \Psi_c(F_c(n, \alpha)) = F_c(n + 1, \alpha).    (30)

We thus obtain the following lemma.

Lemma 7  Suppose that c > 0 and |ā₀| ≥ √c. Then the solution of ā_{n+1} = Ψ_c(ā_n) is ā_n = F_c(n, ā₀).

We here consider the relationship between the map Ψ_c and the map Φ_c defined by (11). Computing Φ_c composed with Φ_{−c}, we obtain

    \Psi_c = \Phi_c \circ \Phi_{-c}    (31)

for any c ≠ 0. Replacing c in (31) with −c, we have

    \Psi_{-c} = \Phi_{-c} \circ \Phi_c.    (32)

It follows from (32) that Φ_c ∘ Ψ_{−c} = Φ_c ∘ Φ_{−c} ∘ Φ_c. Employing (31), we have

    \Phi_c \circ \Psi_{-c} = \Psi_c \circ \Phi_c.    (33)

The map Φ_c : R̄ → R̄ is continuous, onto, and at most two-to-one. The maps Ψ_c : R̄ → R̄ and Ψ_{−c} : R̄ → R̄ satisfy (33). Thus, we obtain the following lemma.

Lemma 8  The map Ψ_{−c} : R̄ → R̄ is semiconjugate (cf. [4, p. 125]) to the map Ψ_c : R̄ → R̄ with semiconjugacy Φ_c for c ≠ 0.
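Both the composition identity (31) and the semiconjugacy relation (33) can be confirmed pointwise. A minimal sketch at an arbitrary test point:

```python
def Phi(x, c):
    # the map of eq. (11)
    return 0.5 * (x + c / x)

def Psi(x, c):
    # the map of eq. (25)
    return (x * x + c) ** 2 / (4 * x * (x * x - c))

c, x = 2.0, 3.0
# (31): Psi_c = Phi_c ∘ Phi_{-c}
assert abs(Psi(x, c) - Phi(Phi(x, -c), c)) < 1e-12
# (33): Phi_c ∘ Psi_{-c} = Psi_c ∘ Phi_c
assert abs(Phi(Psi(x, -c), c) - Psi(Phi(x, c), c)) < 1e-9
```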

Suppose that c < 0. We consider a solution of (24). Let â_n be a solution of another equation,

    \hat{a}_{n+1} = \Psi_{-c}(\hat{a}_n),  \quad  n = 0, 1, 2, \dots,    (34)

for â₀ such that |â₀| > √−c. Note here that −c > 0. By Lemma 7, we obtain a solution of (34) by â_n = F_{−c}(n, â₀). Mapping both sides of (34) by Φ_c, we have

    \Phi_c(\hat{a}_{n+1}) = \Phi_c(\Psi_{-c}(\hat{a}_n)),  \quad  n = 0, 1, 2, \dots.    (35)

By (33), (35) becomes

    \Phi_c(\hat{a}_{n+1}) = \Psi_c(\Phi_c(\hat{a}_n)),  \quad  n = 0, 1, 2, \dots.    (36)

Let ā_n = Φ_c(â_n) for n = 0, 1, 2, .... Then (36) is written as ā_{n+1} = Ψ_c(ā_n) for n = 0, 1, 2, ..., which is the same as (24). Hence, we obtain a solution of (24) by ā_n = Φ_c(F_{−c}(n, â₀)), where â₀ should satisfy |â₀| > √−c and ā₀ = Φ_c(â₀) for a given ā₀. Recall (17).


Since the inverse of Φ_c : U_c → U′_c is Φ̄_c : U′_c → U_c, we obtain â₀ = Φ̄_c(ā₀). Thus, the solution becomes ā_n = Φ_c(F_{−c}(n, Φ̄_c(ā₀))). The condition |â₀| > √−c is equivalent to ā₀ ≠ 0. We obtain the following lemma.

Lemma 9  Suppose that c < 0 and ā₀ ≠ 0. Then the solution of ā_{n+1} = Ψ_c(ā_n) is ā_n = Φ_c(F_{−c}(n, Φ̄_c(ā₀))).

Recall that a_n = 4^n ā_n, b_n = 16^n c/a_n and c = a_0 b_0. Let us introduce f₁(x), f₂(x) by

    f_1(x) = \mathrm{sl}^2(x),
    \qquad
    f_2(x) = \frac{2\,\mathrm{sl}^2(x)}{1 - \mathrm{sl}^4(x)}.

By Lemmas 7 and 9, we obtain the following theorem.

Theorem 10  Let c = a_0 b_0 and s = sgn(a_0). If a_0 b_0 > 0 and |a_0| ≥ |b_0| > 0, then the solution of SK12 is

    a_n = \frac{s\,4^n \sqrt{c}}{f_1(2^n \sigma)},
    \qquad
    b_n = s\,4^n \sqrt{c}\, f_1(2^n \sigma)

for n = 0, 1, 2, ..., where

    \sigma = \mathrm{sl}^{-1}\!\left( \left( \frac{b_0}{a_0} \right)^{1/4} \right).    (37)

If a_0 b_0 < 0, then the solution of SK12 is

    a_n = \frac{s\,4^n \sqrt{-c}}{f_2(2^n \sigma)},
    \qquad
    b_n = -s\,4^n \sqrt{-c}\, f_2(2^n \sigma)

for n = 0, 1, 2, ..., where

    \sigma = \mathrm{sl}^{-1}\sqrt{ \sqrt{1 - \frac{a_0}{b_0}} - \sqrt{-\frac{a_0}{b_0}} }.    (38)

Theorem 10 gives particular solutions of SK12 under some conditions. It is a future problem to obtain the general solution of SK12.

9. Particular solutions of SK1

We consider solutions of SK1. Let a_n = 2^n ā_n in (6). Then (6) becomes

    \bar{a}_{n+1} = \frac{1}{2}\left[ \bar{a}_n + \frac{(-1)^{n+1} c}{\bar{a}_n} \right],  \quad  n = 0, 1, 2, \dots.    (39)

Eq. (39) is expressed as ā_{n+1} = Φ_{c_n}(ā_n) with c_n = (−1)^{n+1} c, where Φ_c is defined by (11). It holds that ā_{2n+1} = Φ_{−c}(ā_{2n}) and ā_{2n+2} = Φ_c(ā_{2n+1}) for n = 0, 1, 2, .... Hence, we have

    \bar{a}_{2n+2} = \Phi_c(\Phi_{-c}(\bar{a}_{2n})),  \quad  n = 0, 1, 2, \dots.    (40)

Let â_n = ā_{2n} for n = 0, 1, 2, .... Recalling (31), (40) becomes â_{n+1} = Ψ_c(â_n) for n = 0, 1, 2, ..., whose solutions can be obtained by Lemmas 7 and 9 under some conditions.

We thus obtain solutions of (39) by ā_{2n} = â_n and ā_{2n+1} = Φ_{−c}(ā_{2n}). Suppose that c > 0 and |ā₀| ≥ √c. By Lemma 7, we obtain ā_{2n} = F_c(n, ā₀) and ā_{2n+1} = Φ_{−c}(F_c(n, ā₀)). Suppose that c < 0 and ā₀ ≠ 0. By Lemma 9, we obtain ā_{2n} = Φ_c(F_{−c}(n, Φ̄_c(ā₀))) and ā_{2n+1} = Φ_{−c}(Φ_c(F_{−c}(n, Φ̄_c(ā₀)))). Recalling (32), we have ā_{2n+1} = Ψ_{−c}(F_{−c}(n, Φ̄_c(ā₀))). Since −c > 0 and |Φ̄_c(ā₀)| ≥ √−c, we can employ (30). We thus obtain ā_{2n+1} = F_{−c}(n + 1, Φ̄_c(ā₀)).

Recall that a_n = 2^n ā_n, b_n = (−4)^n c/a_n and c = a_0 b_0. We obtain the following theorem.

Theorem 11  Let c = a_0 b_0 and s = sgn(a_0). If a_0 b_0 > 0 and |a_0| ≥ |b_0| > 0, then the solution of SK1 is

    a_{2n} = \frac{s\,4^n \sqrt{c}}{f_1(2^n \sigma)},  \qquad  b_{2n} = s\,4^n \sqrt{c}\, f_1(2^n \sigma),

    a_{2n+1} = \frac{s\,2^{2n+1} \sqrt{c}}{f_2(2^n \sigma)},  \qquad  b_{2n+1} = -s\,2^{2n+1} \sqrt{c}\, f_2(2^n \sigma)

for n = 0, 1, 2, ..., where σ is the same as in (37). If a_0 b_0 < 0, then the solution of SK1 is

    a_{2n} = \frac{s\,4^n \sqrt{-c}}{f_2(2^n \sigma)},  \qquad  b_{2n} = -s\,4^n \sqrt{-c}\, f_2(2^n \sigma),

    a_{2n+1} = \frac{s\,2^{2n+1} \sqrt{-c}}{f_1(2^{n+1} \sigma)},  \qquad  b_{2n+1} = s\,2^{2n+1} \sqrt{-c}\, f_1(2^{n+1} \sigma)

for n = 0, 1, 2, ..., where σ is the same as in (38).

Theorem 11 gives particular solutions of SK1 under some conditions. It is a future problem to obtain the general solution of SK1.

10. Conclusion

The aim of this paper was to obtain solutions of SK1, SK2, SK7 and SK12. We first obtained their conserved quantities, and reduced them to one-dimensional nonautonomous equations. We next showed that SK2 and SK7 are transformed to AHM, and obtained their general solutions. We finally showed that SK12 and SK1 are related to the solvable chaotic system proposed by Umeno. We also showed that the iteration maps of SK12 and SK1 have self semiconjugacy. Under some conditions, we obtained particular solutions of SK12 and SK1 which are expressed in terms of the lemniscate elliptic function.

Acknowledgments

The author would like to thank Dr. Umeno for helpful suggestions, and the reviewers for their careful reading and insightful suggestions. This research was partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Young Scientists (B) 21740086.

References

[1] T. Sakaki and S. Kakei, Difference equations with an invariant expressed in terms of the hypergeometric function (in Japanese), Trans. JSIAM, 17 (2007), 455–462.
[2] K. Kondo, Solutions of Sakaki-Kakei equations of type 3, 5 and 6, JSIAM Letters, 2 (2010), 73–76.
[3] Y. Nakamura, Algorithms associated with arithmetic, geometric and harmonic means and integrable systems, J. Comput. Appl. Math., 131 (2001), 161–174.
[4] R. L. Devaney, A First Course in Chaotic Dynamical Systems: Theory and Experiment, Addison-Wesley, Reading, Massachusetts, 1992.
[5] K. Umeno, Method of constructing exactly solvable chaos, Phys. Rev. E, 55 (1997), 5280–5284.
[6] E. T. Whittaker and G. N. Watson, A Course of Modern Analysis, 4th ed., Cambridge Univ. Press, Cambridge, 1927.


JSIAM Letters Vol.3 (2011) pp.49–52 ©2011 Japan Society for Industrial and Applied Mathematics

Analysis of credit event impact

with self-exciting intensity model

Suguru Yamanaka1, Masaaki Sugihara1 and Hidetoshi Nakagawa2

1 Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan

2 Graduate School of International Corporate Strategy, Hitotsubashi University, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8439, Japan

E-mail yamanaka@mtec-institute.co.jp

Received May 4, 2011, Accepted June 16, 2011

Abstract

The aim of this article is to examine self-exciting and/or mutually exciting effects on rating migrations. First, we examine with self-exciting/mutually exciting intensity models whether such effects can be observed for rating migrations of Japanese enterprises. Second, we analyze which explanatory variable is more significant to the jump of the intensity via model selection with the Akaike information criterion (AIC).

Keywords credit rating, rating migration, self-exciting intensity

Research Activity Group Mathematical Finance

1. Introduction

In this article, we use a kind of self-exciting/mutually exciting process as a credit event intensity model to analyze credit event impact on event frequency. To be more precise, we demonstrate that our mutually exciting intensity model is to some extent consistent with the sample data, and we also attempt to explain how the jump size of the self-exciting intensity model is related to some explanatory variables.

Self-exciting intensity models have recently been used in credit risk modeling (see [1–4]). For example, in [1], a rating migration intensity model with self-exciting/mutually exciting properties is used, and it is concluded that some rating migrations can give an impact on the intensities of rating migrations of not only the same category but also other categories.

Though [1] focused on self-exciting and/or mutually exciting effects among sectors of industry, we examine whether there are self-exciting and/or mutually exciting effects between the down-grade and up-grade intensities.

Our model is advanced in the sense that the jump effect is introduced more flexibly than in previous works. In particular, our model can explicitly relate the jump size to other variables, which differs from the case where the jump size of the intensity is assumed to be constant or independent of other variables, as seen in previous works. With this model, we analyze which explanatory variable is more significant to the jump of the intensity via model selection with the Akaike information criterion (AIC).

The structure of this article is as follows. Section 2 introduces our self-exciting/mutually exciting intensity model. Section 3 presents the sample data of rating migration records of Japanese enterprises. In Section 4, we discuss the existence of self-exciting and/or mutually exciting effects on rating migrations in Japan. Then we consider what is more significant to the jump of the intensity in Section 5. Section 6 gives some concluding remarks.

2. Intensity process

In this section, we present a self-exciting/mutually exciting intensity process, which was originally introduced into the credit risk literature by [1, 5]. Consider a filtered complete probability space (Ω, F, P, {F_t}), where P is the actual probability measure. Here {F_t} is a right-continuous and complete filtration. Let ℓ ∈ {1, 2} denote the type of credit event. In particular, we set ℓ = 1 as down-grade and ℓ = 2 as up-grade. For each ℓ, consider marked point processes {(T^ℓ_n, ζ^ℓ_n)}_{n∈N}. Here 0 < T^ℓ_1 < T^ℓ_2 < ··· is an increasing sequence of totally inaccessible {F_t}-stopping times, which represents the event times of event type ℓ. The random variable ζ^ℓ_n represents a vector consisting of attributes of the event at time T^ℓ_n. We denote the counting process of event ℓ by N^ℓ_t = Σ_{n≥1} 1_{{T^ℓ_n ≤ t}}. Furthermore, we assume that different types of events do not occur at the same time.

Suppose each N^ℓ_t has an intensity process λ^ℓ_t. Namely, each λ^ℓ_t is an {F_t}-progressively measurable non-negative process, and the process N^ℓ_t − ∫₀^t λ^ℓ_s ds is an {F_t}-local martingale. We specify λ^ℓ_t with the self-exciting/mutually exciting stochastic process:

    d\lambda^\ell_t = \kappa^\ell (c^\ell - \lambda^\ell_t)\,dt + dJ^\ell_t,  \quad  \lambda^\ell_0 = c^\ell,

    J^\ell_t = \sum_{n \ge 1} f(\zeta^\ell_n)\,1_{\{T^\ell_n \le t\}} + \sum_{n \ge 1} g(\zeta^{\ell'}_n)\,1_{\{T^{\ell'}_n \le t\}}.

Here, the constants κ^ℓ > 0 and c^ℓ > 0 are parameters to be estimated. The function f(·) represents the self-exciting jump size and g(·) represents the mutually exciting jump size.


If g = 0, the mutually exciting intensity model is called a self-exciting intensity model.

In Section 4, we examine the existence of self-exciting and/or mutually exciting effects, namely, whether the functions f and g are identically zero or not. For this purpose, we consider two types of simple jump models. The first type of jump model is as follows:

    Model A:  J^\ell_t = \sum_n \delta^\ell_1\,1_{\{T^\ell_n \le t\}} + \sum_n \delta^\ell_2\,1_{\{T^{\ell'}_n \le t\}}.

Here, the constants δ^ℓ_1 and δ^ℓ_2 are parameters to be estimated. An event-ℓ intensity model with the jump type of Model A indicates that event occurrences of type ℓ cause self-exciting jumps of size δ^ℓ_1. In addition, event occurrences of type ℓ′ cause mutually exciting jumps of size δ^ℓ_2. Model A has simple jumps of constant size. Accordingly, Model A is tractable for parameter estimation. However, if either δ^ℓ_1 < 0 or δ^ℓ_2 < 0, the intensity with jump Model A could become negative. This contradicts the fact that an intensity is non-negative. For this reason, we consider not only Model A but also another type of jump model, as follows:

    Model B:  J^\ell_t = \sum_n \min(\delta^\ell_1 \lambda_{T^\ell_n-},\,\gamma^\ell)\,1_{\{T^\ell_n \le t\}}
                        + \sum_n \min(\delta^\ell_2 \lambda_{T^{\ell'}_n-},\,\gamma^\ell)\,1_{\{T^{\ell'}_n \le t\}}.

Here, the constants δ^ℓ_1 > −1, δ^ℓ_2 > −1 and γ^ℓ > 0 are parameters to be estimated. The jump sizes of Model B are proportional to the intensity just before the event and have upper bound γ^ℓ. The conditions δ^ℓ_1 > −1 and δ^ℓ_2 > −1 on the proportionality constants keep the intensity non-negative.

In Section 5, we attempt to explain the relation between the self-exciting impact and some explanatory variables. For this purpose, we employ a new jump type, an affine function of explanatory variables, as follows:

    Model C:  J^\ell_t = \sum_n \Bigl( a_0 + \sum_m a_m x_m(T^\ell_n) \Bigr)\,1_{\{T^\ell_n \le t\}}.

Here, a_0 is a constant term, the x_m are explanatory variables and the a_m are coefficients. As we focus on the self-exciting effect in Section 5, Model C has only self-exciting jumps.
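To build intuition for these dynamics, one can simulate an intensity path of Model A by Ogata's thinning algorithm. The sketch below is our own illustration (restricted to the purely self-exciting case g = 0 with δ₁ ≥ 0, and with hypothetical parameter values of the order later estimated for the down-grade intensity); it exploits the fact that λ decays monotonically toward c^ℓ between events, so the current value bounds the path ahead:

```python
import math
import random

def simulate_model_a(kappa, c, delta1, horizon, seed=1):
    # Ogata thinning for dλ = κ(c − λ)dt + δ1·dN with λ_0 = c (Model A, g = 0).
    # Requires delta1 >= 0 so that λ stays >= c and the bound below is valid.
    rng = random.Random(seed)
    t, lam, events = 0.0, c, []
    while t < horizon:
        lam_bar = lam                 # λ(s) <= lam_bar for all s >= t (pure decay)
        w = rng.expovariate(lam_bar)  # candidate waiting time at rate lam_bar
        t += w
        if t >= horizon:
            break
        lam = c + (lam - c) * math.exp(-kappa * w)   # decay over the waiting time
        if rng.random() * lam_bar <= lam:            # accept with prob λ(t)/λ̄
            events.append(t)
            lam += delta1                            # self-exciting jump
    return events

downgrade_times = simulate_model_a(kappa=234.0, c=33.0, delta1=150.0, horizon=1.0)
```

With delta1 = 0 the procedure reduces exactly to a homogeneous Poisson process of rate c, which gives a convenient correctness check.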

3. Data

The data for analysis are the issuer rating migration records of Japanese corporations from April 1, 1998 to March 31, 2009. The ratings are announced by R&I. The record of each rating migration consists of the event date, the issuer's name, the industry it belongs to, the type of event, the current rating and the last rating. The data include ratings for insurance claims paying ability. Excluding rating monitors from the samples, we observe 965 down-grades and 481 up-grades during the period. In the samples, there are 25 rating categories, AAA, AA+, ..., C−, in order of credit worthiness. Hereafter, we represent the rating categories by 1, 2, ..., K for simplicity. Excluding non-business days, we transformed the calendar times April 1, 1998, April 1, 1999, ... to t = 0, 1, .... In our analyses, we slide the event times with uniform random numbers so as to make all event times distinct. The influence of this treatment is slight, because the number of events in one day is big enough and they are scattered at random.

Figs. 1, 2 and 3 show the distributions of the "last rating", the "absolute difference of last rating and current rating" and the "interval of rating migrations in whole enterprises". Fig. 1 indicates that most ratings before migration are between 4 and 11. Fig. 2 indicates that most rating migrations are rating changes of one or two ranks. Fig. 3 indicates that there are some cases where more than one rating migration occurs on the same day. Also, most of the event intervals are narrower than a week (5 working days).

4. Existence of self-exciting effect and mutually exciting effect

In this section, we analyze the existence of self-exciting and/or mutually exciting effects in down-grades and up-grades. In other words, we examine whether the down-grade intensity and the up-grade intensity jump or not upon occurrences of down-grades and up-grades.

To examine the existence of self-exciting and/or mutually exciting effects, we estimate the jump parameters of both Models A and B. If the estimated value of the self-exciting jump size is significantly δ^ℓ_1 ≠ 0, we conclude that the self-exciting effect exists. Similarly, if the estimated value of the mutually exciting jump size is significantly δ^ℓ_2 ≠ 0, we conclude that the mutually exciting effect exists. To obtain the estimated jump sizes, we used the maximum-likelihood approach. The log-likelihood function of the intensity of event ℓ is the following:

    \sum_{n=1}^{N} \log \lambda^\ell_{T^\ell_n-} - \int_0^H \lambda^\ell_s\, ds.

Here, λ^ℓ_{t−} := lim_{s↑t} λ^ℓ_s. For likelihood maximization, we employed the statistical software R, using the intrinsic function optim. For estimation tractability, we set the search range of γ^ℓ as γ¹ ∈ {100, 125, 150, 175, 200} for the down-grade intensity, and γ² ∈ {5.0, 7.5, 10.0, 12.5, 15.0} for the up-grade intensity. If the absolute value of an estimated parameter is larger than twice its standard estimation error (meaning about the 95% significance level), we consider that the self-exciting or mutually exciting effect exists significantly.
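For Model A (self-exciting part only), the log-likelihood above has a closed form, since λ decays exponentially toward c^ℓ between events and the compensator integral is elementary. The following sketch is our own (with hypothetical event times; the paper's estimation additionally includes the mutually exciting term):

```python
import math

def loglik_model_a(times, kappa, c, delta1, horizon):
    # Σ_n log λ(T_n−) − ∫_0^H λ_s ds  for  dλ = κ(c − λ)dt + δ1·dN, λ_0 = c.
    ll, lam, prev = 0.0, c, 0.0
    for t in times:
        dt = t - prev
        # ∫ λ over (prev, t): c·dt + (λ_prev − c)(1 − e^{−κ·dt})/κ
        ll -= c * dt + (lam - c) * (1.0 - math.exp(-kappa * dt)) / kappa
        lam = c + (lam - c) * math.exp(-kappa * dt)   # left limit λ(T_n−)
        ll += math.log(lam)
        lam += delta1                                  # jump at the event
        prev = t
    dt = horizon - prev
    ll -= c * dt + (lam - c) * (1.0 - math.exp(-kappa * dt)) / kappa
    return ll
```

Maximizing this function over (κ, c, δ₁) with a generic numerical optimizer mirrors the optim-based estimation described above.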

B. Estimated jump size of down-grade intensity are sig-nificantly δ11 > 0 and δ12 < 0. This indicates that thedown-grade intensity has self-exciting property and mu-tually exciting effect from up-grade. Namely, the pos-sibility of down-grade is raised by down-grades and isdrop down by up-grade. For up-grade intensity, esti-mated jump sizes are significantly δ21 > 0, indicating ex-istence of self-exciting effect. On the other hand, mutu-ally exciting jump sizes of up-grade intensity are δ22 < 0in both Models A and B, however, the jump of ModelA is not significantly δ22 < 0. This indicates that theup-grade intensity has self-exciting effect and slight mu-tually exciting effect. Namely, the possibility of up-gradeis raised by up-grades and would be drop down by down-grade.
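The log-likelihood above can be maximized numerically. A minimal sketch for a single event type with an exponential-decay intensity is shown below; this is an illustrative specification, not the paper's exact Models A or B (which were estimated with R's intrinsic function optim), and all parameter values are toy assumptions:

```python
import numpy as np
from scipy.optimize import minimize

# Log-likelihood  sum_n log lambda(T_n-)  -  int_0^H lambda_s ds
# for a one-type self-exciting intensity with exponential decay:
#   lambda_t = c + sum_{T_i < t} delta * exp(-gamma * (t - T_i)).
def neg_log_likelihood(params, events, horizon, gamma):
    c, delta = params
    if c <= 0 or delta < 0:          # keep the intensity positive
        return np.inf
    log_term = sum(
        np.log(c + delta * np.exp(-gamma * (t_n - events[:n])).sum())
        for n, t_n in enumerate(events)
    )
    # closed-form compensator  int_0^H lambda_s ds
    compensator = c * horizon + (delta / gamma) * (
        1.0 - np.exp(-gamma * (horizon - events))
    ).sum()
    return -(log_term - compensator)

events = np.array([0.5, 0.6, 1.2, 1.3, 1.35, 2.8])   # toy event times
res = minimize(neg_log_likelihood, x0=[1.0, 0.5],
               args=(events, 3.0, 5.0), method="Nelder-Mead")
c_hat, delta_hat = res.x
```

As in the paper, the decay rate gamma is held fixed on a grid and only the remaining parameters are optimized.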


JSIAM Letters Vol. 3 (2011) pp.49–52 Suguru Yamanaka et al.

Fig. 1. Distribution of last rating (horizontal axis: rating; vertical axis: percentage (%); series: down, up). Sample span is from April 1, 1998 to March 31, 2009.

Fig. 2. Distribution of the absolute difference of last rating and current rating (horizontal axis: difference of ratings; vertical axis: percentage (%); series: down, up). Sample span is from April 1, 1998 to March 31, 2009.

Fig. 3. Distribution of interval of rating migrations in whole enterprises (horizontal axis: event time interval (day); vertical axis: percentage (%); series: down, up). Sample span is from April 1, 1998 to March 31, 2009.

Table 1. Estimation result for Models A and B. Values in parentheses are standard estimation errors. Estimated values of γℓ are γ1 = 125 and γ2 = 10.0.

                     Model   κℓ        cℓ       δℓ1       δℓ2
Down-grade (ℓ = 1)   A       234.43    33.19    150.36    −9.20
                             (42.33)   (3.09)   (22.64)   (2.10)
                     B       170.66    33.59    2.32      −0.38
                             (9.89)    (2.75)   (0.39)    (0.09)
Up-grade (ℓ = 2)     A       7.24      5.92     6.37      −0.042
                             (2.01)    (2.91)   (1.72)    (0.095)
                     B       11.75     13.14    0.28      −0.021
                             (0.54)    (0.81)   (0.026)   (0.009)

Table 2. Estimation result on self-exciting impact. Values in parentheses are standard estimation errors.

             a0        a2       a3        a4
Down-grade   193.53    −11.56   34.57     −380.18
             (36.96)   (3.37)   (14.97)   (137.64)
Up-grade     9.43      −0.09    1.81      −7.38
             (5.35)    (0.53)   (2.48)    (12.52)

5. Explanation of self-exciting impact

5.1 Explanation of self-exciting impact

In this section, we introduce some information observed at the event time, and use it as explanatory variables of the self-exciting impact. We employ Model C in the analyses and consider the following four explanatory variables:

• $x_1(T^{\ell}_n) \in \{1, 2, \ldots, K\}$: current rating at $T^{\ell}_n$,

• $x_2(T^{\ell}_n) \in \{1, 2, \ldots, K\}$: last rating before $T^{\ell}_n$,

• $x_3(T^{\ell}_n) = x_1(T^{\ell}_n) - x_2(T^{\ell}_n)$: difference of the current and last ratings,

• $x_4(T^{\ell}_n) = T^{\ell}_n - T^{\ell}_{n-1}$: time interval from the last migration to the current migration.

Excluding $x_1(T^{\ell}_n)$, we employ $x_2(T^{\ell}_n)$, $x_3(T^{\ell}_n)$ and $x_4(T^{\ell}_n)$ as explanatory variables, because $x_1(T^{\ell}_n)$ overlaps with $x_2(T^{\ell}_n)$ and $x_3(T^{\ell}_n)$. Also, we note that $x_3$ is just the difference of the current and last ratings, not the absolute difference.

Table 2 shows the estimation result of Model C. In Table 2, for the down-grade model, the coefficients are estimated significantly, implying that the self-exciting impact becomes larger in one of the following three cases:

• The last rating is high.

• The difference of ratings before and after migration is wide.

• The time interval between rating migrations is narrow.

These implications, which are intuitively recognizable, are derived respectively from the significant estimation results a2 < 0, a3 > 0 and a4 < 0. On the other hand, for the up-grade model, the estimation error of each coefficient is not small enough to draw significant conclusions.
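With the point estimates of Table 2, the down-grade jump size of Model C can be written out as an affine function of the event-time variables. The helper below is a hypothetical illustration: the variable units and scaling follow the paper's estimation and are assumptions here.

```python
# Down-grade self-exciting impact of Model C as an affine function of the
# explanatory variables, using the point estimates a0, a2, a3, a4 of Table 2:
#   delta(T_n) = a0 + a2*x2 + a3*x3 + a4*x4.
# Units/scaling of x2, x3, x4 follow the paper's estimation (assumed here).
a0, a2, a3, a4 = 193.53, -11.56, 34.57, -380.18

def downgrade_jump(x2, x3, x4):
    """x2: last rating, x3: rating difference, x4: migration interval."""
    return a0 + a2 * x2 + a3 * x3 + a4 * x4

# a wider rating change (larger x3) raises the impact (a3 > 0);
# a longer interval (larger x4) lowers it (a4 < 0)
assert downgrade_jump(8, 2, 0.1) > downgrade_jump(8, 1, 0.1)
assert downgrade_jump(8, 1, 0.2) < downgrade_jump(8, 1, 0.1)
```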

5.2 Selection of explanatory variables

In this subsection, we examine which explanatory variables are more significant for explaining the self-exciting impact. We compare several combinations of explanatory variables for Model C and select the combination whose AIC is smaller. In particular, we consider one model better than another when the difference in AIC exceeds one (see [6]). The explanatory variables we consider are the current


Table 3. Selection of self-exciting impact explanatory variables for down-grade intensity. Combinations of explanatory variables are ordered by AIC; a "1" marks an included variable among a0, x1, x2, x3, x4.

(included variables) | number of parameters | LL     | AIC
1 1 1                | 5                    | 3865.1 | −7720.2
1 1 1 1              | 6                    | 3865.9 | −7719.8
1 1 1                | 5                    | 3862.7 | −7715.3
1 1                  | 4                    | 3861.0 | −7714.0
1 1 1                | 5                    | 3860.6 | −7711.3
1 1                  | 4                    | 3858.7 | −7709.5
1 1 1                | 5                    | 3858.9 | −7707.8
1 1                  | 4                    | 3857.4 | −7706.8
1 1                  | 4                    | 3857.4 | −7706.8
1                    | 3                    | 3855.5 | −7704.9

Table 4. Selection of self-exciting impact explanatory variables for up-grade intensity. Combinations of explanatory variables are ordered by AIC; a "1" marks an included variable among a0, x1, x2, x3, x4.

(included variables) | number of parameters | LL     | AIC
1                    | 3                    | 1503.2 | −3000.5
1 1                  | 4                    | 1503.5 | −2999.0
1 1                  | 4                    | 1503.3 | −2998.5
1 1                  | 4                    | 1503.3 | −2998.5
1 1                  | 4                    | 1503.2 | −2998.5
1 1 1                | 5                    | 1503.5 | −2997.0
1 1 1                | 5                    | 1503.5 | −2997.0
1 1 1                | 5                    | 1503.3 | −2996.6
1 1 1                | 5                    | 1503.3 | −2996.5
1 1 1 1              | 6                    | 1503.5 | −2995.0

rating $x_1(T^{\ell}_n)$, last rating $x_2(T^{\ell}_n)$, the difference between the last rating and the current rating $x_3(T^{\ell}_n)$, and the rating migration time interval $x_4(T^{\ell}_n)$.

Table 3 shows the selection of explanatory variables for the down-grade self-exciting impact. Table 3 indicates the following observations:

• "Rating migration time interval" seems less important when "current rating" and "last rating" are present.

• "Last rating" seems more important than "current rating".

• "Difference of last rating and current rating" seems less important than either "current rating" or "last rating".

Table 4 shows the selection result for the up-grade self-exciting impact explanatory variables. Contrary to the result for down-grades, Table 4 indicates that increasing the number of explanatory variables does not increase the likelihood effectively, and the model with fewer variables tends to be selected. This means that the explanatory variables are not so significant in explaining our up-grade samples.
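The AIC comparison can be reproduced from the "number of parameters" and log-likelihood columns via AIC = 2k − 2·LL. A quick check against rows of Table 3 (the variable labels "x?" are placeholders, since the table does not pin down which indicator belongs to which variable):

```python
# AIC = 2k - 2*LL; the combination with the smaller AIC is preferred,
# and a difference is treated as meaningful only when it exceeds one.
# (k, LL) pairs are taken from rows of Table 3.
candidates = {
    "a0,x?,x?":       (5, 3865.1),   # best row of Table 3
    "a0,x1,x2,x3,x4": (6, 3865.9),   # full model
    "a0 only":        (3, 3855.5),   # intercept only
}
aic = {name: 2 * k - 2 * ll for name, (k, ll) in candidates.items()}
best = min(aic, key=aic.get)
assert best == "a0,x?,x?" and abs(aic[best] - (-7720.2)) < 1e-6
```

Note that although the full model has the larger log-likelihood, its extra parameter costs more than it gains, so the smaller model wins on AIC.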

6. Concluding remarks

We examined the self-exciting effect and the mutually exciting effect on rating migrations. First, we considered mutually exciting intensity models with two jump types, and examined the existence of the self-exciting effect and/or the mutually exciting effect. The estimated jump parameters imply that both self-exciting and mutually exciting effects exist in down-grades. Also, we recognized a self-exciting effect and a slight mutually exciting effect in up-grades. Second, we attempted to explain the self-exciting impact with some explanatory variables. We used the self-exciting intensity model whose jump size is an affine function of some explanatory variables. The significance of the explanatory variables was analyzed via model selection with AIC. As a result, we obtained some implications on down-grades which are intuitively recognizable.

The explanatory variables we used are the information about issuer ratings and rating migration intervals. Considering estimation tractability, we did not consider other explanatory variables. However, other explanatory variables, such as corporate size, can be considered in the same way. Analyses with additional explanatory variables are left for future work. Finally, we would like to mention that the intensity models we considered are naturally applied to credit risk modeling.

Acknowledgments

This work was supported in part by the Global COE Program "The research and training center for new development in mathematics", MEXT, Japan.

References

[1] H. Nakagawa, Analysis of records of credit rating transition with mutually exciting rating-change intensity model (in Japanese), Trans. JSIAM, 20 (2010), 183–202.

[2] S. Azizpour, K. Giesecke and B. Kim, Premia for correlated default risk, J. Econ. Dyn. Control, 35 (2011), 1340–1357.

[3] K. Giesecke and B. Kim, Risk analysis of collateralized debt obligations, Oper. Res., 59 (2011), 32–49.

[4] S. Yamanaka, M. Sugihara and H. Nakagawa, Modeling of contagious credit events and risk analysis of credit portfolios, Asia-Pacific Financial Markets, in press.

[5] H. Nakagawa, Modeling of contagious downgrades and its application to multi-downgrade protection, JSIAM Letters, 2 (2010), 65–68.

[6] Y. Sakamoto, M. Ishiguro and G. Kitagawa, Akaike Information Criterion Statistics, D. Reidel Pub. Co., Dordrecht, 1986.


JSIAM Letters Vol.3 (2011) pp.53–56 ©2011 Japan Society for Industrial and Applied Mathematics

On the reduction attack against the algebraic surface public-key cryptosystem (ASC04)

Satoshi Harada1, Yuichi Wada2, Shigenori Uchiyama3 and Hiro-o Tokunaga3

1 NRI SecureTechnologies, Ltd., Tokyo 105-7113, Japan
2 Waseda Junior & Senior High School, Tokyo 162-8654, Japan
3 Tokyo Metropolitan University, Tokyo 192-0397, Japan

E-mail s2-harada nri.co.jp, uchiyama-shigenori tmu.ac.jp

Received May 10, 2011, Accepted June 25, 2011

Abstract

In 2004, Akiyama and Goto proposed an algebraic surface public-key cryptosystem (ASC04) which is based on the hardness of finding sections on fibered algebraic surfaces. In 2007, Uchiyama and Tokunaga gave an efficient attack, called the reduction attack, against ASC04 under some condition on a public key of the scheme. In 2008, Iwami proposed an improved attack. In this paper, we point out a flaw in Iwami's attack and propose a generalized reduction attack. The attack is based on Iwami's attack, with the flaw fixed. We also discuss our experiments on the attack.

Keywords multivariate public-key cryptography, algebraic surface, section finding problem,Grobner basis, elimination ideal

Research Activity Group Algorithmic Number Theory and Its Applications

1. Introduction

In 1994, Shor proved that the integer factorization problem and the discrete logarithm problem can be solved in probabilistic polynomial time by using quantum computers [1]. Thus, once a quantum computer is realized, public-key cryptosystems based on these problems would not be secure. For this reason, cryptographic schemes which are expected to be resistant against quantum computers have been researched actively [2]. The algebraic surface public-key cryptosystem (ASC for short) [3, 4], proposed by Akiyama and Goto, is one of the candidates for such schemes. ASC is based on the hardness of finding sections on fibered algebraic surfaces. This problem is called the section finding problem (SFP for short). The SFP is the following problem (let k := F_p be the finite prime field of p elements): given an algebraic surface X(x, y, t) = 0 over k, find two polynomials u_x(t), u_y(t) ∈ k[t] such that X(u_x(t), u_y(t), t) = 0.

Two of the authors, Uchiyama and Tokunaga, proposed an efficient attack, called the reduction attack, against ASC04 (the first implementation of ASC, proposed in 2004) in 2007 [5]. They make use of some fundamental properties of Grobner bases. The correctness of the reduction attack can be proven under a certain condition on the leading term of a public key X(x, y, t) with respect to a monomial order in k[x, y, t]. Moreover, Ivanov and Voloch proposed a so-called trace attack in 2008 [6]. Then, Iwami proposed an improved reduction attack [7]. In this paper, we point out a flaw in Iwami's scheme, and propose a generalized reduction attack against ASC04. The attack is based on Iwami's attack, and the flaw is fixed by our proposal. The correctness of our proposed attack is proven without any conditions. Moreover, we discuss our experiments on the proposed attack.

2. ASC04

In this section, we briefly review ASC04. See [3] for the details.

2.1 Secret-Key

Two different curves D1 and D2 parameterized by t in A³(k):

D1 : (x, y, t) = (ux(t), uy(t), t),

D2 : (x, y, t) = (vx(t), vy(t), t).

2.2 Public-Key

• Algebraic surface X:
\[ X(x, y, t) := \sum_{(i,j) \in \Lambda_X} c_{ij}(t)\, x^i y^j = 0 \quad (\in k[x, y, t]), \]
where $\Lambda_X := \{(i, j) \in (\mathbb{Z}_{\geq 0})^2 \mid c_{ij}(t) \neq 0\}$, satisfying
\[ X(u_x(t), u_y(t), t) = X(v_x(t), v_y(t), t) = 0. \]

• l: an integer satisfying the following condition: deg_t X(x, y, t) < l, and l is the minimum degree of a monic irreducible polynomial f(t) ∈ k[t] given for encryption.

• d: an integer satisfying the following condition:
\[ d \geq \max\{\deg u_x(t), \deg u_y(t), \deg v_x(t), \deg v_y(t)\}. \]

2.3 Encryption

Divide a plaintext m into l blocks as m = m_0||m_1|| · · · ||m_{l−1} and embed each m_i (0 ≤ m_i < p, i = 0, . . . , l − 1) in the coefficients of a plaintext polynomial m(t) ∈ k[t]. Choose a monic irreducible polynomial f(t) ∈ k[t] of degree greater than or equal to l, and randomly choose


Table 1. Reduction attack.

Input: public key X ∈ k[x, y, t], ciphertext F ∈ k[x, y, t].
Output: plaintext m corresponding to the ciphertext F(x, y, t).

1. Find the remainder R1 ∈ k[x, y, t] by dividing F by X.
2. Randomly choose some terms $c_{ij}(t) x^i y^j$ of R1 with $(i, j) \neq (0, 0)$, $c_{ij}(t) \notin k$, and let the set of their coefficients $c_{ij}(t)$ be C (⊂ k[t]).
3. Factorize the elements of the set C, and let the set of irreducible factors of degree l or more be G (⊂ k[t]).
4. Choose g ∈ G, and find the remainder n ∈ k[t] by dividing R1 by g. If n ∉ k[t], choose another g ∈ G.
5. Let $n(t) = n_{k-1} t^{k-1} + \cdots + n_1 t + n_0 \in k[t]$, and compute m = n_0||n_1|| · · · ||n_{k−1}.

two polynomials r(x, y, t), s(x, y, t) ∈ k[x, y, t] with some conditions on their degrees. The ciphertext F(x, y, t) ∈ k[x, y, t] is defined as follows:

F(x, y, t) := m(t) + f(t)s(x, y, t) + X(x, y, t)r(x, y, t).

2.4 Decryption

Substituting the sections D1, D2 into F(x, y, t), we obtain:

h1(t) := F(u_x(t), u_y(t), t) = m(t) + f(t)s(u_x(t), u_y(t), t),

h2(t) := F(v_x(t), v_y(t), t) = m(t) + f(t)s(v_x(t), v_y(t), t).

Factorize h1(t) − h2(t) and choose f(t) as the irreducible factor with the largest degree. Then, m(t) is obtained as the remainder of dividing h1(t) by f(t). Finally, we obtain the plaintext m from m(t).
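The encryption and decryption above can be traced end-to-end on a toy instance. The sympy sketch below uses illustrative sections and parameters (not taken from [3]): it builds a surface vanishing on two sections, encrypts, and recovers m(t) by factoring h1(t) − h2(t).

```python
from sympy import symbols, expand, Poly, factor_list, degree

p = 17
x, y, t = symbols("x y t")

# toy secret sections D1 = (ux, uy), D2 = (vx, vy); illustrative values
ux, uy = t**2 + 3*t + 1, 2*t + 5
vx, vy = t**2 + 5*t + 2, 4*t + 9

# public surface vanishing on both sections by construction
X = expand((x - ux)*(y - vy) + (y - uy)*(x - vx))
assert X.subs({x: ux, y: uy}).expand() == 0
assert X.subs({x: vx, y: vy}).expand() == 0

def is_irreducible(g):
    factors = factor_list(g, t, modulus=p)[1]
    return len(factors) == 1 and factors[0][1] == 1

# choose a monic irreducible f(t) of degree l = 5 by trial
f = next(t**5 + a*t + b for a in range(p) for b in range(1, p)
         if is_irreducible(t**5 + a*t + b))

m = 3*t**4 + t**2 + 7                 # plaintext polynomial, deg m < l = 5
s, r = x + y + t, x*t + 1             # blinding polynomials (kept small here)
F = expand(m + f*s + X*r)             # ciphertext

# decryption with the secret sections
h1 = Poly(F.subs({x: ux, y: uy}).expand(), t, modulus=p)
h2 = Poly(F.subs({x: vx, y: vy}).expand(), t, modulus=p)
_, factors = factor_list((h1 - h2).as_expr(), t, modulus=p)
f_rec = max((g for g, _ in factors), key=lambda g: degree(g, t))
m_rec = h1.rem(Poly(f_rec, t, modulus=p))
assert m_rec == Poly(m, t, modulus=p)
```

Here s is chosen small so that s(D1) − s(D2) has degree below l, which guarantees that f(t) is the largest-degree irreducible factor of h1(t) − h2(t).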

3. Reduction attack

3.1 Reduction attack

In 2007, Uchiyama and Tokunaga proposed an efficient attack, called the reduction attack, against ASC04 [5]. (See Table 1.) They make use of fundamental properties of Grobner bases. For the proof of its correctness, the following condition is assumed:

Condition 1 For the defining equation of the algebraic surface X, the leading term LT(X) of X w.r.t. a monomial order in k[x, y, t] is of the form $c x^{\alpha} y^{\beta}$ ($c \in k$, $(\alpha, \beta) \neq (0, 0)$).

3.2 Iwami’s reduction attack

In 2008, Iwami generalized the reduction attack [7], and claimed that Condition 1 can be dropped.

We implemented the attack; however, we could not obtain the valid plaintexts, so there is a flaw in Iwami's scheme. See [7] for the details.

Proposition 2 In Iwami's attack, we have n = 0 in Step 5.

Proof For any g ∈ G in Step 4, g(t) ∈ k[t] ⊂ k(t) ⊂ k(t)[x, y]. Therefore, g(t) is a unit in k(t)[x, y]. Thus, we obtain:

R1 = ((1/g(t))R1) g(t).

Since (1/g(t))R1 ∈ k(t)[x, y], we obtain n = r = 0 ∈ k in Step 5. Thus, we cannot obtain the valid plaintext m.

(QED)

Table 2. Generalized reduction attack.

Input: public key X ∈ k[x, y, t], ciphertext F ∈ k[x, y, t].
Output: plaintext m corresponding to the ciphertext F(x, y, t).

1. Regard the public key X as an element of k(t)[x, y], and compute Y := X/LC(X) (Y ∈ k(t)[x, y], LC(X) ∈ k[t]).
2. Find the remainder R1 ∈ k(t)[x, y] by dividing F by Y.
3. Randomly choose some terms $c_{ij}(t) x^i y^j$ of R1 with $(i, j) \neq (0, 0)$, $c_{ij}(t) \notin k$; changing their coefficients $c_{ij}(t)$ to equivalent fractions with a common denominator, let the set of the numerators be C (⊂ k[t]).
4. Factorize the elements of the set C, and let the set of irreducible factors of degree greater than or equal to l be G (⊂ k[t]).
5. Choose g ∈ G, and compute a Grobner basis of the ideal ⟨g, X⟩ w.r.t. the lex order (x > y > t) in k[x, y, t]. Find the remainder n(t) ∈ k[t] by dividing F by the basis.
6. Let $n(t) = n_{k-1} t^{k-1} + \cdots + n_1 t + n_0 \in k[t]$, and compute m = n_0||n_1|| · · · ||n_{k−1}.

4. Generalized reduction attack

4.1 Generalized reduction attack

In this section, we propose a generalized reduction attack (GRA for short). This attack is based on Iwami's attack, with the flaw fixed. See Table 2.
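Steps 5 and 6 of Table 2 can be illustrated with sympy on a toy instance. The sketch below assumes Steps 1–4 have already produced g = f(t); the surface, sections and parameters are illustrative, not the paper's experimental ones.

```python
from sympy import symbols, expand, groebner, factor_list, Poly

p = 17
x, y, t = symbols("x y t")

# toy public surface with two (secret) sections, as in ASC04 key generation
ux, uy = t**2 + 3*t + 1, 2*t + 5
vx, vy = t**2 + 5*t + 2, 4*t + 9
X = expand((x - ux)*(y - vy) + (y - uy)*(x - vx))

def is_irreducible(g):
    factors = factor_list(g, t, modulus=p)[1]
    return len(factors) == 1 and factors[0][1] == 1

# a monic irreducible f(t) of degree l = 5; it plays the role of the
# g = f(t) that Steps 1-4 of the attack recover
f = next(t**5 + a*t + b for a in range(p) for b in range(1, p)
         if is_irreducible(t**5 + a*t + b))

m = 3*t**4 + t**2 + 7                        # plaintext polynomial, deg m < 5
F = expand(m + f*(x + y + t) + X*(x*t + 1))  # ciphertext

# Step 5: Groebner basis of <f, X> w.r.t. lex (x > y > t) over F_p,
# then reduce F by the basis; the normal form is m(t) by Theorem 4
G = groebner([f, X], x, y, t, order="lex", modulus=p)
_, n = G.reduce(F)
assert Poly(n, t, modulus=p) == Poly(m, t, modulus=p)
```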

4.2 Analysis of the generalized reduction attack

We can prove the correctness of the generalized reduction attack without using Condition 1, based on the following two theorems.

Theorem 3 In Step 4 of our attack, ∃g ∈ G s.t. g = f(t).

Proof Let I := ⟨Y⟩ ⊂ k(t)[x, y] be the ideal generated by Y. Then, {Y} is a Grobner basis. Since I is a principal ideal, every a ∈ I is written as a = GY (G ∈ k(t)[x, y]). Therefore, there uniquely exist G1, R1 ∈ k(t)[x, y] s.t. F = G1Y + R1. This R1 is clearly equal to the R1 in Step 2. Similarly, there uniquely exist G2, R2 ∈ k(t)[x, y] s.t. s(x, y, t) = G2Y + R2. Therefore, the ciphertext F(x, y, t) = m(t) + f(t)s(x, y, t) + X(x, y, t)r(x, y, t) is written as follows (note that X = LC(X)Y):

F = m(t) + f(t)(G2Y + R2) + LC(X)Y r

  = m(t) + f(t)R2 + Y (f(t)G2 + LC(X)r).

Then, no term of m(t) + f(t)R2 can be divided by LT(Y). Therefore, we obtain R1 = m(t) + f(t)R2 by the uniqueness of R1.

Now, assume R2 = R2(t) ∈ k(t). Evaluating the cipher polynomial F at the sections D1 and D2, we obtain:

h1(t) = F (ux(t), uy(t), t) = m(t) + f(t)s(ux(t), uy(t), t),

h2(t) = F (vx(t), vy(t), t) = m(t) + f(t)s(vx(t), vy(t), t).

Since X(u_x(t), u_y(t), t) = LC(X)Y(u_x(t), u_y(t), t) = 0 and LC(X) ≠ 0, we obtain Y(u_x(t), u_y(t), t) = 0. Therefore, s(u_x(t), u_y(t), t) = G2Y(u_x(t), u_y(t), t) + R2 = R2. Similarly, we have s(v_x(t), v_y(t), t) = R2. Thus, we obtain:

h1(t) = m(t) + f(t)R2 = h2(t).

Therefore, we cannot decrypt because h1(t) = h2(t), and this is a contradiction.


Thus, there exists a term $x^i y^j t^k$ ($(i, j) \neq (0, 0)$, $k \geq 0$) in the numerator of R2, and hence in R1 (= m(t) + f(t)R2). Then, we randomly choose some terms of R1 satisfying Step 3, and change them to equivalent fractions with a common denominator. Let the set of the numerators be C. Since every element of C is divisible by f(t), we obtain f(t) ∈ G.

(QED)

Note: In what follows, we use f instead of g, since we can take g = f(t) (∈ G) by Theorem 3.

Theorem 4 n(t) in Step 5 is the plaintext polynomial m(t).

Proof Let I := ⟨X, f⟩, and let GB(I) := {f1, . . . , fs} be a Grobner basis of I. Moreover, a Grobner basis of I ∩ k[t] is given by GB(I) ∩ k[t] by the theory of elimination ideals. Then, we gather the fi ∈ GB(I) ∩ k[t] from GB(I) and renumber their indices in ascending order of degree, obtaining GB(I) ∩ k[t] = {f_{i_1}, . . . , f_{i_l}}. Since we may regard GB(I) ∩ k[t] as a reduced Grobner basis, we have:

GB(I) ∩ k[t] = {f_{i_1}}.

Now, we will prove f_{i_1}(t) = f(t). First, we show that f_{i_1}(t) is divisible by f(t). There exist a(x, y, t), b(x, y, t) ∈ k[x, y, t] s.t. f_{i_1}(t) = a(x, y, t)X(x, y, t) + b(x, y, t)f(t), since f_{i_1} ∈ I ⊂ k[x, y, t]. Then, substituting the secret key (x, y, t) = (u_x(t), u_y(t), t) into f_{i_1} (note that X(u_x(t), u_y(t), t) = 0), we have:

f_{i_1}(t) = b(u_x(t), u_y(t), t)f(t).

Setting b(t) := b(u_x(t), u_y(t), t), we obtain:

f_{i_1}(t) = b(t)f(t)   (b(t) ∈ k[t]).

Secondly, we show that f(t) is divisible by f_{i_1}(t). Since f ∈ I ∩ k[t] and {f_{i_1}} is a Grobner basis of I ∩ k[t], we have:

f(t) = c(t)f_{i_1}(t)   (c(t) ∈ k[t]).

Therefore, we have:

f(t) = c(t)f_{i_1}(t) = c(t)b(t)f(t).

Since c(t)b(t) = 1 and hence b, c ∈ k, we obtain GB(I) ∩ k[t] = {f(t)}.

Thus, we obtain GB(I) = {f(t), f2, . . . , fs} with LT(fi) = $x^{\alpha} y^{\beta} t^{\gamma}$ (2 ≤ i ≤ s, (α, β) ≠ (0, 0), γ ≥ 0). Since we compute the Grobner basis of the ideal I w.r.t. the lex order (x > y > t) in k[x, y, t], we have:

LT(f) ∈ k[t],   LT(fi) ∉ k[t]   (2 ≤ i ≤ s).

Then, we consider dividing the ciphertext F = m(t) + sf + Xr by GB(I). No term of m(t) ∈ k[t] can be divided by LT(fi) (2 ≤ i ≤ s). Furthermore, no term of m(t) ∈ k[t] can be divided by LT(f), because deg m(t) = l − 1 and deg f(t) = l. Since sf + Xr ∈ I, sf + Xr is divisible by GB(I).

Thus, by the uniqueness of the remainder of division by a Grobner basis, the remainder of dividing the ciphertext F by GB(I) is m(t).

(QED)

Table 3. Improved generalized reduction attack.

Input: public key X ∈ k[x, y, t], ciphertext F ∈ k[x, y, t].
Output: plaintext m corresponding to the ciphertext F(x, y, t).

1. Regard the public key X as an element of k(t)[x, y], and compute Y := X/LC(X) (Y ∈ k(t)[x, y], LC(X) ∈ k[t]).
2. Find the remainder R1 ∈ k(t)[x, y] by dividing F by Y.
3. Randomly choose some terms $c_{ij}(t) x^i y^j$ of R1 with $(i, j) \neq (0, 0)$, $c_{ij}(t) \notin k$; changing their coefficients $c_{ij}(t)$ to equivalent fractions with a common denominator, let the set of the numerators be C (⊂ k[t]).
4. Factorize the elements of the set C, and let the set of irreducible factors of degree greater than or equal to l be G (⊂ k[t]).
5. Choose g ∈ G, and compute a normal form n of F by {g, X} w.r.t. the lex order (x > y > t) in k[x, y, t]. If the remainder n is a univariate polynomial n(t) ∈ k[t], go to Step 7; otherwise go to Step 6.
6. Choose g ∈ G, and compute a Grobner basis of the ideal ⟨g, X⟩ w.r.t. the lex order (x > y > t) in k[x, y, t]. Find the remainder n(t) ∈ k[t] by dividing F by the basis.
7. Let $n(t) = n_{k-1} t^{k-1} + \cdots + n_1 t + n_0 \in k[t]$, and compute m = n_0||n_1|| · · · ||n_{k−1}.

By Theorems 3 and 4, we can prove that this algorithm is effective against ASC04.

5. Efficiency of the generalized reduction attack

Since it takes a long time to compute a Grobner basis, the GRA is not so efficient in many cases when implemented. From a practical point of view, we need to reduce its running time. Here we propose an improved method for the GRA, obtained by adding a step (Step 5 of Table 3) before the Grobner basis computation. We call this attack the IGRA for short. See Table 3 for the details.

If n(t) ∈ k[t] in Step 5, then n(t) is the plaintext polynomial m(t). We have the following theorem.

Theorem 5 If the normal form of F by {f, X} w.r.t. the lex order is a univariate polynomial n(t) ∈ k[t] in Step 5 of Table 3, then n(t) is the plaintext polynomial m(t) of ASC04.

Proof Let I := ⟨X, f⟩, and let GB(I) := {f1, . . . , fs} be a Grobner basis of I. As shown in the proof of Theorem 4,

GB(I) ∩ k[t] = {f(t)}.

Therefore, we may assume f1 = f(t) and LT(fi) = $x^{\alpha_i} y^{\beta_i} t^{\gamma_i}$ (2 ≤ i ≤ s, $(\alpha_i, \beta_i, \gamma_i) \in \mathbb{Z}^3_{\geq 0}$, $(\alpha_i, \beta_i) \neq (0, 0)$). By Theorem 4, since we can obtain the valid plaintext polynomial m(t) by dividing the ciphertext F(x, y, t) by GB(I), we have:

F (x, y, t) = m(t) + f1g1 + f2g2 + · · ·+ fsgs

(LT(fi) |/m(t), gi ∈ k[x, y, t], 1 ≤ ∀i ≤ s).

Moreover, since we obtain the univariate polynomial n(t) as the normal form of F by {f, X}, we have:

F(x, y, t) = n(t) + fh1 + Xh2   (h1, h2 ∈ k[x, y, t]).

Therefore, taking the difference of both sides (note that f1 = f(t)), we obtain:

n(t) − m(t) = f1(g1 − h1) + f2g2 + · · · + fsgs − Xh2.


Table 4. IGRA.

p    d    l     avg. [s]   memory [MB]
17   20   160   0.152      11.78
17   50   400   0.572      15.02

Table 5. p = 17, d = 5, l = 50.

           GRA       IGRA
time [s]   521.350   0.010

Since f1 (= f), f2, . . . , fs, X ∈ I, we obtain n(t) − m(t) ∈ I ∩ k[t] (= ⟨f(t)⟩). Moreover, since deg m(t) < l ≤ deg f(t) and deg n(t) < deg f(t), we obtain f(t) | (n(t) − m(t)) while deg(n(t) − m(t)) < deg f(t). Therefore, we obtain:

n(t) − m(t) = 0, i.e., n(t) = m(t).

(QED)

By Theorem 5, we do not need to compute a Grobner basis if n(t) ∈ k[t], and we can find the plaintext m efficiently.

6. Implementation

In this section, we show some experimental results for the GRA (Table 2) and the IGRA (Table 3). We used a Solaris 10 system with a 2 GHz CPU (AMD Opteron 246), 4 GB memory, and a 160 GB hard disk. Moreover, we used Magma [8] (Ver. 2.16-4) as the software for writing the programs.

(a) IGRA We describe the experimental results for the IGRA. For each (p, d, l), we generated 100 sets (X, f, s, r, m) randomly. See Table 4 for the results. We could efficiently find the valid plaintext m even for larger parameter sizes. In the above IGRA experiments, m(t) ∈ k[t] was obtained at Step 5, so we did not need to compute a Grobner basis at Step 6. We note, however, that there exist some cases where Step 6 is needed.

(b) GRA vs. IGRA We compared the GRA with the IGRA. As stated in the previous section, it generally takes a long time to compute a Grobner basis in the GRA. Actually, there exist some parameters for which it takes more than several hours to compute a Grobner basis in the GRA. We show some experimental results in Table 5. In Table 5, for the IGRA, the average running time is shown. Here, we generated randomly 100 sets (X, f, s, r, m) for p = 17, d = 5, l = 60. On the other hand, for the GRA, only the fastest running time in the experiments is shown, since its running time was too long and we had to terminate the program before it finished in most cases.

7. Conclusion

We proposed a generalized reduction attack against ASC04, and the flaw in Iwami's attack was fixed by our proposal. We also showed some experimental results for our proposed attack. One of our future works is to evaluate the computational complexity of the generalized reduction attack along the lines of [9], which is an attack against ASC09, another implementation of ASC [4, 10].

Acknowledgments

The authors would like to thank the reviewers for their valuable comments. This work was supported in part by Grant-in-Aid for Scientific Research (C) (20540125).

References

[1] P. W. Shor, Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer, SIAM J. Comput., 26 (1997), 1484–1509.

[2] T. Okamoto, K. Tanaka and S. Uchiyama, Quantum public-key cryptosystems, in: Proc. of Crypto 2000, LNCS 1880, pp. 147–165, Springer, 2000.

[3] K. Akiyama and Y. Goto, A public-key cryptosystem using algebraic surfaces, in: Proc. of PQCrypto 2006, pp. 119–138, 2006.

[4] K. Akiyama, Y. Goto and H. Miyake, An algebraic surface cryptosystem, in: Proc. of PKC 2009, LNCS 5443, pp. 425–442, Springer, 2009.

[5] S. Uchiyama and H. Tokunaga, On the security of the algebraic surface public-key cryptosystems (in Japanese), in: Proc. of SCIS 2007, 2C1-2, 2007.

[6] P. Ivanov and J. F. Voloch, Breaking the Akiyama-Goto cryptosystem, in: Proc. of AGCT 11, Contemporary Math. 487, pp. 113–118, 2009.

[7] M. Iwami, A reduction attack on algebraic surface public-key cryptosystems, in: Proc. of ASCM 2007, LNCS 5081, pp. 323–332, Springer, 2008.

[8] Magma, http://magma.maths.usyd.edu.au/magma/.

[9] J.-C. Faugere and P.-J. Spaenlehauer, Algebraic cryptanalysis of the PKC'2009 algebraic surface cryptosystem, in: Proc. of PKC 2010, LNCS 6056, pp. 35–52, Springer, 2010.

[10] K. Akiyama and Y. Goto, An improvement of the algebraic surface public-key cryptosystem, in: Proc. of SCIS 2008, 1F1-2, 2008.


JSIAM Letters Vol.3 (2011) pp.57–60 ©2011 Japan Society for Industrial and Applied Mathematics

Deterministic volatility models and dynamics of option returns

Takahiro Yamamoto1 and Koichi Miyazaki1

1 Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chohugaoka, Chohu-shi, Tokyo 182-8585, Japan

E-mail y1030103 edu.cc.uec.ac.jp

Received June 28, 2011, Accepted July 12, 2011

Abstract

In this research, we revamp the approach of Buraschi and Jackwerth (2001), especially in the derivation of the pricing kernel and the data-handling technique, and then empirically analyze the consistency of the DVMs introduced in Mawaribuchi, Miyazaki and Okamoto (2009) with the dynamics of cross-sectional option returns. The implication obtained from our quantitative analyses is that, even in a trending and volatile market, we can build equity models that are consistent with the dynamics of cross-sectional option market prices within the framework of a complete model, without incorporating an additional stochastic variable such as a jump or stochastic volatility.

Keywords pricing kernel, deterministic volatility model, NIKKEI225 option

Research Activity Group Mathematical Finance

1. Introduction

One extension of the famous BS equity model (geometric Brownian motion) is the deterministic volatility model (DVM for short), whose volatility is a deterministic function of the equity price (Dupire (1994) [1], among others). Mawaribuchi, Miyazaki and Okamoto (2009) [2] calibrate their newly introduced 5-parameter DVM to cross-sectional option market prices on an evaluation date, and report that the model prices of options derived from the 5-parameter DVM are quite close to the corresponding market prices on that date. The purpose of this study is to discuss whether the model prices of options derived from the DVMs are close to their corresponding market prices time-series-wise; in short, whether the DVMs can capture the dynamics of the market prices of options.

The preceding research, Buraschi and Jackwerth (2001) [3], statistically examines the consistency of the pricing kernel induced from the DVM with the time series of returns of S&P 500 options (ATM, OTM, ITM) by the GMM (Generalized Method of Moments) technique. We revamp their approach, especially in the derivation of the pricing kernel and the data-handling technique, and then empirically analyze the consistency of the DVMs introduced in Mawaribuchi, Miyazaki and Okamoto (2009) [2] with the dynamics of cross-sectional option returns.

The organization of this letter is as follows. In Section 2, we provide the statistical method to evaluate the consistency of the DVM pricing kernel with the dynamics of option returns. In Section 3, we report the results of our empirical analyses on the NIKKEI225 options market and provide the implications of the results. In Section 4, a summary and concluding remarks are given.

2. Quantitative methods

2.1 Framework of the quantitative analyses based on the pricing kernel

The purpose of this research is to discuss whether the DVMs are able to capture the dynamics of cross-sectional option market prices. To this end, we attempt to examine statistically whether the option returns from the DVMs are consistent with the realized returns of the options (ATM, OTM, ITM) time-series-wise. In the derivation of the realized returns from the ITM, ATM and OTM option market prices, we regard these options as individual assets and compute the realized returns of the assets under the empirical measure. To make all the analyses proceed under the empirical measure, we introduce the pricing kernel induced from the DVM and statistically examine, by the GMM technique, whether the market prices of the options (ATM, OTM, ITM) multiplied by the pricing kernel are all close to 1 time-series-wise.

The pricing kernel $m_{t,t+\Delta t}$ (the suffix indicates the time interval from time $t+\Delta t$ to time $t$) satisfies (1): the asset price $S_t$ at time $t$ is evaluated by taking the expectation of the product of the asset price $S_{t+\Delta t}$ and the pricing kernel $m_{t,t+\Delta t}$ under the empirical measure $E_t$ conditioned on $S_t$:

\[ S_t = E_t[m_{t,t+\Delta t} S_{t+\Delta t}]. \quad (1) \]

Transforming (1) with $S_{t+\Delta t}/S_t = R^S_{t+\Delta t}$, we obtain (2) for the pricing kernel and the gross returns:

\[ 1 = E_t[m_{t,t+\Delta t} R^S_{t+\Delta t}], \quad (2) \]

where $R^S_{t+\Delta t}$ is the gross return from time $t$ to $t+\Delta t$ of the asset $S$.

As the assets to be examined, we adopt four kinds of



JSIAM Letters Vol. 3 (2011) pp.57–60 Takahiro Yamamoto et al.

assets: the NIKKEI225 index, the ATM option, the OTM option and the ITM option. Denoting the vector consisting of the gross returns of the four assets by R_{t,t+∆t} = [R^S_{t+∆t}, R^{ATM}_{t+∆t}, R^{OTM}_{t+∆t}, R^{ITM}_{t+∆t}]' (for example, R^{ATM}_{t+∆t} indicates the gross return from time t to t+∆t of the ATM option), we statistically examine whether all of the components in the expectation of the gross return vector multiplied by the DVM pricing-kernel are close to 1 (convergence in (3)) using the GMM technique:

h_t = 1 − E_t[m_{t,t+∆t} R_{t,t+∆t}] → 0.  (3)
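As a sanity check on (2), the following sketch (not from the paper; illustrative values of µ and σ, with constant volatility so the DVM reduces to the Black-Scholes case and r = 0) verifies by Monte Carlo that the expectation of the pricing-kernel times the gross return is close to 1. It uses the closed form of the pricing-kernel given in (9) below.

```python
import numpy as np

# Monte Carlo check of E[m * R] = 1 (eq. (2)) for the constant-volatility
# (Black-Scholes, r = 0) special case of the pricing-kernel (9).
# mu and sigma are illustrative values, not estimates from the paper's data.
rng = np.random.default_rng(0)
mu, sigma, n = 0.05, 0.20, 200_000

z = rng.standard_normal(n)
log_R = (mu - sigma**2 / 2) + sigma * z              # log gross return over unit time
log_m = mu * (mu - sigma**2) / (2 * sigma**2) \
        - (mu / sigma**2) * log_R                    # eq. (9) with constant sigma

print(np.mean(np.exp(log_m + log_R)))                # should be close to 1
```

A short calculation with the lognormal moment formula confirms that the exponents cancel exactly, so the sample mean converges to 1.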

2.2 Construction of the pricing-kernel

Assuming that the equity process follows the DVM in (4) and that the risk-free interest rate r is not equal to 0, we introduce the pricing-kernel m_{t,t+∆t} that is able to discount both the bond and the equity returns:

dS_t = µ S_t dt + σ(S_t, t) S_t dW_t,
S_t = S_0 exp[(µ − σ(S_t, t)²/2) t + σ(S_t, t) W_t],  (4)

where W_t is a Wiener process and σ(S_t, t) is the volatility. The pricing-kernel should satisfy (5):

[S_t, B_t]' = E_t[m_{t,t+∆t} [S_{t+∆t}, B_{t+∆t}]'].  (5)

Eq. (5) is the extension of the pricing-kernel in the preceding research (derived assuming that the risk-free interest rate is equal to 0), which could discount only the equity return. The stochastic process ξ_t in (6) satisfies ξ_0 = 1, ξ_T > 0 and ξ_t = E_t[ξ_{t+∆t}], and m_{t,t+∆t} = ξ_{t+∆t}/ξ_t also satisfies (5) and is found to be the pricing-kernel:

ξ_t = e^{−rt} exp[−(µ − r)²/(2σ(S_t, t)²) t − (µ − r)/σ(S_t, t) W_t].  (6)

Replacing the small time interval with the unit time interval 1 and taking logarithms of the pricing-kernel, we get (7).

ln m_{t,t+1} = −r − (µ − r)²/(2σ(S_t, t)²) − (µ − r)/σ(S_t, t) (W_{t+1} − W_t).  (7)

Removing the Wiener process in (7) by way of (4), we derive (8).

ln m_{t,t+1} = −r − (µ − r)²/(2σ(S_t, t)²) + (µ − r)/σ(S_t, t)² (µ − σ(S_t, t)²/2) − (µ − r)/σ(S_t, t)² ln(S_{t+1}/S_t).  (8)

Setting the risk-free interest rate to 0% in (8), the pricing-kernel reduces to (9).

ln m_{t,t+1} = µ(µ − σ(S_t, t)²)/(2σ(S_t, t)²) − µ/σ(S_t, t)² ln(S_{t+1}/S_t).  (9)

Because the Japanese risk-free interest rate is approximately 0 in most of the period, we adopt (9) as the DVM pricing-kernel. Regarding the specific functional form of the volatility σ(S_t, t), we examine the 2-parameter DVM, the 3-parameter DVM and the 5-parameter DVM in Mawaribuchi, Miyazaki and Okamoto (2009) [2] and list them in Table 1.

Table 1. Three kinds of the DVMs.

2P-DVM: σ(S_t, t) = a S_t^b
3P-DVM: σ(S_t, t) = a + b[1 − tanh(c(S_t − S_0)/S_0)]
5P-DVM: σ(S_t, t) = a + b[1 − tanh(c(S_t − S_0)/S_0)] + d[1 − sech(e(S_t − S_0)/S_0)]
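The three volatility specifications in Table 1 can be written down directly as functions of the equity level. The sketch below uses arbitrary placeholder parameter values (a, b, c, d, e and S_0 here are not the paper's estimates).

```python
import numpy as np

# The three DVM volatility specifications of Table 1.
# Parameter values in the demo call are arbitrary placeholders.
def sigma_2p(S, a, b):
    return a * S**b

def sigma_3p(S, S0, a, b, c):
    return a + b * (1 - np.tanh(c * (S - S0) / S0))

def sigma_5p(S, S0, a, b, c, d, e):
    # sech(x) = 1 / cosh(x)
    return (a + b * (1 - np.tanh(c * (S - S0) / S0))
              + d * (1 - 1 / np.cosh(e * (S - S0) / S0)))

S0 = 10000.0
S = np.linspace(8000.0, 12000.0, 5)
print(sigma_3p(S, S0, a=0.15, b=0.10, c=5.0))  # higher vol below S0, lower above
```

The tanh term makes volatility rise when the index falls below its initial level, which is the leverage-like effect the 3P and 5P models add over the 2P model.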

2.3 Setting, data-handling and statistical method

2.3.1 Setting

In this quantitative analysis, the most important thing is to decide the period of the analysis appropriately. When we test the hypothesis that the pricing-kernel composed of the equity return is able to discount the equity option returns properly, we should take the maturities of the options into consideration. The underlying equity does not have a maturity, whereas each equity option has its own maturity, and thus we should distinguish one option from another if their maturities differ. Investors price the equity option incorporating the forecast of the underlying equity dynamics and the risk premium up to the maturity of the option, and thus the dynamics of the option return that could be related to that of the underlying equity extends only up to the maturity of the option. If the parameters of the equity model were estimated from data that do not fall in the period from the issuing date of the option to its maturity, we could not identify, when the hypothesis test rejects, whether the equity model used to derive the pricing-kernel is inappropriate or the data period is inappropriate. Therefore, each quantitative analysis should be attempted for each option in the period from the issuing date of the option to its maturity.

2.3.2 Data handling technique

We mention the way to measure the option return. For the options data, we adopt three kinds of three-month call options: the ATM (the strike price is equal to the current equity price), the OTM500 (the strike price is 500 yen higher than the current equity price) and the ITM500 (the strike price is 500 yen lower than the current equity price). The strike prices of the listed options are fixed on a 500 yen grid, and thus the above options do not actually exist except when the current equity price falls exactly on a multiple of 500 yen. Thus, we have to infer the prices of the above options from the market prices of the listed options. We adopt the approach of interpolating the implied volatility (hereafter, IV) using a spline function. We select six kinds of options close to the current equity price: three put options whose strike prices are 500 yen, 1000 yen and 1500 yen lower than the current equity price, and three call options whose strike prices are 500 yen, 1000 yen and 1500 yen higher than the current equity price. We compute the IVs of the six options by inverting the options market prices by way of the BS model, spline-interpolate the six IVs, and pick up the IVs corresponding to the




strike prices of the ATM, OTM500 and ITM500 options. Then, we compute the prices of the ATM, OTM500 and ITM500 options by putting the three spline-estimated IVs into the BS model. Once we attain the daily prices of the ATM, OTM500 and ITM500 options, it is easy to compute the daily returns of the three options.
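The interpolation step can be sketched with a cubic spline over strike. The IV values below are made up for illustration (in the paper they come from inverting listed option prices through the BS formula), and `scipy.interpolate.CubicSpline` stands in for the unspecified spline routine.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sketch of the IV interpolation step: spline the six observed IVs over strike
# and read off IVs at the synthetic ATM/OTM500/ITM500 strikes.
# The IV values are invented for illustration only.
S = 10200.0                                   # current index level (illustrative)
strikes = S + np.array([-1500.0, -1000.0, -500.0, 500.0, 1000.0, 1500.0])
ivs = np.array([0.26, 0.24, 0.22, 0.20, 0.21, 0.23])   # a smile-shaped curve

smile = CubicSpline(strikes, ivs)             # strikes must be increasing
for name, K in [("ITM500", S - 500), ("ATM", S), ("OTM500", S + 500)]:
    print(name, K, float(smile(K)))
```

The spline-estimated IVs at the three target strikes would then be fed back into the BS formula to produce the synthetic option prices.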

2.3.3 Statistical method (Generalized Method of Moments; GMM)

We statistically test the hypothesis that all four components in the expectation of the gross return vector R_{t,t+∆t} = [R^S_{t+∆t}, R^{ATM}_{t+∆t}, R^{OTM}_{t+∆t}, R^{ITM}_{t+∆t}]' multiplied by the pricing-kernel m_{t,t+∆t} are close to 1 in (3) using the GMM technique. As the moment conditions of the GMM, we adopt (10) and (11).

g(θ) = (1/N) Σ_{t=1}^N h_t,  (10)

h_t = [ 1 − m_{t,t+∆t} S_{t+∆t}/S_t,
        1 − m_{t,t+∆t} ATM_{t+∆t}/ATM_t,
        1 − m_{t,t+∆t} OTM_{t+∆t}/OTM_t,
        1 − m_{t,t+∆t} ITM_{t+∆t}/ITM_t,
        S_t (1 − m_{t,t+∆t} S_{t+∆t}/S_t),
        S_t (1 − m_{t,t+∆t} ATM_{t+∆t}/ATM_t) ]'.  (11)

Using the moment conditions, we construct J_N(θ) in (12) and minimize it to estimate the parameter set θ of the DVM:

J_N(θ) = g(θ)' W_N g(θ),  (12)

where N is the number of data points, W_N is the weighting matrix based on the variance-covariance matrix of the moment conditions, and θ is the parameter set {µ, parameters in the volatility function σ} of the pricing-kernel m_{t,t+∆t}. The maturities of the options in our analysis are three months, so each option has 60 business days from the issuing date to the maturity. We use the daily option returns up to the 5th business day before the maturity to stay away from the relatively large noise included in very-short-dated option prices. Thus, for each quantitative analysis corresponding to each option contract month, the number of data points N is 55. We statistically test the hypothesis by GMM using the fact that the test statistic J_N(θ̂) with the estimated parameter θ̂ follows the chi-square distribution χ²(n) with n degrees of freedom (refer to Newey and West (1987) [4] for more detail):

d_N = N [J_N(θ̂)] ∼ χ²(n),  (13)

H_0 : J_N(θ) = 0.  (14)

Rejection of the hypothesis test implies that the equity model used to derive the pricing-kernel is not consistent with the dynamics of the cross-sectional option market prices.
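The objective (10)-(12) and the test statistic (13) can be sketched on a synthetic series of moment vectors. The h_t below are random stand-ins rather than actual pricing errors, the weighting matrix is the inverse sample covariance (a standard GMM choice; the paper's exact W_N may differ), and the degrees of freedom are a placeholder for the paper's n.

```python
import numpy as np
from scipy.stats import chi2

# Sketch of the GMM objective (10)-(12) and test statistic (13) on synthetic
# moment vectors h_t. All numbers are stand-ins, not the paper's data.
rng = np.random.default_rng(1)
N, dim = 55, 6                              # 55 daily observations, 6 moments

h = rng.normal(scale=0.01, size=(N, dim))   # stand-in for h_t, t = 1..N
g = h.mean(axis=0)                          # eq. (10)
W = np.linalg.inv(np.cov(h, rowvar=False))  # inverse sample covariance
J = g @ W @ g                               # eq. (12)
d = N * J                                   # test statistic, eq. (13)

# Tail probability under the chi-square reference; df=dim is a placeholder.
print(d, chi2.sf(d, df=dim))
```

In the paper, this statistic is computed per contract month (N = 55) and compared against the 0.5% and 1% critical values of the chi-square distribution.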

3. Quantitative analyses

3.1 Data and the equity model

The data in these analyses are daily option prices with remaining maturities from 60 to 5 business days for

Fig. 1. Dynamics of the NIKKEI225 index.

the March, June, September and December contracts of the NIKKEI225 options (ATM, OTM500 and ITM500) in the period from the June 2003 contract to the December 2010 contract, together with the daily NIKKEI225 index data corresponding to the options data period. We set the risk-free interest rate equal to 0% because most of the period of the analyses falls under the BOJ's zero interest rate policy. We test four equity models: the 2-parameter, 3-parameter and 5-parameter models of Mawaribuchi, Miyazaki and Okamoto (2009) [2], as listed in Table 1, in addition to the BS model.

3.2 Results and their implications

First of all, from Fig. 1, we review the dynamics of the NIKKEI225 index in the period of the analyses (from June 2003 to December 2010). There are two notable periods. One is the period from the beginning of 2005 to mid-2006, when the NIKKEI225 index surges due to the recovery of the economy (we call this period (i)), and the other is the period from the end of 2007 to the beginning of 2009, when the NIKKEI225 index dives due to the global recession originating from the U.S. subprime loan problem (we call this period (ii)). Except for these two periods, the NIKKEI225 index moves almost in a range.

The results of testing the hypothesis that the pricing-kernels induced by the DVMs are rational with respect to the dynamics of the cross-sectional option market prices are provided in Table 2. In Table 2, ** and * indicate 0.5% and 1% significance levels, respectively. From Table 2, we see that the BS pricing-kernel is rejected for 19 contract months (with 0.5% significance for 10 contract months and with 1% significance for 9 contract months) out of 31 contract months in total, and the 2-parameter DVM pricing-kernel is rejected for 15 contract months (with 0.5% significance for 3 contract months and with 1% significance for 12 contract months) out of 31. The result suggests that the extension of the BS model to the 2-parameter DVM improves the rationality of the pricing-kernel with respect to the dynamics of the cross-sectional option market prices; however, the effect is quite limited. On the contrary, regarding the 3-parameter DVM and the 5-parameter DVM, except for only the December 2008 contract just after the collapse of Lehman Brothers, the rationality of the pricing-kernels derived from the two models with respect to the dynamics of the cross-sectional option market prices is not rejected even at the 1% significance level.




Table 2. GMM test-statistics and the results of hypothesis tests.

SQ        BS          2P          3P          5P
2003/9    17.340 *    15.338 *     5.897       4.900
2003/12   16.762 *    15.590 *     4.424       4.142
2004/3    16.859 *    12.336       6.165       5.530
2004/6    13.734      12.332      10.630       8.906
2004/9    15.108      12.789       6.899       4.575
2004/12   19.711 **   14.983 *     9.206       9.200
2005/3    19.422 **   14.933 *     7.197       6.488
2005/6    17.328 **   15.332 *     3.349       3.306
2005/9    16.152      12.337       5.516       4.998
2005/12   18.973 **   16.337 **    7.926       6.621
2006/3    18.405 **   15.007 *     9.836       8.415
2006/6    14.961       8.833       5.984       4.964
2006/9    13.452      10.453       3.083       2.919
2006/12   18.323 **   15.332 *     7.939       7.920
2007/3    14.635      10.222       3.661       3.662
2007/6    16.967 *    15.040 *     8.949       7.971
2007/9    12.231      10.181       7.237       7.231
2007/12   16.948 *    11.472       6.571       4.575
2008/3    19.081 **   16.916 **    9.947       8.984
2008/6    15.256      13.103       6.527       4.526
2008/9    16.933 *    11.237       9.936       5.365
2008/12   45.787 **   35.531 *    25.480 **   22.468 **
2009/3    17.575 *    15.402 *     8.502       7.355
2009/6    16.819 *    14.919 *     7.931       6.749
2009/9    15.001       9.004       4.238       3.481
2009/12   17.035 *    11.022       6.447       5.448
2010/3    19.871 **   14.978 *    10.274      10.189 *
2010/6    11.623       8.631       7.465       3.517
2010/9    18.557 **   14.943 *     9.305       8.822
2010/12   14.862      10.845       2.599       2.203
2011/3    17.017      11.434       6.938       5.019

**, * indicate 0.5% and 1% significance levels, respectively.

Examining more closely the relation between the dynamics of the NIKKEI225 index and the results of the hypothesis tests, we find that the pricing-kernels induced from the BS model and the 2-parameter DVM are not rejected so often in the periods when the NIKKEI225 index moves mostly in a range. They are, however, quite often rejected in the periods when the NIKKEI225 is trending and volatile, such as period (i) and period (ii). In contrast, the pricing-kernels induced from the 3-parameter DVM and the 5-parameter DVM are seldom rejected even in the periods when the NIKKEI225 index is trending upward or downward. To investigate the background reason for this result, we provide the dynamics of the pricing-kernel induced from each equity model for the period of the June 2004 contract (a range market) and the period of the June 2005 contract (an upward-trending market) in Figs. 2 and 3, respectively. Fig. 2 indicates that the dynamics of the pricing-kernels for the models do not differ much from each other in the range market. On the other hand, in the upward-trending market, Fig. 3 suggests that the dynamics of the pricing-kernel differ considerably from model to model. Due to the strong restriction of the model, the dynamics of the pricing-kernels induced from the BS model and the 2-parameter DVM are not flexible enough to capture the dynamics of the cross-sectional option market prices.

4. Summary and concluding remarks

We improved the preceding approach, especially in the derivation of the pricing-kernel and the data handling technique, and then empirically examined the consistency of the DVMs introduced in Mawaribuchi, Miyazaki and Okamoto (2009) [2] with the dynamics of the cross-sectional option returns. From the results, we found that, not to mention the BS model, even the 2-parameter DVM, whose volatility can depend on the equity price, is not consistent with the dynamics of the cross-sectional option market prices due to the strong restriction of the functional form of the volatility. On the contrary, regarding the 3-parameter and 5-parameter DVMs that incorporate tanh(x) in the functional form of the volatility, their consistency with the dynamics of the cross-sectional option market prices is not rejected in most of the testing periods. The implication of our quantitative analyses is that, even in a trending and volatile market, we can build equity models, such as the 3-parameter and 5-parameter DVMs in Mawaribuchi, Miyazaki and Okamoto (2009) [2], that are rational with respect to the dynamics of the cross-sectional option market prices within the framework of the complete-market model, without incorporating additional stochastic variables such as jumps or stochastic volatility.

Fig. 2. Dynamics of the pricing-kernel for June 2004 contract.

Fig. 3. Dynamics of the pricing-kernel for June 2005 contract.

Acknowledgments

This work was supported by JSPS KAKENHI (22510143). We sincerely thank the reviewer for constructive comments.

References

[1] B. Dupire, Pricing with a smile, Risk, 7 (1994), 18-20.
[2] J. Mawaribuchi, K. Miyazaki and M. Okamoto, 5-parameter local volatility models: fitting to option market prices and forecasting ability (in Japanese), IPSJ Trans. Math. Model. Appl., 49 (2009), 58-69.
[3] A. Buraschi and J. Jackwerth, The price of a smile: hedging and spanning in option markets, Rev. Financ. Stud., 14 (2001), 495-527.
[4] W. K. Newey and K. D. West, A simple, positive-semidefinite, heteroskedasticity and autocorrelation consistent covariance matrix, Econometrica, 55 (1987), 703-708.



JSIAM Letters Vol.3 (2011) pp.61–64 c⃝2011 Japan Society for Industrial and Applied Mathematics

Stochastic estimation method of eigenvalue density for nonlinear eigenvalue problem on the complex plane

Yasuyuki Maeda1, Yasunori Futamura1 and Tetsuya Sakurai1,2

1 Department of Computer Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba 305-8573, Japan
2 CREST, Japan Science and Technology Agency, 4-1-8 Hon-machi, Kawaguchi, Saitama 332-0012, Japan

E-mail maeda mma.cs.tsukuba.ac.jp

Received June 5, 2011, Accepted August 31, 2011

Abstract

The performance of some nonlinear eigenvalue problem solvers can be increased by setting parameters that are based on rough estimates of the desired eigenvalues. In the present paper, we propose a stochastic method for estimating the eigenvalue density for nonlinear eigenvalue problems of analytic matrix functions. The proposed method uses unbiased estimation of the matrix traces and contour integrations. Its performance is evaluated through numerical experiments.

Keywords nonlinear eigenvalue problem, analytic matrix function, trace estimation

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

We herein consider a nonlinear eigenvalue problem (NEP) F(λ)x = 0 of finding eigenpairs (λ, x), where F(λ) is an n × n analytic matrix function. Nonlinear eigenvalue problems appear in a variety of problems in science and engineering, such as delay differential equations [1], quantum dots [2], and accelerator design [3]. These problems require specific eigenpairs. The Sakurai-Sugiura (SS) method [4] is a solver for NEPs that can find eigenpairs locally. The SS method requires parameters such as closed curves on the complex plane, and its performance can be improved by setting appropriate parameters based on rough estimates of the desired eigenvalues. If we obtain a rough estimate of the eigenvalue density in advance, we can set the parameters more efficiently. A method of estimating the eigenvalue density using contour integrals for generalized eigenvalue problems has been proposed in [5]. In the present paper, we extend this method to NEPs.

The remainder of the present paper is organized as follows. In Section 2, we describe a derivation of the number of eigenvalues for NEPs by introducing the Smith form [6] and a contour integral. Then we propose an application of the unbiased estimation of the matrix trace in order to avoid the matrix inversion. In Section 3, we describe an estimation method for the eigenvalue density and show its simple implementation. In Section 4, we investigate the performance of the proposed method through numerical experiments using four matrices. Finally, the conclusions are presented in Section 5.

2. Stochastic method for estimating the number of eigenvalues for NEPs

In this section, we propose a stochastic method for estimating the number of eigenvalues for NEPs on the complex plane. Let F(z) be an analytic matrix function defined in a simply connected domain Ω in C. The determinant of F(z) is not identically zero in the domain Ω; in other words, F(z) is regular in Ω. We introduce the Smith form for analytic matrix functions [6].

Theorem 1 Let F(z) be an n × n regular analytic matrix function. Then F(z) admits the representation

P(z) F(z) Q(z) = D(z),  (1)

where D(z) = diag(d_1(z), ..., d_n(z)) is a diagonal matrix of analytic functions d_j(z) for j = 1, 2, ..., n, such that d_1(z) ≢ 0 and d_j(z)/d_{j−1}(z) are analytic functions for j = 2, 3, ..., n. In addition, P(z) and Q(z) are n × n regular analytic matrix functions with constant nonzero determinants.

The eigenpairs of the NEP are formally derived from the Smith form. Let λ_1, λ_2, ..., λ_s be the zeros of d_n(z) in Ω. Since d_1(z) ≢ 0 and d_j(z)/d_{j−1}(z) are analytic functions for j = 2, 3, ..., n, d_j(z) can be represented in terms of the λ_i as

d_j(z) = h_j(z) Π_{i=1}^s (z − λ_i)^{α_{ji}},  j = 1, 2, ..., n,  (2)

where the h_j(z) are analytic functions with h_j(z) ≠ 0 for z ∈ Ω, the eigenvalues of the NEP are equal to the λ_i, α_{ji} ∈ Z_+, and Σ_{j=1}^n α_{ji} gives the multiplicity of λ_i. We propose the following theorem.

Theorem 2 Let F(z) be an n × n regular analytic matrix function, and let tr(F(z)) be the matrix trace of F(z). In addition, let m be the number of eigenvalues, counting multiplicity, inside a closed curve Γ ⊂ Ω on the complex plane for the NEP F(λ)x = 0. Then we have

(1/2πi) ∮_Γ tr(F(z)^{−1} dF(z)/dz) dz = m,  (3)

where det(F(z)) ≠ 0 for z ∈ Γ.

Proof From Theorem 1, we can derive the following equation:

tr(F(z)^{−1} dF(z)/dz) = tr(Q(z)^{−1} dQ(z)/dz) + tr(D(z)^{−1} dD(z)/dz) + tr(P(z)^{−1} dP(z)/dz).  (4)

Since P(z) and Q(z) are regular analytic matrix functions with constant nonzero determinants and D(z) = diag(d_1(z), ..., d_n(z)), from [7, Section 8.3] we obtain

tr(Q(z)^{−1} dQ(z)/dz) = 0,  tr(P(z)^{−1} dP(z)/dz) = 0,

and

tr(D(z)^{−1} dD(z)/dz) = Σ_{j=1}^n (dd_j(z)/dz) (1/d_j(z)) = Σ_{i=1}^s (Σ_{j=1}^n α_{ji})/(z − λ_i) + Σ_{j=1}^n (dh_j(z)/dz) (1/h_j(z)).  (5)

From the residue theorem we have

(1/2πi) ∮_Γ [ Σ_{i=1}^s (Σ_{j=1}^n α_{ji})/(z − λ_i) + Σ_{j=1}^n (dh_j(z)/dz) (1/h_j(z)) ] dz = Σ_{i=1}^t Σ_{j=1}^n α_{ji} = m,  (6)

where t is the number of mutually distinct eigenvalues inside the closed curve Γ.

(QED)

Eq. (3) is approximated by an N-point quadrature rule

m ≈ m̂ = Σ_{j=0}^{N−1} w_j tr(F(z_j)^{−1} F'(z_j)),  (7)

where F'(z_j) = dF(z)/dz |_{z=z_j}, z_j is a quadrature point and w_j is a weight. In the case of the trapezoidal rule on a circle with center γ and radius ρ, the quadrature points and weights are defined by

w_j = (ρ/N) e^{(2πi/N)(j+1/2)},  z_j = γ + ρ e^{(2πi/N)(j+1/2)}.

Fig. 1. Closed curves on the complex plane.

1: Input: F(z), N, L, K, a, b
2: Output: m̂_1, m̂_2, ..., m̂_K
3: Set v_l whose elements take 1 or −1 with equal probability, l = 1, 2, ..., L
4: ρ = (b − a)/(2K)
5: for k = 1, 2, ..., K do
6:   γ_k = a + (2k − 1)ρ
7:   z_{jk} = γ_k + ρ e^{(2πi/N)(j+1/2)}, j = 0, 1, ..., N − 1
8:   Solve F(z_{jk}) x^l_{jk} = F'(z_{jk}) v_l for x^l_{jk}, l = 1, 2, ..., L, j = 0, 1, ..., N − 1
9:   m̂_k = [ρ/(NL)] Σ_{j=0}^{N−1} e^{(2πi/N)(j+1/2)} Σ_{l=1}^L v_l^T x^l_{jk}
10: end for

Fig. 2. Algorithm for estimating the eigenvalue density.

In order to avoid the matrix inversion in (7), we estimate the trace with the unbiased estimation described in [8, 9], that is,

tr(F(z_j)^{−1} F'(z_j)) ≈ (1/L) Σ_{l=1}^L v_l^T F(z_j)^{−1} F'(z_j) v_l,  (8)

where the v_l are sample vectors, the entries of which take 1 or −1 with equal probability, and L is the number of sample vectors. Using (7) and (8), the number of eigenvalues m is estimated as follows:

m ≈ m̃ = (1/L) Σ_{j=0}^{N−1} w_j Σ_{l=1}^L v_l^T F(z_j)^{−1} F'(z_j) v_l.  (9)
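The estimator (8) can be sketched on an arbitrary symmetric test matrix M standing in for F(z_j)^{−1} F'(z_j) (which in the paper is never formed explicitly but accessed through linear solves):

```python
import numpy as np

# Hutchinson-type estimator (8): average v^T M v over random +/-1 vectors v
# to approximate tr(M). M below is an arbitrary symmetric test matrix.
rng = np.random.default_rng(2)
n, L = 10, 2000

M = 2.0 * np.eye(n) + 0.1 * (np.ones((n, n)) - np.eye(n))   # tr(M) = 20
V = rng.choice([-1.0, 1.0], size=(n, L))                    # sample vectors v_l

est = np.mean(np.einsum("il,il->l", V, M @ V))              # (1/L) sum_l v_l^T M v_l
print(est, np.trace(M))                                     # est is close to 20
```

The estimator is unbiased, and its variance comes only from the off-diagonal entries of M, which is why the standard deviation in Example 1 shrinks as L grows.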

3. Implementation

In this section, we describe an estimation method for the eigenvalue density using (9). We set points a and b on the complex plane and divide the interval [a, b] into K domains. Let Γ_k (k = 1, 2, ..., K) be closed curves which enclose each domain. We estimate the number of eigenvalues in each closed curve by (9). From the result of the estimation, we obtain the regions where the number of eigenvalues is large and the regions where it is small. A schematic illustration of the setting of the closed curves on the complex plane is shown in Fig. 1. The figure indicates the case where the Γ_k are circles. The center and radius of each circle are set to γ_k = a + (2k − 1)(b − a)/(2K) and ρ_k = (b − a)/(2K),




Table 1. Matrix properties.

                                   F(λ)                                          size   γ            ρ
Butterfly                          A_0 + λA_1 + λ²A_2 + λ³A_3 + λ⁴A_4            64     1 + 0.7i     1
Quantum dot (QD)                   A_0 + λA_1 + λ²A_2 + λ³A_3 + λ⁴A_4 + λ⁵A_5    2475   1            0.06
Delay-differential equation (DDE)  λI − A_0 − A_1 e^{−τλ}                        3600   −4.3 + 6.3i  0.2
Accelerator designs (SLAC)         A_0 − λA_1 + i√(λ − σ²) A_2                   5384   360000       25000

The matrices A_0, A_1, A_2, A_3, A_4 and A_5 of each NEP are different.

Fig. 3. Trace and standard deviation.

respectively, so that each circle encloses an equally divided sub-segment of [a, b]. The algorithm shown in Fig. 2 estimates the eigenvalue density with γ_k and ρ_k (k = 1, 2, ..., K).
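The scan of Fig. 2 can be sketched on the same linear test problem F(z) = A − zI (F'(z) = −I) with a diagonal A, for which v^T M v equals tr(M) exactly, so exact traces can substitute for the stochastic layer and the per-circle counts can be checked against the known eigenvalues. The interval, K and N below are illustrative choices.

```python
import numpy as np

# Density scan over K circles covering [a, b], as in Fig. 2, using exact
# traces for the diagonal test problem F(z) = A - z*I. Illustrative setup.
A = np.diag([0.1, 0.5, 0.9, 1.5])
a, b, K, N = 0.0, 2.0, 2, 64
I = np.eye(4)

rho = (b - a) / (2 * K)
counts = []
for k in range(1, K + 1):
    gamma_k = a + (2 * k - 1) * rho
    theta = 2 * np.pi * (np.arange(N) + 0.5) / N
    z = gamma_k + rho * np.exp(1j * theta)
    w = (rho / N) * np.exp(1j * theta)
    m_k = sum(wj * np.trace(np.linalg.solve(A - zj * I, -I))
              for wj, zj in zip(w, z))
    counts.append(m_k.real)

# Eigenvalues 0.1, 0.5, 0.9 lie in the first circle, 1.5 in the second.
print(np.round(counts))
```

Replacing the exact trace with the sampled estimate of (8) and the dense solve with the per-vector solves of line 8 of Fig. 2 gives the full algorithm.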

4. Numerical examples

In this section, we confirm the validity of the proposed method by applying it to a number of NEPs. The algorithm is implemented in MATLAB 7.4. The MATLAB command mldivide is used to solve F(z_{jk}) x^l_{jk} = F'(z_{jk}) v_l for x^l_{jk} numerically, and the elements of the sample vectors are given by the MATLAB function rand. We use one random sequence except in Example 1.

4.1 Example 1

In this example, we observe the behavior of the average and the standard deviation of (8) as the number of sample vectors L increases. The test problem is QD in Table 1. The average and the standard deviation are evaluated by using 30 different random sequences. z is set to 0.1. The results of this example are shown in Fig. 3. The horizontal axis indicates L. The vertical axes on the left and right indicate the trace and the standard deviation, respectively. The result indicates that the average gets closer to the exact value of the trace as L increases. The standard deviation decreases rapidly until around L = 30 and decreases slowly from then on.

4.2 Example 2

In this example, we investigate how the numerical integration in (7) affects the approximation of the number of eigenvalues m as the number of quadrature points N increases. The test problems are taken from [1-3, 10]. Their properties are shown in Table 1. Here, N is set to 4, 6, 8, 16, 32 and 64. The shape of Γ is a circle. The results listed in Table 2 indicate that the orders of magnitude of m and m̂ agree in all cases.

Table 2. Results for Example 2.

        Number of eigenvalues
N       Butterfly   QD        DDE       SLAC
4       55.3413     35.5420   22.7747   10.4819
6       87.7399     33.1033   23.2993   10.3486
8       69.0999     32.2280   23.5541   10.9679
16      66.8368     31.1224   23.6857    9.8864
32      74.2438     30.8099   24.9703    9.9782
64      69.2881     31.0783   23.7490   10.0001
m       70.0000     31.0000   24.0000   10.0000

Table 3. Results for Example 3.

        Number of eigenvalues
L       Butterfly   QD        DDE       SLAC
10      69.5271     38.8910   36.6366   11.6292
20      69.5066     35.8612   26.9446   10.1837
30      68.3734     35.4354   19.0834   10.3820
40      68.1347     34.9229   17.9046    9.4075
50      67.7309     33.6268   19.1009    8.8863
100     67.4809     33.1391   22.0621    9.6417
500     69.1751     32.9185   22.7606   10.7124
1000    69.0185     33.1510   23.6547   10.4950
m̂       69.0999     32.2280   23.5541   10.9679

4.3 Example 3

In this example, we investigate how the unbiased estimation of the matrix trace in (9) affects the estimation of the number of eigenvalues m as the number of sample vectors L increases. The test problems are the same as those in Example 2. Here, N is set to 8, and L is set from 10 to 1000. The results are shown in Table 3. The bottom row m̂ of Table 3 is the approximate number of eigenvalues in the case N = 8 shown in Table 2. The results indicate that the orders of magnitude of m̂ and m̃ agree in all cases. Thus, in these numerical experiments, the estimation of the number of eigenvalues by (9) can be used for applications that only require the order of magnitude of m. An example of such an application is the parameter setting for the eigensolver described in the introduction of the present paper.

4.4 Example 4

In this example, we give a demonstration of the algorithm shown in Fig. 2. The test problems are Butterfly, QD and SLAC in Table 1. Here, K = 30, N = 8, and L = 30. The interval [a, b] for Butterfly, QD and SLAC is set to [−3.2 + 0.5i, 3.2 + 0.5i], [0, 2] and [0.02 × 10⁶, 1.02 × 10⁶], respectively. The results of this example are shown in Figs. 4, 5 and 6. The horizontal axis indicates the real part of γ_k and the vertical axis indicates the number of eigenvalues in each circle. The results indicate that the orders of magnitude of the exact number of eigenvalues and the estimated number of eigenvalues agree in all cases. Through these numerical experiments, it is experimentally confirmed that the proposed method lets us




Fig. 4. Number of eigenvalues (Butterfly).

Fig. 5. Number of eigenvalues (QD).

Fig. 6. Number of eigenvalues (SLAC).

know whether eigenvalues exist or not in the specified region.

5. Conclusions

In the present paper, we proposed a stochastic method for estimating the eigenvalue density for nonlinear eigenvalue problems of analytic matrix functions. The proposed method uses unbiased estimation of the matrix traces and contour integrations, and can be considered an extension of the parallel stochastic estimation method for the eigenvalue density proposed in [5]. Through the numerical experiments, we learned that the proposed method can obtain information about whether eigenvalues exist or not in a specified region. This information can be used to perform efficient parameter settings for eigensolvers.

Acknowledgments

This research was supported in part by JST, CREST and Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology, Japan, Grant Nos. 21246018 and 23105702.

References

[1] E. Jarlebring, K. Meerbergen and W. Michiels, An Arnoldi method with structured starting vectors for the delay eigenvalue problem, in: IFAC Workshop on Time Delay Systems, 2010.
[2] F. N. Hwang, Z. H. Wei, T. M. Huang and W. Wang, A parallel additive Schwarz preconditioned Jacobi-Davidson algorithm for polynomial eigenvalue problems in quantum dot simulation, J. Comput. Phys., 229 (2010), 2932-2947.
[3] B. S. Liao, Subspace projection methods for model order reduction and nonlinear eigenvalue computation, PhD thesis, Department of Mathematics, UC Davis, 2007.
[4] J. Asakura, T. Sakurai, H. Tadano, T. Ikegami and K. Kimura, A numerical method for nonlinear eigenvalue problems using contour integral, JSIAM Letters, 1 (2009), 52-55.
[5] Y. Futamura, H. Tadano and T. Sakurai, Parallel stochastic estimation method for eigenvalue distribution, JSIAM Letters, 2 (2010), 127-130.
[6] I. Gohberg and L. Rodman, Analytic matrix functions with prescribed local data, J. d'Analyse Mathematique, 40 (1981), 90-128.
[7] J. R. Magnus and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, 1999.
[8] Z. Bai, M. Fahey and G. Golub, Some large scale matrix computation problems, J. Comput. Appl. Math., 74 (1996), 71-89.
[9] M. F. Hutchinson, A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines, Commun. Stat. Simulation Comput., 19 (1990), 433-450.
[10] NLEVP: A Collection of Nonlinear Eigenvalue Problems, http://www.mims.manchester.ac.uk/research/numerical-analysis/nlevp.html.


JSIAM Letters Vol.3 (2011) pp.65-68 ©2011 Japan Society for Industrial and Applied Mathematics

Computation of multipole moments from incomplete boundary data for Magnetoencephalography inverse problem

Hiroyuki Aoshika1, Takaaki Nara1, Kaoru Amano2,3 and Tsunehiro Takeda3

1 The University of Electro-Communications, 1-5-1, Chofugaoka, Chofu, Tokyo 182-8585, Japan
2 Precursory Research for Embryonic Science and Technology, Japan Science and Technology Agency, Saitama 332-0012, Japan

3 The University of Tokyo, 5-1-5, Kashiwanoha, Kashiwa, Chiba 227-8561, Japan

E-mail aoshika inv.mce.uec.ac.jp

Received June 6, 2011, Accepted August 8, 2011

Abstract

In this paper, we present a method for reconstructing the dipole sources inside the human brain from radial Magnetoencephalography data measured on the part of the boundary which encloses the source. Combining the proposed method with the direct method provides a good initial solution for an optimization-based algorithm. The method is verified with numerical simulations, phantom experiments, and a somatosensory evoked field data analysis.

Keywords inverse problem, Magnetoencephalography, multipole moment

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Magnetoencephalography (MEG) is a non-invasive brain monitoring tool that records the magnetic field outside the head generated by the neural current in the brain. Here one must solve an inverse problem to reconstruct the current source from the measured magnetic field. The conventional methods for the inverse problem assume that the current source can be represented by a relatively small number of equivalent current dipoles (ECDs). The usual algorithm for this source model is the non-linear least-squares method that minimizes the squared error between the data and the forward solution; however, it requires an initial parameter estimate close to the true one, without which the algorithm often converges to a local minimum. To address this issue, several researchers have proposed direct methods [1-4] which reconstruct the source parameters directly and algebraically from the data. Owing to the efficiency of such algorithms, they are expected to be used for real-time monitoring of brain activity. From a practical point of view, they can also provide a good initial solution for an iterative algorithm.

However, the problem of the direct method is that it requires the data on the boundary which encloses the source. The practical MEG system has no sensors in front of the face and beneath the neck. The lack of data at those parts is a cause of error in computing the weighted integral of the boundary data.

The aim of this paper is to develop a method to reconstruct the source parameters from data on part of the boundary. First, using the multipole expansion of the radial component of the magnetic field, the multipole moments are estimated from the data only on the upper hemisphere. We regularize the linear equation proposed by Taulu et al. [5] by using the singular value decomposition. Then, by combining this method with our direct method proposed in [4], the source parameters are estimated from incomplete boundary measurements, which can be used as a good initial solution for an optimization-based algorithm.

The rest of this paper is organized as follows. In Section 2, our direct method is summarized. The method using data only on the upper hemisphere is proposed in Section 3, which is verified by numerical simulations, phantom experiments, and a real data analysis in Sections 4, 5, and 6, respectively.

2. Direct method

Assume that the head can be modeled by the three concentric spheres Ω = Ω1 ∪ Ω2 ∪ Ω3 representing the brain, skull, and scalp, respectively. The sensors measuring the radial component of the magnetic field are placed on the upper hemisphere Γ with radius R centered at the origin. Generally, the solution to the inverse problem in which the neural current in the brain is reconstructed from the measured MEG data is not unique. To guarantee the uniqueness, we assume that the neural current is expressed by equivalent current dipoles (ECDs) J_p = Σ_{k=1}^{K} p_k δ(r − r_k), where r_k ∈ Ω1. The radial component of the magnetic field at r ∈ Γ is then given by the Biot-Savart law

    B_r(r) = μ0 Σ_{k=1}^{K} (n × p_k) · ∇′ (1/|r − r′|) |_{r′=r_k},

where μ0 is the permeability, assumed to be constant in the whole space, and n = r/|r| is the outward unit


normal to Γ. Our inverse problem is to reconstruct the number N, the positions r_k, and the moments p_k of the ECDs from measurements of B_r on Γ.

In contrast to the conventional approach based on the non-linear least-squares method, we proposed a direct method which reconstructs the source parameters directly and algebraically from MEG data [4]. The method is based on the multipole expansion of the radial MEG given by

    B_r = μ0 Σ_{l=0}^{∞} Σ_{m=−l}^{l} [(l+1)/(2l+1)] M_lm Y*_lm(θ, φ) / r^{l+2},   (1)

where Y_lm(θ, φ) are the normalized spherical harmonic functions. It is shown that the multipole moments M_lm with l = m are expressed in terms of the source parameters. On the other hand, they are also expressed by the boundary data. As a result, we have the algebraic equations relating the source parameters to the data:

    Σ_{k=1}^{N} q_k S_k^m = α_m,   (2)

where S_k ≡ x_k + i y_k is the kth source position projected on the xy-plane, q_k ≡ r_k × p_k is the magnetic moment of the kth ECD, q_k ≡ [q_k]_x + i [q_k]_y, where [∗]_x represents the x-component of the vector ∗, and

    α_m = [(2m+3) / ((m+1) μ0)] ∫_S B_r (x + iy)^{m+1} dS,   (3)

where S is a sphere which is centered at the origin and encloses Ω. It is also shown that (2) reduces to a generalized eigenvalue problem [6], so that the source parameters can be reconstructed algebraically from the boundary data. Although this method is theoretically simple, a problem is that we need B_r on the whole sphere S which encloses Ω. In a practical situation the sensors cannot be placed in front of the face or in the middle of the neck. Hence, the lack of data on part of S becomes a source of error in computing α_m.
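The reduction of (2) to a generalized eigenvalue problem can be made concrete with a small sketch. The following is our own illustration (not the authors' code) of the idea behind [6]: for K sources, the eigenvalues of the Hankel matrix pencil built from the moments α_0, . . . , α_{2K−1} are exactly the projected positions S_k:

```python
import numpy as np

def sources_from_moments(alpha, K):
    """Recover S_k from alpha_m = sum_k q_k S_k^m, m = 0..2K-1,
    via the Hankel generalized eigenvalue problem H1 v = S H0 v."""
    H0 = np.array([[alpha[i + j] for j in range(K)] for i in range(K)])
    H1 = np.array([[alpha[i + j + 1] for j in range(K)] for i in range(K)])
    # Vandermonde factorization H0 = V diag(q) V^T, H1 = V diag(q S) V^T
    # implies the eigenvalues of H0^{-1} H1 are the positions S_k.
    return np.linalg.eigvals(np.linalg.solve(H0, H1))
```

With exact moments the eigenvalues reproduce the positions; with moments computed from (possibly noisy) boundary data via (3), they give the initial estimates discussed later.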

3. Computation of M_mm from data on the upper hemisphere

Truncate (1) up to order L:

    B_r ≃ Σ_{l=0}^{L} Σ_{m=−l}^{l} X_lm Y*_lm(θ, φ),   (4)

where X_lm = μ0 · [(l+1)/(2l+1)] · (M_lm / r^{l+2}). Then the linear equations relating the multipole moments to the radial MEG on Γ are obtained [5]: d = Gx, where d = (B_r1, B_r2, . . . , B_rN)^T ∈ R^N is the data measured on Γ, x = (X_{1,−1}, X_{1,0}, . . . , X_{L,L})^T ∈ C^{(L+1)^2−1} consists of the unknown multipole moments, and

    G = [ Y*_{1,−1}(θ, φ)_1   Y*_{1,0}(θ, φ)_1   · · ·   Y*_{L,L}(θ, φ)_1
          Y*_{1,−1}(θ, φ)_2   Y*_{1,0}(θ, φ)_2   · · ·   Y*_{L,L}(θ, φ)_2
          ...                 ...                        ...
          Y*_{1,−1}(θ, φ)_N   Y*_{1,0}(θ, φ)_N   · · ·   Y*_{L,L}(θ, φ)_N ],

where (θ, φ)_i represents the spherical coordinates of the ith sensor. We choose L as the maximum order for which the linear system remains over-determined, that is, N > (L+1)^2 − 1.

In order to obtain x while suppressing the effect of the noise contained in d, we use the truncated singular value decomposition of G, denoted by G^+_T, where T is the truncation order. To determine T, we use the following method: first we fix T and estimate the multipole moments from the data on Γ by x = G^+_T d. From the components with l = m in x, the direct method in Section 2 identifies the source positions projected on the xy-plane. Using them as the initial solution, the z-coordinates and the moments of the sources are determined by the non-linear least-squares method. Then, we compute the Goodness of Fit (GoF) defined by

    GoF [%] = 100 ( 1 − Σ_{i=1}^{N} (B_data[i] − B_th[i])^2 / Σ_{i=1}^{N} B_data[i]^2 ),   (5)

where B_data[i] and B_th[i] are the data and the forward solution at the ith sensor position, respectively. We repeat these computations for different T and choose the T for which the GoF becomes maximum.

In practice, sensors called 'gradiometers' are often used, which measure the difference of B_r at a point r on Γ and at r + bn, where b is called the baseline distance, in order to cancel the noise originating from far-away sources. In this case, we only need to change X_lm from X_lm = μ0 · [(l+1)/(2l+1)] · (M_lm / r^{l+2}) to X_lm = μ0 · [(l+1)/(2l+1)] · [1/r^{l+2} − 1/(r+b)^{l+2}] · M_lm. Thus, even in this case, from x = G^+_T d we can obtain the M_lm with l = m that are used in the direct method.
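The two computational kernels of this section can be sketched as follows (our own minimal illustration, not the authors' code; the forward solver and dipole fit are beyond this sketch, so the GoF is evaluated for a caller-supplied forward solution):

```python
import numpy as np

def tsvd_solve(G, d, T):
    """x = G_T^+ d: truncated-SVD pseudoinverse keeping the T largest
    singular triplets, which suppresses noise-dominated components."""
    U, s, Vh = np.linalg.svd(G, full_matrices=False)
    return Vh[:T].conj().T @ ((U[:, :T].conj().T @ d) / s[:T])

def goodness_of_fit(b_data, b_th):
    """GoF [%] of (5): 100 * (1 - residual power / data power)."""
    b_data = np.asarray(b_data, dtype=float)
    b_th = np.asarray(b_th, dtype=float)
    return 100.0 * (1.0 - np.sum((b_data - b_th) ** 2) / np.sum(b_data ** 2))
```

One would then loop T over the admissible range, reconstruct the sources from `tsvd_solve(G, d, T)`, evaluate the forward solution, and keep the T with the largest GoF.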

4. Numerical simulations

First we verified our method numerically. A single dipole (K = 1) was set at r_1 = (40, 40, 40) mm with the moment p_1 = (0, 0, 10) nAm. N = 183 gradiometers with b = 50 mm were uniformly distributed on the upper hemisphere with R = 120 mm using the spherical t-design [7]. For this number of sensors N, the truncation order was L = 12. Gaussian noise was added whose standard deviation was 10% of the root-mean-square of the theoretical data.

Fig. 1 shows the relative localization error (the error divided by R) and the GoF when changing the truncation order T. We observed that choosing T = 118 gave the maximum GoF (98.1%) and the minimum relative localization error (0.18%). Hence, from the GoF we can determine the optimal truncation order T. The relative magnetic moment error was 1.8%.

Fig. 2 shows the estimated |q_2/q_1| when assuming that there were K′ = 2 dipoles. We observed that |q_2| became much smaller than |q_1| for most T. In fact, when T = 118, the truncation order used for reconstruction, |q_2|/|q_1| = 0.043, showing that |q_2| can be neglected compared to |q_1| and hence K = 1.


Fig. 1. Relative localization error and GoF with respect to the truncation order T. When T = 118, the error and GoF become minimum and maximum, respectively.

Fig. 2. Ratio |q_2|/|q_1| when assuming K′ = 2.

5. Phantom experiments

Next we examined our method using a phantom head. We used 168 gradiometers as shown in Fig. 3, where the radius R = 129 mm. L was set to 11. A single current source (K = 1) was moved on the planes z_1 = 16 mm and z_1 = 42 mm, where √(x_1^2 + y_1^2) = 62.5 mm and φ_1 = tan^{−1}(y_1/x_1) = 45 × i (i = 0, 1, . . . , 7) degrees. The reconstruction results are shown in Figs. 4 and 5. The mean estimation error was 3.5 mm and 2.4 mm, respectively. Fig. 6 shows an example of the relationship between the relative localization error and the GoF when z_1 = 42 mm and φ_1 = 180 degrees. It is observed that the truncation order T = 78 that maximizes the GoF coincides with the order that minimizes the localization error. This coincidence was observed at all the source positions.

Fig. 3. Radial gradiometer, 168 channels.

Fig. 4. Reconstruction result when z_1 = 16 mm. Black dots: true ECD positions; red dots: estimated ECD positions.

Fig. 5. Reconstruction result when z_1 = 42 mm.

6. Real data analysis

We analyzed somatosensory evoked field (SEF) data where the right index finger was electrically stimulated. 205 gradiometers were used. L was set to 13. In the time series shown in Fig. 7, we used the peak at 101 msec for reconstruction (the peak at 0 msec is an artifact originating from the electrical stimulation). Fig. 8 shows the contour maps of the radial MEG.

Fig. 9 shows the localization result. The maximum GoF was 74% at T = 51, with which the reconstruction was conducted. In this case, two dipoles are estimated, in the right and left somatosensory cortices. Figs. 10 and 11 show |q_2/q_1| and |q_3/q_2|, respectively, when assuming that there were K′ = 3 dipoles. One finds that |q_3/q_2| often becomes small when changing T, while |q_2| is comparable to |q_1| for a wide range of T. In fact, when T = 51, which was used for reconstruction, |q_2/q_1| = 0.83 and |q_3/q_2| = 0.0002, from which we can reasonably judge that K = 2.

7. Conclusion

In this paper, we developed a method for computing the multipole coefficients of the radial magnetic field created by the dipole sources from radial MEG data on the


Fig. 6. Relative localization error (top) and GoF (bottom) when φ_1 = 180 degrees and z_1 = 42 mm. The truncation order T = 78 maximizes the GoF and minimizes the relative localization error.

Fig. 7. Time series data.

Fig. 8. Contour map of the radial MEG (isofield contours, 5 fT/step). Red and blue show the outward and inward magnetic field, respectively.

Fig. 9. Localization result at 101 msec (axial, sagittal, and coronal views). Two dipoles are estimated in the left and right somatosensory cortices.

Fig. 10. |q_2/q_1| when assuming that there were K′ = 2 dipoles. For most T, |q_2| is comparable to |q_1|. When T = 51, the truncation order used in reconstruction, |q_2/q_1| = 0.83.

Fig. 11. |q_3/q_2| when assuming that there were K′ = 3 dipoles. For most T, |q_3| becomes much smaller than |q_2|. When T = 51, the truncation order used in reconstruction, |q_3/q_2| = 0.0002.

upper hemisphere; these coefficients were used in the direct inversion method for reconstructing the dipole parameters. The method was verified with numerical simulations, phantom experiments, and a somatosensory evoked field (SEF) data analysis. Although it was suggested that K can be estimated from the ratio of the source strengths when assuming a larger number of dipoles than the true one, a rigorous analysis of the threshold is required. Generalization of our method to the case where the data are given not on the upper hemisphere but on an arbitrary open surface which does not enclose the source is straightforward; its verification with simulations as well as phantom/real data analyses is also required.

References

[1] A. El-Badia and T. Ha-Duong, An inverse source problem in potential analysis, Inverse Problems, 16 (2000), 651-663.

[2] T. Ohe and K. Ohnaka, A precise estimation method for locations in an inverse logarithmic potential problem for point mass models, Appl. Math. Modelling, 18 (1994), 446-452.

[3] K. Yamatani, T. Ohe and K. Ohnaka, An identification method of electric current dipoles in spherically symmetric conductor, J. Comp. Appl. Math., 143 (2002), 189-200.

[4] T. Nara, J. Oohama, M. Hashimoto, T. Takeda and S. Ando, Direct reconstruction algorithm of current dipoles for vector magnetoencephalography and electroencephalography, Phys. Med. Biol., 52 (2007), 3859-3879.

[5] S. Taulu, M. Kajola and J. Simola, Suppression of interference and artifacts by the signal space separation method, Brain Topography, 16 (2004), 269-275.

[6] P. Kravanja, T. Sakurai and M. V. Barel, On locating clusters of zeros of analytic functions, BIT, 39 (1999), 646-682.

[7] E. B. Saff and A. B. J. Kuijlaars, Distributing many points on a sphere, Mathematical Intelligencer, 19 (1997), 5-11.


JSIAM Letters Vol.3 (2011) pp.69-72 ©2011 Japan Society for Industrial and Applied Mathematics

An alternative implementation of the IDRstab method saving vector updates

Kensuke Aihara1, Kuniyoshi Abe2 and Emiko Ishiwata3

1 Graduate School of Science, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan
2 Faculty of Economics and Information, Gifu Shotoku University, 1-38 Nakauzura, Gifu-shi, Gifu 500-8288, Japan
3 Department of Mathematical Information Science, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan

E-mail j1411701 ed.tus.ac.jp

Received August 6, 2011, Accepted October 4, 2011

Abstract

The IDRstab method is often more effective than both IDR(s) and BiCGstab(ℓ) for solving large nonsymmetric linear systems. However, the computational costs for vector updates are expensive in the original implementation of IDRstab. In this paper, we propose a variant of IDRstab that reduces this computational cost; vector updates are saved. Numerical experiments demonstrate the efficiency of our variant of IDRstab for sparse linear systems.

Keywords linear systems, Induced Dimension Reduction, IDRstab method, vector update

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

The IDR(s) method [1], which is based on the Induced Dimension Reduction (IDR) principle, has been proposed by Sonneveld and van Gijzen for solving nonsymmetric linear systems Ax = b of order n for x, where the right-hand side b is an n-vector.

It has been shown that IDR(s) corresponds to the Bi-Conjugate Gradient STABilized (BiCGSTAB) method [2] with an s-dimensional initial shadow residual [3]. Like BiCGSTAB, IDR(s) contains a residual minimization step using stabilizing polynomials of degree one. Therefore, as for BiCGSTAB, this residual minimization step causes numerical instabilities in the case of a strongly nonsymmetric matrix. To overcome this problem, the IDRstab method has been developed by Sleijpen and van Gijzen as an alternative to IDR(s) with stabilizing polynomials of degree ℓ [4]. Note that IDRstab with ℓ = 1 is mathematically equivalent to IDR(s), and IDRstab with s = 1 is mathematically equivalent to the BiCGstab(ℓ) method [5]. The related method GBi-CGSTAB(s, L) [6], which incorporates stabilizing polynomials of order L into IDR(s), has been proposed by Tanio and Sugihara, but its implementation is quite different from that of IDRstab.

IDRstab with s and ℓ larger than 1 is often more effective than both IDR(s) and BiCGstab(ℓ). However, the original implementation of IDRstab presented in [4] requires many vector updates of the form ax + y with a scalar a and n-vectors x and y (AXPYs). The computation time depends significantly on the computational costs for AXPYs. It is known that a number of different implementations of IDRstab can be devised. Therefore, in this paper, we propose a variant of IDRstab which saves the computational costs for AXPYs. Numerical experiments demonstrate that our proposed variant of IDRstab is more efficient than the original one for sparse linear systems.

2. The IDRstab method

In this section, we describe the outline of the original IDRstab method.

The jth residual r_j of an IDR method based on the IDR principle, such as IDRstab, is generated in a subspace G_j. Here, the subspaces G_j, j = 0, 1, 2, . . . are related by G_0 ≡ C^n and G_{j+1} ≡ (I − ω_{j+1} A)(G_j ∩ R_0^⊥), where R_0^⊥ is the orthogonal complement of the range of a fixed n × s matrix R_0, and the ω_j's are nonzero scalars. In the generic case, the dimension of G_j decreases with increasing j by the IDR theorem; for details, we refer to [1, 3, 4].

The residual r_k ∈ G_k of IDRstab is updated to the next residual r_{k+ℓ} ∈ G_{k+ℓ} without explicitly producing the residuals r_{k+i} ∈ G_{k+i} for i = 1, 2, . . . , ℓ − 1, where the integer k is a multiple of ℓ. The process of this update from r_k to r_{k+ℓ} is referred to as one cycle of IDRstab. The cycle has two steps, called the IDR step and the polynomial step.

2.1 The IDR step

Suppose that we have an approximation x_k and the corresponding residual r_k ∈ G_k, plus the n × s matrices U_k and AU_k with columns also in G_k. The IDR step is repeated ℓ times before the residual minimization step, i.e., the polynomial step. The ℓ repetitions of the IDR step are performed by using the projections Π_i^{(j)} for i = 0, 1, . . . , j, j = 1, 2, . . . , ℓ, which are defined by

    Π_i^{(j)} ≡ I − A^i U_k^{(j−1)} σ_j^{−1} R_0^* A^{j−i},   σ_j ≡ R_0^* A^j U_k^{(j−1)}.
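For j = 1 and i = 0 the projection above reduces to Π_0^{(1)} = I − U_k^{(0)} σ_1^{−1} R_0^* A. Its key property, that the image under A of a projected vector is orthogonal to the columns of R_0, can be checked with a small sketch of our own (hypothetical names; the paper's actual implementation follows in Section 3):

```python
import numpy as np

def apply_pi_1_0(v, Av, U, AU, R0):
    """w = Pi_0^{(1)} v = v - U sigma^{-1} (R0^* A v), sigma = R0^* A U.
    Returns w and A w; since A w = Av - AU alpha, no extra matrix-vector
    product is needed, and R0^* (A w) = 0 by construction."""
    sigma = R0.conj().T @ AU                      # s x s matrix sigma_1
    alpha = np.linalg.solve(sigma, R0.conj().T @ Av)
    return v - U @ alpha, Av - AU @ alpha
```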


Note that R_0^* Π_j^{(j)} = O and Π_{i+1}^{(j)} A = A Π_i^{(j)} for i = 0, 1, . . . , j − 1. Here we use the superscript '(j)' for the jth repetition of the IDR step. An approximation x_k^{(j−1)} and the corresponding residual r_k^{(j−1)}, plus the vectors A^i r_k^{(j−1)} for i = 1, 2, . . . , j − 1 and the n × s matrices A^i U_k^{(j−1)} for i = 0, 1, . . . , j, are generated by performing j − 1 (j ≤ ℓ) repetitions, where x_k^{(0)} ≡ x_k, r_k^{(0)} ≡ r_k and U_k^{(0)} ≡ U_k. The jth repetition is performed as follows.

The vectors A^i r_k^{(j)} for i = 0, 1, . . . , j − 1 are obtained by the projection

    A^i r_k^{(j)} ≡ Π_{i+1}^{(j)} A^i r_k^{(j−1)} = A^i r_k^{(j−1)} − A^{i+1} U_k^{(j−1)} α^{(j)}   (1)

with α^{(j)} ≡ σ_j^{−1} (R_0^* A^{j−1} r_k^{(j−1)}), and the associated approximation x_k^{(j)} is expressed by

    x_k^{(j)} = x_k^{(j−1)} + U_k^{(j−1)} α^{(j)}.   (2)

The matrices A^i U_k^{(j)} are updated from A^i U_k^{(j−1)} for i = 0, 1, . . . , j such that the columns of A^j U_k^{(j)} form a basis of the Krylov subspace K_s(Π_j^{(j)} A, Π_j^{(j)} A^j r_k^{(j)}). Specifically, the vector A^j r_k^{(j)} is obtained by multiplying A^{j−1} r_k^{(j)} by A. Then the vectors A^i U_k^{(j)} e_1 for i = 0, 1, . . . , j are obtained by the projection

    A^i U_k^{(j)} e_1 ≡ Π_i^{(j)} A^i r_k^{(j)} = A^i r_k^{(j)} − A^i U_k^{(j−1)} β_1^{(j)}   (3)

with β_1^{(j)} ≡ σ_j^{−1} ρ_1^{(j)} and ρ_1^{(j)} ≡ R_0^* A^j r_k^{(j)}. Similarly, for some q < s, the vector c_q^{(j)} ≡ A(A^j U_k^{(j)} e_q) is computed as the qth column of A^{j+1} U_k^{(j)}; after that, the vectors A^i U_k^{(j)} e_{q+1} for i = 0, 1, . . . , j can be computed as

    A^i U_k^{(j)} e_{q+1} = Π_i^{(j)} A^{i+1} U_k^{(j)} e_q = A^{i+1} U_k^{(j)} e_q − A^i U_k^{(j−1)} β_{q+1}^{(j)}   (4)

with β_{q+1}^{(j)} ≡ σ_j^{−1} ρ_{q+1}^{(j)} and ρ_{q+1}^{(j)} ≡ R_0^* c_q^{(j)}.

Thus, at the jth repetition, the vectors A^i r_k^{(j)} for i = 0, 1, . . . , j − 1 and the columns of A^i U_k^{(j)} for i = 1, 2, . . . , j belong to G_k ∩ R_0^⊥.

2.2 The polynomial step

After ℓ repetitions of the IDR step, we have an approximation x_k^{(ℓ)} and the corresponding residual r_k^{(ℓ)}, plus the vectors A^i r_k^{(ℓ)} for i = 1, 2, . . . , ℓ and the ℓ + 2 matrices A^i U_k^{(ℓ)} for i = 0, 1, . . . , ℓ + 1. The residual minimization is performed using a polynomial of degree ℓ. Specifically, the residual r_k^{(ℓ)} and the matrix AU_k^{(ℓ)} are updated by

    r_{k+ℓ} = r_k^{(ℓ)} − γ_{1,k} A r_k^{(ℓ)} − · · · − γ_{ℓ,k} A^ℓ r_k^{(ℓ)},
    AU_{k+ℓ} = AU_k^{(ℓ)} − γ_{1,k} A^2 U_k^{(ℓ)} − · · · − γ_{ℓ,k} A^{ℓ+1} U_k^{(ℓ)}

with scalars γ_{1,k}, γ_{2,k}, . . . , γ_{ℓ,k} which are determined by minimizing a norm of the residual r_{k+ℓ}. The approximation x_k^{(ℓ)} and the matrix U_k^{(ℓ)} are also updated to the next associated approximation x_{k+ℓ} and matrix U_{k+ℓ}.
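The minimizing coefficients γ_{1,k}, . . . , γ_{ℓ,k} solve an n × ℓ linear least-squares problem. A minimal sketch of this step in NumPy (our own illustration; M stands for the matrix [A r_k^{(ℓ)}, . . . , A^ℓ r_k^{(ℓ)}]):

```python
import numpy as np

def polynomial_step(r, M):
    """gamma = argmin || r - M gamma ||_2 and the minimized residual.
    The new residual is orthogonal to the columns of M."""
    gamma, *_ = np.linalg.lstsq(M, r, rcond=None)
    return gamma, r - M @ gamma
```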

Fig. 1. The first repetition (ℓ = 2): starting from U_k^{(0)}, x_k^{(0)} and r_k^{(0)}, the projections Π_0^{(1)} and explicit multiplications by A produce x_k^{(1)}, r_k^{(1)}, U_k^{(1)} and AU_k^{(1)}.

Fig. 2. The second repetition (ℓ = 2): the projections Π_0^{(2)}, Π_1^{(2)} and explicit multiplications by A produce x_k^{(2)}, r_k^{(2)}, A r_k^{(2)}, A^2 r_k^{(2)} and A^i U_k^{(2)}, i = 0, 1, 2.

3. An alternative implementation of the IDRstab method

In the IDR step stated in the preceding section, the computational costs for vector updates (i.e., AXPYs) to generate the matrices A^i U_k^{(j)} for i = 0, 1, . . . , j, j = 1, 2, . . . , ℓ are expensive. Note that a vector update using a projection Π_i^{(j)}, such as (1), contains s AXPYs. Therefore, in this section, we give a new formulation of the IDR step in which vector updates are saved, and derive an alternative implementation of IDRstab.

3.1 Saving vector updates in the IDR step

We compute the matrix A^* R_0 at the beginning of the iteration and store it. Then, following the idea noted in [4], we compute σ_j as (A^* R_0)^* A^{j−1} U_k^{(j−1)}. The vectors α^{(j)}, ρ_1^{(j)} and ρ_{q+1}^{(j)} (q < s) can also be computed as σ_j^{−1}((A^* R_0)^* A^{j−2} r_k^{(j−1)}), (A^* R_0)^* A^{j−1} r_k^{(j)} and (A^* R_0)^* A^j U_k^{(j)} e_q, respectively. These forms enable us to perform the jth repetition of the IDR step without the matrix A^j U_k^{(j−1)}.

At the first repetition, i.e., for j = 1, we compute U_k^{(0)} α^{(1)}, then obtain the approximation (2) and the residual r_k^{(1)} = r_k^{(0)} − A(U_k^{(0)} α^{(1)}), where multiplying U_k^{(0)} α^{(1)} by A gives A(U_k^{(0)} α^{(1)}). We obtain U_k^{(1)} by the projection Π_0^{(1)}, and multiplying U_k^{(1)} by A gives AU_k^{(1)}.

From the second repetition, i.e., for j = 2, 3, . . . , ℓ, we perform the updates (1) for i = 0, 1, . . . , j − 2 and (2), and obtain A^{j−1} r_k^{(j)} by multiplying A^{j−2} r_k^{(j)} by A. Then we also perform the updates (3) and (4) for i = 0, 1, . . . , j − 1. At the end of the ℓth repetition, we obtain A^ℓ r_k^{(ℓ)} by multiplying A^{ℓ−1} r_k^{(ℓ)} by A.

The scheme of the IDR step stated above is displayed in Figs. 1 and 2 for ℓ = 2. The notations of the scheme follow [4]. The explicit multiplications by A are used to obtain the vectors named in the captions.

In our proposed IDR step, it is not necessary to generate A^{j+1} U_k^{(j)} at the jth repetition. Hence, we can save the storage of an n × s matrix and ℓ(s^2 + s) AXPYs. However, we need one additional matrix-vector multiplication (MV) per cycle to obtain the vector A(U_k^{(0)} α^{(1)}).

3.2 A variant of IDRstab

Our variant of the IDRstab algorithm saving vector updates is expressed as follows:

Our proposed variant of IDRstab

1.  Select an initial guess x and an n × s matrix R0
2.  Compute r0 = b − Ax, r = [r0]
    % Generate an initial n × s matrix U = [U0]
3.  For q = 1, . . . , s
4.    if q = 1, u0 = r0, else, u0 = A u0
5.    µ = (U0(:,1:q−1))^* u0, u0 = u0 − U0(:,1:q−1) µ
6.    u0 = u0 / ∥u0∥2, U0(:,q) = u0
7.  End for
8.  While ∥r0∥ > tol
9.    For j = 1, . . . , ℓ
      % The IDR step
10.     σ = (A^* R0)^* Uj−1
11.     if j = 1, α = σ^{−1}(R0^* r0), else, α = σ^{−1}((A^* R0)^* rj−2)
12.     x = x + U0 α
13.     if j = 1, r = r − A(U0 α), else, r = r − [U1; . . . ; Uj−1] α
14.     if j > 1, r = [r; A rj−2]
15.     For q = 1, . . . , s
16.       if q = 1, u = r, else, u = [u1; . . . ; uj]
17.       β = σ^{−1}((A^* R0)^* uj−1), u = u − U β, u = [u; A uj−1]
18.       µ = (Vj(:,1:q−1))^* uj, u = u − V(:,1:q−1) µ
19.       u = u / ∥uj∥2, V(:,q) = u
20.     End for
21.     U = V
22.   End for
23.   r = [r; A rℓ−1]
      % The polynomial step
24.   γ = [γ1; . . . ; γℓ] = argmin_γ ∥r0 − [r1, . . . , rℓ] γ∥2
25.   x = x + [r0, . . . , rℓ−1] γ, r0 = r0 − [r1, . . . , rℓ] γ
26.   U = U0 − Σ_{j=1}^{ℓ} γj Uj
27. End while

The notations in this algorithm follow the MATLAB conventions: for a matrix W = [w1, . . . , ws], the matrix [w1, . . . , wq] and the vector wq for q ≤ s are notated as W(:,1:q) and W(:,q), respectively, and [W0; . . . ; Wj] ≡ [W0^T, . . . , Wj^T]^T. Note that, in this algorithm, the Ui, Vi, ri and ui for i = 0, 1, . . . , j are related to U, V, r and u by U = [U0; . . . ; Uj], V = [V0; . . . ; Vj], r = [r0; . . . ; rj] and u = [u0; . . . ; uj], respectively.

As in the original implementation in [4], we use the Arnoldi process to obtain the orthonormalized matrices U0 and A^j U_k^{(j)} at lines 5-6 and 18-19, respectively. Since AU_k^{(ℓ)} does not need to be updated to AU_{k+ℓ} in the polynomial step, a further sℓ AXPYs are saved. Table 1 summarizes the computational costs of the original IDRstab and our variant in MVs and AXPYs per cycle; the costs for the Arnoldi process are not included. From Table 1, the computational costs of our variant per cycle are less than those of the original IDRstab when nnz < ℓ(s^2 + 2s)n holds, where nnz is the number of nonzero entries of the coefficient matrix. Thus, we expect our variant to be more efficient than the original IDRstab for large sparse linear systems.

Note that the original IDRstab and our variant are mathematically equivalent, but they may show different convergence behavior.

Table 1. The computational costs of IDRstab and our variant in MVs and AXPYs per cycle.

                MVs           AXPYs
  IDRstab       ℓ(s+1)        (1/2)ℓ(ℓ+1)(s+s^2) + 2ℓ + sℓ + ℓ(s^2+2s)
  our variant   ℓ(s+1)+1      (1/2)ℓ(ℓ+1)(s+s^2) + 2ℓ + sℓ

Fig. 3. Convergence histories of the original IDRstab and our variant with (s, ℓ) = (2, 4) for example 1.

Fig. 4. Convergence histories of the original IDRstab and our variant with (s, ℓ) = (2, 4) for example 2.
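The counts in Table 1 and the break-even condition can be checked with a few lines (our own sketch; an AXPY on n-vectors costs about 2n flops and an MV about 2·nnz flops, which is where the condition nnz < ℓ(s^2 + 2s)n comes from):

```python
def axpys_per_cycle(s, l):
    """AXPY counts per cycle from Table 1 (Arnoldi costs excluded)."""
    common = l * (l + 1) * (s + s * s) // 2 + 2 * l + s * l
    return {"original": common + l * (s * s + 2 * s), "variant": common}

def variant_is_cheaper(s, l, n, nnz):
    """The variant trades l*(s^2 + 2s) AXPYs (~2*l*(s^2+2s)*n flops)
    for one extra MV (~2*nnz flops) per cycle."""
    return nnz < l * (s * s + 2 * s) * n
```

For SHERMAN5 in example 1 below (n = 3312, nnz = 20793), the condition holds for every (s, ℓ) used.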

4. Numerical experiments

In this section, we present some numerical experiments on model problems with nonsymmetric matrices.

4.1 Computational conditions

Numerical calculations were carried out in double-precision floating-point arithmetic on a PC (Intel Core i7, 2.67 GHz CPU) with the Intel C++ 11.1.048 compiler. The iterations were started with 0. The stopping criterion tol was set at 10^{−12}∥b∥2. The columns of R0 were given by the orthonormalization of s real random vectors in the interval (0, 1). The combinations of the parameters (s, ℓ) were set at (2, 4), (4, 4) and (8, 2).

Figs. 3 and 4 display the convergence histories with (s, ℓ) = (2, 4) for examples 1 and 2, respectively. The plots show the number of cycles on the horizontal axis versus the log10 of the relative residual 2-norm (∥rk∥2/∥b∥2) on the vertical axis. Tables 2 and 3 show the number of cycles and MVs, the computation times, and the explicitly computed relative residual 2-norms (∥b − Axk∥2/∥b∥2) at termination, which are


Table 2. Number of cycles and MVs, computation times, and explicitly computed relative residual norms for example 1.

  (s, ℓ)            Cycles   MVs    Time[sec]   True res.
  IDRstab (2, 4)    225      2703   2.730       4.2E−09
  our variant       235      3057   2.398       7.6E−13
  IDRstab (4, 4)    116      2325   2.621       8.3E−10
  our variant       123      2587   2.356       4.8E−11
  IDRstab (8, 2)    132      2385   3.442       1.5E−10
  our variant       132      2516   2.886       2.7E−11

Table 3. Number of cycles and MVs, computation times, and explicitly computed relative residual norms for example 2.

  (s, ℓ)            Cycles   MVs    Time[sec]   True res.
  IDRstab (2, 4)    677      8127   38.33       1.6E−06
  our variant       626      8140   29.58       8.5E−08
  IDRstab (4, 4)    226      4525   25.20       3.1E−05
  our variant       259      5443   24.42       6.2E−08
  IDRstab (8, 2)    206      3717   27.85       1.9E−07
  our variant       221      4207   24.38       3.2E−09

abbreviated as "Cycles", "MVs", "Time[sec]" and "True res.", respectively.

4.2 Example 1

As in [4], we take up the test matrix SHERMAN5 from the Matrix Market collection. The order n and nnz of this matrix are 3312 and 20793, respectively. The percentage of nonzero entries is 0.19. The right-hand side vector is given by substituting the vector x* ≡ (1, 1, . . . , 1)^T into the equation b = Ax*.

From Fig. 3 and Table 2, we can observe the following: The number of cycles required for successful convergence of our variant is about the same as that of the original IDRstab for each combination of s and ℓ. Hence, the convergence behavior of our variant is almost the same as that of the original IDRstab. The computation times for our variant are shorter, although the number of MVs increases compared with that of the original IDRstab, because the computational costs for AXPYs are sufficiently reduced. In particular, the computation time for our variant is about 84% of that for the original IDRstab in the case of (s, ℓ) = (8, 2).

Note that the approximate solutions obtained by our variant are more accurate than those obtained by the original IDRstab for all combinations of s and ℓ.
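As a hedged illustration of how the right-hand side and the “True res.” column are formed, the following sketch repeats the construction b = Ax* and the explicitly computed relative residual; the matrix here is a small well-conditioned random stand-in (not SHERMAN5), and a direct solve stands in for the solver's final iterate:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned stand-in

x_star = np.ones(n)            # x* = (1, 1, ..., 1)^T, as in the experiment
b = A @ x_star                 # right-hand side b = A x*

x_k = np.linalg.solve(A, b)    # stand-in for the solver's final iterate
true_res = np.linalg.norm(b - A @ x_k) / np.linalg.norm(b)
print(true_res)                # the explicitly computed relative residual
```

For an actual run, x_k would be the iterate returned by IDRstab at termination, and A would be read from the Matrix Market file.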

4.3 Example 2

As shown in [7], we take up a system with a sparse nonsymmetric coefficient matrix derived from the finite-difference discretization of the following partial differential equation on the unit square Ω = [0, 1] × [0, 1]:

−u_xx − u_yy + D[(y − 1/2)u_x + (x − 1/3)(x − 2/3)u_y] − 43π²u = G(x, y),   u(x, y)|_∂Ω = 1 + xy.

This equation is discretized by using the 5-point central difference approximation. The mesh size h is chosen as 129^{−1} in both directions of Ω. Then the order n and nnz of the coefficient matrix are 128² = 16384 and 81408, respectively. The percentage of nonzero entries is 0.03. The right-hand side vector of the discretized system is given such that the exact solution u(x, y) of the above equation is 1 + xy. The parameter Dh is set at 2^{−1}.
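The discretization above can be sketched as follows. This is a hedged illustration on a deliberately coarse grid (M = 8 interior points per direction instead of 128), with the matrix assembled densely for clarity; the function name and layout are our own choices:

```python
import numpy as np

def assemble(M, D):
    """5-point central-difference matrix for
    -u_xx - u_yy + D[(y-1/2)u_x + (x-1/3)(x-2/3)u_y] - 43*pi^2*u
    on the interior of the unit square with mesh size h = 1/(M+1)."""
    h = 1.0 / (M + 1)
    A = np.zeros((M * M, M * M))
    def idx(i, j):                          # interior node (i, j) -> row index
        return (j - 1) * M + (i - 1)
    for j in range(1, M + 1):
        for i in range(1, M + 1):
            x, y = i * h, j * h
            r = idx(i, j)
            cx = D * (y - 0.5)              # coefficient of u_x
            cy = D * (x - 1/3) * (x - 2/3)  # coefficient of u_y
            A[r, r] = 4.0 / h**2 - 43.0 * np.pi**2
            if i > 1: A[r, idx(i - 1, j)] = -1.0 / h**2 - cx / (2 * h)
            if i < M: A[r, idx(i + 1, j)] = -1.0 / h**2 + cx / (2 * h)
            if j > 1: A[r, idx(i, j - 1)] = -1.0 / h**2 - cy / (2 * h)
            if j < M: A[r, idx(i, j + 1)] = -1.0 / h**2 + cy / (2 * h)
    return A

M = 8                                       # coarse demo grid (the paper uses 128)
h = 1.0 / (M + 1)
A = assemble(M, D=0.5 / h)                  # Dh = 1/2, as in the paper
print(A.shape, np.count_nonzero(A))
```

In practice such a matrix would be stored in a sparse format, since only about five entries per row are nonzero.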

From Fig. 4 and Table 3, we can observe the following: The number of cycles required for successful convergence of our variant is about the same as that of the original IDRstab in the cases of (s, ℓ) = (2, 4) and (8, 2). Hence, the convergence behavior of our variant is about the same as that of the original IDRstab. As before, the computation times for our variant are shorter, although the number of MVs increases compared with that of the original IDRstab. In particular, the computation time for our variant is about 77% of that for the original IDRstab in the case of (s, ℓ) = (2, 4).

In the case of (s, ℓ) = (4, 4), the number of cycles required for successful convergence of our variant is slightly larger than that of the original IDRstab. Nevertheless, the computation time for our variant is shorter than that for the original IDRstab.

Note that our variant leads to more accurate approximate solutions, as in the result of example 1.

5. Concluding remarks

We proposed a variant of IDRstab that saves vector updates. A feature of our variant is that the computational costs for AXPYs are substantially reduced at the price of one additional MV per cycle. Numerical experiments show that the number of cycles required for successful convergence of our variant is about the same as that of the original IDRstab. As a result, the computation time can be reduced for sparse linear systems. Moreover, we observed that our variant of IDRstab leads to more accurate approximate solutions than the original one. In future work, we will analyze why the approximate solutions obtained by our variant are more accurate than those obtained by the original IDRstab.

Acknowledgments

The authors would like to thank Dr. G. L. G. Sleijpen (Utrecht University) for his helpful advice, and the reviewer for his or her constructive comments.

References

[1] P. Sonneveld and M. B. van Gijzen, IDR(s): a family of simple and fast algorithms for solving large nonsymmetric linear systems, SIAM J. Sci. Comput., 31 (2008), 1035–1062.

[2] H. A. van der Vorst, Bi-CGSTAB: A fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992), 631–644.

[3] G. L. G. Sleijpen, P. Sonneveld and M. B. van Gijzen, Bi-CGSTAB as an induced dimension reduction method, Appl. Numer. Math., 60 (2010), 1100–1114.

[4] G. L. G. Sleijpen and M. B. van Gijzen, Exploiting BiCGstab(ℓ) strategies to induce dimension reduction, SIAM J. Sci. Comput., 32 (2010), 2687–2709.

[5] G. L. G. Sleijpen and D. R. Fokkema, BiCGstab(ℓ) for linear equations involving unsymmetric matrices with complex spectrum, Electron. Trans. Numer. Anal., 1 (1993), 11–32.

[6] M. Tanio and M. Sugihara, GBi-CGSTAB(s, L): IDR(s) with higher-order stabilization polynomials, J. Comput. Appl. Math., 235 (2010), 765–784.

[7] W. Joubert, Lanczos methods for the solution of nonsymmetric systems of linear equations, SIAM J. Matrix Anal. Appl., 13 (1992), 926–943.


JSIAM Letters Vol.3 (2011) pp.73–76 © 2011 Japan Society for Industrial and Applied Mathematics

Error analysis of H1 gradient method

for topology optimization problems of continua

Daisuke Murai1 and Hideyuki Azegami1

1 Graduate School of Information Science, Nagoya University, A4-2 (780) Furo-cho, Chikusa-ku, Nagoya 464-8601

E-mail murai@az.cs.is.nagoya-u.ac.jp

Received July 6, 2011, Accepted October 19, 2011

Abstract

The present paper describes the result of the error estimation of a numerical solution to topology optimization problems of domains in which boundary value problems are defined. In the previous paper, we formulated the problem by using density as a design variable, presented a regular solution, and called it the H1 gradient method. The main result in this paper is the proof of first-order convergence, in the H1 norm, of the solution of the H1 gradient method with respect to the size of the finite elements, provided that first-order elements are used for the design and state variables.

Keywords calculus of variations, boundary value problem, topology optimization, H1 gradient method, error analysis

Research Activity Group Mathematical Design

1. Introduction

The problem of finding the optimum layout of holes in a domain in which a boundary value problem is defined is called the topology optimization problem of continua [1]. One method for formulating this topology optimization problem uses density as a design variable; in this case the problem is called the SIMP problem. In the previous paper [2], we formulated the problem and presented a regular solution by using a gradient method in a function space, and called this method the H1 gradient method. The aim of the present paper is to show the error estimation of the H1 gradient method using standard finite element analyses.

2. SIMP problem

Let D ⊂ R^d, d ∈ {2, 3}, be a fixed bounded domain with boundary ∂D, let Γ_D ⊂ ∂D be a fixed subboundary with |Γ_D| > 0, and let Γ_N = ∂D \ Γ_D. Following [2], let ϕ ∈ C^∞(R; [0, 1]) be the density given by a sigmoidal function of the design variable θ ∈ S = {θ ∈ W^{1,∞}(D;R) | ∥θ∥_{1,∞} ≤ M} for a constant M > 0. Let u be the solution to the following problem.

Problem 1 Let f ∈ H^1(D;R), p ∈ H^{3/2}(Γ_N;R) and u_D ∈ H^3(D;R) be given functions, and let α > 1 be a constant. For a given θ ∈ S, find u ∈ H^1(D;R) such that

−∇ · (ϕ^α(θ)∇u) = f in D,

ϕ^α(θ)∂_ν u = p on Γ_N,   u = u_D on Γ_D.

Here, ∂_ν = ν · ∇, where ν is the unit outward normal vector along ∂D. Moreover, we provide cost functions

J^l(θ, u) = ∫_D g^l(θ, u) dx + ∫_{∂D} j^l(θ, u) dγ + c_l   (1)

for l ∈ {0, 1, . . . , m} with constants c_l and given functions g^l and j^l. By using J^l, we define the SIMP problem as follows [2].

Problem 2 Find θ such that

min_{θ∈S} { J^0(θ, u) | J^l(θ, u) ≤ 0, l ∈ {1, . . . , m} }.

3. θ derivative of J^l

The Fréchet derivative of J^l with respect to θ is obtained as

J^{l′}(θ, u, v^l)[ρ] = ∫_D (g^l_θ + G^l_a)ρ dx + ∫_{∂D} j^l_θ ρ dγ = ⟨G^l, ρ⟩   (2)

for all ρ ∈ H^1(D;R) [2]. Here, ⟨·, ·⟩ is the dual product, G^l_a = −αϕ^{α−1}(θ)ϕ_θ ∇u · ∇v^l, and (·)_θ denotes ∂(·)/∂θ. The function v^l is the solution of the following problem.

Problem 3 For the solution u to Problem 1 at θ ∈ S, find v^l ∈ H^1(D;R) such that

−∇ · (ϕ^α(θ)∇v^l) = g^l_u(θ, u) in D,

ϕ^α(θ)∂_ν v^l = j^l_u(θ, u) on Γ_N,   v^l = 0 on Γ_D.

4. Solution to Problem 2

Following [2], we generate θ_i, i ∈ {1, 2, . . . , n}, from θ_0 by the following simplified steps.

(i) Set a small constant ε > 0 as the step size, and set i = 0.

(ii) Compute u_i = u by solving Problem 1 with θ = θ_i.

(iii) Compute v^l_i = v^l by solving Problem 3 with θ = θ_i.

(iv) Compute G^l_i = G^l by (2) using u_i, v^l_i and θ_i.

(v) Compute ρ^l_{G,i} ∈ H^1(D;R) by solving

∫_D (∇ρ^l_{G,i} · ∇y + c ρ^l_{G,i} y) dx = −⟨G^l_i, y⟩   (3)

for a constant c > 0 and all y ∈ H^1(D;R).

(vi) Solve λ = (λ^l_i)_l from Aλ = −b, where A = (a_{jl})_{jl}, a_{jl} = ⟨G^j_i, ρ^l_{G,i}⟩ and b = (J^j + a_{j0})_j. Put λ^0_i = 1 and construct

ρ_i = ρ_{G,i}/∥ρ_{G,i}∥_{1,2},   ρ_{G,i} = Σ_{l=0}^{m} λ^l_i ρ^l_{G,i}.   (4)

(vii) Construct θ_{i+1} = θ_i + ερ_i and return to (ii) with i = i + 1.
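Step (v) is the core of the H1 gradient method: the raw gradient G^l_i is smoothed by solving an elliptic problem. The following is a minimal one-dimensional sketch of this smoothing step (our own simplification, assuming a uniform grid, a finite-difference Laplacian and the natural Neumann boundary conditions of the weak form (3)):

```python
import numpy as np

def h1_gradient(G, c, h):
    """Solve -rho'' + c*rho = -G on a uniform 1D grid with natural
    (Neumann) boundary conditions, i.e. the weak form (3) in 1D."""
    n = len(G)
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 2.0 / h**2 + c
        if i > 0:
            A[i, i - 1] = -1.0 / h**2
        if i < n - 1:
            A[i, i + 1] = -1.0 / h**2
    A[0, 1] = -2.0 / h**2       # ghost-point treatment of the Neumann ends
    A[-1, -2] = -2.0 / h**2
    return np.linalg.solve(A, -G)

n, c = 101, 1.0
h = 1.0 / (n - 1)
G = np.full(n, 3.0)             # a constant "raw" gradient
rho = h1_gradient(G, c, h)
print(np.allclose(rho, -3.0))   # the exact solution is rho = -G/c
```

Unlike the raw gradient, the output ρ lies in H^1 and is therefore an admissible smooth update direction for θ.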

5. Error analysis

We estimate the error of the numerical solution by the finite element method with respect to θ_n obtained by the scheme in Section 4. Let D_h = ∪K be a finite element approximation of D with elements K and h = max_{K∈K} diam(K). For a positive integer k and an even number q ≥ d, we restrict u_i, v^l_i and ρ^l_{G,i} to W^{k+1,q}(D_h;R), and θ_i to D_h. We denote by θ_{h,i} = θ_i + δθ_i the approximation of θ_i, and by ū_i = u_i + δū_i ∈ W^{k+1,q}(D_h;R) and v̄^l_i = v^l_i + δv̄^l_i ∈ W^{k+1,q}(D_h;R) the analytical solutions of Problems 1 and 3 with θ_i replaced by θ_{h,i}. Let u_{h,i} = u_i + δu_i, v^l_{h,i} = v^l_i + δv^l_i, G^l_{h,i} = G^l_i + δG^l_i, ρ^l_{Gh,i} = ρ^l_{G,i} + δρ^l_{G,i}, ρ_{Gh,i} = ρ_{G,i} + δρ_{G,i} and ρ_{h,i} = ρ_i + δρ_i be the approximate functions of u_i, v^l_i, G^l_i, ρ^l_{G,i}, ρ_{G,i} and ρ_i, respectively. Moreover, ρ̄^l_{G,i} = ρ^l_{G,i} + δρ̄^l_{G,i} ∈ W^{k+1,q}(D_h;R) represents the analytical solution of (3) with G^l_i replaced by G^l_{h,i}. Also, let λ_h = (λ^l_{h,i})_l be the solution to A_h λ_h = −b_h with A_h = (a_{h,jl})_{jl}, a_{h,jl} = ⟨G^j_{h,i}, ρ^l_{Gh,i}⟩, b_h = (J^j_h + a_{h,j0})_j and J^j_h = J^j(θ_{h,i}, u_{h,i}). We use

∥u∥_{j,q} = ( Σ_{k=0}^{j} |u|^q_{k,q} )^{1/q},   |u|_{j,q} = [ ∫_{D_h} (∇^j u)^q dx ]^{1/q}

as the W^{j,q} norm ∥·∥_{j,q} and seminorm |·|_{j,q} on D_h for j ∈ {0, 1}, q ∈ {4, 6, . . . , ∞}, with ∇^0 = 1. We set the following hypotheses, which are needed to evaluate the error.

(H1) We take α ≥ 2 in Problems 1, 3 and (2).

(H2) There exist positive constants C_1, C_2, C_3 independent of h such that

∥ū_i − u_{h,i}∥_{j,q} ≤ C_1 h^{k+1−j} |ū_i|_{k+1,q},   (5)

∥v̄^l_i − v^l_{h,i}∥_{j,q} ≤ C_2 h^{k+1−j} |v̄^l_i|_{k+1,q},   (6)

∥ρ̄^l_{G,i} − ρ^l_{Gh,i}∥_{j,q} ≤ C_3 h^{k+1−j} |ρ̄^l_G|_{k+1,q}.   (7)

(H3) For J^l(θ, u), we restrict j^l(θ, u) to a function of u, i.e. j^l(u), with j^l ∈ C^2(W^{1,q}(D;R); L^1(D;R)) and g^l ∈ C^2(Y; L^1(D;R)) for Y = S × W^{1,q}(D;R), such that j^l_u ∈ C^1(W^{1,q}(D;R); W^{1,∞}(D;R)), g^l_θ, g^l_u ∈ C^1(Y; L^∞(D;R)), j^l_{uu} ∈ C^0(W^{1,q}(D;R); W^{1,∞}(D;R)), and g^l_{θθ}, g^l_{θu}, g^l_{uθ}, g^l_{uu} ∈ C^0(Y; L^∞(D;R)).

(H4) There exists C_4 > 0 such that ∥A^{−1}∥_∞ < C_4, where ∥·∥_∞ is the maximum norm on R^m and the corresponding operator norm for m × m matrices.

Then we have the following main theorem.

Theorem 4 (Error of θ_n) Assume (H1)–(H4). Then there exists a constant C > 0 independent of ε and h such that ∥δθ_n∥_{1,q} ≤ Cεn h^k holds for n.

Here εn = T can be considered as the total amount of variation of θ. To prove this theorem, we introduce an induction hypothesis for θ_{h,i}:

∥δθ_i∥_{1,q} ≤ Cεi h^k   (8)

for i ∈ {0, 1, . . . , n − 1}, and the lemmas below.

Lemma 5 (Error of u_i) Assume (H1), (H2) and (8). Then there exists a constant C′_1 > 0 independent of ε and h such that ∥δu_i∥_{1,q} ≤ C′_1(εi + 1)h^k holds.

Proof ū_i and u_i satisfy

∫_{D_h} ϕ^α(θ_i) ∇δū_i · ∇v^l dx = ∫_{D_h} (ϕ^α(θ_i) − ϕ^α(θ_{h,i})) ∇ū_i · ∇v^l dx.   (9)

By taking ∇v^l = (∇δū_i)^{q−1} and m = min_{θ_i∈D_h} ϕ^α(θ_i) in (9), we have

m|δū_i|^q_{1,q} ≤ α ∥δθ_i∥_{0,∞} |ū_i|_{1,q} |δū_i|^{q−1}_{1,q} max_{t∈[0,1]} ∥ϕ^{α−1}(θ_i + tδθ_i) ϕ_θ(θ_i + tδθ_i)∥_{0,∞}.   (10)

By substituting (8) into (10) and dividing (10) by |δū_i|^{q−1}_{1,q}, noticing (H1), we obtain |δū_i|_{1,q} ≤ C′_1 εi h^k. Using the Poincaré inequality, we get

∥δū_i∥_{1,q} ≤ C′_1 εi h^k   (11)

by rewriting C′_1 > 0. By substituting (11) and (5) into ∥δu_i∥_{1,q} ≤ ∥δū_i∥_{1,q} + ∥ū_i − u_{h,i}∥_{1,q}, the proof is complete. (QED)

Lemma 6 (Error of v^l_i) Assume (H1)–(H3) and (8). Then there exists a constant C′_2 > 0 independent of ε and h such that ∥δv^l_i∥_{1,q} ≤ C′_2(εi + 1)h^k holds.

Proof Noticing (H3), v^l_i and v̄^l_i satisfy

∫_{D_h} ϕ^α(θ_i) ∇δv̄^l_i · ∇u′ dx
 = ∫_{D_h} (ϕ^α(θ_i) − ϕ^α(θ_{h,i})) ∇v̄^l_i · ∇u′ dx
 + ∫_{D_h} (g^l_u(θ_{h,i}, u_{h,i}) − g^l_u(θ_i, u_i)) u′ dx
 + ∫_{∂D_h} (j^l_u(u_{h,i}) − j^l_u(u_i)) u′ dγ.   (12)

By taking ∇u′ = (∇δv̄^l_i)^{q−1} and using the Poincaré inequality, we have

∫_{D_h} (g^l_u(θ_{h,i}, u_{h,i}) − g^l_u(θ_i, u_i)) u′ dx
 ≤ ∥δθ_i∥_{0,q} |δv̄^l_i|^{q−1}_{1,q} max_{t∈[0,1]} ∥g^l_{uθ}(θ_i + tδθ_i, u_i)∥_{0,∞}
 + ∥δu_i∥_{0,q} |δv̄^l_i|^{q−1}_{1,q} max_{t∈[0,1]} ∥g^l_{uu}(θ_{h,i}, u_i + tδu_i)∥_{0,∞}   (13)

and

∫_{∂D_h} (j^l_u(u_{h,i}) − j^l_u(u_i)) u′ dγ
 = ∫_{D_h} ∇[(j^l_u(u_{h,i}) − j^l_u(u_i)) u′] dx
 ≤ |δu_i|_{1,q} |δv̄^l_i|^{q−1}_{1,q} max_{t∈[0,1]} |j^l_{uu}(u_i + tδu_i)|_{1,∞}
 + ∥δu_i∥_{0,q} |δv̄^l_i|^{q−1}_{1,q} max_{t∈[0,1]} ∥j^l_{uu}(u_i + tδu_i)∥_{0,∞}.   (14)

By the same argument as in the proof of Lemma 5, substituting (13) and (14) into (12), we have

m∥δv̄^l_i∥_{1,q} ≤ C″_1 m |δv̄^l_i|_{1,q}
 ≤ C″_1 α ∥δθ_i∥_{0,∞} ∥v̄^l_i∥_{1,q} max_{t∈[0,1]} ∥ϕ^{α−1}(θ_i + tδθ_i) ϕ_θ(θ_i + tδθ_i)∥_{0,∞}
 + C″_1 ∥δθ_i∥_{0,q} max_{t∈[0,1]} ∥g^l_{uθ}(θ_i + tδθ_i, u_i)∥_{0,∞}
 + C″_1 ∥δu_i∥_{0,q} max_{t∈[0,1]} ∥g^l_{uu}(θ_{h,i}, u_i + tδu_i)∥_{0,∞}
 + C″_1 |δu_i|_{1,q} max_{t∈[0,1]} |j^l_{uu}(u_i + tδu_i)|_{1,∞}
 + C″_1 ∥δu_i∥_{0,q} max_{t∈[0,1]} ∥j^l_{uu}(u_i + tδu_i)∥_{0,∞}   (15)

for some constant C″_1 > 0. From (H3), substituting (8) and (11) into (15), and substituting (15) and (6) into ∥δv^l_i∥_{1,q} ≤ ∥δv̄^l_i∥_{1,q} + ∥v̄^l_i − v^l_{h,i}∥_{1,q}, the proof is complete. (QED)

Lemma 7 (Error of G^l_i) Assume (H1)–(H3) and (8). Then there exists a constant C′_3 > 0 independent of ε and h such that ∥δG^l_i∥_{0,q} ≤ C′_3(εi + 1)h^k holds.

Proof By (H3), G^l_i and G^l_{h,i} satisfy

δG^l_i = g^l_θ(θ_i, u_i) − g^l_θ(θ_{h,i}, u_{h,i})
 + αϕ^{α−1}(θ_{h,i})ϕ_θ(θ_{h,i})∇u_{h,i} · ∇v^l_{h,i}
 − αϕ^{α−1}(θ_i)ϕ_θ(θ_i)∇u_i · ∇v^l_i.   (16)

We estimate the bound on the first and second terms on the right-hand side of (16) as

∥g^l_θ(θ_i, u_i) − g^l_θ(θ_{h,i}, u_{h,i})∥_{0,q}
 ≤ ∥δu_i∥_{0,∞} max_{t∈[0,1]} ∥g^l_{θu}(θ_{h,i}, u_i + tδu_i)∥_{0,∞}
 + ∥δθ_i∥_{0,∞} max_{t∈[0,1]} ∥g^l_{θθ}(θ_i + tδθ_i, u_i)∥_{0,∞}.   (17)

By using the triangle inequality, we can estimate the remaining terms as

α∥ϕ^{α−1}(θ_{h,i}) − ϕ^{α−1}(θ_i)∥_{0,∞} ∥ϕ_θ(θ_i)∇u_i · ∇v^l_i∥_{0,q}
 + α∥ϕ_θ(θ_{h,i}) − ϕ_θ(θ_i)∥_{0,∞} ∥ϕ^{α−1}(θ_{h,i})∇u_i · ∇v^l_i∥_{0,q}
 + α∥ϕ^{α−1}(θ_{h,i})ϕ_θ(θ_{h,i})∇v^l_i∥_{0,∞} |u_{h,i} − u_i|_{1,q}
 + α∥ϕ^{α−1}(θ_{h,i})ϕ_θ(θ_{h,i})∇u_{h,i}∥_{0,∞} |v^l_{h,i} − v^l_i|_{1,q}   (18)

and

∥ϕ^{α−1}(θ_{h,i}) − ϕ^{α−1}(θ_i)∥_{0,∞} ≤ (α − 1)∥δθ_i∥_{0,∞} max_{t∈[0,1]} ∥ϕ^{α−2}(θ_i + tδθ_i)ϕ_θ(θ_i + tδθ_i)∥_{0,∞},

∥ϕ_θ(θ_{h,i}) − ϕ_θ(θ_i)∥_{0,∞} ≤ ∥δθ_i∥_{0,∞} max_{t∈[0,1]} ∥ϕ_{θθ}(θ_i + tδθ_i)∥_{0,∞}.

We obtain the result of this lemma by substituting (17) and (18) into (16), and using Lemmas 5, 6, (8), (H1) and (H3). (QED)

Lemma 8 (Error of ρ^l_{G,i}) Assume (H1)–(H3) and (8). Then there exists a constant C′_4 > 0 independent of ε and h such that ∥δρ^l_{G,i}∥_{1,q} ≤ C′_4(εi + 1)h^k holds.

Proof ρ^l_{G,i} and ρ̄^l_{G,i} satisfy

∫_{D_h} (∆δρ̄^l_{G,i} − c δρ̄^l_{G,i}) y dx = ⟨δG^l_i, y⟩.

Taking y = (q − 1)δρ̄^l_{G,i}(∇δρ̄^l_{G,i})^{q−2} and considering ∇δρ̄^l_{G,i} = 0 on ∂D_h, we have

|δρ̄^l_{G,i}|^q_{1,q} + c(q − 1)∫_{D_h} (δρ̄^l_{G,i})²(∇δρ̄^l_{G,i})^{q−2} dx ≤ (q − 1)∥δG^l_i∥_{0,q} |δρ̄^l_{G,i}|^{q−2}_{1,q} |δρ̄^l_{G,i}|_{0,q}.   (19)

Now we divide (19) by |δρ̄^l_{G,i}|^{q−2}_{1,q}. Then, since q (> d) is an even number, the Poincaré inequality yields ∥δρ̄^l_{G,i}∥_{1,q} ≤ C″_4(q − 1)∥δG^l_i∥_{0,q} for some constant C″_4 > 0. By substituting (7) into

∥δρ^l_{G,i}∥_{1,q} ≤ ∥δρ̄^l_{G,i}∥_{1,q} + ∥ρ̄^l_{G,i} − ρ^l_{Gh,i}∥_{1,q},

and using Lemma 7, the proof is complete. (QED)

Lemma 9 (Error of λ^l_i) Assume (H1)–(H4) and (8). Then there exists a constant C′_5 > 0 independent of ε and h such that |λ^l_i − λ^l_{h,i}| ≤ C′_5(εi + 1)h^k holds.

Proof λ and λ_h satisfy

A(λ − λ_h) = b_h − b − (A − A_h)λ_h.

By (H4) and multiplying by A^{−1}, we get

∥λ − λ_h∥_∞ ≤ ∥A^{−1}∥_∞ (∥b − b_h∥_∞ + ∥A − A_h∥_∞ ∥λ_h∥_∞)
 ≤ ∥A^{−1}∥_∞ (1 + m∥λ_h∥_∞) max_{j∈{1,...,m}, l∈{0,...,m}} |a_{jl} − a_{h,jl}|
 + ∥A^{−1}∥_∞ max_{j∈{1,...,m}} |J^j(θ_i, u_i) − J^j(θ_{h,i}, u_{h,i})|.

Here,

|a_{jl} − a_{h,jl}| ≤ |⟨δG^j_i, ρ^l_{G,i}⟩| + |⟨G^j_{h,i}, δρ^l_{G,i}⟩|
 ≤ ∥δG^j_i∥_{0,2}∥ρ^l_{G,i}∥_{0,2} + ∥G^j_{h,i}∥_{0,2}∥δρ^l_{G,i}∥_{0,2}


Fig. 1. Setting for Problem 1 and converged ϕ: (a) f in Problem 1; (b) ϕ(θ_{1/20,100}).

Table 1. Results of −log2 ∥δθ_n∥_{1,2} with T = εn = 10 for Problem 1.

  n      h=1/5    h=1/10   h=1/20   h=1/40   h=1/80
  50     0.9012   1.8513   2.8614   3.8983   5.0598
  incr.           0.9501   1.0101   1.0369   1.1615
 100     0.9201   1.8655   2.8397   3.8761   5.0407
  incr.           0.9454   0.9742   1.0364   1.1646
 200     1.4861   2.4064   3.4106   4.4414   5.6005
  incr.           0.9203   1.0042   1.0308   1.1591
 400     1.0518   2.0617   3.0481   4.0759   5.2331
  incr.           1.0099   0.9864   1.0278   1.1572
 800     0.7343   1.6836   2.7203   3.7521   4.9121
  incr.           0.9493   1.0367   1.0318   1.1600

and

|J^j − J^j_h| ≤ ∥δθ_i∥_{0,∞} max_{t∈[0,1]} ∥g^j_θ(θ_i + tδθ_i, u_i)∥_{0,∞}
 + ∥δu_i∥_{0,∞} max_{t∈[0,1]} ∥g^j_u(θ_{h,i}, u_i + tδu_i)∥_{0,∞}
 + |δu_i|_{1,∞} max_{t∈[0,1]} |j^j_{uu}(u_i + tδu_i)|_{1,∞}
 + ∥δu_i∥_{0,∞} max_{t∈[0,1]} ∥j^j_{uu}(u_i + tδu_i)∥_{0,∞}.

From (8) and Lemmas 5, 6, 7 and 8, the lemma is proven. (QED)

(QED)

Lemma 10 (Error of ρ_i) Assume (H1)–(H4) and (8). Then there exists a constant C′_6 > 0 independent of ε and h such that ∥δρ_i∥_{1,q} ≤ C′_6(εi + 1)h^k holds.

Proof By (4), ρ_i and ρ_{h,i} satisfy ∥δρ_i∥_{1,q} ≤ 2∥δρ_{G,i}∥_{1,q}/∥ρ_{G,i}∥_{1,2} and

∥δρ_{G,i}∥_{1,q} ≤ (m + 1) max_{l∈{0,...,m}} |λ^l_{h,i}| max_{l∈{0,...,m}} ∥δρ^l_{G,i}∥_{1,q}
 + m max_{l∈{1,...,m}} |λ^l_i − λ^l_{h,i}| max_{l∈{1,...,m}} ∥ρ^l_{G,i}∥_{1,q}.

By using Lemmas 8 and 9, the lemma is proven. (QED)

Proof of Theorem 4 If n = 0, Theorem 4 holds by θ_0 = θ_{h,0}.

If n > 0, for i ∈ {0, . . . , n − 1} we have ∥δθ_{i+1}∥_{1,q} ≤ ε∥δρ_i∥_{1,q} + ∥δθ_i∥_{1,q} with ∥δθ_0∥_{1,q} = 0. By applying Lemma 10 and (8) to this inequality, we have ∥δθ_{i+1}∥_{1,q} ≤ max{C′_6, C}ε(i + 1)h^k + C′_6 ε² i h^k. Since ε is a small constant, taking C = max{C′_6, C} and n = i + 1, we obtain Theorem 4. (QED)

Fig. 2. Setting for the linear elastic problem and converged ϕ: (a) boundary condition (Γ_D and traction p); (b) ϕ(θ_{1/20,800}).

Table 2. Results of −log2 ∥δθ_n∥_{1,2} with T = εn = 80 for a linear elastic problem.

  n      h=1/5     h=1/10    h=1/20    h=1/40    h=1/80
  400    −5.7374   −4.9172   −4.0073   −2.9443   −1.6998
  incr.            0.8202    0.9099    1.0630    1.2445
  800    −5.7580   −4.9492   −4.0596   −3.0060   −1.7628
  incr.            0.8088    0.8896    1.0536    1.2432
 1600    −5.7598   −4.9506   −4.0618   −3.0086   −1.7656
  incr.            0.8092    0.8888    1.0532    1.2430
 3200    −5.7607   −4.9514   −4.0629   −3.0099   −1.7669
  incr.            0.8093    0.8885    1.0530    1.2430
 6400    −5.7611   −4.9517   −4.0635   −3.0106   −1.7676
  incr.            0.8094    0.8882    1.0529    1.2430

6. Numerical examples

For Problem 1, we use the setting D = [0, 1]², Γ_D = ∂D, f = 2[x_1² + x_2² − (x_1 + x_2)], u_D = 0, ϕ(θ) = (tanh θ + 1)/2 and α = 2. The cost functions are assumed as J^0(θ, u) = ∫_D fu dx and J^1(θ) = ∫_D ϕ(θ) dx − c_1, where c_1 is taken such that J^1(θ_0) = 0 for θ_0 = 0. We take c = 1 in (3). D is approximated as D_h using triangular elements. We take k = 1 in (H2). Fig. 1 shows f and the converged ϕ obtained by the present method. Table 1 shows the results of −log2 ∥δθ_n∥_{1,2} with T = nε = 10.

Another example is a SIMP problem for a linear elastic continuum. Let D = [0, 3] × [0, 2], let p ∈ H^{3/2}(Γ_N;R²) be a traction force, u_D = 0 ∈ H²(Γ_D;R²), and let u ∈ H¹(D;R²) be the displacement given as the solution of the linear elastic problem for p. The mean compliance J^0(θ, u) = ∫_{Γ_N} p · u dγ and the mass J^1(θ) = ∫_D ϕ(θ) dx − c_1 are used as cost functions. We have G^0_a = −αϕ^{α−1}ϕ_θ σ(u) · ε(u) for J^0, where σ(u) and ε(u) denote the stress and the strain, respectively. The approximation of D and the values of c_1, α, c and k are the same as above. Fig. 2 shows the problem setting and the resulting ϕ obtained by the present method. Table 2 shows the results of −log2 ∥δθ_n∥_{1,2} with T = nε = 80.

From Tables 1 and 2, we can observe that ∥δθ_n∥_{1,2} achieves the first-order convergence in the H1 norm with respect to h expected by Theorem 4 with k = 1. Also, these tables show that ∥δθ_n∥_{1,2} is almost independent of T = εn.

Acknowledgments

We would like to thank Prof. Norikazu Saito and the reviewer for their valuable comments on the proof. The present study was supported by JSPS KAKENHI (20540113).

References

[1] M. P. Bendsøe and O. Sigmund, Topology Optimization: Theory, Methods and Applications, Springer, 2003.

[2] H. Azegami, S. Kaizu and K. Takeuchi, Regular solution to topology optimization problems of continua, JSIAM Letters, 3 (2011), 1–4.


JSIAM Letters Vol.3 (2011) pp.77–80 © 2011 Japan Society for Industrial and Applied Mathematics

Evolution of bivariate copulas in discrete processes

Yasukazu Yoshizawa1 and Naoyuki Ishimura1

1 Graduate School of Economics, Hitotsubashi University, 2-1 Naka, Kunitachi, Tokyo 186-8601, Japan

E-mail ed091006@g.hit-u.ac.jp, ishimura@econ.hit-u.ac.jp

Received October 13, 2011, Accepted October 19, 2011

Abstract

A copula function makes a bridge between multivariate joint distributions and univariate marginal distributions, and provides a flexible way of describing nonlinear dependence among random events. We introduce a new family of bivariate copulas which evolves according to a discrete process for the heat equation. We prove the convergence of the solutions as well as of the measures of dependence. Numerical experiments are also performed, which show that our procedure works substantially well.

Keywords copula, discrete processes, risk management

Research Activity Group Mathematical Finance

1. Introduction

There has been much interest in the theory of copulas these days. A copula technique provides a flexible and convenient method of describing nonlinear dependence among multivariate random events. Copulas make a link between a multivariate joint distribution and univariate marginal distributions. The technique is employed not only in statistics but also in many areas of application, which include financial engineering, risk management, actuarial science, seismology and so on. We refer to [1–10] and the references therein.

In the case of a bivariate joint distribution, the definition of a copula and the fundamental theorem developed by A. Sklar [11] are expressed as follows.

Definition 1 A function C defined on I² := [0, 1] × [0, 1] and valued in I is called a copula if the following conditions are fulfilled.

(i) For every (u, v) ∈ I2,

C(u, 0) = C(0, v) = 0,

C(u, 1) = u and C(1, v) = v. (1)

(ii) For every (u_i, v_i) ∈ I² (i = 1, 2) with u_1 ≤ u_2 and v_1 ≤ v_2,

C(u_1, v_1) − C(u_1, v_2) − C(u_2, v_1) + C(u_2, v_2) ≥ 0.   (2)

The requirement (2) is referred to as the 2-increasing condition. We also note that a copula is continuous by its definition.
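As a hedged numerical illustration of Definition 1, the following sketch checks conditions (1) and (2) on a grid for the Clayton copula that reappears in Section 4 (the parameter θ = 5 and the grid size are arbitrary choices of ours):

```python
import numpy as np

def clayton(u, v, t=5.0):
    """Clayton copula C(u, v) = (u^-t + v^-t - 1)^(-1/t); the boundary
    cases u = 0 or v = 0 return 0, matching condition (1)."""
    if u == 0.0 or v == 0.0:
        return 0.0
    return (u**-t + v**-t - 1.0) ** (-1.0 / t)

N = 50
grid = np.linspace(0.0, 1.0, N + 1)
C = np.array([[clayton(u, v) for v in grid] for u in grid])

# condition (1): boundary behavior
assert np.allclose(C[0, :], 0.0) and np.allclose(C[:, 0], 0.0)
assert np.allclose(C[:, N], grid) and np.allclose(C[N, :], grid)

# condition (2): 2-increasing, checked as the C-volume of every grid cell
vol = C[1:, 1:] - C[1:, :-1] - C[:-1, 1:] + C[:-1, :-1]
print(vol.min())   # nonnegative for a copula
```

The same check applies to any candidate copula evaluated on a grid, which is the form used throughout the discrete processes below.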

Theorem 2 (Sklar’s theorem) Let H be a bivariate joint distribution function with marginal distribution functions F and G; that is,

lim_{x→∞} H(x, y) = G(y),   lim_{y→∞} H(x, y) = F(x).

Then there exists a copula, which is uniquely determined

on RanF × RanG, such that

H(x, y) = C(F (x), G(y)). (3)

Conversely, if C is a copula and F and G are distribution functions, then the function H defined by (3) is a bivariate joint distribution function with margins F and G.

In this article, we introduce a new family of bivariate copulas which evolves according to a discrete process.

Although there exist many one-parameter families of copulas, such as the Clayton family, the Gumbel–Hougaard family and the Frank family, little attention seems to have been paid to time-dependent copulas despite their importance. We just recall one important exception, the concept of dynamic copula due to A. J. Patton [12].

On the other hand, we have introduced the time evolution of copulas in [13–15]. To be precise, we consider a time-parameterized family of copulas {C(u, v, t)}_{t≥0} which satisfies the heat equation:

lution of copulas in [13–15]. To be precise, we considera time parameterized family of copulas C(u, v, t)t≥0,which satisfy the heat equation:

∂C

∂t(u, v, t) =

(∂2

∂u2+

∂2

∂v2

)C(u, v, t). (4)

Here, by the definition of copula, we understand that C(·, ·, t) fulfills (1) and (2); to be precise, we postulate that

(i) for every (u, v, t) ∈ I2 × (0,∞),

C(u, 0, t) = C(0, v, t) = 0,

C(u, 1, t) = u and C(1, v, t) = v. (5)

(ii) for every (ui, vi, t) ∈ I2 × (0,∞) (i = 1, 2) withu1 ≤ u2 and v1 ≤ v2,

C(u1, v1, t)− C(u1, v2, t)

− C(u2, v1, t) + C(u2, v2, t) ≥ 0. (6)

The stationary solution to (4), which is referred to as the harmonic copula, is uniquely determined to be


Π(u, v) := uv, in view of the boundary condition (1). We note that the copula Π represents the independence structure between two respective random variables.

Here we discretize (4), in a sense, and define a time-dependent family of copulas in discrete processes. We hope that these discretized families are readily computed numerically and applied in many situations. We exhibit some examples in Section 4.

2. Discrete processes of copulas

The construction of our discretely parametrized family of copulas proceeds as follows. Let N ≫ 1 and 0 < h ≪ 1. We put

∆u = ∆v := 1/N,   ∆t := h,   λ := ∆t/(∆u)² = ∆t/(∆v)² = hN²,

and

u_i := i∆u = i/N for i = 0, 1, . . . , N,   v_j := j∆v = j/N for j = 0, 1, . . . , N.

Our family of copulas {C^n(u, v)}_{n=0,1,2,...} is now defined as follows. First,

C^0(u, v) := C_0(u, v),

where C_0 denotes the given initial copula. At (u_i, v_j)_{i,j=0,1,...,N}, the value C^n_{i,j} := C^n(u_i, v_j) is governed by the system of difference equations

(C^{n+1}_{i,j} − C^n_{i,j})/∆t = (C^n_{i+1,j} − 2C^n_{i,j} + C^n_{i−1,j})/(∆u)² + (C^n_{i,j+1} − 2C^n_{i,j} + C^n_{i,j−1})/(∆v)²

for i, j = 1, 2, . . . , N − 1,   (7)

together with the boundary conditions

C^n_{0,j} = 0 = C^n_{i,0},   C^n_{i,N} = u_i,   C^n_{N,j} = v_j   for i, j = 0, 1, . . . , N.   (8)

At a point (u, v) ∈ I² other than the nodes (u_i, v_j)_{i,j=0,1,...,N}, the value C^n(u, v) is provided by linear interpolation. That is, if for instance

u_i ≤ u ≤ u_{i+1},   v_j ≤ v ≤ v_{j+1},   v − v_j ≤ u − u_i,

then

C^n(u, v) := C^n_{i,j} + [(C^n_{i+1,j} − C^n_{i,j})/(u_{i+1} − u_i)](u − u_i) + [(C^n_{i+1,j+1} − C^n_{i+1,j})/(v_{j+1} − v_j)](v − v_j).

Other parts are computed similarly. It is easy to check that the sequence of copulas {C^n(u, v)}_{n=0,1,2,...} defined above satisfies the boundary conditions (1) as well as the 2-increasing condition (2) provided λ ≤ 1/4. We also note that in this range of λ the difference scheme (7) is stable.

Next, we define D^n(u, v) := C^n(u, v) − uv; the extension to all of I² is made similarly to the above. It then follows that {D^n_{i,j} := C^n_{i,j} − u_i v_j}_{n=0,1,2,...} satisfies the system of difference equations (7) with null boundary conditions. Consequently we see that

max_{(u,v)∈I²} |D^n(u, v)| ≤ Kθ^n,   (9)

for some constants K, θ with 0 < θ < 1, provided λ < 1/4. In particular, D^n → 0 as n → ∞ uniformly on I².

To summarize, we have established the next theorem.

Theorem 3 For any initial copula C_0, there exists a sequence of copulas {C^n(u, v)}_{n=0,1,2,...} which satisfies the system of difference equations (7) at every (u_i, v_j)_{i,j=0,1,...,N}. As n → ∞, it follows that

C^n(u, v) → uv uniformly on I².
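The scheme (7)-(8) behind Theorem 3 can be sketched as follows; the initial copula M(u, v) = min(u, v), the grid size N = 20, λ = 0.2 and the number of steps are illustrative choices of ours:

```python
import numpy as np

N = 20
lam = 0.2                       # lambda = dt/(du)^2; stability needs lam <= 1/4
u = np.linspace(0.0, 1.0, N + 1)
C = np.minimum.outer(u, u)      # initial copula M(u, v) = min(u, v)
Pi = np.outer(u, u)             # independence copula uv

for n in range(400):
    lap = (C[2:, 1:-1] - 2 * C[1:-1, 1:-1] + C[:-2, 1:-1]
           + C[1:-1, 2:] - 2 * C[1:-1, 1:-1] + C[1:-1, :-2])
    C[1:-1, 1:-1] += lam * lap  # boundary values (8) stay fixed
print(np.abs(C - Pi).max())     # decays like K * theta^n by (9)
```

Starting from the comonotone copula M, the grid values are observed to approach uv, in accordance with Theorem 3.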

3. Measure of dependence

It is an important subject for research to quantitatively estimate the dependence relation between random variables. For this purpose, several measures of association have already been introduced. We recall, as widely known examples, the population versions of Kendall's tau and Spearman's rho, which will be denoted by τ and ρ, respectively.

The formulas for τ and ρ in terms of the copula function are well known. There are also formulas with respect to the empirical copulas (see Section 5.6 of [6]), which can be utilized for our discretized family of copulas. For the completeness of our exposition, we here reproduce them.

τ = (2N/(N − 1)) Σ_{i,j=2}^{N} (C^n_{i,j} C^n_{i−1,j−1} − C^n_{i,j−1} C^n_{i−1,j}),

ρ = (12/(N² − 1)) Σ_{i,j=1}^{N} (C^n_{i,j} − u_i v_j).

Thanks to these formulas, the convergence as n → ∞ is deduced directly, which reads as follows.

Theorem 4 For any initial copula C_0, the sequence of copulas {C^n(u, v)}_{n=0,1,2,...} constructed in Theorem 3 fulfills

|τ|, |ρ| → 0 exponentially as n → ∞.

Sketch of Proof In view of (9) and the uniform bound max_{(u,v)∈I²} |C(u, v)| ≤ 1, we assert that

max_{i,j=0,1,...,N} |C^n_{i,j} − u_i v_j| → 0

exponentially as n → ∞. Taking into account that

u_i v_j u_{i−1} v_{j−1} − u_i v_{j−1} u_{i−1} v_j = 0,

we immediately obtain the desired result. (QED)
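As a hedged sketch, the grid formulas for τ and ρ can be evaluated directly; for the independence copula C^n_{i,j} = u_i v_j every summand of τ vanishes and ρ is zero. The sum below runs over all adjacent grid indices, and N = 40 is an arbitrary choice:

```python
import numpy as np

N = 40
u = np.linspace(0.0, 1.0, N + 1)
C = np.outer(u, u)              # independence copula on the grid: C_ij = u_i v_j

tau = (2.0 * N / (N - 1)) * np.sum(
    C[1:, 1:] * C[:-1, :-1] - C[1:, :-1] * C[:-1, 1:])
rho = (12.0 / (N**2 - 1)) * np.sum(C[1:, 1:] - np.outer(u[1:], u[1:]))
print(tau, rho)                 # both vanish for the independence copula
```

Applied to the evolving grids C^n of Theorem 3, the same two lines track how |τ| and |ρ| decay with n.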

4. Numerical experiments

The time evolution of copulas in discrete processes has a strong affinity to its numerical solution. Thus we can construct the copulas in accordance with (7) in Theorem 3. As an example, we calculate the Clayton copula C(u, v) = (u^{−θ} + v^{−θ} − 1)^{−1/θ} (0 < θ < ∞) and its time


Fig. 1. Time evolution of the Clayton copula with θ = 5 (left column) and the densities of its time evolution (right column), shown at t = 0 (Clayton copula), t = 3/50 and t = ∞ (harmonic copula).

evolution in the left side of Fig. 1. As time evolves, the copulas are smoothed and converge to the harmonic (independence) copula Π.

We are also able to compute the densities of the above copulas by the following formula.

The density = (C^n_{i,j} − C^n_{i,j+1} − C^n_{i+1,j} + C^n_{i+1,j+1}) / ((∆u)(∆v)).   (10)

As an example, we calculate the densities of the Clayton copula and its time evolution, which are depicted in the right side of Fig. 1. Along the time evolution, the densities of the copulas are seen to be smoothed and converge to a flat surface of density one, which is the density of the independence copula Π.
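Formula (10) can be checked directly on a grid: for the independence copula the discrete density is identically one. A minimal sketch (N = 25 is an arbitrary choice):

```python
import numpy as np

N = 25
du = dv = 1.0 / N
u = np.linspace(0.0, 1.0, N + 1)
C = np.outer(u, u)              # independence copula on the grid

density = (C[:-1, :-1] - C[:-1, 1:] - C[1:, :-1] + C[1:, 1:]) / (du * dv)
print(density.min(), density.max())   # identically one for C = uv
```

Replacing C by an evolving Clayton grid reproduces the flattening of the density surfaces seen in the right side of Fig. 1.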

5. Discussions

As stated in Section 1, copulas are employed in quantitative risk management (QRM) for financial engineering, actuarial science, seismology and so on. There are many risk elements, and their dependencies are critical for risk management, especially for measuring aggregated risks. Recently, copulas have been recognized as a sophisticated method for expressing dependencies between risks quantitatively. QRM is of use for various purposes. For example, static dependencies may be enough for solvency regulation purposes, but dynamic dependencies must play important roles in QRM for catastrophic events, such as earthquakes, typhoons and financial crises. We have constructed the time evolution of copulas for the purpose of analyzing dynamic dependencies, which are concordant with Brownian motion. Unfortunately, they have a smoothing nature, and our family of time-evolving copulas converges to the harmonic copula, which represents the independence relation.

Probably, in many aspects of applications, the time evolution toward an intended dependence will be more relevant. Thus we propose a backward type of evolution of copulas, which is obtained by reversing the direction of (7):

C^T(u, v) := C_T(u, v),

where C_T denotes the given maturity copula, and the system of difference equations is

(C^{T−(n+1)h}_{i,j} − C^{T−nh}_{i,j})/∆t
 = (C^{T−nh}_{i+1,j} − 2C^{T−nh}_{i,j} + C^{T−nh}_{i−1,j})/(∆u)²
 + (C^{T−nh}_{i,j+1} − 2C^{T−nh}_{i,j} + C^{T−nh}_{i,j−1})/(∆v)²

for i, j = 1, 2, . . . , N − 1, n = 0, 1, . . . , [T/h],   (11)

together with the boundary conditions (8). We solve (8) and (11) backward from the maturity to the present.

Acknowledgments

The authors are grateful to the referee for helpful comments. The second author (N. I.) is supported in part by a grant from the Japan Society for the Promotion of Science (No. 21540117), as well as by a research grant (2011) from the Tokio Marine Kagami Memorial Foundation.

References

[1] E. W. Frees and E. A. Valdez, Understanding relationships using copulas, N. Amer. Actuarial J., 2 (1998), 1–25.
[2] K. Goda, Statistical modeling of joint probability distribution using copula: Application to peak and permanent displacement seismic demands, Struct. Safety, 32 (2010), 112–123.
[3] K. Goda and G. M. Atkinson, Interperiod dependence of ground-motion prediction equations: A copula perspective, Bull. Seism. Soc. Amer., 99 (2009), 922–927.
[4] R. Lebrun and A. Dutfoy, A generalization of the Nataf transformation to distributions with elliptical copula, Probab. Eng. Mech., 24 (2009), 172–178.
[5] A. J. McNeil, R. Frey and P. Embrechts, Quantitative Risk Management, Princeton Univ. Press, Princeton, 2005.
[6] R. B. Nelsen, An Introduction to Copulas, 2nd edition, Springer Series in Statistics, Springer, New York, 2006.
[7] H. Tsukahara, Copulas and their applications (in Japanese), Jpn J. Appl. Statist., 32 (2003), 77–88.
[8] Y. Yoshizawa, Modeling for the enterprise risk management (in Japanese), Sonpo-Soken Report, 90 (2009), 1–49.
[9] Y. Yoshizawa, Risk management of extremal events (in Japanese), Sonpo-Soken Report, 92 (2010), 35–90.
[10] R. W. J. van den Goorbergh, C. Genest and B. J. M. Werker, Bivariate option pricing using dynamic copula models, Insurance: Math. Econ., 37 (2005), 101–114.
[11] A. Sklar, Random variables, joint distribution functions, and copulas, Kybernetika, 9 (1973), 449–460.

JSIAM Letters Vol. 3 (2011) pp.77–80 Yasukazu Yoshizawa et al.

[12] A. J. Patton, Modelling asymmetric exchange rate dependence, Int. Econ. Rev., 47 (2006), 527–556.
[13] N. Ishimura and Y. Yoshizawa, On time-dependent bivariate copulas, Theor. Appl. Mech. Jpn, 59 (2011), 303–307.
[14] N. Ishimura and Y. Yoshizawa, A note on the time evolution of bivariate copulas, in: Proc. of FAM2011, Sofia Univ., to appear.
[15] Y. Yoshizawa and N. Ishimura, Time evolution copulas and rank correlation (in Japanese), in: Proc. of JCOSSAR 2011.



JSIAM Letters Vol.3 (2011) pp.81–84 ©2011 Japan Society for Industrial and Applied Mathematics

On boundedness of the condition number of the coefficient matrices appearing in Sinc-Nystrom methods for Fredholm integral equations of the second kind

Tomoaki Okayama1, Takayasu Matsuo2 and Masaaki Sugihara2

1 Graduate School of Economics, Hitotsubashi University, 2-1, Naka, Kunitachi, Tokyo 186-8601, Japan

2 Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1, Hongo, Bunkyo, Tokyo 113-8656, Japan

E-mail tokayama@econ.hit-u.ac.jp

Received September 27, 2011, Accepted November 8, 2011

Abstract

Sinc-Nystrom methods for Fredholm integral equations of the second kind have been independently proposed by Muhammad et al. and Rashidinia-Zarebnia. They also gave error analyses, but the results did not claim the convergence of their schemes in a precise sense. This is because in their error estimates there remained an unestimated term: the norm of the inverse of the coefficient matrix of the resulting linear system. In this paper, we estimate the term theoretically to complete the convergence estimate of their methods. Furthermore, we also prove the boundedness of the condition number of each coefficient matrix.

Keywords Sinc method, Fredholm integral equation, condition number, Nystrom method

Research Activity Group Scientific Computation and Numerical Analysis

1. Introduction

We are concerned with Fredholm integral equations of the second kind of the form

$$\lambda u(t) - \int_a^b k(t,s)\,u(s)\,\mathrm{d}s = g(t), \quad a \le t \le b, \qquad (1)$$

where λ is a given constant, g(t) and k(t, s) are given continuous functions, and u(t) is the solution to be determined. Various numerical methods have been proposed to solve (1), and the convergence rate of most existing methods is polynomial with respect to the number of discretization points N [1].

One of the exceptions is the Sinc-Nystrom method, which was first developed by Muhammad et al. [2]. According to their error analysis, the method can converge exponentially if the coefficient matrix of the resulting linear equations, say A_N, does not behave badly. To be more precise, the error of the numerical solution u_N(t) has been estimated as

$$\max_{t\in[a,b]} |u(t) - u_N(t)| \le C\,\|A_N^{-1}\|_2 \exp\!\left(\frac{-cN}{\log N}\right), \qquad (2)$$

where C and c are positive constants independent of N. In their numerical experiments the term $\|A_N^{-1}\|_2$ remained low for all N, which suggested that the method can converge exponentially. Afterwards Rashidinia-Zarebnia [3] proposed another type of Sinc-Nystrom method, and claimed that the error can be estimated as

$$\max_{t\in[a,b]} |u(t) - \tilde{u}_N(t)| \le \tilde{C}\,\|\tilde{A}_N^{-1}\|_2 \exp(-\tilde{c}\sqrt{N}), \qquad (3)$$

which also suggested the exponential convergence of their method. Strictly speaking, however, the exponential convergence of those two methods has not been established at this point, since the dependence of the terms $\|A_N^{-1}\|_2$ and $\|\tilde{A}_N^{-1}\|_2$ on N has not been clarified.

It seems that direct estimates of them are difficult, and that is the reason why they have remained open. In this paper, we take a different approach: we give estimates in the ∞-norm as

$$\|A_N^{-1}\|_\infty \le K, \qquad \|\tilde{A}_N^{-1}\|_\infty \le \tilde{K},$$

for some constants K and K̃. Through $\|X\|_2 \le \sqrt{n}\,\|X\|_\infty$ for any n × n matrix X, these estimates imply the desired exponential convergence estimates. The key here is the analysis of Sinc-collocation methods previously given by the present authors [4].

The above approach has another virtue in that we can show a stronger result; we also show

$$\|A_N\|_\infty \le K', \qquad \|\tilde{A}_N\|_\infty \le \tilde{K}',$$

from which the condition numbers of the matrices are bounded (in the sense of the ∞-norm). This result guarantees not only that the two methods converge exponentially, but also that the resulting linear equations do not become ill-conditioned as N increases.

This paper is organized as follows. In Section 2, we explain the concrete procedure of the Sinc-Nystrom methods. New theoretical results are described in Section 3 with their proofs. In Section 4 a numerical example is shown. Section 5 is devoted to conclusions.


JSIAM Letters Vol. 3 (2011) pp.81–84 Tomoaki Okayama et al.

2. Sinc-Nystrom methods

2.1 Sinc quadrature

In the Sinc-Nystrom methods, the Sinc quadrature

$$\int_{-\infty}^{\infty} F(x)\,\mathrm{d}x \approx h \sum_{j=-N}^{N} F(jh) \qquad (4)$$

is employed to approximate the integral. Although the interval of integration in (1) is finite, we can apply the Sinc quadrature by combining it with a variable transformation. Rashidinia-Zarebnia [3] utilized the Single-Exponential (SE) transformation defined by

$$t = \psi^{\mathrm{SE}}(x) = \frac{b-a}{2}\tanh\!\left(\frac{x}{2}\right) + \frac{b+a}{2},$$

which enables us to apply the Sinc quadrature as follows:

$$\int_a^b f(t)\,\mathrm{d}t = \int_{-\infty}^{\infty} f(\psi^{\mathrm{SE}}(x))\,\psi^{\mathrm{SE}\prime}(x)\,\mathrm{d}x \approx h \sum_{j=-N}^{N} f(\psi^{\mathrm{SE}}(jh))\,\psi^{\mathrm{SE}\prime}(jh). \qquad (5)$$

Muhammad et al. [2] utilized another one:

$$t = \psi^{\mathrm{DE}}(x) = \frac{b-a}{2}\tanh\!\left(\frac{\pi}{2}\sinh x\right) + \frac{b+a}{2},$$

which is called the Double-Exponential (DE) transformation. By using the DE transformation we have

$$\int_a^b f(t)\,\mathrm{d}t \approx h \sum_{j=-N}^{N} f(\psi^{\mathrm{DE}}(jh))\,\psi^{\mathrm{DE}\prime}(jh). \qquad (6)$$

In order to achieve quick convergence with the Sinc quadrature (4), it is necessary that the integrand F is analytic and bounded in the strip domain $\mathcal{D}_d = \{z \in \mathbb{C} : |\operatorname{Im} z| < d\}$ for a positive constant d. Accordingly, as for the approximations (5) and (6), it is appropriate to introduce the following function space.

Definition 1 Let D be a bounded and simply-connected domain (or Riemann surface). Then we denote by H^∞(D) the family of all functions that are analytic and bounded in D.

The domain D should be either ψ^SE(D_d) or ψ^DE(D_d), i.e., we may assume f ∈ H^∞(ψ^SE(D_d)) for the approximation (5), and f ∈ H^∞(ψ^DE(D_d)) for the approximation (6).

2.2 SE-Sinc-Nystrom method

Firstly we explain the method derived by Rashidinia-Zarebnia [3]. Assume the following two conditions:

(SE1) u ∈ H^∞(ψ^SE(D_d)),
(SE2) k(t, ·) ∈ H^∞(ψ^SE(D_d)) for all t ∈ [a, b].

Then the integral $\mathcal{K}[u](t) := \int_a^b k(t,s)\,u(s)\,\mathrm{d}s$ in (1) can be approximated by

$$\mathcal{K}_N^{\mathrm{SE}}[u](t) := h \sum_{j=-N}^{N} k(t, \psi^{\mathrm{SE}}(jh))\,u(\psi^{\mathrm{SE}}(jh))\,\psi^{\mathrm{SE}\prime}(jh).$$

The mesh size h here is chosen as $h = \sqrt{2\pi d/N}$. Then, corresponding to the original equation u = (g + Ku)/λ, we consider the new equation:

$$u_N^{\mathrm{SE}}(t) = \frac{g(t) + \mathcal{K}_N^{\mathrm{SE}}[u_N^{\mathrm{SE}}](t)}{\lambda}. \qquad (7)$$

The approximated solution u_N^SE is obtained by determining the unknown coefficients in K_N^SE u_N^SE, i.e.,

$$\boldsymbol{u}_n^{\mathrm{SE}} = [u_N^{\mathrm{SE}}(\psi^{\mathrm{SE}}(-Nh)), \ldots, u_N^{\mathrm{SE}}(\psi^{\mathrm{SE}}(Nh))]^{\mathrm{T}},$$

where n = 2N + 1. To this end, let us discretize (7) at t = ψ^SE(ih) (i = −N, ..., N), and consider the resulting system of linear equations

$$(\lambda I_n - K_n^{\mathrm{SE}})\,\boldsymbol{u}_n^{\mathrm{SE}} = \boldsymbol{g}_n^{\mathrm{SE}}, \qquad (8)$$

where K_n^SE is an n × n matrix whose (i, j) element is

$$(K_n^{\mathrm{SE}})_{ij} = h\,k(\psi^{\mathrm{SE}}(ih), \psi^{\mathrm{SE}}(jh))\,\psi^{\mathrm{SE}\prime}(jh), \quad i, j = -N, \ldots, N,$$

and g_n^SE is an n-dimensional vector defined by

$$\boldsymbol{g}_n^{\mathrm{SE}} = [g(\psi^{\mathrm{SE}}(-Nh)), \ldots, g(\psi^{\mathrm{SE}}(Nh))]^{\mathrm{T}}.$$

By solving the system (8), the desired solution u_N^SE is obtained. This is called the SE-Sinc-Nystrom method.
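The procedure of Section 2.2 can be sketched as follows for the example (21) of Section 4, with λ = 1, [a, b] = [0, π/2] and d = 3.14. The function names are ours, and we take the (i, j) element of K_n^SE to include the quadrature weight h ψ^SE′(jh), as the discretization of (7) requires:

```python
import numpy as np

def psi(x, a, b):      # SE transformation
    return (b - a) / 2 * np.tanh(x / 2) + (b + a) / 2

def dpsi(x, a, b):     # its derivative
    return (b - a) / 4 / np.cosh(x / 2) ** 2

def se_sinc_nystrom(k, g, lam, a, b, N, d):
    """Assemble and solve the linear system (8), then interpolate via (7)."""
    h = np.sqrt(2 * np.pi * d / N)
    x = np.arange(-N, N + 1) * h
    t = psi(x, a, b)                        # collocation points psi(ih)
    w = h * dpsi(x, a, b)                   # quadrature weights h * psi'(jh)
    K = k(t[:, None], t[None, :]) * w[None, :]
    u_nodes = np.linalg.solve(lam * np.eye(2 * N + 1) - K, g(t))
    def u(tt):                              # Nystrom interpolant u = (g + K_N u)/lam
        tt = np.atleast_1d(tt)
        return (g(tt) + (k(tt[:, None], t[None, :]) * w[None, :]) @ u_nodes) / lam
    return u

# Example (21): exact solution u(t) = sqrt(t)
a, b = 0.0, np.pi / 2
kernel = lambda t, s: (t * s) ** 1.5
rhs = lambda t: np.sqrt(t) * (1 - np.pi ** 3 / 24 * t)
uN = se_sinc_nystrom(kernel, rhs, 1.0, a, b, N=32, d=3.14)
tt = np.linspace(a, b, 1000)
print(np.max(np.abs(uN(tt) - np.sqrt(tt))))
```

The maximum error decays like exp(−c√N), matching the behavior reported in Section 4.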

2.3 DE-Sinc-Nystrom method

Next we explain the method derived by Muhammad et al. [2]. Assume the following two conditions:

(DE1) u ∈ H^∞(ψ^DE(D_d)),
(DE2) k(t, ·) ∈ H^∞(ψ^DE(D_d)) for all t ∈ [a, b].

Then the integral Ku in (1) can be approximated by

$$\mathcal{K}_N^{\mathrm{DE}}[u](t) := h \sum_{j=-N}^{N} k(t, \psi^{\mathrm{DE}}(jh))\,u(\psi^{\mathrm{DE}}(jh))\,\psi^{\mathrm{DE}\prime}(jh).$$

The mesh size h here is chosen as h = log(4dN)/N. Then, instead of the original equation u = (g + Ku)/λ, we consider the new equation:

$$u_N^{\mathrm{DE}}(t) = \frac{g(t) + \mathcal{K}_N^{\mathrm{DE}}[u_N^{\mathrm{DE}}](t)}{\lambda}. \qquad (9)$$

To obtain the approximated solution u_N^DE, we have to determine the unknown coefficients in K_N^DE u_N^DE, i.e.,

$$\boldsymbol{u}_n^{\mathrm{DE}} = [u_N^{\mathrm{DE}}(\psi^{\mathrm{DE}}(-Nh)), \ldots, u_N^{\mathrm{DE}}(\psi^{\mathrm{DE}}(Nh))]^{\mathrm{T}}.$$

By discretizing (9) at t = ψ^DE(ih) (i = −N, ..., N), we have the linear system:

$$(\lambda I_n - K_n^{\mathrm{DE}})\,\boldsymbol{u}_n^{\mathrm{DE}} = \boldsymbol{g}_n^{\mathrm{DE}}, \qquad (10)$$

where K_n^DE is an n × n matrix whose (i, j) element is

$$(K_n^{\mathrm{DE}})_{ij} = h\,k(\psi^{\mathrm{DE}}(ih), \psi^{\mathrm{DE}}(jh))\,\psi^{\mathrm{DE}\prime}(jh), \quad i, j = -N, \ldots, N,$$

and g_n^DE is an n-dimensional vector defined by

$$\boldsymbol{g}_n^{\mathrm{DE}} = [g(\psi^{\mathrm{DE}}(-Nh)), \ldots, g(\psi^{\mathrm{DE}}(Nh))]^{\mathrm{T}}.$$

By solving the system (10), the desired solution u_N^DE is obtained. This is called the DE-Sinc-Nystrom method.

3. Boundedness of the condition numbers

3.1 Main result

The main contribution of this paper is the following theorem.


Theorem 2 Let the function k be continuous on [a, b] × [a, b]. Furthermore, suppose that the homogeneous equation (λI − K)f = 0 has only the trivial solution f ≡ 0. Then there exists a positive integer N₀ such that for all N ≥ N₀ the matrices (λI_n − K_n^SE) and (λI_n − K_n^DE) have bounded inverses. Furthermore, there exist constants C^SE and C^DE independent of N such that for all N ≥ N₀

$$\|\lambda I_n - K_n^{\mathrm{SE}}\|_\infty\,\|(\lambda I_n - K_n^{\mathrm{SE}})^{-1}\|_\infty \le C^{\mathrm{SE}}, \qquad (11)$$

$$\|\lambda I_n - K_n^{\mathrm{DE}}\|_\infty\,\|(\lambda I_n - K_n^{\mathrm{DE}})^{-1}\|_\infty \le C^{\mathrm{DE}}. \qquad (12)$$

3.2 Sketch of the proof

In what follows we write C = C([a, b]) for short. The next result plays an important role in proving Theorem 2.

Lemma 3 (Okayama et al. [4, in the proofs of Theorems 6.3 and 8.2]) Suppose that the assumptions in Theorem 2 are fulfilled. Then there exist constants C₁ and C₂ independent of N such that for all N

$$\|\mathcal{K}_N^{\mathrm{SE}}\|_{\mathcal{L}(C,C)} \le C_1, \qquad \|\mathcal{K}_N^{\mathrm{DE}}\|_{\mathcal{L}(C,C)} \le C_2.$$

Furthermore, there exists a positive integer N₀ such that for all N ≥ N₀ the operators (λI − K_N^SE) and (λI − K_N^DE) have bounded inverses, and

$$\|(\lambda I - \mathcal{K}_N^{\mathrm{SE}})^{-1}\|_{\mathcal{L}(C,C)} \le C_3, \qquad \|(\lambda I - \mathcal{K}_N^{\mathrm{DE}})^{-1}\|_{\mathcal{L}(C,C)} \le C_4$$

hold, where C₃ and C₄ are constants independent of N.

In view of this, we see that Theorem 2 is established if the following lemma is shown.

Lemma 4 Suppose that the assumptions in Theorem 2 are fulfilled. Then we have

$$\|\lambda I_n - K_n^{\mathrm{SE}}\|_\infty \le \|\lambda I - \mathcal{K}_N^{\mathrm{SE}}\|_{\mathcal{L}(C,C)}, \qquad (13)$$

$$\|\lambda I_n - K_n^{\mathrm{DE}}\|_\infty \le \|\lambda I - \mathcal{K}_N^{\mathrm{DE}}\|_{\mathcal{L}(C,C)}. \qquad (14)$$

Furthermore, if the inverse operators (λI − K_N^SE)^{-1} and (λI − K_N^DE)^{-1} exist, then the matrices (λI_n − K_n^SE)^{-1} and (λI_n − K_n^DE)^{-1} also exist, and we have

$$\|(\lambda I_n - K_n^{\mathrm{SE}})^{-1}\|_\infty \le \|(\lambda I - \mathcal{K}_N^{\mathrm{SE}})^{-1}\|_{\mathcal{L}(C,C)}, \qquad (15)$$

$$\|(\lambda I_n - K_n^{\mathrm{DE}})^{-1}\|_\infty \le \|(\lambda I - \mathcal{K}_N^{\mathrm{DE}})^{-1}\|_{\mathcal{L}(C,C)}. \qquad (16)$$

We prove this lemma below.

3.3 Proofs

The existence of the inverse matrix (λI_n − K_n^SE)^{-1} is shown by the following lemma.

Lemma 5 Suppose that the assumptions in Theorem 2 are fulfilled, and let g ∈ C([a, b]). Then the following two statements are equivalent:

(A) The equation (λI − K_N^SE)v = g has a unique solution v ∈ C.

(B) The system of linear equations (λI_n − K_n^SE)c_n = g_n^SE has a unique solution c_n ∈ R^n.

Proof We show (A) ⇒ (B) first. Using the unique solution v ∈ C, define the vector c_n ∈ R^n as c_n = [v(ψ^SE(−Nh)), ..., v(ψ^SE(Nh))]^T. Clearly this c_n is a solution of the linear system in (B), which shows the existence of a solution. The uniqueness is shown as follows. Suppose that there exists another solution ĉ_n = [ĉ_{−N}, ..., ĉ_N]^T. Define a function v̂ ∈ C as

$$\hat v(t) = \frac{1}{\lambda}\left(g(t) + h\sum_{j=-N}^{N} k(t, \psi^{\mathrm{SE}}(jh))\,\hat c_j\,\psi^{\mathrm{SE}\prime}(jh)\right). \qquad (17)$$

At the points t_i = ψ^SE(ih) (i = −N, ..., N), clearly

$$\lambda \hat v(t_i) = g(t_i) + h\sum_{j=-N}^{N} k(t_i, t_j)\,\hat c_j\,\psi^{\mathrm{SE}\prime}(jh) \qquad (18)$$

holds. On the other hand,

$$\lambda \hat c_i = g(t_i) + h\sum_{j=-N}^{N} k(t_i, t_j)\,\hat c_j\,\psi^{\mathrm{SE}\prime}(jh) \qquad (19)$$

holds since ĉ_n is a solution of the linear system. Since the right-hand side of (18) is equal to that of (19), we conclude v̂(t_i) = ĉ_i. Therefore (18) can be rewritten as (λI − K_N^SE)v̂ = g, which means v̂ is a solution of the equation in (A). From the uniqueness of the solution of the equation, v ≡ v̂ holds, which implies c_n = ĉ_n. This shows the desired uniqueness.

Next we show (B) ⇒ (A). Let c_n = [c_{−N}, ..., c_N]^T be the unique solution in (B), and define a function v ∈ C by (17). Then by the same argument as above, we can conclude v is a solution of the equation in (A), which shows the existence. The uniqueness is shown as follows. Suppose that there exists another solution v̂ ∈ C. Define the vector ĉ_n ∈ R^n as ĉ_n = [v̂(ψ^SE(−Nh)), ..., v̂(ψ^SE(Nh))]^T. Then clearly ĉ_n is a solution of the linear system in (B). From the uniqueness of the solution of the linear system, we have c_n = ĉ_n. Therefore v̂(ψ^SE(jh)) = c_j, and v̂ can be rewritten as

$$\hat v(t) = \frac{1}{\lambda}\left(g(t) + h\sum_{j=-N}^{N} k(t, \psi^{\mathrm{SE}}(jh))\,c_j\,\psi^{\mathrm{SE}\prime}(jh)\right). \qquad (20)$$

In view of (17) and (20), we have v ≡ v̂, which shows the desired uniqueness.

(QED)

In the same manner we can prove the following lemma for the DE-Sinc-Nystrom method. The proof is omitted.

Lemma 6 Suppose that the assumptions in Theorem 2 are fulfilled, and let g ∈ C([a, b]). Then the following two statements are equivalent:

(A) The equation (λI − K_N^DE)v = g has a unique solution v ∈ C.

(B) The system of linear equations (λI_n − K_n^DE)c_n = g_n^DE has a unique solution c_n ∈ R^n.

Thus the existence of the inverse matrix is guaranteed in both cases (SE and DE). The remaining task is to show (13)–(16). We show only (13) and (15), since (14) and (16) are shown in the same manner.

Proof of Lemma 4 We show (13) first. Let c_n = [c_{−N}, ..., c_N]^T be an arbitrary n-dimensional vector. Pick a function γ ∈ C that satisfies γ(ψ^SE(ih)) = c_i (i = −N, ..., N) and ∥γ∥_C = ∥c_n∥_∞. Using this function γ, define a function f ∈ C as f = (λI − K_N^SE)γ, and


[Fig. 1. Error of the Sinc-Nystrom methods for (21): maximum error versus N for the SE- and DE-Sinc-Nystrom methods.]

a vector f_n as f_n = [f(ψ^SE(−Nh)), ..., f(ψ^SE(Nh))]^T. Then we have

$$\|(\lambda I_n - K_n^{\mathrm{SE}})\,\boldsymbol{c}_n\|_\infty = \|\boldsymbol{f}_n\|_\infty \le \|f\|_C = \|(\lambda I - \mathcal{K}_N^{\mathrm{SE}})\gamma\|_C \le \|\lambda I - \mathcal{K}_N^{\mathrm{SE}}\|_{\mathcal{L}(C,C)}\,\|\gamma\|_C = \|\lambda I - \mathcal{K}_N^{\mathrm{SE}}\|_{\mathcal{L}(C,C)}\,\|\boldsymbol{c}_n\|_\infty,$$

from which (13) follows.

Next we show (15). Notice that the inverse matrix (λI_n − K_n^SE)^{-1} exists by Lemma 5. Let c_n be an arbitrary n-dimensional vector. In the same manner as above, pick a function γ ∈ C. Define a function f ∈ C as f = (λI − K_N^SE)^{-1}γ, and a vector f_n in the same way as above. The difference from the above is in f: (λI − K_N^SE) is replaced with (λI − K_N^SE)^{-1}. Then we have

$$\|(\lambda I_n - K_n^{\mathrm{SE}})^{-1}\boldsymbol{c}_n\|_\infty = \|\boldsymbol{f}_n\|_\infty \le \|f\|_C = \|(\lambda I - \mathcal{K}_N^{\mathrm{SE}})^{-1}\gamma\|_C \le \|(\lambda I - \mathcal{K}_N^{\mathrm{SE}})^{-1}\|_{\mathcal{L}(C,C)}\,\|\gamma\|_C = \|(\lambda I - \mathcal{K}_N^{\mathrm{SE}})^{-1}\|_{\mathcal{L}(C,C)}\,\|\boldsymbol{c}_n\|_\infty,$$

from which (15) follows. This completes the proof.

(QED)

4. Numerical example

In this section we show numerical results for

$$u(t) - \int_0^{\pi/2} (ts)^{3/2}\,u(s)\,\mathrm{d}s = \sqrt{t}\left(1 - \frac{\pi^3}{24}\,t\right), \quad 0 \le t \le \frac{\pi}{2}, \qquad (21)$$

which has also been treated by Muhammad et al. [2, Example 4.3]. The exact solution is u(t) = √t. Let us first check the conditions described in Sections 2.2 and 2.3. The conditions (SE1) and (SE2) are satisfied with d = π − ε, and (DE1) and (DE2) are satisfied with d = (π − ε)/2, where ε is an arbitrarily small positive number (we set ε = π − 3.14 in our computation).

Based on this information, we implemented the SE-Sinc-Nystrom method and the DE-Sinc-Nystrom method in C++ with double-precision floating-point arithmetic. The errors |u(t) − u_N^SE(t)| and |u(t) − u_N^DE(t)| were investigated on 1000 equally-spaced points on [0, π/2], and their maximum is shown in Fig. 1. We can observe the rate O(exp(−c₁√N)) in the SE-Sinc-Nystrom method, and O(exp(−c₂N/log N)) in the DE-Sinc-Nystrom method. These results can be explained by combining the existing estimates (2) and (3) with the new result (Theorem 2). Furthermore, from Fig. 2, we can also confirm the boundedness of the condition numbers, i.e., the estimates (11) and (12).

[Fig. 2. Condition number of the coefficient matrix appearing in the Sinc-Nystrom methods for (21).]
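That u(t) = √t is indeed the exact solution of (21) can be verified symbolically; a quick check:

```python
import sympy as sp

t, s = sp.symbols('t s', positive=True)
u = sp.sqrt(t)                                    # candidate solution
# left-hand side of (21): u(t) - int_0^{pi/2} (t s)^{3/2} u(s) ds
integral = sp.integrate((t * s) ** sp.Rational(3, 2) * u.subs(t, s), (s, 0, sp.pi / 2))
lhs = u - integral
rhs = sp.sqrt(t) * (1 - sp.pi ** 3 / 24 * t)      # right-hand side of (21)
print(sp.simplify(lhs - rhs))                     # 0
```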

5. Concluding remarks

The Sinc-Nystrom methods for (1) have been known as efficient methods in the sense that exponential convergence can be attained. However, the convergence had not been guaranteed theoretically, since in the existing estimates (2) and (3) there remained unestimated terms: $\|A_N^{-1}\|_2$ and $\|\tilde{A}_N^{-1}\|_2$ (where $A_N = \lambda I_n - K_n^{\mathrm{DE}}$ and $\tilde{A}_N = \lambda I_n - K_n^{\mathrm{SE}}$). In this paper we showed theoretically that $\|A_N^{-1}\|_\infty$ and $\|\tilde{A}_N^{-1}\|_\infty$ are bounded, from which the exponential convergence of the methods is guaranteed. Furthermore, we showed that $\|A_N\|_\infty$ and $\|\tilde{A}_N\|_\infty$ are also bounded, and consequently the condition numbers of the matrices are bounded, as stated in Theorem 2.

Muhammad et al. [2] have also developed Sinc-Nystrom methods for Volterra integral equations, and a result similar to that of this paper can be shown for them. We are now working on this issue, and the result will be reported elsewhere soon.

Acknowledgments

This work was supported by Grants-in-Aid for Scientific Research, MEXT, Japan.

References

[1] P. K. Kythe and P. Puri, Computational Methods for Linear Integral Equations, Birkhäuser, Boston, MA, 2002.
[2] M. Muhammad, A. Nurmuhammad, M. Mori and M. Sugihara, Numerical solution of integral equations by means of the Sinc collocation method based on the double exponential transformation, J. Comput. Appl. Math., 177 (2005), 269–286.
[3] J. Rashidinia and M. Zarebnia, Convergence of approximate solution of system of Fredholm integral equations, J. Math. Anal. Appl., 333 (2007), 1216–1227.
[4] T. Okayama, T. Matsuo and M. Sugihara, Improvement of a Sinc-collocation method for Fredholm integral equations of the second kind, BIT Numer. Math., 51 (2011), 339–366.


JSIAM Letters Vol.3 (2011) pp.85–88 ©2011 Japan Society for Industrial and Applied Mathematics

A modified Calogero-Bogoyavlenskii-Schiff equation with variable coefficients and its non-isospectral Lax pair

Tadashi Kobayashi1 and Kouichi Toda2,3

1 Graduate School of Informatics, Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan

2 Department of Mathematical Physics, Toyama Prefectural University, Kurokawa 5180, Imizu,Toyama 939-0398, Japan

3 Research and Education Center for Natural Sciences, Hiyoshi Campus, Keio University, 4-1-1 Hiyoshi, Kouhoku-ku, Yokohama 223-8521, Japan

E-mail tadashi@amp.i.kyoto-u.ac.jp

Received May 9, 2011, Accepted October 11, 2011

Abstract

In this paper, we present a modified version with variable coefficients of the (2 + 1) dimensional Korteweg-de Vries, or Calogero-Bogoyavlenskii-Schiff, equation, derived by applying the Painleve test. Its Lax pair with a non-isospectral condition in (2 + 1) dimensions is also given. Moreover, a transformation which links the form with variable coefficients to the canonical one is shown.

Keywords integrable equation with variable coefficients, Painleve property, Lax pair

Research Activity Group Applied Integrable Systems

1. Introduction

Over the last three decades, many mathematicians and physicists have studied nonlinear integrable systems from various perspectives. These systems have remarkable applications to many physical settings such as hydrodynamics, nonlinear optics, plasma physics, field theories and so on [1–3]. Generally the notion of nonlinear integrable systems is characterized by several features: solitons [4–8], Lax pairs [9–11], Painleve tests [12–18] and so on. Integrable systems thus have "good" nature as described above. Moreover, solitons are a major attractive issue in mechanical and engineering sciences as well as mathematical and physical ones. For instance, a real ocean is inhomogeneous, and the dynamics of nonlinear waves there is strongly influenced by refraction, geometric divergence and so on.

The physical phenomena in which many nonlinear integrable equations with constant coefficients arise tend to be highly idealized. Therefore, equations with variable coefficients may provide various models for real physical phenomena, for example, in the propagation of small-amplitude surface waves which run on straits or large channels of slowly varying depth and width. Indeed, the variable-coefficient generalizations of nonlinear integrable equations are a currently exciting subject [19–22] (see also [21, Refs. [24–45]]). Many researchers have mainly investigated (1 + 1) dimensional nonlinear integrable systems with constant coefficients in order to discover new integrable systems; in contrast, there are few studies that seek nonlinear integrable systems with variable coefficients, since such systems are essentially complicated. Analysis of higher dimensional systems is also an active issue, and the study of nonlinear integrable equations in higher dimensions with variable coefficients has attracted much attention. The main aim of this paper is therefore to construct a (2 + 1) dimensional integrable version of the modified Korteweg-de Vries (KdV) equation with variable coefficients.

It is widely known that the Painleve test in the sense of the Weiss-Tabor-Carnevale (WTC) method [13] is a powerful tool to investigate integrable equations with variable coefficients. We will discuss the following higher dimensional nonlinear evolution equation with variable coefficients for q = q(x, z, t):

$$q_t + a(x,z,t)\,q_{xxz} + b(x,z,t)\,q^2 q_z + c(x,z,t)\,q_x\,\partial_x^{-1}(q^2)_z + d(x,z,t)\,q + e(x,z,t)\,q_x + f(x,z,t)\,q_z = 0, \qquad (1)$$

where a(x,z,t) ≠ 0, b(x,z,t) ≠ 0, c(x,z,t) ≠ 0, b(x,z,t) + c(x,z,t) ≠ 0, subscripts with respect to independent variables denote partial derivatives, and $\partial_x^{-1}$ is the integral operator, $\partial_x^{-1}q := \int^x q(X)\,\mathrm{d}X$. Here (and hereafter) a(x,z,t), b(x,z,t), ..., f(x,z,t) are coefficient functions of the two spatial variables x, z and the temporal variable t. We will carry out the WTC method for (1), and present a set of coefficient functions. Equations of the form (1) include one of the integrable higher dimensional modified KdV equations:

$$q_t - \frac{1}{4}q^2 q_z - \frac{1}{8}q_x\,\partial_x^{-1}(q^2)_z + \frac{1}{4}q_{xxz} = 0, \qquad (2)$$

which is called the modified Calogero-Bogoyavlenskii-Schiff (CBS) equation [23]. Eq. (2) can be dimensionally reduced to the standard modified KdV equation for q = q(x, t):

$$q_t - \frac{3}{8}q^2 q_x + \frac{1}{4}q_{xxx} = 0, \qquad (3)$$


JSIAM Letters Vol. 3 (2011) pp.85–88 Tadashi Kobayashi et al.

by the dimensional reduction ∂z = ∂x. Here (and hereafter) ∂x = ∂/∂x and so on.

The plan of this paper is as follows. In Section 2, we briefly review the process of the WTC method for (1). In Section 3, we construct its Lax pair. In Section 4, we prove that the equation with variable coefficients can be reduced to the canonical form by a certain transformation. Section 5 is devoted to conclusions.
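The reduction ∂z = ∂x from (2) to (3) can be checked symbolically: under the reduction, the nonlocal term q_x ∂_x^{-1}(q²)_z becomes q_x q², since ∂_x^{-1} undoes ∂_x. A quick sympy sketch:

```python
import sympy as sp

x, t = sp.symbols('x t')
q = sp.Function('q')(x, t)

# (2) with dz -> dx; the nonlocal term q_x * dx^{-1}((q^2)_x) equals q_x * q^2
reduced = sp.diff(q, t) - sp.Rational(1, 4) * q**2 * sp.diff(q, x) \
    - sp.Rational(1, 8) * sp.diff(q, x) * q**2 + sp.Rational(1, 4) * sp.diff(q, x, 3)

# the standard modified KdV equation (3)
mkdv = sp.diff(q, t) - sp.Rational(3, 8) * q**2 * sp.diff(q, x) + sp.Rational(1, 4) * sp.diff(q, x, 3)

print(sp.simplify(reduced - mkdv))   # 0
```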

2. Painleve test of (1)

Weiss et al. claimed in [13] that a partial differential equation (PDE) has the Painleve property if the solutions of the PDE are single-valued about the movable singularity manifold, and they proposed a technique which determines whether a given PDE is integrable or not. This technique is called the WTC method. Now we apply the WTC method to (1). In order to eliminate the integral operator, we rewrite (1) in the form of a coupled system:

$$q_t + a(x,z,t)\,q_{xxz} + b(x,z,t)\,q^2 q_z + c(x,z,t)\,q_x r + d(x,z,t)\,q + e(x,z,t)\,q_x + f(x,z,t)\,q_z = 0, \qquad (4)$$

$$r_x = (q^2)_z. \qquad (5)$$

We now look for solutions of (4) and (5) in the form of Laurent series expansions with ϕ = ϕ(x, z, t):

$$q = \sum_{j=0}^{\infty} q_j\,\phi^{j-\alpha}, \qquad r = \sum_{j=0}^{\infty} r_j\,\phi^{j-\beta}, \qquad (6)$$

where qj = qj(x, z, t) and rj = rj(x, z, t) are analyticfunctions in a neighborhood of ϕ = 0. In this case, theleading orders are α = 1 and β = 2. Then

q0 = i

√6a

b+ cϕx, r0 = − 6a

b+ cϕxϕz,

are obtained. Here i2 = −1. To find the resonance wenow substitute the Laurent series expansions (6) into(4) and (5). Rearranging (4) into terms of ϕj−4 and theother higher powers of ϕ, we obtain recurrence relationsfor qj and rj :(

(j − 3)bq20qjϕz + c [ (j − 1)r0qj − q0rj ]

+ (j − 1)(j − 2)(j − 3)aqjϕxϕz )ϕj−4 = δj . (7)

Similarly, rearranging (5) into terms of ϕj−3 and higherpowers of ϕ, we have

(j − 2)(2q0qjϕz − rjϕx)ϕj−3 = σj . (8)

Here δj and σj are given in terms of qℓ and rℓ (0 ≤ ℓ ≤j − 1). Then we get the following resonances:

j = −1, 2, 3, 4. (9)

Let us note here that the resonance j = −1 in (9) cor-responds to the arbitrary singularity manifold ϕ. If therecurrence relations are consistently satisfied at the reso-nances then the differential equations are said to possessthe Painleve property.Subsequent coefficients qj and rj are determined from

(7) and (8). However, from the consistency condition,they must include arbitrary functions at the resonances.

Eqs. (7) and (8) must be satisfied in the respective pow-ers of ϕ. Requiring that every power of ϕ (ϕ−4+k andϕ−3+k with positive integer k in (7) and (8), respec-tively) should vanish, we obtain the consistency condi-tions as follows.

(ϕ−3, ϕ−2) :

q1 =1

2(b+ c)q20ϕz bq20q0,z

+ c(2q20q0,z + r0q0,x − q0r0,x)

+ 2a [ϕx(q0,zϕx+ 2q0,xϕz+ 2q0ϕxz)

+ q0ϕxxϕz ],

r1 =1

(b+ c)q0ϕx cr0q0,x + bq0(r0,x − q0q0,z)

+ 2a [ϕx(q0,zϕx+ 2q0,xϕz+ 2q0ϕxz)

+ q0ϕxxϕz ],

(ϕ−2, ϕ−1) :

− q0ϕt − (fq0 + bq0q21 + bq20q2 + aq0,xx)ϕz

+ (cq2r0 − cq0r2 − eq0 − 2aq0,xz)ϕx − 2aq0,xϕxz

− aq0,zϕxx − aq0ϕxxz + 2bq0q1q0,z + bq20q1,z

+ cr1q0,x + cr0q1,x = 0, (10)

r1,x − 2(q0q1)z = 0. (11)

It follows from (10) and (11) that one of the two variables (q₂, r₂) must be arbitrary, which yields

$$4b^3 a_{xx} - b^2[\,a_x(8b_x+11c_x) - 9c\,a_{xx} + 4a(b_{xx}+c_{xx})\,] - c\{\,a(b_x-2c_x)(b_x+c_x) - c^2 a_{xx} + c[\,a_x(b_x-2c_x) + a(b_{xx}+c_{xx})\,]\,\} - b\{\,-a(b_x+c_x)(8b_x+11c_x) - 6c^2 a_{xx} + c[\,a_x(7b_x+13c_x) + 5a(b_{xx}+c_{xx})\,]\,\} = 0, \qquad (12)$$

$$2b^3 a_{xz} - b^2[\,a_x(2b_z+5c_z) + a_z(2b_x+5c_x) + 2a(b_{xz}+c_{xz})\,] + b\{\,a[\,b_z(4b_x+7c_x) + c_z(7b_x+10c_x)\,] - 6c^2 a_{xz} + c[\,a_x(5b_x-c_x) + 2a(b_{xz}+c_{xz})\,]\,\} + c\{\,a[\,-c_z(11b_x+8c_x) - b_z(14b_x+11c_x)\,] - 4c^2 a_{xz} + c[\,a_x(7b_z+4c_z) + a_z(7b_x+4c_x) + 4a(b_{xz}+c_{xz})\,]\,\} = 0, \qquad (13)$$

$$2b^3 a_{zz} - b^2[\,a_z(4b_z+c_z) - 9c\,a_{zz} + 2a(b_{zz}+c_{zz})\,] + c\{\,a(13b_z+10c_z)(b_z+c_z) + 5c^2 a_{zz} - c[\,a_z(13b_z+10c_z) + 5a(b_{zz}+c_{zz})\,]\,\} + b\{\,a(b_z+c_z)(4b_z+c_z) + 12c^2 a_{zz} - c[\,a_z(17b_z+11c_z) + 7a(b_{zz}+c_{zz})\,]\,\} = 0, \qquad (14)$$

$$(b+c)(2b+5c)\,[\,a_z(b+c) - a(b+c)_z\,] = 0. \qquad (15)$$

We take into account two cases from (15): case (i) c(x,z,t) = −(2/5)b(x,z,t), and case (ii) a(x,z,t) = ā(x,t)(b(x,z,t) + c(x,z,t)).

case (i): c(x,z,t) = −(2/5)b(x,z,t)

We obtain the constraints a = a(z,t) and b = b(z,t) from (12)–(14). However, the relation ab² = 0 appears at the next order (ϕ^{−1}, ϕ^{0}), which contradicts the assumption a ≠ 0 and b ≠ 0. Namely, the Painleve test fails in this case.

case (ii): a(x,z,t) = ā(x,t)(b(x,z,t) + c(x,z,t))

From (12)–(14), we obtain the following equations:

$$4b^2\bar a_{xx} + c^2\bar a_{xx} + 3\bar a_x b_x c + 5bc\,\bar a_{xx} - 3\bar a_x b c_x = 0, \qquad (16)$$

$$\bar a_x(b c_z - b_z c) = 0. \qquad (17)$$

From (17), we obtain the following two cases: case (ii-1) c(x,z,t) = c̄(x,t)b(x,z,t) and case (ii-2) ā(x,t) = a(t).

case (ii-1): c(x,z,t) = c̄(x,t)b(x,z,t)

We obtain a relation for c̄(x,t) from (12)–(14), $\bar c(x,t) = 3/(1 - c(t)^{3}\bar a_x(x,t)) - 4$. However, the relation b = 0 appears at the next order (ϕ^{−1}, ϕ^{0}), which contradicts the assumption b ≠ 0. Namely, the Painleve test fails in this case.

case (ii-2): ā(x,t) = a(t)

The compatibility condition is satisfied in this case.

(ϕ^{−1}, ϕ^{0}):

$$c\,q_0\,[\,r_3\phi_x - 2(q_0 q_3 + q_1 q_2)\phi_z - (2q_0 q_2 - q_1^2)_z\,] + F = 0, \qquad (18)$$

$$r_3\phi_x - 2(q_0 q_3 + q_1 q_2)\phi_z - (2q_0 q_2 - q_1^2)_z = 0, \qquad (19)$$

(ϕ^{0}, ϕ^{1}):

$$c\,q_0\,[\,r_4\phi_x - (2q_0 q_4 + 2q_1 q_3 + q_2^2)\phi_z - (q_0 q_3 + q_1 q_2)_z\,] + G = 0, \qquad (20)$$

$$r_4\phi_x - (2q_0 q_4 + 2q_1 q_3 + q_2^2)\phi_z - (q_0 q_3 + q_1 q_2)_z = 0. \qquad (21)$$

It follows from (18)–(21) that one of the two variables must be arbitrary in each of the pairs (q₃, r₃) and (q₄, r₄), and consequently F in (18) and G in (20) must vanish. Hence the following equations are obtained from F = 0:

$$b - 2c = 0, \quad b_x - c_x = 0, \quad b_x c - b c_x = 0, \quad c_x f - c f_x = 0, \quad c_x = 0, \quad c\,a' + 2a(c\,d + c_x e - c\,e_x) = 0, \qquad (22)$$

where ′ denotes the ordinary derivative with respect to t. Hence, from (22),

$$b(x,z,t) = b(z,t), \quad c(x,z,t) = \frac{b(z,t)}{2}, \quad d(x,z,t) = e_0(z,t) - \frac{1}{2}\frac{a'(t)}{a(t)}, \quad f(x,z,t) = f(z,t),$$

and then, from G = 0,

$$e_{xx} = 0 \ \Rightarrow\ e(x,z,t) = x\,e_0(z,t) + e_1(z,t)$$

are obtained, respectively.

Therefore the equation of the form

$$q_t + \frac{3}{2}a(t)b(z,t)\,q_{xxz} + b(z,t)\,q^2 q_z + \frac{1}{2}b(z,t)\,q_x\partial_x^{-1}(q^2)_z + \left(e_0(z,t) - \frac{1}{2}\frac{a'(t)}{a(t)}\right)q + (x\,e_0(z,t) + e_1(z,t))\,q_x + f(z,t)\,q_z = 0 \qquad (23)$$

admits a sufficient number of arbitrary functions corresponding to the resonances, and hence passes the Painleve test in the sense of the WTC method. This means that we have succeeded in finding the modified CBS equation with variable coefficients (23). We used MATHEMATICA [24] to handle the calculations for the existence of arbitrary functions at the above resonances.

3. Lax pair of (23)

It is well known that the Lax pair plays a key role in the theory of integrable systems. Consider two operators L and T, which are called a non-isospectral Lax pair, given by

$$L\psi = \lambda\psi, \qquad T\psi = 0,$$

with λ being a non-isospectral parameter [23, 25, 26] independent only of x. Then the commutation relation

$$[L, T] \equiv LT - TL = 0 \qquad (24)$$

contains a nonlinear evolution equation for suitably chosen operators L and T. Eq. (24) is called the Lax equation.

The Lax pair of (23) is as follows:

$$L = i\sqrt{\frac{3a(t)}{2}}\,\partial_x^2 + q\,\partial_x + \frac{3i}{8\sqrt{6a(t)}}\,\partial_z^{-1}\!\left(\frac{4a(t)e_0(z,t) - a'(t)}{b(z,t)}\right), \qquad (25)$$

$$T = i\sqrt{\frac{3a(t)}{2}}\,\partial_x^2\partial_z + \frac{1}{2}\left(\partial_x^{-1}q_z\right)\partial_x^2 + q\,\partial_x\partial_z + \frac{i}{2\sqrt{6a(t)}\,b(z,t)}\left[x\,e_0(z,t) + e_1(z,t) + b(z,t)\,\partial_x^{-1}(q q_z) - 2b(z,t)\,\partial_x^{-1}\!\left(q_x\partial_x^{-1}q_z\right) - 3i\sqrt{\frac{3a(t)}{2}}\,b(z,t)\,q_z\right]\partial_x + f(z,t)\,\partial_z + \partial_t, \qquad (26)$$

with a constraint condition:

$$\frac{a'(t)}{a(t)}\,\partial_x^{-1}\!\left(\frac{4a(t)e_0(z,t) - a'(t)}{b(z,t)}\right) - \frac{8a(t)e_0(z,t)f(z,t)}{3b(z,t)} - \frac{2}{3}\,\partial_z^{-1}\!\left[\frac{4e_0(z,t)a'(t) - a''(t) + 4a(t)e_{0,t}(z,t)}{b(z,t)} - \frac{b_t(z,t)\,(4a(t)e_0(z,t) - a'(t))}{b(z,t)^2}\right] + \frac{2a'(t)f(z,t)}{3b(z,t)} = 0. \qquad (27)$$


Notice here that λ = λ(z, t) satisfies a non-isospectral condition:

\[ \lambda_t + \left[ f(z,t) - \frac{b(z,t)}{2a(t)}\,\partial_z^{-1}\!\left(\frac{4a(t)e_0(z,t) - a'(t)}{b(z,t)}\right) - 2i\sqrt{6a(t)}\,b(z,t)\,\lambda \right]\lambda_z = 0. \tag{28} \]

From (27), we obtain a relation,

\[ e_0(z,t) = \frac{a'(t)}{4a(t)}. \tag{29} \]
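A compact way to see this (our consistency check, not spelled out in the original): write W(z,t) for the recurring combination in (25)–(28). Since 4e_0 a' − a'' + 4a e_{0,t} = ∂_t(4a e_0 − a'), the constraint (27) collapses to

```latex
\frac{a'(t)}{a(t)}\,\partial_z^{-1}W \;-\; \frac{2}{3}\,f(z,t)\,W \;-\; \frac{2}{3}\,\partial_z^{-1}\big(\partial_t W\big) \;=\; 0,
\qquad W(z,t) := \frac{4a(t)e_0(z,t)-a'(t)}{b(z,t)},
```

which is satisfied identically by W ≡ 0, i.e. by (29).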

Using the above, (23) is rewritten as

\[ q_t + \frac{3}{2}a(t)b(z,t)\,q_{xxz} + b(z,t)\,q^2 q_z + \frac{1}{2}b(z,t)\,q_x\,\partial_x^{-1}(q^2)_z - \frac{a'(t)}{4a(t)}\,q + \frac{x\,a'(t)}{4a(t)}\,q_x + e_1(z,t)\,q_x + f(z,t)\,q_z = 0. \tag{30} \]

Note that (30) possesses both the Painlevé property and the Lax pair.

4. Reducibility to the canonical form

We show that (30) can be transformed to the standard modified CBS equation (2) by suitable transformations. As an example, we set the following expressions:

\[ X = x\,a(t)^{-1/4}, \qquad Z = \partial_z^{-1}\!\left(\frac{1}{b(z,t)}\right), \qquad T = \partial_t^{-1}a(t)^{1/2}, \]
\[ Q(X,Z,T) = a(t)^{-1/4}\,q(x,z,t), \qquad e_1(z,t) = 0, \qquad f(z,t) = -b(z,t)\,\partial_z^{-1}\!\left(\frac{1}{b(z,t)}\right)_t, \tag{31} \]

for (30). Via this change of the dependent and independent variables, (30) is transformed to the modified CBS equation for Q = Q(X, Z, T):

\[ Q_T + \frac{3}{2}Q_{XXZ} + Q^2 Q_Z + \frac{1}{2}Q_X\,\partial_X^{-1}(Q^2)_Z = 0, \]

of which N soliton solutions were given in [26].
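A worked step (ours, not in the original text) showing how the linear terms of (30) cancel under (31). Writing q = a(t)^{1/4} Q(X,Z,T) and using X_t = −(xa'/4a)a^{-1/4}, T_t = a^{1/2}, Z_t = ∂_z^{-1}(1/b)_t:

```latex
q_t = \underbrace{\tfrac{a'}{4a}\,a^{1/4}Q}_{\text{cancels } -\frac{a'}{4a}q}
    \;+\; a^{3/4}\,Q_T
    \;-\; \underbrace{\tfrac{x a'}{4a}\,Q_X}_{\text{cancels } \frac{x a'}{4a}q_x,\ \text{since } q_x = Q_X}
    \;+\; \underbrace{a^{1/4}\,Q_Z\,\partial_z^{-1}\!\big(\tfrac{1}{b}\big)_t}_{\text{cancels } f q_z \text{ with } f \text{ as in (31)}}
```

Dividing the surviving terms by a^{3/4} then yields the canonical form above.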

5. Concluding remarks

In this paper, we have presented a modified CBS equation with variable coefficients (30), which is integrable in the sense of the Painlevé test and the existence of the Lax pair. Moreover, we can construct its hierarchy by using the Lax-pair generating technique [27] for the operator L (25).

Let us note here that taking ∂_z = ∂_t as another dimensional reduction can respectively reduce (2) and (30) to the standard form and a variable-coefficient extension of the modified version of the Ablowitz-Kaup-Newell-Segur equation in (2 + 1) dimensions.

By applying the (weak) Painlevé test, we are studying higher-dimensional forms with variable coefficients of the nonlinear Schrödinger, Camassa-Holm and Degasperis-Procesi equations in (2 + 1) dimensions, and so on.

Acknowledgments

Many helpful discussions with Dr. T. Tsuchida, Dr. S. Tsujimoto, Professor X.-B. Hu and Professor Y. Nakamura are acknowledged. The authors wish to thank the anonymous referee for careful reading of this manuscript and valuable remarks.

References

[1] G. L. Lamb Jr., Elements of soliton theory, Wiley, New York, 1980.
[2] N. N. Akhmediev and A. Ankiewicz, Solitons: nonlinear pulses and beams, Chapman & Hall, London, 1997.
[3] E. Infeld and G. Rowlands, Nonlinear waves, solitons and chaos, 2nd ed., Cambridge Univ. Press, Cambridge, 2000.
[4] M. J. Ablowitz and H. Segur, Solitons and the inverse scattering transform, SIAM, Philadelphia, 1981.
[5] A. Jeffrey and T. Kawahara, Asymptotic methods of nonlinear wave theory, Pitman Advanced Publ., London, 1982.
[6] P. G. Drazin and R. S. Johnson, Solitons: an introduction, Cambridge Univ. Press, Cambridge, 1989.
[7] M. J. Ablowitz and P. A. Clarkson, Solitons, nonlinear evolution equations and inverse scattering, Cambridge Univ. Press, Cambridge, 1991.
[8] R. Hirota, The direct methods in soliton theory, Cambridge Univ. Press, Cambridge, 2004.
[9] P. D. Lax, Integrals of nonlinear equations of evolution and solitary waves, Comm. Pure Appl. Math., 21 (1968), 467–490.
[10] F. Calogero and A. Degasperis, Spectral transform and solitons I, Elsevier Science, Amsterdam, 1982.
[11] M. Blaszak, Multi-Hamiltonian theory of dynamical systems, Springer-Verlag, Berlin, 1998.
[12] A. Ramani, B. Dorizzi and B. Grammaticos, Painlevé conjecture revisited, Phys. Rev. Lett., 49 (1982), 1539–1541.
[13] J. Weiss, M. Tabor and G. Carnevale, The Painlevé property for partial differential equations, J. Math. Phys., 24 (1983), 522–526.
[14] J. D. Gibbon, P. Radmore, M. Tabor and D. Wood, Painlevé property and Hirota's method, Stud. Appl. Math., 72 (1985), 39–63.
[15] W. H. Steeb and N. Euler, Nonlinear evolution equations and Painlevé test, World Scientific, Singapore, 1989.
[16] A. Ramani, B. Grammaticos and T. Bountis, The Painlevé property and singularity analysis of integrable and non-integrable systems, Phys. Rep., 180 (1989), 159–245.
[17] A. R. Chowdhury, Painlevé analysis and its applications, Chapman & Hall, New York, 1999.
[18] R. Conte (Ed.), The Painlevé property one century later, Springer-Verlag, New York, 1999.
[19] T. Brugarino, Painlevé analysis and reducibility to the canonical form for the nonlinear generalized Schrödinger equation, Nuovo Cimento, 120 (2005), 423–429.
[20] T. Brugarino and M. Sciacca, Singularity analysis and integrability for a HNLS equation governing pulse propagation in a generic fiber optics, Opt. Commun., 262 (2006), 250–256.
[21] T. Kobayashi and K. Toda, The Painlevé test and reducibility to the canonical forms for higher-dimensional soliton equations with variable-coefficients, SIGMA, 2 (2006), 63–72.
[22] T. Brugarino and M. Sciacca, Integrability of an inhomogeneous nonlinear Schrödinger equation in Bose-Einstein condensates and fiber optics, J. Math. Phys., 51 (2010), 093503.
[23] O. I. Bogoyavlenskii, Breaking solitons. III, Math. USSR-Izv., 36 (1991), 129–137.
[24] Wolfram Research, Inc., Mathematica, Version 8.0, http://www.wolfram.com/mathematica/index.en.html.
[25] P. R. Gordoa and A. Pickering, Nonisospectral scattering problems: A key to integrable hierarchies, J. Math. Phys., 40 (1999), 5749–5786.
[26] S. Yu, K. Toda, N. Sasa and T. Fukuyama, N soliton solutions to the Bogoyavlenskii-Schiff equation and a quest for the soliton solution in (3 + 1) dimensions, J. Phys. A: Math. Gen., 31 (1998), 3337–3347.
[27] T. Kobayashi and K. Toda, A generalized KdV-family with variable coefficients in (2+1) dimensions, IEICE Trans. Fundamentals, E88-A (2005), 2548–2553.


JSIAM Letters Vol.3 (2011) pp.89–92 c⃝2011 Japan Society for Industrial and Applied Mathematics

A parallel algorithm for incremental orthogonalization

based on the compact WY representation

Yusaku Yamamoto1 and Yusuke Hirota1

1 Department of Computational Science, Graduate School of System Informatics, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe, 657-8501, Japan

E-mail yamamoto@cs.kobe-u.ac.jp

Received October 4, 2011, Accepted November 28, 2011

Abstract

We present a parallel algorithm for incremental orthogonalization, where the vectors to be orthogonalized are given one by one at each step. It is based on the compact WY representation and always produces vectors that are orthogonal to working accuracy. Moreover, it has large granularity and can be parallelized efficiently. When applied to the GMRES method, this algorithm reduces to a known algorithm by Walker. However, our formulation makes it possible to apply the algorithm to a wider class of incremental orthogonalization problems, as well as to analyze its accuracy theoretically. Numerical experiments demonstrate the accuracy and scalability of the algorithm.

Keywords incremental orthogonalization, Householder transformation, compact WY representation, Arnoldi process, parallel processing

Research Activity Group Algorithms for Matrix / Eigenvalue Problems and their Applications

1. Introduction

Let a_1, a_2, ..., a_m ∈ R^n (m ≤ n) be a set of m linearly independent vectors and q_1, q_2, ..., q_m be the vectors obtained by orthonormalizing them. We consider the situation where (i) a_i (2 ≤ i ≤ m) is not given in advance but is computed from q_1, q_2, ..., q_{i−1}, and (ii) q_i (1 ≤ i ≤ m) is obtained by orthogonalizing a_i against q_1, q_2, ..., q_{i−1} and normalizing the result. We call this type of orthogonalization process incremental orthogonalization. Incremental orthogonalization typically arises in the Arnoldi process for eigenvalue problems [1, 2], linear simultaneous equations (the GMRES method) [3] and the matrix exponential exp(A)x [4]. It also arises in computing eigenvectors of a symmetric tridiagonal matrix by inverse iteration [2] or the multiple relatively robust representations (MR3) [5] algorithm when the corresponding eigenvalues are clustered.

The most popular algorithm for incremental orthogonalization is the modified Gram-Schmidt (MGS) method. However, it is inherently sequential, because orthogonalization against q_k can be done only after orthogonalization against q_{k−1} has been completed. To parallelize the MGS method, one has to parallelize the innermost loops of one orthogonalization operation, a'_i = a_i − (a_i · q_k)q_k. This causes O(m^2) interprocessor synchronizations and degrades parallel performance. The MGS method also has the drawback that the deviation from orthogonality of q_1, q_2, ..., q_m increases proportionally with κ(A), the condition number of A ≡ [a_1, a_2, ..., a_m] [6]. Another approach is to repeat the classical Gram-Schmidt (CGS) method twice to orthogonalize a_i against q_1, q_2, ..., q_{i−1}. With this approach, orthogonalization against q_1, q_2, ..., q_{i−1} can be done in parallel and the number of interprocessor synchronizations is reduced to O(m). It is also shown that the deviation from orthogonality of q_1, q_2, ..., q_m is O(ϵ), where ϵ is the machine epsilon [7]. However, this approach is applicable only when the condition O(ϵκ(A)) < 1 is satisfied.

In [8], Walker proposes to use Householder transformations for incremental orthogonalization arising in the GMRES method. With this approach, there is no restriction on the condition number κ(A), and high orthogonality of q_1, q_2, ..., q_m is always guaranteed. Furthermore, Walker also proposes a blocked variant that aggregates multiple Householder transformations and performs the computation in the form of matrix-vector products. This variant is intended for parallel processing and requires only O(m) interprocessor synchronizations. However, for this variant, a theoretical analysis of the orthogonality of q_1, q_2, ..., q_m has not been given yet. Also, the performance of this variant on parallel computers has not been evaluated.

In this paper, we reformulate Walker's blocked algorithm using the compact WY representation [9] for Householder transformations. From our formulation, the orthogonality property of the algorithm follows immediately from that of the compact WY representation [6]. It also becomes clear that the algorithm can be applied not only to the GMRES method but also to incremental orthogonalization problems in general. We evaluate the accuracy and parallel performance of the algorithm through numerical experiments.

This paper is organized as follows: in Section 2, we briefly explain an algorithm for incremental orthogonalization using Householder transformations and introduce the compact WY representation. By combining them, we formulate an algorithm for incremental orthogonalization based on the compact WY representation. We discuss its numerical and computational properties, as well as its relationship with Walker's blocked algorithm. Experimental results, including numerical accuracy and parallel performance on a distributed-memory parallel computer, are presented in Section 3. Finally, Section 4 gives some concluding remarks.

2. A parallel algorithm for incremental orthogonalization

2.1 Incremental orthogonalization using Householder transformations

We begin with an algorithm for incremental orthogonalization using Householder transformations [2, 8]. The algorithm is shown as Algorithm 1. At the ith step of the algorithm, the vector a_i is constructed from q_1, q_2, ..., q_{i−1} and is orthogonalized against them. Here, e_i denotes the ith column of I, the identity matrix of order n, and House_i(x) is a function that computes a Householder transformation H_i = I − t_i y_i y_i^T that eliminates the (i + 1)th through the nth elements of x and leaves the 1st through the (i − 1)th elements intact.

[Algorithm 1: incremental orthogonalization using Householder transformations]
do i = 1, m
  Generate a_i from q_1, q_2, ..., q_{i−1}.
  a'_i = H_{i−1} ··· H_2 H_1 a_i
  H_i = House_i(a'_i)
  q_i = H_1 H_2 ··· H_i e_i
end do

Algorithm 1 is the same as the usual Householder QR decomposition [2], except that a_i and q_i are generated within the loop. The fact that q_i can be computed as above is readily confirmed if we note that q_i is the ith column of H_1 H_2 ··· H_m and H_j e_i = e_i for i + 1 ≤ j ≤ m.

The vectors q_1, q_2, ..., q_m computed by Algorithm 1 are orthogonal to working accuracy, since they are computed as the columns of H_1 H_2 ··· H_m, which is a product of Householder transformations (see [6] for numerical properties of Householder transformations). However, Algorithm 1 is inherently sequential, because multiple Householder transformations have to be applied one by one in the computation of a'_i and q_i.
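A minimal NumPy sketch of Algorithm 1 (our illustration, not the authors' FORTRAN code; house_i is an assumed helper implementing House_i(x)):

```python
import numpy as np

def house_i(x, i):
    """House_i(x): data (t, y) of H = I - t*y*y^T that zeroes the
    (i+1)th..nth elements of x and leaves the 1st..(i-1)th intact
    (i is 1-based, as in the paper)."""
    y = np.zeros_like(x)
    sigma = np.linalg.norm(x[i - 1:])
    alpha = -sigma if x[i - 1] >= 0 else sigma   # sign choice avoids cancellation
    y[i - 1] = x[i - 1] - alpha
    y[i:] = x[i:]
    nrm2 = y @ y
    return (2.0 / nrm2 if nrm2 > 0 else 0.0), y

def incremental_orth(A):
    """Algorithm 1: orthonormalize the columns of A one at a time,
    applying the stored reflectors one by one (the sequential part)."""
    n, m = A.shape
    ts, ys = [], []
    Q = np.zeros((n, m))
    for i in range(1, m + 1):
        a = A[:, i - 1].copy()              # "generate a_i" (here simply given)
        for t, y in zip(ts, ys):            # a'_i = H_{i-1} ... H_2 H_1 a_i
            a -= t * y * (y @ a)
        t, y = house_i(a, i)
        ts.append(t); ys.append(y)
        q = np.zeros(n); q[i - 1] = 1.0     # q_i = H_1 H_2 ... H_i e_i
        for t, y in zip(reversed(ts), reversed(ys)):
            q -= t * y * (y @ q)
        Q[:, i - 1] = q
    return Q
```

Because each q_i is a column of an (up to roundoff) exactly orthogonal product of reflectors, Q stays orthonormal even for ill-conditioned inputs.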

2.2 Compact WY representation

Given multiple Householder transformations H_k = I − t_k y_k y_k^T (1 ≤ k ≤ i), we can aggregate them using a technique called the compact WY representation [9]. Let Y_1 = [y_1] and T_1 = [t_1], and define an n × k matrix Y_k and a k × k lower triangular matrix T_k by the following recursion formulae:

\[ Y_k = [\,Y_{k-1} \;\; y_k\,], \tag{1} \]

\[ T_k = \begin{bmatrix} T_{k-1} & 0 \\ -t_k y_k^T Y_{k-1} T_{k-1} & t_k \end{bmatrix}. \tag{2} \]

Then the product H_i ··· H_2 H_1 can be represented as follows:

\[ H_i \cdots H_2 H_1 = I - Y_i T_i Y_i^T. \tag{3} \]

This is called the compact WY representation of H_i, ..., H_2, H_1. Using the compact WY representation, application of H_i ··· H_2 H_1 or H_1 H_2 ··· H_i to a vector can be computed as matrix-vector multiplications. This greatly enhances parallelism.

It is known that the compact WY representation has the same level of numerical stability as the usual Householder transformation [6]. Below, we summarize some of the numerical properties of the compact WY representation given in [6, Section 18.4] as two theorems. Note that although the WY representation treated in [6] is of a non-compact type, it is stated there that the same conclusions apply to the compact WY representation as well.
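The recursion (1)–(2) and identity (3) can be checked numerically. The sketch below (ours, with arbitrary reflector vectors) builds Y_i and T_i by the recursion and compares I − Y_i T_i Y_i^T against the explicitly formed product H_i ··· H_2 H_1:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 4

# Householder reflectors H_k = I - t_k y_k y_k^T with t_k = 2/||y_k||^2.
ys = [rng.standard_normal(n) for _ in range(m)]
ts = [2.0 / (y @ y) for y in ys]

# Compact WY recursion (1)-(2).
Y = ys[0].reshape(n, 1)
T = np.array([[ts[0]]])
for k in range(1, m):
    y = ys[k].reshape(n, 1)
    row = -ts[k] * (y.T @ Y) @ T          # new last row of the lower-triangular T
    T = np.block([[T, np.zeros((k, 1))],
                  [row, np.array([[ts[k]]])]])
    Y = np.hstack([Y, y])

# Identity (3): H_m ... H_2 H_1 = I - Y T Y^T.
P = np.eye(n)
for y, t in zip(ys, ts):
    H = np.eye(n) - t * np.outer(y, y)
    P = H @ P                             # left-multiply: builds H_m ... H_1
print(np.allclose(P, np.eye(n) - Y @ T @ Y.T))
```

Only a single tall-skinny Y and a tiny triangular T need to be stored, which is what makes the representation "compact".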

Theorem 1 Let Y_i and T_i be the matrices computed by (1) and (2) using finite precision arithmetic and let Q_i = I − Y_i T_i^T Y_i^T. Then

\[ \| Q_i^T Q_i - I \|_2 \le d_1(i, n)\,\epsilon \tag{4} \]

for some positive constant d_1(i, n) that depends only on i and n.

From this theorem, it follows that the deviation from orthogonality of the computed q_1, q_2, ..., q_i is always O(ϵ), regardless of the condition number of [a_1, a_2, ..., a_i].

The next theorem concerns application of the compact WY representation to a matrix.

Theorem 2 Let B ∈ R^{n×l} and let C be the matrix obtained by applying I − Y_i T_i Y_i^T to B using finite precision arithmetic. Then there exists ΔC ∈ R^{n×l} such that

\[ C = U_i B + \Delta C = U_i \big( B + U_i^T \Delta C \big), \tag{5} \]

\[ \|\Delta C\|_2 \le \big[ 1 + d_1(i,n) + d_2(i,n)\,d_3(i,n)\big( 1 + c_1(i,n,l) + c_1(n,i,l) \big) \big]\,\epsilon\,\|B\|_2 + O(\epsilon^2), \tag{6} \]

where U_i is the product of H_i, ..., H_2, H_1 computed with exact arithmetic, d_1, d_2, d_3 are positive constants that depend only on i and n, and c_1 is a positive constant that depends only on i, n and l.

Theorem 2 implies that the compact WY representation is backward stable.

2.3 An algorithm for incremental orthogonalization based on the compact WY representation

We can rewrite Algorithm 1 using the compact WY representation. The resulting algorithm is shown as Algorithm 2.

[Algorithm 2: incremental orthogonalization based on the compact WY representation]
do i = 1, m
  Generate a_i from q_1, q_2, ..., q_{i−1}.
  a'_i = (I − Y_{i−1} T_{i−1} Y_{i−1}^T) a_i
  (t_i, y_i) = House_i(a'_i)
  Y_i = [Y_{i−1}  y_i]
  T_i = [ T_{i−1}                        0
          −t_i y_i^T Y_{i−1} T_{i−1}     t_i ]
  q_i = (I − Y_i T_i^T Y_i^T) e_i
end do
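Algorithm 2 can be sketched as follows in NumPy (our serial illustration; the paper's implementation is FORTRAN/MPI, and house_i is a local helper standing in for House_i):

```python
import numpy as np

def house_i(x, i):
    """House_i as in Algorithm 1: (t_i, y_i) with H_i = I - t_i y_i y_i^T."""
    y = np.zeros_like(x)
    sigma = np.linalg.norm(x[i - 1:])
    alpha = -sigma if x[i - 1] >= 0 else sigma
    y[i - 1] = x[i - 1] - alpha
    y[i:] = x[i:]
    nrm2 = y @ y
    return (2.0 / nrm2 if nrm2 > 0 else 0.0), y

def cwy_incremental_orth(A):
    """Algorithm 2: every application of the aggregated reflectors is a
    pair of tall-skinny matrix-vector products with Y (parallel-friendly)."""
    n, m = A.shape
    Q = np.zeros((n, m))
    Y = np.zeros((n, 0))
    T = np.zeros((0, 0))
    for i in range(1, m + 1):
        a = A[:, i - 1]                          # "generate a_i"
        ap = a - Y @ (T @ (Y.T @ a))             # a'_i = (I - Y T Y^T) a_i
        t, y = house_i(ap, i)
        if i == 1:
            T = np.array([[t]])
        else:
            row = (-t * (y @ Y) @ T).reshape(1, -1)
            T = np.block([[T, np.zeros((i - 1, 1))],
                          [row, np.array([[t]])]])
        Y = np.hstack([Y, y.reshape(n, 1)])
        e = np.zeros(n); e[i - 1] = 1.0
        Q[:, i - 1] = e - Y @ (T.T @ (Y.T @ e))  # q_i = (I - Y T^T Y^T) e_i
    return Q
```

In the GMRES setting described below, a_1 = b and a_i = G q_{i−1} would replace the column reads of A.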

This is the algorithm we propose for incremental orthogonalization. In this algorithm, application of the Householder transformations H_{i−1}, ..., H_2, H_1 to a_i is performed as the matrix-vector multiplications a'_i = (I − Y_{i−1} T_{i−1} Y_{i−1}^T) a_i. Since each matrix-vector multiplication requires only one inter-processor synchronization, the number of synchronizations required to compute a'_i is only three, in contrast to the O(m) required in Algorithm 1. The same is true of the computation of q_i. Thus the parallel granularity of Algorithm 2 is O(m) times larger than that of Algorithm 1.

When a_i is computed as

\[ a_1 = b, \tag{7} \]
\[ a_i = G q_{i-1} \quad (i = 2, 3, \ldots) \tag{8} \]

for some G ∈ R^{n×n} and b ∈ R^n, Algorithm 2 computes an orthonormal basis of the Krylov subspace K_m(G; b). Hence it can be used, for example, in the GMRES algorithm for solving linear equations in place of the modified Gram-Schmidt method. In fact, the combination of the GMRES algorithm with Algorithm 2 leads to Walker's blocked Householder GMRES algorithm [8]. However, from our formulation it is evident that Algorithm 2 can be applied not only to the GMRES algorithm but also to incremental orthogonalization problems in general.

From Algorithm 2, it is clear that q_i (1 ≤ i ≤ m) is the ith column of the compact WY representation I − Y_m T_m^T Y_m^T. Thus we can conclude from Theorem 1 that the vectors q_1, q_2, ..., q_m are always orthogonal to working accuracy. On the other hand, Walker states that he has no proof of the numerical superiority of his blocked method over another parallelizable method, namely the classical Gram-Schmidt [8].

We can also discuss the backward stability of Algorithm 2 based on Theorem 2. Let R be the n × m upper triangular matrix whose (i, j)th element is the ith element of (I − t_j y_j y_j^T) a'_j. Then R is the upper triangular factor of the QR decomposition of A = [a_1, a_2, ..., a_m]. Using Theorem 2, it is easy to see that there exists ΔA ∈ R^{n×m} such that

\[ A + \Delta A = U_m R, \qquad \|\Delta A\|_2 \le d_4(m,n)\,\epsilon\,\|A\|_2, \tag{9} \]

where U_m is defined in Theorem 2 and d_4(m, n) is a positive constant that depends only on m and n. The proof is almost the same as the proof of backward stability of the Householder QR decomposition; see [6, Lemma 18.3] for the latter. Eq. (9) shows that Algorithm 2 is backward stable, as is the non-blocked algorithm (Algorithm 1).

Next, we count the number of operations required to perform Algorithm 2. The numbers of operations to compute a'_i, T_i and q_i are 4in, 2in and 2in, respectively, if we assume m ≪ n and retain only the highest-order terms. Note that Y_i^T e_i in the expression of q_i requires no computation; we only need to extract the ith row of Y_i. By summing up these numbers over i, we see that the operation count of Algorithm 2 is about 4m^2 n. This is the same as the operation count of the original Householder-based method, Algorithm 1.
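Summing these per-step counts gives the stated total (assuming m ≪ n):

```latex
\sum_{i=1}^{m}\big(\underbrace{4in}_{a_i'} + \underbrace{2in}_{T_i} + \underbrace{2in}_{q_i}\big)
 \;=\; 8n\sum_{i=1}^{m} i \;\approx\; 8n\cdot\frac{m^2}{2} \;=\; 4m^2 n .
```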

2.4 Comparison with other methods

In Table 1, we show a comparison of the incremental orthogonalization algorithm based on the compact WY representation with other algorithms. Here, CGS2 is the method that repeats the classical Gram-Schmidt orthogonalization twice to improve orthogonality [7]. House and cWY stand for the Householder-based method and the proposed method, respectively. The rows labeled Synchronizations and Granularity show, respectively, the number of inter-processor synchronizations and the parallel granularity, that is, the number of arithmetic operations that can be performed by a processor between two synchronization points. P denotes the number of processors. The rows labeled Orthogonality and Condition show, respectively, theoretical bounds on ‖Q^T Q − I‖, where Q = [q_1, ..., q_m], and the condition (if any) that must be satisfied for the method to be applicable. The matrix A is defined as A = [a_1, a_2, ..., a_m]. The results for MGS, CGS2 and House are taken from [10].

Table 1. Comparison of algorithms for incremental orthogonalization.

                     MGS        CGS2           House     cWY
  Work               2m^2 n     4m^2 n         4m^2 n    4m^2 n
  Synchronizations   O(m^2)     O(m)           O(m^2)    O(m)
  Granularity        O(n/P)     O(mn/P)        O(n/P)    O(mn/P)
  Orthogonality      O(ϵκ(A))   O(ϵ)           O(ϵ)      O(ϵ)
  Condition          −          O(ϵκ(A)) < 1   −         −

From the table, we can conclude that the method based on the compact WY representation is superior to CGS2 in terms of applicability, and to the Householder-based method in terms of parallel granularity.

3. Experimental results

We evaluated the performance and accuracy of the incremental orthogonalization algorithm based on the compact WY representation. The computational environment is a PC cluster with Intel Xeon processors, and we used up to 16 nodes. The program was written in FORTRAN and MPI and compiled with the PGI FORTRAN compiler. All calculations were done in double precision floating-point arithmetic.

3.1 Numerical accuracy

To evaluate the accuracy of Algorithm 2, we generated the vectors a_1, a_2, ..., a_m using correlated random numbers so that the condition number κ(A) of A = [a_1, a_2, ..., a_m] takes a specified value. We set n = 20,000 and m = 50, and varied κ(A) from 1 to 10^16. The matrix A was scaled so that its Frobenius norm is 1.

the orthogonality is measured by maxi,j |(QTQ − I)ij |,where Q = [q1, . . . ,qm]. Also, we plot the residualmaxi,j |(A−QR)ij |, where R is the upper triangular ma-trix defined in Section 2.3.It is clear from the graph that both orthogonality and

residual are independent of κ(A) and are of the orderof machine epsilon. This is in consistent with the theo-retical prediction made in Section 2.3. The behavior oforthogonality and residual is almost the same for othervalues of n and m.


Fig. 1. Orthogonality and residual versus κ(A).

3.2 Parallel performance

In parallelizing Algorithm 2, we used block distribution to distribute each of the vectors a_i and q_i (1 ≤ i ≤ m) among the processors. Hence, if the number of processors is P, each processor is allocated sub-vectors of length n/P. To compute a matrix-vector product like Y_{i−1}^T a_i, the processors first calculate partial matrix-vector products using the data they own, and then sum up the partial results using MPI_Allreduce to get the full result. Calculations that involve only O(m) or O(m^2) operations, such as the product of T_{i−1} and Y_{i−1}^T a_i, are done redundantly on all the processors. This makes subsequent calculations easier.

Figs. 2 and 3 show the parallel performance of our program. In Fig. 2, m = 10 and n is varied from 20,000 to 80,000, while in Fig. 3, m = 50 and n is varied from 5,000 to 20,000. The horizontal axis is the number of processors and the vertical axis is the parallel performance measured in GFLOPS, where we assumed the number of operations to be 4m^2 n (see Table 1). It can be seen that our program achieves reasonable speedup, especially when n is large.
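The distribution scheme can be mimicked in a serial sketch (ours): chunks of rows play the role of the per-processor sub-vectors, and the final sum over partial results plays the role of MPI_Allreduce:

```python
import numpy as np

rng = np.random.default_rng(3)
n, i, P = 12, 3, 4                 # vector length, number of reflectors, "processors"
Y = rng.standard_normal((n, i))    # block-row-distributed in the real MPI code
a = rng.standard_normal(n)

# Each "processor" k owns rows k*n//P : (k+1)*n//P and forms a partial product.
partials = [Y[k*n//P:(k+1)*n//P].T @ a[k*n//P:(k+1)*n//P] for k in range(P)]

# One reduction (here: a plain sum, standing in for MPI_Allreduce) yields
# the full i-vector Y^T a on every node -- a single synchronization point.
w = np.sum(partials, axis=0)
assert np.allclose(w, Y.T @ a)

# The subsequent O(m^2) work such as T @ w is then done redundantly everywhere.
```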

4. Conclusion

In this paper, we presented an algorithm for incremental orthogonalization based on the compact WY representation. It requires the same amount of computational work as classical Gram-Schmidt with reorthogonalization or the Householder-based algorithm, but it is superior to the former in terms of applicability and to the latter in terms of parallel granularity. Numerical experiments on a PC cluster demonstrate the accuracy and scalability of the algorithm.

Acknowledgments

We are grateful to the anonymous referee and the editor, whose comments helped us to improve the quality of this paper. We would also like to thank the participants of the annual meeting of the Japan Society for Industrial and Applied Mathematics for valuable comments. This work is partially supported by the Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Scientific Research.

Fig. 2. Parallel performance (m = 10).

Fig. 3. Parallel performance (m = 50).

References

[1] W. Arnoldi, The principle of minimized iterations in the solution of the matrix eigenvalue problem, Quart. Appl. Math., 9 (1951), 17–29.
[2] G. Golub and C. van Loan, Matrix Computations, Johns Hopkins Univ. Press, Baltimore, 1996.
[3] Y. Saad and M. H. Schultz, GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 7 (1986), 856–869.
[4] E. Gallopoulos and Y. Saad, Efficient solution of parabolic equations by Krylov approximation methods, SIAM J. Sci. Stat. Comput., 13 (1992), 1236–1264.
[5] I. Dhillon and B. Parlett, Multiple representations to compute orthogonal eigenvectors of symmetric tridiagonal matrices, Linear Algebra Appl., 387 (2004), 1–28.
[6] N. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 2002.
[7] J. Daniel, W. Gragg, L. Kaufman and G. Stewart, Reorthogonalization and stable algorithms for updating the Gram-Schmidt QR factorization, Math. Comp., 30 (1976), 772–795.
[8] H. Walker, Implementation of the GMRES method using Householder transformations, SIAM J. Sci. Stat. Comput., 9 (1988), 152–163.
[9] R. Schreiber and C. van Loan, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Stat. Comput., 10 (1989), 53–57.
[10] J. Demmel, L. Grigori, M. Hoemmen and J. Langou, Communication-optimal parallel and sequential QR and LU factorizations, LAPACK Working Notes, No. 204, 2008.


JSIAM Letters Vol.3 (2011) pp.93–96 c⃝2011 Japan Society for Industrial and Applied Mathematics

Analysis of downgrade risk in credit portfolios

with self-exciting intensity model

Suguru Yamanaka1, Masaaki Sugihara2 and Hidetoshi Nakagawa3

1 Mitsubishi UFJ Trust Investment Technology Institute Co., Ltd., 4-2-6 Akasaka, Minato-ku, Tokyo 107-0052, Japan

2 Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan

3 Graduate School of International Corporate Strategy, Hitotsubashi University, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8439, Japan

E-mail yamanaka@mtec-institute.co.jp

Received August 19, 2011, Accepted November 25, 2011

Abstract

We present an intensity-based credit rating migration model and execute empirical analyses on forecasting the number of downgrades in some credit portfolios. The framework of the model is based on the so-called top-down approach. We first model the economy-wide rating migration intensity with a self-exciting stochastic process. Next, we characterize the downgrade intensity for an underlying sub-portfolio with a thinning model specified by the distribution of credit ratings in the sub-portfolio. The results of the empirical analyses indicate that the model is to some extent consistent with downgrade data of Japanese firms in a sample period.

Keywords credit risk, rating migration, self-exciting intensity

Research Activity Group Mathematical Finance

1. Introduction

In credit portfolio risk management, we quantify credit risks with some model of credit event occurrences, such as credit rating migrations and defaults. In this paper, we introduce an intensity-based credit rating migration model for risk analyses of credit portfolios and perform statistical tests for model validation with credit migration samples of Japanese firms.

Our modeling framework is based on the top-down approach studied in [1, 2]. Namely, our model consists of two parts, a top-part and a down-part. In the top-part, we model rating migration in the whole economy with event intensities. In this paper, we use a self-exciting process for the intensity model, where the term "self-exciting" means that the intensity increases when an event occurs. Several self-exciting type intensity models have recently been used in credit risk modeling to capture credit event clusters (see [2–5]). Clustering is a well-known feature of credit events. For example, the historical data of the monthly downgrade numbers in Fig. 1 show that there are downgrade clusters from 1998 to 2000, from 2001 to 2003 and from 2008 to 2010. Specifically, we use the self-exciting model proposed in [5].

In the down-part, we obtain intensity models of sub-portfolios with a thinning model. Our thinning model is specified by factors which represent characteristics of sub-portfolios. We adopt the rating distribution as one of the factors to take into account the size of credit portfolios, in common with the thinning model of [5].

To check the adequacy of the model, we perform empirical analyses on downgrade forecasting. First, we specify our model by the maximum likelihood approach and perform a statistical test of the in-sample fit. Second, we perform a statistical test of the out-of-sample forecast with the fitted model. Specifically, with the estimated intensity model and thinning model, we derive the distribution of the number of downgrades in a reference bond portfolio underlying a collateralized bond obligation, and test the validity of the distribution against the realized number of downgrades.

Fig. 1. The monthly number of downgrades in Japan announced by R&I.

The organization of this paper is as follows. Section 2 provides a rating migration model for credit portfolios. Section 3 shows empirical analyses on downgrades. Section 4 gives some concluding remarks.

2. Model

In this section, we introduce an intensity model of economy-wide rating migrations. In addition, we specify the intensities of rating migration in sub-portfolios by a thinning model.

2.1 Intensity model for economy-wide events

We model the uncertainty in the economy by a filtered complete probability space (Ω, F, P, {F_t}), where P is the actual probability measure and {F_t} is a right-continuous and complete filtration. For each type of credit event, consider an increasing sequence of totally inaccessible F_t-stopping times 0 < T_1 < T_2 < ···, which represents the ordered event times in the whole economy. We denote the counting process of the event by

\[ N_t = \sum_{n \ge 1} 1_{\{T_n \le t\}}. \]

Suppose N_t has an intensity process λ_t; namely, λ_t is an F_t-progressively measurable non-negative process, and the process N_t − ∫_0^t λ_s ds is an F_t-local martingale.

Let λ_t be the self-exciting stochastic process:

\[ d\lambda_t = \kappa_t (c_t - \lambda_t)\,dt + dJ_t, \]
\[ J_t = \sum_{n \ge 1} \min\big(\delta \lambda_{T_n-},\, \gamma\big)\, 1_{\{T_n \le t\}}, \]
\[ \kappa_t = \kappa\,\lambda_{T_{N_t}}, \qquad c_t = c\,\lambda_{T_{N_t}}, \]

where λ_{t−} := lim_{s↑t} λ_s and the constants κ > 0, c ∈ (0, 1), δ > 0, γ ≥ 0, λ_0 > 0 are parameters.
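A minimal simulation sketch of this intensity (ours; the parameter values are hypothetical, not the authors' estimates). Between events, λ_t decays deterministically from λ* := λ_{T_{N_t}} toward cλ* at rate κλ*, so λ* bounds the intensity on each inter-event interval and Ogata's thinning method applies directly:

```python
import numpy as np

def simulate_self_exciting(lam0, kappa, c, delta, gamma, t_max, seed=0):
    """Ogata-thinning simulation of the self-exciting intensity:
    between events lam(t) = c*lam_star + (1-c)*lam_star*exp(-kappa*lam_star*dt),
    where lam_star is the (post-jump) intensity at the last event (lam0 before
    any event); at each event lam jumps by min(delta*lam_-, gamma)."""
    rng = np.random.default_rng(seed)
    t, lam_star, events = 0.0, lam0, []
    while True:
        # lam_star dominates the decreasing intensity on the current interval,
        # so propose candidate times at rate lam_star and thin.
        t += rng.exponential(1.0 / lam_star)
        if t >= t_max:
            return np.array(events)
        dt = t - (events[-1] if events else 0.0)   # time since last event
        lam = c * lam_star + (1 - c) * lam_star * np.exp(-kappa * lam_star * dt)
        if rng.uniform() * lam_star <= lam:        # accept with prob lam/lam_star
            lam_star = lam + min(delta * lam, gamma)   # jump at the event
            events.append(t)
```

Since c ∈ (0, 1) and the jump size is capped by γ, the simulated intensity cannot explode, mirroring the stability of the model.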

2.2 Thinning model

In the down-part, we decompose the economy-wide event intensity into sub-portfolio event intensities with a thinning model based on rating distributions.

Suppose each firm in the economy is associated with a credit rating. There are K credit ratings, denoted 1, 2, ..., K in order of credit quality. Let S_i^k denote the set of k-rated firms in portfolio S_i (i = 1, 2, ..., I, k = 1, 2, ..., K). At each time, each rated firm belongs to one of the sub-portfolios S_i^k. Let N_t^i(k) be the counting process of credit events in sub-portfolio S_i^k, given by

\[ N_t^i(k) = \sum_{n \ge 1} 1_{\{T_n \le t\} \cap \{T_n \in \tau(S_i^k)\}}, \]

where τ(S) denotes the set of event times in portfolio S.

To obtain the intensity of the counting process N_t^i(k), we introduce an F_t-adapted process Z_t^i(k). Z_t^i(k) represents the conditional probability that a credit event is an event in the sub-portfolio S_i^k, given that an event occurs in the economy. Z_t^i(k) satisfies the following properties: (a) Z_t^i(k) takes values in the unit interval [0, 1]; (b) Σ_{i,k} Z_t^i(k) = 1. From [4, Proposition 2.1], we obtain the intensity associated with the counting process N_t^i(k) as follows:

\[ \lambda_t^i(k) = Z_t^i(k)\,\lambda_t. \]

To analyze credit risk in sub-portfolios, we introduce a thinning model characterized by the distribution of credit ratings in the sub-portfolios. In particular, we specify the thinning model for the downgrade intensity as follows:

Z^i_t(k) = ζ^i \bar{Z}^i_t(k),   (1)

where

\bar{Z}^i_t(k) = \frac{X^i_t(k)}{\sum_{k=1}^{K−1} X^*_t(k)} 1_{\{\sum_{k=1}^{K−1} X^*_t(k) > 0\}},   (2)

and the quantity \bar{Z}^i_t(k) denotes the rating distribution of the portfolio. X^i_t(k) denotes the number of firms in the portfolio S^k_i at time t, and X^*_t(k) denotes the number of k-rated firms in the whole economy at time t. The denominator in the thinning model (2) represents the number of firms with downgrade possibility. The quotient in (2) is taken to be 0 when the denominator vanishes. The quantity ζ^i represents the portfolio characteristics that the rating distribution of the portfolio cannot capture. While we use the thinning model (1) with two factors, additional factors can be considered in the thinning model to obtain more specific sub-portfolio intensities.
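The thinning weight (1)–(2) is a simple ratio of counts. A minimal sketch of it in Python (function and argument names are ours, not the paper's):

```python
def downgrade_thinning_weight(X_portfolio_k, X_econ, zeta=1.0):
    """Thinning weight zeta^i * X^i_t(k) / sum_{k<K} X*_t(k), as in (1)-(2).

    X_portfolio_k: number of k-rated firms in the sub-portfolio, X^i_t(k).
    X_econ: economy-wide counts [X*_t(1), ..., X*_t(K)], ordered by credit quality.
    The lowest rating K cannot be downgraded, so it is excluded from the denominator.
    """
    eligible = sum(X_econ[:-1])     # firms with downgrade possibility (k = 1..K-1)
    if eligible == 0:               # indicator in (2): quotient is 0 when denominator vanishes
        return 0.0
    return zeta * X_portfolio_k / eligible
```

Multiplying the returned weight by the economy-wide intensity λ_t then gives the sub-portfolio downgrade intensity λ^i_t(k).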

3. Empirical analyses

In this section, we estimate the intensity model with the downgrade samples of Japanese firms. Then, we estimate the thinning model for a reference portfolio underlying the collateralized bond obligation called J-Bond Limited. In addition, we perform validation tests on in-sample fit and out-of-sample downgrade forecasts. Specifically, we first divide the downgrade samples into a first half period and a second half period. Next, we estimate the model with the first half period and perform statistical tests on the fit. Then, we derive the downgrade distribution in the second period with the model, and compare it with the realized downgrades in the second period. As our validation test is a statistical one, the results indicate only whether the model is rejected or not. In other words, the testing methods in this paper do not necessarily give active support for model validity.

3.1 Data

The data for parameter estimation are the sample records of rating changes of Japanese firms from April 1, 1999 to March 31, 2004. The ratings are announced by Rating and Investment Information, Inc. (R&I). During the sample period of 1243 working days, there are 509 downgrades and 55 upgrades. We focus on downgrades, because the number of upgrades is too small to estimate an upgrade intensity and to discuss model adequacy. Excluding non-working days, we transformed the calendar times April 1, 1999, April 1, 2000, . . . to t = 0, 1, . . . . Since many events fall on the same day, we slide the event times by uniform random numbers so as to make all event times distinct. We employ the reference credit portfolio underlying J-Bond Limited as the target sub-portfolio (the corresponding index number is i = 1). J-Bond Limited is a collateralized bond obligation whose reference portfolio consists of 67 corporate bonds. J-Bond Limited was issued in 1999, and the redemption dates of its tranches ranged from 2002 to 2003. The details of J-Bond Limited are described in [6].

JSIAM Letters Vol. 3 (2011) pp.93–96 Suguru Yamanaka et al.

For testing in-sample fit and out-of-sample forecasts, we

divide the samples into the first half period, from April 1, 1999 to September 27, 2001 ([0, 2.5)), and the second half period, from September 28, 2001 to March 31, 2004 ([2.5, 5.0]). The downgrade samples in the first half period are used for estimating the models and testing in-sample fit. The downgrade samples in the second half period are used for testing out-of-sample forecasts.

3.2 Estimation procedure

For estimating the event intensity models, we apply the maximum likelihood method performed in [3]. Suppose that we have event time samples 0 < T_1 < T_2 < · · · < T_N (≤ H). Then the log-likelihood function of the intensity is as follows:

\sum_{n=1}^{N} \log λ_{T_n−} − \int_0^H λ_s ds.   (3)
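Under the intensity dynamics of Section 2.1, λ is exponential-in-time between events, so both terms of the log-likelihood (3) can be evaluated in closed form. The sketch below is our own illustration (names are ours); it could then be handed to a numerical optimizer such as scipy.optimize.minimize, the analogue of R's optim used by the authors.

```python
import numpy as np

def intensity_log_likelihood(times, H, kappa, c, delta, gamma, lam0):
    """Evaluate (3) for the Section 2.1 intensity.

    Between events, lambda(s) = c_bar + (lam - c_bar) * exp(-k_bar * (s - t_prev)),
    so the compensator integral over an interval of length w has the closed form
    c_bar*w + (lam - c_bar)*(1 - exp(-k_bar*w))/k_bar.
    """
    ll, t_prev, lam, lam_event = 0.0, 0.0, lam0, lam0
    for t_n in times:
        k_bar, c_bar = kappa * lam_event, c * lam_event
        w = t_n - t_prev
        lam_minus = c_bar + (lam - c_bar) * np.exp(-k_bar * w)        # lambda_{T_n-}
        ll += np.log(lam_minus)
        ll -= c_bar * w + (lam - c_bar) * (1.0 - np.exp(-k_bar * w)) / k_bar
        lam = lam_minus + min(delta * lam_minus, gamma)               # jump at T_n
        lam_event, t_prev = lam, t_n
    k_bar, c_bar = kappa * lam_event, c * lam_event                   # tail piece (T_N, H]
    w = H - t_prev
    ll -= c_bar * w + (lam - c_bar) * (1.0 - np.exp(-k_bar * w)) / k_bar
    return ll
```

As a sanity check, taking κ ≈ 0 and δ = γ = 0 freezes λ at λ_0, and (3) collapses to the homogeneous-Poisson log-likelihood N·log λ_0 − λ_0·H.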

We specify the parameters that maximize (3).

To test the validity of the estimated intensity model against the data, we apply the Kolmogorov-Smirnov test that [3] performed, as follows. First, we transform the event times \{T_n\}_{n=1}^{N} into \{A_n\} by

A_n := \int_0^{T_n} λ_s ds.

We perform the Kolmogorov-Smirnov test using the fact that \{A_n\}_{n=1}^{N} will be the jump times of a standard Poisson process when \{T_n\}_{n=1}^{N} are generated by λ_t. Thus, the null hypothesis is that \{A_{n+1} − A_n\}_{n=1}^{N−1} are independent and exponentially distributed (with parameter 1).

We performed the maximum likelihood estimation of the parameters with the free statistical software package R. Specifically, we used the intrinsic function "optim" to maximize the objective function. We performed the maximization for 30 sets of initial values, and finally chose the estimates that maximize the objective function among the initial value sets. In addition, we performed the Kolmogorov-Smirnov test with R, using the intrinsic function "ks.test".

The log-likelihood function of the thinning models is as follows:

\log(L(ζ^i | H^i_t)) = \sum_{n: 1_i(T_n)=1} \log(Z^i_{T_n}(k)) + \sum_{n: 1_i(T_n)=0} \log(1 − Z^i_{T_n}(k)),

where H^i_t = (T_n, 1_i(T_n))_{n ≤ N_t} and

1_i(T_n) = 1 (T_n ∈ τ(S_i)),  0 (T_n ∉ τ(S_i)).
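The thinning log-likelihood is a sum of Bernoulli log-terms over the events and can be coded directly. A minimal sketch (our own naming), taking the per-event indicator and the rating-distribution weight \bar{Z}^i_{T_n}(k) as inputs:

```python
import math

def thinning_log_likelihood(zeta, history):
    """Log-likelihood of zeta^i given the event history.

    history: iterable of (indicator, zbar) pairs, one per economy-wide event
    T_n, where indicator is 1 if the event fell in portfolio S_i and zbar is
    the rating-distribution weight at T_n.  Z^i = zeta * zbar, as in (1).
    """
    ll = 0.0
    for indicator, zbar in history:
        p = zeta * zbar
        ll += math.log(p) if indicator else math.log(1.0 - p)
    return ll
```

Maximizing this one-dimensional function over ζ^i (with a scalar optimizer, as with R's optim) yields the estimate reported below.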

We also performed the maximum likelihood estimation of these parameters with R.
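The Kolmogorov-Smirnov step can be reproduced without R's ks.test by computing the one-sample KS statistic of the rescaled inter-event gaps against Exp(1) directly. This sketch is ours (the fitted model supplies the rescaled times A_n):

```python
import math

def ks_statistic_exp1(A):
    """KS distance between the gaps A_{n+1} - A_n of the time-rescaled event
    times A_n = int_0^{T_n} lambda_s ds and the Exp(1) distribution."""
    gaps = sorted(a2 - a1 for a1, a2 in zip(A, A[1:]))
    n = len(gaps)
    d = 0.0
    for i, x in enumerate(gaps):
        cdf = 1.0 - math.exp(-x)                      # Exp(1) CDF
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d
```

Under the null hypothesis, D·√N above roughly 1.36 rejects at the 5% level (the usual KS asymptotic critical value).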

3.3 Testing in-sample fit

Table 1 shows the estimation results for the intensity model obtained from the downgrade samples in the first period. For estimation tractability, we restricted the value of

Table 1. Maximum likelihood estimates of the downgrade intensity model (data: April 1, 1999–September 27, 2001). Values in parentheses are standard estimation errors.

  κ        c        δ        γ     λ_0
  0.937    0.200    2.706    150   63.885
  (0.115)  (0.015)  (0.101)  (-)   (60.368)

Table 2. Average number and maximum of downgrades obtained by the model, and the realized downgrade number, in the first span. "Percentile" is the percentile of the realized downgrade number in the model distribution. "Complement" means the complement portfolio of J-Bond Limited.

                      Economy    J-Bond Limited   Complement
  Model   Average     270.066    31.848           238.218
          Max         456        68               399
  Realized number     267        31               236
  (Percentile)        (46.03%)   (46.34%)         (56.68%)
  P-value             0.865      1.000            0.919

γ to γ ∈ {100, 125, 150, 175, 200}. With the Kolmogorov-Smirnov test on the in-samples, we obtained a P-value of 0.694, indicating that the intensity model is not rejected at standard significance levels. Also, we obtained the parameter value of the thinning model for J-Bond Limited as ζ^1 = 1.27. As the value of the parameter ζ^1 exceeds 1, the downgrade frequency of the J-Bond Limited reference portfolio is higher than the rating distribution indicates.

Table 2 shows the results of the in-sample fit on the downgrade number, namely, a comparison of the distribution of downgrades obtained by the model and the realized downgrade number in the first period. In particular, we focus on downgrades in the whole portfolio of Japanese bond issuers with credit ratings (Economy), downgrades in the J-Bond Limited reference portfolio (J-Bond Limited), and downgrades in the complement portfolio of J-Bond Limited (Complement). To derive the distributions of downgrades, we performed Monte Carlo simulation with 100,000 scenarios. With the downgrade distribution, we performed a two-tailed test of the realized downgrade number and obtained the P-values in Table 2. Specifically, the P-values in Table 2 are the sums of the probabilities of the downgrade numbers whose probability is less than that of the realized downgrade number. The comparison of the average and realized downgrade numbers, the percentile of the realized downgrade number in the model distribution, and the P-values indicate that the model is consistent with the realized number of downgrades. Namely, the estimation of the whole model (both the top-part and the down-part) worked well with the first period samples.
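The two-tailed P-value described here (the total probability mass of outcomes no more likely than the realized count) is easy to reproduce from the simulated scenarios. A sketch with our own naming:

```python
from collections import Counter

def two_tailed_pvalue(simulated, realized):
    """P-value as used for Table 2: total probability mass of downgrade
    numbers that are no more probable than the realized one, under the
    Monte Carlo distribution given by `simulated` counts."""
    n = len(simulated)
    pmf = Counter(simulated)
    p_real = pmf.get(realized, 0) / n
    return sum(cnt for value, cnt in pmf.items() if cnt / n <= p_real) / n
```

A realized count at the mode of the simulated distribution gives a P-value of 1, while a count far in either tail gives a small P-value, matching the two-tailed reading of Table 2.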

3.4 Testing out-of-sample forecast

The results of the out-of-sample fit test, namely a comparison of the model estimated on the first-period data with the second-period data, are as follows. First, with the Kolmogorov-Smirnov test for out-of-sample fit, we obtained a P-value of 0.416, indicating that the intensity model is not rejected at standard significance levels. Table 3 shows the comparison of the model distribution of the downgrade number and the realized downgrade number. The P-values in Table 3 indicate that the whole model is not rejected at standard significance levels. Namely, the model is consistent with the out-of-


Table 3. Average number and maximum of downgrades obtained by the model, and the realized downgrade number, in the second span. "Percentile" is the percentile of the realized downgrade number in the model distribution. "Complement" means the complement portfolio of J-Bond Limited.

                      Economy    J-Bond Limited   Complement
  Model   Average     274.344    32.350           241.994
          Max         442        68               392
  Realized number     242        24               218
  (Percentile)        (19.33%)   (11.23%)         (23.60%)
  P-value             0.409      0.248            0.456

Table 4. Average number and maximum of downgrades in J-Bond Limited obtained by the model when the downgrade number in the complement portfolio is under the 95th percentile (298 downgrades) and over the 95th percentile.

            Downgrade number in the complement portfolio
            Under 298    Over 298
  Average   31.978       39.290
  Max       65           68

sample.

Now, we show one of the features of our model, namely, that the model can capture risk contagion among several portfolios. As we considered an economy-wide self-exciting intensity for the top-part model, the occurrence of an event in a sub-portfolio increases the possibility of event occurrence in the whole economy. That means the model captures event risk contagion among portfolios. In the following example, we see that the model captures the risk contagion from the complement portfolio to J-Bond Limited in the second period. Fig. 2 shows the conditional distributions of the downgrade number in J-Bond Limited, conditioned on whether the downgrade number in the complement is under the 95th percentile (298 downgrades) or over the 95th percentile. Table 4 shows the averages and the maximums of both distributions in Fig. 2. Table 4 and Fig. 2 indicate that as the downgrade risk in the complement portfolio increases, the downgrade risk of J-Bond Limited increases.
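The conditioning behind Table 4 can be computed from the same simulated scenarios; a minimal sketch (names are ours):

```python
def conditional_summary(jbond_counts, complement_counts, threshold):
    """Split simulated J-Bond downgrade counts by whether the complement
    portfolio's count in the same scenario is at/under or over the
    threshold (the 95th percentile, 298, in Table 4)."""
    under = [j for j, c in zip(jbond_counts, complement_counts) if c <= threshold]
    over = [j for j, c in zip(jbond_counts, complement_counts) if c > threshold]
    mean = lambda xs: sum(xs) / len(xs)
    return {"under": (mean(under), max(under)),
            "over": (mean(over), max(over))}
```

A higher conditional average in the "over" branch is exactly the contagion effect the self-exciting top-part intensity produces.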

4. Concluding remarks

We introduced an intensity-based rating migration model and performed goodness-of-fit tests. Our model consists of two parts: a self-exciting intensity model for economy-wide rating migrations and a thinning model based on rating distributions. For testing model adequacy, we used downgrade samples of Japanese firms from 1999 to 2004. We divided the sample period into first and second periods, estimated the models with the first period, and performed in-sample fit tests. The results of the fit tests indicate that the model estimation worked well. Also, the results of the out-of-sample downgrade forecast indicate that the model prediction is consistent with the out-of-sample downgrades.

The opinions expressed here are those of the authors and do not necessarily reflect the views or policies of their employers.

Fig. 2. Conditional distributions of the downgrade number in J-Bond Limited over 2.5 years (probability vs. event number). The solid line shows the conditional distribution given that the number of downgrades in the complement portfolio is under 298; the dashed line shows the conditional distribution given that it is over 298.

Acknowledgments

This work was supported in part by Grant-in-Aid for Scientific Research (A) No. 21243019 from the Japan Society for the Promotion of Science (JSPS) and the Global COE Program "The research and training center for new development in mathematics", MEXT, Japan.

References

[1] K. Giesecke, L. R. Goldberg and X. Ding, A top-down approach to multi-name credit, Oper. Res., 59 (2011), 283–300.
[2] H. Nakagawa, Modeling of contagious downgrades and its application to multi-downgrade protection, JSIAM Letters, 2 (2010), 65–68.
[3] H. Nakagawa, Analysis of records of credit rating transition with mutually exciting rating-change intensity model (in Japanese), Trans. JSIAM, 20 (2010), 183–202.
[4] K. Giesecke and B. Kim, Risk analysis of collateralized debt obligations, Oper. Res., 59 (2011), 32–49.
[5] S. Yamanaka, M. Sugihara and H. Nakagawa, Modeling of contagious credit events and risk analysis of credit portfolios, Asia-Pacific Financial Markets, in press.
[6] Rating and Investment Information, Inc., News release, No.99-C-410, 1999.


JSIAM Letters Vol.3 (2011) pp.97–100 ©2011 Japan Society for Industrial and Applied Mathematics

Automatic verification of anonymity of protocols

Hideki Sakurada1

1 NTT Communication Science Laboratories, NTT Corporation, 3-1 Morinosato Wakamiya, Atsugi-shi, Kanagawa, 243-0198 Japan

E-mail: sakurada.hideki@lab.ntt.co.jp

Received October 3, 2011, Accepted December 6, 2011

Abstract

Anonymity is an important security requirement for protocols such as voting schemes. It is often guaranteed by using anonymous channels such as mixnets. In this paper, we present a technique for automatically verifying the anonymity of protocols that use anonymous channels, by using Proverif, a tool for the automatic verification of security protocols. We use this technique to verify the voting scheme developed by Fujioka, Okamoto, and Ohta.

Keywords anonymity, protocol, verification, security, Proverif

Research Activity Group Formal Approach to Information Security

1. Introduction

Designing a security protocol is an error-prone task. There is a large body of work and many tools for finding errors in and verifying the security of protocols. Proverif [1, 2] is one such tool that can automatically verify various security properties, including anonymity.

Anonymity is an important security property for protocols such as voting schemes. It is often guaranteed by using anonymous channels such as mixnets [3]. The voting scheme developed by Fujioka, Okamoto, and Ohta (FOO) is one such protocol [4].

Kremer and Ryan [5] have used Proverif to verify various security properties of FOO, except for anonymity. Delaune, Ryan, and Smyth [6] have developed a technique for the automatic verification of anonymity and applied it to protocols including FOO.

In this paper, we develop a technique for the automatic verification of anonymity by using Proverif. Our technique is similar to that of Delaune, Ryan, and Smyth. While their technique enables us only to model anonymous channels for publishing data to the environment, our technique enables us to model channels for sending data to other participants. We also verify the anonymity of FOO by employing our technique.

2. Preliminaries

To describe protocols and their executions, we introduce the language used in Proverif and its semantics (see [1] for details). In this language, protocols are modeled as processes, and messages exchanged between the participants are modeled as terms. Table 1 summarizes the syntax for the terms and processes. Terms are subject to an equational theory Σ. Users of Proverif may extend it, for example, with the equation dec(enc(M, k), k) = M for modeling symmetric-key encryption. We write Σ ⊢ M = N if M and N are equal in Σ; otherwise we write Σ ⊬ M = N.

Table 1. Syntax for terms and processes.

M, N ::=                      terms
  x, y, z                     variables
  a, b, c, k, s               names
  f(M_1, . . . , M_n)         constructor application

D ::=                         term evaluations
  M                           term
  eval h(D_1, . . . , D_n)    function evaluation

P, Q, R ::=                   processes
  M⟨N⟩.P                      output
  M(x).P                      input
  0                           inactive process
  P | Q                       parallel composition
  !P                          replication
  (νc)P                       restriction
  let x = D in P else Q       term evaluation
  if M = N then P else Q      conditional

Intuitively, an execution of a process P is a sequence of either of the following rewriting steps:

• If P has both an output subprocess N⟨M⟩.Q and an input subprocess N′(x).R such that Σ ⊢ N = N′ holds, then the message M is transmitted over the channel N from the output subprocess to the input subprocess. These subprocesses are then replaced with Q and R{M/x} respectively, where {M/x} is the substitution that replaces x with M.

• If P has a conditional subprocess if M = N then Q else R, this process is rewritten into Q if Σ ⊢ M = N holds, and otherwise rewritten into R.

If more than one rewriting step is possible, one of them is chosen non-deterministically. For example, a process P = (νc)(c⟨M⟩.0 | c(x).R_1 | c(x).R_2) has two possible executions:

P → (νc)(0 | R_1{M/x} | c(x).R_2),

P → (νc)(0 | c(x).R_1 | R_2{M/x}).

Here 0 is the inactive process, which is often omitted.


JSIAM Letters Vol. 3 (2011) pp.97–100 Hideki Sakurada

M ⇓ M
eval h(D_1, . . . , D_n) ⇓ σN
    if h(N_1, . . . , N_n) → N ∈ def_Σ(h) and σ is such that for all i, D_i ⇓ M_i and Σ ⊢ M_i = σN_i

P | 0 ≡ P                                P ≡ P
P | Q ≡ Q | P                            Q ≡ P ⇒ P ≡ Q
(P | Q) | R ≡ P | (Q | R)                P ≡ Q, Q ≡ R ⇒ P ≡ R
(νa)(νb)P ≡ (νb)(νa)P                    P ≡ Q ⇒ P | R ≡ Q | R
(νa)(P | Q) ≡ P | (νa)Q if a ∉ fn(P)     P ≡ Q ⇒ (νa)P ≡ (νa)Q

if M = N then P else Q ≡ let x = eq(M, N) in P else Q

N⟨M⟩.Q | N′(x).P → Q | P{M/x} if Σ ⊢ N = N′              (Red I/O)
let x = D in P else Q → P{M/x} if D ⇓ M                   (Red Fun 1)
let x = D in P else Q → Q
    if there is no M such that D ⇓ M                      (Red Fun 2)
!P → P | !P                                               (Red Repl)
P → Q ⇒ P | R → Q | R                                     (Red Par)
P → Q ⇒ (νa)P → (νa)Q                                     (Red Res)
P′ ≡ P, P → Q, Q ≡ Q′ ⇒ P′ → Q′                           (Red ≡)

Table 2. Semantics for terms and processes.

We write this process P as R_1 + R_2 if the variable x occurs neither in R_1 nor in R_2.

The formal semantics for terms and processes is shown in Table 2. We refer readers to [1] for details.

Example 1 We specify a simple voting scheme V as a process as follows:

V = (νc1)(νc2)(c1⟨x1⟩ | c2⟨x2⟩ | MIX(c1, c2, cp)),

MIX = MIX0 +MIX1,

MIX0 = c1(y1).c2(y2).cp⟨(y1, y2)⟩,

MIX1 = c1(y1).c2(y2).cp⟨(y2, y1)⟩.

This scheme consists of two voter subprocesses c_1⟨x_1⟩ and c_2⟨x_2⟩ and a process MIX that models an anonymous channel. These processes communicate over private channels c_1 and c_2, and a public channel c_p. Intuitively, communications over the public channel c_p are visible to an attacker, while those over the private channels c_1 and c_2 are not. One of the possible executions of V is shown below:

V → (νc1)(νc2)(c1⟨x1⟩ | c2⟨x2⟩ | MIX0)

→ (νc2)(c2⟨x2⟩ | c2(y2).cp⟨(x1, y2)⟩)

→ cp⟨(x1, x2)⟩.

Here the subprocess MIX of V first reduces to MIX_0, and then the first and the second voters send their respective votes x_1 and x_2 to MIX_0 in order. If there is another process that runs in parallel with V, it may receive the pair (x_1, x_2) over the public channel c_p. Other executions are also possible: MIX may reduce to MIX_1, and the second voter may send a vote before the first voter.
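The intuition that an observer of c_p learns nothing about which voter cast which vote can be illustrated with a toy Monte Carlo model of MIX. This is only an illustration of the idea in ordinary Python, not of Proverif's process semantics; all names are ours.

```python
import random

def mix_run(x1, x2, rng):
    """Toy execution of V: MIX nondeterministically reduces to MIX0 or MIX1,
    so the pair of votes appears on the public channel in one of two orders."""
    return (x1, x2) if rng.random() < 0.5 else (x2, x1)

def observable_outputs(x1, x2, runs=1000, seed=0):
    """Set of pairs an attacker can observe on the public channel."""
    rng = random.Random(seed)
    return {mix_run(x1, x2, rng) for _ in range(runs)}
```

Both vote assignments produce the same set of observable outputs, which is the informal content of the observational equivalence discussed in the next section.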

N⟨M⟩.Q | N′(x).P → Q | P{M/x}                             (Red I/O)
    if Σ ⊢ fst(N) = fst(N′) and Σ ⊢ snd(N) = snd(N′)
let x = D in P else Q → P{diff[M_1, M_2]/x}               (Red Fun 1)
    if fst(D) ⇓ M_1 and snd(D) ⇓ M_2
let x = D in P else Q → Q                                 (Red Fun 2)
    if there is no M_1 such that fst(D) ⇓ M_1 and
    there is no M_2 such that snd(D) ⇓ M_2.

Table 3. Semantics for biprocesses.

3. Specifying anonymity

The anonymity of a protocol is specified in terms of observational equivalence between two instances of the protocol. Intuitively, two processes are observationally equivalent if and only if no attacker can distinguish between them by interacting with either of them. For example, the anonymity of the scheme V in Example 1 is specified by the observational equivalence

V{v_1/x_1}{v_2/x_2} ∼ V{v_2/x_1}{v_1/x_2}

between V{v_1/x_1}{v_2/x_2} and V{v_2/x_1}{v_1/x_2}. In V{v_1/x_1}{v_2/x_2}, the first and second voters vote for candidates v_1 and v_2 respectively. In V{v_2/x_1}{v_1/x_2}, the voters vote for v_2 and v_1 respectively. This anonymity holds because, intuitively, even if an attacker observes the pair (v_1, v_2) of votes on the public channel c_p, he does not know which process, MIX_0 or MIX_1, has sent it, and hence which voter has sent v_1. Formally, observational equivalence is defined as follows [1]:

Definition 2 An evaluation context C is a process that is built from a hole [ ], parallel compositions C | P and P | C with a process P, and restrictions (νa)C.

A process P emits on M (written P ↓_M) if and only if P ≡ C[M′⟨N⟩.R] for some evaluation context C that does not bind fn(M) and Σ ⊢ M = M′.

Observational equivalence ∼ is the largest symmetric relation R on closed processes such that P R Q implies:

• if P ↓M then Q ↓M ;

• if P → P ′ then Q→ Q′ and P ′ R Q′ for some Q′;

• C[P ] R C[Q] for all evaluation contexts C.

4. Automatic verification in Proverif

Proverif verifies a sufficient condition for the observational equivalence of the processes fst(P) and snd(P) for a given biprocess P. A biprocess is a process in which terms of the form diff[M_1, M_2] may occur. The processes fst(P) and snd(P) are obtained from a biprocess P by replacing each term of the form diff[M_1, M_2] with M_1 and M_2 respectively. The semantics for biprocesses is defined by replacing the rules (Red I/O), (Red Fun 1), and (Red Fun 2) with those in Table 3. For example, a biprocess if M = N then P else Q reduces to P if both equalities Σ ⊢ fst(M) = fst(N) and Σ ⊢ snd(M) = snd(N) hold. It reduces to Q if neither of the equalities holds. It reduces to no biprocess if exactly one of them holds. The sufficient condition is given in the following lemma:

Lemma 3 ([1, Theorem 1]) Let P_0 be a closed biprocess. Then fst(P_0) ∼ snd(P_0) if for any evaluation


context C that has no occurrence of diff and any reduction sequence C[P_0] →* P, fst(P) → Q_1 implies that P → Q for some biprocess Q with fst(Q) ≡ Q_1, and symmetrically for snd(P) → Q_2.

Although this sufficient condition is useful for verifying many security properties, it does not work for the anonymity of many protocols. For example, to verify the anonymity of V in Example 1, we consider the biprocess P_0 = V{M_1/x_1}{M_2/x_2}, where M_1 = diff[v_1, v_2] and M_2 = diff[v_2, v_1]. Consider the process

A = c_p(x).if x = (v_1, v_2) then c_p⟨1⟩ else 0

and the evaluation context C[ ] = A | [ ]. Similarly to the execution shown after Example 1, we have an execution

C[P_0] → → → A | c_p⟨(M_1, M_2)⟩ → if (M_1, M_2) = (v_1, v_2) then c_p⟨1⟩ else 0.

Let P be the last process above. Then, since we have fst(M_1) = v_1 and fst(M_2) = v_2, we have Σ ⊢ fst((M_1, M_2)) = fst((v_1, v_2)), hence fst(P) → c_p⟨1⟩. However, since we have snd(M_1) = v_2 and snd(M_2) = v_1, we do not have Σ ⊢ snd((M_1, M_2)) = snd((v_1, v_2)). Hence P reduces to no biprocess. Thus P_0 does not satisfy the sufficient condition in Lemma 3. In fact, Proverif fails to verify the observational equivalence.

5. Our technique and its soundness

In this section, we introduce a technique to overcome the problem described in the previous section. We then show the soundness of the technique.

To overcome the problem, we replace the anonymous channel MIX with an alternative representation MIX′ of the anonymous channel, defined as follows:

MIX′ = MIX′_0 + MIX′_1,

MIX′_0 = c_1(y_1).c_2(y_2).c_p⟨(diff[y_1, y_2], diff[y_2, y_1])⟩,

MIX′_1 = c_1(y_1).c_2(y_2).c_p⟨(diff[y_2, y_1], diff[y_1, y_2])⟩.

The problem is fixed with this MIX′. For example, replace MIX with MIX′ in P_0 from the previous section. The reduction sequence in the example becomes

C[P_0] →* A | c_p⟨(M′_1, M′_2)⟩ → if (M′_1, M′_2) = (v_1, v_2) then c_p⟨1⟩ else 0,

where M′_1 = diff[diff[v_1, v_2], diff[v_2, v_1]] and M′_2 = diff[diff[v_2, v_1], diff[v_1, v_2]]. Let P′ be the last process above. Since we have fst(M′_1) = fst(diff[v_1, v_2]) = v_1 and similarly fst(M′_2) = v_2, we have Σ ⊢ fst((M′_1, M′_2)) = (v_1, v_2), hence fst(P′) → c_p⟨1⟩. Similarly, since we have snd(M′_1) = v_1 and snd(M′_2) = v_2, we also have Σ ⊢ snd((M′_1, M′_2)) = (v_1, v_2). Thus we have P′ → c_p⟨1⟩ as a biprocess. Thus this reduction sequence satisfies the sufficient condition in Lemma 3. The other reduction sequences are similarly checked by using Proverif, and we succeed in verifying fst(P′_0) ∼ snd(P′_0).
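The fst/snd projections used in the computation above can be sketched as a recursive replacement of diff terms. The tuple encoding of terms is our own, purely for illustration:

```python
def project(term, side):
    """fst (side=0) / snd (side=1) on biprocess terms: replace every
    diff[M1, M2] with its chosen component, recursively.  Terms are
    modeled as strings (names) or tuples ('diff', M1, M2) / (f, args...)."""
    if isinstance(term, tuple):
        if term[0] == 'diff':
            return project(term[1 + side], side)
        return (term[0],) + tuple(project(a, side) for a in term[1:])
    return term
```

Applied to the nested terms above, both projections of M′_1 yield v_1 and both projections of M′_2 yield v_2, which is why both sides of the biprocess now take the same branch.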

Now we show the soundness of the replacement by proving the equivalence between MIX and MIX′, which means that the replacement does not change the observational equivalence.

Theorem 4 We have observational equivalences fst(MIX′) ∼ MIX and snd(MIX′) ∼ MIX.

Proof The first equivalence trivially follows from the definition of fst. From the definition of ≡, we have

(νc)(c(x).MIX_0 | c(x).MIX_1 | c⟨M⟩) ≡ (νc)(c(x).MIX_1 | c(x).MIX_0 | c⟨M⟩).

Thus we have MIX ≡ snd(MIX′) from the definitions of '+', snd, and MIX′. The claim then follows from ≡ ⊆ ∼, which is shown as follows. For any processes P and Q such that P ≡ Q:

• If P ↓_M, then we have P ≡ C[M′⟨N⟩.R] and Σ ⊢ M = M′ for some output process M′⟨N⟩.R and evaluation context C that does not bind fn(M′). Then Q ≡ P ≡ C[M′⟨N⟩.R], hence Q ↓_M.

• If P → P′, then we have Q → P′ from the definition of → and P′ ≡ P′ from the definition of ≡.

• For any evaluation context C, we have C[P] ≡ C[Q]. This is shown by induction on the construction of C, using the definition of ≡.

(QED)

6. Application to anonymity of FOO

We employ our technique to verify the anonymity of the FOO protocol with Proverif. The entire script for the verification is shown in Table 4. In the implementation of Proverif, the function symbol diff is renamed choice. There are two voters in this script. The first one votes for candidates cand1 and cand2 in the first and second executions respectively. The second one votes for candidates cand2 and cand1 in the first and second executions respectively. These voters first obtain signatures on their votes from the administrator admin through public channels. They then publish the commitments on the votes and the keys to open the commitments through anonymous channels. The anonymous channels are modeled by the process mix, which is the same as MIX′, except that the non-deterministic choice '+' is expanded according to its definition. When this script is input, Proverif outputs 'Observational equivalence is true.' This means that the two executions are not distinguished by the attacker, and hence that the anonymity of the voters holds. We stress that Proverif fails to verify the anonymity if our technique is not employed.

7. Related work

As we have mentioned, our technique is similar to that developed by Delaune, Ryan, and Smyth. They used the following technique to verify the anonymity of protocols including FOO. They consider protocols of the form

W = let x = x_1 in P | let x = x_2 in P

and the anonymity W{v_1/x_1}{v_2/x_2} ∼ W{v_2/x_1}{v_1/x_2}. Here x, x_1, and x_2 are sequences of variables, and v_1 and v_2 are sequences of names. The process P is a process extended with some annotations and containing neither parallel composition ('|') nor conditionals (let). They also


rewrite protocols to overcome the problem described in Section 4. For example, consider the following protocol:

V′ = let x = x_1 in P | let x = x_2 in P,

P = (**swap*)c⟨x⟩,

where (**swap*) is an annotation that assists the rewriting. As with the protocol V, Proverif fails to verify the anonymity of this protocol. They therefore rewrite this protocol into the equivalent one:

V′′ = let x = diff[x_2, x_1] in P | let x = diff[x_1, x_2] in P.

Proverif can verify the anonymity of this protocol, given the biprocess V′′{diff[v_1, v_2]/x_1}{diff[v_2, v_1]/x_2}.

Both their technique and ours enable automatic verification by rewriting protocols into equivalent ones. In addition to the above rewriting, they also introduce another annotation and rewriting rule for protocols in which agents are synchronized. However, they consider only protocols of the above form, in which P does not contain parallel compositions. For this reason, to verify a protocol consisting of several processes communicating with each other, such as the voting scheme V, we must write a single process P that simulates these processes. For example, in their verification of the FOO voting protocol, they combined a voter, a portion of the administrator, and an anonymous channel into a single process of the above form. With our technique, on the other hand, such a transformation is not necessary.

8. Conclusion

In this paper, we described a technique we have developed for the automatic verification of the anonymity of protocols that use anonymous channels, and proved its soundness. We also used the technique to verify the anonymity of the FOO protocol.

References

[1] B. Blanchet, M. Abadi and C. Fournet, Automated verification of selected equivalences for security protocols, J. Logic Algebr. Progr., 75 (2008), 3–51.
[2] B. Blanchet, Automatic verification of correspondences for security protocols, J. Comput. Secur., 17 (2009), 363–434.
[3] D. L. Chaum, Untraceable electronic mail, return addresses, and digital pseudonyms, Commun. ACM, 24 (1981), 84–90.
[4] A. Fujioka, T. Okamoto and K. Ohta, A Practical Secret Voting Scheme for Large Scale Elections, in: Advances in Cryptology - AUSCRYPT '92, J. Seberry and Y. Zheng eds., Lect. Notes Comput. Sci., Vol. 718, pp. 244–251, Springer-Verlag, Berlin, 1993.
[5] S. Kremer and M. D. Ryan, Analysis of an Electronic Voting Protocol in the Applied Pi Calculus, in: Programming Languages and Systems - 14th European Symposium on Programming, ESOP 2005, M. Sagiv ed., Lect. Notes Comput. Sci., Vol. 3444, pp. 186–200, Springer-Verlag, Berlin, 2005.
[6] S. Delaune, M. Ryan and B. Smyth, Automatic Verification of Privacy Properties in the Applied pi Calculus, in: Trust Management II, Y. Karabulut, J. C. Mitchell, P. Herrmann and C. Damsgaard Jensen eds., IFIP Advances in Information and Communication Technology, Vol. 263, pp. 263–278, Springer-Verlag, Berlin, 2008.

(* Defs. of commitment and signature schemes *)

fun ok/0. fun commit/2.

fun vk/1. fun blind/3. fun bsign/2. fun sign/2.

reduc open(commit(m, k), m, k) = ok.

reduc extract_open(commit(m, k), k) = m.

reduc unblind(bsign(blind(m, r, vk(sk)), sk),

m, r, vk(sk))

= sign(m, sk).

reduc verify(sign(m, sk), m, vk(sk)) = ok.

reduc extract_blind(blind(m, r, k)) = k.

reduc extract_bsign(bsign(m, sk)) = (m, vk(sk)).

reduc extract_sign(sign(m, sk)) = (m, vk(sk)).

(* Definition of processes in FOO *)

free ca, cco, ccv, cand1, cand2.

let voter =

new rc; new rb; new rb0;

let com = commit(v, rc) in

let b = blind(com, rb, vkA) in

out(ca, (sign(b, sk), b));

in(ca, bs);

let sig = unblind(bs, com, rb, vkA) in

if verify(sig, com, vkA) = ok then

out(cmv, (sig, com));

out(cmo, (v, rc)).

let mix =

new ch_choice;

(out(ch_choice, ()) |

(in(ch_choice, y);

in(cin1, m0); in(cin2, m1);

out(cout, (choice[m0, m1], choice[m1, m0]))) |

(in(ch_choice, y);

in(cin1, m0); in(cin2, m1);

out(cout, (choice[m1, m0], choice[m0, m1])))).

let admin =

in(ca, (s, b));

if verify(s, b, vk(sk1)) = ok then

out(ca, bsign(b, skA));

in(ca, (s’, b’));

(if verify(s’, b’, vk(sk2)) = ok then

out(ca, bsign(b’, skA)))

else if verify(s, b, vk(sk2)) = ok then

out(ca, bsign(b, skA));

in(ca, (s’, b’));

(if verify(s’, b’, vk(sk1)) = ok then

out(ca, bsign(b’, skA))).

process

new sk1; new sk2; new skA;

new cmv1; new cmo1; new cmv2; new cmo2;

((let v = choice[cand1, cand2] in

let sk = sk1 in let vkA = vk(skA) in

let cmv = cmv1 in let cmo = cmo1 in voter) |

(let v = choice[cand2, cand1] in

let sk = sk2 in let vkA = vk(skA) in

let cmv = cmv2 in let cmo = cmo2 in voter) |

(let cin1 = cmv1 in let cin2 = cmv2 in

let cout = ccv in mix) |

(let cin1 = cmo1 in let cin2 = cmo2 in

let cout = cco in mix) |

admin)

Table 4. Verification of FOO in Proverif.


JSIAM Letters Vol.3 (2011) ISBN : 978-4-9905076-2-6

ISSN : 1883-0609

©2011 The Japan Society for Industrial and Applied Mathematics Publisher :

The Japan Society for Industrial and Applied Mathematics

4F, Nihon Gakkai Center Building

2-4-16, Yayoi, Bunkyo-ku, Tokyo, 113-0032 Japan

tel. +81-3-5684-8649 / fax. +81-3-5684-8663
