an effective setting of hierarchical cell structure for the fast multipole boundary element method



May 5, 2005 8:50 WSPC/130-JCA 00252

Journal of Computational Acoustics, Vol. 13, No. 1 (2005) 47–70 © IMACS

AN EFFECTIVE SETTING OF HIERARCHICAL CELL STRUCTURE FOR THE FAST MULTIPOLE BOUNDARY ELEMENT METHOD

Y. YASUDA

Institute of Industrial Science, The University of Tokyo
4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan

[email protected]

T. SAKUMA

Institute of Environmental Studies, The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

[email protected]

Received 16 November 2003
Revised 30 June 2004

The fast multipole boundary element method (FMBEM) is an advanced BEM that leads to drastic reduction of processing time and memory requirements in a large-scale steady-state sound field analysis. In the FMBEM, hierarchical cell structure is employed to apply multipole expansion in multiple levels, and the setting of the hierarchical cell structure considerably affects the computational efficiency of the FMBEM. This paper deals with effective settings of hierarchical cell structure for taking full advantage of the FMBEM. A numerical study with objects of different shapes with the same DOF shows that both the computational complexity and the memory requirements with the FMBEM were greater for 1D-shaped objects than for 2D- or 3D-shaped ones, without a special setting of hierarchical cell structure for each problem. An effective setting for 1D-shaped objects is derived through theoretical and numerical studies, where special considerations are given to the arrangement of the cell structure and the treatment of translation coefficients between cells. This setting allows for efficient calculations not dependent on the shape of an analyzed object. A simple method to arrange hierarchical cell structure is proposed, which realizes the derived setting for arbitrarily-shaped problems.

Keywords: Fast multipole algorithm; hierarchical cell structure; boundary element method; Helmholtz equation.

1. Introduction

The fast multipole algorithm (FMA), which was originally proposed by Rokhlin¹ and developed by Greengard² for N-body problems, has recently gained popularity as a fast algorithm for large-scale problems in various fields.³,⁴ As an application of the algorithm, many studies have been conducted on the use of the FMA with the boundary element method (BEM) to reduce its large computational time and memory requirements in large-scale problems.⁵⁻⁸ This advanced BEM, known as the fast multipole BEM (FMBEM), has also been studied in the field of acoustics, not only for developing the theory,⁹⁻¹³ but also for applications or practical use.¹³⁻¹⁸ We have also developed a general efficient scheme of the FMBEM for three-dimensional steady-state sound fields,¹⁴ and have proposed practical criteria for numerical items in this method.¹⁵

In the FMBEM, interactions between groups of boundary elements are evaluated using multipole expansion, instead of direct interactions between elements. To group elements systematically and to apply multipole expansion in multiple levels, a hierarchical cell structure is often used, where cells work as grouping units. It is generally known that both the operation count and the memory requirements of the FMBEM are $O(N^a \log^b N)$, where $N$ is the degree of freedom (DOF), $a \ge 1$ and $b \ge 0$. The values of $a$ and $b$ depend on details of the algorithm. These characteristics give the FMBEM a large advantage over the conventional BEM, especially for large-DOF problems, since the operation count of the conventional BEM is $O(N^3)$ with direct solvers, or $O(N^2)$ with appropriate iterative solvers, and the memory requirements are $O(N^2)$.

In the conventional BEM, the computational complexity and memory requirements depend only on DOF, whereas in the FMBEM, they also depend on other factors, such as numerical items needed to approximate the multipole expansion, the number of hierarchical levels used in hierarchical cell structure, and geometrical arrangement of the cell structure for analyzed objects. It is known that the numerical items for approximation of the multipole expansion (i.e. the number of terms for truncation of infinite summation of an expanded series and the number of quadrature points for numerical integration over the unit sphere) depend on the size of cells.¹¹,¹³,¹⁵ As for the number of hierarchical levels, its effect on the efficiency of the FMBEM has already been studied.¹⁵ However, the effect of the geometrical arrangement, which directly determines the size and the number of cells and affects the computational efficiency of the FMBEM, has not been clarified in detail.

Appropriate arrangement of cell structure for an analyzed object is contingent on the shape of the object, in other words, on the distribution of nodes of boundary elements. Regarding N-body problems, it has been recognized that the complexity of the FMA depends on the distribution of particles,¹⁹,²⁰ and it has been pointed out that the complexity of the algorithm by Greengard² is not $O(N)$ as claimed, where $N$ is the number of particles.²⁰,²¹ In our previous study on the acoustical FMBEM, we compared the computational efficiency between 2D- and 3D-shaped node distributions through theoretical estimation, and showed that both the computational complexity and memory requirements are larger for a 2D-shaped distribution than for a 3D-shaped one.¹⁵ This indicates the necessity of detailed investigation of the effect of the shapes of objects on the computational efficiency. For efficient use of the FMBEM, it is important to clarify the factors causing the inefficiency and to derive an appropriate arrangement of hierarchical cell structure.

When discussing the effect of the shapes of objects and that of the geometrical arrangement of hierarchical cell structure, it is important to direct careful attention to operations for computing translation coefficients between cells. These operations in the FMBEM correspond to those for translation of coefficients for multipole expansion to those for local expansion in the FMA. Generally, these operations are expensive and often account for the largest computational load.


In this paper, we study effective settings of hierarchical cell structure to take full advantage of the FMBEM ("a setting of hierarchical cell structure" is not a mere geometrical arrangement of the cell structure; it implies the arrangement, which is closely connected with the computation of translation coefficients between cells). Section 2 briefly discusses the calculation process and the computational efficiency of the FMBEM, and explains operations for translation coefficients between cells, which need large computational load in the process of the FMBEM. In Sec. 3, the effect of the shapes of objects on the efficiency of the FMBEM is examined in detail through numerical study, where no special settings of hierarchical cell structure are set for each problem. The effect of settings of hierarchical cell structure on the efficiency of problems with 1D-shaped objects, which spoil the efficiency of the FMBEM, is investigated by theoretical estimation in Sec. 4, and by numerical study in Sec. 5. Here, an effective setting for 1D-shaped objects is derived. In Sec. 6, we propose a simple method to determine an arrangement of hierarchical cell structure, which realizes the derived setting and ensures efficient calculation that is probably independent of the shapes of objects.

2. Calculation of FMBEM

Here we briefly describe the outline of the FMBEM and its computational efficiency. For further details of the FMBEM, see Ref. 14. Throughout this section, the time convention $\exp(-j\omega t)$ is used.

2.1. Conventional BEM

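In the field satisfying the three-dimensional Helmholtz equation, the sound pressure at a point $p$ on the smooth boundary $\Gamma$ is described using the Kirchhoff–Helmholtz integral equation as

$$\frac{1}{2}\,p(\mathbf{r}_p) = \int_{\Gamma} \left( p(\mathbf{r}_q)\,\frac{\partial G(\mathbf{r}_p,\mathbf{r}_q)}{\partial n_q} - \frac{\partial p(\mathbf{r}_q)}{\partial n_q}\,G(\mathbf{r}_p,\mathbf{r}_q) \right) dS, \tag{2.1}$$

where $\partial/\partial n_q$ denotes the normal derivative, and $G$ is the Green's function given by

$$G(\mathbf{r}_p,\mathbf{r}_q) = \frac{\exp(jk r_{pq})}{4\pi r_{pq}}, \tag{2.2}$$

where $k$ is the wave number, and $r_{pq} = |\mathbf{r}_p - \mathbf{r}_q|$ is the distance between points $p$ and $q$. Three kinds of boundary conditions are assumed as follows:

$$\frac{\partial p(\mathbf{r}_q)}{\partial n_q} =
\begin{cases}
0 & q \in \Gamma_0 \ \text{(rigid)},\\
j\omega\rho\, v(\mathbf{r}_q) & q \in \Gamma_1 \ \text{(vibration)},\\
-jk\,p(\mathbf{r}_q)/z(\mathbf{r}_q) & q \in \Gamma_2 \ \text{(absorption)},
\end{cases} \tag{2.3}$$

where $v$ is the normal component of the surface velocity, $z$ is the acoustic impedance ratio, and $\rho$ is the air density.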


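By discretizing Eq. (2.1) with the boundary conditions Eq. (2.3), the following system of equations is obtained:

$$(\mathbf{E} + \mathbf{B} + \mathbf{C}) \cdot \mathbf{p} = j\omega\rho\,\mathbf{A} \cdot \mathbf{v}, \tag{2.4}$$

where $\mathbf{p}$ is the sound pressure vector (unknown), $\mathbf{v}$ is the velocity vector (given), and the entries of the influence coefficient matrices are represented by

$$E_{ij} = -\frac{1}{2}\,\delta_{ij}, \tag{2.5}$$

$$A_{ij} = a_j(\mathbf{r}_i) = \int_{\Gamma_1} N_j(\mathbf{r}_q)\,G(\mathbf{r}_i,\mathbf{r}_q)\,dS_q, \tag{2.6}$$

$$B_{ij} = b_j(\mathbf{r}_i) = \int_{\Gamma} N_j(\mathbf{r}_q)\,\frac{\partial G(\mathbf{r}_i,\mathbf{r}_q)}{\partial n_q}\,dS_q, \tag{2.7}$$

$$C_{ij} = c_j(\mathbf{r}_i) = jk \int_{\Gamma_2} N_j(\mathbf{r}_q)\,\frac{G(\mathbf{r}_i,\mathbf{r}_q)}{z(\mathbf{r}_q)}\,dS_q, \tag{2.8}$$

where $\delta$ is Kronecker's delta, $\mathbf{r}_i$ is the position vector of the $i$th node, and $N_j$ is the interpolation function of the $j$th node.

Generally, $\mathbf{B}$ is a dense matrix. Thus, the operation count for solving Eq. (2.4) is $O(N^3)$ with direct solvers, where $N$ is DOF (the number of nodes). Even if an efficient iterative solver is used, an operation count of $O(N^2)$ is needed due to the matrix-vector multiplications $\mathbf{B} \cdot \mathbf{p}$ (and due to $\mathbf{A} \cdot \mathbf{v}$ and $\mathbf{C} \cdot \mathbf{p}$, if $\mathbf{A}$ and $\mathbf{C}$ are nearly dense). The memory requirements for keeping these matrices are $O(N^2)$.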
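To give a feeling for the $O(N^2)$ memory requirement, the following minimal sketch computes the storage of one dense $N \times N$ influence matrix for the two DOF values of case 1 used in Sec. 3. The function name and the choice of 16 bytes per entry (double-precision complex) are assumptions made for illustration; the paper does not state the storage format.

```python
def dense_matrix_bytes(n_dof: int, bytes_per_entry: int = 16) -> int:
    """Bytes needed to store one dense n_dof x n_dof matrix.

    bytes_per_entry=16 assumes double-precision complex entries
    (an assumption for illustration, not a value from the paper).
    """
    return n_dof * n_dof * bytes_per_entry


if __name__ == "__main__":
    # DOF of case 1 at 250 Hz and 1000 Hz, as reported in Sec. 3.
    for n_dof in (6_528, 104_448):
        gib = dense_matrix_bytes(n_dof) / 2**30
        print(f"N = {n_dof:>7}: {gib:.3f} GiB per dense matrix")
```

Under these assumptions a single dense matrix takes about 0.64 GiB at $N = 6{,}528$ and about 163 GiB at $N = 104{,}448$, which is why avoiding explicit storage of the matrices matters for large-scale problems.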

2.2. Computational process of FMBEM

When solving Eq. (2.4) with an iterative solver, the FMBEM efficiently achieves the matrix-vector multiplications $(\mathbf{B} + \mathbf{C}) \cdot \mathbf{p}$ and $\mathbf{A} \cdot \mathbf{v}$ by applying multipole expansion in multiple levels using hierarchical cell structure. Since it is not necessary to keep the matrices themselves, the memory requirements also drastically decrease. The following briefly shows the outline and the computational process of the FMBEM.

Figure 1 shows an example of boundary and hierarchical cell structure in two dimensions. A cube (a square in two dimensions) circumscribing the whole boundary is determined as a root cell, which is divided into eight child cubes (level 1). Each divided cube is also divided in turn (level 2, 3, ..., $L$). Only the cubes including nodes are called cells.
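The subdivision described above can be sketched as follows: every node is assigned to the cube that contains it at each level, and only non-empty cubes are counted as cells. This is a minimal illustration, not the authors' implementation; the function name, the explicit `origin` and `side` arguments for the root cube, and the sample node sets are assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def count_cells_per_level(points: Sequence[Point], origin: Point,
                          side: float, max_level: int) -> List[int]:
    """Return [M_0, ..., M_max_level], the number of non-empty cubes per level.

    The root cube has corner `origin` and edge length `side`; level l splits
    it into 2**l cubes per axis. Only cubes containing a node count as cells.
    """
    counts = []
    for level in range(max_level + 1):
        n = 2 ** level  # cubes per axis at this level
        occupied = set()
        for p in points:
            # Integer cube index per axis, clamped so that nodes lying on
            # the far face of the root cube fall into the last cube.
            index = tuple(
                min(max(int((p[d] - origin[d]) / side * n), 0), n - 1)
                for d in range(3)
            )
            occupied.add(index)
        counts.append(len(occupied))
    return counts


if __name__ == "__main__":
    # Hypothetical node sets inside a unit root cube: a line and a flat sheet.
    line = [((i + 0.5) / 64, 0.1, 0.1) for i in range(64)]
    sheet = [((i + 0.5) / 16, (j + 0.5) / 16, 0.1)
             for i in range(16) for j in range(16)]
    print(count_cells_per_level(line, (0.0, 0.0, 0.0), 1.0, 5))
    print(count_cells_per_level(sheet, (0.0, 0.0, 0.0), 1.0, 4))
```

For the line of nodes the cell count doubles from level to level, while for the sheet it quadruples, which is the distinction between node distributions of different dimensionality that the paper examines.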

The main process of the FMBEM consists of the setup process and the iterative process. The former is the process for calculation of coefficients that are not necessary to iterate. The latter is the process for iterative calculation of matrix-vector products that are necessary for iterative solvers, and the operations for each calculation of a matrix-vector product consist of six steps. In the following, we explain the concept of the operation at each step in the iterative process.

Step 1. Translate potential of element $j$ in cell $m'_L$ at the lowest level $L$ into contribution from the center $\lambda_{m'_L}$ of the cell $m'_L$, and accumulate it to the center.

Step 2. Translate contribution from cell $m'_{l+1}$ into contribution from the center $\lambda_{m'_l}$ of its parent cell $m'_l$, and accumulate it to the center. ($l = L-1, L-2, \ldots, 2$.)

Step 3. Translate contributions from interaction cell set $\mathbf{T}_{m_l}$ of the cell $m_l$ into contribution to the center $\lambda_{m_l}$ of cell $m_l$, and accumulate it to the center. ($l = 2, 3, \ldots, L$.) The interaction cell set $\mathbf{T}_{m_l}$ consists of the cells which are not neighbors of $m_l$ but whose parents are neighbors of the parent cell of $m_l$. Figure 2(a) shows an example of an interaction cell set in two dimensions.

Step 4. Translate contribution to the parent cell $m_l$ into contribution to the center $\lambda_{m_{l+1}}$ of its child cell $m_{l+1}$, and add it to the contribution to the cell $m_{l+1}$ obtained in Step 3. ($l = 2, 3, \ldots, L-1$.)

Step 5. Translate contribution to the cell $m_L$ at the lowest level $L$ into contribution to node $i$ in cell $m_L$.

Step 6. For node $i$ in cell $m_L$, directly compute contribution from all other elements within the cell $m_L$ and its neighbor cells.

Fig. 1. Two-dimensional diagram of hierarchical cell structure (the lowest level number $L = 4$) and boundary, with illustration of three paths for evaluation of influence from point $q$ to $p$. Diagram of Steps 1 to 5 in the FMBEM is also illustrated.

Fig. 2. (a) $\mathbf{T}_{m_l}$: interaction cell set of cell $m_l$ at level $l$, and (b) $\mathbf{T}'_l$: common interaction cell set for pre-calculation of translation coefficients $T_{LM}$ at level $l$ (in 2D). $I_{m_l}$ and $I'_l$ are the numbers of cells of $\mathbf{T}_{m_l}$ and $\mathbf{T}'_l$, respectively. Panel annotations: (a) $I_{m_l} \le 6^2 - 3^2 = 27$ (in 2-D), $6^3 - 3^3 = 189$ (in 3-D); (b) $I'_l \le 7^2 - 3^2 = 40$ (in 2-D), $7^3 - 3^3 = 316$ (in 3-D).

Far influence is evaluated in Steps 1 to 5 using multipole expansion, illustrated in Fig. 1, and near influence is evaluated in Step 6 in the same way as the conventional BEM. The above structure for evaluation using cells enables us to improve the computational efficiency drastically.
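The definition of the interaction cell set used in Step 3 (cells that are not neighbors of a cell but whose parents are neighbors of its parent) can be checked with a short enumeration. The sketch below works with integer cell offsets and assumes, for illustration only, that every cube is non-empty, so it yields the upper bounds 27 (2-D) and 189 (3-D); the function name is hypothetical.

```python
from itertools import product
from typing import Set, Tuple


def interaction_offsets(child_pos: Tuple[int, ...]) -> Set[Tuple[int, ...]]:
    """Offsets (in units of the cell size) from a cell to its interaction cells.

    child_pos holds 0 or 1 per axis: the position of the cell inside its
    parent. All cubes are assumed to be non-empty (upper-bound case).
    """
    dim = len(child_pos)
    offsets = set()
    for shift in product((-1, 0, 1), repeat=dim):   # parent and its neighbors
        for sub in product((0, 1), repeat=dim):     # children of those parents
            off = tuple(2 * s + b - c
                        for s, b, c in zip(shift, sub, child_pos))
            if max(abs(d) for d in off) > 1:        # drop the cell and its neighbors
                offsets.add(off)
    return offsets


if __name__ == "__main__":
    print(len(interaction_offsets((0, 0))))      # 2-D case
    print(len(interaction_offsets((0, 0, 0))))   # 3-D case
```

The counts follow from $6^d - 3^d$ in $d$ dimensions: the children of the parent and of its neighbors fill a block of six cells per axis, from which the cell itself and its immediate neighbors are excluded.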

Next, corresponding to Steps 1 to 5 in the above, we give a short introduction to the way to evaluate matrix-vector products, based on the multipole expansion of the Green's function. According to the multipole translation theory with plane wave expansion,¹⁰⁻¹² the Green's function Eq. (2.2) can be transformed into the following expression, which corresponds to the procedures of Steps 1 to 5:

$$G(\mathbf{r}_p,\mathbf{r}_q) = \frac{jk}{16\pi^2} \oint E_{p\lambda_{m_L}}(\mathbf{k}) \prod_{l=I}^{L-1} E_{\lambda_{m_{l+1}}\lambda_{m_l}}(\mathbf{k}) \cdot T_{\lambda_{m_I}\lambda_{m'_I}}(\mathbf{k}) \prod_{l=I}^{L-1} E_{\lambda_{m'_l}\lambda_{m'_{l+1}}}(\mathbf{k})\, E_{\lambda_{m'_L}q}(\mathbf{k})\, d\hat{\mathbf{k}}, \tag{2.9}$$

where

$$T_{LM}(\mathbf{k}) = \sum_{l=0}^{N_c} j^l (2l+1)\, h_l^{(1)}(k r_{LM})\, P_l(\hat{\mathbf{k}} \cdot \hat{\mathbf{r}}_{LM}), \tag{2.10}$$

$$E_{Mq}(\mathbf{k}) = \exp(j\mathbf{k} \cdot \mathbf{r}_{Mq}), \tag{2.11}$$

$\mathbf{k}$ is the wave number vector, $k = |\mathbf{k}|$, $\hat{\mathbf{k}} = \mathbf{k}/k$, $h_l^{(1)}$ are the spherical Hankel functions of the first kind, $P_l$ are the Legendre polynomials, $N_c$ is the number of terms for truncation of infinite summation, and $\oint d\hat{\mathbf{k}}$ represents the integral over the unit sphere. $m'_I \in \mathbf{T}_{m_I}$, and $I$ is the level number to execute Step 3, which is determined by the positions of points $p$ and $q$ as illustrated in Fig. 1. Matrix-vector products $(\mathbf{B} + \mathbf{C}) \cdot \mathbf{p}$ and $\mathbf{A} \cdot \mathbf{v}$ are calculated using the following equations, based on Eqs. (2.6)–(2.9):

$$\begin{bmatrix} (B_{ij} + C_{ij})\,p_j \\ A_{ij}\,v_j \end{bmatrix} = \frac{jk}{16\pi^2} \oint E_{i\lambda_{m_L}}(\mathbf{k}) \prod_{l=I}^{L-1} E_{\lambda_{m_{l+1}}\lambda_{m_l}}(\mathbf{k}) \cdot T_{\lambda_{m_I}\lambda_{m'_I}}(\mathbf{k}) \prod_{l=I}^{L-1} E_{\lambda_{m'_l}\lambda_{m'_{l+1}}}(\mathbf{k}) \begin{bmatrix} \bigl(\beta_{\lambda_{m'_L}j}(\mathbf{k}) + \gamma_{\lambda_{m'_L}j}(\mathbf{k})\bigr)\,p_j \\ \alpha_{\lambda_{m'_L}j}(\mathbf{k})\,v_j \end{bmatrix} d\hat{\mathbf{k}}, \tag{2.12}$$

where

$$\alpha_{\lambda_{m'_L}j}(\mathbf{k}) = \int_{\Gamma_1} N_j(\mathbf{r}_q)\, E_{\lambda_{m'_L}q}(\mathbf{k})\, dS_q, \tag{2.13}$$

$$\beta_{\lambda_{m'_L}j}(\mathbf{k}) = jk \int_{\Gamma} N_j(\mathbf{r}_q)\, E_{\lambda_{m'_L}q}(\mathbf{k})\, (\mathbf{n}_q \cdot \hat{\mathbf{k}})\, dS_q, \tag{2.14}$$

$$\gamma_{\lambda_{m'_L}j}(\mathbf{k}) = jk \int_{\Gamma_2} N_j(\mathbf{r}_q)\, \frac{E_{\lambda_{m'_L}q}(\mathbf{k})}{z(\mathbf{r}_q)}\, dS_q. \tag{2.15}$$

In the procedures for computation, the integral $\oint d\hat{\mathbf{k}}$ is calculated numerically, and contributions from/to cells are evaluated using coefficients at the quadrature points of the integral in Steps 1 to 5. These coefficients are called outgoing, interaction, and incoming coefficients $\xi$, $\tau$, and $\zeta$.
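As an illustration of Eq. (2.10), the truncated sum for one translation coefficient can be evaluated at a single quadrature direction with the standard upward recurrences for the spherical Hankel functions and the Legendre polynomials. This is a minimal standard-library sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and `cos_angle` stands for $\hat{\mathbf{k}} \cdot \hat{\mathbf{r}}_{LM}$.

```python
import cmath
from typing import List


def spherical_hankel1(order_max: int, x: float) -> List[complex]:
    """Return [h_0^(1)(x), ..., h_order_max^(1)(x)] by upward recurrence."""
    h = [-1j * cmath.exp(1j * x) / x]
    if order_max >= 1:
        h.append(-cmath.exp(1j * x) * (x + 1j) / x**2)
    for l in range(1, order_max):
        h.append((2 * l + 1) / x * h[l] - h[l - 1])
    return h


def legendre(order_max: int, t: float) -> List[float]:
    """Return [P_0(t), ..., P_order_max(t)] by Bonnet's recurrence."""
    p = [1.0]
    if order_max >= 1:
        p.append(t)
    for l in range(1, order_max):
        p.append(((2 * l + 1) * t * p[l] - l * p[l - 1]) / (l + 1))
    return p


def translation_coefficient(k: float, r_lm: float,
                            cos_angle: float, n_c: int) -> complex:
    """Truncated sum of Eq. (2.10) for one direction of the unit sphere."""
    h = spherical_hankel1(n_c, k * r_lm)
    p = legendre(n_c, cos_angle)
    return sum((1j ** l) * (2 * l + 1) * h[l] * p[l] for l in range(n_c + 1))


if __name__ == "__main__":
    # Hypothetical values: wave number 2 rad/m, cell-center distance 3 m.
    print(translation_coefficient(k=2.0, r_lm=3.0, cos_angle=0.5, n_c=8))
```

In an actual computation this value would be needed at every quadrature point of the spherical integral and for every relative cell position, which is why the cost of these coefficients grows with both the truncation number and the number of quadrature points.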

2.3. Computational efficiency of FMBEM

The operation counts for the setup process and the iterative process, and the memory requirements are estimated according to Ref. 14. In the following, $N$ is DOF, $M$ is the average number of nodes in a cell at the lowest level $L$ of hierarchical cell structure, $M_l$ is the number of cells at level $l$, $N_c^l \sim kD_l$ is the number of terms for truncation of infinite summation at level $l$, $D_l$ is the diameter of the sphere circumscribing a cell at level $l$, $K_l \sim 2(N_c^l)^2$ is the number of quadrature points for spherical integration, $R \le 3^3 = 27$ is the average number of neighboring cells, $P$ is the number of quadrature points for boundary integral, and $Q$ and $J$ are constants. $I_l \le 189$ is the average number of cells of interaction cell set $\mathbf{T}_{m_l}$, and $I'_l$ is the number of cells of common interaction cell set $\mathbf{T}'_l$, which is the cell set for pre-calculation of $T_{LM}$ at level $l$. We will amplify on $T_{LM}$, $\mathbf{T}'_l$, and $I'_l$ in the next subsection.

The operation counts for the setup process are as follows:

• for coefficients $\alpha$, $\beta$, and $\gamma$, $\bar{C}_1 \sim N K_L P \sim N (kD_L)^2 P$,

• for coefficients $T_{LM}$, $\bar{C}_3 = \sum_{l=2}^{L} \bar{C}_3^l \sim \sum_{l=2}^{L} K_l N_c^l I'_l \sim \sum_{l=2}^{L} (kD_l)^3 I'_l$, where $\bar{C}_3^l$ is the operation count for $T_{LM}$ at level $l$,

• for coefficients in near fields, where contributions are directly evaluated between elements, $\bar{C}_6 \sim NMPR$.

The operation counts for the iterative process are as follows:

• for Step 1, $C_1 \sim N K_L \sim N (kD_L)^2$,

• for Step 2, $C_2 = \sum_{l=2}^{L-1} C_2^l \sim \sum_{l=2}^{L-1} M_l K_l JQ \sim \sum_{l=2}^{L-1} M_l (kD_l)^2 JQ$,

• for Step 3, $C_3 = \sum_{l=2}^{L} C_3^l \sim \sum_{l=2}^{L} M_l K_l I_l \sim \sum_{l=2}^{L} M_l (kD_l)^2 I_l$,

• for Step 4, $C_4 = \sum_{l=2}^{L-1} C_4^l \sim \sum_{l=2}^{L-1} M_{l+1} K_{l+1} Q \sim \sum_{l=2}^{L-1} M_{l+1} (kD_{l+1})^2 Q$,

• for Step 5, $C_5 \sim N K_L \sim N (kD_L)^2$,

• for Step 6, $C_6 \sim NMR$,

where $C_n^l$ is the operation count at level $l$ in step $n$.


The memory requirements for the operations of the FMBEM are as follows:

• for pressure and velocity of nodes, $E_1 = 2N$,

• for $\alpha$, $\beta$, $\gamma$, $E_2 = 2N K_L \sim 2N (kD_L)^2$,

• for outgoing, interaction, and incoming coefficients $\xi$, $\tau$, and $\zeta$, $E_3 = \sum_{l=2}^{L} E_3^l = \sum_{l=2}^{L} 6 M_l K_l \sim 6 \sum_{l=2}^{L} M_l (kD_l)^2$,

• for $T_{LM}$, $E_4 = \sum_{l=2}^{L} E_4^l = \sum_{l=2}^{L} K_l I'_l \sim \sum_{l=2}^{L} (kD_l)^2 I'_l$,

• for coefficients in near fields, where contributions are directly evaluated between elements, $E_5 = 2NMR$,

where $E_3^l$ and $E_4^l$ are the memory requirements at level $l$ for $\xi$, $\tau$ and $\zeta$, and for $T_{LM}$, respectively.

Here we regard $M$ and $D_L$ as constants, because $M$ minimizing the computational complexity or memory requirements hardly depends on the shapes of objects,¹⁵ and because $M \propto (D_L)^2$ is satisfied at lower levels of the hierarchical cell structure, independent of the shapes of objects. This narrows the computational complexity and memory requirements affected by the shapes of objects and by geometrical arrangement of hierarchical cell structure to the following:

• Computational complexity for the setup process.

$$\bar{C}_3 = \sum_{l=2}^{L} \bar{C}_3^l \sim \sum_{l=2}^{L} (kD_l)^3 I'_l \quad \text{for } T_{LM}. \tag{2.16}$$

• Computational complexity for the iterative process.

$$C_2 = \sum_{l=2}^{L-1} C_2^l \sim \sum_{l=2}^{L-1} M_l (kD_l)^2 JQ \quad \text{for Step 2}, \tag{2.17}$$

$$C_3 = \sum_{l=2}^{L} C_3^l \sim \sum_{l=2}^{L} M_l (kD_l)^2 I_l \quad \text{for Step 3}, \tag{2.18}$$

$$C_4 = \sum_{l=2}^{L-1} C_4^l \sim \sum_{l=2}^{L-1} M_{l+1} (kD_{l+1})^2 Q \quad \text{for Step 4}. \tag{2.19}$$

• Memory requirements.

$$E_3 = \sum_{l=2}^{L} E_3^l \sim \sum_{l=2}^{L} 6 M_l (kD_l)^2 \quad \text{for } \xi, \tau, \text{ and } \zeta, \tag{2.20}$$

$$E_4 = \sum_{l=2}^{L} E_4^l \sim \sum_{l=2}^{L} (kD_l)^2 I'_l \quad \text{for } T_{LM}. \tag{2.21}$$


Equations (2.16), (2.21) depend on the size of cells $D_l$ and the number $I'_l$ of cells of the common interaction cell set $\mathbf{T}'_l$ at level $l$, and Eqs. (2.17)–(2.20) depend on $D_l$ and the number of cells $M_l$ at level $l$. In order to investigate the effect of the shapes of objects and of geometrical arrangement of the cell structure, it is important to consider $M_l$, $I'_l$, and $D_l$.
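The dependence on $D_l$, $M_l$, and $I'_l$ can be explored numerically with a small helper that evaluates the right-hand sides of Eqs. (2.16), (2.18), (2.20), and (2.21). This is only a sketch of the scaling estimates: constant factors are dropped as in the $\sim$ relations, $I_l$ is replaced by its upper bound of 189, $D_l$ is taken as the diagonal of a cube of edge `root_side / 2**l`, and the function and key names are hypothetical.

```python
import math
from typing import Dict


def cost_estimates(k: float, root_side: float,
                   cells_per_level: Dict[int, int],
                   interaction_cells: int = 189,
                   common_cells: int = 316) -> Dict[str, float]:
    """Evaluate the scaling estimates of Eqs. (2.16), (2.18), (2.20), (2.21).

    cells_per_level maps each level l (2..L) to the cell count M_l.
    The results are proportional quantities, not absolute operation counts.
    """
    out = {"setup_TLM": 0.0, "step3": 0.0, "mem_coeff": 0.0, "mem_TLM": 0.0}
    for level, m_l in cells_per_level.items():
        # Diameter of the sphere circumscribing a cube at this level.
        d_l = math.sqrt(3.0) * root_side / 2**level
        kd = k * d_l
        out["setup_TLM"] += kd**3 * common_cells            # Eq. (2.16)
        out["step3"] += m_l * kd**2 * interaction_cells     # Eq. (2.18)
        out["mem_coeff"] += 6 * m_l * kd**2                 # Eq. (2.20)
        out["mem_TLM"] += kd**2 * common_cells              # Eq. (2.21)
    return out


if __name__ == "__main__":
    # Hypothetical cell counts; only the root cell size differs between runs.
    cells = {2: 16, 3: 32, 4: 64}
    small = cost_estimates(k=18.0, root_side=4.0, cells_per_level=cells)
    large = cost_estimates(k=18.0, root_side=8.0, cells_per_level=cells)
    print(large["setup_TLM"] / small["setup_TLM"])
    print(large["mem_TLM"] / small["mem_TLM"])
```

With the cell counts held fixed, doubling the edge of the root cell multiplies the $T_{LM}$ setup estimate by eight and the $T_{LM}$ memory estimate by four, which follows directly from the cubic and quadratic dependence on $kD_l$.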

2.4. Calculation of translation coefficients

Translation of coefficients for multipole expansion to those for local expansion often requires the largest computation in the FMA. In the FMBEM algorithm, the corresponding operations are those for translation coefficients $T_{LM}$ (Eq. (2.10)); therefore, it is important to direct careful attention to the computational efficiency for $T_{LM}$ (Eqs. (2.16), (2.21)).

In the FMA or FMBEM procedures, an interaction cell set $\mathbf{T}_{m_l}$ for a cell $m_l$ is defined, and the translation of coefficients to cell $m_l$ is executed only from cells of $\mathbf{T}_{m_l}$. It is not necessary to calculate the coefficients $T_{LM}$ iteratively. One can calculate $T_{LM}$ in the setup process in advance and keep them. The values of $T_{LM}$ are determined by relative positions between a cell $m_l$ and its interaction cell set $\mathbf{T}_{m_l}$. Taking the relativity of positions between cells into account, one can prepare a cell set for pre-calculation of $T_{LM}$ at each level; it is adequate to calculate $T_{LM}$ only for this cell set, instead of calculating $T_{LM}$ for each cell $m_l$. We call this cell set for pre-calculation a common interaction cell set $\mathbf{T}'_l$. For arbitrary shapes of objects, it is sufficient to set the number $I'_l$ of cells of $\mathbf{T}'_l$ at 316, not depending on its level $l$, as shown in Fig. 2(b). This is an all-around setting and easy to implement; however, it can be inefficient for some objects, since no special considerations are given to the shape of the object. On the other hand, there is another way to adaptively set $\mathbf{T}'_l$ to decrease $I'_l$ at each level, as shown in Figs. 8(b) and 9(b). This is a special setting for an arbitrarily-shaped object, whereas the effect of this setting on the whole computational efficiency has not been clarified.
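The idea that $T_{LM}$ depends only on relative cell positions can be illustrated by collecting the distinct offsets between occupied cells and the members of their interaction cell sets. The sketch below is one possible way to count such offsets and is not necessarily the adaptive scheme of Figs. 8(b) and 9(b); the function name and the two sample cell layouts are assumptions made for illustration.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Cell = Tuple[int, ...]


def occurring_offsets(cells: Iterable[Cell]) -> Set[Cell]:
    """Distinct relative positions between occupied cells and their interaction cells.

    `cells` holds the integer indices of the non-empty cubes at one level.
    """
    occupied = set(cells)
    offsets = set()
    for cell in occupied:
        dim = len(cell)
        parent = tuple(c // 2 for c in cell)
        for shift in product((-1, 0, 1), repeat=dim):   # parent and its neighbors
            for sub in product((0, 1), repeat=dim):     # children of those parents
                other = tuple(2 * (p + s) + b
                              for p, s, b in zip(parent, shift, sub))
                off = tuple(o - c for o, c in zip(other, cell))
                # Keep occupied cells that are neither the cell nor a neighbor.
                if other in occupied and max(abs(d) for d in off) > 1:
                    offsets.add(off)
    return offsets


if __name__ == "__main__":
    # Hypothetical layouts: a fully occupied 4 x 4 x 4 block and a row of 16 cells.
    block = list(product(range(4), repeat=3))
    row = [(i, 0, 0) for i in range(16)]
    print(len(occurring_offsets(block)))
    print(len(occurring_offsets(row)))
```

For the fully occupied block all $7^3 - 3^3 = 316$ relative positions occur, whereas for the single row of cells only four do, which suggests how much the all-around setting can overestimate the number of coefficients actually needed for a 1D-shaped object.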

3. Numerical Study on the Effect of Shapes of Objects

Here a numerical study shows in detail the effect of the shapes of objects on the computational complexity and the memory requirements using the FMBEM, when no special settings of hierarchical cell structure are set for each problem. In the remainder of the paper, 'higher level' indicates a level nearer to the root cell level, and 'lower level' means a level nearer to level $L$.

3.1. Numerical setup

If the number of cells $M_l$ at level $l$ ($\le l_d$) satisfies the next equation,

$$M_l \propto (2^a)^l, \tag{3.22}$$

the shape of an analyzed object is defined as being $a$-dimensional up to $l_d$. Three cases are prepared for study as problems with typical shapes of objects shown in Fig. 3: cases 1, 2, and 3 have 1D-, 2D-, and 3D-shaped objects, which are one-, two- and three-dimensional up to a certain $l_d$, respectively. Figure 4 shows the relationship between $l$ and $M_l$ for the three cases. Generally, all problems analyzed with the BEM have two-dimensional boundaries in sufficiently small parts. This is similar in these cases, i.e., cases 1 and 3 have two-dimensional boundaries at level $l \ge 6$. Cases 1 and 2 are internal problems and case 3 is an external one, and all cases have rigid surfaces and a point source. Boundaries are discretized using quadrature constant elements with width of less than 1/8 of the wavelength, and the degrees of freedom of the three cases at the same analysis frequency are nearly equal to one another, for comparison. The numerical items for calculation with the FMBEM are identical with those in Ref. 15. The computation is executed with the supercomputer HITACHI SR8000. As a simple setting, hierarchical cell structure is arranged so that its sides are parallel to the sides of rectangular objects and the center of the root cell is the same as that of objects. The size of the root cell is the minimum required to circumscribe the whole boundary in each case. For translation coefficients $T_{LM}$ calculated in the setup process, the number $I'_l$ of common interaction cells used here is 316 as shown in Fig. 2(b) to easily deal with objects of arbitrary shapes.

Fig. 3. Geometry of three cases of problems. All cases have uniform rigid surfaces. A point source is located at the center in cases 1 and 2. Panels: Case 1 (tube), Case 2 (cube), Case 3 (6 8 6 cubes); unit: [m].

Fig. 4. Relationship between level $l$ and the number of cells at level $l$, $M_l$. Curves: Case 1, Case 2, Case 3, and the number of all cubes, for $l = 2$ to 9 on a logarithmic vertical axis, with reference slopes $O(2^l)$, $O(2^{2l})$, and $O(2^{3l})$.
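Given the cell counts $M_l$ of a cell structure, the exponent $a$ of Eq. (3.22) can be estimated level by level from the ratio of consecutive counts, since $M_{l+1}/M_l = 2^a$ when the relation holds. The following sketch assumes the counts are supplied as a list indexed by level; the function name and the sample counts are hypothetical.

```python
import math
from typing import List, Sequence


def effective_dimension(cells_per_level: Sequence[int]) -> List[float]:
    """Return log2(M_{l+1} / M_l) for each pair of consecutive levels.

    cells_per_level[l] is the number of cells M_l at level l. A value near
    1, 2, or 3 indicates a 1D-, 2D-, or 3D-shaped distribution at that level.
    """
    return [
        math.log2(cells_per_level[l + 1] / cells_per_level[l])
        for l in range(len(cells_per_level) - 1)
    ]


if __name__ == "__main__":
    # Hypothetical counts: one-dimensional at the upper levels,
    # two-dimensional at the lower ones.
    print(effective_dimension([1, 2, 4, 8, 32, 128]))
```

A sequence of values that changes from about 1 to about 2 with increasing level corresponds to an object that is 1D-shaped up to some level $l_d$ and whose boundary appears two-dimensional below it.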

3.2. Results and discussion

3.2.1. Computational complexity

Figure 5 shows details of computational time for the setup process and that per matrix-vector product of the iterative process of the FMBEM. Several numbers of the lowest level $L$ for the hierarchical cell structure are used for analysis, including level $L_{\mathrm{Tpopt}}$ that minimizes the time for the setup process and level $L_{\mathrm{Tiopt}}$ that minimizes the time for the iterative process. When the number of iterations is sufficiently small, $L_{\mathrm{Tpopt}}$ is the optimum level minimizing the total time, and when the number of iterations is sufficiently large, $L_{\mathrm{Tiopt}}$ is the optimum one.

[Figure 5: stacked bar charts for cases 1, 2, and 3 at f = 250 Hz (N = 6,528, 6,144, and 6,912) and at f = 1,000 Hz (N = 104,448, 98,304, and 110,592), each for several lowest levels L. Vertical axes: (a) time for setup process [min]; (b) time per matrix-vector multiplication [min]. Legend entries: Steps 1 to 6; α, β, γ; $T_{LM}$; coefficients for near field; rest.]

Fig. 5. Details of computational time with the FMBEM for analyzing three cases of problems at 250 Hz and 1000 Hz: (a) the time for the setup process and (b) the time per matrix-vector multiplication of the iterative process. $N$ is degree of freedom (DOF). $M$ is the average number of nodes in a cell at the lowest cell level $L$. $L_{\mathrm{Tpopt}}$ and $L_{\mathrm{Tiopt}}$ are the optimum numbers of the lowest cell levels for setup process time and iterative process time, respectively.

In the setup process, the calculation time for TLM is clearly the longest in case 1, regardless of the lowest level number L and the analysis frequency. Because of this time for TLM, the total time for the setup process of case 1 is about ten times as long as in the other cases at LTpopt at both analysis frequencies, and at 1000 Hz (DOF is about 100 000) the total setup process time for case 1 is longer than for the other two cases regardless of L. These results show that the total complexity for 1D-shaped objects is greater than that of 2D- or 3D-shaped ones with about the same DOF when the complexity of the setup process is relatively great owing to rapid convergence of the iterative solution. The great complexity of TLM in case 1 results from its large root cell, which produces large cells at higher levels. The effect of the large cell size can be seen in Eq. (2.16), where the complexity $\bar{C}^l_3$ is proportional to $(kD_l)^3$. It is also seen that the time for TLM hardly changes with L in all cases. This result shows that the proportion of the complexity $\bar{C}^l_3$ at higher levels to the total complexity for TLM is quite great, and thus, to reduce the total complexity for TLM, it is necessary to reduce $\bar{C}^l_3$ at higher levels. This can be achieved by decreasing $D_l$ and $I'_l$ at higher levels, as shown in Eq. (2.16).
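A small sketch of why the higher levels dominate the cost of TLM. It assumes that the cell size $D_l$ halves from one level to the next, that $I'_l$ is the same at every level, and that the highest level carrying translation coefficients is level 2. These assumptions and the function name are illustrative only.

```python
def level_shares(top_level, lowest_level, exponent=3):
    """Share of each level in the sum of (k D_l)^exponent.

    Assumes D_l halves per level, so the weight of level l relative to the
    top level is 2 ** (-exponent * (l - top_level)).
    """
    weights = {
        level: 2.0 ** (-exponent * (level - top_level))
        for level in range(top_level, lowest_level + 1)
    }
    total = sum(weights.values())
    return {level: weight / total for level, weight in weights.items()}


for lowest in (4, 6, 9):
    shares = level_shares(2, lowest)
    print(lowest, round(shares[2], 4))
```

Under these assumptions the top level carries roughly seven eighths of the total whatever the lowest level is, which is consistent with the observation that the time for TLM hardly changes with L.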

In the iterative process, the time in case 1 is the longest both at LTpopt and at LTiopt regardless of DOF, due to the time for Steps 2 and 4. This is also caused by the large size of the root cell in case 1. To improve the efficiency of the iterative process for 1D-shaped objects, the time for these steps must be reduced, which is achieved by decreasing $D_l$ and $M_l$ at higher levels, as shown in Eqs. (2.17) and (2.19).

3.2.2. Memory requirements

Figure 6 shows details of memory requirements for the FMBEM. Several numbers of the lowest level L for the hierarchical cell structure are used for the analysis, including the level LMopt that minimizes the total memory requirements. It is seen that the memory requirements for ξ, τ, ζ and for TLM are greater in case 1 than in the other cases, regardless of the lowest level number L and DOF. Because of these requirements, especially those for TLM, the total memory requirements for case 1 are about five to eight times as large as in the other cases at LMopt. The large memory for these coefficients in case 1 results from its large root cell; the memory requirements $E^l_3$ and $E^l_4$ are proportional to $(kD_l)^2$, as shown in Eqs. (2.20) and (2.21). It is also seen that the memory requirements for TLM and for ξ, τ, and ζ hardly change with L. This result shows that the proportion of the requirements $E^l_3$ and $E^l_4$ at higher levels to the total requirements for these coefficients is quite large, and thus, to reduce the total requirements for these coefficients, $E^l_3$ and $E^l_4$ must be reduced at higher levels, which is achieved by decreasing $D_l$ and $M_l$ for $E^l_3$ (Eq. (2.20)), and $D_l$ and $I'_l$ for $E^l_4$ (Eq. (2.21)), at higher levels.
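The weak dependence of the memory for TLM on L can be illustrated with a geometric series. The sketch below assumes that $D_l$ halves from one level to the next, that $I'_l$ takes the same value $I'$ at every level, and that the highest level carrying translation coefficients is $l = 2$; these are simplifying assumptions, not statements from Eq. (2.21).

```latex
\sum_{l=2}^{L} E^l_4
  \;\sim\; I' \sum_{l=2}^{L} (kD_l)^2
  \;=\; I'\,(kD_2)^2 \sum_{j=0}^{L-2} 4^{-j}
  \;=\; \tfrac{4}{3}\, I'\,(kD_2)^2 \left(1 - 4^{-(L-1)}\right)
```

Under these assumptions the total never exceeds 4/3 of the contribution of the highest level, however deep the hierarchy is, so reducing $D_l$ and $I'_l$ at the higher levels is what matters.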


[Figure 6: bar charts of total memory [MB] versus the lowest level L for Case 1, Case 2, and Case 3, at f = 250 Hz (N = 6,528, 6,144, and 6,912) and f = 1,000 Hz (N = 104,448, 98,304, and 110,592). Legend: TLM; ξ, τ, ζ; α, β, γ; coefficients for near field; rest.]

Fig. 6. Details of memory requirements with the FMBEM for three cases of problems at 250 Hz and 1000 Hz. N is the degree of freedom (DOF). M is the average number of nodes in a cell at the lowest cell level L. LMopt is the optimum number of the lowest cell level for memory requirements.

The conclusions of this study are summarized below.

(i) Both the computational complexity and the memory requirements with the FMBEM are greater for 1D-shaped objects than for 2D- or 3D-shaped ones.

(ii) In analyses of 1D-shaped objects, large computational costs are required in the following parts:

(a) Calculation of TLM in the setup process, and Steps 2 and 4 in the iterative process (for computational complexity),

(b) ξ, τ, ζ and TLM (for memory requirements).

The computational cost for TLM is the largest both for the complexity and for the memory requirements.

(iii) Reduction of $M_l$, $I'_l$, and $D_l$, especially at higher levels, is important for efficient calculation of 1D-shaped objects. They can be reduced by an effective setting of the hierarchical cell structure.

4. Theoretical Estimation for 1D-shaped Objects

We investigate the effect of settings of hierarchical cell structure on the computational efficiency for 1D-shaped objects through theoretical estimation. The estimation is carried out for three cases with different arrangements of the hierarchical cell structure and different ways of calculating the translation coefficients TLM. Equations (2.16)–(2.21) are estimated, and the notation is the same as in Sec. 2.

4.1. Setting of hierarchical cell structure

There are two kinds of arrangements of hierarchical cell structure for efficient calculation: an arrangement reducing the size of cells $D_l$, and one reducing the number of cells $M_l$. Three settings with different arrangements of hierarchical cell structure, including the above two, are considered for the estimation, as shown in Fig. 7. The setting “Conv” adopts one of the simple arrangements without special consideration for the shapes of objects. The hierarchical cell structure is placed so that its edges are parallel to the direction in which the object has the longest width, and the center of the root cell coincides with that of the object. “M-size” minimizes the size of cells $D_l$. The object is placed on a diagonal line of the root cell to minimize the size of the root cell. “M-num” minimizes the number of cells $M_l$. The object is placed along one of the edges of the root cell to minimize the number of cells at each level. The number $I'_l$ of cells in the common interaction cell set $T'_l$ is fixed as $I'_l = 316$ at each level for “Conv”, as shown in Fig. 2(b), to easily deal with objects of arbitrary shapes. For “M-size” and “M-num”, the smallest $I'_l$ is assumed at each level by considering the shape of the object. Figures 8 and 9 show examples of hierarchical cell structure for a 1D-shaped object and its common interaction cell sets $T'_l$ at each level for the settings “M-size” and “M-num”, respectively. $I'_l = 28$ ($I'_l = 12$ in two dimensions) for “M-size”, and $I'_l = 4$ ($I'_l = 4$ in two dimensions) for “M-num” at the levels where the object is regarded as being one-dimensional. The shape of the object considered here is assumed to be ideally one-dimensional, having sufficiently small thickness even at the lowest level. In the following, $D^{(\text{setting name})}_l$ denotes the size of a cell (the diameter of the sphere circumscribing a cell) at level $l$ for the setting, and $D^{(\mathrm{Conv})}_l = D^{(\mathrm{M\text{-}num})}_l = D_l$.

[Figure 7: panels (a) Conv, (b) M-size, (c) M-num. Labels in the figure: root (25 m), root (25 m), root (16 m); object (1 × 1 × 25 m); point source.]

Fig. 7. Three arrangements of hierarchical cell structure used in the three settings, Conv, M-size, and M-num: (a) conventional arrangement (for Conv), (b) arrangement minimizing the size of the root cell (for M-size), and (c) arrangement minimizing the number of cells $M_l$ (for M-num). The point sources and the sizes of root cells and objects shown are those of the numerical study in Sec. 5.

[Figure 8: panels (a) cell structure with the boundary and (b) common interaction cell set $T'_l$. Labels: L = 2: $I'_2 = 12$; L = 3: $I'_3 = 12$; L = 4: $I'_4 = 20$; L = 5: $I'_5 = 28$.]

Fig. 8. (a) An example of a 1D-shaped object and hierarchical cell structure that minimizes the size of cells $D_l$, and (b) common interaction cell sets $T'_l$ for pre-calculation of TLM at each level (in 2D).

[Figure 9: panels (a) cell structure with the boundary and (b) common interaction cell set $T'_l$. Labels: L = 2: $I'_2 = 4$; L = 3: $I'_3 = 4$; L = 4: $I'_4 = 12$; L = 5: $I'_5 = 26$.]

Fig. 9. (a) An example of a 1D-shaped object and hierarchical cell structure that minimizes the number of cells $M_l$, and (b) common interaction cell sets $T'_l$ for pre-calculation of TLM at each level (in 2D).

4.2. Evaluation of complexity

4.2.1. Setup process

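We estimate the complexity $\bar{C}^l_3$ in Eq. (2.16) for calculation of translation coefficients TLM at each level $l$.

Conv. The following equation is obtained by $I'_l = 316$ as

$$\bar{C}^l_3 \sim \left(kD^{(\mathrm{Conv})}_l\right)^3 \cdot 316 = 316 \cdot (kD_l)^3. \tag{4.23}$$

M-size. The following equation is obtained by $I'_l = 28$ and $D^{(\mathrm{M\text{-}size})}_l = \frac{1}{\sqrt{3}} D_l$ as

$$\bar{C}^l_3 \sim \left(kD^{(\mathrm{M\text{-}size})}_l\right)^3 \cdot 28 = \left(k \cdot \frac{1}{\sqrt{3}} D_l\right)^3 \cdot 28 \approx 5.39 \cdot (kD_l)^3. \tag{4.24}$$

If $I'_l = 316$ is used, the effect of reduction of $\bar{C}^l_3$ is only due to small size of $D^{(\mathrm{M\text{-}size})}_l$, and the following equation is obtained as

$$\bar{C}^l_3 \sim \left(kD^{(\mathrm{M\text{-}size})}_l\right)^3 \cdot 316 = \left(k \cdot \frac{1}{\sqrt{3}} D_l\right)^3 \cdot 316 \approx 60.8 \cdot (kD_l)^3. \tag{4.25}$$

M-num. The following equation is obtained by $I'_l = 4$ as

$$\bar{C}^l_3 \sim \left(kD^{(\mathrm{M\text{-}num})}_l\right)^3 \cdot 4 = 4 \cdot (kD_l)^3. \tag{4.26}$$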

These results show that the setting of “M-num” makes the calculation of $\bar{C}^l_3$ the most efficient. Since the complexity for TLM at higher levels is quite great compared to lower levels, the total complexity for TLM summed over all levels is expected to hardly change with the lowest level L, and to be the smallest with “M-num”, independent of the lowest level. When the object for analysis has some thickness, Eqs. (4.24) and (4.26) are not satisfied at lower levels, but are satisfied at higher levels. Thus, the setting of “M-num” makes calculation for 1D-shaped objects the most efficient even if the objects have some thickness.
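The coefficients in Eqs. (4.23) to (4.26) can be checked with a few lines of arithmetic. The sketch below only re-evaluates (size ratio)^3 times $I'_l$ for each setting; the dictionary layout and function name are illustrative choices.

```python
import math

# (cell size relative to D_l, number of common interaction cells I'_l)
SETTINGS = {
    "Conv": (1.0, 316),                             # Eq. (4.23)
    "M-size": (1.0 / math.sqrt(3.0), 28),           # Eq. (4.24)
    "M-size (I' = 316)": (1.0 / math.sqrt(3.0), 316),  # Eq. (4.25)
    "M-num": (1.0, 4),                              # Eq. (4.26)
}


def setup_coefficient(size_ratio, n_interaction):
    """Coefficient of (k D_l)^3 in the estimate of the TLM setup complexity."""
    return size_ratio ** 3 * n_interaction


coefficients = {name: setup_coefficient(*args) for name, args in SETTINGS.items()}

for name, value in coefficients.items():
    print(f"{name:18s} {value:7.2f} * (k D_l)^3")
```

The smallest coefficient belongs to “M-num”, and the comparison of the two “M-size” rows separates the gain from the smaller cell size from the gain from the smaller $I'_l$.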

4.2.2. Iterative process

Here we estimate $M_l(kD_l)^2$ because it is the common part of Steps 2, 3, and 4 and determines the complexity of these steps, as shown in Eqs. (2.17), (2.18), and (2.19).


Conv. The following equation is obtained by $M_l = 4 \cdot 2^l$ as

$$M_l\left(kD^{(\mathrm{Conv})}_l\right)^2 \sim 4 \cdot 2^l (kD_l)^2. \tag{4.27}$$

M-size. As shown in Fig. 8(a), when the object is placed along a diagonal line of the root cell, but without containing the diagonal line itself, so as to minimize $M_l$ at each level, $M_l = 3 \cdot 2^l - 2$ is satisfied in three dimensions and leads to

$$M_l\left(kD^{(\mathrm{M\text{-}size})}_l\right)^2 \sim (3 \cdot 2^l - 2) \cdot \left(k \cdot \frac{1}{\sqrt{3}} D_l\right)^2 = \left(2^l - \frac{2}{3}\right) \cdot (kD_l)^2. \tag{4.28}$$

M-num. The following equation is obtained by $M_l = 2^l$ as

$$M_l\left(kD^{(\mathrm{M\text{-}num})}_l\right)^2 \sim 2^l \cdot (kD_l)^2. \tag{4.29}$$

These results indicate that the setting of “M-size” makes the calculation of $M_l(kD_l)^2$ the most efficient. When the object for analysis has some thickness, however, either $M_l$ or $D^{(\mathrm{M\text{-}size})}_l$ increases in “M-size” at all levels, whereas in “M-num” the complexity does not change with thickness at higher levels. Therefore, “M-num” may in practice make calculation for 1D-shaped objects more efficient than “M-size”.
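To see how close the three settings are for the iterative process, the coefficients of $(kD_l)^2$ in Eqs. (4.27) to (4.29) can be tabulated per level. This is a plain evaluation of those three expressions for an ideally one-dimensional object; the function name is an illustrative choice.

```python
def iterative_factor(setting, level):
    """Coefficient of (k D_l)^2 in M_l (k D_l^(setting))^2 at a given level."""
    two_l = 2.0 ** level
    if setting == "Conv":
        return 4.0 * two_l                 # Eq. (4.27): M_l = 4 * 2^l
    if setting == "M-size":
        return (3.0 * two_l - 2.0) / 3.0   # Eq. (4.28): M_l = 3 * 2^l - 2, D_l / sqrt(3)
    if setting == "M-num":
        return two_l                       # Eq. (4.29): M_l = 2^l
    raise ValueError(f"unknown setting: {setting}")


for level in range(2, 7):
    row = [iterative_factor(s, level) for s in ("Conv", "M-size", "M-num")]
    print(level, [round(value, 2) for value in row])
```

“M-size” stays below “M-num” only by the constant 2/3, so the relative advantage shrinks as the level number grows, while “Conv” is four times “M-num” at every level.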

4.3. Evaluation of memory requirements

Here we do not deal with the memory requirements $E^l_3$ for the outgoing, interaction, and incoming coefficients ξ, τ, and ζ because those are determined by $M_l(kD_l)^2$, similar to the complexity for Steps 2 to 4 of the iterative process. Only the memory requirements $E^l_4$ for translation coefficients TLM are estimated.

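Conv. The following equation is obtained by $I'_l = 316$ as

$$E^l_4 \sim \left(kD^{(\mathrm{Conv})}_l\right)^2 \cdot 316 = 316 \cdot (kD_l)^2. \tag{4.30}$$

M-size. The following equation is obtained by $I'_l = 28$ as

$$E^l_4 \sim \left(kD^{(\mathrm{M\text{-}size})}_l\right)^2 \cdot 28 = \left(k \cdot \frac{1}{\sqrt{3}} D_l\right)^2 \cdot 28 \approx 9.33 \cdot (kD_l)^2. \tag{4.31}$$

If $I'_l = 316$ is used, the effect of reduction of $E^l_4$ is only due to small size of $D^{(\mathrm{M\text{-}size})}_l$, and the following equation is obtained as

$$E^l_4 \sim \left(kD^{(\mathrm{M\text{-}size})}_l\right)^2 \cdot 316 = \left(k \cdot \frac{1}{\sqrt{3}} D_l\right)^2 \cdot 316 \approx 105.3 \cdot (kD_l)^2. \tag{4.32}$$

M-num. The following equation is obtained by $I'_l = 4$ as

$$E^l_4 \sim \left(kD^{(\mathrm{M\text{-}num})}_l\right)^2 \cdot 4 = 4 \cdot (kD_l)^2. \tag{4.33}$$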

These results show that the setting of “M-num” requires the smallest memory for $E^l_4$ among the three settings. Since the memory requirements for TLM at higher levels are large compared to lower levels, the total memory requirements summed over all levels are also expected to be the smallest with “M-num”, independent of the lowest level. When the object for analysis has some thickness, Eqs. (4.31) and (4.33) are not satisfied at all levels; however, these equations are satisfied at higher levels. Therefore, the setting of “M-num” requires the smallest memory for 1D-shaped objects even if the objects have some thickness.
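As a worked comparison, the per-level ratios implied by Eqs. (4.30), (4.31), and (4.33) are given below. They hold only at the levels where the object can be regarded as one-dimensional, and the notation $E^l_4(\cdot)$ for the estimate under a given setting is introduced here for convenience.

```latex
\frac{E^l_4(\text{Conv})}{E^l_4(\text{M-num})} = \frac{316}{4} = 79, \qquad
\frac{E^l_4(\text{M-size})}{E^l_4(\text{M-num})} = \frac{28/3}{4} = \frac{7}{3} \approx 2.33, \qquad
\frac{E^l_4(\text{Conv})}{E^l_4(\text{M-size})} = \frac{316}{28/3} \approx 33.9
```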

From these estimations, we can state that “M-num” is the best setting for 1D-shaped objects for the computational complexity of the setup process and for the memory requirements, whereas for the complexity of the iterative process, “M-size” may be the most efficient for ideally one-dimensional problems. For practical objects having some thickness, however, “M-num” may be the most efficient for the iterative process as well.

5. Numerical Study for 1D-shaped Objects

We validate the above estimates through a numerical study, and investigate the practical effect of the thickness of 1D-shaped objects on the computational efficiency.

5.1. Numerical setup

The sound field of case 1 in Fig. 3 is analyzed as a 1D-shaped problem using the FMBEM. We use the three arrangements corresponding to those in the above theoretical estimation. In the setting of “M-size”, the object is placed so that it contains a diagonal line of the root cell, to minimize the size of the root cell for the object. This arrangement differs from that in the above estimation because of the thickness of the object. The number $I'_l$ of cells in the common interaction cell set $T'_l$ is fixed as $I'_l = 316$ at each level for “Conv” (corresponding to Eqs. (4.23) and (4.30)) and “M-size” (corresponding to Eqs. (4.25) and (4.32)), as shown in Fig. 2(b). For “M-num” (corresponding to Eqs. (4.26) and (4.33)), the smallest $I'_l$ is used at each level by considering the shape of the object, as shown in Fig. 9. The computational conditions, such as boundary conditions, mesh generation, the computer used, and the numerical settings for calculation with the FMBEM, are the same as in Sec. 3. The numerical results below are shown together with those for cases 2 and 3 in Fig. 3 with the setting of “Conv”, for comparison with 2D- and 3D-shaped problems.

5.2. Results and discussion

5.2.1. Computational complexity

Figure 10 shows details of the computational time for the setup process and the time per matrix-vector product of the iterative process of the FMBEM. The notation, such as LTpopt and LTiopt, is the same as in Sec. 3.

In the setup process, one can see a decrease in the computational time for TLM due to small $D_l$ with “M-size”, and a much larger decrease due to small $I'_l$ with “M-num”. As a result, the total time for the setup process with “M-num” is almost the same as those for cases 2 and 3 at LTpopt, and even smaller at LTiopt. The results show that the setting of “M-num” greatly decreases the total time regardless of L when the complexity of the setup process is relatively great owing to rapid convergence of the iterative solution. Figure 11(a) shows the relation between DOF and the time for the setup process, where only values at LTpopt are shown. One can see that a large amount of time is needed for case 1 with “Conv” and with “M-size” compared to cases 2 and 3, while the time with “M-num” is comparable to cases 2 and 3. The slopes of the lines are about the same, and do not depend on the shapes of objects or on the settings of hierarchical cell structure. This is because cases 1 and 3 do not satisfy the definition, Eq. (3.22), of being one- or three-dimensional at lower levels, as shown in Fig. 4. The line of “M-num” is slightly gentler than those of “Conv” and “M-size” because of the small computational complexity for TLM. The line of “M-num”, however, is a little steeper than those for the other cases. This is the effect of the complexity of TLM that cannot be reduced even with “M-num”, as shown in Fig. 10 (f = 1000 Hz), and thus the complexity for 1D-shaped objects may be greater than that for 2D- or 3D-shaped ones at much larger DOF, even when using “M-num”. However, the number of iterations often increases with DOF, resulting in a long time for the iterative process, and thus the steepness of the line with “M-num” mentioned above is expected to have limited influence on the total time.

[Figure 10: bar charts of (a) time for setup process [min] and (b) time per matrix-vector multiplication [min] versus the lowest level L, for Case 1 (conv, m-size, m-num), Case 2 (conv), and Case 3 (conv), at f = 250 Hz (N = 6,528, 6,144, and 6,912) and f = 1,000 Hz (N = 104,448, 98,304, and 110,592). Legend for (a): TLM; α, β, γ; coefficients for near field; rest. Legend for (b): Step 1 to Step 6.]

Fig. 10. Details of computational time with the FMBEM using three settings of hierarchical cell structure at 250 Hz and 1000 Hz: (a) the time for the setup process and (b) the time per matrix-vector multiplication of the iterative process. N is the degree of freedom (DOF). M is the average number of nodes in a cell at the lowest cell level L. LTpopt and LTiopt are the optimum numbers of the lowest cell level for the time of the setup process and of the iterative process, respectively.

[Figure 11: log-log plots against the degree of freedom N of (a) time for setup process [sec] and (b) time per iteration [sec], for Case 1 (conv), Case 1 (m-size), Case 1 (m-num), Case 2 (conv), and Case 3 (conv), with O(N) and O(N^2) reference lines.]

Fig. 11. Computational time of the FMBEM with the optimum cell level number: (a) the time for the setup process and (b) the time per iteration of the iterative process.
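The slope of a line in a log-log plot such as Fig. 11 is the exponent p in a scaling law of the form time ∝ N^p. A minimal way to estimate it from (N, time) samples is a least-squares fit in log space, sketched below. The sample points are synthetic, generated from an assumed power law purely to exercise the function; they are not values read from Fig. 11.

```python
import math


def loglog_slope(samples):
    """Least-squares slope of log(time) against log(N) for (N, time) pairs."""
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(t) for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data from an assumed law t = 1e-3 * N^1.3 (hypothetical, not measured).
synthetic = [(n, 1e-3 * n ** 1.3) for n in (6_000, 25_000, 100_000)]
slope = loglog_slope(synthetic)
print(round(slope, 3))
```

Applied to measured times at several DOF, a slope near 1 corresponds to the O(N) reference line and a slope near 2 to the O(N^2) one.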

In the iterative process, the time with “M-size” is the longest of all settings for case 1 both at LTpopt and at LTiopt. This is caused by large $M_l$ due to the thickness of the object, and by the large number of interaction cells of each cell in Step 3 due to the diagonal position of the object in the root cell. Therefore, the setting of “M-size” is inappropriate for real objects with some thickness. On the other hand, the time for Steps 2, 3, and 4 decreases with “M-num”, so that the total complexity becomes comparable to that for cases 2 and 3. Figure 11(b) shows the relation between DOF and the time per iteration in the iterative process, where only values at LTiopt are shown. The time for case 1 with “M-num” is almost the same as that for case 2 independent of DOF, and as that for case 3 at large DOF. With DOF larger than the values analyzed here, less difference among the cases is expected, because of the deeper hierarchy of the cell structure and the greater proportion of the complexity at lower levels, where the shapes of objects do not affect the complexity. From these results, we can conclude that the setting of “M-num” makes the computational complexity for 1D-shaped objects the smallest even if the objects have some thickness, which results in almost the same complexity as that for 2D- or 3D-shaped ones.

5.2.2. Memory requirements

Figure 12 shows details of memory requirements for the FMBEM. The notation, such as LMopt, is the same as in Sec. 3. For the memory requirements for TLM, one can see a decrease due to small $D_l$ with “M-size”, and a much larger decrease due to small $I'_l$ with “M-num”. This tendency is similar to that of the complexity for the setup process. As a result, the total memory requirements for case 1 with “M-num” are almost the same as those for cases 2 and 3 at LMopt and even smaller at other L. Figure 13 shows the relation between DOF and the total memory requirements, where only values at LMopt are shown. The slopes of the lines are the same, and do not depend on the shapes of the objects or the settings of hierarchical cell structure, since cases 1 and 3 do not satisfy the definition, Eq. (3.22), of being one- or three-dimensional at lower levels. It is seen that the total memory requirements with “M-num” are not different from those for cases 2 and 3. From these results, we can conclude that the setting of “M-num” makes the memory requirements for 1D-shaped objects the smallest even if the objects have some thickness, which results in almost the same memory requirements as those for 2D- or 3D-shaped objects.

[Figure 12: bar charts of total memory [MB] versus the lowest level L, for Case 1 (conv, m-size, m-num), Case 2 (conv), and Case 3 (conv), at f = 250 Hz (N = 6,528, 6,144, and 6,912) and f = 1,000 Hz (N = 104,448, 98,304, and 110,592). Legend: TLM; ξ, τ, ζ; α, β, γ; coefficients for near field; rest.]

Fig. 12. Details of memory requirements with the FMBEM using three settings of hierarchical cell structure at 250 Hz and 1000 Hz. N is the degree of freedom (DOF). M is the average number of nodes in a cell at the lowest cell level L. LMopt is the optimum number of the lowest cell level for memory requirements.

[Figure 13: log-log plot of total memory [MB] against the degree of freedom N, for Case 1 (conv), Case 1 (m-size), Case 1 (m-num), Case 2 (conv), and Case 3 (conv), with an O(N) reference line.]

Fig. 13. Memory requirements of the FMBEM with the optimum cell level number.

6. A Method to Arrange Hierarchical Cell Structure

Here we propose a simple method to arrange hierarchical cell structure that realizes the setting of “M-num”, which was found to be the most efficient setting in the above studies. The method proposed here is not completely optimal in efficiency for all problems; however, it is sufficiently effective for 1D-shaped objects, the efficiency for which is greatly influenced by the setting of hierarchical cell structure, whereas the setting of the structure has only a small effect on 2D- or 3D-shaped objects. For these reasons, this method realizes probably-optimum settings for many problems. Generally, additional costs or complex algorithms are often required to realize the completely optimum setting for each problem. This method is based on a simple concept and thus is easy to implement. Figure 14 shows a diagram of the method. The procedure is given below.

(i) Find the direction d1 in which the width of the objects is the largest among all directions, and compute the width w1 in d1.

(ii) Find the direction d2 in which the width of the objects is the largest among all directions perpendicular to d1, and compute the width w2 in d2.

(iii) Find the direction d3 perpendicular to both d1 and d2, and compute the width w3 of the objects in d3.

(iv) Assume a w1 × w2 × w3 rectangular parallelepiped circumscribing all the objects, and place the hierarchical cell structure with a width of w1 so that the sides of the cells are parallel to those of the parallelepiped and the structure and the parallelepiped share the same edges and corners.

[Figure 14: the object enclosed in a rectangular parallelepiped with widths w1, w2, and w3 along directions d1, d2, and d3 (steps i, ii, iii), and the hierarchical cell structure placed on the parallelepiped (step iv).]

Fig. 14. Diagram of procedures (i) to (iv) to determine an appropriate position of hierarchical cell structure.
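Procedures (i) to (iv) can be sketched in a few lines. The code below is an illustrative interpretation, not the authors' implementation. It assumes the object is given as a list of 3D node coordinates, finds the direction of largest width as the direction of the farthest pair of nodes by brute force (O(n^2), adequate only for small examples), and returns the root cell as a cube of side w1 aligned with (d1, d2, d3) that shares the low corner of the circumscribing box. All function names are hypothetical.

```python
import itertools
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    length = math.sqrt(dot(a, a))
    return tuple(x / length for x in a)


def widest_direction(points, eps=1e-12):
    """Unit vector along the farthest pair of points, or None if all coincide.

    The width of a point set is largest along its farthest pair.
    """
    p, q = max(itertools.combinations(points, 2),
               key=lambda pq: dot(sub(pq[1], pq[0]), sub(pq[1], pq[0])))
    v = sub(q, p)
    return unit(v) if dot(v, v) > eps else None


def arrange_cell_structure(nodes):
    """Return (axes, widths, low_corner, root_side) for at least two distinct nodes."""
    # (i) direction of the largest width
    d1 = widest_direction(nodes)
    # (ii) largest width among directions perpendicular to d1:
    #      remove the d1 component of every node and repeat the search
    flat = [sub(p, tuple(dot(p, d1) * c for c in d1)) for p in nodes]
    d2 = widest_direction(flat)
    if d2 is None:
        # ideally one-dimensional object: any perpendicular direction will do
        helper = (1.0, 0.0, 0.0) if abs(d1[0]) < 0.9 else (0.0, 1.0, 0.0)
        d2 = unit(cross(d1, helper))
    # (iii) third direction perpendicular to both
    d3 = cross(d1, d2)
    axes = (d1, d2, d3)
    lows = tuple(min(dot(p, d) for p in nodes) for d in axes)
    widths = tuple(max(dot(p, d) for p in nodes) - lo for d, lo in zip(axes, lows))
    # (iv) root cell: cube of side w1 whose low corner (in the d1, d2, d3 frame)
    #      coincides with the low corner of the w1 x w2 x w3 box
    return axes, widths, lows, widths[0]


# Usage: the eight corners of a 1 x 1 x 25 bar
bar = [(x, y, z) for x in (0.0, 25.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
axes, widths, lows, side = arrange_cell_structure(bar)
print([round(w, 3) for w in widths], round(side, 3))
```

Note that, read literally, step (i) picks the body diagonal of a bar-shaped object rather than its long axis, because the width is largest along the farthest pair of points. A production code would likely replace the brute-force search with a convex-hull based method.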

7. Concluding Remarks

Effective settings of hierarchical cell structure, which considerably affect the efficiency of the FMBEM, have been investigated through theoretical and numerical studies. A numerical study with objects of different shapes with the same DOF showed that both the computational complexity and the memory requirements with the FMBEM were larger for 1D-shaped objects than for 2D- or 3D-shaped ones, without any special settings of hierarchical cell structure. Through theoretical and numerical studies, an effective setting for 1D-shaped objects has been derived, where the arrangement of the cell structure and the treatment of translation coefficients between cells were specially considered. With this setting, both the computational complexity and the memory requirements hardly depend on the shapes of objects. This setting helps numerical analyses of large-scale problems with 1D-shaped objects, such as noise barriers and long ducts. To permit the above setting for objects of arbitrary shapes, a simple method to arrange hierarchical cell structure has been proposed, which realizes probably-optimum arrangements for arbitrarily-shaped problems.

References

1. V. Rokhlin, J. Comput. Phys. 60 (1983) 187.
2. L. Greengard, The Rapid Evaluation of Potential Fields in Particle Systems (The MIT Press, 1987).
3. J. K. Salmon, M. S. Warren and G. S. Winckelmans, Int. J. Supercomputer Appl. 8 (1994) 124.
4. H. Schwichtenberg, G. Winter and H. Wallmeier, Parallel Computing 25 (1999) 535.
5. K. Hayami and S. A. Sauter, JASCOME 13 (1996) 125.
6. Y. Fu et al., Int. J. Num. Meth. Eng. 42 (1998) 1215.
7. N. Nishimura, K. Yoshida and S. Kobayashi, Eng. Anal. Bound. Elem. 23 (1999) 97.
8. A. Buchau, W. Rieger and W. M. Rucker, IEEE Trans. Mag. 37 (2001) 3181.
9. V. Rokhlin, J. Comput. Phys. 86 (1990) 414.
10. V. Rokhlin, Applied and Comput. Harm. Anal. 1 (1993) 82.
11. R. Coifman, V. Rokhlin and S. Wandzura, IEEE Antennas Propag. Magaz. 35(3) (1993) 7.
12. M. A. Epton and B. Dembart, SIAM J. Sci. Comput. 16(4) (1995) 865.
13. S. Koc and W. C. Chew, J. Acoust. Soc. Am. 103(2) (1997) 721.
14. T. Sakuma and Y. Yasuda, Acustica-acta Acustica 88 (2002) 513.
15. Y. Yasuda and T. Sakuma, Acustica-acta Acustica 89 (2003) 28.
16. S. Schneider, J. Comput. Acoust. 11(3) (2003) 387.
17. S. Amini and A. T. J. Profit, Eng. Anal. Bound. Elem. 27 (2003) 547.
18. S. Marburg and S. Schneider, Eng. Anal. Bound. Elem. 27 (2003) 727.
19. P. B. Callahan and S. R. Kosaraju, J. ACM 42(1) (1995) 67.
20. S. Aluru, J. Gustafson, G. M. Prabhu and F. E. Sevilgen, J. Supercomputing 12 (1998) 303.
21. S. Aluru, SIAM J. Sci. Comput. 17(3) (1996) 773.
