parallel multi-frontal solver for isogeometric finite element … · 2014. 5. 26. · scheduling...

Parallel multi-frontal solver for isogeometric finite element methods on GPU Maciej Paszyński Department of Computer Science, AGH University of Science and Technology, Kraków, Poland email: [email protected] Collaborators: Maciej Woźniak Krzysztof Kuźnik AGH University of Science and Technology, Kraków, Poland Victor Calo King Abdullah University of Science and Technology, Thuwal, Saudi Arabia David Pardo The University of the Basque Country, Basque Center for Applied Mathematics and IKERBASQUE (Basque Foundation of Science), Bilbao, Spain


Parallel multi-frontal solver

for isogeometric

finite element methods on GPU

Maciej Paszyński

Department of Computer Science,

AGH University of Science and Technology, Kraków, Poland

email: [email protected]

Collaborators:

Maciej Woźniak

Krzysztof Kuźnik

AGH University of Science and Technology, Kraków, Poland

Victor Calo

King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

David Pardo

The University of the Basque Country,

Basque Center for Applied Mathematics

and IKERBASQUE (Basque Foundation of Science), Bilbao, Spain

AGH UNIVERSITY OF SCIENCE AND TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE, KRAKOW, POLAND

OUTLINE

1. Introduction

2. Graph grammar model for 1D

• Grammar productions expressing the generation of element elimination tree

• Grammar productions expressing the solver algorithm

3. Numerical results for 1D

• Execution time for p=1,2,3,4,5

• Comparison with MUMPS for p=1,2,3,4,5

4. Generalization to 2D

5. C^(p-1) vs C^0

6. Numerical results for 2D

• Execution time for p=1,2,3

• Comparison with MUMPS for p=1,2,3

7. Conclusions

INTRODUCTION

B-SPLINES BASED FINITE ELEMENT METHOD

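Strong formulation

$$-\frac{d}{dx}\left(A(x)\frac{du}{dx}\right) + B(x)\frac{du}{dx} + C(x)\,u = f(x)$$

$$u(0) = 0, \qquad A(1)\frac{du}{dx}(1) + u(1) = 1$$

Weak formulation

Find $u \in V = \{u \in H^1(0,1) : u(0) = 0\}$ s.t.

$$b(v,u) = l(v), \qquad \forall v \in V = \{v \in H^1(0,1) : v(0) = 0\}$$

$$b(v,u) = \int_0^1 \left( A(x)\frac{dv}{dx}\frac{du}{dx} + B(x)\,v(x)\frac{du}{dx} + C(x)\,v(x)\,u(x) \right) dx + v(1)\,u(1)$$

$$l(v) = v(1)$$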

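INTRODUCTION

B-SPLINES BASED FINITE ELEMENT METHOD

Find $u \in V = \{u \in H^1(0,1) : u(0) = 0\}$ s.t. $b(v,u) = l(v)$, $\forall v \in V = \{v \in H^1(0,1) : v(0) = 0\}$

Using B-splines as basis functions

$$u(x) \approx \sum_i N_{i,p}(x)\,a_i, \qquad v(x) = N_{j,p}(x)$$

$$\sum_i b\left(N_{j,p}(x), N_{i,p}(x)\right) a_i = l\left(N_{j,p}(x)\right), \qquad \forall j$$

[Figure: contribution of $b(N_{2,1}; N_{3,1})$, linear B-splines]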

INTRODUCTION

B-SPLINES BASED FINITE ELEMENT METHOD

Using B-splines as basis functions

[Figure: contribution of $b(N_{2,2}; N_{3,2})$, quadratic B-splines]
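The basis functions N_{i,p} on the two slides above can be evaluated with the standard Cox-de Boor recursion. The following is a minimal sketch, assuming an open uniform knot vector on [0, 1]; the knot vector, the number of elements and the function names are illustrative choices, not taken from the slides.

```python
def bspline_basis(i, p, knots, x):
    """Value at x of the i-th B-spline of degree p (Cox-de Boor recursion)."""
    if p == 0:
        # Degree 0: indicator function of the half-open knot span.
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left_den = knots[i + p] - knots[i]
    right_den = knots[i + p + 1] - knots[i + 1]
    left = right = 0.0
    if left_den != 0.0:  # skip terms over empty (repeated-knot) spans
        left = (x - knots[i]) / left_den * bspline_basis(i, p - 1, knots, x)
    if right_den != 0.0:
        right = (knots[i + p + 1] - x) / right_den * bspline_basis(i + 1, p - 1, knots, x)
    return left + right


def open_uniform_knots(n_elements, p):
    """Knot vector on [0, 1] with the end knots repeated p + 1 times (assumed layout)."""
    interior = [k / n_elements for k in range(1, n_elements)]
    return [0.0] * (p + 1) + interior + [1.0] * (p + 1)


def basis_values(n_elements, p, x):
    """All basis function values at x; there are n_elements + p functions."""
    knots = open_uniform_knots(n_elements, p)
    return [bspline_basis(i, p, knots, x) for i in range(len(knots) - p - 1)]


linear = basis_values(4, 1, 0.3)      # two functions overlap each element
quadratic = basis_values(4, 2, 0.3)   # three functions overlap each element
```

At any interior point exactly p + 1 functions are nonzero and they sum to one, which is why the overlap between neighbouring functions (and hence the size of the local frontal matrices) grows with p.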

GRAPH GRAMMAR

GENERATION OF 1D ELIMINATION TREE

1D elimination tree obtained by executing productions (P1)-(P2)^2-(P2)^2-(P3)^6

GRAPH GRAMMAR PRODUCTIONS AS ATOMIC TASKS

We assign indices to grammar productions in order to localize

the places where the graph grammar productions were fired

(P1)-(P2)1-(P2)2-(P2)3-(P2)4-(P3)1-(P3)2-(P3)3-(P3)4-(P3)5-(P3)6

TRACE THEORY BASED SCHEDULER

Dependency relation for construction of the elimination tree

(P1)D{(P2)1,(P2)2}

(P2)1D{(P2)3,(P2)4}

(P2)3D{(P3)1,(P3)2}

(P2)4D{(P3)3,(P3)4}

(P2)2D{(P3)5,(P3)6}

Alphabet:

A = {(P1) , (P2)1 , (P2)2 , (P2)3 , (P2)4 , (P3)1 , (P3)2 , (P3)3 , (P3)4 , (P3)5 , (P3)6 }

TRACE THEORY BASED SCHEDULER

Dependency graph

TRACE THEORY BASED SCHEDULER

Scheduling according to Foata Normal Form:

Sequential order of productions:

(P1)-(P2)1-(P2)2-(P2)3-(P2)4-(P3)1-(P3)2-(P3)3-(P3)4-(P3)5-(P3)6

Foata Normal Form:

[(P1)][(P2)1(P2)2][(P2)3(P2)4(P3)5(P3)6][(P3)1(P3)2(P3)3(P3)4]

Thus, the execution of the solver consists of several steps, in which independent tasks are executed concurrently, separated by synchronization barriers.

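Foata Normal Form

$$[a^1_1 a^1_2 \cdots a^1_{l_1}]\,[a^2_1 a^2_2 \cdots a^2_{l_2}] \cdots [a^n_1 a^n_2 \cdots a^n_{l_n}], \qquad a^k_i \in A \ \text{(alphabet)}$$

$$\forall\, i, j \in \{1,\dots,l_k\},\ i \neq j:\quad (a^k_i, a^k_j) \in I, \qquad \text{where } I = A \times A \setminus D$$

$$\forall\, j \in \{1,\dots,l_{k+1}\}\ \ \exists\, i \in \{1,\dots,l_k\}:\quad (a^k_i, a^{k+1}_j) \in D$$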
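The grouping into Foata classes can be computed mechanically from the dependency relation listed for the construction of the elimination tree. The sketch below is one simple way to do it, assuming each task is placed one layer after the deepest task it depends on; the task names (`P2_1` for (P2)1, and so on) and the function name are illustrative, not from the slides.

```python
def foata_layers(tasks, depends_on):
    """Group tasks into layers: layer = 1 + deepest layer among prerequisites."""
    level = {}

    def layer_of(task):
        if task not in level:
            prerequisites = depends_on.get(task, ())
            level[task] = 1 + max((layer_of(t) for t in prerequisites), default=0)
        return level[task]

    layers = {}
    for task in tasks:  # keeps the sequential order inside each layer
        layers.setdefault(layer_of(task), []).append(task)
    return [layers[k] for k in sorted(layers)]


# "X D {Y, Z}" on the slide is read here as: Y and Z can only fire after X.
enables = {
    "P1": ["P2_1", "P2_2"],
    "P2_1": ["P2_3", "P2_4"],
    "P2_3": ["P3_1", "P3_2"],
    "P2_4": ["P3_3", "P3_4"],
    "P2_2": ["P3_5", "P3_6"],
}
depends_on = {child: [parent] for parent, children in enables.items() for child in children}

tasks = ["P1", "P2_1", "P2_2", "P2_3", "P2_4",
         "P3_1", "P3_2", "P3_3", "P3_4", "P3_5", "P3_6"]

layers = foata_layers(tasks, depends_on)
```

With this input, `layers` has four entries, one per synchronization step, and the third entry mixes (P2) and (P3) productions because (P3)5 and (P3)6 only wait for (P2)2.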

GRAMMAR BASED NUMERICAL INTEGRATION

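Using Gaussian quadrature, the integration over the domain can be replaced by a weighted summation over the Gauss points $x_l$ with weights $w_l$:

$$b(N_{j,p}, N_{i,p}) = \int_0^1 \left( A(x)\frac{dN_{j,p}}{dx}\frac{dN_{i,p}}{dx} + B(x)\,N_{j,p}(x)\frac{dN_{i,p}}{dx} + C(x)\,N_{j,p}(x)\,N_{i,p}(x) \right) dx + N_{j,p}(1)\,N_{i,p}(1)$$

$$l(N_{i,p}) = N_{i,p}(1)$$

$$\int_0^1 \left( A(x)\frac{dN_{i,p}}{dx}\frac{dN_{j,p}}{dx} + B(x)\frac{dN_{i,p}}{dx}N_{j,p}(x) + C(x)\,N_{i,p}(x)\,N_{j,p}(x) \right) dx \approx \sum_l w_l \left( A(x_l)\frac{dN_{i,p}(x_l)}{dx}\frac{dN_{j,p}(x_l)}{dx} + B(x_l)\frac{dN_{i,p}(x_l)}{dx}N_{j,p}(x_l) + C(x_l)\,N_{i,p}(x_l)\,N_{j,p}(x_l) \right)$$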
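The weighted summation over Gauss points can be illustrated for the simplest case. The sketch below assumes linear basis functions (p = 1) on a single element and a two-point Gauss-Legendre rule; the function name, the element bounds and the coefficient callables `A`, `B`, `C` are hypothetical stand-ins, and the boundary term at x = 1 is left out.

```python
import math

# Two-point Gauss-Legendre rule on [-1, 1]; exact for polynomials up to degree 3.
GAUSS_POINTS = (-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0))
GAUSS_WEIGHTS = (1.0, 1.0)


def element_matrix(x_left, x_right, A, B, C):
    """2x2 local matrix for the two linear basis functions on [x_left, x_right]."""
    h = x_right - x_left
    basis = (lambda x: (x_right - x) / h, lambda x: (x - x_left) / h)
    dbasis = (-1.0 / h, 1.0 / h)  # derivatives are constant on the element
    local = [[0.0, 0.0], [0.0, 0.0]]
    for xi, w in zip(GAUSS_POINTS, GAUSS_WEIGHTS):
        x = 0.5 * (x_left + x_right) + 0.5 * h * xi  # map point to the element
        wl = 0.5 * h * w                              # weight scaled by the Jacobian
        for j in range(2):      # test function N_j
            for i in range(2):  # trial function N_i
                local[j][i] += wl * (
                    A(x) * dbasis[j] * dbasis[i]
                    + B(x) * basis[j](x) * dbasis[i]
                    + C(x) * basis[j](x) * basis[i](x)
                )
    return local


one = lambda x: 1.0
zero = lambda x: 0.0
stiffness = element_matrix(0.0, 0.25, one, zero, zero)  # A = 1 only
mass = element_matrix(0.0, 0.25, zero, zero, one)       # C = 1 only
```

Every (j, i) pair at every Gauss point is independent of the others, which is what makes the integration a natural candidate for fine-grained parallel tasks.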

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Generation of frontal matrices at leaves of the elimination tree expressed as

the execution of graph grammar productions (A1)-(A)^4-(AN)

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Graph grammar productions generating local frontal matrices for left boundary,

interior and right boundary nodes for linear B-splines

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Graph grammar productions merging element frontal matrices at parent level

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Graph grammar production eliminating fully assembled row at parent level
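For linear B-splines each element front is a 2x2 matrix, so the merge and elimination steps named on these slides can be illustrated in a few lines. This is a minimal sketch, assuming dense Python lists, a hypothetical ordering [shared node, left outer node, right outer node] in the merged front, and no pivoting; the function names are illustrative and not from the slides.

```python
def merge_fronts(left, right):
    """Merge two 2x2 element fronts (matrix, rhs) that share one node.

    Local node 1 of the left element and local node 0 of the right element
    are the same global node, so their contributions are summed.
    """
    (Kl, fl), (Kr, fr) = left, right
    K = [
        [Kl[1][1] + Kr[0][0], Kl[1][0], Kr[0][1]],  # shared node: fully assembled
        [Kl[0][1],            Kl[0][0], 0.0],       # left outer node
        [Kr[1][0],            0.0,      Kr[1][1]],  # right outer node
    ]
    f = [fl[1] + fr[0], fl[0], fr[1]]
    return K, f


def eliminate_first_row(K, f):
    """Eliminate row 0 and return the Schur complement on the remaining rows."""
    n = len(f)
    pivot = K[0][0]
    S = [[K[r][c] - K[r][0] * K[0][c] / pivot for c in range(1, n)]
         for r in range(1, n)]
    g = [f[r] - K[r][0] * f[0] / pivot for r in range(1, n)]
    return S, g


# Two identical elements of unit length, coefficient A = 1, zero right-hand side.
element = ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
K, f = merge_fronts(element, element)
S, g = eliminate_first_row(K, f)
```

The Schur complement `S` is again a 2x2 front on the two outer nodes, so the same pair of steps can be repeated at the next level of the elimination tree.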

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Graph grammar production for solution at root level

Graph grammar production for merging element frontal matrices at root level

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Graph grammar production for recursive backward substitution

PROCESS OF THE ELIMINATION

EXPRESSED BY GRAPH GRAMMAR PRODUCTIONS

Expression of the solver execution by graph grammar productions:

(A1)-(A)^4-(AN) (generation of frontal matrices at the leaves of the elimination tree)

(A2)^3 (merging contributions at parent nodes)

(E2)^3 (elimination of fully assembled nodes)

(A2)-(E2) (merging at parent node followed by elimination)

(Aroot)-(Eroot) (merging at root node followed by full forward elimination)

(BS)^4 (backward substitutions)

TRACE THEORY BASED SCHEDULER

Dependency relation for the solver algorithm

{(A1),(A)1}D(A2)1

{(A)2,(A)3}D(A2)2

{(A)4,(AN)}D(A2)3

(A2)1D(E2)1

(A2)2D(E2)2

(A2)3D(E2)3

{(E2)1,(E2)2}D(A2)4

(A2)4D(E2)4

{(E2)3,(E2)4}D(Aroot)

(Aroot)D(Eroot)

(Eroot)D{(BS)1,(BS)2}

(BS)1D{(BS)3,(BS)4}

Alphabet:

A={(A1), (A)1 , (A)2 , (A)3 , (A)4 , (AN), (A2)1 , (A2)2 , (A2)3 , (E2)1 , (E2)2 , (E2)3 , (A2)4 ,

(E2)4 , (Aroot) , (Eroot) , (BS)1 , (BS)2 , (BS)3 , (BS)4 }

TRACE THEORY BASED SCHEDULER

Dependency graph

TRACE THEORY BASED SCHEDULER

Scheduling according to Foata Normal Form:

Sequential order of productions:

(A1)-(A)1-(A)2-(A)3-(A)4-(AN)-(A2)1-(A2)2-(A2)3-(E2)1-(E2)2-(E2)3-(A2)4-(E2)4-(Aroot)-(Eroot)-(BS)1-(BS)2-(BS)3-(BS)4

Foata Normal Form:

[(A1)(A)1(A)2(A)3(A)4(AN)][(A2)1(A2)2(A2)3][(E2)1(E2)2(E2)3][(A2)4][(E2)4][(Aroot)][(Eroot)][(BS)1(BS)2][(BS)3(BS)4]

Thus, the execution of the solver consists of several steps, in which independent tasks are executed concurrently, separated by synchronization barriers.

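Foata Normal Form

$$[a^1_1 a^1_2 \cdots a^1_{l_1}]\,[a^2_1 a^2_2 \cdots a^2_{l_2}] \cdots [a^n_1 a^n_2 \cdots a^n_{l_n}], \qquad a^k_i \in A \ \text{(alphabet)}$$

$$\forall\, i, j \in \{1,\dots,l_k\},\ i \neq j:\quad (a^k_i, a^k_j) \in I$$

$$\forall\, j \in \{1,\dots,l_{k+1}\}\ \ \exists\, i \in \{1,\dots,l_k\}:\quad (a^k_i, a^{k+1}_j) \in D$$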
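The step-by-step execution with synchronization barriers can be mimicked on a CPU with a thread pool. This is only a sketch of the scheduling pattern, assuming each production is an independent Python callable; the thread pool stands in for the GPU, and the task names, the `actions` dictionary and `run_layers` are illustrative, not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_layers(layers, actions, max_workers=4):
    """Run each layer's tasks concurrently; finish a layer before starting the next."""
    log = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for layer in layers:
            # Draining the map iterator with list() waits for every task in the
            # layer, so it plays the role of the synchronization barrier.
            results = list(pool.map(lambda name: actions[name](), layer))
            log.append(list(zip(layer, results)))
    return log


# First three Foata classes of the solver: leaf fronts, merges, eliminations.
layers = [
    ["A1", "A_1", "A_2", "A_3", "A_4", "AN"],
    ["A2_1", "A2_2", "A2_3"],
    ["E2_1", "E2_2", "E2_3"],
]

# Placeholder work: each task just reports its own name.
actions = {name: (lambda n=name: n + " done") for layer in layers for name in layer}

log = run_layers(layers, actions)
```

On a GPU the same pattern would typically map each layer to one kernel launch, with the launch boundary acting as the barrier; that mapping is an assumption here rather than something stated on the slides.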

GRAPH GRAMMAR PRODUCTIONS EXPRESSING

THE SOLVER ALGORITHM

Linear B-splines

GRAPH GRAMMAR PRODUCTIONS EXPRESSING

THE SOLVER ALGORITHM

Quadratic B-splines

1D NUMERICAL RESULTS

LINEAR B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

1D NUMERICAL RESULTS

QUADRATIC B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

1D NUMERICAL RESULTS

CUBIC B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

1D NUMERICAL RESULTS

QUINTIC B-SPLINES

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

COMPARISON WITH CPU MUMPS SOLVER

NVidia Tesla c2070, 6GB memory, 448 CUDA cores, each one with 1.15GHz clock

Intel(R) Core(TM)2 Quad CPU Q9400 with 2.66GHz clock, 8GB of memory

GENERATION OF 2D ELIMINATION TREE

2D NUMERICAL INTEGRATION

2D ELIMINATION

C^0 COST

• Size of the top dense problem: M = O(N^0.5)

• Cost of the sequential dense solver: O(M^3)

• Computational cost of the sequential C^0 solver: O((N^0.5)^3) = O(N^(3/2))

• Cost of the parallel dense solver: O(M^2)

• Computational cost of the parallel C^0 solver: O((N^0.5)^2) = O(N)

C^(p-1) COST

• Size of the top dense problem: M = O(p N^0.5)

• Cost of the sequential dense solver: O(M^3)

• Computational cost of the sequential C^(p-1) solver: O((p N^0.5)^3) = O(p^3 N^(3/2))

• Cost of the parallel dense solver: O(M^2)

• Computational cost of the parallel C^(p-1) solver: O((p N^0.5)^2) = O(p^2 N)
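Taking the estimates on the two cost slides at face value, the price of C^(p-1) continuity relative to C^0 at the same N can be read off as a ratio. This only restates the formulas above; the symbols T_seq and T_par are introduced here for illustration, and constant factors are ignored.

```latex
\frac{T_{\mathrm{seq}}^{C^{p-1}}}{T_{\mathrm{seq}}^{C^{0}}}
  \sim \frac{p^{3} N^{3/2}}{N^{3/2}} = p^{3},
\qquad
\frac{T_{\mathrm{par}}^{C^{p-1}}}{T_{\mathrm{par}}^{C^{0}}}
  \sim \frac{p^{2} N}{N} = p^{2}
```

Under these estimates the parallel dense solver reduces the penalty for higher continuity from p^3 to p^2.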

2D NUMERICAL RESULTS

LINEAR B-SPLINES

GeForce GTX 780 with 2304 cores, 3GB of memory

2D NUMERICAL RESULTS

QUADRATIC B-SPLINES

GeForce GTX 780 with 2304 cores, 3GB of memory

2D NUMERICAL RESULTS

CUBIC B-SPLINES

GeForce GTX 780 with 2304 cores, 3GB of memory

CONCLUSIONS

We have developed a formal graph grammar model for B-spline-based finite element method computations in one and two dimensions.

The multi-frontal direct solver algorithm has been expressed in terms of basic indivisible tasks, and the partial order of their execution has been expressed by a dependency graph.

The tasks can be scheduled as a sequence of sets of independent tasks, based on the coloring of the dependency graph.

The model allows for an efficient implementation of the generation and solution of the problem on shared-memory architectures.

The isogeometric shared-memory parallel direct solver with C^(p-1) continuity scales like O(p^2 log(N/p)) for one-dimensional problems, and like O(N p^2) for two-dimensional problems.