recent advances on the solution phase of direct solvers with...

46
Recent advances on the solution phase of direct solvers with multiple sparse right-hand sides P. Amestoy 1 , J.-Y. L’Excellent 2 , G. Moreau 2 1. Universit´ e de Toulouse INPT and IRIT , 2. Universit´ e de Lyon, Inria and LIP-ENS Lyon , [email protected] Mumps User’s Days, Montbonnot, June 1-2 2017

Upload: others

Post on 05-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Recent advances on the solution phaseof direct solvers with multiple sparseright-hand sides

P. Amestoy1, J.-Y. L’Excellent2, G. Moreau2

1. Universite de Toulouse INPT and IRIT , 2. Universite de Lyon, Inria and LIP-ENS Lyon ,[email protected]

Mumps User’s Days, Montbonnot, June 1-2 2017

SIAM CSE 17, Altanta, Georgia

Page 2: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Introduction

ோߝ��������������������������௫ܧ ൌ 10

ൌ, െ

|, |

�௫ܧ����������

����������������������������������������������������������������������������

��

Linear systems of equations :Ax = b, A is sparseSolve phase (Ly = b,Ux = y) maybe critical.

01234D

epth

(km

)

0Dip (km)

5

10

15

20

Cross (

km)

5 10 15 20

3000 4000 5000 6000m/s

Application coming from Helmholtz or Maxwell equations:

name n (million) nrhs nnz/nrhs Tfacto Tsolve

sei70m 2.9 2302 587 1258 1267

sei50m 7.1 2302 486 6289 2985

E1 0.33 8000 9.8 55.2 291

E3 2.8 8000 7.5 1951 5610

Table: Characteristics of matrices and right-hand-sides.

2 / 19

Page 3: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Introduction

Objectives:

• focus on the forward solution phase Ly = b;

• exploit sparsity of right-hand-sides;

• limit the number of operations (∆);

3 / 19

Page 4: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Overview

Exploitation of sparse right-hand-sidesContext of studyTree pruningExploitation of subintervals of columns at each node

Minimizing the number of operationsPermutation of columnsAdapted blocking technique

Conclusion

4 / 19

Page 5: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Context: analysis phase

Ordering: reorder variables of the matrix A to reduce fill-in and buildelimination tree:

• Nested Dissection ⇒ build tree of separators.

1

4

2

5

10

11

13

143

6

12

15

7

9

8

16

18

17

19

20

21

2223

24

25

26

27

u15

u14

u13u10

u7

u6u3

u7

•••••••••••••••••••••••••••

• •

• •

• •

•••

••

•••• • • •

◦ ◦

◦ ◦

◦◦

◦ ◦

◦ ◦◦

◦ ◦◦

◦ ◦

◦ ◦◦

◦◦

◦ ◦

◦ ◦◦ ◦◦ ◦◦ ◦ ◦◦ ◦◦ ◦◦

◦ ◦ ◦

◦◦

◦◦◦

◦◦

◦◦

◦◦

◦◦

◦◦◦◦◦ ◦

3D physical domain (cube)

separator tree L factor

5 / 19

Page 6: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Context: analysis phase

Ordering: reorder variables of the matrix A to reduce fill-in and buildelimination tree:

• Nested Dissection ⇒ build tree of separators.

1

4

2

5

10

11

13

143

6

12

15

7

9

8

16

18

17

19

20

21

2223

24

25

26

27

u15

u14

u13u10

u7

u6u3

u7

•••••••••••••••••••••••••••

• •

• •

• •

•••

••

•••• • • •

◦ ◦

◦ ◦

◦◦

◦ ◦

◦ ◦◦

◦ ◦◦

◦ ◦

◦ ◦◦

◦◦

◦ ◦

◦ ◦◦ ◦◦ ◦◦ ◦ ◦◦ ◦◦ ◦◦

◦ ◦ ◦

◦◦

◦◦◦

◦◦

◦◦

◦◦

◦◦

◦◦◦◦◦ ◦

3D physical domain (cube) separator tree

L factor

5 / 19

Page 7: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Context: analysis phase

Ordering: reorder variables of the matrix A to reduce fill-in and buildelimination tree:

• Nested Dissection ⇒ build tree of separators.

1

4

2

5

10

11

13

143

6

12

15

7

9

8

16

18

17

19

20

21

2223

24

25

26

27

u15

u14

u13u10

u7

u6u3

u7

•••••••••••••••••••••••••••

• •

• •

• •

•••

••

•••• • • •

◦ ◦

◦ ◦

◦◦

◦ ◦

◦ ◦◦

◦ ◦◦

◦ ◦

◦ ◦◦

◦◦

◦ ◦

◦ ◦◦ ◦◦ ◦◦ ◦ ◦◦ ◦◦ ◦◦

◦ ◦ ◦

◦◦

◦◦◦

◦◦

◦◦

◦◦

◦◦

◦◦◦◦◦ ◦

3D physical domain (cube) separator tree

L factor

5 / 19

Page 8: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Context: analysis phase

Ordering: reorder variables of the matrix A to reduce fill-in and buildelimination tree:

• Nested Dissection ⇒ build tree of separators.

1

4

2

5

10

11

13

143

6

12

15

7

9

8

16

18

17

19

20

21

2223

24

25

26

27

u15

u14

u13u10

u7

u6u3

u7

•••••••••••••••••••••••••••

• •

• •

• •

•••

••

•••• • • •

◦ ◦

◦ ◦

◦◦

◦ ◦

◦ ◦◦

◦ ◦◦

◦ ◦

◦ ◦◦

◦◦

◦ ◦

◦ ◦◦ ◦◦ ◦◦ ◦ ◦◦ ◦◦ ◦◦

◦ ◦ ◦

◦◦

◦◦◦

◦◦

◦◦

◦◦

◦◦

◦◦◦◦◦ ◦

3D physical domain (cube) separator tree

L factor

5 / 19

Page 9: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Context: analysis phase

Ordering: reorder variables of the matrix A to reduce fill-in and buildelimination tree:

• Nested Dissection ⇒ build tree of separators.

1

4

2

5

10

11

13

143

6

12

15

7

9

8

16

18

17

19

20

21

2223

24

25

26

27

u15

u14

u13u10

u7

u6u3

u7

•••••••••••••••••••••••••••

• •

• •

• •

•••

••

•••• • • •

◦ ◦

◦ ◦

◦◦

◦ ◦

◦ ◦◦

◦ ◦◦

◦ ◦

◦ ◦◦

◦◦

◦ ◦

◦ ◦◦ ◦◦ ◦◦ ◦ ◦◦ ◦◦ ◦◦

◦ ◦ ◦

◦◦

◦◦◦

◦◦

◦◦

◦◦

◦◦

◦◦◦◦◦ ◦

3D physical domain (cube) separator tree L factor

5 / 19

Page 10: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Context: analysis phase

Ordering: reorder variables of the matrix A to reduce fill-in and buildelimination tree:

• Nested Dissection ⇒ build tree of separators.

1

4

2

5

10

11

13

143

6

12

15

7

9

8

16

18

17

19

20

21

2223

24

25

26

27

u15

u14

u13u10

u7

u6u3

u7

•••••••••••••••••••••••••••

• •

• •

• •

•••

••

•••• • • •

◦ ◦

◦ ◦

◦◦

◦ ◦

◦ ◦◦

◦ ◦◦

◦ ◦

◦ ◦◦

◦◦

◦ ◦

◦ ◦◦ ◦◦ ◦◦ ◦ ◦◦ ◦◦ ◦◦

◦ ◦ ◦

◦◦

◦◦◦

◦◦

◦◦

◦◦

◦◦

◦◦◦◦◦ ◦

3D physical domain (cube) separator tree L factor

5 / 19

Page 11: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

u15

u14u7

0

u15

u7

u14

Block operations:

• y1 ← L−111 b1

• b2 ← b2 − L21y1

#flops for node u is given by:

Fu = 2 ∗ (#entries in L11 + L21)

Total #flops: ∆ =∑u∈TFu

6 / 19

Page 12: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

u15

u14u7

0

u15

u7

u14

Block operations:

• y1 ← L−111 b1

• b2 ← b2 − L21y1

#flops for node u is given by:

Fu = 2 ∗ (#entries in L11 + L21)

Total #flops: ∆ =∑u∈TFu

6 / 19

Page 13: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

u15

u14u7

0

u15

u7

u14

Block operations:

• y1 ← L−111 b1

• b2 ← b2 − L21y1

#flops for node u is given by:

Fu = 2 ∗ (#entries in L11 + L21)

Total #flops: ∆ =∑u∈TFu

6 / 19

Page 14: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

u15

u14u7

0

u15

u7

u14

Block operations:

• y1 ← L−111 b1

• b2 ← b2 − L21y1

#flops for node u is given by:

Fu = 2 ∗ (#entries in L11 + L21)

Total #flops: ∆ =∑u∈TFu

6 / 19

Page 15: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

u15

u14u7

0

u15

u7

u14

Block operations:

• y1 ← L−111 b1

• b2 ← b2 − L21y1

#flops for node u is given by:

Fu = 2 ∗ (#entries in L11 + L21)

Total #flops: ∆ =∑u∈TFu

6 / 19

Page 16: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

u15

u14u7

0

u15

u7

u14

Block operations:

• y1 ← L−111 b1

• b2 ← b2 − L21y1

#flops for node u is given by:

Fu = 2 ∗ (#entries in L11 + L21)

Total #flops: ∆ =∑u∈TFu

6 / 19

Page 17: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

Forward solve phase processes the tree from bottom to top:

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

×

×

×ff

ff × : initial non-zeros

f : filled non-zeros

u15

u14

u13

u12u11

u10

u9u8

u7

u6

u5u4

u3

u2u1

Computation follows paths in the tree T [Gilbert, 1994].

↪→ Tree pruning (T → Tp(b)) to reduce computation:

∆ =∑

u∈Tp(b)

Fu

7 / 19

Page 18: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

Forward solve phase processes the tree from bottom to top:

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

×

×

×ff

ff × : initial non-zeros

f : filled non-zeros

u15

u14

u13

u12u11

u10

u9u8

u7

u6

u5u4

u3

u2u1

Computation follows paths in the tree T [Gilbert, 1994].

↪→ Tree pruning (T → Tp(b)) to reduce computation:

∆ =∑

u∈Tp(b)

Fu

7 / 19

Page 19: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

Forward solve phase processes the tree from bottom to top:

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

×

×

×ff

ff × : initial non-zeros

f : filled non-zeros

u15

u14

u13

u12u11

u10

u9u8

u7

u6

u5u4

u3

u2u1

Computation follows paths in the tree T [Gilbert, 1994].

↪→ Tree pruning (T → Tp(b)) to reduce computation:

∆ =∑

u∈Tp(b)

Fu

7 / 19

Page 20: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

Forward solve phase processes the tree from bottom to top:

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

×

×

×ff

ff × : initial non-zeros

f : filled non-zeros

u15

u14

u13

u12u11

u10

u9u8

u7

u6

u5u4

u3

u2u1

Computation follows paths in the tree T [Gilbert, 1994].

↪→ Tree pruning (T → Tp(b)) to reduce computation:

∆ =∑

u∈Tp(b)

Fu

7 / 19

Page 21: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

Forward solve phase processes the tree from bottom to top:

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

×

×

×ff

ff × : initial non-zeros

f : filled non-zeros

u15

u14

u13

u12u11

u10

u9u8

u7

u6

u5u4

u3

u2u1

Computation follows paths in the tree T [Gilbert, 1994].

↪→ Tree pruning (T → Tp(b)) to reduce computation:

∆ =∑

u∈Tp(b)

Fu

7 / 19

Page 22: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Forward solution process

Forward solve phase processes the tree from bottom to top:

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

×

×

×ff

ff × : initial non-zeros

f : filled non-zeros

u15

u14

u13

u12u11

u10

u9u8

u7

u6

u5u4

u3

u2u1

Computation follows paths in the tree T [Gilbert, 1994].

↪→ Tree pruning (T → Tp(b)) to reduce computation:

∆ =∑

u∈Tp(b)

Fu

7 / 19

Page 23: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Exposition of padded zeros

When B is a matrix with multiple columns:

• use of BLAS 3 operations for efficiency;

• Tp(B) =⋃Tp(Bi ), where Bi is column i of B;

×

f

f

×

ff

f

×

×ff

×f

f

f

×ff

×ff×

×

f

ff

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

nrhs

1 2 3 4 5 6

u15

u14

u13

u12u11

u10

u9u8

u7

u6

u5u4

u3

u2u1

But still, extra computations are done ...

∆ = nrhs ×∑

u∈Tp(B)Fu

8 / 19

Page 24: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Exposition of padded zeros

When B is a matrix with multiple columns:

• use of BLAS 3 operations for efficiency;

• Tp(B) =⋃Tp(Bi ), where Bi is column i of B;

×

f

f

×

ff

f

×

×ff

×f

f

f

×ff

×ff×

×

f

ff

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

nrhs

1 2 3 4 5 6

u15

u14

u13

u12u11

u10

u9u8

u7

u6

u5u4

u3

u2u1

But still, extra computations are done ...

∆ = nrhs ×∑

u∈Tp(B)Fu

8 / 19

Page 25: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Solutions

What are the possible alternatives ?

• Indirections: rebuilding data structures;

• Sequential: solution phase on each column⇒ optimal (∆ = ∆min) but not efficient;

• Regular blocking: how to build blocks ?◦ minimal access to factors (out of core) [Amestoy et al.,SISC,2012];◦ minimal number of operations (in core) [Yamazaki et al.,2013];

• Exploitation of subintervals of columns at each node [Amestoy et

al.,SISC,2015].

9 / 19

Page 26: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Solutions

What are the possible alternatives ?

• Indirections: rebuilding data structures;

• Sequential: solution phase on each column⇒ optimal (∆ = ∆min) but not efficient;

• Regular blocking: how to build blocks ?◦ minimal access to factors (out of core) [Amestoy et al.,SISC,2012];◦ minimal number of operations (in core) [Yamazaki et al.,2013];

• Exploitation of subintervals of columns at each node [Amestoy et

al.,SISC,2015].

9 / 19

Page 27: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Work on node subintervals

Let u ∈ T :

Active columns at node u

Zu = {i ∈ {1, . . . ,m} | u ∈ Tp(Bi )}

Subinterval is given by:

θu = max(Zu)−min(Zu) + 1

Example: θu1 = 1, θu10 = 6

∆ =∑

u∈Tp(B)

Fu × θu

×

f

f

×

ff

f

×

×ff

×f

f

f

×ff

×ff×

×

f

ff

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

1 2 3 4 5 6θ1 = 1θ2 = 1θ3 = 4θ4 = 1θ5 = 1θ6 = 4θ7 = 5θ8 = 6θ9 = 0θ10 = 6θ11 = 1θ12 = 1θ13 = 3θ14 = 6θ15 = 6

×

f

f

×

ff

f

×

×ff

×f

f

f

×ff

×ff×

×

f

ff

1 4 2 5 6 3θ1 = 1θ2 = 1θ3 = 2θ4 = 1θ5 = 1θ6 = 2θ7 = 4θ8 = 5θ9 = 0θ10 = 5θ11 = 1θ12 = 1θ13 = 3θ14 = 6θ15 = 6

↪→ ∆ is extremely dependant on column permutation.

10 / 19

Page 28: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Work on node subintervals

Let u ∈ T :

Active columns at node u

Zu = {i ∈ {1, . . . ,m} | u ∈ Tp(Bi )}

Subinterval is given by:

θu = max(Zu)−min(Zu) + 1

Example: θu1 = 1, θu10 = 6

∆ =∑

u∈Tp(B)

Fu × θu

×

f

f

×

ff

f

×

×ff

×f

f

f

×ff

×ff×

×

f

ff

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

1 2 3 4 5 6θ1 = 1θ2 = 1θ3 = 4θ4 = 1θ5 = 1θ6 = 4θ7 = 5θ8 = 6θ9 = 0θ10 = 6θ11 = 1θ12 = 1θ13 = 3θ14 = 6θ15 = 6

×

f

f

×

ff

f

×

×ff

×f

f

f

×ff

×ff×

×

f

ff

1 4 2 5 6 3θ1 = 1θ2 = 1θ3 = 2θ4 = 1θ5 = 1θ6 = 2θ7 = 4θ8 = 5θ9 = 0θ10 = 5θ11 = 1θ12 = 1θ13 = 3θ14 = 6θ15 = 6

↪→ ∆ is extremely dependant on column permutation.

10 / 19

Page 29: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Work on node subintervals

Let u ∈ T :

Active columns at node u

Zu = {i ∈ {1, . . . ,m} | u ∈ Tp(Bi )}

Subinterval is given by:

θu = max(Zu)−min(Zu) + 1

Example: θu1 = 1, θu10 = 6

∆ =∑

u∈Tp(B)

Fu × θu

×

f

f

×

ff

f

×

×ff

×f

f

f

×ff

×ff×

×

f

ff

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

1 2 3 4 5 6θ1 = 1θ2 = 1θ3 = 4θ4 = 1θ5 = 1θ6 = 4θ7 = 5θ8 = 6θ9 = 0θ10 = 6θ11 = 1θ12 = 1θ13 = 3θ14 = 6θ15 = 6

×

f

f

×

ff

f

×

×ff

×f

f

f

×ff

×ff×

×

f

ff

1 4 2 5 6 3θ1 = 1θ2 = 1θ3 = 2θ4 = 1θ5 = 1θ6 = 2θ7 = 4θ8 = 5θ9 = 0θ10 = 5θ11 = 1θ12 = 1θ13 = 3θ14 = 6θ15 = 6

↪→ ∆ is extremely dependant on column permutation.

10 / 19

Page 30: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Work on node subintervals

Let u ∈ T :

Active columns at node u

Zu = {i ∈ {1, . . . ,m} | u ∈ Tp(Bi )}

Subinterval is given by:

θu = max(Zu)−min(Zu) + 1

Example: θu1 = 1, θu10 = 6

∆ =∑

u∈Tp(B)

Fu × θu

×

f

f

×

ff

f

×

×ff

×f

f

f

×ff

×ff×

×

f

ff

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

1 2 3 4 5 6θ1 = 1θ2 = 1θ3 = 4θ4 = 1θ5 = 1θ6 = 4θ7 = 5θ8 = 6θ9 = 0θ10 = 6θ11 = 1θ12 = 1θ13 = 3θ14 = 6θ15 = 6

×

f

f

×

ff

f

×

×ff

×f

f

f

×ff

×ff×

×

f

ff

1 4 2 5 6 3θ1 = 1θ2 = 1θ3 = 2θ4 = 1θ5 = 1θ6 = 2θ7 = 4θ8 = 5θ9 = 0θ10 = 5θ11 = 1θ12 = 1θ13 = 3θ14 = 6θ15 = 6

↪→ ∆ is extremely dependant on column permutation.

10 / 19

Page 31: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Problem statement & algorithms

Goal is to minimize or decrease ∆ =∑

u∈Tp(B)Fu × θu:

• find permutation σ of columns to decrease θu,∀u ∈ Tp(B);

• in case of blocking, minimize the number of blocks.

Proposed heuristics:

• based on geometrical properties (Nested Dissection);

• generalization possible thanks to pruned tree Tp(B).

11 / 19

Page 32: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Problem statement & algorithms

Goal is to minimize or decrease ∆ =∑

u∈Tp(B)Fu × θu:

• find permutation σ of columns to decrease θu,∀u ∈ Tp(B);

• in case of blocking, minimize the number of blocks.

Proposed heuristics:

• based on geometrical properties (Nested Dissection);

• generalization possible thanks to pruned tree Tp(B).

11 / 19

Page 33: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Flat Tree Algorithm

Intuition based on a simple 2D example:

2D physical domain

u0

u1 u2a b c

d

e

f

g

h

i

j

k

l

structure of B

a bc cbd e f g h i j k l

T [u1]

T [u2]

u0

T [u11]

T [u12]

u1

T [u21]

T [u12]

u2u0

×

f

×

×

×

×

×

×

×f

×

f

f

×

××

f

×f

f

×

f

×

×

×××

×××

×f

×f×

×

ff

×

××f

×ff

u0

u2

u22u21

u1

u12u11

elimination tree

• Nested Dissection ⇒ partition right-hand-sides into 3 sets (a, b, c);

• θu1 = a + c + b

⇒ θu1 = a + b;

↪→ Top-down approach + local optimisation for the nodes at thecurrent layer in the tree.

12 / 19

Page 34: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Flat Tree Algorithm

Intuition based on a simple 2D example:

2D physical domain

u0

u1 u2a b c

d

e

f

g

h

i

j

k

l

structure of B

a bc cbd e f g h i j k l

T [u1]

T [u2]

u0

T [u11]

T [u12]

u1

T [u21]

T [u12]

u2u0

×

f

×

×

×

×

×

×

×f

×

f

f

×

××

f

×f

f

×

f

×

×

×××

×××

×f

×f×

×

ff

×

××f

×ff

u0

u2

u22u21

u1

u12u11

elimination tree

• Nested Dissection ⇒ partition right-hand-sides into 3 sets (a, b, c);

• θu1 = a + c + b ⇒ θu1 = a + b;

↪→ Top-down approach + local optimisation for the nodes at thecurrent layer in the tree.

12 / 19

Page 35: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Flat Tree Algorithm

Intuition based on a simple 2D example:

2D physical domain

u0

u1 u2a b c

d

e

f

g

h

i

j

k

l

structure of B

a bc cbd e f g h i j k l

T [u1]

T [u2]

u0

T [u11]

T [u12]

u1

T [u21]

T [u12]

u2u0

×

f

×

×

×

×

×

×

×f

×

f

f

×

××

f

×f

f

×

f

×

×

×××

×××

×f

×f×

×

ff

×

××f

×ff

u0

u2

u22u21

u1

u12u11

elimination tree

• Nested Dissection ⇒ partition right-hand-sides into 3 sets (a, b, c);

• θu1 = a + c + b ⇒ θu1 = a + b;

↪→ Top-down approach + local optimisation for the nodes at thecurrent layer in the tree.

12 / 19

Page 36: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Results on the Flat Tree

flops: normalized with the dense case; Ordering: Nested Dissection;

E1 E3 sei70m sei50mMatrices

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Norm

aliz

ed fl

ops

TP INT PO FT LB

Strategies:

• TP = tree pruning only;

• INT = tree pruning + nodeinterval+natural order;

• PO = tree pruning+nodeinterval+Postorder;

• FT = tree pruning+node interval+FlatTree;

• LB = Lower Bound (∆min).

↪→ Still 28% above the lower bound on one case.

13 / 19

Page 37: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Adapted blocking technique

Objective: decrease ∆ with the creation of a minimum number ofgroups.

×f

f

f

×

ff

f

×

f

f

×ff

×ff×

×

f

ff

×

×ff

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

4 2 1 5 6 34

×f

f

f

2

×

ff

f

f

f

5

×ff

×ff×

6

×

f

ff

3

×

×ff

4 2 6 3

×f

f

f

×

ff

f

×

f

ff

×

×ff

1 5×

f

f

×ff

×ff×

Computations on explicit zeros still exist.

• however, not performant (loss of BLAS 3 operations);

• need to find some property to group right hand sides togetherwithout introducing extra operations.

14 / 19

Page 38: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Adapted blocking technique

Objective: decrease ∆ with the creation of a minimum number ofgroups.

×f

f

f

×

ff

f

×

f

f

×ff

×ff×

×

f

ff

×

×ff

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

4 2 1 5 6 34

×f

f

f

2

×

ff

f

f

f

5

×ff

×ff×

6

×

f

ff

3

×

×ff

4 2 6 3

×f

f

f

×

ff

f

×

f

ff

×

×ff

1 5×

f

f

×ff

×ff×

∆min may be obtain by creating nrhs groups:

• however, not performant (loss of BLAS 3 operations);

• need to find some property to group right hand sides togetherwithout introducing extra operations.

14 / 19

Page 39: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Adapted blocking technique

Objective: decrease ∆ with the creation of a minimum number ofgroups.

×f

f

f

×

ff

f

×

f

f

×ff

×ff×

×

f

ff

×

×ff

u1u2u3u4u5u6u7u8u9u10u11u12u13u14u15

4 2 1 5 6 34

×f

f

f

2

×

ff

f

f

f

5

×ff

×ff×

6

×

f

ff

3

×

×ff

4 2 6 3

×f

f

f

×

ff

f

×

f

ff

×

×ff

1 5×

f

f

×ff

×ff×

∆min may be obtain by creating nrhs groups:

• however, not performant (loss of BLAS 3 operations);

• need to find some property to group right hand sides togetherwithout introducing extra operations.

14 / 19

Page 40: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Adapted blocking technique

Principle (1): group sets of right hand sides that belong to differentsubdomains (starting with root separator).

a b c

a bc cc b

2D Domain structure of B

• non-zero structure of a and c are disjoint;

• whereas b may have the non-zero structure of both a and c ;

• thus, we extract them.

15 / 19

Page 41: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Adapted blocking technique

Principle (2): extract set of right hand sides that belong to bothsubdomains (starting with root separator).

a b c

a bc cc b

2D Domain structure of B

• non-zero structure of a and c are disjoint;

• whereas b may have the non-zero structure of both a and c ;

• thus, we extract them.

15 / 19

Page 42: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Adapted blocking technique

Principle (2): extract set of right hand sides that belong to bothsubdomains (starting with root separator).

a b c

a bc cc b

2D Domain structure of B

• non-zero structure of a and c are disjoint;

• whereas b may have the non-zero structure of both a and c ;

• thus, we extract them.

15 / 19

Page 43: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Comparison with a regular blocking strategy

Our Blocking algorithm (BLK):

• greedy algorithm to choose nextgroup;

• stop condition: ∆ < ∆tol ,where ∆tol = 1.01∆min.

Regular blocking algorithm(REG):

• split in chunk of regular size;

• stop condition: ∆ < ∆tol ,where ∆tol = 1.01∆min.

nb groups 5Hz 7Hz E1 E3

REG 328 255 363 258

BLK 3 3 4 4

Table: Number of groups created for each strategy with a tolerance such that∆ < 1.01×∆tol .

16 / 19

Page 44: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Conclusion

Achievements:

• implementation of two heuristics (permutation, blocking);

• 90% decrease in flops by exploiting sparsity;

• Up to 40% decrease in time for forward solve w.r.t. INT strategyand Nested Dissection ordering (sequential).

Perspectives:

• adapt the Flat Tree algorithm to unbalanced trees;

• parallelism and sparsity aspects of Flat Tree permutation;

• extend to more general test cases.

17 / 19

Page 45: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

Acknowledgements

• LIP laboratory for access to the machines;

• EMGS et SEISCOPE for providing test cases;

• This work was performed within the frameworks of both theMUMPS consortium and the LABEX MILYON(ANR-10-LABX-0070) of Universite de Lyon, within the program”Investissements d’Avenir” (ANR-11-IDEX-0007) operated by theFrench National Research Agency (ANR).

18 / 19

Page 46: Recent advances on the solution phase of direct solvers with …mumps.enseeiht.fr/doc/ud_2017/Moreau_Talk.pdf · 2017. 9. 30. · Recent advances on the solution phase of direct solvers

? Thanks!Questions?