
COMPUTATIONAL RESEARCH DIVISION

Solving Large-scale Eigenvalue Problems in SciDAC Applications

Chao Yang

Lawrence Berkeley National Laboratory

June 27, 2005


People Involved

LBNL: W. Gao, P. Husbands, X. S. Li, E. Ng, C. Yang (TOPS); J. Meza, L. W. Wang, C. Yang (Nano-science)

SLAC: L. Lee, K. Ko

Stanford: G. Golub

UC Davis: Z. Bai


SciDAC Applications

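Accelerator Modeling

[Equation lost in extraction: an eigenvalue problem for the electric field E, with boundary conditions involving the surface normal n.]

Discretized form: $Kx = \lambda Mx$

Nano-science

$H\psi_i = E_i\psi_i$

Discretized form: $H(X)X = X\Lambda$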


Algorithms

• Krylov Subspace Method

• Alternatives
  • Optimization based approach
  • Non-linear solver based approach

• Multi-level Sub-structuring

• Non-linear Eigenvalue Problems
  • Structure preserving methods
  • Optimization based method


Krylov Subspace Method

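$Ax = \lambda x$

$\mathcal{K}(A, v_0; k) = \mathrm{span}\{v_0, Av_0, \ldots, A^{k-1}v_0\}$

$AV_k = V_kH_k + fe_k^T, \quad V_k^TV_k = I_k, \quad H_k = V_k^TAV_k$

• Widely used, relatively well understood (polynomial approximation theory):

$z = p(A)v_0 = \gamma_1 p(\lambda_1)x_1 + \gamma_2 p(\lambda_2)x_2 + \cdots + \gamma_n p(\lambda_n)x_n$

• Convergence of KSM depends on the eigenvalue distribution (well separated, large eigenvalues converge rapidly) and on the starting vector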
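The Arnoldi factorization behind the Krylov subspace method can be sketched in a few lines of NumPy. This is a minimal dense illustration, not the ARPACK implementation; the test matrix (a diagonal matrix with one well-separated large eigenvalue) and the names `arnoldi`, `ritz` are assumptions chosen to show how quickly a well-separated extreme eigenvalue is captured.

```python
import numpy as np

def arnoldi(A, v0, k):
    """k-step Arnoldi factorization A V = V H + f e_k^T (dense sketch)."""
    n = v0.shape[0]
    V = np.zeros((n, k))
    H = np.zeros((k, k))
    V[:, 0] = v0 / np.linalg.norm(v0)
    f = np.zeros(n)
    for j in range(k):
        w = A @ V[:, j]
        # Gram-Schmidt against the current basis, applied twice for stability
        h = V[:, :j + 1].T @ w
        w = w - V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w
        w = w - V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < k:
            beta = np.linalg.norm(w)
            H[j + 1, j] = beta
            V[:, j + 1] = w / beta
        else:
            f = w  # residual vector of the factorization
    return V, H, f

# Hypothetical test problem: eigenvalues 1..199 plus one well-separated value 400
rng = np.random.default_rng(0)
n, k = 200, 30
A = np.diag(np.arange(1.0, n + 1))
A[-1, -1] = 400.0
V, H, f = arnoldi(A, rng.standard_normal(n), k)

# Ritz values: eigenvalues of the small projected matrix H_k = V_k^T A V_k
ritz = np.sort(np.linalg.eigvals(H).real)
print("largest Ritz value:", ritz[-1])
```

Only 30 matrix-vector products are used here for a problem of size 200; the largest Ritz value matches the isolated eigenvalue closely, while the clustered small eigenvalues are approximated far less accurately.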


Acceleration Techniques

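Implicit Restart

• filter out unwanted spectral components from $v_0$: $\mathcal{K}(A, v_0; k) \rightarrow \mathcal{K}(A, v_0^+; k)$
• $H_k - \mu I = QR$
• $A(V_kQ) = (V_kQ)(Q^TH_kQ) + fe_k^TQ$
• ARPACK

Spectral transformation

$(A - \sigma I)^{-1}x = \frac{1}{\lambda - \sigma}x$

$(K - \sigma M)^{-1}Mx = \frac{1}{\lambda - \sigma}x$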
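As a small illustration of the spectral transformation, SciPy's `eigsh` (a wrapper around ARPACK's implicitly restarted Lanczos) runs in shift-invert mode when a shift `sigma` is supplied. The matrices below are a hypothetical 1D stiffness/mass pair, not the accelerator matrices from the talk, and the shift value is an arbitrary choice.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigh
from scipy.sparse.linalg import eigsh

# Hypothetical 1D finite-element-like pencil (K, M); both symmetric, M positive definite
n = 400
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
M = sp.diags([1.0 / 6, 4.0 / 6, 1.0 / 6], [-1, 0, 1], shape=(n, n), format="csc")

# Shift-invert: ARPACK iterates with (K - sigma*M)^{-1} M, so the eigenvalues
# closest to sigma become the largest in magnitude and converge first.
sigma = 0.5
vals, vecs = eigsh(K, k=6, M=M, sigma=sigma, which="LM")

# Residual check ||K x - lambda M x|| for each computed pair
residuals = np.linalg.norm(K @ vecs - (M @ vecs) * vals, axis=0)

# Dense reference, feasible only because this example is tiny
all_vals = eigh(K.toarray(), M.toarray(), eigvals_only=True)
nearest = np.sort(all_vals[np.argsort(np.abs(all_vals - sigma))[:6]])
print(np.sort(vals))
print(nearest)
```

The sparse factorization of `K - sigma*M` happens once inside `eigsh`; that factorization is the step that dominates time and memory for large problems.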


Using KSM in accelerator modeling

[Figure: the spectrum of the problem]

Example: H60VG3 structure, linear element
• N = 30M, nnz = 484M
• 1024 CPUs, 738 GB
• Ordering time: 4143 s
• Numerical factorization: 133 s
• Total: 5068 s for 12 eigenvalues

Software: PARPACK (implicit restart) + SuperLU, WSMP (spectral transformation)


Limitations of the KSM

• High degree polynomial needed for computing small clustered eigenvalues
  • many matrix vector multiplications

• Spectral transformation can be expensive
  • memory limitation
  • scalability

• Not easy to introduce a preconditioner
  • eigenvectors of $P^{-1}A$ are different from eigenvectors of $A$
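The last point can be checked numerically on a toy case. The sketch below assumes a Jacobi (diagonal) preconditioner P and a small tridiagonal A, both hypothetical; it computes the lowest eigenvector of $P^{-1}A$ and measures how far it is from being an eigenvector of A.

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical small SPD matrix with a non-constant diagonal
n = 10
d = 2.0 + np.arange(n)
A = np.diag(d) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
P = np.diag(d)  # Jacobi preconditioner (an assumption for illustration)

# P^{-1} A v = mu v is equivalent to the symmetric pencil A v = mu P v
mu, V = eigh(A, P)
v = V[:, 0] / np.linalg.norm(V[:, 0])

# v is an eigenvector of P^{-1} A ...
res_PA = np.linalg.norm(np.linalg.solve(P, A @ v) - mu[0] * v)
# ... but not of A: its eigen-residual with respect to A stays large
theta = v @ A @ v
res_A = np.linalg.norm(A @ v - theta * v)
print(res_PA, res_A)
```

This is why a Krylov method applied to $P^{-1}A$ does not directly deliver eigenvectors of A, which motivates the alternative formulations that accept a preconditioner in a correction equation.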


Alternative algorithms

Optimization based approach
• Minimizing Rayleigh Quotient: $\min_{x^Tx = 1} x^TAx$
• Minimizing Residual (Wood & Zunger 85, Jia 97): $\min_{x \in V,\ x^Tx = 1} \|Ax - \theta x\|$

Nonlinear equation solver based approach (Jacobi-Davidson)
• Newton correction: $A(u + z) = (\theta + \delta)(u + z), \quad u^Tz = 0$
• Preconditioner: $(I - uu^T)P(I - uu^T)$
• Stopping criteria for the inner iteration (Notay 2002, Stathopoulos 2005)

Allows us to solve problems with more than 90M DOF


Multi-level Sub-structuring (for computing many eigenpairs)

• Domain Decomposition concept

• Multi-level extension of the Component Mode Synthesis (CMS) method (Bennighof 92)

• Decomposition can be done algebraically (Lehoucq & Bennighof 2002)

• Success story in structure engineering....

• Error analysis

• Extend to accelerator modeling


Single-level Sub-structuring

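Matrix Partition of $(K, M)$

$K = \begin{pmatrix} K_{11} & & K_{13} \\ & K_{22} & K_{23} \\ K_{13}^T & K_{23}^T & K_{33} \end{pmatrix}, \quad M = \begin{pmatrix} M_{11} & & M_{13} \\ & M_{22} & M_{23} \\ M_{13}^T & M_{23}^T & M_{33} \end{pmatrix}$

Block elimination

$\hat{K} = L^{-1}KL^{-T}, \quad \hat{M} = L^{-1}ML^{-T}$

Sub-structure calculation (mode selection)

$K_{11}v^{(1)} = \mu^{(1)}M_{11}v^{(1)}$

$K_{22}v^{(2)} = \mu^{(2)}M_{22}v^{(2)}$

$\hat{K}_{33}v^{(3)} = \mu^{(3)}\hat{M}_{33}v^{(3)}$

Subspace assembling

$S = \mathrm{diag}(S_1, S_2, S_3), \quad S_i = (v_1^{(i)}, v_2^{(i)}, \ldots, v_{k_i}^{(i)}), \quad i = 1, 2, 3$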
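The four steps of single-level sub-structuring can be sketched with dense NumPy/SciPy as follows. This is an illustration under simplifying assumptions (two sub-structures, one interface block, dense matrices, a 1D model problem); the function name and index arguments are hypothetical and the code is not the AMLS implementation discussed in the talk.

```python
import numpy as np
from scipy.linalg import block_diag, eigh, solve

def single_level_substructuring(K, M, idx1, idx2, idx3, k1, k2, k3):
    # Matrix partition: order unknowns as [sub-structure 1, sub-structure 2, interface]
    p = np.concatenate([idx1, idx2, idx3])
    K, M = K[np.ix_(p, p)], M[np.ix_(p, p)]
    n1, n2, n3 = len(idx1), len(idx2), len(idx3)
    b1 = slice(0, n1)
    b2 = slice(n1, n1 + n2)
    b3 = slice(n1 + n2, n1 + n2 + n3)

    # Block elimination: Linv = L^{-1}, so Khat = L^{-1} K L^{-T} is block diagonal
    Linv = np.eye(n1 + n2 + n3)
    Linv[b3, b1] = -solve(K[b1, b1], K[b1, b3]).T
    Linv[b3, b2] = -solve(K[b2, b2], K[b2, b3]).T
    Khat = Linv @ K @ Linv.T
    Mhat = Linv @ M @ Linv.T

    # Sub-structure calculation (mode selection): keep the lowest k_i modes per block
    blocks = []
    for b, k in ((b1, k1), (b2, k2), (b3, k3)):
        _, V = eigh(Khat[b, b], Mhat[b, b])
        blocks.append(V[:, :k])

    # Subspace assembling, then Rayleigh-Ritz projection onto span(S)
    S = block_diag(*blocks)
    return eigh(S.T @ Khat @ S, S.T @ Mhat @ S, eigvals_only=True)

# Hypothetical model problem: 1D Laplacian split at its middle node
n = 41
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.eye(n)
mid = n // 2
idx1, idx2, idx3 = np.arange(0, mid), np.arange(mid + 1, n), np.array([mid])

exact = eigh(K, M, eigvals_only=True)
theta_full = single_level_substructuring(K, M, idx1, idx2, idx3, mid, mid, 1)
theta = single_level_substructuring(K, M, idx1, idx2, idx3, 8, 8, 1)
print(exact[:3])
print(theta[:3])
```

With all modes retained the projected problem reproduces the full spectrum; with 8 modes per sub-structure the lowest eigenvalues are approximated from above, since the procedure is a Rayleigh-Ritz projection.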


Mode Selection


Implementation & cost

Cost:
• Flops: more than a single sparse Cholesky factorization
• Storage: Block Cholesky factor + Projected matrix + some other stuff
• NO triangular solves (involving the original K and M), NO orthogonalization

Attractive when:
1) the problem is large enough
2) a large number of eigenvalues are needed


AMLS vs. Shift-invert Lanczos (SIL)

DOF=65K, 3 levels of partition


Cavity with External Coupling

Vector wave equation with waveguide boundary conditions can be modeled by a non-linear eigenvalue problem

[Figure: open cavity with three waveguide ports, each carrying a waveguide boundary condition]

Waveguide BC ($j = 1, 2, 3$):

$n \times (\nabla \times E) + i\sqrt{k^2 - k_{c_j}^2}\; n \times (n \times E) = 0$


Quadratic Eigenvalue Problem

Consider only one mode propagating in the waveguides

Algorithms

• Linearize then solve by KSM (does not preserve the structure of the problem)

• Second Order Arnoldi Iteration (Bai & Su 2005)
  • project the QEP into 2nd order Krylov Subspace
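The linearization route can be illustrated on a generic quadratic eigenvalue problem $(\lambda^2 M + \lambda C + K)x = 0$. The sketch below uses the first companion form and a dense solver; the matrices are random placeholders (with a purely imaginary damping-like term, loosely mirroring the waveguide coupling), not the cavity matrices.

```python
import numpy as np
from scipy.linalg import eig

def solve_qep_by_linearization(M, C, K):
    """Solve (lam^2 M + lam C + K) x = 0 via the first companion linearization."""
    n = M.shape[0]
    I, Z = np.eye(n), np.zeros((n, n))
    # With z = [x; lam*x]:  [0 I; -K -C] z = lam [I 0; 0 M] z
    A = np.block([[Z, I], [-K, -C]])
    B = np.block([[I, Z], [Z, M]])
    lam, Zv = eig(A, B)
    return lam, Zv[:n, :]  # the top half of each eigenvector is x

# Hypothetical small test problem
rng = np.random.default_rng(2)
n = 6
G = rng.standard_normal((n, n))
W = rng.standard_normal((n, n))
W = W + W.T
M = np.eye(n) + 0.1 * (G @ G.T)  # symmetric positive definite
Ksym = rng.standard_normal((n, n))
K = Ksym + Ksym.T
C = -1j * W

lam, X = solve_qep_by_linearization(M, C, K)
residuals = np.array([
    np.linalg.norm((l * l * M + l * C + K) @ X[:, j]) / np.linalg.norm(X[:, j])
    for j, l in enumerate(lam)
])
print(residuals.max())
```

The linearized pencil has twice the dimension of the original problem and no longer exhibits the second-order structure, which is the drawback noted on the slide.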


Second-Order Krylov Space (Bai)

$A = iM^{-1}W, \quad B = M^{-1}K - k_c^2I$
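For reference, the second-order Krylov subspace generated by a pair of matrices $A$, $B$ and a starting vector $u$, as defined by Bai & Su, can be written as below. The symbols $r_j$ and $\mathcal{G}_n$ follow the usual notation of that paper and do not appear in the transcript.

```latex
r_0 = u, \qquad r_1 = A r_0, \qquad r_j = A r_{j-1} + B r_{j-2} \quad (j \ge 2),
\qquad
\mathcal{G}_n(A, B; u) = \mathrm{span}\{ r_0, r_1, \ldots, r_{n-1} \}.
```

SOAR builds an orthonormal basis of $\mathcal{G}_n(A, B; u)$ and projects the quadratic problem onto it directly, so the reduced problem is again quadratic.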


SOAR is faster and more accurate (than linearization)

Accelerating cavity model for the International Linear Collider (ILC)

9-cell superconducting cavity coupled to one input coupler and two Higher-Order-Mode couplers.

NDOFs = 3.2 million, NCPUs = 768, Memory = 300 GB

18 eigenpairs in 2634 seconds (linearization took more than 1 hour)


Electronic Structure Calculation

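Wave function: $X = (x_1, x_2, \ldots, x_k), \quad x_i \in \mathbb{R}^n$

n – real space grid size, e.g. $32^3 \approx 32000$

k – number of occupied states, 1~10% of n

Charge density: $\rho(X) = \mathrm{diag}(XX^T)$

• $E_{kinetic} = \frac{1}{2}\mathrm{trace}(X^TLX)$

• $E_{ionic} = \mathrm{trace}(X^TD_{ion}X) + \sum_i (x_i^Tw)^2$

• $E_{Hartree} = \frac{1}{2}\rho(X)^TS\rho(X)$

• $E_{xc} = e^Tf_{xc}(\rho(X))$

$E_{total}(X) = E_{kinetic} + E_{ionic} + E_{Hartree} + E_{xc}$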


Non-linear Eigenvalue Problem

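Total energy minimization

$\min_X E_{total}(X) \quad \text{s.t.} \quad X^TX = I$

KKT condition

$H(X)X = X\Lambda, \quad X^TX = I$

$H(X) = L + D_{ion} + ww^T + \mathrm{Diag}(S\rho(X)) + \mathrm{Diag}(g_{xc}(\rho(X)))$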


The Self Consistent Field Iteration

Input: initial guess $X^{(0)}$ and $L, D_{ion}, S, w$

Output: $X = \arg\min_{X^TX = I_k} E_{total}(X)$

Major steps

o For i=1,2,…, until converged

1) Form $H^{(i)} = H(X^{(i)})$

2) Compute k smallest eigenpairs of $H^{(i)}$:

$X^{(i+1)} = \arg\min_{X^TX = I_k} \mathrm{trace}(X^TH^{(i)}X)$, i.e. $H^{(i)}X^{(i+1)} = X^{(i+1)}\Lambda^{(i+1)}$
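The self-consistent field loop can be sketched on a toy model. The Hamiltonian below, $H(X) = L + \alpha\,\mathrm{Diag}(\rho(X))$ with $\rho(X) = \mathrm{diag}(XX^T)$, is a deliberately simplified stand-in (an assumption, with no ionic, Hartree, or exchange-correlation terms from the talk), and the dense eigensolver replaces the iterative solver a real code would use.

```python
import numpy as np

def hamiltonian(X, L, alpha):
    # Toy model (assumption): H(X) = L + alpha * Diag(rho(X)), rho(X) = diag(X X^T)
    rho = np.sum(X * X, axis=1)
    return L + alpha * np.diag(rho)

def scf(L, k, alpha=1.0, tol=1e-10, maxiter=100):
    # Initial guess X^(0): the k lowest eigenvectors of the linear part L
    _, V = np.linalg.eigh(L)
    X = V[:, :k]
    for i in range(maxiter):
        H = hamiltonian(X, L, alpha)      # 1) form H^(i) = H(X^(i))
        lam, V = np.linalg.eigh(H)        # 2) k smallest eigenpairs of H^(i)
        X_new, lam = V[:, :k], lam[:k]
        # Compare charge densities, which do not depend on eigenvector signs
        change = np.linalg.norm(np.sum(X_new**2, axis=1) - np.sum(X**2, axis=1))
        X = X_new
        if change < tol:
            break
    return X, lam, i + 1

# Hypothetical problem: scaled 1D Laplacian, two occupied states
n, k = 20, 2
L = (n + 1) ** 2 * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
X, lam, iters = scf(L, k)

# Self-consistency residual ||H(X) X - X Lambda||
residual = np.linalg.norm(hamiltonian(X, L, 1.0) @ X - X * lam)
print(iters, residual)
```

On this weakly nonlinear toy problem the fixed-point iteration settles in a handful of steps; plain SCF carries no such guarantee in general, which is part of the motivation for minimizing the total energy directly.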


Direct Constrained Minimization (DCM)

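For i=1,2,… until convergence

1. Form $H^{(i)} = H(X^{(i)})$
2. Compute $R^{(i)} = K^{-1}(H^{(i)}X^{(i)} - X^{(i)}\Theta^{(i)})$, where $\Theta^{(i)} = \mathrm{Diag}(X^{(i)T}H^{(i)}X^{(i)})$ and $K$ is a preconditioner
3. If (i>1) then
   • set $Y^{(i)} = (X^{(i)}, R^{(i)}, P^{(i-1)})$
4. else
   • set $Y^{(i)} = (X^{(i)}, R^{(i)})$
5. Solve $\min_{G^TY^{(i)T}Y^{(i)}G = I_k} E_{total}(Y^{(i)}G)$
6. If (i>1) then
   • set $P^{(i)} = Y^{(i)}(:, k+1:3k)\,G(k+1:3k, :)$
7. else
   • set $P^{(i)} = Y^{(i)}(:, k+1:2k)\,G(k+1:2k, :)$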


DCM vs. SCF

Atomic system: SiH4

Discretization: spectral method with plane wave basis: $n = 32^3$ in real space, N=2103 (# of basis functions) in frequency space

Number of occupied states: k = 4

PETOT version of SCF uses 10 PCG steps (inner iterations) per outer iteration

DCM: 3 inner iterations

[Figure: $\Delta E(X^{(i)}) = E_{total}(X^{(i)}) - E_{min}$]


Concluding Remarks

Krylov Subspace Method (with appropriate acceleration strategies) continues to play an important role in solving SciDAC eigenvalue problems

Steady progress has been made in alternative approaches that can make better use of preconditioners

Multi-level sub-structuring is promising for computing many eigenpairs

Significant progress made in solving QEP

Non-linear eigenvalue problems remain challenging