cee 618 scientiﬁc parallel computing (lecture 3) · 2013. 1. 25. · cee 618 scientiﬁc parallel...

CEE 618 Scientific Parallel Computing (Lecture 3)
Linear Algebra Basics using LAPACK

Albert S. Kim

Department of Civil and Environmental Engineering
University of Hawai‘i at Manoa

2540 Dole Street, Holmes 383, Honolulu, Hawaii 96822


Table of Contents

1 Partial Differential Equation: Alternative Method

2 Linear Algebra
    LU decomposition
    Numerical Recipes in FORTRAN
    Linear Algebra PACKage (LAPACK)

3 Eigenvalue & Eigenvector

4 PBS (Portable Batch System)

Partial Differential Equation: Alternative Method


Convection-Diffusion-Reaction Equation

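General form:

$$\frac{\partial C}{\partial t} = \nabla \cdot (D \nabla C) - \nabla \cdot (\mathbf{v} C) - kC \tag{1}$$

In a steady state without convection and reaction:

$$0 = \nabla \cdot (D \nabla C) \tag{2}$$

In 2D with a constant diffusion coefficient:

$$0 = \frac{\partial^2 C}{\partial x^2} + \frac{\partial^2 C}{\partial y^2} \tag{3}$$

Mathematically identical to heat diffusion ($C \to T$):

$$0 = \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} \tag{4}$$

Examples? Let's watch some videos at http://albertsk.org/videos/physical/.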


Example problem

Solve the following equation using the method of separation of variables:

$$\frac{\partial^2 C}{\partial x^2} + \frac{\partial^2 C}{\partial y^2} = 0 \tag{5}$$

Boundary conditions ($0 < x, y < L$):

1. $C(x = 0, y) = 0$
2. $C(x, y = 0) = 0$
3. $C(x, y = L) = 0$
4. $C(x = L, y) = 10 \sin\left(\frac{\pi y}{L}\right)$

Figure: What does your solution look like?


Solution

1. By the method of separation of variables

$$C(x, y) = X(x)\,Y(y) \tag{6}$$

$$C = \frac{10}{\sinh \pi} \sinh\frac{\pi x}{L} \sin\frac{\pi y}{L} \tag{7}$$

Prove.
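
One possible route to the proof is sketched below under the usual separation-of-variables assumptions. The separation constant $\lambda^2$ and the coefficients $A_n$ are notation introduced for this sketch only; they do not appear on the slides.

```latex
\begin{align*}
\frac{X''}{X} &= -\frac{Y''}{Y} = \lambda^2
  && \text{(substitute Eq. (6) into Eq. (5), divide by } XY\text{)} \\
Y'' + \lambda^2 Y &= 0,\quad Y(0) = Y(L) = 0
  && \Rightarrow\quad Y_n = \sin\frac{n\pi y}{L},\quad \lambda_n = \frac{n\pi}{L} \\
X'' - \lambda_n^2 X &= 0,\quad X(0) = 0
  && \Rightarrow\quad X_n = \sinh\frac{n\pi x}{L} \\
C(x,y) &= \sum_{n=1}^{\infty} A_n \sinh\frac{n\pi x}{L}\,\sin\frac{n\pi y}{L} \\
C(L,y) &= \sum_{n=1}^{\infty} A_n \sinh(n\pi)\,\sin\frac{n\pi y}{L} = 10\sin\frac{\pi y}{L}
  && \Rightarrow\quad A_1 = \frac{10}{\sinh\pi},\quad A_{n \ge 2} = 0
\end{align*}
```

Only the $n = 1$ term survives, which reproduces Eq. (7).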



Solution

2. By MS Excel: I am nothing but an average of my neighbors.

$$C_{i,j} = \frac{C_{i+1,j} + C_{i-1,j} + C_{i,j+1} + C_{i,j-1}}{4} \tag{8}$$

Excel setup:

1. Open MS Excel
2. Go to File
3. Click Options
4. Go to Formulas
5. Click "Enable iterative calculation"
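
Excel's iterative calculation repeatedly applies Eq. (8) to every interior cell. The same idea can be sketched in Fortran. The grid size n = 21, the tolerance, the sweep limit, the choice L = 1, and the in-place update order are all assumptions made for this illustration; none of them come from the lecture files.

```fortran
program laplace_average
  implicit none
  integer, parameter :: n = 21                 ! grid points per side (assumed)
  double precision, parameter :: pi = 3.141592653589793d0
  double precision :: c(n,n), cnew, change, y, exact
  integer :: i, j, iter

  ! First index runs along x, second along y; L = 1 is assumed.
  c = 0.0d0                                    ! zero boundaries and zero initial guess
  do j = 1, n
     y = dble(j-1)/dble(n-1)                   ! y/L
     c(n,j) = 10.0d0*sin(pi*y)                 ! boundary condition at x = L
  end do

  ! Sweep the interior, replacing each value by the average of its neighbors, Eq. (8)
  do iter = 1, 100000
     change = 0.0d0
     do j = 2, n-1
        do i = 2, n-1
           cnew = 0.25d0*(c(i+1,j) + c(i-1,j) + c(i,j+1) + c(i,j-1))
           change = max(change, abs(cnew - c(i,j)))
           c(i,j) = cnew
        end do
     end do
     if (change < 1.0d-10) exit                ! stop when a sweep changes nothing
  end do

  exact = 10.0d0*sinh(0.5d0*pi)/sinh(pi)       ! Eq. (7) evaluated at x = y = L/2
  write(*,*) 'sweeps         :', iter
  write(*,*) 'center (grid)  :', c((n+1)/2,(n+1)/2)
  write(*,*) 'center (exact) :', exact
end program laplace_average
```

The grid value at the center should be close to the value from Eq. (7), with the difference shrinking as the grid is refined.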


Linear Algebra


Example

Tony is two years older than Sam, and the sum of their current ages is twenty. How old are Tony and Sam? Use a two-by-two matrix to solve this problem.

$$T - S = 2 \tag{9}$$

$$T + S = 20 \tag{10}$$
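
Written in matrix form, Eqs. (9) and (10) can be solved by inverting the 2×2 coefficient matrix. The following is a worked sketch; ordering the unknowns as $(T, S)$ is a choice made here.

```latex
\begin{align*}
\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} T \\ S \end{pmatrix}
&= \begin{pmatrix} 2 \\ 20 \end{pmatrix} \\
\begin{pmatrix} T \\ S \end{pmatrix}
&= \frac{1}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} 2 \\ 20 \end{pmatrix}
= \begin{pmatrix} 11 \\ 9 \end{pmatrix}
\end{align*}
```

The determinant of the coefficient matrix is 2, so the inverse exists, and Tony is 11 and Sam is 9.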

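Linear Algebra: LU decomposition

A Linear System

$$A \cdot x = b \tag{11}$$

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}_{n \times n}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}_{n \times 1} \quad \text{and} \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}_{n \times 1}$$

where $A$ is an $n \times n$ square matrix and $b$ is an $n \times 1$ column vector, all of whose elements are known.

Then, how can we calculate $x$?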


LU decomposition

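The square matrix $A$ can be decomposed into

$$A = L \cdot U \tag{12}$$

where $L$ and $U$ are lower and upper triangular matrices, respectively, written as

$$L = \begin{pmatrix} \alpha_{11} & 0 & \cdots & 0 \\ \alpha_{21} & \alpha_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{pmatrix}, \quad U = \begin{pmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1n} \\ 0 & \beta_{22} & \cdots & \beta_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_{nn} \end{pmatrix}$$

Then,

$$A \cdot x = (L \cdot U) \cdot x = L \cdot (U \cdot x) = b \tag{13}$$

Let's set $U \cdot x = y$; then

$$L \cdot y = b \tag{14}$$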


Forward substitution with known L and b to solve for y

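$$L \cdot y = b: \quad \begin{pmatrix} \alpha_{11} & 0 & \cdots & 0 \\ \alpha_{21} & \alpha_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{pmatrix} \cdot \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

Then,

$$y_1 = \frac{b_1}{\alpha_{11}}, \quad y_2 = \frac{b_2 - \alpha_{21} y_1}{\alpha_{22}}, \quad \ldots \tag{15}$$

In general, by forward substitution,

$$y_i = \frac{1}{\alpha_{ii}} \left[ b_i - \sum_{j=1}^{i-1} \alpha_{ij} y_j \right] \tag{16}$$

where $i = 2, 3, \ldots, n$.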


Backward substitution with U and y to solve for x

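$$U \cdot x = y \tag{17}$$

$$\begin{pmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1n} \\ 0 & \ddots & \cdots & \vdots \\ \vdots & \ddots & \beta_{n-1,n-1} & \beta_{n-1,n} \\ 0 & \cdots & 0 & \beta_{nn} \end{pmatrix} \cdot \begin{pmatrix} x_1 \\ \vdots \\ x_{n-1} \\ x_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_{n-1} \\ y_n \end{pmatrix}$$

Then,

$$x_n = \frac{y_n}{\beta_{nn}}, \quad x_{n-1} = \frac{y_{n-1} - \beta_{n-1,n} x_n}{\beta_{n-1,n-1}}, \quad \ldots \tag{18}$$

Using back substitution,

$$x_i = \frac{1}{\beta_{ii}} \left[ y_i - \sum_{j=i+1}^{n} \beta_{ij} x_j \right] \tag{19}$$

where $i = n-1, n-2, \ldots, 1$.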
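
The two substitution sweeps of Eqs. (16) and (19) can be sketched in a few lines of Fortran. The factors below are taken from the 3×3 worked example that appears later in these notes, read with a unit diagonal in L. The right-hand side (0, 1, 0) is b = (1, 0, 0) reordered under the assumption that pivoting leaves the rows of A in the order (3, 1, 2); that ordering is inferred for this sketch and is not stated on the slides.

```fortran
program substitution_demo
  implicit none
  integer, parameter :: n = 3
  double precision :: l(n,n), u(n,n), b(n), y(n), x(n)
  integer :: i, j

  ! Lower and upper triangular factors, entered row by row
  l(1,:) = (/ 1.0d0,  0.0d0,        0.0d0 /)
  l(2,:) = (/ 0.5d0,  1.0d0,        0.0d0 /)
  l(3,:) = (/ 0.5d0, -1.0d0/3.0d0,  1.0d0 /)
  u(1,:) = (/ 2.0d0,  3.0d0,  4.0d0 /)
  u(2,:) = (/ 0.0d0,  1.5d0, -1.0d0 /)
  u(3,:) = (/ 0.0d0,  0.0d0, -1.0d0/3.0d0 /)
  b = (/ 0.0d0, 1.0d0, 0.0d0 /)        ! right-hand side after the assumed row exchanges

  ! Forward substitution, Eq. (16): solve L y = b from the top row down
  do i = 1, n
     y(i) = b(i)
     do j = 1, i-1
        y(i) = y(i) - l(i,j)*y(j)
     end do
     y(i) = y(i)/l(i,i)
  end do

  ! Back substitution, Eq. (19): solve U x = y from the bottom row up
  do i = n, 1, -1
     x(i) = y(i)
     do j = i+1, n
        x(i) = x(i) - u(i,j)*x(j)
     end do
     x(i) = x(i)/u(i,i)
  end do

  write(*,"(3(2x,F12.6))") x
end program substitution_demo
```

The printed values should be close to 2, 0 and -1, the solution quoted for the worked example.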


Combined matrix of α’s and β’s with less memory

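Using $\alpha_{ii} = 1$ where $i = 1, 2, \ldots, n$,

$$L \oplus U \to C = \begin{pmatrix} \beta_{11} & \beta_{12} & \beta_{13} & \cdots & \beta_{1,n-1} & \beta_{1n} \\ \alpha_{21} & \beta_{22} & \beta_{23} & \cdots & \beta_{2,n-1} & \beta_{2n} \\ \alpha_{31} & \alpha_{32} & \beta_{33} & \cdots & \beta_{3,n-1} & \beta_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \alpha_{n-1,1} & \alpha_{n-1,2} & \alpha_{n-1,3} & \cdots & \beta_{n-1,n-1} & \beta_{n-1,n} \\ \alpha_{n1} & \alpha_{n2} & \alpha_{n3} & \cdots & \alpha_{n,n-1} & \beta_{nn} \end{pmatrix}$$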


Example:

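$$A \cdot x = b \tag{20}$$

$$A = \begin{pmatrix} 1 & 3 & 1 \\ 1 & 1 & 2 \\ 2 & 3 & 4 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = ? \tag{21}$$

$$C = \begin{pmatrix} 2.000000 & 3.000000 & 4.000000 \\ 0.500000 & 1.500000 & -1.000000 \\ 0.500000 & -0.333333 & -0.333333 \end{pmatrix}, \quad x = \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix} \tag{22}$$

However, $C$ does not directly represent $L$ and $U$ of matrix $A$, because pivoting exchanges rows during the LU decomposition.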
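
To see the effect of pivoting, read $C$ with a unit diagonal in $L$ and multiply the factors back together. The permutation matrix $P$ below is inferred from that product for this sketch; it is not given on the slide.

```latex
\begin{align*}
L\,U &=
\begin{pmatrix} 1 & 0 & 0 \\ \tfrac12 & 1 & 0 \\ \tfrac12 & -\tfrac13 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 3 & 4 \\ 0 & \tfrac32 & -1 \\ 0 & 0 & -\tfrac13 \end{pmatrix}
= \begin{pmatrix} 2 & 3 & 4 \\ 1 & 3 & 1 \\ 1 & 1 & 2 \end{pmatrix}
= P\,A,
\qquad
P = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
\end{align*}
```

The product is $A$ with its rows in the order (3, 1, 2), so the factors solve $P A x = P b$, and the same reordering has to be applied to $b$ before the substitutions.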

Makefile

1. Use the files in "/opt/cee618s13/class03/" to solve this problem using the ludcmp and lubksb subroutines from NRF77 [1].
2. Use the LAPACK routines [2] DGETRF and DGETRS.
3. Check how to link LAPACK in the Makefile.

[1] Section 2.3 of "Numerical Recipes in FORTRAN 77", available at http://www.nrbook.com/a/bookfpdf.php
[2] LAPACK Users' Guide at http://www.netlib.org/lapack/lug/index.html

Linear Algebra: Numerical Recipes in FORTRAN

Using subroutines in NRF: ludcmp & lubksb

program LU
  implicit none
  integer :: i, j, indx(3)
  ! reshape fills A column by column, so its rows are (1,3,1), (1,1,2), (2,3,4)
  real :: a(3,3) = reshape((/ 1., 1., 2., 3., 1., 3., 1., 2., 4. /), (/ 3, 3 /))
  real :: d, b(3) = (/ 1., 0., 0. /)

  open(11, file='lu.dat')
  ! Display the given matrix A and b
  write(11,*)
  do i = 1, 3
    write(11,"(4(2x,F12.6))") (a(i,j), j=1,3), b(i)
  enddo
  ! Decomposition of the given matrix A
  call ludcmp(a, 3, 3, indx, d)
  ! Display the decomposed matrix A
  write(11,*)
  do i = 1, 3
    write(11,"(4(2x,F12.6))") (a(i,j), j=1,3)
  enddo
  ! Solving for x with the decomposed matrix using back substitution
  call lubksb(a, 3, 3, indx, b)
  ! Display the decomposed matrix A and the solution x
  write(11,*)
  do i = 1, 3
    write(11,"(4(2x,F12.6))") (a(i,j), j=1,3), b(i)
  enddo
  stop
end

./codes/LU/LU3.f90


Results using ludcmp & lubksb

$$\begin{pmatrix} 1 & 3 & 1 \\ 1 & 1 & 2 \\ 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \tag{23}$$

      1.000000      3.000000      1.000000      1.000000
      1.000000      1.000000      2.000000      0.000000
      2.000000      3.000000      4.000000      0.000000

      2.000000      3.000000      4.000000
      0.500000      1.500000     -1.000000
      0.500000     -0.333333     -0.333333

      2.000000      3.000000      4.000000      2.000000
      0.500000      1.500000     -1.000000      0.000000
      0.500000     -0.333333     -0.333333     -1.000000

./codes/LU/lu.dat


      SUBROUTINE ludcmp(a,n,np,indx,d)
      INTEGER n,np,indx(n),NMAX
      REAL d,a(np,np),TINY
      PARAMETER (NMAX=500,TINY=1.0e-20)
      INTEGER i,imax,j,k
      REAL aamax,dum,sum,vv(NMAX)
      d=1.
      do 12 i=1,n
        aamax=0.
        do 11 j=1,n
          if (abs(a(i,j)).gt.aamax) aamax=abs(a(i,j))
11      continue
        if (aamax.eq.0.) pause 'singular matrix in ludcmp'
        vv(i)=1./aamax
12    continue
      do 19 j=1,n
        do 14 i=1,j-1
          sum=a(i,j)
          do 13 k=1,i-1
            sum=sum-a(i,k)*a(k,j)
13        continue
          a(i,j)=sum
14      continue
        aamax=0.
        do 16 i=j,n
          sum=a(i,j)
          do 15 k=1,j-1
            sum=sum-a(i,k)*a(k,j)
15        continue
          a(i,j)=sum
          dum=vv(i)*abs(sum)
          if (dum.ge.aamax) then
            imax=i
            aamax=dum
          endif
16      continue
        if (j.ne.imax) then
          do 17 k=1,n
            dum=a(imax,k)
            a(imax,k)=a(j,k)
            a(j,k)=dum
17        continue
          d=-d
          vv(imax)=vv(j)
        endif
        indx(j)=imax
        if (a(j,j).eq.0.) a(j,j)=TINY
        if (j.ne.n) then
          dum=1./a(j,j)
          do 18 i=j+1,n
            a(i,j)=a(i,j)*dum
18        continue
        endif
19    continue
      return
      END

./codes/LU/ludcmp.f


      SUBROUTINE lubksb(a,n,np,indx,b)
      INTEGER n,np,indx(n)
      REAL a(np,np),b(n)
      INTEGER i,ii,j,ll
      REAL sum
      ii=0
      do 12 i=1,n
        ll=indx(i)
        sum=b(ll)
        b(ll)=b(i)
        if (ii.ne.0) then
          do 11 j=ii,i-1
            sum=sum-a(i,j)*b(j)
11        continue
        else if (sum.ne.0.) then
          ii=i
        endif
        b(i)=sum
12    continue
      do 14 i=n,1,-1
        sum=b(i)
        do 13 j=i+1,n
          sum=sum-a(i,j)*b(j)
13      continue
        b(i)=sum/a(i,i)
14    continue
      return
      END

./codes/LU/lubksb.f

Linear Algebra: Linear Algebra PACKage (LAPACK)

Using subroutines in LAPACK: dgetrf & dgetrs

program LUlapack
  implicit none
  integer :: i, j, ipiv(3), info
  ! reshape fills A column by column, so its rows are (1,3,1), (1,1,2), (2,3,4)
  double precision :: a(3,3) = reshape((/ 1., 1., 2., 3., 1., 3., 1., 2., 4. /), (/ 3, 3 /))
  double precision :: b(3) = (/ 1., 0., 0. /)

  open(11, file='lulapack.dat')
  ! Display the given matrix A and b
  write(11,*)
  do i = 1, 3
    write(11,"(4(2x,F12.6))") (a(i,j), j=1,3), b(i)
  end do
  ! Decomposition of the given matrix A
  call dgetrf(3, 3, a, 3, ipiv, info)
  ! Display the decomposed matrix A and b
  write(11,*)
  do i = 1, 3
    write(11,"(4(2x,F12.6))") (a(i,j), j=1,3), b(i)
  end do
  ! Solving for x with the decomposed matrix using back substitution
  call dgetrs('N', 3, 1, a, 3, ipiv, b, 3, info)
  ! Display the decomposed matrix A and the solution x
  write(11,*)
  do i = 1, 3
    write(11,"(4(2x,F12.6))") (a(i,j), j=1,3), b(i)
  end do
  stop
end

./codes/LUlapack/LU3dlapack.f90


Archives

Linear Equations at http://www.netlib.org/lapack/lug/node38.html

Individual routines at http://www.netlib.org/lapack/individualroutines.html

Single, REAL at http://www.netlib.org/lapack/single/

Double, REAL at http://www.netlib.org/lapack/double/

dgetrf at http://www.netlib.org/lapack/double/dgetrf.f

dgetrs at http://www.netlib.org/lapack/double/dgetrs.f

dgetri at http://www.netlib.org/lapack/double/dgetri.f


Specifically

call dgetrf(3, 3, a, 3, ipiv, info)

call dgetrs('N', 3, 1, a, 3, ipiv, b, 3, info)
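
The argument lists follow the LAPACK interfaces DGETRF(M, N, A, LDA, IPIV, INFO) and DGETRS(TRANS, N, NRHS, A, LDA, IPIV, B, LDB, INFO). The sketch below repeats the two calls for the 3×3 example with named sizes and adds a check of INFO after each call. The program name, the parameter names n and nrhs, and the error messages are made up for this illustration, and the program has to be linked against LAPACK (for example with -llapack).

```fortran
program lu_lapack_checked
  implicit none
  integer, parameter :: n = 3, nrhs = 1
  integer :: ipiv(n), info
  double precision :: a(n,n), b(n)

  ! Column-by-column fill: the rows of A are (1,3,1), (1,1,2), (2,3,4)
  a = reshape((/ 1.d0, 1.d0, 2.d0,  3.d0, 1.d0, 3.d0,  1.d0, 2.d0, 4.d0 /), (/ n, n /))
  b = (/ 1.d0, 0.d0, 0.d0 /)

  ! M = N = 3 rows and columns, LDA = 3; A is overwritten by L and U,
  ! and IPIV records the row exchanges made by pivoting.
  call dgetrf(n, n, a, n, ipiv, info)
  if (info /= 0) then
     ! info < 0: illegal argument; info > 0: U(info,info) is exactly zero
     write(*,*) 'dgetrf failed, info =', info
     stop
  end if

  ! 'N' solves A x = b (no transpose); NRHS = 1 right-hand side, LDB = 3;
  ! B is overwritten by the solution x.
  call dgetrs('N', n, nrhs, a, n, ipiv, b, n, info)
  if (info /= 0) then
     write(*,*) 'dgetrs failed, info =', info
     stop
  end if

  write(*,"(3(2x,F12.6))") b
end program lu_lapack_checked
```

Checking INFO is cheap and catches both argument mistakes and an exactly singular matrix before the solve step uses the factors.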

Eigenvalue & Eigenvector



Example: Rotate to principal axes the quadratic surface

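$$x^2 + 6xy - 2y^2 - 2yz + z^2 = 24 \tag{24}$$

In matrix form this equation is

$$\begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} 1 & 3 & 0 \\ 3 & -2 & -1 \\ 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 24 \tag{25}$$

or

$$X^T M X = 24 \tag{26}$$

The characteristic equation of this matrix is

$$\begin{vmatrix} 1-\mu & 3 & 0 \\ 3 & -2-\mu & -1 \\ 0 & -1 & 1-\mu \end{vmatrix} = -\mu^3 + 13\mu - 12 = -(\mu - 1)(\mu + 4)(\mu - 3) = 0 \tag{27}$$

The characteristic values are $\mu = 1, -4, 3$.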


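From

$$\begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} 1 & 3 & 0 \\ 3 & -2 & -1 \\ 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = 24 \tag{28}$$

relative to the principal axes $(x', y', z')$, the quadratic surface equation becomes

$$\begin{pmatrix} x' & y' & z' \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 3 \end{pmatrix} \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = 24 \tag{29}$$

or

$$1 \cdot x'^2 + (-4) \cdot y'^2 + 3 \cdot z'^2 = 24 \tag{30}$$

or

$$X'^T M' X' = 24 \tag{31}$$

where

$$M' = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 3 \end{pmatrix} \tag{32}$$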


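Eigenvectors are

$$\left( \frac{1}{\sqrt{10}}, \frac{0}{\sqrt{10}}, \frac{3}{\sqrt{10}} \right) \quad \text{for } \mu = 1 \tag{33}$$

$$\left( \frac{-3}{\sqrt{35}}, \frac{5}{\sqrt{35}}, \frac{1}{\sqrt{35}} \right) \quad \text{for } \mu = -4 \tag{34}$$

$$\left( \frac{-3}{\sqrt{14}}, \frac{-2}{\sqrt{14}}, \frac{1}{\sqrt{14}} \right) \quad \text{for } \mu = 3 \tag{35}$$

With the eigenvectors placed as the columns of a matrix $C$, the two coordinate systems are related by

$$\begin{pmatrix} \frac{1}{\sqrt{10}} & \frac{-3}{\sqrt{35}} & \frac{-3}{\sqrt{14}} \\ 0 & \frac{5}{\sqrt{35}} & \frac{-2}{\sqrt{14}} \\ \frac{3}{\sqrt{10}} & \frac{1}{\sqrt{35}} & \frac{1}{\sqrt{14}} \end{pmatrix} \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \tag{36}$$

or

$$C \cdot X' = X, \qquad X'^T \cdot C^T = X^T$$

so that $X^T M X = X'^T \left( C^T M C \right) X'$.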


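In other words,

$$\begin{pmatrix} \frac{1}{\sqrt{10}} & 0 & \frac{3}{\sqrt{10}} \\ \frac{-3}{\sqrt{35}} & \frac{5}{\sqrt{35}} & \frac{1}{\sqrt{35}} \\ \frac{-3}{\sqrt{14}} & \frac{-2}{\sqrt{14}} & \frac{1}{\sqrt{14}} \end{pmatrix} \begin{pmatrix} 1 & 3 & 0 \\ 3 & -2 & -1 \\ 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{10}} & \frac{-3}{\sqrt{35}} & \frac{-3}{\sqrt{14}} \\ 0 & \frac{5}{\sqrt{35}} & \frac{-2}{\sqrt{14}} \\ \frac{3}{\sqrt{10}} & \frac{1}{\sqrt{35}} & \frac{1}{\sqrt{14}} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 3 \end{pmatrix} \tag{37}$$

- In the eigenvector matrix, the columns can be exchanged and the signs can be reversed. It is a matter of using right-handed or left-handed coordinates.
- Using transformed coordinates makes the problem mathematically convenient.
- In quantum mechanics, eigenvalues are energies and eigenvectors are quantum states.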
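
The eigenpairs in Eqs. (33) to (35) can be checked numerically without any library by evaluating the residual M v - mu v for each pair. The following is a minimal sketch; the program and variable names are invented for this illustration.

```fortran
program check_eigenpairs
  implicit none
  double precision :: m(3,3), v(3,3), mu(3), r(3)
  integer :: k

  ! Matrix M of Eq. (25); it is symmetric, so the fill order does not matter
  m = reshape((/ 1.d0, 3.d0, 0.d0,  3.d0, -2.d0, -1.d0,  0.d0, -1.d0, 1.d0 /), (/ 3, 3 /))

  ! Eigenvalues and unnormalized eigenvectors, one vector per column
  mu = (/ 1.d0, -4.d0, 3.d0 /)
  v(:,1) = (/  1.d0,  0.d0, 3.d0 /)
  v(:,2) = (/ -3.d0,  5.d0, 1.d0 /)
  v(:,3) = (/ -3.d0, -2.d0, 1.d0 /)

  do k = 1, 3
     v(:,k) = v(:,k)/sqrt(dot_product(v(:,k), v(:,k)))   ! scale to unit length
     r = matmul(m, v(:,k)) - mu(k)*v(:,k)                 ! residual M v - mu v
     write(*,"(a,f6.1,a,es10.2)") ' mu =', mu(k), '   max |M v - mu v| =', maxval(abs(r))
  end do
end program check_eigenpairs
```

Each residual should be zero to within rounding error. Flipping the sign of any column of v leaves the residual unchanged, which is the freedom mentioned in the first remark above.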



PROGRAM EIGENVV
  IMPLICIT NONE
  INTEGER :: I, INFO, J, N, LWORK
  DOUBLE PRECISION :: DUMMY(1,1)
  DOUBLE PRECISION, ALLOCATABLE, DIMENSION(:,:) :: A, B, VR
  DOUBLE PRECISION, ALLOCATABLE, DIMENSION(:) :: ALPHAR, ALPHAI, BETA, WORK

  ! Read the matrix size N and then the matrix A, row by row
  open(11, file='mat.in', status='old')
  read(11,*) N
  LWORK = 8*N
  allocate(A(N,N), B(N,N), ALPHAR(N), ALPHAI(N), BETA(N), VR(N,N), WORK(LWORK))
  ! B is the identity, so the generalized problem A v = mu B v reduces to A v = mu v
  B = 0.0
  do i = 1, N
    B(i,i) = 1.0
  end do
  READ(11,*) ((A(I,J), J=1,N), I=1,N)
  ! 'N': no left eigenvectors; 'V': right eigenvectors are returned in VR
  CALL DGGEV('N', 'V', N, A, N, B, N, ALPHAR, ALPHAI, BETA, DUMMY, 1, VR, N, WORK, LWORK, INFO)
  ! DGGEV overwrites A; the documented eigenvalues are (ALPHAR(j) + i*ALPHAI(j))/BETA(j)
  write(*,*) 'Eigen values are (diagonal):'
  write(*,"(3(2X,F12.8))") ((A(i,j), J=1,N), I=1,N)
  write(*,*)
  call eigvec_norm(N, VR)
  write(*,*) 'Eigen vectors are:'
  write(*,"(3(2X,F12.8))") ((VR(i,j), J=1,N), I=1,N)
  write(*,*)
  deallocate(A, B, ALPHAR, ALPHAI, BETA, VR, WORK)

contains

  ! Scale each column of VR to unit length
  subroutine eigvec_norm(N, VR)
    integer :: N, i
    DOUBLE PRECISION :: VR(N,N)
    double precision :: norm
    do i = 1, N
      norm = DOT_PRODUCT(VR(:,i), VR(:,i))
      VR(:,i) = VR(:,i)/sqrt(norm)
    enddo
  end subroutine eigvec_norm
end program

./codes/eigen/eigvv.f90


Makefile

srcroot=eigvv
srcfile=$(srcroot).f90
exefile=$(srcroot).x

all:
	ifort $(srcfile) -o $(exefile) -llapack

run:
	./$(exefile)

edit:
	vim $(srcfile)

clean:
	rm -f *.x *.o

./codes/eigen/Makefile


Output

./eigvv.x
 Eigen values are (diagonal):
   -4.00000000   -0.00000000    0.00000000
    0.00000000    3.00000000    0.00000000
    0.00000000    0.00000000    1.00000000

 Eigen vectors are:
   -0.50709255   -0.80178373    0.31622777
    0.84515425   -0.53452248   -0.00000000
    0.16903085    0.26726124    0.94868330

./codes/eigen/output.dat

PBS (Portable Batch System)


sample0.pbs & sample1.pbs

sample0.pbs:

#PBS -S /bin/bash
#PBS -V
uname -n
echo $PBS_O_JOBID

./codes/PBS/sample0.pbs

sample1.pbs:

#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -N MyJob
#PBS -V
uname -n
echo $PBS_O_JOBID
cd $PBS_O_WORKDIR
pwd


sample2.pbs

#!/bin/bash
#PBS -l host=fractal
#PBS -l walltime=12:00:00
#PBS -l select=1:mpiprocs=4:ncpus=4
#PBS -N Sample
#PBS -V
#PBS -j oe
cd $PBS_O_WORKDIR
### put your specific job here after 'time' command ###
time ls -laF
#######################################################
qstat -f $PBS_JOBID


Commands

1. $ qsub < sample0.pbs
2. $ qstat

- The first command submits the job described in "sample0.pbs" to a queueing system, i.e. "torque".
- The second command monitors the status of the job, whose job number was assigned automatically by the first command.
- Observe the directory, since each "qsub" command will generate two files with the job number.
- Look at the contents of the newly generated files.
