
Page 1: Chapter 7  Eigenvalues and Eigenvectors

Chapter 7 Eigenvalues and Eigenvectors

7.1 Eigenvalues and Eigenvectors

7.2 Diagonalization

7.3 Symmetric Matrices and Orthogonal Diagonalization

7.4 Application of Eigenvalues and Eigenvectors

7.5 Principal Component Analysis

7.1

Page 2: Chapter 7  Eigenvalues and Eigenvectors

7.2

7.1 Eigenvalues and Eigenvectors

Eigenvalue problem (特徵值問題) (one of the most important problems in linear algebra): If A is an n×n matrix, do there exist nonzero vectors x in R^n such that Ax is a scalar multiple of x?

Eigenvalue (特徵值) and eigenvector (特徵向量):
A: an n×n matrix
λ: a scalar (could be zero), called an eigenvalue of A
x: a nonzero vector in R^n, called an eigenvector of A corresponding to λ

Ax = λx

(The term eigenvalue is from the German word Eigenwert, meaning "proper value")

※ Geometric interpretation: multiplying an eigenvector x by A does not change its direction; it only scales x by the eigenvalue λ. [Figure: a vector x on the xy-plane and Ax = λx along the same line through the origin]
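Below is a minimal NumPy check of this definition (an illustration added to this transcript, not part of the original slides; the matrix is the one used in Ex 1 on the next slide):

```python
import numpy as np

# Verify that x = (1, 0) is an eigenvector of A with eigenvalue 2,
# i.e., that Ax = 2x.
A = np.array([[2.0, 0.0],
              [0.0, -1.0]])
x = np.array([1.0, 0.0])

print(A @ x)                         # -> [2. 0.] = 2 * x
eigvals, eigvecs = np.linalg.eig(A)  # all eigenpairs at once
print(eigvals)                       # -> [ 2. -1.]
```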

Page 3: Chapter 7  Eigenvalues and Eigenvectors

7.3

Ex 1: Verifying eigenvalues and eigenvectors

A = [2 0; 0 −1], x1 = [1; 0], x2 = [0; 1]

Ax1 = [2 0; 0 −1][1; 0] = [2; 0] = 2[1; 0] = 2x1
(eigenvalue λ = 2, eigenvector x1)

Ax2 = [2 0; 0 −1][0; 1] = [0; −1] = (−1)[0; 1] = −x2
(eigenvalue λ = −1, eigenvector x2)

※ In fact, each eigenvalue has infinitely many corresponding eigenvectors. For λ = 2, [3 0]^T and [5 0]^T are both corresponding eigenvectors; moreover, [3 0]^T + [5 0]^T is still an eigenvector. The proof is in Thm. 7.1.

Page 4: Chapter 7  Eigenvalues and Eigenvectors

7.4

Thm. 7.1: The eigenspace corresponding to λ of matrix A
If A is an n×n matrix with an eigenvalue λ, then the set of all eigenvectors of λ together with the zero vector is a subspace of R^n. This subspace is called the eigenspace (特徵空間) of λ.

Pf: Let x1 and x2 be eigenvectors corresponding to λ (i.e., Ax1 = λx1, Ax2 = λx2).

(1) A(x1 + x2) = Ax1 + Ax2 = λx1 + λx2 = λ(x1 + x2)
(i.e., x1 + x2 is also an eigenvector corresponding to λ)

(2) A(cx1) = c(Ax1) = c(λx1) = λ(cx1)
(i.e., cx1 is also an eigenvector corresponding to λ)

Since this set is closed under vector addition and scalar multiplication, it is a subspace of R^n according to Theorem 4.5

Page 5: Chapter 7  Eigenvalues and Eigenvectors

7.5

Ex 3: Examples of eigenspaces on the xy-plane
For the matrix A as follows, the corresponding eigenvalues are λ1 = −1 and λ2 = 1:

A = [−1 0; 0 1]

Sol:
For the eigenvalue λ1 = −1, the corresponding eigenvectors are the nonzero vectors on the x-axis:

A[x; 0] = [−x; 0] = −1·[x; 0]

※ Thus, the eigenspace corresponding to λ = −1 is the x-axis, which is a subspace of R²

For the eigenvalue λ2 = 1, the corresponding eigenvectors are the nonzero vectors on the y-axis:

A[0; y] = [0; y] = 1·[0; y]

※ Thus, the eigenspace corresponding to λ = 1 is the y-axis, which is a subspace of R²

Page 6: Chapter 7  Eigenvalues and Eigenvectors

7.6

※ Geometrically speaking, multiplying a vector v = (x, y) in R² by this matrix A corresponds to a reflection about the y-axis:

Av = A([x; 0] + [0; y]) = A[x; 0] + A[0; y] = [−x; 0] + [0; y] = [−x; y]

Page 7: Chapter 7  Eigenvalues and Eigenvectors

7.7

Thm. 7.2: Finding eigenvalues and eigenvectors of a matrix A ∈ M(n×n)
Let A be an n×n matrix.
(1) An eigenvalue of A is a scalar λ such that det(λI − A) = 0
(2) The eigenvectors of A corresponding to λ are the nonzero solutions of (λI − A)x = 0

Note: following the definition of the eigenvalue problem, Ax = λx ⇒ Ax = λIx ⇒ (λI − A)x = 0 (a homogeneous system); (λI − A)x = 0 has nonzero solutions for x iff det(λI − A) = 0 (this iff result comes from the equivalent conditions on Slide 4.101)

Characteristic polynomial (特徵多項式) of A ∈ M(n×n):

det(λI − A) = |λI − A| = λ^n + c_{n-1}λ^(n-1) + ... + c_1λ + c_0

Characteristic equation (特徵方程式) of A:

det(λI − A) = 0

Page 8: Chapter 7  Eigenvalues and Eigenvectors

7.8

Ex 4: Finding eigenvalues and eigenvectors

A = [2 −12; 1 −5]

Sol: Characteristic equation:

det(λI − A) = |λ−2 12; −1 λ+5| = λ² + 3λ + 2 = (λ+1)(λ+2) = 0

Eigenvalues: λ1 = −1, λ2 = −2

Page 9: Chapter 7  Eigenvalues and Eigenvectors

7.9

(1) λ1 = −1:

(−1)I − A = [−3 12; −1 4] → (G.-J. E.) → [1 −4; 0 0]

⇒ [x1; x2] = [4t; t] = t[4; 1], t ≠ 0

(2) λ2 = −2:

(−2)I − A = [−4 12; −1 3] → (G.-J. E.) → [1 −3; 0 0]

⇒ [x1; x2] = [3s; s] = s[3; 1], s ≠ 0
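The results of Ex 4 can be verified numerically; a small NumPy sketch (added to this transcript, not part of the original slides):

```python
import numpy as np

A = np.array([[2.0, -12.0],
              [1.0,  -5.0]])

# Coefficients of det(lambda*I - A): [1. 3. 2.] -> lambda^2 + 3*lambda + 2
print(np.poly(A))

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                 # -> [-1. -2.] (order may vary)
# Columns of eigvecs are unit-length eigenvectors, i.e., scalar multiples
# of (4, 1) and (3, 1); rescale so the last entry equals 1 to compare:
print(eigvecs / eigvecs[1, :])
```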

Page 10: Chapter 7  Eigenvalues and Eigenvectors

7.10

Ex 5: Finding eigenvalues and eigenvectors
Find the eigenvalues and corresponding eigenvectors for the matrix A. What is the dimension of the eigenspace of each eigenvalue?

A = [2 1 0; 0 2 0; 0 0 2]

Sol: Characteristic equation:

det(λI − A) = |λ−2 −1 0; 0 λ−2 0; 0 0 λ−2| = (λ−2)³ = 0

Eigenvalue: λ = 2 (a multiple root of the characteristic polynomial)

Page 11: Chapter 7  Eigenvalues and Eigenvectors

7.11

The eigenspace of λ = 2:

(2I − A)x = [0 −1 0; 0 0 0; 0 0 0][x1; x2; x3] = [0; 0; 0]

⇒ [x1; x2; x3] = [s; 0; t] = s[1; 0; 0] + t[0; 0; 1]

{ s[1; 0; 0] + t[0; 0; 1] : s, t ∈ R } : the eigenspace of A corresponding to λ = 2

Thus, the dimension of its eigenspace is 2
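The eigenspace dimension equals the nullity of (λI − A), which can be checked with NumPy (an illustration added to this transcript):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])

# dimension of the eigenspace of lambda = 2: nullity = n - rank(2I - A)
M = 2 * np.eye(3) - A
print(3 - np.linalg.matrix_rank(M))   # -> 2
```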

Page 12: Chapter 7  Eigenvalues and Eigenvectors

7.12

Notes:

(1) If an eigenvalue λ1 occurs as a multiple root (k times) of the characteristic polynomial, then λ1 has multiplicity (重數) k

(2) The multiplicity of an eigenvalue is greater than or equal to the dimension of its eigenspace. (In Ex. 5, k is 3 and the dimension of the eigenspace is 2)

Page 13: Chapter 7  Eigenvalues and Eigenvectors

7.13

Ex 6: Find the eigenvalues of the matrix A and find a basis for each of the corresponding eigenspaces

A = [1 0 0 0; 0 1 −5 10; −1 0 2 0; −1 0 0 3]

Sol: Characteristic equation:

det(λI − A) = |λ−1 0 0 0; 0 λ−1 5 −10; 1 0 λ−2 0; 1 0 0 λ−3| = (λ−1)²(λ−2)(λ−3) = 0

Eigenvalues: λ1 = 1, λ2 = 2, λ3 = 3

※ According to the note on the previous slide, the dimension of the eigenspace of λ1 = 1 is at most 2
※ For λ2 = 2 and λ3 = 3, the dimensions of their eigenspaces are at most 1

Page 14: Chapter 7  Eigenvalues and Eigenvectors

7.14

(1) λ1 = 1:

(1·I − A)x = [0 0 0 0; 0 0 5 −10; 1 0 −1 0; 1 0 0 −2][x1; x2; x3; x4] = 0

→ (G.-J. E.) → [x1; x2; x3; x4] = [2t; s; 2t; t] = s[0; 1; 0; 0] + t[2; 0; 2; 1], (s, t) ≠ (0, 0)

⇒ {[0 1 0 0]^T, [2 0 2 1]^T} is a basis for the eigenspace corresponding to λ1 = 1

※ The dimension of the eigenspace of λ1 = 1 is 2

Page 15: Chapter 7  Eigenvalues and Eigenvectors

7.15

(2) λ2 = 2:

(2I − A)x = [1 0 0 0; 0 1 5 −10; 1 0 0 0; 1 0 0 −1][x1; x2; x3; x4] = 0

→ (G.-J. E.) → [x1; x2; x3; x4] = [0; −5t; t; 0] = t[0; −5; 1; 0], t ≠ 0

⇒ {[0 −5 1 0]^T} is a basis for the eigenspace corresponding to λ2 = 2

※ The dimension of the eigenspace of λ2 = 2 is 1

Page 16: Chapter 7  Eigenvalues and Eigenvectors

7.16

(3) λ3 = 3:

(3I − A)x = [2 0 0 0; 0 2 5 −10; 1 0 1 0; 1 0 0 0][x1; x2; x3; x4] = 0

→ (G.-J. E.) → [x1; x2; x3; x4] = [0; 5t; 0; t] = t[0; 5; 0; 1], t ≠ 0

⇒ {[0 5 0 1]^T} is a basis for the eigenspace corresponding to λ3 = 3

※ The dimension of the eigenspace of λ3 = 3 is 1

Page 17: Chapter 7  Eigenvalues and Eigenvectors

7.17

Thm. 7.3: Eigenvalues of triangular matrices
If A is an n×n triangular matrix, then its eigenvalues are the entries on its main diagonal

Ex 7: Finding eigenvalues for triangular and diagonal matrices

(a) A = [2 0 0; −1 1 0; 5 3 −3]
(b) A = diag(−1, 2, 0, −4, 3)

Sol:
(a) det(λI − A) = |λ−2 0 0; 1 λ−1 0; −5 −3 λ+3| = (λ−2)(λ−1)(λ+3) = 0 ⇒ λ1 = 2, λ2 = 1, λ3 = −3
(b) λ1 = −1, λ2 = 2, λ3 = 0, λ4 = −4, λ5 = 3

※ According to Thm. 3.2, the determinant of a triangular matrix is the product of the entries on the main diagonal

Page 18: Chapter 7  Eigenvalues and Eigenvectors

7.18

Eigenvalues and eigenvectors of linear transformations:

A number λ is called an eigenvalue of a linear transformation T: V → V if there is a nonzero vector x such that T(x) = λx. The vector x is called an eigenvector of T corresponding to λ, and the set of all eigenvectors of λ (together with the zero vector) is called the eigenspace of λ

※ The definition of linear transformations should be introduced in Ch 6
※ Here I briefly introduce linear transformations and some of their basic properties
※ The typical example of a linear transformation is one in which each component of the resulting vector is a linear combination of the components of the input vector x

An example of a linear transformation T: R³ → R³:

T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, −2x3)

Page 19: Chapter 7  Eigenvalues and Eigenvectors

7.19

Theorem: Standard matrix for a linear transformation
Let T: R^n → R^n be a linear transformation such that

T(e1) = [a11; a21; ...; an1], T(e2) = [a12; a22; ...; an2], ..., T(en) = [a1n; a2n; ...; ann],

where {e1, e2, ..., en} is the standard basis for R^n. Then the n×n matrix A, whose i-th column corresponds to T(ei), i.e.,

A = [T(e1) T(e2) ... T(en)] = [a11 a12 ... a1n; a21 a22 ... a2n; ...; an1 an2 ... ann],

satisfies T(x) = Ax for every x in R^n. A is called the standard matrix for T (T 的標準矩陣)

Page 20: Chapter 7  Eigenvalues and Eigenvectors

7.20

Consider the same linear transformation T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, −2x3):

T(e1) = T([1; 0; 0]) = [1; 3; 0], T(e2) = T([0; 1; 0]) = [3; 1; 0], T(e3) = T([0; 0; 1]) = [0; 0; −2]

Thus, the above linear transformation T has the following corresponding standard matrix A such that T(x) = Ax:

A = [1 3 0; 3 1 0; 0 0 −2], and Ax = [1 3 0; 3 1 0; 0 0 −2][x1; x2; x3] = [x1 + 3x2; 3x1 + x2; −2x3]

※ The statement on Slide 7.18 is valid because for any linear transformation T: V → V there is a corresponding square matrix A such that T(x) = Ax. Consequently, the eigenvalues and eigenvectors of a linear transformation T are in essence the eigenvalues and eigenvectors of the corresponding square matrix A
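The column-by-column construction of the standard matrix can be sketched in NumPy (an illustration added to this transcript):

```python
import numpy as np

# Build the standard matrix of T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, -2x3)
# from the images of the standard basis vectors T(e_i).
def T(x):
    x1, x2, x3 = x
    return np.array([x1 + 3*x2, 3*x1 + x2, -2*x3])

A = np.column_stack([T(e) for e in np.eye(3)])
print(A)                           # [[ 1. 3. 0.], [ 3. 1. 0.], [ 0. 0. -2.]]

x = np.array([5.0, -1.0, 4.0])
assert np.allclose(T(x), A @ x)    # T(x) = Ax for every x
```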

Page 21: Chapter 7  Eigenvalues and Eigenvectors

7.21

Ex 8: Finding eigenvalues and eigenvectors for standard matrices
Find the eigenvalues and corresponding eigenvectors for

A = [1 3 0; 3 1 0; 0 0 −2]

Sol:

det(λI − A) = |λ−1 −3 0; −3 λ−1 0; 0 0 λ+2| = (λ+2)²(λ−4) = 0

⇒ eigenvalues λ1 = 4, λ2 = −2

For λ1 = 4, the corresponding eigenvector is (1, 1, 0). For λ2 = −2, the corresponding eigenvectors are (1, −1, 0) and (0, 0, 1).

※ A is the standard matrix for T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, −2x3) (see Slides 7.19 and 7.20)

Page 22: Chapter 7  Eigenvalues and Eigenvectors

7.22

Transformation matrix for nonstandard bases

Suppose B is the standard basis of R^n. Since the coordinate matrix of a vector relative to the standard basis consists of the components of that vector, i.e., for any x in R^n, x = [x]_B, the theorem on Slide 7.19 can be restated as follows:

T(x) = Ax ⇒ [T(x)]_B = A[x]_B, where A = [[T(e1)]_B [T(e2)]_B ... [T(en)]_B]
is the standard matrix for T, or the matrix of T relative to the standard basis B

The above theorem can be extended to consider a nonstandard basis B′, which consists of {v1, v2, ..., vn}:

[T(x)]_B′ = A′[x]_B′, where A′ = [[T(v1)]_B′ [T(v2)]_B′ ... [T(vn)]_B′]
is the transformation matrix for T relative to the basis B′

※ On the next two slides, an example is provided to verify numerically that this extension is valid

Page 23: Chapter 7  Eigenvalues and Eigenvectors

7.23

EX. Consider an arbitrary nonstandard basis B′ = {v1, v2, v3} = {(1, 1, 0), (1, −1, 0), (0, 0, 1)}, and find the transformation matrix A′ such that [T(x)]_B′ = A′[x]_B′, corresponding to the same linear transformation T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, −2x3):

[T(v1)]_B′ = [T((1, 1, 0))]_B′ = [(4, 4, 0)]_B′ = [4; 0; 0],
[T(v2)]_B′ = [T((1, −1, 0))]_B′ = [(−2, 2, 0)]_B′ = [0; −2; 0],
[T(v3)]_B′ = [T((0, 0, 1))]_B′ = [(0, 0, −2)]_B′ = [0; 0; −2]

A′ = [4 0 0; 0 −2 0; 0 0 −2]

Page 24: Chapter 7  Eigenvalues and Eigenvectors

7.24

Consider x = (5, −1, 4), and check that [T(x)]_B′ = A′[x]_B′ for the linear transformation T(x1, x2, x3) = (x1 + 3x2, 3x1 + x2, −2x3):

[T(x)]_B′ = [T((5, −1, 4))]_B′ = [(2, 14, −8)]_B′ = [8; −6; −8],   [x]_B′ = [(5, −1, 4)]_B′ = [2; 3; 4]

A′[x]_B′ = [4 0 0; 0 −2 0; 0 0 −2][2; 3; 4] = [8; −6; −8] = [T(x)]_B′ ✓

Page 25: Chapter 7  Eigenvalues and Eigenvectors

7.25

Let B′ be a basis of R³ made up of three linearly independent eigenvectors of the standard matrix A, e.g., B′ = {(1, 1, 0), (1, −1, 0), (0, 0, 1)} in Ex 8, where (1, 1, 0) is an eigenvector for λ1 = 4 and (1, −1, 0), (0, 0, 1) are eigenvectors for λ2 = −2.

For such a special basis B′, where the basis vectors are eigenvectors of the standard matrix A, the transformation matrix A′ for T relative to B′, defined as [[T(v1)]_B′ [T(v2)]_B′ [T(v3)]_B′] (see Slide 7.22), is obtained immediately to be diagonal, and the main-diagonal entries are the corresponding eigenvalues (see Slide 7.23):

A′ = [4 0 0; 0 −2 0; 0 0 −2]   (eigenvalues of A on the diagonal, eigenvectors of A as the basis)

Page 26: Chapter 7  Eigenvalues and Eigenvectors

7.26

Keywords in Section 7.1:

eigenvalue problem: 特徵值問題
eigenvalue: 特徵值
eigenvector: 特徵向量
characteristic equation: 特徵方程式
characteristic polynomial: 特徵多項式
eigenspace: 特徵空間
multiplicity: 重數
linear transformation: 線性轉換
diagonalization: 對角化

Page 27: Chapter 7  Eigenvalues and Eigenvectors

7.27

7.2 Diagonalization

Diagonalization problem (對角化問題): For a square matrix A, does there exist an invertible matrix P such that P⁻¹AP is diagonal?

Diagonalizable matrix (可對角化矩陣):
Definition 1: A square matrix A is called diagonalizable if there exists an invertible matrix P such that P⁻¹AP is a diagonal matrix (i.e., P diagonalizes A)
Definition 2: A square matrix A is called diagonalizable if A is similar to a diagonal matrix
※ In Sec. 6.4, two square matrices A and B are similar if there exists an invertible matrix P such that B = P⁻¹AP.

Notes: In this section, I will show that the eigenvalue and eigenvector problem is closely related to the diagonalization problem

Page 28: Chapter 7  Eigenvalues and Eigenvectors

7.28

Thm. 7.4: Similar matrices have the same eigenvalues

If A and B are similar n×n matrices, then they have the same eigenvalues

Pf: A and B are similar ⇒ B = P⁻¹AP

Consider the characteristic equation of B:

|λI − B| = |λI − P⁻¹AP| = |P⁻¹λIP − P⁻¹AP| = |P⁻¹(λI − A)P| = |P⁻¹||λI − A||P| = |P⁻¹||P||λI − A| = |λI − A|

Since A and B have the same characteristic equation, they have the same eigenvalues

(For any diagonal matrix in the form of D = λI, P⁻¹DP = D)

※ Note that the eigenvectors of A and B are not identical

Page 29: Chapter 7  Eigenvalues and Eigenvectors

7.29

Ex 1: Eigenvalue problems and diagonalization

A = [1 3 0; 3 1 0; 0 0 −2]

Sol: Characteristic equation:

det(λI − A) = |λ−1 −3 0; −3 λ−1 0; 0 0 λ+2| = (λ−4)(λ+2)² = 0

The eigenvalues: λ1 = 4, λ2 = −2, λ3 = −2

(1) λ = 4 ⇒ the eigenvector p1 = [1; 1; 0]

Page 30: Chapter 7  Eigenvalues and Eigenvectors

7.30

(2) λ = −2 ⇒ the eigenvectors p2 = [1; −1; 0], p3 = [0; 0; 1]

P = [p1 p2 p3] = [1 1 0; 1 −1 0; 0 0 1], and P⁻¹AP = [4 0 0; 0 −2 0; 0 0 −2]

Note: If P = [p2 p1 p3] = [1 1 0; −1 1 0; 0 0 1], then P⁻¹AP = [−2 0 0; 0 4 0; 0 0 −2]

※ The above example verifies Thm. 7.4, since the eigenvalues of both A and P⁻¹AP are the same: 4, −2, and −2
※ The reason why the matrix P is constructed with the eigenvectors of A is demonstrated in Thm. 7.5 on the next slide

Page 31: Chapter 7  Eigenvalues and Eigenvectors

7.31

Thm. 7.5: Condition for diagonalization
An n×n matrix A is diagonalizable if and only if it has n linearly independent eigenvectors

Pf: (⇒)
Since A is diagonalizable, there exists an invertible P such that D = P⁻¹AP is diagonal. Let P = [p1 p2 ... pn] and D = diag(λ1, λ2, ..., λn); then

PD = [p1 p2 ... pn] diag(λ1, λ2, ..., λn) = [λ1p1 λ2p2 ... λnpn]

※ If there are n linearly independent eigenvectors, it does not imply that there are n distinct eigenvalues. In an extreme case, it is possible to have only one eigenvalue with multiplicity n, and there are n linearly independent eigenvectors for this eigenvalue
※ On the other hand, if there are n distinct eigenvalues, then there are n linearly independent eigenvectors (see Thm. 7.6), and thus A must be diagonalizable

Page 32: Chapter 7  Eigenvalues and Eigenvectors

7.32

AP = PD (since D = P⁻¹AP)

[Ap1 Ap2 ... Apn] = [λ1p1 λ2p2 ... λnpn]

⇒ Api = λipi, i = 1, 2, ..., n
(The above equations imply that the column vectors pi of P are eigenvectors of A, and the diagonal entries λi in D are eigenvalues of A)

Because A is diagonalizable ⇒ P is invertible
⇒ the columns in P, i.e., p1, p2, ..., pn, are linearly independent (see Slide 4.101 in the lecture notes)

Thus, A has n linearly independent eigenvectors

(⇐) Since A has n linearly independent eigenvectors p1, p2, ..., pn with corresponding eigenvalues λ1, λ2, ..., λn (which could be the same), we have

Api = λipi, i = 1, 2, ..., n

Let P = [p1 p2 ... pn]

Page 33: Chapter 7  Eigenvalues and Eigenvectors

7.33

AP = A[p1 p2 ... pn] = [Ap1 Ap2 ... Apn] = [λ1p1 λ2p2 ... λnpn] = [p1 p2 ... pn] diag(λ1, λ2, ..., λn) = PD

Since p1, p2, ..., pn are linearly independent ⇒ P is invertible (see Slide 4.101 in the lecture notes)

AP = PD ⇒ P⁻¹AP = D ⇒ A is diagonalizable (according to the definition of a diagonalizable matrix on Slide 7.27)

※ Note that the pi's are n linearly independent eigenvectors, and the diagonal entries λi in the resulting diagonalized D are eigenvalues of A

Page 34: Chapter 7  Eigenvalues and Eigenvectors

7.34

Ex 4: A matrix that is not diagonalizable
Show that the following matrix is not diagonalizable:

A = [1 2; 0 1]

Sol: Characteristic equation:

det(λI − A) = |λ−1 −2; 0 λ−1| = (λ−1)² = 0

The eigenvalue is λ1 = 1; solve (λ1I − A)x = 0 for the eigenvectors:

λ1I − A = I − A = [0 −2; 0 0] ⇒ eigenvector p1 = [1; 0]

Since A does not have two linearly independent eigenvectors, A is not diagonalizable

Page 35: Chapter 7  Eigenvalues and Eigenvectors

7.35

Steps for diagonalizing an n×n square matrix:

Step 1: Find n linearly independent eigenvectors p1, p2, ..., pn for A with corresponding eigenvalues λ1, λ2, ..., λn

Step 2: Let P = [p1 p2 ... pn]

Step 3: P⁻¹AP = D = diag(λ1, λ2, ..., λn), where Api = λipi, i = 1, 2, ..., n
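These three steps can be sketched in NumPy (an illustration added to this transcript; eig returns unit-length eigenvectors, which is fine since any nonzero scalar multiple of an eigenvector is still an eigenvector):

```python
import numpy as np

A = np.array([[1.0, 3.0, 0.0],
              [3.0, 1.0, 0.0],
              [0.0, 0.0, -2.0]])

eigvals, P = np.linalg.eig(A)    # Steps 1-2: columns of P are eigenvectors
D = np.linalg.inv(P) @ A @ P     # Step 3: P^{-1} A P
print(np.round(D, 10))           # diagonal, with the eigenvalues on the diagonal
print(eigvals)
```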

Page 36: Chapter 7  Eigenvalues and Eigenvectors

7.36

Ex 5: Diagonalizing a matrix
Find a matrix P such that P⁻¹AP is diagonal.

A = [1 −1 −1; 1 3 1; −3 1 −1]

Sol: Characteristic equation:

det(λI − A) = |λ−1 1 1; −1 λ−3 −1; 3 −1 λ+1| = (λ−2)(λ+2)(λ−3) = 0

The eigenvalues: λ1 = 2, λ2 = −2, λ3 = 3

Page 37: Chapter 7  Eigenvalues and Eigenvectors

7.37

λ1 = 2:

2I − A = [1 1 1; −1 −1 −1; 3 −1 3] → (G.-J. E.) → [1 0 1; 0 1 0; 0 0 0]

⇒ [x1; x2; x3] = [−t; 0; t] ⇒ eigenvector p1 = [−1; 0; 1]

λ2 = −2:

−2I − A = [−3 1 1; −1 −5 −1; 3 −1 −1] → (G.-J. E.) → [1 0 −1/4; 0 1 1/4; 0 0 0]

⇒ [x1; x2; x3] = [t/4; −t/4; t] ⇒ eigenvector p2 = [1; −1; 4]

Page 38: Chapter 7  Eigenvalues and Eigenvectors

7.38

λ3 = 3:

3I − A = [2 1 1; −1 0 −1; 3 −1 4] → (G.-J. E.) → [1 0 1; 0 1 −1; 0 0 0]

⇒ [x1; x2; x3] = [−t; t; t] ⇒ eigenvector p3 = [−1; 1; 1]

P = [p1 p2 p3] = [−1 1 −1; 0 −1 1; 1 4 1], and it follows that

P⁻¹AP = [2 0 0; 0 −2 0; 0 0 3]

Page 39: Chapter 7  Eigenvalues and Eigenvectors

7.39

Note: a quick way to calculate A^k based on the diagonalization technique:

(1) D = diag(λ1, λ2, ..., λn) ⇒ D^k = diag(λ1^k, λ2^k, ..., λn^k)

(2) D = P⁻¹AP ⇒ D^k = (P⁻¹AP)(P⁻¹AP)···(P⁻¹AP) (repeated k times) = P⁻¹A^kP

⇒ A^k = PD^kP⁻¹, where D^k = diag(λ1^k, λ2^k, ..., λn^k)
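A quick NumPy sketch of this shortcut, using the matrix of Ex 5 (an illustration added to this transcript):

```python
import numpy as np

A = np.array([[ 1.0, -1.0, -1.0],
              [ 1.0,  3.0,  1.0],
              [-3.0,  1.0, -1.0]])

eigvals, P = np.linalg.eig(A)
k = 5
Dk = np.diag(eigvals ** k)       # D^k: raise the diagonal to the k-th power
Ak = P @ Dk @ np.linalg.inv(P)   # A^k = P D^k P^{-1}
assert np.allclose(Ak, np.linalg.matrix_power(A, k))
```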

Page 40: Chapter 7  Eigenvalues and Eigenvectors

7.40

Thm. 7.6: Sufficient conditions for diagonalization
If an n×n matrix A has n distinct eigenvalues, then the corresponding eigenvectors are linearly independent and thus A is diagonalizable according to Thm. 7.5.

Pf: Let λ1, λ2, ..., λn be distinct eigenvalues with corresponding eigenvectors x1, x2, ..., xn. In addition, suppose the first m eigenvectors are linearly independent, but the first m+1 eigenvectors are linearly dependent, i.e.,

x_{m+1} = c1x1 + c2x2 + ... + cmxm,   (1)

where the ci's are not all zero. Multiplying both sides of Eq. (1) by A yields

Ax_{m+1} = c1Ax1 + c2Ax2 + ... + cmAxm
λ_{m+1}x_{m+1} = c1λ1x1 + c2λ2x2 + ... + cmλmxm   (2)

Page 41: Chapter 7  Eigenvalues and Eigenvectors

7.41

On the other hand, multiplying both sides of Eq. (1) by λ_{m+1} yields

λ_{m+1}x_{m+1} = c1λ_{m+1}x1 + c2λ_{m+1}x2 + ... + cmλ_{m+1}xm   (3)

Now, subtracting Eq. (2) from Eq. (3) produces

c1(λ_{m+1} − λ1)x1 + c2(λ_{m+1} − λ2)x2 + ... + cm(λ_{m+1} − λm)xm = 0

Since the first m eigenvectors are linearly independent, all coefficients of this equation must be zero, i.e.,

c1(λ_{m+1} − λ1) = c2(λ_{m+1} − λ2) = ... = cm(λ_{m+1} − λm) = 0

Because all the eigenvalues are distinct, it follows that all ci's equal 0, which contradicts our assumption that x_{m+1} can be expressed as a linear combination of the first m eigenvectors with ci's not all zero. So the set of n eigenvectors is linearly independent given n distinct eigenvalues, and according to Thm. 7.5, we can conclude that A is diagonalizable

Page 42: Chapter 7  Eigenvalues and Eigenvectors

7.42

Ex 7: Determining whether a matrix is diagonalizable

A = [1 −2 1; 0 0 1; 0 0 −3]

Sol: Because A is a triangular matrix, its eigenvalues are the entries on its main diagonal:

λ1 = 1, λ2 = 0, λ3 = −3

According to Thm. 7.6, because these three values are distinct, A is diagonalizable

Page 43: Chapter 7  Eigenvalues and Eigenvectors

7.43

Ex 8: Finding a diagonalized matrix for a linear transformation
Let T: R³ → R³ be the linear transformation given by

T(x1, x2, x3) = (x1 − x2 − x3, x1 + 3x2 + x3, −3x1 + x2 − x3)

Find a basis B′ for R³ such that the matrix for T relative to B′ is diagonal.

Sol: The standard matrix for T is given by

A = [1 −1 −1; 1 3 1; −3 1 −1]

From Ex. 5 you know that λ1 = 2, λ2 = −2, λ3 = 3, and thus A is diagonalizable. So, according to the result on Slide 7.25, the three linearly independent eigenvectors found in Ex. 5 can be used to form the basis B′. That is,

Page 44: Chapter 7  Eigenvalues and Eigenvectors

7.44

B′ = {v1, v2, v3} = {(−1, 0, 1), (1, −1, 4), (−1, 1, 1)}

The matrix for T relative to this basis is

A′ = [[T(v1)]_B′ [T(v2)]_B′ [T(v3)]_B′] = [2 0 0; 0 −2 0; 0 0 3]

※ Note that it is not necessary to calculate A′ through the above equation. According to the result on Slide 7.25, we already know that A′ is a diagonal matrix and its main-diagonal entries are the corresponding eigenvalues of A

Page 45: Chapter 7  Eigenvalues and Eigenvectors

7.45

Keywords in Section 7.2:

diagonalization problem: 對角化問題
diagonalization: 對角化
diagonalizable matrix: 可對角化矩陣

Page 46: Chapter 7  Eigenvalues and Eigenvectors

7.46

7.3 Symmetric Matrices and Orthogonal Diagonalization

Symmetric matrix (對稱矩陣): A square matrix A is symmetric if it is equal to its transpose:

A = A^T

Ex 1: Symmetric matrices and nonsymmetric matrices

A = [0 1 −2; 1 3 0; −2 0 5] (symmetric)
B = [4 3; 3 1] (symmetric)
C = [3 2 1; 1 −4 0; 1 0 5] (nonsymmetric)

Page 47: Chapter 7  Eigenvalues and Eigenvectors

7.47

Thm 7.7: Eigenvalues of symmetric matrices

If A is an n×n symmetric matrix, then the following properties are true:

(1) A is diagonalizable (symmetric matrices, except those already in the diagonal form D = aI, are guaranteed to have n linearly independent eigenvectors and thus to be diagonalizable)
(2) All eigenvalues of A are real numbers
(3) If λ is an eigenvalue of A with multiplicity k, then λ has k linearly independent eigenvectors. That is, the eigenspace of λ has dimension k

※ The above theorem is called the Real Spectral Theorem (實數頻譜理論), and the set of eigenvalues of A is called the spectrum (頻譜) of A

Page 48: Chapter 7  Eigenvalues and Eigenvectors

7.48

A = [a c; c b]

Pf: Characteristic equation:

det(λI − A) = |λ−a −c; −c λ−b| = λ² − (a+b)λ + (ab − c²) = 0

As a quadratic function in λ, this polynomial has the following nonnegative discriminant (判別式):

(a+b)² − 4(ab − c²) = a² + 2ab + b² − 4ab + 4c² = a² − 2ab + b² + 4c² = (a−b)² + 4c² ≥ 0

⇒ real-number solutions

Page 49: Chapter 7  Eigenvalues and Eigenvectors

7.49

(1) (a−b)² + 4c² = 0 ⇒ a = b, c = 0 ⇒ A = [a 0; 0 a] is itself a diagonal matrix

(2) (a−b)² + 4c² > 0
The characteristic polynomial of A has two distinct real roots, which implies that A has two distinct real eigenvalues. According to Thm. 7.6, A is diagonalizable

Page 50: Chapter 7  Eigenvalues and Eigenvectors

7.50

Orthogonal matrix (正交矩陣): A square matrix P is called orthogonal if it is invertible and

P⁻¹ = P^T (or P P^T = P^T P = I)

Thm. 7.8: Properties of orthogonal matrices
An n×n matrix P is orthogonal if and only if its column vectors form an orthonormal set

Pf: Suppose the column vectors of P form an orthonormal set, i.e., P = [p1 p2 ... pn], where pi·pj = 0 for i ≠ j and pi·pi = 1. Then

P^T P = [p1·p1 p1·p2 ... p1·pn; p2·p1 p2·p2 ... p2·pn; ...; pn·p1 pn·p2 ... pn·pn] = I

It implies that P⁻¹ = P^T, and thus P is orthogonal

Page 51: Chapter 7  Eigenvalues and Eigenvectors

7.51

Ex 5: Show that P is an orthogonal matrix:

P = [1/3 −2/√5 −2/(3√5); 2/3 1/√5 −4/(3√5); 2/3 0 5/(3√5)]

Sol: If P is an orthogonal matrix, then P⁻¹ = P^T, i.e., P P^T = I:

P P^T = [1/3 −2/√5 −2/(3√5); 2/3 1/√5 −4/(3√5); 2/3 0 5/(3√5)][1/3 2/3 2/3; −2/√5 1/√5 0; −2/(3√5) −4/(3√5) 5/(3√5)] = [1 0 0; 0 1 0; 0 0 1] = I

Page 52: Chapter 7  Eigenvalues and Eigenvectors

7.52

Moreover, let p1 = [1/3; 2/3; 2/3], p2 = [−2/√5; 1/√5; 0], and p3 = [−2/(3√5); −4/(3√5); 5/(3√5)];
we can produce p1·p2 = p1·p3 = p2·p3 = 0 and p1·p1 = p2·p2 = p3·p3 = 1

So {p1, p2, p3} is an orthonormal set (these results are consistent with Thm. 7.8)

Page 53: Chapter 7  Eigenvalues and Eigenvectors

7.53

Thm. 7.9: Properties of symmetric matrices
Let A be an n×n symmetric matrix. If λ1 and λ2 are distinct eigenvalues of A, then their corresponding eigenvectors x1 and x2 are orthogonal. (Thm. 7.6 only states that eigenvectors corresponding to distinct eigenvalues are linearly independent.)

Pf:

λ1(x1·x2) = (λ1x1)·x2 = (Ax1)·x2 = (Ax1)^T x2 = x1^T A^T x2 = x1^T (Ax2) (because A is symmetric)
= x1·(Ax2) = x1·(λ2x2) = λ2(x1·x2)

The above equation implies (λ1 − λ2)(x1·x2) = 0, and because λ1 ≠ λ2, it follows that x1·x2 = 0. So x1 and x2 are orthogonal

※ For distinct eigenvalues of a symmetric matrix, the corresponding eigenvectors are orthogonal and thus linearly independent of each other
※ Note that there may be multiple eigenvectors x1 and x2 corresponding to λ1 and λ2

Page 54: Chapter 7  Eigenvalues and Eigenvectors

7.54

Orthogonal diagonalization (正交對角化): A matrix A is orthogonally diagonalizable if there exists an orthogonal matrix P such that P⁻¹AP = D is diagonal

Thm. 7.10: Fundamental theorem of symmetric matrices
Let A be an n×n matrix. Then A is orthogonally diagonalizable and has real eigenvalues if and only if A is symmetric

Pf:
(⇒) A is orthogonally diagonalizable ⇒ D = P⁻¹AP is diagonal, with P an orthogonal matrix such that P⁻¹ = P^T
⇒ A = PDP⁻¹ = PDP^T ⇒ A^T = (PDP^T)^T = (P^T)^T D^T P^T = PDP^T = A ⇒ A is symmetric
(⇐) See the next two slides

Page 55: Chapter 7  Eigenvalues and Eigenvectors

7.55

Orthogonal diagonalization of a symmetric matrix: Let A be an n×n symmetric matrix.
(1) Find all eigenvalues of A and determine the multiplicity of each
(2) For each eigenvalue of multiplicity 1, choose the unit eigenvector
(3) For each eigenvalue of multiplicity k ≥ 2, find a set of k linearly independent eigenvectors. If this set {v1, v2, ..., vk} is not orthonormal, apply the Gram-Schmidt orthonormalization process

※ It is known that the G.-S. process is a kind of linear transformation, i.e., each produced vector can be expressed as c1v1 + c2v2 + ... + ckvk (see Slide 5.55):
i. Since Av1 = λv1, Av2 = λv2, ..., Avk = λvk, we have A(c1v1 + c2v2 + ... + ckvk) = λ(c1v1 + c2v2 + ... + ckvk), so the vectors produced by the G.-S. process are still eigenvectors for λ
ii. Since v1, v2, ..., vk are orthogonal to the eigenvectors corresponding to other (different) eigenvalues (according to Thm. 7.9), c1v1 + c2v2 + ... + ckvk is also orthogonal to the eigenvectors corresponding to other different eigenvalues

※ According to Thm. 7.9, eigenvectors corresponding to distinct eigenvalues are orthogonal

Page 56: Chapter 7  Eigenvalues and Eigenvectors

7.56

(4) The composite of steps (2) and (3) produces an orthonormal set of n eigenvectors. Use these orthonormal (and thus linearly independent) eigenvectors as column vectors to form the matrix P.
i. According to Thm. 7.8, the matrix P is orthogonal
ii. Following the diagonalization process on Slide 7.35, D = P⁻¹AP is diagonal
Therefore, the matrix A is orthogonally diagonalizable

Page 57: Chapter 7  Eigenvalues and Eigenvectors

7.57

Ex 7: Determining whether a matrix is orthogonally diagonalizable (i.e., whether it is a symmetric matrix)

A1 = [1 1 1; 1 0 1; 1 1 1] (symmetric ⇒ orthogonally diagonalizable)
A2 = [5 2 1; 2 1 8; 1 8 0] (symmetric ⇒ orthogonally diagonalizable)
A3 = [3 2 0; 2 0 1] (not square ⇒ not orthogonally diagonalizable)
A4 = [0 0; 0 −2] (symmetric ⇒ orthogonally diagonalizable)

Page 58: Chapter 7  Eigenvalues and Eigenvectors

7.58

Ex 9: Orthogonal diagonalization
Find an orthogonal matrix P that diagonalizes A:

A = [2 2 −2; 2 −1 4; −2 4 −1]

Sol:
(1) det(λI − A) = (λ−3)²(λ+6) = 0 ⇒ λ1 = −6, λ2 = 3 (λ2 has a multiplicity of 2)

(2) For λ1 = −6: v1 = (1, −2, 2) ⇒ u1 = v1/||v1|| = (1/3, −2/3, 2/3)

(3) For λ2 = 3: v2 = (2, 1, 0) and v3 = (−2, 4, 5), which are orthogonal

※ Verify Thm. 7.9: v1·v2 = v1·v3 = 0

Page 59: Chapter 7  Eigenvalues and Eigenvectors

7.59

※ If v2 and v3 were not orthogonal, the Gram-Schmidt process would have to be performed. Here we simply normalize v2 and v3 to find the corresponding unit vectors:

u2 = v2/||v2|| = (2/√5, 1/√5, 0), u3 = v3/||v3|| = (−2/(3√5), 4/(3√5), 5/(3√5))

P = [u1 u2 u3] = [1/3 2/√5 −2/(3√5); −2/3 1/√5 4/(3√5); 2/3 0 5/(3√5)]

P⁻¹AP = [−6 0 0; 0 3 0; 0 0 3]

※ Note that there are some calculation errors in the solution of Ex. 9 in the textbook
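For symmetric matrices, NumPy's eigh routine returns real eigenvalues together with an orthogonal matrix of eigenvectors, so the orthogonal diagonalization of Ex 9 can be checked in a few lines (an illustration added to this transcript):

```python
import numpy as np

A = np.array([[ 2.0,  2.0, -2.0],
              [ 2.0, -1.0,  4.0],
              [-2.0,  4.0, -1.0]])

eigvals, P = np.linalg.eigh(A)                     # for symmetric matrices
print(eigvals)                                     # -> [-6.  3.  3.]
assert np.allclose(P.T @ P, np.eye(3))             # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(eigvals))  # P^T A P = D
```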

Page 60: Chapter 7  Eigenvalues and Eigenvectors

7.60

Keywords in Section 7.3:

symmetric matrix: 對稱矩陣
orthogonal matrix: 正交矩陣
orthonormal set: 單範正交集
orthogonal diagonalization: 正交對角化

Page 61: Chapter 7  Eigenvalues and Eigenvectors

7.61

7.4 Applications of Eigenvalues and Eigenvectors

The rotation of the quadratic equation ax² + bxy + cy² + dx + ey + f = 0

Ex 5: Identify the graphs of the following quadratic equations
(a) 4x² + 9y² − 36 = 0
(b) 13x² − 10xy + 13y² − 72 = 0

Sol:
(a) In standard form, we can obtain

x²/3² + y²/2² = 1

※ Since there is no xy-term, it is easy to derive the standard form, and it is apparent that this equation represents an ellipse.

Page 62: Chapter 7  Eigenvalues and Eigenvectors

7.62

(b) 13x² − 10xy + 13y² − 72 = 0
※ Since there is an xy-term, it is difficult to identify the graph of this equation. In fact, it is also an ellipse, which is oblique on the xy-plane.
※ There is an easy way to identify the graph of a quadratic equation: the basic idea is to rotate the x- and y-axes to x′- and y′-axes such that there is no more x′y′-term in the new quadratic equation.
※ In the above example, if we rotate the x- and y-axes by 45 degrees counterclockwise, the new quadratic equation

(x′)²/3² + (y′)²/2² = 1

can be derived, which apparently represents an ellipse.
※ In Section 4.8, the rotation of conics is achieved by changing basis, but here the diagonalization technique based on eigenvalues and eigenvectors is applied to solve the rotation problem

Page 63: Chapter 7  Eigenvalues and Eigenvectors

7.63

Quadratic form (二次形式): ax² + bxy + cy² is the quadratic form associated with the quadratic equation ax² + bxy + cy² + dx + ey + f = 0

Matrix of the quadratic form:

A = [a b/2; b/2 c]

※ Note that A is a symmetric matrix

If we define X = [x; y], then X^T A X = ax² + bxy + cy². In fact, the quadratic equation can be expressed in terms of X as follows:

X^T A X + [d e] X + f = 0

Page 64: Chapter 7  Eigenvalues and Eigenvectors

7.64

Principal Axes Theorem (主軸定理)
For a conic whose equation is ax² + bxy + cy² + dx + ey + f = 0, the rotation to eliminate the xy-term is achieved by X = PX′, where P is an orthogonal matrix that diagonalizes A. That is,

P⁻¹AP = P^T A P = D = [λ1 0; 0 λ2],

where λ1 and λ2 are eigenvalues of A. The equation of the rotated conic is given by

λ1(x′)² + λ2(y′)² + [d e] P X′ + f = 0

Page 65: Chapter 7  Eigenvalues and Eigenvectors

7.65

Pf: According to Thm. 7.10, since A is symmetric, we can conclude that there exists an orthogonal matrix P such that P⁻¹AP = P^T A P = D is diagonal. Replacing X with PX′, the quadratic form becomes

X^T A X = (PX′)^T A (PX′) = (X′)^T P^T A P X′ = (X′)^T D X′ = λ1(x′)² + λ2(y′)²

※ It is obvious that the new quadratic form in terms of X′ has no x′y′-term, and the coefficients of (x′)² and (y′)² are the two eigenvalues of the matrix A

※ Since X = PX′ = [v1 v2][x′; y′] = x′v1 + y′v2, where (x, y) and (x′, y′) are the original and new coordinates, the roles of v1 and v2 (the eigenvectors of A) are like the basis vectors (or the axis vectors) in the new coordinate system

Page 66: Chapter 7  Eigenvalues and Eigenvectors

7.66

Ex 6: Rotation of a conic
Perform a rotation of axes to eliminate the xy-term in the following quadratic equation:

13x² − 10xy + 13y² − 72 = 0

Sol: The matrix of the quadratic form associated with this equation is

A = [13 −5; −5 13]

The eigenvalues are λ1 = 8 and λ2 = 18, and the corresponding eigenvectors are

x1 = [1; 1] and x2 = [−1; 1]

Page 67: Chapter 7  Eigenvalues and Eigenvectors

7.67

After normalizing each eigenvector, we can obtain the orthogonal matrix P as follows:

P = [1/√2 −1/√2; 1/√2 1/√2] = [cos 45° −sin 45°; sin 45° cos 45°]

Then, by replacing X with PX′, the equation of the rotated conic is

8(x′)² + 18(y′)² − 72 = 0,

which can be written in the standard form

(x′)²/3² + (y′)²/2² = 1

※ The above equation represents an ellipse on the x′y′-plane
※ According to the results on p. 268 in Ch 4, X = PX′ is equivalent to rotating the xy-coordinates by 45 degrees to form the new x′y′-coordinates, which is also illustrated in the figure on Slide 7.62
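The rotation in Ex 6 can be reproduced numerically (an illustration added to this transcript):

```python
import numpy as np

# Matrix of the quadratic form 13x^2 - 10xy + 13y^2
A = np.array([[13.0, -5.0],
              [-5.0, 13.0]])

eigvals, P = np.linalg.eigh(A)       # orthogonal P with P^T A P = D
print(eigvals)                       # -> [ 8. 18.], i.e., 8(x')^2 + 18(y')^2 = 72

# Spot check: map a point on the rotated ellipse back via X = PX'
xp, yp = 3.0, 0.0                    # satisfies 8(x')^2 + 18(y')^2 = 72
x, y = P @ np.array([xp, yp])
print(13*x**2 - 10*x*y + 13*y**2)    # -> 72.0
```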

Page 68: Chapter 7  Eigenvalues and Eigenvectors

7.68

The three-dimensional version: ax² + by² + cz² + dxy + exz + fyz is the quadratic form associated with the equation of the quadric surface ax² + by² + cz² + dxy + exz + fyz + gx + hy + iz + j = 0

Matrix of the quadratic form:

A = [a d/2 e/2; d/2 b f/2; e/2 f/2 c]

※ Note that A is a symmetric matrix

If we define X = [x y z]^T, then X^T A X = ax² + by² + cz² + dxy + exz + fyz, and the quadric surface equation can be expressed as

X^T A X + [g h i] X + j = 0

Page 69: Chapter 7  Eigenvalues and Eigenvectors

7.69

Keywords in Section 7.4:

quadratic form: 二次形式
principal axes theorem: 主軸定理

Page 70: Chapter 7  Eigenvalues and Eigenvectors

7.70

7.5 Principal Component Analysis (主成分分析)

Principal component analysis (PCA):
It is a way of identifying the underlying patterns in data
It can extract the information in a large data set with many variables and approximate this data set with fewer factors
In other words, it can reduce the number of variables to a more manageable set

Steps of the principal component analysis (a NumPy sketch of Steps 1 to 5 follows below):
Step 1: Get some data
Step 2: Subtract the mean
Step 3: Calculate the covariance matrix
Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
Step 5: Deriving the transformed data set
Step 6: Getting the original data back
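The sketch below runs Steps 1 to 5 on the data set used on the following slides (an illustration added to this transcript; NumPy's cov and eigh reproduce the covariance matrix and eigenpairs quoted there):

```python
import numpy as np

# Step 1: the data set used on the next slides
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

X = data - data.mean(axis=0)     # Step 2: subtract the means (1.81, 1.91)
A = np.cov(X.T)                  # Step 3: covariance matrix
print(A)                         # ~ [[0.616556 0.615444]
                                 #    [0.615444 0.716556]]

eigvals, V = np.linalg.eigh(A)   # Step 4: eigh, since A is symmetric
print(eigvals)                   # ~ [0.049083 1.284028] (ascending order)

X_new = X @ V                    # Step 5: coordinates along the new axes
print(X_new.var(axis=0, ddof=1)) # variances equal the eigenvalues
```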

Page 71: Chapter 7  Eigenvalues and Eigenvectors

7.71

Step 1 and Step 2:

x    y     demeaned x  demeaned y
2.5  2.4    0.69        0.49
0.5  0.7   −1.31       −1.21
2.2  2.9    0.39        0.99
1.9  2.2    0.09        0.29
3.1  3.0    1.29        1.09
2.3  2.7    0.49        0.79
2.0  1.6    0.19       −0.31
1.0  1.1   −0.81       −0.81
1.5  1.6   −0.31       −0.31
1.1  0.9   −0.71       −1.01

(means: 1.81 and 1.91; the demeaned series have means 0 and 0)

Step 3: For X = [x y]^T, the covariance matrix is

A = [var(x) cov(x,y); cov(x,y) var(y)] = [0.616556 0.615444; 0.615444 0.716556]

Page 72: Chapter 7  Eigenvalues and Eigenvectors

7.72

Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix A:

λ1 = 1.284028, v1 = [0.67787; 0.73518];  λ2 = 0.049083, v2 = [−0.73518; 0.67787]

※ First, the two eigenvectors are perpendicular (orthogonal) to each other according to Thm. 7.9 (in fact, they are orthonormal here)
※ Second, the eigenvector v1 is just like a best-fit regression line
※ Third, v2 seems less important for explaining the data, since the projection of each node onto the v2 axis is very close to zero
※ Fourth, the interpretation of v1 is the new axis which retains as much as possible of the interpoint distance information (or the variance information) that was contained in the original two dimensions

[Figure: the demeaned data points with the v1 and v2 axes drawn through them]

Page 73: Chapter 7  Eigenvalues and Eigenvectors

7.73

※ The intuition behind the principal component analysis:

(1) Total variance of the series x and y = var(x) + var(y) = sum of the main-diagonal entries in the covariance matrix A = 0.616556 + 0.716556 = 1.33311 (the series x, which are the coordinate values on the x-axis, explains 0.616556/1.33311 = 46.25% of the total variance)

(2) Consider P = [v1 v2] and X = PX′. (According to the Principal Axes Theorem on Slide 7.65, this is equivalent to transforming the x- and y-axes into the v1- and v2-axes, i.e., the data are the same but have different coordinate values on the v1v2-plane.)

(X′)^T = X^T P, where (X′)^T = [x′ y′] and X^T = [x y]

var((X′)^T) = var(X^T P) = P^T var(X^T) P = P^T A P = D = [λ1 0; 0 λ2]

(It also implies that the new series x′ and y′ are independent)

(3) Total variance of the transformed series x′ and y′ (called principal components) in X′ = variance of x′ + variance of y′ = sum of the main-diagonal entries in the covariance matrix var((X′)^T) = λ1 + λ2

(4) A property of eigenvalues: trace(A) = Σ λi, which means that after the transformation, the total variance remains the same. In this case, 1.284028 + 0.049083 = 1.33311.

(5) The new series x′, which are the coordinate values on the v1-axis, explains λ1/(λ1 + λ2) = 1.284028/(1.284028 + 0.049083) = 96.32% of the total variance

Page 74: Chapter 7  Eigenvalues and Eigenvectors

7.74

Step 5: Deriving the transformed data set: (X′)^T = X^T P

Case 1: P = [v1 v2] = [0.67787 −0.73518; 0.73518 0.67787]

(X′)^T = [x y] P = [v11·x + v12·y  v21·x + v22·y] = [0.67787x + 0.73518y  −0.73518x + 0.67787y] = [x′ y′]

x′         y′
−0.82797   −0.17512
1.77758     0.14286
−0.99220    0.38437
−0.27421    0.13042
−1.67580   −0.20950
−0.91295    0.17528
0.09911    −0.34982
1.14457     0.04642
0.43805     0.01776
1.22382    −0.16268
(means: 0, 0)

var((X′)^T) = [1.284028 0; 0 0.049083]

Case 2: Set y′ = 0 on purpose, i.e., keep only the principal component x′ = v11·x + v12·y = 0.67787x + 0.73518y:

x′         y′
−0.82797   0
1.77758    0
−0.99220   0
−0.27421   0
−1.67580   0
−0.91295   0
0.09911    0
1.14457    0
0.43805    0
1.22382    0
(means: 0, 0)

var((X′)^T) = [1.284028 0; 0 0]

Page 75: Chapter 7  Eigenvalues and Eigenvectors

7.75

Step 6: Getting the original data back: X^T = (X′)^T P⁻¹ + original mean = (X′)^T P^T + original mean, where P = [v1 v2], i.e.,

[x y] = [x′ y′][v11 v12; v21 v22] = [0.67787x′ − 0.73518y′  0.73518x′ + 0.67787y′]

case 1 (x′ and y′)    case 2 (x′ only)
x    y                x     y
2.5  2.4              2.37  2.52
0.5  0.7              0.61  0.60
2.2  2.9              2.48  2.64
1.9  2.2              2.00  2.11
3.1  3.0              2.95  3.14
2.3  2.7              2.43  2.58
2.0  1.6              1.74  1.84
1.0  1.1              1.03  1.07
1.5  1.6              1.51  1.59
1.1  0.9              0.98  1.01
(means: 1.81, 1.91)

※ We can recover the original data set exactly if we take both v1 and v2 (and thus both x′ and y′) into account when deriving the transformed data
※ Although only v1 (and thus only x′) is considered when deriving the transformed data in case 2, the data recovered from x′ alone are still similar to the original data. That means x′ can serve as a common factor almost able to explain both series x and y

Page 76: Chapter 7  Eigenvalues and Eigenvectors

7.76

[Figure: the demeaned data points projected onto the v1 axis, with the v2 axis shown for reference]

※ If only the principal component x′ is considered in the Principal Component Analysis (PCA), it is equivalent to projecting all points onto the v1 vector
※ It can be observed in the above figure that the projection onto the v1 vector retains as much as possible of the interpoint distance information (variance) that was contained in the original series of (x, y)

Page 77: Chapter 7  Eigenvalues and Eigenvectors

7.77

Factor loadings: the correlations between the principal components (F1 = x′ and F2 = y′) and the original variables (x1 = x and x2 = y):

l_ij = corr(F_i, x_j) = v_ij·√λ_i / s.d.(x_j)

l_{x′x} = 0.67787·√1.284028 / 0.785211 = 0.97824
l_{x′y} = 0.73518·√1.284028 / 0.846496 = 0.98414
l_{y′x} = −0.73518·√0.049083 / 0.785211 = −0.20744
l_{y′y} = 0.67787·√0.049083 / 0.846496 = 0.177409

Factor loadings are used to identify and interpret the unobservable principal components

※ The factor loadings of x′ on x and y are close to 1 in absolute value, which implies that the principal component x′ can explain x and y quite well, and thus x′ can be viewed as an index to represent the combined information of x and y
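These loadings follow directly from the eigenpairs and the standard deviations of the original variables; a short sketch (an illustration added to this transcript):

```python
import numpy as np

A = np.array([[0.616556, 0.615444],
              [0.615444, 0.716556]])

eigvals, V = np.linalg.eigh(A)        # ascending: 0.049083, 1.284028
sd = np.sqrt(np.diag(A))              # s.d. of x and y

# loading of component i on variable j: v_ij * sqrt(lambda_i) / sd_j
for lam, v in zip(eigvals[::-1], V.T[::-1]):   # largest eigenvalue first
    print(v * np.sqrt(lam) / sd)
# first line ~ [0.97824 0.98414] (up to an overall sign per eigenvector)
```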

Page 78: Chapter 7  Eigenvalues and Eigenvectors

7.78

※ Questionnaire (問卷 ) analysis: salary vs. personal and learned-courses information

i. Suppose there are five variables (the results of five questions), x1 to x5, where x1 to x3 are about the personal information of the respondent (作答者), and x4 and x5 are about the course information which this respondent had learned

ii. Suppose there are two principal components x1′ and x2′ (x3′, x4′, and x5′ are minor components), where the principal component x1′ is highly correlated with x1 to x3 and the principal component x2′ is highly correlated with x4 and x5

iii. x1′ can be interpreted as a general index representing the personal characteristics, and x2′ can be interpreted as a general index associated with the course profile that the respondent had learned

iv. Study the relationship between the salary level and the indexes x1′ and x2′