
Chapter 8

Eigenvectors and Eigenvalues

8.1 Eigenvectors and Eigenvalues of a Linear Map

Given a finite-dimensional vector space E, let f : E → E be any linear map. If, by luck, there is a basis (e1, . . . , en) of E with respect to which f is represented by a diagonal matrix

D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \lambda_n
\end{pmatrix},

then the action of f on E is very simple; in every "direction" ei, we have

f(ei) = λiei.


We can think of f as a transformation that stretches or shrinks space along the directions e1, . . . , en (at least if E is a real vector space).

In terms of matrices, the above property translates into the fact that there is an invertible matrix P and a diagonal matrix D such that any matrix A representing f can be factored as

A = PDP−1.

When this happens, we say that f (or A) is diagonalizable, the λi are called the eigenvalues of f, and the ei are eigenvectors of f.

For example, we will see that every symmetric matrix can be diagonalized.
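As a quick numerical illustration (a NumPy sketch added here, not part of the original slides), we can diagonalize a small symmetric matrix and check the factorization A = PDP−1:

```python
import numpy as np

# A small symmetric matrix; as claimed above, it can be diagonalized.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is NumPy's routine for symmetric/Hermitian matrices; it returns
# the eigenvalues (ascending) and an orthogonal matrix P of eigenvectors.
eigenvalues, P = np.linalg.eigh(A)
D = np.diag(eigenvalues)

# Verify A = P D P^{-1} (here P^{-1} = P^T, since P is orthogonal).
assert np.allclose(A, P @ D @ np.linalg.inv(P))
print(eigenvalues)  # eigenvalues 1 and 3, up to rounding
```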


Unfortunately, not every matrix can be diagonalized.

For example, the matrix

A_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}

can't be diagonalized.

Sometimes, a matrix fails to be diagonalizable because its eigenvalues do not belong to the field of coefficients, such as

A_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},

whose eigenvalues are ±i.

This is not a serious problem because A2 can be diagonalized over the complex numbers.

However, A1 is a "fatal" case! Indeed, its eigenvalues are both 1, and the problem is that A1 does not have enough eigenvectors to span E.
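A quick NumPy check (an illustration added here, not in the original) confirms that the eigenvectors of A1 span only a one-dimensional subspace:

```python
import numpy as np

A1 = np.array([[1.0, 1.0],
               [0.0, 1.0]])

w = np.linalg.eigvals(A1)
print(w)  # both eigenvalues equal 1

# The eigenspace for lambda = 1 is Ker(I - A1); its dimension is
# 2 - rank(I - A1), i.e. the geometric multiplicity of the eigenvalue 1.
geometric_multiplicity = 2 - np.linalg.matrix_rank(np.eye(2) - A1)
print(geometric_multiplicity)  # 1: not enough eigenvectors to span E = R^2
```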


The next best thing is that there is a basis with respect to which f is represented by an upper triangular matrix.

In this case we say that f can be triangularized.

As we will see in Section 8.2, if all the eigenvalues of f belong to the field of coefficients K, then f can be triangularized. In particular, this is the case if K = C.

Now, an alternative to triangularization is to consider the representation of f with respect to two bases (e1, . . . , en) and (f1, . . . , fn), rather than a single basis.

In this case, if K = R or K = C, it turns out that we can even pick these bases to be orthonormal, and we get a diagonal matrix Σ with nonnegative entries, such that

f(ei) = σifi, 1 ≤ i ≤ n.

The nonzero σi are the singular values of f, and the corresponding representation is the singular value decomposition, or SVD.
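Even the non-diagonalizable matrix A1 above admits an SVD. This NumPy sketch (an added illustration) shows the matrix form A = UΣV⊤ of the factorization:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])  # the matrix A1 that cannot be diagonalized

# NumPy returns U, the singular values (nonnegative, descending), and V^T.
U, sigma, Vt = np.linalg.svd(A)

assert np.allclose(A, U @ np.diag(sigma) @ Vt)  # A = U Sigma V^T
assert np.all(sigma >= 0)                       # nonnegative entries
# The product of the singular values equals |det(A)| = 1.
assert np.isclose(sigma[0] * sigma[1], 1.0)
```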


Definition 8.1. Given any vector space E and any linear map f : E → E, a scalar λ ∈ K is called an eigenvalue, or proper value, or characteristic value of f if there is some nonzero vector u ∈ E such that

f (u) = λu.

Equivalently, λ is an eigenvalue of f if Ker (λI − f) is nontrivial (i.e., Ker (λI − f) ≠ {0}).

A vector u ∈ E is called an eigenvector, or proper vector, or characteristic vector of f if u ≠ 0 and if there is some λ ∈ K such that

f(u) = λu;

the scalar λ is then an eigenvalue, and we say that u is an eigenvector associated with λ.

Given any eigenvalue λ ∈ K, the nontrivial subspace Ker (λI − f) consists of all the eigenvectors associated with λ together with the zero vector; this subspace is denoted by Eλ(f), or E(λ, f), or even by Eλ, and is called the eigenspace associated with λ, or proper subspace associated with λ.


Remark: As we emphasized in the remark following Definition 4.4, we require an eigenvector to be nonzero.

This requirement seems to have more benefits than inconveniences, even though it may be considered somewhat inelegant, because the set of all eigenvectors associated with an eigenvalue is not a subspace, since the zero vector is excluded.

Note that distinct eigenvectors may correspond to the same eigenvalue, but distinct eigenvalues correspond to disjoint sets of eigenvectors.

Let us now assume that E is of finite dimension n.

Proposition 8.1. Let E be any vector space of finite dimension n and let f be any linear map f : E → E. The eigenvalues of f are the roots (in K) of the polynomial

det(λI − f ).


Definition 8.2. Given any vector space E of dimension n, for any linear map f : E → E, the polynomial Pf(X) = det(XI − f) is called the characteristic polynomial of f. For any square matrix A, the polynomial PA(X) = det(XI − A) is called the characteristic polynomial of A.

Note that we already encountered the characteristic polynomial in Section 2.7; see Definition 2.9.

Given any basis (e1, . . . , en), if A = M(f) is the matrix of f w.r.t. (e1, . . . , en), we can compute the characteristic polynomial det(XI − f) of f by expanding the following determinant:

\det(XI - A) = \begin{vmatrix}
X - a_{11} & -a_{12} & \cdots & -a_{1n} \\
-a_{21} & X - a_{22} & \cdots & -a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-a_{n1} & -a_{n2} & \cdots & X - a_{nn}
\end{vmatrix}.

If we expand this determinant, we find that

\det(XI - A) = X^n - (a_{11} + \cdots + a_{nn})X^{n-1} + \cdots + (-1)^n \det(A).
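The two coefficients above can be checked numerically; np.poly returns the coefficients of det(XI − A), highest degree first (this example is an added illustration, not part of the original slides):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# Coefficients of det(XI - A), highest degree first; the leading one is 1.
coeffs = np.poly(A)

n = A.shape[0]
assert np.isclose(coeffs[0], 1.0)                          # coefficient of X^n
assert np.isclose(coeffs[1], -np.trace(A))                 # -(a_11+...+a_nn) X^{n-1}
assert np.isclose(coeffs[-1], (-1)**n * np.linalg.det(A))  # constant term (-1)^n det(A)
```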


The sum tr(A) = a11 + · · · + ann of the diagonal elements of A is called the trace of A.

Since the characteristic polynomial depends only on f, tr(A) has the same value for all matrices A representing f. We let tr(f) = tr(A) be the trace of f.

Remark: The characteristic polynomial of a linear map is sometimes defined as det(f − XI). Since

det(f − XI) = (−1)^n det(XI − f),

this makes essentially no difference, but the version det(XI − f) has the small advantage that the coefficient of X^n is +1.

If we write

\det(XI - A) = X^n - \tau_1(A)X^{n-1} + \cdots + (-1)^k \tau_k(A)X^{n-k} + \cdots + (-1)^n \tau_n(A),

then we just proved that

τ1(A) = tr(A) and τn(A) = det(A).


If all the roots, λ1, . . . , λn, of the polynomial det(XI − A) belong to the field K, then we can write

det(XI − A) = (X − λ1) · · · (X − λn),

where some of the λi may appear more than once. Consequently,

\det(XI - A) = X^n - \sigma_1(\lambda)X^{n-1} + \cdots + (-1)^k \sigma_k(\lambda)X^{n-k} + \cdots + (-1)^n \sigma_n(\lambda),

where

\sigma_k(\lambda) = \sum_{\substack{I \subseteq \{1,\ldots,n\} \\ |I| = k}} \; \prod_{i \in I} \lambda_i,

the kth elementary symmetric function of the λi.

From this, it is clear that

σk(λ) = τk(A)

and, in particular, the product of the eigenvalues of f is equal to det(A) = det(f), and the sum of the eigenvalues of f is equal to the trace, tr(A) = tr(f), of f.


For the record,

tr(f ) = λ1 + · · · + λn

det(f ) = λ1 · · ·λn,

where λ1, . . . , λn are the eigenvalues of f (and A), and where some of the λi may appear more than once.

In particular, f is not invertible iff it admits 0 as an eigenvalue.

Remark: Depending on the field K, the characteristic polynomial det(XI − A) may or may not have roots in K.

This motivates considering algebraically closed fields. For example, over K = R, not every polynomial has real roots; for instance, for the matrix

A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},

the characteristic polynomial det(XI − A) has no real roots unless θ = kπ.


However, over the field C of complex numbers, every polynomial has roots. For example, the matrix above has the roots cos θ ± i sin θ = e^{±iθ}.
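This is easy to verify numerically (an added NumPy sketch): for a generic angle, the computed eigenvalues of the rotation matrix are e^{±iθ}:

```python
import numpy as np

theta = 0.7  # any angle that is not a multiple of pi
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

w = np.linalg.eigvals(A)  # complex eigenvalues

# The eigenvalues are exactly e^{+i theta} and e^{-i theta}.
expected = np.array([np.exp(-1j * theta), np.exp(1j * theta)])
assert np.allclose(np.sort_complex(w), np.sort_complex(expected))
assert not np.any(np.isclose(w.imag, 0.0))  # no real eigenvalues for this theta
```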

Definition 8.3. Let A be an n × n matrix over a field K. Assume that all the roots of the characteristic polynomial det(XI − A) of A belong to K, and write

\det(XI - A) = (X - \lambda_1)^{k_1} \cdots (X - \lambda_m)^{k_m},

where λ1, . . . , λm are the distinct roots of det(XI − A). The integer ki is called the algebraic multiplicity of the eigenvalue λi, and the dimension of the eigenspace Eλi = Ker(λiI − A) is called the geometric multiplicity of λi.

By definition, the sum of the algebraic multiplicities is equal to n, but the sum of the geometric multiplicities can be strictly smaller.


In fact, the geometric multiplicity of each eigenvalue is always less than or equal to its algebraic multiplicity.

Proposition 8.2. Let E be any vector space of finite dimension n and let f be any linear map. If u1, . . . , um are eigenvectors associated with pairwise distinct eigenvalues λ1, . . . , λm, then the family (u1, . . . , um) is linearly independent.

Thus, from Proposition 8.2, if λ1, . . . , λm are all the pairwise distinct eigenvalues of f (where m ≤ n), we have a direct sum

Eλ1 ⊕ · · · ⊕ Eλm

of the eigenspaces Eλi.

Unfortunately, it is not always the case that

E = Eλ1 ⊕ · · ·⊕ Eλm.


When

E = Eλ1 ⊕ · · · ⊕ Eλm,

we say that f is diagonalizable (and similarly for any matrix associated with f).

Indeed, picking a basis in each Eλi, we obtain a diagonal matrix consisting of the eigenvalues, each λi occurring a number of times equal to the dimension of Eλi.

This happens if the algebraic multiplicity and the geometric multiplicity of every eigenvalue are equal.

In particular, when the characteristic polynomial has n distinct roots, then f is diagonalizable.

It can also be shown that symmetric matrices have real eigenvalues and can be diagonalized.


For a negative example, we leave it as an exercise to show that the matrix

M = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}

cannot be diagonalized, even though 1 is an eigenvalue.

The problem is that the eigenspace of 1 only has dimension 1.

The matrix

A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}

cannot be diagonalized either, because it has no real eigenvalues, unless θ = kπ.

However, over the field of complex numbers, it can be diagonalized.


8.2 Reduction to Upper Triangular Form

Unfortunately, not every linear map on a complex vector space can be diagonalized.

The next best thing is to "triangularize," which means to find a basis with respect to which the matrix has zero entries below the main diagonal.

Fortunately, such a basis always exists.

We say that a square matrix A is an upper triangular matrix if it has the following shape,

\begin{pmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1\,n-1} & a_{1n} \\
0 & a_{22} & a_{23} & \cdots & a_{2\,n-1} & a_{2n} \\
0 & 0 & a_{33} & \cdots & a_{3\,n-1} & a_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & a_{n-1\,n-1} & a_{n-1\,n} \\
0 & 0 & 0 & \cdots & 0 & a_{nn}
\end{pmatrix},

i.e., aij = 0 whenever j < i, 1 ≤ i, j ≤ n.


Theorem 8.3. Given any finite-dimensional vector space E over a field K, for any linear map f : E → E, there is a basis (u1, . . . , un) with respect to which f is represented by an upper triangular matrix (in Mn(K)) iff all the eigenvalues of f belong to K. Equivalently, for every n × n matrix A ∈ Mn(K), there is an invertible matrix P and an upper triangular matrix T (both in Mn(K)) such that

A = PTP−1

iff all the eigenvalues of A belong to K.

If A = PTP−1 where T is upper triangular, note that the diagonal entries of T are the eigenvalues λ1, . . . , λn of A.

Also, if A is a real matrix whose eigenvalues are all real, then P can be chosen to be real, and if A is a rational matrix whose eigenvalues are all rational, then P can be chosen to be rational.


Since any polynomial over C has all its roots in C, Theorem 8.3 implies that every complex n × n matrix can be triangularized.

If E is a Hermitian space, the proof of Theorem 8.3 can be easily adapted to prove that there is an orthonormal basis (u1, . . . , un) with respect to which the matrix of f is upper triangular. This is usually known as Schur's lemma.

Theorem 8.4. (Schur decomposition) Given any linear map f : E → E over a complex Hermitian space E, there is an orthonormal basis (u1, . . . , un) with respect to which f is represented by an upper triangular matrix. Equivalently, for every n × n matrix A ∈ Mn(C), there is a unitary matrix U and an upper triangular matrix T such that

A = UTU∗.

If A is real and if all its eigenvalues are real, then there is an orthogonal matrix Q and a real upper triangular matrix T such that

A = QTQ⊤.


Using the above result, we can derive the fact that if A is a Hermitian matrix, then there is a unitary matrix U and a real diagonal matrix D such that A = UDU∗.

In fact, applying this result to a (real) symmetric matrix A, we obtain the fact that all the eigenvalues of a symmetric matrix are real, and by applying Theorem 8.4 again, we conclude that A = QDQ⊤, where Q is orthogonal and D is a real diagonal matrix.

We will also prove this in Chapter 9.

When A has complex eigenvalues, there is a version of Theorem 8.4 involving only real matrices, provided that we allow T to be block upper triangular (with diagonal blocks that are either real entries or 2 × 2 matrices).

Theorem 8.4 is not a very practical result, but it is a useful theoretical result to cope with matrices that cannot be diagonalized.

For example, it can be used to prove that every complex matrix is the limit of a sequence of diagonalizable matrices that have distinct eigenvalues!
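For instance (an added sketch, not part of the original text), perturbing the diagonal of the non-diagonalizable Jordan block yields matrices with distinct eigenvalues that converge to it:

```python
import numpy as np

J = np.array([[1.0, 1.0],
              [0.0, 1.0]])  # not diagonalizable

for eps in [1e-1, 1e-2, 1e-4]:
    A_eps = J + np.diag([0.0, eps])  # upper triangular: eigenvalues 1 and 1 + eps
    w = np.sort(np.linalg.eigvals(A_eps).real)
    # Distinct eigenvalues, hence A_eps is diagonalizable, and A_eps -> J as eps -> 0.
    assert abs(w[1] - w[0]) > eps / 2
```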


8.3 Location of Eigenvalues

If A is an n × n complex (or real) matrix, it would be useful to know, even roughly, where the eigenvalues of A are located in the complex plane C.

The Gershgorin discs provide some precise information about this.

Definition 8.4. For any complex n × n matrix A, for i = 1, . . . , n, let

R'_i(A) = \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}|

and let

G(A) = \bigcup_{i=1}^{n} \{ z \in \mathbb{C} \mid |z - a_{ii}| \le R'_i(A) \}.

Each disc {z ∈ C | |z − aii| ≤ R′i(A)} is called a Gershgorin disc, and their union G(A) is called the Gershgorin domain.
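The definition translates directly into code. This added NumPy sketch computes the radii R′i(A) and checks that every eigenvalue lies in some Gershgorin disc:

```python
import numpy as np

A = np.array([[ 4.0,  1.0, 0.5],
              [ 0.3, -2.0, 1.0],
              [ 0.2,  0.4, 6.0]])
n = A.shape[0]

# R'_i(A): sum of the absolute values of the off-diagonal entries in row i.
radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))

eigenvalues = np.linalg.eigvals(A)

# Gershgorin: each eigenvalue belongs to at least one disc of G(A).
for lam in eigenvalues:
    assert any(abs(lam - A[i, i]) <= radii[i] for i in range(n))
```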


Theorem 8.5. (Gershgorin's disc theorem) For any complex n × n matrix A, all the eigenvalues of A belong to the Gershgorin domain G(A). Furthermore, the following properties hold:

(1) If A is strictly row diagonally dominant, that is,

|a_{ii}| > \sum_{j=1,\, j \neq i}^{n} |a_{ij}|, \quad \text{for } i = 1, \ldots, n,

then A is invertible.

(2) If A is strictly row diagonally dominant, and if aii > 0 for i = 1, . . . , n, then every eigenvalue of A has a strictly positive real part.

In particular, Theorem 8.5 implies that if a symmetric matrix is strictly row diagonally dominant and has strictly positive diagonal entries, then it is positive definite.
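A numerical check of this consequence (an added sketch): a symmetric, strictly row diagonally dominant matrix with positive diagonal has only positive eigenvalues, and its Cholesky factorization succeeds:

```python
import numpy as np

# Symmetric, strictly row diagonally dominant, with positive diagonal entries.
A = np.array([[5.0, 1.0, 2.0],
              [1.0, 4.0, 1.0],
              [2.0, 1.0, 6.0]])
n = A.shape[0]

assert np.allclose(A, A.T)
assert all(A[i, i] > sum(abs(A[i, j]) for j in range(n) if j != i)
           for i in range(n))

w = np.linalg.eigvalsh(A)  # real eigenvalues, since A is symmetric
assert np.all(w > 0)       # positive definite
np.linalg.cholesky(A)      # would raise LinAlgError if A were not positive definite
```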

Theorem 8.5 is sometimes called the Gershgorin–Hadamard theorem.


Since A and A⊤ have the same eigenvalues (even for complex matrices), we also have a version of Theorem 8.5 for the discs of radius

C'_j(A) = \sum_{\substack{i=1 \\ i \neq j}}^{n} |a_{ij}|,

whose domain is denoted by G(A⊤).

Theorem 8.6. For any complex n × n matrix A, all the eigenvalues of A belong to the intersection of the Gershgorin domains, G(A) ∩ G(A⊤). Furthermore, the following properties hold:

(1) If A is strictly column diagonally dominant, that is,

|a_{jj}| > \sum_{i=1,\, i \neq j}^{n} |a_{ij}|, \quad \text{for } j = 1, \ldots, n,

then A is invertible.

(2) If A is strictly column diagonally dominant, and if aii > 0 for i = 1, . . . , n, then every eigenvalue of A has a strictly positive real part.


There are refinements of Gershgorin's theorem and eigenvalue location results involving other domains besides discs; for more on this subject, see Horn and Johnson [17], Sections 6.1 and 6.2.

Remark: Neither strict row diagonal dominance nor strict column diagonal dominance is necessary for invertibility. Also, if we relax all strict inequalities to non-strict inequalities, then row diagonal dominance (or column diagonal dominance) is not a sufficient condition for invertibility.