LMS Filtering


Let the filter coefficients be

w = [w_0, w_1, ..., w_{N-1}]^T.

Filter output:

y(n) = Σ_{k=0}^{N-1} w_k^* x(n-k) = w^H x(n) ≈ d(n),

where

x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T.

Wiener-Hopf equation:

R(n) w(n) = r(n)  ⟹  w_opt(n) = R(n)^{-1} r(n),

where

R(n) = E{x(n) x(n)^H},  r(n) = E{x(n) d^*(n)}.

    Adaptive Filtering (cont.)

    Example 1: Unknown system identification.


    Adaptive Filtering (cont.)

    Example 2: Unknown system equalization.


    Adaptive Filtering (cont.)

    Example 3: Noise cancellation.


    Adaptive Filtering (cont.)

    Example 5: Interference cancellation without reference input.


Adaptive Filtering (cont.)

Idea of the Least-Mean-Square (LMS) algorithm:

w_{k+1} = w_k - μ ∇_w E{|e_k|^2},   (*)

where the time indices are given as subscripts [e.g., d(k) = d_k], and

E{|e_k|^2} = E{|d_k - w_k^H x_k|^2} = E{|d_k|^2} - w_k^H r - r^H w_k + w_k^H R w_k,

∇_w E{|e_k|^2} = R w - r.

Use single-sample estimates of R and r:

R̂ = x_k x_k^H,  r̂ = x_k d_k^*,

and insert them into (*):

w_{k+1} = w_k + μ x_k e_k^*,  e_k = d_k - w_k^H x_k   (LMS algorithm).
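A minimal NumPy sketch of this update, applied to the system-identification setting of Example 1; the filter length, step size, and the "unknown" system h below are illustrative assumptions, not taken from the notes:

import numpy as np

def lms(x, d, N=8, mu=0.05):
    # Complex LMS: w <- w + mu * x_k * conj(e_k), with e_k = d_k - w^H x_k.
    w = np.zeros(N, dtype=complex)
    e = np.zeros(len(x), dtype=complex)
    for k in range(N - 1, len(x)):
        xk = x[k - N + 1:k + 1][::-1]        # x_k = [x(k), x(k-1), ..., x(k-N+1)]^T
        e[k] = d[k] - np.vdot(w, xk)         # a priori error (vdot conjugates its first argument)
        w = w + mu * xk * np.conj(e[k])      # stochastic-gradient update
    return w, e

# Toy run: identify an "unknown" FIR system driven by circular white noise.
rng = np.random.default_rng(0)
h = np.array([0.5, -0.3 + 0.2j, 0.1, 0.05, 0.02, 0.0, 0.0, 0.0])
x = (rng.standard_normal(5000) + 1j * rng.standard_normal(5000)) / np.sqrt(2)
d = np.convolve(x, h.conj())[:len(x)]        # desired signal d(n) = sum_k h_k^* x(n-k)
w_hat, e = lms(x, d)
print(np.round(w_hat, 2))                    # should approach h

With the convention y(n) = w^H x(n) used above, the weights converge toward h for a sufficiently small step size.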

Adaptive Filtering: Convergence Analysis

Convergence analysis: subtract w_opt from both sides of the previous equation:

w_{k+1} - w_opt = (w_k - w_opt) + μ x_k (d_k^* - x_k^H w_k),

i.e., with v_k = w_k - w_opt,

v_{k+1} = v_k + μ x_k (d_k^* - x_k^H w_k),   (**)

and note that

x_k (d_k^* - x_k^H w_k) = x_k d_k^* - x_k x_k^H w_k
= x_k d_k^* - x_k x_k^H w_k + x_k x_k^H w_opt - x_k x_k^H w_opt
= (x_k d_k^* - x_k x_k^H w_opt) - x_k x_k^H v_k.

Observe that

E{x_k (d_k^* - x_k^H w_k)} = (r - R w_opt) - R E{v_k} = -R E{v_k},

since r - R w_opt = 0. Let c_k = E{v_k}. Then

c_{k+1} = [I - μ R] c_k.   (***)

Sufficient condition for convergence:

0 < μ < 2/λ_max,

where λ_max is the largest eigenvalue of R.

Normalized LMS

A promising variant of LMS is the so-called Normalized LMS (NLMS) algorithm:

w_{k+1} = w_k + (μ/||x_k||^2) x_k e_k^*,  e_k = d_k - w_k^H x_k   (NLMS algorithm).

The sufficient condition for convergence:

0 < μ < 2.
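The only change relative to the LMS loop above is the power-normalized step; a sketch of one update (the small regularizer eps guarding against division by zero is my addition, not part of the notes):

import numpy as np

def nlms_step(w, xk, dk, mu=0.5, eps=1e-12):
    # NLMS: w <- w + (mu / ||x_k||^2) * x_k * conj(e_k), with e_k = d_k - w^H x_k.
    ek = dk - np.vdot(w, xk)
    w = w + (mu / (np.vdot(xk, xk).real + eps)) * xk * np.conj(ek)
    return w, ek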

Recursive Least Squares

Idea of the Recursive Least Squares (RLS) algorithm: use the sample estimate R̂_k (instead of the true covariance matrix R) in the equation for the weight vector and find w_{k+1} as an update to w_k. Let

R̂_{k+1} = λ R̂_k + x_{k+1} x_{k+1}^H,
r̂_{k+1} = λ r̂_k + x_{k+1} d_{k+1}^*,

where λ ≤ 1 is the (so-called) forgetting factor. Using the matrix inversion lemma, we obtain

R̂_{k+1}^{-1} = (λ R̂_k + x_{k+1} x_{k+1}^H)^{-1}
= (1/λ) [ R̂_k^{-1} - (R̂_k^{-1} x_{k+1} x_{k+1}^H R̂_k^{-1}) / (λ + x_{k+1}^H R̂_k^{-1} x_{k+1}) ].

Therefore,

w_{k+1} = R̂_{k+1}^{-1} r̂_{k+1}
= (1/λ) [ R̂_k^{-1} - (R̂_k^{-1} x_{k+1} x_{k+1}^H R̂_k^{-1}) / (λ + x_{k+1}^H R̂_k^{-1} x_{k+1}) ] (λ r̂_k)
  + (1/λ) [ R̂_k^{-1} - (R̂_k^{-1} x_{k+1} x_{k+1}^H R̂_k^{-1}) / (λ + x_{k+1}^H R̂_k^{-1} x_{k+1}) ] x_{k+1} d_{k+1}^*
= w_k - g_{k+1} x_{k+1}^H w_k + g_{k+1} d_{k+1}^*,

where

g_{k+1} = R̂_k^{-1} x_{k+1} / (λ + x_{k+1}^H R̂_k^{-1} x_{k+1}).

Hence, the updating equation for the weight vector is

w_{k+1} = w_k - g_{k+1} x_{k+1}^H w_k + g_{k+1} d_{k+1}^*
= w_k + g_{k+1} (d_{k+1}^* - x_{k+1}^H w_k)
= w_k + g_{k+1} e_{k,k+1}^*,

where e_{k,k+1} = d_{k+1} - w_k^H x_{k+1} is the a priori error.

RLS algorithm:

Initialization: w_0 = 0, P_0 = ε^{-1} I (ε a small positive constant).

For each k = 1, 2, ..., compute:

h_k = P_{k-1} x_k,
γ_k = 1/(λ + h_k^H x_k),
g_k = γ_k h_k,
P_k = (1/λ) (P_{k-1} - g_k h_k^H),
e_{k-1,k} = d_k - w_{k-1}^H x_k,
w_k = w_{k-1} + g_k e_{k-1,k}^*,
e_k = d_k - w_k^H x_k.
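A direct NumPy transcription of this table; the forgetting factor, the initialization constant ε, and the tapped-delay-line data layout are assumptions chosen for illustration:

import numpy as np

def rls(x, d, N=8, lam=0.99, eps=1e-2):
    # RLS as tabulated above, with P_0 = (1/eps) * I.
    w = np.zeros(N, dtype=complex)
    P = np.eye(N, dtype=complex) / eps
    for k in range(N - 1, len(x)):
        xk = x[k - N + 1:k + 1][::-1]            # x_k = [x(k), ..., x(k-N+1)]^T
        h = P @ xk                               # h_k = P_{k-1} x_k
        g = h / (lam + np.vdot(h, xk))           # g_k = gamma_k h_k, gamma_k = 1/(lam + h_k^H x_k)
        P = (P - np.outer(g, h.conj())) / lam    # P_k = (P_{k-1} - g_k h_k^H) / lam
        e_pri = d[k] - np.vdot(w, xk)            # e_{k-1,k} = d_k - w_{k-1}^H x_k
        w = w + g * np.conj(e_pri)               # w_k = w_{k-1} + g_k e_{k-1,k}^*
    return w

On the system-identification toy data used for the LMS sketch above, this recursion typically settles within a few tens of samples, illustrating the faster convergence of RLS.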

Example

LMS linear predictor of the signal

x(n) = 10 e^{j2πfn} + e(n),

where f = 0.1 and

N = 8,

e(n) is circular unit-variance white noise,

μ_1 = 1/[10 tr(R)],  μ_2 = 1/[3 tr(R)],  μ_3 = 1/[tr(R)].

Adaptive Beamforming

The above scheme describes narrowband beamforming, i.e.,

conventional beamforming if w_1, ..., w_N do not depend on the input/output array signals,

adaptive beamforming if w_1, ..., w_N are determined and optimized based on the input/output array signals.

Input array signal vector:

x(i) = [x_1(i), x_2(i), ..., x_N(i)]^T.

Complex beamformer output:

y(i) = w^H x(i).

Adaptive Beamforming (cont.)

Input array signal vector:

x(k) = [x_1(k), x_2(k), ..., x_N(k)]^T.

Complex beamformer output:

y(k) = w^H x(k),

x(k) = x_S(k) [signal] + x_N(k) [noise] + x_I(k) [interference].

The goal is to filter out x_I and x_N as much as possible and, therefore,

to obtain an approximation x̂_S of x_S. The most popular criteria of adaptive beamforming:

MSE minimum:

min_w MSE,  MSE = E{|d(i) - w^H x(i)|^2}.

Signal-to-Interference-plus-Noise Ratio (SINR) maximum:

max_w SINR,  SINR = E{|w^H x_S|^2} / E{|w^H (x_I + x_N)|^2}.


Adaptive Beamforming (cont.)

In the sequel, we consider the max SINR criterion. Rewrite the snapshot model as

x(k) = s(k) a_S + x_I(k) + x_N(k),

where a_S is the known steering vector of the desired signal. Then

SINR = σ_s^2 |w^H a_S|^2 / (w^H E{(x_I + x_N)(x_I + x_N)^H} w) = σ_s^2 |w^H a_S|^2 / (w^H R w),

where

R = E{(x_I + x_N)(x_I + x_N)^H}

is the interference-plus-noise covariance matrix.

Obviously, the SINR does not depend on a rescaling of w, i.e., if w_opt is an optimal weight vector, then α w_opt is such a vector too. Therefore, max SINR is

equivalent to

min_w w^H R w  subject to  w^H a_S = const.

Let const = 1. Then the Lagrangian is

H(w) = w^H R w + λ (1 - w^H a_S) + λ^* (1 - a_S^H w),

∇_w H(w) = R w - λ a_S = 0  ⟹  R w = λ a_S  ⟹  w_opt = λ R^{-1} a_S.

This is a spatial version of the Wiener-Hopf equation!

From the constraint equation, we obtain

λ = 1/(a_S^H R^{-1} a_S)

and therefore

w_opt = R^{-1} a_S / (a_S^H R^{-1} a_S)   (MVDR beamformer).

Substituting w_opt into the SINR expression, we obtain

max SINR = SINR_opt = σ_s^2 (a_S^H R^{-1} a_S)^2 / (a_S^H R^{-1} R R^{-1} a_S) = σ_s^2 a_S^H R^{-1} a_S.

If there are no interference sources (only white noise with variance σ^2):

SINR_opt = (σ_s^2/σ^2) a_S^H a_S = N σ_s^2/σ^2.
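A small numerical check of these expressions, assuming a uniform linear array with half-wavelength spacing and one interferer; the geometry and all parameter values are my illustrative assumptions, chosen to echo the later examples:

import numpy as np

def ula_steering(theta_deg, N):
    # Steering vector of an N-element ULA with half-wavelength spacing.
    return np.exp(1j * np.pi * np.arange(N) * np.sin(np.deg2rad(theta_deg)))

N = 8
a_s = ula_steering(0.0, N)                     # desired signal from broadside
a_i = ula_steering(30.0, N)                    # single interferer
sigma2, inr, sigma_s2 = 1.0, 1e4, 1.0          # noise power, 40 dB INR, 0 dB SNR
R = inr * np.outer(a_i, a_i.conj()) + sigma2 * np.eye(N)   # interference-plus-noise covariance

Ri_as = np.linalg.solve(R, a_s)
w_mvdr = Ri_as / (a_s.conj() @ Ri_as)          # w_opt = R^{-1} a_S / (a_S^H R^{-1} a_S)

sinr_opt = sigma_s2 * np.real(a_s.conj() @ Ri_as)          # sigma_s^2 a_S^H R^{-1} a_S
sinr_mvdr = sigma_s2 * abs(np.vdot(w_mvdr, a_s)) ** 2 / np.real(np.vdot(w_mvdr, R @ w_mvdr))
print(sinr_opt, sinr_mvdr)                     # the two values coincide, as derived above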

Adaptive Beamforming (cont.)

Let us study what happens with the optimal SINR if the covariance matrix includes the signal component:

R_x = E{x x^H} = R + σ_s^2 a_S a_S^H.

Using the matrix inversion lemma, we have

R_x^{-1} a_S = (R + σ_s^2 a_S a_S^H)^{-1} a_S
= [ R^{-1} - (R^{-1} a_S a_S^H R^{-1}) / (1/σ_s^2 + a_S^H R^{-1} a_S) ] a_S
= [ 1 - (a_S^H R^{-1} a_S) / (1/σ_s^2 + a_S^H R^{-1} a_S) ] R^{-1} a_S
= β R^{-1} a_S

for a scalar β, i.e., R_x^{-1} a_S ∝ R^{-1} a_S.

The optimal SINR is not affected!

However, the above result holds only if

there is an infinite number of snapshots, and

a_S is known exactly.

Adaptive Beamforming (cont.)

Gradient algorithm maximizing the SINR (very similar to LMS):

w_{k+1} = w_k + μ (a_S - x_k x_k^H w_k),

where, again, we use the simple notation w_k = w(k) and x_k = x(k). The vector w_k converges to w_opt ∝ R^{-1} a_S if

0 < μ < 2/λ_max,

which is guaranteed by 0 < μ < 2/tr{R}.

The disadvantage of gradient algorithms is that the convergence may be very slow, i.e., it depends on the eigenvalue spread of R.

Example

N = 8,

single signal from θ_s = 0° and SNR = 0 dB,

single interference from θ_I = 30° and INR = 40 dB,

μ_1 = 1/[50 tr(R)],  μ_2 = 1/[15 tr(R)],  μ_3 = 1/[5 tr(R)].

Adaptive Beamforming (cont.)

Sample Matrix Inversion (SMI) Algorithm:

w_SMI = R̂^{-1} a_S,

where R̂ is the sample covariance matrix

R̂ = (1/K) Σ_{k=1}^{K} x_k x_k^H.

Reed-Mallett-Brennan (RMB) rule: under mild conditions, the mean losses (relative to the optimal SINR) due to the SMI approximation of w_opt do not exceed 3 dB if

K ≥ 2N.

Hence, the SMI provides a very fast convergence rate, in general.

Adaptive Beamforming (cont.)

Loaded SMI:

w_LSMI = R̂_DL^{-1} a_S,  R̂_DL = R̂ + γ I,

where the optimal loading factor is γ ≈ 2σ^2. LSMI allows convergence faster than in N snapshots!

LSMI convergence rule: under mild conditions, the mean losses (relative to the optimal SINR) due to the LSMI approximation of w_opt do not exceed a few dB if

K ≥ 2L,

where L is the number of interfering sources. Hence, the LSMI provides a faster convergence rate than SMI (usually, 2N ≫ 2L)!
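A sketch contrasting SMI and LSMI on simulated interference-plus-noise snapshots; the ULA geometry, powers, loading γ = 2σ^2, and snapshot count are my illustrative assumptions:

import numpy as np

rng = np.random.default_rng(1)
N, K = 10, 20                                   # K = 2N training snapshots (RMB rule of thumb)
a_s = np.ones(N, dtype=complex)                 # desired steering vector (broadside ULA)
a_i = np.exp(1j * np.pi * np.arange(N) * np.sin(np.deg2rad(30.0)))
sigma2, inr = 1.0, 1e4                          # unit noise power, 40 dB INR

# Signal-free training snapshots x_k = interference + noise.
s_i = np.sqrt(inr / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
X = np.outer(a_i, s_i) + np.sqrt(sigma2 / 2) * (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K)))

R_hat = X @ X.conj().T / K                      # sample covariance (1/K) sum_k x_k x_k^H
w_smi = np.linalg.solve(R_hat, a_s)             # SMI weight (scale is irrelevant for SINR)
w_lsmi = np.linalg.solve(R_hat + 2 * sigma2 * np.eye(N), a_s)   # LSMI with gamma = 2 sigma^2

R = inr * np.outer(a_i, a_i.conj()) + sigma2 * np.eye(N)        # true covariance
def sinr(w):                                    # output SINR for sigma_s^2 = 1
    return abs(np.vdot(w, a_s)) ** 2 / np.real(np.vdot(w, R @ w))
sinr_opt = sinr(np.linalg.solve(R, a_s))
print(10 * np.log10(sinr_opt / sinr(w_smi)),    # SMI loss in dB (about 3 dB on average at K = 2N)
      10 * np.log10(sinr_opt / sinr(w_lsmi)))   # LSMI loss in dB (typically much smaller)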

Example

N = 10,

single signal from θ_s = 0° and SNR = 0 dB,

single interference from θ_I = 30° and INR = 40 dB,

SMI vs. LSMI.

Adaptive Beamforming (cont.)

Hung-Turner (Projection) Algorithm:

w_HT = (I - X(X^H X)^{-1} X^H) a_S,

i.e., a data-orthogonal projection is used instead of the inverse covariance matrix. For the Hung-Turner method, satisfactory performance is achieved with

K ≥ L.

Optimal value of K:

K_opt = √((N + 1)L) - 1.

Drawback: the number of sources should be known a priori.
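A sketch of the projection beamformer with very few snapshots (K = 4 for a single interferer); the scenario mirrors the SMI example above and all numbers are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(2)
N, K = 10, 4                                    # only K >= L snapshots, here L = 1 interferer
a_s = np.ones(N, dtype=complex)                 # desired steering vector (broadside ULA)
a_i = np.exp(1j * np.pi * np.arange(N) * np.sin(np.deg2rad(30.0)))

# Signal-free snapshots dominated by a strong (40 dB) interferer plus unit-power noise.
s_i = 100.0 * (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
X = np.outer(a_i, s_i) + (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)

# w_HT = (I - X (X^H X)^{-1} X^H) a_S: keep the part of a_S orthogonal to the data subspace.
P_data = X @ np.linalg.solve(X.conj().T @ X, X.conj().T)
w_ht = (np.eye(N) - P_data) @ a_s
print(abs(np.vdot(a_s, a_i)), abs(np.vdot(w_ht, a_i)))   # interferer response before / after projection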


This effect is sometimes referred to as the signal cancellation phenomenon. Additional constraints are required to stabilize the main beam response:

min_w w^H R w  subject to  C^H w = f.

1. Point constraints: matrix of constrained directions:

C = [a_{S,1}, a_{S,2}, ..., a_{S,M}],

where the a_{S,i} are all taken in the neighborhood of a_S and include a_S as well. Vector of constraints:

f = [1, 1, ..., 1]^T.

Adaptive Beamforming (cont.)

Generalized Sidelobe Canceller (GSC): Let us decompose

w_opt = R^{-1} C (C^H R^{-1} C)^{-1} f

into two components, one in the constrained subspace and one orthogonal to it (P_C + P_C^⊥ = I):

w_opt = (P_C + P_C^⊥) w_opt
= C(C^H C)^{-1} [C^H R^{-1} C (C^H R^{-1} C)^{-1}] f + P_C^⊥ R^{-1} C (C^H R^{-1} C)^{-1} f
= C(C^H C)^{-1} f + P_C^⊥ R^{-1} C (C^H R^{-1} C)^{-1} f,

since the term in brackets equals I.

Generalizing this approach, we obtain the following decomposition for w_opt:

w_opt = w_q - B w_a,

where

w_q = C(C^H C)^{-1} f

is the so-called quiescent weight vector,

B^H C = 0,

B is the blocking matrix, and w_a is the new adaptive weight vector.

Generalized Sidelobe Canceller (GSC):

The choice of B is not unique. We can take B = P_C^⊥. However, in this case B is not of full rank. A more common choice is a full-rank N × (N - M) matrix B. Then, the vectors z = B^H x and w_a both have the shorter length (N - M) × 1 relative to the N × 1 vectors x and w_q.

Since the constrained directions are blocked by the matrix B, the signal cannot be suppressed and, therefore, the weight vector w_a can adapt

freely to suppress interference by minimizing the output GSC power

Q_GSC = (w_q - B w_a)^H R (w_q - B w_a)
= w_q^H R w_q - w_q^H R B w_a - w_a^H B^H R w_q + w_a^H B^H R B w_a.

The solution is

w_{a,opt} = (B^H R B)^{-1} B^H R w_q.
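A numerical sanity check of this parameterization: with a blocking matrix spanning the null space of C^H, the weight w_q - B w_{a,opt} reproduces the directly constrained solution R^{-1} C (C^H R^{-1} C)^{-1} f. The covariance and constraint matrices below are random, purely illustrative choices:

import numpy as np

rng = np.random.default_rng(3)
N, M = 10, 3
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R = A @ A.conj().T + np.eye(N)                   # an arbitrary positive-definite "covariance"
C = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))   # rank-M constraint matrix
f = np.ones(M, dtype=complex)

w_q = C @ np.linalg.solve(C.conj().T @ C, f)     # quiescent vector, satisfies C^H w_q = f
_, _, Vh = np.linalg.svd(C.conj().T)
B = Vh[M:].conj().T                              # N x (N - M) blocking matrix, B^H C = 0

w_a = np.linalg.solve(B.conj().T @ R @ B, B.conj().T @ R @ w_q)   # (B^H R B)^{-1} B^H R w_q
w_gsc = w_q - B @ w_a

RiC = np.linalg.solve(R, C)                      # direct LCMV solution for comparison
w_lcmv = RiC @ np.linalg.solve(C.conj().T @ RiC, f)
print(np.allclose(w_gsc, w_lcmv))                # True: both parameterizations give the same weight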


Adaptive Beamforming (cont.)

Generalized Sidelobe Canceller (GSC): Noting that

y(k) = w_q^H x(k),  z(k) = B^H x(k),

we obtain

R_z = E{z(k) z(k)^H} = B^H E{x(k) x(k)^H} B = B^H R B,

r_{yz} = E{z(k) y^*(k)} = B^H E{x(k) x(k)^H} w_q = B^H R w_q.

Hence,

w_{a,opt} = R_z^{-1} r_{yz},

which is the Wiener-Hopf equation!

How to Choose B?

Choose N - M linearly independent vectors b_i:

B = [b_1, b_2, ..., b_{N-M}]

so that

b_i ⊥ c_k,  i = 1, 2, ..., N - M,  k = 1, 2, ..., M,

where c_k is the kth column of C.

There are many possible choices of B!

Example: GSC in the Particular Case of a Single Normal-Direction Constraint and a Particular Choice of Blocking Matrix

In this particular example,

C = [1, 1, ..., 1]^T,

B^H = [ 1 -1  0 ...  0  0 ]
      [ 0  1 -1 ...  0  0 ]
      [         ...       ]
      [ 0  0  0 ...  1 -1 ],

and

x(k) = [x_1(k), x_2(k), ..., x_N(k)]^T,
z(k) = [x_1(k) - x_2(k), x_2(k) - x_3(k), ..., x_{N-1}(k) - x_N(k)]^T.
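This first-difference blocking matrix is easy to form and check numerically; a tiny sketch (N = 6 is an arbitrary choice):

import numpy as np

N = 6
C = np.ones((N, 1), dtype=complex)               # single broadside ("normal direction") constraint
BH = np.eye(N - 1, N) - np.eye(N - 1, N, k=1)    # rows [1, -1, 0, ...], ..., [0, ..., 1, -1]
print(BH @ C)                                    # all zeros: B^H C = 0, the constrained direction is blocked
# z(k) = BH @ x(k) then contains the differences x_1 - x_2, ..., x_{N-1} - x_N.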


Partially Adaptive Beamforming

In many applications, the number of interfering sources is much smaller than the number of adaptive weights [adaptive degrees of freedom (DOFs)]. In such cases, partially adaptive arrays can be used.

Idea: use a nonadaptive preprocessor reducing the number of adaptive channels:

y(i) = T^H x(i),

where

y has a reduced dimension M × 1 (M < N) compared with the N × 1 vector x,

T is an N × M full-rank matrix.

Partially Adaptive Beamforming

There are two types of nonadaptive preprocessors:

subarray preprocessor,

beamspace preprocessor.

For an arbitrary preprocessor:

R_y = E{y(i) y(i)^H} = T^H E{x(i) x(i)^H} T = T^H R T.

Recall the previously used representation:

R = A S A^H + σ^2 I.

After the preprocessing, we have

R_y = T^H A S A^H T + σ^2 T^H T = Ã S Ã^H + Q,

Ã = T^H A,

Q = σ^2 T^H T.

Preprocessing changes the array manifold. Preprocessing may lead to colored noise.

Choosing T with orthonormal columns, we have

T^H T = I,

and, therefore, the effect of colored noise may be removed.

The preprocessing matrix in this particular case (N = 9 elements, M = 3 subarrays):

T^H = (1/√3) [ 1 1 1 0 0 0 0 0 0 ]
             [ 0 0 0 1 1 1 0 0 0 ]
             [ 0 0 0 0 0 0 1 1 1 ]

(note that T^H T = I here!)

In the general case,

T = [ a_{S,1}    0     ...    0     ]
    [   0     a_{S,2}  ...    0     ]
    [  ...      ...    ...   ...    ]
    [   0       0      ...  a_{S,M} ],

where L = N/M is the size of each subarray, and T^H T = I holds true if a_{S,k}^H a_{S,k} = 1, k = 1, 2, ..., M.
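A sketch of the block-diagonal subarray preprocessor for the broadside case shown above (N = 9 elements, M = 3 subarrays of L = 3 elements; normalizing each subarray steering vector keeps T^H T = I):

import numpy as np

N, M = 9, 3
L = N // M
a_sub = np.ones(L, dtype=complex) / np.sqrt(L)   # normalized subarray steering vector (broadside)
T = np.zeros((N, M), dtype=complex)
for m in range(M):
    T[m * L:(m + 1) * L, m] = a_sub              # place a_{S,m} on the m-th diagonal block
print(np.allclose(T.conj().T @ T, np.eye(M)))    # True: T^H T = I, so the noise stays white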


Wideband Space-Time Processing (cont.)

Wideband case:

higher dimension of the problem (N·P instead of N),

the steering vector depends on frequency.

    Constant Modulus Algorithm (cont.)

    Simple example: 2 sources, 2 antennas.


Let w = [w_1, w_2]^T be a beamformer. Output of beamforming:

y_k = w^H x_k = [w_1^*, w_2^*] [x_{1,k}, x_{2,k}]^T.

Constant modulus property: |s_{1,k}| = |s_{2,k}| = 1 for all k.

Possible optimization problem:

min_w J(w),  where  J(w) = E[(|y_k|^2 - 1)^2].

The CMA cost function as a function of y (for simplicity, y is taken to be real here). There is no unique minimum! Indeed, if y_k = w^H x_k is CM, then α w is another valid beamformer for any scalar α that satisfies |α| = 1.

    2 (real-valued) sources, 2 antennas


Iterative Optimization

Cost function:

J(w) = E[(|y_k|^2 - 1)^2],  y_k = w^H x_k.

Stochastic gradient method: w_{k+1} = w_k - μ ∇J(w_k), where μ is the step size, μ > 0.

Derivative: use |y_k|^2 = y_k y_k^* = w^H x_k x_k^H w.

∇J = 2 E{(|y_k|^2 - 1) ∇(w^H x_k x_k^H w)}
= 2 E{(|y_k|^2 - 1) (x_k x_k^H w)}
= 2 E{(|y_k|^2 - 1) x_k y_k^*}.

Algorithm CMA(2,2):

y_k = w_k^H x_k,
w_{k+1} = w_k - μ x_k (|y_k|^2 - 1) y_k^*.
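A sketch of CMA(2,2) on simulated data with two constant-modulus (QPSK) sources and two antennas; the mixing matrix, noise level, step size, and data length are illustrative assumptions:

import numpy as np

def cma22(X, mu=1e-3, seed=0):
    # CMA(2,2): w <- w - mu * x_k * (|y_k|^2 - 1) * conj(y_k), with y_k = w^H x_k.
    rng = np.random.default_rng(seed)
    Nant, T = X.shape
    w = (rng.standard_normal(Nant) + 1j * rng.standard_normal(Nant)) / np.sqrt(2)
    for k in range(T):
        xk = X[:, k]
        yk = np.vdot(w, xk)
        w = w - mu * xk * (abs(yk) ** 2 - 1) * np.conj(yk)
    return w

rng = np.random.default_rng(4)
S = np.exp(1j * (np.pi / 2 * rng.integers(0, 4, size=(2, 5000)) + np.pi / 4))   # |s_{i,k}| = 1
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))              # mixing (array) matrix
X = A @ S + 0.01 * (rng.standard_normal((2, 5000)) + 1j * rng.standard_normal((2, 5000)))
w = cma22(X)
print(abs(np.vdot(w, A[:, 0])), abs(np.vdot(w, A[:, 1])))
# After convergence, one of the two responses should approach 1 (the captured source)
# and the other should approach 0: only one source is recovered, as noted below.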


Advantages:

The algorithm is extremely simple to implement.

Adaptive tracking of sources.

Converges to minima close to the Wiener beamformers (for each source).

Disadvantages:

Noisy and slow.

The step size should be small, otherwise the algorithm is unstable.

Only one source is recovered (which one?).

Possible convergence to a local minimum (with finite data).

Other CMAs

Alternative cost function: CMA(1,2)

J(w) = E[(|y_k| - 1)^2] = E[(|w^H x_k| - 1)^2].

Corresponding CMA iteration:

y_k = w_k^H x_k,
ε_k = y_k/|y_k| - y_k,
w_{k+1} = w_k + μ x_k ε_k^*.

Similar to LMS, with the update error y_k/|y_k| - y_k. The desired signal is estimated by y_k/|y_k|.
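One update of this iteration in NumPy (the guard against |y_k| = 0 is my addition, not part of the notes):

import numpy as np

def cma12_step(w, xk, mu=0.01):
    # CMA(1,2): eps_k = y_k/|y_k| - y_k, then w <- w + mu * x_k * conj(eps_k).
    yk = np.vdot(w, xk)                          # y_k = w_k^H x_k
    eps_k = yk / max(abs(yk), 1e-12) - yk        # hard-limited output minus output
    return w + mu * xk * np.conj(eps_k)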


Other CMAs (cont.)

Normalized CMA (NCMA; becomes scaling independent):

w_{k+1} = w_k + (μ/||x_k||^2) x_k ε_k^*.

Orthogonal CMA (OCMA): whiten using the data covariance R̂:

w_{k+1} = w_k + μ R̂_k^{-1} x_k ε_k^*.

Least squares CMA (LSCMA): block update, trying to iteratively optimize

min_w ||s^H - w^H X||^2,

where X = [x_1, x_2, ..., x_T] and s^H is the best blind estimate at step k of the complete source vector (at all time points t = 1, 2, ..., T):

s^H = [y_1/|y_1|, y_2/|y_2|, ..., y_T/|y_T|],

where

y_t = w_k^H x_t,  t = 1, 2, ..., T,

and

w_{k+1}^H = s^H X^H (X X^H)^{-1}.
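A sketch of the LSCMA block iteration (the random initialization, the guard against division by zero, and the iteration count are my assumptions); applied to the two-source data of the CMA(2,2) sketch above, a handful of passes is typically enough:

import numpy as np

def lscma(X, n_iter=20, seed=0):
    # Alternate the blind CM estimate s^H and the least-squares beamformer update.
    rng = np.random.default_rng(seed)
    Nant, T = X.shape
    w = (rng.standard_normal(Nant) + 1j * rng.standard_normal(Nant)) / np.sqrt(2)
    XXH = X @ X.conj().T                          # X X^H, fixed over the iterations
    for _ in range(n_iter):
        y = w.conj() @ X                          # y_t = w^H x_t, t = 1, ..., T
        sH = y / np.maximum(np.abs(y), 1e-12)     # s^H = [y_1/|y_1|, ..., y_T/|y_T|]
        w = np.linalg.solve(XXH, X @ sH.conj())   # w^H = s^H X^H (X X^H)^{-1}
    return w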